Compare commits

...

26 Commits

Author SHA1 Message Date
amlrelsa-ms
34a67c1f8b update samples from Release-55 as a part of SDK release 2020-06-08 22:28:25 +00:00
Harneet Virk
34898828be Merge pull request #992 from Azure/release_update/Release-54
update samples from Release-54 as a part of  SDK release
2020-06-02 14:42:02 -07:00
vizhur
a7c3a0fdb8 update samples from Release-54 as a part of SDK release 2020-06-02 21:34:10 +00:00
Harneet Virk
6d11cdfa0a Merge pull request #984 from Azure/release_update/Release-53
update samples from Release-53 as a part of  SDK release
2020-05-26 19:59:58 -07:00
vizhur
11e8ed2bab update samples from Release-53 as a part of SDK release 2020-05-27 02:45:07 +00:00
Harneet Virk
12c06a4168 Merge pull request #978 from ahcan76/patch-1
Fix image paths in tutorial-1st-experiment-sdk-train.ipynb
2020-05-18 12:58:21 -07:00
ahcan76
1f75dc9725 Update tutorial-1st-experiment-sdk-train.ipynb
Fix the image path
2020-05-18 22:40:54 +03:00
Harneet Virk
1a1a42d525 Merge pull request #977 from Azure/release_update/Release-52
update samples from Release-52 as a part of  SDK release
2020-05-18 12:22:48 -07:00
vizhur
879a272a8d update samples from Release-52 as a part of SDK release 2020-05-18 19:21:05 +00:00
Harneet Virk
bc65bde097 Merge pull request #971 from Azure/release_update/Release-51
update samples from Release-51 as a part of  SDK release
2020-05-13 22:17:45 -07:00
vizhur
690bdfbdbe update samples from Release-51 as a part of SDK release 2020-05-14 05:03:47 +00:00
Harneet Virk
3c02bd8782 Merge pull request #967 from Azure/release_update/Release-50
update samples from Release-50 as a part of  SDK release
2020-05-12 19:57:40 -07:00
vizhur
5c14610a1c update samples from Release-50 as a part of SDK release 2020-05-13 02:45:40 +00:00
Harneet Virk
4e3afae6fb Merge pull request #965 from Azure/release_update/Release-49
update samples from Release-49 as a part of  SDK release
2020-05-11 19:25:28 -07:00
vizhur
a2144aa083 update samples from Release-49 as a part of SDK release 2020-05-12 02:24:34 +00:00
Harneet Virk
0e6334178f Merge pull request #963 from Azure/release_update/Release-46
update samples from Release-46 as a part of  SDK release
2020-05-11 14:49:34 -07:00
vizhur
4ec9178d22 update samples from Release-46 as a part of SDK release 2020-05-11 21:48:31 +00:00
Harneet Virk
2aa7c53b0c Merge pull request #962 from Azure/release_update_stablev2/Release-11
update samples from Release-11 as a part of 1.5.0 SDK stable release
2020-05-11 12:42:32 -07:00
vizhur
553fa43e17 update samples from Release-11 as a part of 1.5.0 SDK stable release 2020-05-11 18:59:22 +00:00
Harneet Virk
e98131729e Merge pull request #949 from Azure/release_update_stablev2/Release-8
update samples from Release-8 as a part of 1.4.0 SDK stable release
2020-04-27 11:00:37 -07:00
vizhur
fd2b09e2c2 update samples from Release-8 as a part of 1.4.0 SDK stable release 2020-04-27 17:44:41 +00:00
Harneet Virk
7970209069 Merge pull request #930 from Azure/release_update/Release-44
update samples from Release-44 as a part of  SDK release
2020-04-17 12:46:29 -07:00
vizhur
24f8651bb5 update samples from Release-44 as a part of SDK release 2020-04-17 19:45:37 +00:00
Harneet Virk
b881f78e46 Merge pull request #918 from Azure/release_update_stablev2/Release-6
update samples from Release-6 as a part of 1.3.0 SDK stable release
2020-04-13 09:23:38 -07:00
vizhur
057e22b253 update samples from Release-6 as a part of 1.3.0 SDK stable release 2020-04-13 16:22:23 +00:00
Harneet Virk
c520bd1d41 Merge pull request #884 from Azure/release_update/Release-43
update samples from Release-43 as a part of  SDK release
2020-03-23 16:49:27 -07:00
196 changed files with 16705 additions and 5820 deletions

View File

@@ -40,6 +40,7 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
 - [Deployment](./how-to-use-azureml/deployment) - Examples showing how to deploy and manage machine learning models and solutions
 - [Azure Databricks](./how-to-use-azureml/azure-databricks) - Examples showing how to use Azure ML with Azure Databricks
 - [Monitor Models](./how-to-use-azureml/monitor-models) - Examples showing how to enable model monitoring services such as DataDrift
+- [Reinforcement Learning](./how-to-use-azureml/reinforcement-learning) - Examples showing how to train reinforcement learning agents
 ---
 ## Documentation

View File

@@ -103,7 +103,7 @@
"source": [ "source": [
"import azureml.core\n", "import azureml.core\n",
"\n", "\n",
"print(\"This notebook was created using version 1.2.0 of the Azure ML SDK\")\n", "print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
] ]
}, },
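This version-bump pattern (raising the "created using" string from 1.2.0 to 1.7.0) recurs across the notebooks in this release. A minimal sketch of a slightly stricter check, assuming the `packaging` library is available in the environment; the helper name and the minimum version are illustrative, not part of the diff:

```python
import azureml.core
from packaging import version

def warn_if_sdk_outdated(created_with="1.7.0"):
    """Warn when the installed Azure ML SDK is older than the version the notebook targets."""
    installed = azureml.core.VERSION
    if version.parse(installed) < version.parse(created_with):
        print(f"Warning: notebook targets SDK {created_with}, but {installed} is installed.")
    else:
        print(f"SDK {installed} satisfies the notebook's target of {created_with}.")

warn_if_sdk_outdated()
```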

View File

@@ -1,8 +1,8 @@
 # Table of Contents
 1. [Automated ML Introduction](#introduction)
-1. [Setup using Azure Notebooks](#jupyter)
-1. [Setup using Azure Databricks](#databricks)
+1. [Setup using Compute Instances](#jupyter)
 1. [Setup using a Local Conda environment](#localconda)
+1. [Setup using Azure Databricks](#databricks)
 1. [Automated ML SDK Sample Notebooks](#samples)
 1. [Documentation](#documentation)
 1. [Running using python command](#pythoncommand)
@@ -21,13 +21,13 @@ Below are the three execution environments supported by automated ML.
 <a name="jupyter"></a>
-## Setup using Notebook VMs - Jupyter based notebooks from a Azure VM
+## Setup using Compute Instances - Jupyter based notebooks from a Azure Virtual Machine
 1. Open the [ML Azure portal](https://ml.azure.com)
 1. Select Compute
-1. Select Notebook VMs
+1. Select Compute Instances
 1. Click New
-1. Type a name for the Vm and select a VM type
+1. Type a Compute Name, select a Virtual Machine type and select a Virtual Machine size
 1. Click Create
 <a name="localconda"></a>
@@ -117,7 +117,7 @@ jupyter notebook
 - Simple example of using automated ML for regression
 - Uses azure compute for training
-- [auto-ml-regression-hardware-performance-explanation-and-featurization.ipynb](regression-hardware-performance-explanation-and-featurization/auto-ml-regression-hardware-performance-explanation-and-featurization.ipynb)
+- [auto-ml-regression-explanation-featurization.ipynb](regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb)
 - Dataset: Hardware Performance Dataset
 - Shows featurization and excplanation
 - Uses azure compute for training
@@ -144,7 +144,7 @@ jupyter notebook
 - Dataset: forecasting for a bike-sharing
 - Example of training an automated ML forecasting model on multiple time-series
-- [auto-ml-forecasting-function.ipynb](forecasting-high-frequency/auto-ml-forecasting-function.ipynb)
+- [auto-ml-forecasting-function.ipynb](forecasting-forecast-function/auto-ml-forecasting-function.ipynb)
 - Example of training an automated ML forecasting model on multiple time-series
 - [auto-ml-forecasting-beer-remote.ipynb](forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb)
@@ -152,7 +152,7 @@ jupyter notebook
 - Beer Production Forecasting
 - [auto-ml-continuous-retraining.ipynb](continuous-retraining/auto-ml-continuous-retraining.ipynb)
-- Continous retraining using Pipelines and Time-Series TabularDataset
+- Continuous retraining using Pipelines and Time-Series TabularDataset
 - [auto-ml-classification-text-dnn.ipynb](classification-text-dnn/auto-ml-classification-text-dnn.ipynb)
 - Classification with text data using deep learning in AutoML

View File

@@ -4,35 +4,28 @@ dependencies:
 # Currently Azure ML only supports 3.5.2 and later.
 - pip<=19.3.1
 - python>=3.5.2,<3.6.8
-- wheel==0.30.0
 - nb_conda
 - matplotlib==2.1.0
 - numpy>=1.16.0,<=1.16.2
 - cython
 - urllib3<1.24
-- scipy>=1.0.0,<=1.1.0
+- scipy==1.4.1
 - scikit-learn>=0.19.0,<=0.20.3
 - pandas>=0.22.0,<=0.23.4
 - py-xgboost<=0.90
-- fbprophet==0.5
-- pytorch=1.1.0
-- cudatoolkit=9.0
+- conda-forge::fbprophet==0.5
+- pytorch::pytorch=1.4.0
+- cudatoolkit=10.1.243
 - pip:
 # Required packages for AzureML execution, history, and data preparation.
 - azureml-defaults
-- azureml-dataprep[pandas]
 - azureml-train-automl
 - azureml-train
 - azureml-widgets
 - azureml-pipeline
-- azureml-contrib-interpret
 - pytorch-transformers==1.0.0
 - spacy==2.1.8
-- onnxruntime==1.0.0
+- pyarrow==0.17.0
 - https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
-channels:
-- anaconda
-- conda-forge
-- pytorch
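These conda specifications pin the training stack used by the sample notebooks. As a loose illustration of how such a spec can be consumed from the SDK side (the file path and environment name below are assumptions, not taken from the diff):

```python
from azureml.core import Environment

# Build an Azure ML Environment object from the sample conda specification.
# Adjust file_path to wherever your local copy of the YAML lives.
automl_env = Environment.from_conda_specification(
    name="automl-sample-env",
    file_path="how-to-use-azureml/automated-machine-learning/automl_env.yml",
)
print(automl_env.name)
```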

View File

@@ -5,35 +5,27 @@ dependencies:
 - pip<=19.3.1
 - nomkl
 - python>=3.5.2,<3.6.8
-- wheel==0.30.0
 - nb_conda
 - matplotlib==2.1.0
 - numpy>=1.16.0,<=1.16.2
 - cython
 - urllib3<1.24
-- scipy>=1.0.0,<=1.1.0
+- scipy==1.4.1
 - scikit-learn>=0.19.0,<=0.20.3
-- pandas>=0.22.0,<0.23.0
-- py-xgboost<=0.80
-- fbprophet==0.5
-- pytorch=1.1.0
+- pandas>=0.22.0,<=0.23.4
+- py-xgboost<=0.90
+- conda-forge::fbprophet==0.5
+- pytorch::pytorch=1.4.0
 - cudatoolkit=9.0
 - pip:
 # Required packages for AzureML execution, history, and data preparation.
 - azureml-defaults
-- azureml-dataprep[pandas]
 - azureml-train-automl
 - azureml-train
 - azureml-widgets
 - azureml-pipeline
-- azureml-contrib-interpret
 - pytorch-transformers==1.0.0
 - spacy==2.1.8
-- onnxruntime==1.0.0
+- pyarrow==0.17.0
 - https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
-channels:
-- anaconda
-- conda-forge
-- pytorch

View File

@@ -41,7 +41,7 @@
"\n", "\n",
"In this example we use the UCI Bank Marketing dataset to showcase how you can use AutoML for a classification problem and deploy it to an Azure Container Instance (ACI). The classification goal is to predict if the client will subscribe to a term deposit with the bank.\n", "In this example we use the UCI Bank Marketing dataset to showcase how you can use AutoML for a classification problem and deploy it to an Azure Container Instance (ACI). The classification goal is to predict if the client will subscribe to a term deposit with the bank.\n",
"\n", "\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n", "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n", "\n",
"Please find the ONNX related documentations [here](https://github.com/onnx/onnx).\n", "Please find the ONNX related documentations [here](https://github.com/onnx/onnx).\n",
"\n", "\n",
@@ -92,6 +92,23 @@
"from azureml.explain.model._internal.explanation_client import ExplanationClient" "from azureml.explain.model._internal.explanation_client import ExplanationClient"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -132,7 +149,6 @@
"experiment=Experiment(ws, experiment_name)\n", "experiment=Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
@@ -160,35 +176,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your CPU cluster\n",
"amlcompute_cluster_name = \"cpu-cluster-4\"\n", "cpu_cluster_name = \"cpu-cluster-4\"\n",
"\n", "\n",
"found = False\n", "# Verify that cluster does not exist already\n",
"# Check if this compute target already exists in the workspace.\n", "try:\n",
"cts = ws.compute_targets\n", " compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " print('Found existing cluster, use it.')\n",
" found = True\n", "except ComputeTargetException:\n",
" print('Found existing compute target.')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_target = cts[amlcompute_cluster_name]\n", " max_nodes=6)\n",
" \n", " compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n", "\n",
" # Create the cluster.\n", "compute_target.wait_for_completion(show_output=True)"
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
"# For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
@@ -394,8 +397,6 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"#from azureml.train.automl.run import AutoMLRun\n", "#from azureml.train.automl.run import AutoMLRun\n",
"#experiment_name = 'automl-classification-bmarketing'\n",
"#experiment = Experiment(ws, experiment_name)\n",
"#remote_run = AutoMLRun(experiment=experiment, run_id='<run_ID_goes_here')\n", "#remote_run = AutoMLRun(experiment=experiment, run_id='<run_ID_goes_here')\n",
"#remote_run" "#remote_run"
] ]
@@ -642,7 +643,7 @@
"\n", "\n",
"### Retrieve the Best Model\n", "### Retrieve the Best Model\n",
"\n", "\n",
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
] ]
}, },
{ {
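As context for the `get_output` wording change in this notebook, a brief usage sketch of the overloads the text describes; `remote_run` is assumed to be the AutoMLRun returned by `experiment.submit(...)`, and the metric name is only an example:

```python
# Best run/model by the experiment's primary metric.
best_run, fitted_model = remote_run.get_output()

# Best run/model for a specific logged metric (name is illustrative).
auc_run, auc_model = remote_run.get_output(metric="AUC_weighted")

# The run/model produced by a particular iteration.
third_run, third_model = remote_run.get_output(iteration=3)
```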

View File

@@ -6,5 +6,3 @@ dependencies:
 - azureml-widgets
 - matplotlib
 - onnxruntime==1.0.0
-- azureml-explain-model
-- azureml-contrib-interpret

View File

@@ -42,7 +42,7 @@
"\n", "\n",
"This notebook is using remote compute to train the model.\n", "This notebook is using remote compute to train the model.\n",
"\n", "\n",
"If you are using an Azure Machine Learning [Notebook VM](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup), you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n", "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n", "\n",
"In this notebook you will learn how to:\n", "In this notebook you will learn how to:\n",
"1. Create an experiment using an existing workspace.\n", "1. Create an experiment using an existing workspace.\n",
@@ -80,6 +80,23 @@
"from azureml.train.automl import AutoMLConfig" "from azureml.train.automl import AutoMLConfig"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -94,7 +111,6 @@
"experiment=Experiment(ws, experiment_name)\n", "experiment=Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
@@ -306,7 +322,7 @@
"\n", "\n",
"### Retrieve the Best Model\n", "### Retrieve the Best Model\n",
"\n", "\n",
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
] ]
}, },
{ {

View File

@@ -5,4 +5,3 @@ dependencies:
 - azureml-train-automl
 - azureml-widgets
 - matplotlib
-- azureml-explain-model

View File

@@ -47,8 +47,8 @@
"Notebook synopsis:\n", "Notebook synopsis:\n",
"1. Creating an Experiment in an existing Workspace\n", "1. Creating an Experiment in an existing Workspace\n",
"2. Configuration and remote run of AutoML for a text dataset (20 Newsgroups dataset from scikit-learn) for classification\n", "2. Configuration and remote run of AutoML for a text dataset (20 Newsgroups dataset from scikit-learn) for classification\n",
"3. Evaluating the final model on a test set\n", "3. Registering the best model for future use\n",
"4. Deploying the model on ACI" "4. Evaluating the final model on a test set"
] ]
}, },
{ {
@@ -84,6 +84,23 @@
"from sklearn.datasets import fetch_20newsgroups" "from sklearn.datasets import fetch_20newsgroups"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -105,7 +122,6 @@
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n", "output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
@@ -132,34 +148,25 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"dnntext-cluster\"\n", "amlcompute_cluster_name = \"dnntext-cluster\"\n",
"\n", "\n",
"found = False\n", "# Verify that cluster does not exist already\n",
"# Check if this compute target already exists in the workspace.\n", "try:\n",
"cts = ws.compute_targets\n", " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " print('Found existing cluster, use it.')\n",
" found = True\n", "except ComputeTargetException:\n",
" print('Found existing compute target.')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\", # CPU for BiLSTM, such as \"STANDARD_D2_V2\" \n",
" compute_target = cts[amlcompute_cluster_name]\n", " # To use BERT (this is recommended for best performance), select a GPU such as \"STANDARD_NC6\" \n",
" # or similar GPU option\n",
" # available in your workspace\n",
" max_nodes = 1)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n", "\n",
"if not found:\n", "compute_target.wait_for_completion(show_output=True)"
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\", # CPU for BiLSTM, such as \"STANDARD_D2_V2\" \n",
" # To use BERT (this is recommended for best performance), select a GPU such as \"STANDARD_NC6\" \n",
" # or similar GPU option\n",
" # available in your workspace\n",
" max_nodes = 1)\n",
"\n",
" # Create the cluster\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"# For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
@@ -187,8 +194,8 @@
" '''\n", " '''\n",
" remove = ('headers', 'footers', 'quotes')\n", " remove = ('headers', 'footers', 'quotes')\n",
" categories = [\n", " categories = [\n",
" 'alt.atheism',\n", " 'rec.sport.baseball',\n",
" 'talk.religion.misc',\n", " 'rec.sport.hockey',\n",
" 'comp.graphics',\n", " 'comp.graphics',\n",
" 'sci.space',\n", " 'sci.space',\n",
" ]\n", " ]\n",
@@ -338,7 +345,8 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"You can test the model locally to get a feel of the input/output. This step may require additional package installations such as pytorch." "You can test the model locally to get a feel of the input/output. When the model contains BERT, this step will require pytorch and pytorch-transformers installed in your local environment. The exact versions of these packages can be found in the **automl_env.yml** file located in the local copy of your MachineLearningNotebooks folder here:\n",
"MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/automl_env.yml"
] ]
}, },
{ {
@@ -373,8 +381,8 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Deploying the model\n", "### Registering the best model\n",
"We now use the best fitted model from the AutoML Run to make predictions on the test set. " "We now register the best fitted model from the AutoML Run for use in future deployments. "
] ]
}, },
{ {
@@ -474,7 +482,7 @@
"source": [ "source": [
"script_folder = os.path.join(os.getcwd(), 'inference')\n", "script_folder = os.path.join(os.getcwd(), 'inference')\n",
"os.makedirs(script_folder, exist_ok=True)\n", "os.makedirs(script_folder, exist_ok=True)\n",
"shutil.copy2('infer.py', script_folder)" "shutil.copy('infer.py', script_folder)"
] ]
}, },
{ {
@@ -483,8 +491,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"test_run = run_inference(test_experiment, compute_target, script_folder, best_dnn_run, test_dataset,\n", "test_run = run_inference(test_experiment, compute_target, script_folder, best_dnn_run,\n",
" target_column_name, model_name)" " train_dataset, test_dataset, target_column_name, model_name)"
] ]
}, },
{ {
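The synopsis and section-title edits in this notebook replace ACI deployment with model registration. A rough sketch of what registering the best AutoML child run's model can look like; the run variable, model name, and output path below are assumptions for illustration:

```python
# `remote_run` is assumed to be the parent AutoML run returned by experiment.submit(...).
best_run, fitted_model = remote_run.get_output()

# Register the serialized model from the best child run so it can be deployed later.
registered_model = best_run.register_model(
    model_name="textDNN-20News",     # illustrative name
    model_path="outputs/model.pkl",  # assumed output location of the AutoML run
)
print(registered_model.name, registered_model.version)
```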

View File

@@ -5,7 +5,6 @@ dependencies:
 - azureml-train-automl
 - azureml-widgets
 - matplotlib
-- azurmel-train
 - https://download.pytorch.org/whl/cpu/torch-1.1.0-cp35-cp35m-win_amd64.whl
 - sentencepiece==0.1.82
 - pytorch-transformers==1.0

View File

@@ -6,7 +6,7 @@ from azureml.core.run import Run
 def run_inference(test_experiment, compute_target, script_folder, train_run,
-test_dataset, target_column_name, model_name):
+train_dataset, test_dataset, target_column_name, model_name):
 train_run.download_file('outputs/conda_env_v_1_0_0.yml',
 'inference/condafile.yml')
@@ -22,7 +22,10 @@ def run_inference(test_experiment, compute_target, script_folder, train_run,
 '--target_column_name': target_column_name,
 '--model_name': model_name
 },
-inputs=[test_dataset.as_named_input('test_data')],
+inputs=[
+train_dataset.as_named_input('train_data'),
+test_dataset.as_named_input('test_data')
+],
 compute_target=compute_target,
 environment_definition=inference_env)

View File

@@ -1,9 +1,11 @@
-import numpy as np
 import argparse
-from azureml.core import Run
+import numpy as np
 from sklearn.externals import joblib
-from azureml.automl.core._vendor.automl.client.core.common import metrics
-from automl.client.core.common import constants
+from azureml.automl.runtime.shared.score import scoring, constants
+from azureml.core import Run
 from azureml.core.model import Model
@@ -30,22 +32,26 @@ model = joblib.load(model_path)
 run = Run.get_context()
 # get input dataset by name
 test_dataset = run.input_datasets['test_data']
+train_dataset = run.input_datasets['train_data']
 X_test_df = test_dataset.drop_columns(columns=[target_column_name]) \
 .to_pandas_dataframe()
 y_test_df = test_dataset.with_timestamp_columns(None) \
 .keep_columns(columns=[target_column_name]) \
 .to_pandas_dataframe()
+y_train_df = test_dataset.with_timestamp_columns(None) \
+.keep_columns(columns=[target_column_name]) \
+.to_pandas_dataframe()
 predicted = model.predict_proba(X_test_df)
-# use automl metrics module
-scores = metrics.compute_metrics_classification(
-np.array(predicted),
-np.array(y_test_df),
-class_labels=model.classes_,
-metrics=list(constants.Metric.SCALAR_CLASSIFICATION_SET)
-)
+# Use the AutoML scoring module
+class_labels = np.unique(np.concatenate((y_train_df.values, y_test_df.values)))
+train_labels = model.classes_
+classification_metrics = list(constants.CLASSIFICATION_SCALAR_SET)
+scores = scoring.score_classification(y_test_df.values, predicted,
+classification_metrics,
+class_labels, train_labels)
 print("scores:")
 print(scores)
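A condensed, self-contained sketch of the new classification scoring call shown above; the import path is taken from the diff, the data is synthetic, and availability of the module depends on the installed azureml-train-automl-runtime version:

```python
import numpy as np
from azureml.automl.runtime.shared.score import scoring, constants

# Synthetic binary-classification example, purely for illustration.
y_test = np.array([0, 1, 0, 1])                   # true labels
predicted = np.array([[0.8, 0.2], [0.3, 0.7],     # predict_proba-style output,
                      [0.6, 0.4], [0.1, 0.9]])    # columns ordered like train_labels
train_labels = np.array([0, 1])                   # model.classes_ in the real script
class_labels = np.unique(np.concatenate((train_labels, y_test)))

scores = scoring.score_classification(
    y_test, predicted,
    list(constants.CLASSIFICATION_SCALAR_SET),
    class_labels, train_labels)
print(scores)
```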

View File

@@ -20,7 +20,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Automated Machine Learning \n", "# Automated Machine Learning \n",
"**Continous retraining using Pipelines and Time-Series TabularDataset**\n", "**Continuous retraining using Pipelines and Time-Series TabularDataset**\n",
"## Contents\n", "## Contents\n",
"1. [Introduction](#Introduction)\n", "1. [Introduction](#Introduction)\n",
"2. [Setup](#Setup)\n", "2. [Setup](#Setup)\n",
@@ -75,6 +75,23 @@
"from azureml.train.automl import AutoMLConfig" "from azureml.train.automl import AutoMLConfig"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -112,7 +129,6 @@
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
@@ -143,33 +159,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your CPU cluster\n",
"amlcompute_cluster_name = \"cpu-cluster-42\"\n", "amlcompute_cluster_name = \"cont-cluster\"\n",
"\n", "\n",
"found = False\n", "# Verify that cluster does not exist already\n",
"# Check if this compute target already exists in the workspace.\n", "try:\n",
"cts = ws.compute_targets\n", " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " print('Found existing cluster, use it.')\n",
" found = True\n", "except ComputeTargetException:\n",
" print('Found existing compute target.')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_target = cts[amlcompute_cluster_name]\n", " max_nodes=4)\n",
" \n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 4)\n",
"\n", "\n",
" # Create the cluster.\n", "compute_target.wait_for_completion(show_output=True)"
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = 0, timeout_in_minutes = 10)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
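This get-or-create compute pattern recurs across the notebooks in this release. Stripped of the notebook JSON escaping, the new idiom looks roughly like the sketch below; the workspace object, cluster name, and VM size are assumptions chosen for illustration:

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()   # assumes a config.json is available locally
cluster_name = "cpu-cluster"   # illustrative name

try:
    # Reuse the cluster if it already exists in the workspace.
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing cluster, use it.")
except ComputeTargetException:
    # Otherwise provision a small CPU cluster.
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                           max_nodes=4)
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)
```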

View File

@@ -101,6 +101,23 @@
"from azureml.train.estimator import Estimator" "from azureml.train.estimator import Estimator"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": { "metadata": {
@@ -128,7 +145,6 @@
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
@@ -163,7 +179,7 @@
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your CPU cluster\n", "# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n", "cpu_cluster_name = \"beer-cluster\"\n",
"\n", "\n",
"# Verify that cluster does not exist already\n", "# Verify that cluster does not exist already\n",
"try:\n", "try:\n",
@@ -218,19 +234,18 @@
"import pandas as pd\n", "import pandas as pd\n",
"from pandas import DataFrame\n", "from pandas import DataFrame\n",
"from pandas import Grouper\n", "from pandas import Grouper\n",
"from matplotlib import pyplot\n",
"from pandas import concat\n", "from pandas import concat\n",
"from matplotlib import pyplot\n",
"from pandas.plotting import register_matplotlib_converters\n", "from pandas.plotting import register_matplotlib_converters\n",
"\n",
"register_matplotlib_converters()\n", "register_matplotlib_converters()\n",
"plt.tight_layout()\n",
"plt.figure(figsize=(20, 10))\n", "plt.figure(figsize=(20, 10))\n",
"plt.tight_layout()\n",
"\n", "\n",
"plt.subplot(2, 1, 1)\n", "plt.subplot(2, 1, 1)\n",
"plt.title('Beer Production By Year')\n", "plt.title('Beer Production By Year')\n",
"df = pd.read_csv(\"Beer_no_valid_split_train.csv\", parse_dates=True, index_col= 'DATE').drop(columns='grain')\n", "df = pd.read_csv(\"Beer_no_valid_split_train.csv\", parse_dates=True, index_col= 'DATE').drop(columns='grain')\n",
"test_df = pd.read_csv(\"Beer_no_valid_split_test.csv\", parse_dates=True, index_col= 'DATE').drop(columns='grain')\n", "test_df = pd.read_csv(\"Beer_no_valid_split_test.csv\", parse_dates=True, index_col= 'DATE').drop(columns='grain')\n",
"pyplot.plot(df)\n", "plt.plot(df)\n",
"\n", "\n",
"plt.subplot(2, 1, 2)\n", "plt.subplot(2, 1, 2)\n",
"plt.title('Beer Production By Month')\n", "plt.title('Beer Production By Month')\n",
@@ -239,7 +254,8 @@
"months = DataFrame(months)\n", "months = DataFrame(months)\n",
"months.columns = range(1,13)\n", "months.columns = range(1,13)\n",
"months.boxplot()\n", "months.boxplot()\n",
"pyplot.show()\n" "\n",
"plt.show()"
] ]
}, },
{ {
@@ -538,7 +554,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.compute_targets['cpu-cluster']\n", "compute_target = ws.compute_targets['beer-cluster']\n",
"test_experiment = Experiment(ws, experiment_name + \"_test\")" "test_experiment = Experiment(ws, experiment_name + \"_test\")"
] ]
}, },
@@ -556,7 +572,7 @@
"\n", "\n",
"script_folder = os.path.join(os.getcwd(), 'inference')\n", "script_folder = os.path.join(os.getcwd(), 'inference')\n",
"os.makedirs(script_folder, exist_ok=True)\n", "os.makedirs(script_folder, exist_ok=True)\n",
"shutil.copy2('infer.py', script_folder)" "shutil.copy('infer.py', script_folder)"
] ]
}, },
{ {

View File

@@ -4,6 +4,7 @@ dependencies:
 - pip:
 - azureml-sdk
 - numpy==1.16.2
+- pandas==0.23.4
 - azureml-train-automl
 - azureml-widgets
 - matplotlib

View File

@@ -1,12 +1,14 @@
-import pandas as pd
-import numpy as np
 import argparse
-from azureml.core import Run
+import numpy as np
+import pandas as pd
+from pandas.tseries.frequencies import to_offset
 from sklearn.externals import joblib
 from sklearn.metrics import mean_absolute_error, mean_squared_error
-from azureml.automl.core._vendor.automl.client.core.common import metrics
-from automl.client.core.common import constants
-from pandas.tseries.frequencies import to_offset
+from azureml.automl.runtime.shared.score import scoring, constants
+from azureml.core import Run
 def align_outputs(y_predicted, X_trans, X_test, y_test,
@@ -300,12 +302,11 @@ print(df_all[target_column_name])
 print("predicted values:::")
 print(df_all['predicted'])
-# use automl metrics module
-scores = metrics.compute_metrics_regression(
-df_all['predicted'],
-df_all[target_column_name],
-list(constants.Metric.SCALAR_REGRESSION_SET),
-None, None, None)
+# Use the AutoML scoring module
+regression_metrics = list(constants.REGRESSION_SCALAR_SET)
+y_test = np.array(df_all[target_column_name])
+y_pred = np.array(df_all['predicted'])
+scores = scoring.score_regression(y_test, y_pred, regression_metrics)
 print("scores:")
 print(scores)
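For reference, the new regression scoring call shown above, isolated from the script; synthetic arrays are used, the import path comes from the diff, and availability depends on the installed azureml-train-automl-runtime version:

```python
import numpy as np
from azureml.automl.runtime.shared.score import scoring, constants

# Synthetic ground truth and predictions, purely for illustration.
y_test = np.array([10.0, 12.5, 9.8, 11.2])
y_pred = np.array([10.4, 12.0, 10.1, 11.0])

regression_metrics = list(constants.REGRESSION_SCALAR_SET)
scores = scoring.score_regression(y_test, y_pred, regression_metrics)
for name, value in scores.items():
    print(name, value)
```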

View File

@@ -74,6 +74,23 @@
"from datetime import datetime" "from datetime import datetime"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -95,7 +112,6 @@
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['SKU'] = ws.sku\n", "output['SKU'] = ws.sku\n",
@@ -124,35 +140,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpu-cluster-bike\"\n", "amlcompute_cluster_name = \"bike-cluster\"\n",
"\n", "\n",
"found = False\n", "# Verify that cluster does not exist already\n",
"# Check if this compute target already exists in the workspace.\n", "try:\n",
"cts = ws.compute_targets\n", " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " print('Found existing cluster, use it.')\n",
" found = True\n", "except ComputeTargetException:\n",
" print('Found existing compute target.')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_target = cts[amlcompute_cluster_name]\n", " max_nodes=4)\n",
" \n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 4)\n",
"\n", "\n",
" # Create the cluster.\n", "compute_target.wait_for_completion(show_output=True)"
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
"# For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
@@ -450,8 +453,8 @@
"\n", "\n",
"script_folder = os.path.join(os.getcwd(), 'forecast')\n", "script_folder = os.path.join(os.getcwd(), 'forecast')\n",
"os.makedirs(script_folder, exist_ok=True)\n", "os.makedirs(script_folder, exist_ok=True)\n",
"shutil.copy2('forecasting_script.py', script_folder)\n", "shutil.copy('forecasting_script.py', script_folder)\n",
"shutil.copy2('forecasting_helper.py', script_folder)" "shutil.copy('forecasting_helper.py', script_folder)"
] ]
}, },
{ {
@@ -507,17 +510,16 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.automl.core._vendor.automl.client.core.common import metrics\n", "from azureml.automl.core.shared import constants\n",
"from azureml.automl.runtime.shared.score import scoring\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error\n", "from sklearn.metrics import mean_absolute_error, mean_squared_error\n",
"from matplotlib import pyplot as plt\n", "from matplotlib import pyplot as plt\n",
"from automl.client.core.common import constants\n",
"\n", "\n",
"# use automl metrics module\n", "# use automl metrics module\n",
"scores = metrics.compute_metrics_regression(\n", "scores = scoring.score_regression(\n",
" df_all['predicted'],\n", " y_test=df_all[target_column_name],\n",
" df_all[target_column_name],\n", " y_pred=df_all['predicted'],\n",
" list(constants.Metric.SCALAR_REGRESSION_SET),\n", " metrics=list(constants.Metric.SCALAR_REGRESSION_SET))\n",
" None, None, None)\n",
"\n", "\n",
"print(\"[Test data scores]\\n\")\n", "print(\"[Test data scores]\\n\")\n",
"for key, value in scores.items(): \n", "for key, value in scores.items(): \n",

View File

@@ -4,6 +4,7 @@ dependencies:
 - pip:
 - azureml-sdk
 - numpy==1.16.2
+- pandas==0.23.4
 - azureml-train-automl
 - azureml-widgets
 - matplotlib

View File

@@ -1,6 +1,6 @@
 import argparse
 import azureml.train.automl
-from azureml.automl.runtime._vendor.automl.client.core.runtime import forecasting_models
+from azureml.automl.runtime.shared import forecasting_models
 from azureml.core import Run
 from sklearn.externals import joblib
 import forecasting_helper

View File

@@ -28,7 +28,6 @@
"1. [Setup](#Setup)\n", "1. [Setup](#Setup)\n",
"1. [Data and Forecasting Configurations](#Data)\n", "1. [Data and Forecasting Configurations](#Data)\n",
"1. [Train](#Train)\n", "1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"\n", "\n",
"Advanced Forecasting\n", "Advanced Forecasting\n",
"1. [Advanced Training](#advanced_training)\n", "1. [Advanced Training](#advanced_training)\n",
@@ -43,7 +42,7 @@
"\n", "\n",
"In this example we use the associated New York City energy demand dataset to showcase how you can use AutoML for a simple forecasting problem and explore the results. The goal is predict the energy demand for the next 48 hours based on historic time-series data.\n", "In this example we use the associated New York City energy demand dataset to showcase how you can use AutoML for a simple forecasting problem and explore the results. The goal is predict the energy demand for the next 48 hours based on historic time-series data.\n",
"\n", "\n",
"If you are using an Azure Machine Learning [Notebook VM](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup), you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first, if you haven't already, to establish your connection to the AzureML Workspace.\n", "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first, if you haven't already, to establish your connection to the AzureML Workspace.\n",
"\n", "\n",
"In this notebook you will learn how to:\n", "In this notebook you will learn how to:\n",
"1. Creating an Experiment using an existing Workspace\n", "1. Creating an Experiment using an existing Workspace\n",
@@ -85,6 +84,23 @@
"from datetime import datetime" "from datetime import datetime"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -109,7 +125,6 @@
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
@@ -140,35 +155,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"aml-compute\"\n", "amlcompute_cluster_name = \"energy-cluster\"\n",
"\n", "\n",
"found = False\n", "# Verify that cluster does not exist already\n",
"# Check if this compute target already exists in the workspace.\n", "try:\n",
"cts = ws.compute_targets\n", " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " print('Found existing cluster, use it.')\n",
" found = True\n", "except ComputeTargetException:\n",
" print('Found existing compute target.')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
" compute_target = cts[amlcompute_cluster_name]\n", " max_nodes=6)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n", "\n",
"if not found:\n", "compute_target.wait_for_completion(show_output=True)"
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_DS12_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n",
" # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"# For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
@@ -463,7 +465,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Forecast Function\n", "### Forecast Function\n",
"For forecasting, we will use the forecast function instead of the predict function. Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use. Forecast function also can handle more complicated scenarios, see notebook on [high frequency forecasting](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb)." "For forecasting, we will use the forecast function instead of the predict function. Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use. Forecast function also can handle more complicated scenarios, see the [forecast function notebook](../forecasting-forecast-function/auto-ml-forecasting-function.ipynb)."
] ]
}, },
{ {
@@ -505,16 +507,15 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.automl.core._vendor.automl.client.core.common import metrics\n", "from azureml.automl.core.shared import constants\n",
"from azureml.automl.runtime.shared.score import scoring\n",
"from matplotlib import pyplot as plt\n", "from matplotlib import pyplot as plt\n",
"from automl.client.core.common import constants\n",
"\n", "\n",
"# use automl metrics module\n", "# use automl metrics module\n",
"scores = metrics.compute_metrics_regression(\n", "scores = scoring.score_regression(\n",
" df_all['predicted'],\n", " y_test=df_all[target_column_name],\n",
" df_all[target_column_name],\n", " y_pred=df_all['predicted'],\n",
" list(constants.Metric.SCALAR_REGRESSION_SET),\n", " metrics=list(constants.Metric.SCALAR_REGRESSION_SET))\n",
" None, None, None)\n",
"\n", "\n",
"print(\"[Test data scores]\\n\")\n", "print(\"[Test data scores]\\n\")\n",
"for key, value in scores.items(): \n", "for key, value in scores.items(): \n",
@@ -666,16 +667,15 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.automl.core._vendor.automl.client.core.common import metrics\n", "from azureml.automl.core.shared import constants\n",
"from azureml.automl.runtime.shared.score import scoring\n",
"from matplotlib import pyplot as plt\n", "from matplotlib import pyplot as plt\n",
"from automl.client.core.common import constants\n",
"\n", "\n",
"# use automl metrics module\n", "# use automl metrics module\n",
"scores = metrics.compute_metrics_regression(\n", "scores = scoring.score_regression(\n",
" df_all['predicted'],\n", " y_test=df_all[target_column_name],\n",
" df_all[target_column_name],\n", " y_pred=df_all['predicted'],\n",
" list(constants.Metric.SCALAR_REGRESSION_SET),\n", " metrics=list(constants.Metric.SCALAR_REGRESSION_SET))\n",
" None, None, None)\n",
"\n", "\n",
"print(\"[Test data scores]\\n\")\n", "print(\"[Test data scores]\\n\")\n",
"for key, value in scores.items(): \n", "for key, value in scores.items(): \n",

View File

@@ -3,8 +3,7 @@ dependencies:
 - pip:
 - azureml-sdk
 - numpy==1.16.2
+- pandas==0.23.4
 - azureml-train-automl
 - azureml-widgets
 - matplotlib
-- azureml-explain-model
-- azureml-contrib-interpret

View File

@@ -35,7 +35,6 @@
"Terminology:\n", "Terminology:\n",
"* forecast origin: the last period when the target value is known\n", "* forecast origin: the last period when the target value is known\n",
"* forecast periods(s): the period(s) for which the value of the target is desired.\n", "* forecast periods(s): the period(s) for which the value of the target is desired.\n",
"* forecast horizon: the number of forecast periods\n",
"* lookback: how many past periods (before forecast origin) the model function depends on. The larger of number of lags and length of rolling window.\n", "* lookback: how many past periods (before forecast origin) the model function depends on. The larger of number of lags and length of rolling window.\n",
"* prediction context: `lookback` periods immediately preceding the forecast origin\n", "* prediction context: `lookback` periods immediately preceding the forecast origin\n",
"\n", "\n",
@@ -68,6 +67,7 @@
"import logging\n", "import logging\n",
"import warnings\n", "import warnings\n",
"\n", "\n",
"import azureml.core\n",
"from azureml.core.dataset import Dataset\n", "from azureml.core.dataset import Dataset\n",
"from pandas.tseries.frequencies import to_offset\n", "from pandas.tseries.frequencies import to_offset\n",
"from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import AmlCompute\n",
@@ -81,13 +81,29 @@
"np.set_printoptions(precision=4, suppress=True, linewidth=120)" "np.set_printoptions(precision=4, suppress=True, linewidth=120)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import azureml.core\n",
"from azureml.core.workspace import Workspace\n", "from azureml.core.workspace import Workspace\n",
"from azureml.core.experiment import Experiment\n", "from azureml.core.experiment import Experiment\n",
"from azureml.train.automl import AutoMLConfig\n", "from azureml.train.automl import AutoMLConfig\n",
@@ -100,7 +116,6 @@
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['SKU'] = ws.sku\n", "output['SKU'] = ws.sku\n",
@@ -258,29 +273,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"amlcompute_cluster_name = \"cpu-cluster-fcfn\"\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
" \n", "from azureml.core.compute_target import ComputeTargetException\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
"\n", "\n",
"if not found:\n", "# Choose a name for your CPU cluster\n",
" print('Creating a new compute target...')\n", "amlcompute_cluster_name = \"fcfn-cluster\"\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n", "\n",
" # Create the cluster.\\n\",\n", "# Verify that cluster does not exist already\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", "try:\n",
" compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" max_nodes=6)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n", "\n",
"print('Checking cluster status...')\n", "compute_target.wait_for_completion(show_output=True)"
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)"
] ]
}, },
{ {
@@ -346,9 +354,24 @@
" label_column_name=target_label,\n", " label_column_name=target_label,\n",
" **time_series_settings)\n", " **time_series_settings)\n",
"\n", "\n",
"remote_run = experiment.submit(automl_config, show_output=False)\n", "remote_run = experiment.submit(automl_config, show_output=False)"
"remote_run.wait_for_completion()\n", ]
"\n", },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run.wait_for_completion()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve the best model to use it further.\n", "# Retrieve the best model to use it further.\n",
"_, fitted_model = remote_run.get_output()" "_, fitted_model = remote_run.get_output()"
] ]
@@ -696,6 +719,90 @@
"X_show[['date', 'grain', 'ext_predictor', '_automl_target_col']]\n", "X_show[['date', 'grain', 'ext_predictor', '_automl_target_col']]\n",
"# prediction is in _automl_target_col" "# prediction is in _automl_target_col"
] ]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecasting farther than the maximum horizon <a id=\"recursive forecasting\"></a>\n",
"When the forecast destination, or the latest date in the prediction data frame, is farther into the future than the specified maximum horizon, the `forecast()` function will still make point predictions out to the later date using a recursive operation mode. Internally, the method recursively applies the regular forecaster to generate context so that we can forecast further into the future. \n",
"\n",
"To illustrate the use-case and operation of recursive forecasting, we'll consider an example with a single time-series where the forecasting period directly follows the training period and is twice as long as the maximum horizon given at training time.\n",
"\n",
"![Recursive_forecast_overview](recursive_forecast_overview_small.png)\n",
"\n",
"Internally, we apply the forecaster in an iterative manner and finish the forecast task in two interations. In the first iteration, we apply the forecaster and get the prediction for the first max-horizon periods (y_pred1). In the second iteraction, y_pred1 is used as the context to produce the prediction for the next max-horizon periods (y_pred2). The combination of (y_pred1 and y_pred2) gives the results for the total forecast periods. \n",
"\n",
"A caveat: forecast accuracy will likely be worse the farther we predict into the future since errors are compounded with recursive application of the forecaster.\n",
"\n",
"![Recursive_forecast_iter1](recursive_forecast_iter1.png)\n",
"![Recursive_forecast_iter2](recursive_forecast_iter2.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# generate the same kind of test data we trained on, but with a single grain/time-series and test period twice as long as the max_horizon\n",
"_, _, X_test_long, y_test_long = get_timeseries(train_len=n_train_periods,\n",
" test_len=max_horizon*2,\n",
" time_column_name=TIME_COLUMN_NAME,\n",
" target_column_name=TARGET_COLUMN_NAME,\n",
" grain_column_name=GRAIN_COLUMN_NAME,\n",
" grains=1)\n",
"\n",
"print(X_test_long.groupby(GRAIN_COLUMN_NAME)[TIME_COLUMN_NAME].min())\n",
"print(X_test_long.groupby(GRAIN_COLUMN_NAME)[TIME_COLUMN_NAME].max())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# forecast() function will invoke the recursive forecast method internally.\n",
"y_pred_long, X_trans_long = fitted_model.forecast(X_test_long)\n",
"y_pred_long"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# What forecast() function does in this case is equivalent to iterating it twice over the test set as the following. \n",
"y_pred1, _ = fitted_model.forecast(X_test_long[:max_horizon])\n",
"y_pred_all, _ = fitted_model.forecast(X_test_long, np.concatenate((y_pred1, np.full(max_horizon, np.nan))))\n",
"np.array_equal(y_pred_all, y_pred_long)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Confidence interval and distributional forecasts\n",
"AutoML cannot currently estimate forecast errors beyond the maximum horizon set during training, so the `forecast_quantiles()` function will return missing values for quantiles not equal to 0.5 beyond the maximum horizon. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fitted_model.forecast_quantiles(X_test_long)"
]
},
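As a quick sanity check, here is a minimal sketch (an assumption on my part: that forecast_quantiles() returns a pandas DataFrame whose quantile columns are named by their quantile value) that counts the missing values produced beyond the maximum horizon:

# Sketch only: fitted_model and X_test_long come from earlier cells of this notebook.
quantile_forecast = fitted_model.forecast_quantiles(X_test_long)
# Beyond the maximum horizon only the median (0.5) forecast is expected to be populated;
# the other quantile columns should account for the NaN counts printed here.
print(quantile_forecast.isna().sum())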
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly with the simple senarios illustrated above, forecasting farther than the max horizon in other senarios like 'multiple grain', 'Destination-date forecast', and 'forecast away from the training data' are also automatically handled by the `forecast()` function. "
]
} }
], ],
"metadata": { "metadata": {

View File

@@ -4,6 +4,7 @@ dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- numpy==1.16.2 - numpy==1.16.2
- pandas==0.23.4
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib

[Three binary image files added: the recursive forecasting diagrams referenced above (24 KiB, 26 KiB, and 30 KiB).]

View File

@@ -65,7 +65,25 @@
"\n", "\n",
"from azureml.core.workspace import Workspace\n", "from azureml.core.workspace import Workspace\n",
"from azureml.core.experiment import Experiment\n", "from azureml.core.experiment import Experiment\n",
"from azureml.train.automl import AutoMLConfig" "from azureml.train.automl import AutoMLConfig\n",
"from azureml.automl.core.featurization import FeaturizationConfig"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
] ]
}, },
{ {
@@ -89,7 +107,6 @@
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['SKU'] = ws.sku\n", "output['SKU'] = ws.sku\n",
@@ -118,35 +135,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your CPU cluster\n",
"amlcompute_cluster_name = \"cpu-cluster-oj\"\n", "amlcompute_cluster_name = \"oj-cluster\"\n",
"\n", "\n",
"found = False\n", "# Verify that cluster does not exist already\n",
"# Check if this compute target already exists in the workspace.\n", "try:\n",
"cts = ws.compute_targets\n", " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " print('Found existing cluster, use it.')\n",
" found = True\n", "except ComputeTargetException:\n",
" print('Found existing compute target.')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_target = cts[amlcompute_cluster_name]\n", " max_nodes=6)\n",
" \n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n", "\n",
" # Create the cluster.\n", "compute_target.wait_for_completion(show_output=True)"
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
"# For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
@@ -315,17 +319,58 @@
"target_column_name = 'Quantity'" "target_column_name = 'Quantity'"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Customization\n",
"\n",
"The featurization customization in forecasting is an advanced feature in AutoML which allows our customers to change the default forecasting featurization behaviors and column types through `FeaturizationConfig`. The supported scenarios include,\n",
"1. Column purposes update: Override feature type for the specified column. Currently supports DateTime, Categorical and Numeric. This customization can be used in the scenario that the type of the column cannot correctly reflect its purpose. Some numerical columns, for instance, can be treated as Categorical columns which need to be converted to categorical while some can be treated as epoch timestamp which need to be converted to datetime. To tell our SDK to correctly preprocess these columns, a configuration need to be add with the columns and their desired types.\n",
"2. Transformer parameters update: Currently supports parameter change for Imputer only. User can customize imputation methods, the supported methods are constant for target data and mean, median, most frequent and constant for training data. This customization can be used for the scenario that our customers know which imputation methods fit best to the input data. For instance, some datasets use NaN to represent 0 which the correct behavior should impute all the missing value with 0. To achieve this behavior, these columns need to be configured as constant imputation with `fill_value` 0.\n",
"3. Drop columns: Columns to drop from being featurized. These usually are the columns which are leaky or the columns contain no useful data.\n",
"\n",
"This step requires an Enterprise workspace to gain access to this feature. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page.](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"sample-featurizationconfig-remarks"
]
},
"outputs": [],
"source": [
"featurization_config = FeaturizationConfig()\n",
"featurization_config.drop_columns = ['logQuantity'] # 'logQuantity' is a leaky feature, so we remove it.\n",
"# Force the CPWVOL5 feature to be numeric type.\n",
"featurization_config.add_column_purpose('CPWVOL5', 'Numeric')\n",
"# Fill missing values in the target column, Quantity, with zeros.\n",
"featurization_config.add_transformer_params('Imputer', ['Quantity'], {\"strategy\": \"constant\", \"fill_value\": 0})\n",
"# Fill missing values in the INCOME column with median value.\n",
"featurization_config.add_transformer_params('Imputer', ['INCOME'], {\"strategy\": \"median\"})"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Train\n", "## Train\n",
"\n", "\n",
"The AutoMLConfig object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, the training data, and cross-validation parameters. \n", "The [AutoMLConfig](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py) object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, the training data, and cross-validation parameters.\n",
"\n", "\n",
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n", "For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If grain columns are not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
"\n",
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning application that estimates the next month of sales should set the horizon according to suitable planning time-scales. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
"\n",
"We note here that AutoML can sweep over two types of time-series models:\n",
"* Models that are trained for each series such as ARIMA and Facebook's Prophet. Note that these models are only available for [Enterprise Edition Workspaces](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace#upgrade).\n",
"* Models trained across multiple time-series using a regression approach.\n",
"\n",
"In the first case, AutoML loops over all time-series in your dataset and trains one model (e.g. AutoArima or Prophet, as the case may be) for each series. This can result in long runtimes to train these models if there are a lot of series in the data. One way to mitigate this problem is to fit models for different series in parallel if you have multiple compute cores available. To enable this behavior, set the `max_cores_per_iteration` parameter in your AutoMLConfig as shown in the example in the next cell. \n",
"\n", "\n",
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organizaion that needs to estimate the next month of sales would set the horizon accordingly. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
"\n", "\n",
"Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *validation_data* parameter of AutoMLConfig.\n", "Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *validation_data* parameter of AutoMLConfig.\n",
"\n", "\n",
@@ -346,8 +391,9 @@
"|**debug_log**|Log file path for writing debugging information|\n", "|**debug_log**|Log file path for writing debugging information|\n",
"|**time_column_name**|Name of the datetime column in the input data|\n", "|**time_column_name**|Name of the datetime column in the input data|\n",
"|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n", "|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n", "|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|\n",
"|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|" "|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Setting this enables AutoML to perform featurization on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
"|**max_cores_per_iteration**|Maximum number of cores to utilize per iteration. A value of -1 indicates all available cores should be used.|"
] ]
}, },
{ {
@@ -359,7 +405,6 @@
"time_series_settings = {\n", "time_series_settings = {\n",
" 'time_column_name': time_column_name,\n", " 'time_column_name': time_column_name,\n",
" 'grain_column_names': grain_column_names,\n", " 'grain_column_names': grain_column_names,\n",
" 'drop_column_names': ['logQuantity'], # 'logQuantity' is a leaky feature, so we remove it.\n",
" 'max_horizon': n_test_periods\n", " 'max_horizon': n_test_periods\n",
"}\n", "}\n",
"\n", "\n",
@@ -371,8 +416,10 @@
" label_column_name=target_column_name,\n", " label_column_name=target_column_name,\n",
" compute_target=compute_target,\n", " compute_target=compute_target,\n",
" enable_early_stopping=True,\n", " enable_early_stopping=True,\n",
" featurization=featurization_config,\n",
" n_cross_validations=3,\n", " n_cross_validations=3,\n",
" verbosity=logging.INFO,\n", " verbosity=logging.INFO,\n",
" max_cores_per_iteration=-1,\n",
" **time_series_settings)" " **time_series_settings)"
] ]
}, },
@@ -422,6 +469,33 @@
"model_name = best_run.properties['model_name']" "model_name = best_run.properties['model_name']"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transparency\n",
"\n",
"View updated featurization summary"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_featurizer = fitted_model.named_steps['timeseriestransformer']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_featurizer.get_featurization_summary()"
]
},
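The featurization summary is returned as a list of per-feature records; as a small convenience (a sketch, assuming pandas is available as it is elsewhere in this notebook), it can be viewed as a table:

import pandas as pd

# Each record describes a raw feature: its detected type, whether it was dropped,
# and the engineered features generated from it.
summary_records = custom_featurizer.get_featurization_summary()
pd.DataFrame.from_records(summary_records)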
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -475,7 +549,7 @@
"source": [ "source": [
"If you are used to scikit pipelines, perhaps you expected `predict(X_test)`. However, forecasting requires a more general interface that also supplies the past target `y` values. Please use `forecast(X,y)` as `predict(X)` is reserved for internal purposes on forecasting models.\n", "If you are used to scikit pipelines, perhaps you expected `predict(X_test)`. However, forecasting requires a more general interface that also supplies the past target `y` values. Please use `forecast(X,y)` as `predict(X)` is reserved for internal purposes on forecasting models.\n",
"\n", "\n",
"The [energy demand forecasting notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) demonstrates the use of the forecast function in more detail in the context of using lags and rolling window features. " "The [forecast function notebook](../forecasting-forecast-function/auto-ml-forecasting-function.ipynb)."
] ]
}, },
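For orientation, here is a minimal sketch of that interface, assuming `X_test` is the test feature frame prepared earlier in this notebook (the name is an assumption); the NaN-filled y_query marks every test-period target as unknown so the model forecasts all of them:

import numpy as np

# Sketch only: X_test and fitted_model come from earlier cells of this notebook.
y_query = np.full(len(X_test), np.nan)   # all target values unknown
y_predictions, X_trans = fitted_model.forecast(X_test, y_query)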
{ {
@@ -506,16 +580,15 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.automl.core._vendor.automl.client.core.common import metrics\n", "from azureml.automl.core.shared import constants\n",
"from azureml.automl.runtime.shared.score import scoring\n",
"from matplotlib import pyplot as plt\n", "from matplotlib import pyplot as plt\n",
"from automl.client.core.common import constants\n",
"\n", "\n",
"# use automl metrics module\n", "# use automl scoring module\n",
"scores = metrics.compute_metrics_regression(\n", "scores = scoring.score_regression(\n",
" df_all['predicted'],\n", " y_test=df_all[target_column_name],\n",
" df_all[target_column_name],\n", " y_pred=df_all['predicted'],\n",
" list(constants.Metric.SCALAR_REGRESSION_SET),\n", " metrics=list(constants.Metric.SCALAR_REGRESSION_SET))\n",
" None, None, None)\n",
"\n", "\n",
"print(\"[Test data scores]\\n\")\n", "print(\"[Test data scores]\\n\")\n",
"for key, value in scores.items(): \n", "for key, value in scores.items(): \n",

View File

@@ -28,7 +28,8 @@
"1. [Setup](#Setup)\n", "1. [Setup](#Setup)\n",
"1. [Train](#Train)\n", "1. [Train](#Train)\n",
"1. [Results](#Results)\n", "1. [Results](#Results)\n",
"1. [Test](#Test)\n", "1. [Test](#Tests)\n",
"1. [Explanation](#Explanation)\n",
"1. [Acknowledgements](#Acknowledgements)" "1. [Acknowledgements](#Acknowledgements)"
] ]
}, },
@@ -42,16 +43,16 @@
"\n", "\n",
"This notebook is using the local machine compute to train the model.\n", "This notebook is using the local machine compute to train the model.\n",
"\n", "\n",
"If you are using an Azure Machine Learning [Notebook VM](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup), you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n", "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n", "\n",
"In this notebook you will learn how to:\n", "In this notebook you will learn how to:\n",
"1. Create an experiment using an existing workspace.\n", "1. Create an experiment using an existing workspace.\n",
"2. Configure AutoML using `AutoMLConfig`.\n", "2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model.\n", "3. Train the model.\n",
"4. Explore the results.\n", "4. Explore the results.\n",
"5. Visualization model's feature importance in azure portal\n", "5. Test the fitted model.\n",
"6. Explore any model's explanation and explore feature importance in azure portal\n", "6. Explore any model's explanation and explore feature importance in azure portal.\n",
"7. Test the fitted model." "7. Create an AKS cluster, deploy the webservice of AutoML scoring model and the explainer model to the AKS and consume the web service."
] ]
}, },
{ {
@@ -82,6 +83,23 @@
"from azureml.explain.model._internal.explanation_client import ExplanationClient" "from azureml.explain.model._internal.explanation_client import ExplanationClient"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -96,7 +114,6 @@
"experiment=Experiment(ws, experiment_name)\n", "experiment=Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
@@ -239,9 +256,9 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Analyze results\n", "### Analyze results\n",
"\n", "\n",
"### Retrieve the Best Model\n", "#### Retrieve the Best Model\n",
"\n", "\n",
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." "Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
] ]
@@ -268,134 +285,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Best Model 's explanation\n", "## Tests\n",
"Retrieve the explanation from the best_run which includes explanations for engineered features and raw features.\n",
"\n",
"#### Download engineered feature importance from artifact store\n",
"You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run. You can also use azure portal url to view the dash board visualization of the feature importance values of the engineered features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = ExplanationClient.from_run(best_run)\n",
"engineered_explanations = client.download_model_explanation(raw=False)\n",
"print(engineered_explanations.get_feature_importance_dict())\n",
"print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + best_run.get_portal_url())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explanations\n",
"In this section, we will show how to compute model explanations and visualize the explanations using azureml-explain-model package. Besides retrieving an existing model explanation for an AutoML model, you can also explain your AutoML model with different test data. The following steps will allow you to compute and visualize engineered feature importance based on your test data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve any other AutoML model from training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_run, fitted_model = local_run.get_output(metric='accuracy')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Setup the model explanations for AutoML models\n",
"The fitted_model can generate the following which will be used for getting the engineered explanations using automl_setup_model_explanations:-\n",
"\n",
"1. Featurized data from train samples/test samples\n",
"2. Gather engineered name lists\n",
"3. Find the classes in your labeled column in classification scenarios\n",
"\n",
"The automl_explainer_setup_obj contains all the structures from above list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train = training_data.drop_columns(columns=[label_column_name])\n",
"y_train = training_data.keep_columns(columns=[label_column_name], validate=True)\n",
"X_test = validation_data.drop_columns(columns=[label_column_name])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations\n",
"\n",
"automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train, \n",
" X_test=X_test, y=y_train, \n",
" task='classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize the Mimic Explainer for feature importance\n",
"For explaining the AutoML models, use the MimicWrapper from azureml.explain.model package. The MimicWrapper can be initialized with fields in automl_explainer_setup_obj, your workspace and a LightGBM model which acts as a surrogate model to explain the AutoML model (fitted_model here). The MimicWrapper also takes the automl_run object where engineered explanations will be uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
"from azureml.explain.model.mimic_wrapper import MimicWrapper\n",
"explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, \n",
" init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,\n",
" features=automl_explainer_setup_obj.engineered_feature_names, \n",
" feature_maps=[automl_explainer_setup_obj.feature_map],\n",
" classes=automl_explainer_setup_obj.classes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Mimic Explainer for computing and visualizing engineered feature importance\n",
"The explain() method in MimicWrapper can be called with the transformed test samples to get the feature importance for the generated engineered features. You can also use azure portal url to view the dash board visualization of the feature importance values of the engineered features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"engineered_explanations = explainer.explain(['local', 'global'], eval_dataset=automl_explainer_setup_obj.X_test_transform)\n",
"print(engineered_explanations.get_feature_importance_dict())\n",
"print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test the fitted model\n",
"\n", "\n",
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values." "Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
] ]
@@ -459,6 +349,407 @@
"plt.show()" "plt.show()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explanation\n",
"In this section, we will show how to compute model explanations and visualize the explanations using azureml-explain-model package. We will also show how to run the automl model and the explainer model through deploying an AKS web service.\n",
"\n",
"Besides retrieving an existing model explanation for an AutoML model, you can also explain your AutoML model with different test data. The following steps will allow you to compute and visualize engineered feature importance based on your test data.\n",
"\n",
"### Run the explanation\n",
"#### Download engineered feature importance from artifact store\n",
"You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run. You can also use azure portal url to view the dash board visualization of the feature importance values of the engineered features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = ExplanationClient.from_run(best_run)\n",
"engineered_explanations = client.download_model_explanation(raw=False)\n",
"print(engineered_explanations.get_feature_importance_dict())\n",
"print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + best_run.get_portal_url())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve any other AutoML model from training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_run, fitted_model = local_run.get_output(metric='accuracy')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Setup the model explanations for AutoML models\n",
"The fitted_model can generate the following which will be used for getting the engineered explanations using automl_setup_model_explanations:-\n",
"\n",
"1. Featurized data from train samples/test samples\n",
"2. Gather engineered name lists\n",
"3. Find the classes in your labeled column in classification scenarios\n",
"\n",
"The automl_explainer_setup_obj contains all the structures from above list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train = training_data.drop_columns(columns=[label_column_name])\n",
"y_train = training_data.keep_columns(columns=[label_column_name], validate=True)\n",
"X_test = validation_data.drop_columns(columns=[label_column_name])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations\n",
"\n",
"automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train, \n",
" X_test=X_test, y=y_train, \n",
" task='classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize the Mimic Explainer for feature importance\n",
"For explaining the AutoML models, use the MimicWrapper from azureml.explain.model package. The MimicWrapper can be initialized with fields in automl_explainer_setup_obj, your workspace and a surrogate model to explain the AutoML model (fitted_model here). The MimicWrapper also takes the automl_run object where engineered explanations will be uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.mimic_wrapper import MimicWrapper\n",
"explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator,\n",
" explainable_model=automl_explainer_setup_obj.surrogate_model, \n",
" init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,\n",
" features=automl_explainer_setup_obj.engineered_feature_names, \n",
" feature_maps=[automl_explainer_setup_obj.feature_map],\n",
" classes=automl_explainer_setup_obj.classes,\n",
" explainer_kwargs=automl_explainer_setup_obj.surrogate_model_params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Mimic Explainer for computing and visualizing engineered feature importance\n",
"The explain() method in MimicWrapper can be called with the transformed test samples to get the feature importance for the generated engineered features. You can also use azure portal url to view the dash board visualization of the feature importance values of the engineered features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compute the engineered explanations\n",
"engineered_explanations = explainer.explain(['local', 'global'], eval_dataset=automl_explainer_setup_obj.X_test_transform)\n",
"print(engineered_explanations.get_feature_importance_dict())\n",
"print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize the scoring Explainer, save and upload it for later use in scoring explanation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer\n",
"import joblib\n",
"\n",
"# Initialize the ScoringExplainer\n",
"scoring_explainer = TreeScoringExplainer(explainer.explainer, feature_maps=[automl_explainer_setup_obj.feature_map])\n",
"\n",
"# Pickle scoring explainer locally to './scoring_explainer.pkl'\n",
"scoring_explainer_file_name = 'scoring_explainer.pkl'\n",
"with open(scoring_explainer_file_name, 'wb') as stream:\n",
" joblib.dump(scoring_explainer, stream)\n",
"\n",
"# Upload the scoring explainer to the automl run\n",
"automl_run.upload_file('outputs/scoring_explainer.pkl', scoring_explainer_file_name)"
]
},
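As a quick round-trip check (a sketch that only uses the file written above and the joblib import from that cell), the pickled scoring explainer can be reloaded locally before relying on it at inference time:

# Reload the pickled scoring explainer to confirm it deserializes cleanly.
with open(scoring_explainer_file_name, 'rb') as stream:
    reloaded_scoring_explainer = joblib.load(stream)
print(type(reloaded_scoring_explainer))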
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploying the scoring and explainer models to a web service to Azure Kubernetes Service (AKS)\n",
"\n",
"We use the TreeScoringExplainer from azureml.explain.model package to create the scoring explainer which will be used to compute the raw and engineered feature importances at the inference time. In the cell below, we register the AutoML model and the scoring explainer with the Model Management Service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Register trained automl model present in the 'outputs' folder in the artifacts\n",
"original_model = automl_run.register_model(model_name='automl_model', \n",
" model_path='outputs/model.pkl')\n",
"scoring_explainer_model = automl_run.register_model(model_name='scoring_explainer',\n",
" model_path='outputs/scoring_explainer.pkl')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create the conda dependencies for setting up the service\n",
"\n",
"We need to create the conda dependencies comprising of the azureml-explain-model, azureml-train-automl and azureml-defaults packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.automl.core.shared import constants\n",
"from azureml.core.environment import Environment\n",
"\n",
"automl_run.download_file(constants.CONDA_ENV_FILE_PATH, 'myenv.yml')\n",
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
"myenv"
]
},
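If extra packages are needed on top of the environment downloaded from the run (for example azureml-defaults for serving), one hedged option, assuming the standard CondaDependencies API of the SDK, is to append them before deployment:

from azureml.core.conda_dependencies import CondaDependencies

# Sketch: append a pip package to the environment object created above (myenv).
conda_dep = myenv.python.conda_dependencies
conda_dep.add_pip_package('azureml-defaults')
myenv.python.conda_dependencies = conda_dep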
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Write the Entry Script\n",
"Write the script that will be used to predict on your model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
"import pickle\n",
"import azureml.train.automl\n",
"import azureml.explain.model\n",
"from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \\\n",
" automl_setup_model_explanations\n",
"import joblib\n",
"from azureml.core.model import Model\n",
"\n",
"\n",
"def init():\n",
" global automl_model\n",
" global scoring_explainer\n",
"\n",
" # Retrieve the path to the model file using the model name\n",
" # Assume original model is named original_prediction_model\n",
" automl_model_path = Model.get_model_path('automl_model')\n",
" scoring_explainer_path = Model.get_model_path('scoring_explainer')\n",
"\n",
" automl_model = joblib.load(automl_model_path)\n",
" scoring_explainer = joblib.load(scoring_explainer_path)\n",
"\n",
"\n",
"def run(raw_data):\n",
" data = pd.read_json(raw_data, orient='records') \n",
" # Make prediction\n",
" predictions = automl_model.predict(data)\n",
" # Setup for inferencing explanations\n",
" automl_explainer_setup_obj = automl_setup_model_explanations(automl_model,\n",
" X_test=data, task='classification')\n",
" # Retrieve model explanations for engineered explanations\n",
" engineered_local_importance_values = scoring_explainer.explain(automl_explainer_setup_obj.X_test_transform) \n",
" # You can return any data type as long as it is JSON-serializable\n",
" return {'predictions': predictions.tolist(),\n",
" 'engineered_local_importance_values': engineered_local_importance_values}\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create the InferenceConfig \n",
"Create the inference config that will be used when deploying the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"\n",
"inf_config = InferenceConfig(entry_script='score.py', environment=myenv)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Provision the AKS Cluster\n",
"This is a one time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, then you would have to recreate it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AksCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your cluster.\n",
"aks_name = 'scoring-explain'\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" aks_target = ComputeTarget(workspace=ws, name=aks_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" prov_config = AksCompute.provisioning_configuration(vm_size='STANDARD_D3_V2')\n",
" aks_target = ComputeTarget.create(workspace=ws, \n",
" name=aks_name,\n",
" provisioning_configuration=prov_config)\n",
"aks_target.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy web service to AKS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set the web service configuration (using default here)\n",
"from azureml.core.webservice import AksWebservice\n",
"from azureml.core.model import Model\n",
"\n",
"aks_config = AksWebservice.deploy_configuration()"
]
},
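The default configuration works for this sample; if you need to size the service explicitly, a hedged sketch using commonly available AksWebservice.deploy_configuration parameters would look like this:

# Sketch: explicit resource settings instead of the defaults used above.
aks_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=2,
                                                enable_app_insights=True)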
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service_name ='model-scoring-local-aks'\n",
"\n",
"aks_service = Model.deploy(workspace=ws,\n",
" name=aks_service_name,\n",
" models=[scoring_explainer_model, original_model],\n",
" inference_config=inf_config,\n",
" deployment_config=aks_config,\n",
" deployment_target=aks_target)\n",
"\n",
"aks_service.wait_for_deployment(show_output = True)\n",
"print(aks_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### View the service logs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.get_logs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Consume the web service using run method to do the scoring and explanation of scoring.\n",
"We test the web sevice by passing data. Run() method retrieves API keys behind the scenes to make sure that call is authenticated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Serialize the first row of the test data into json\n",
"X_test_json = X_test_df[:1].to_json(orient='records')\n",
"print(X_test_json)\n",
"\n",
"# Call the service to get the predictions and the engineered and raw explanations\n",
"output = aks_service.run(X_test_json)\n",
"\n",
"# Print the predicted value\n",
"print('predictions:\\n{}\\n'.format(output['predictions']))\n",
"# Print the engineered feature importances for the predicted value\n",
"print('engineered_local_importance_values:\\n{}\\n'.format(output['engineered_local_importance_values']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Clean up\n",
"Delete the service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.delete()"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -5,4 +5,3 @@ dependencies:
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- azureml-explain-model

View File

@@ -40,7 +40,7 @@
"In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The Regression goal is to predict the performance of certain combinations of hardware parts.\n", "In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The Regression goal is to predict the performance of certain combinations of hardware parts.\n",
"After training AutoML models for this regression data set, we show how you can compute model explanations on your remote compute using a sample explainer script.\n", "After training AutoML models for this regression data set, we show how you can compute model explanations on your remote compute using a sample explainer script.\n",
"\n", "\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n", "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n", "\n",
"An Enterprise workspace is required for this notebook. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page.](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade) \n", "An Enterprise workspace is required for this notebook. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page.](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade) \n",
"\n", "\n",
@@ -85,6 +85,23 @@
"from azureml.core.dataset import Dataset" "from azureml.core.dataset import Dataset"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -98,7 +115,6 @@
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n", "output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
@@ -127,35 +143,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpu-cluster-5\"\n", "amlcompute_cluster_name = \"hardware-cluster\"\n",
"\n", "\n",
"found = False\n", "# Verify that cluster does not exist already\n",
"# Check if this compute target already exists in the workspace.\n", "try:\n",
"cts = ws.compute_targets\n", " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " print('Found existing cluster, use it.')\n",
" found = True\n", "except ComputeTargetException:\n",
" print('Found existing compute target.')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_target = cts[amlcompute_cluster_name]\n", " max_nodes=4)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n", "\n",
"if not found:\n", "compute_target.wait_for_completion(show_output=True)"
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 4)\n",
"\n",
" # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"# For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
@@ -239,7 +242,11 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"sample-featurizationconfig-remarks2"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"featurization_config = FeaturizationConfig()\n", "featurization_config = FeaturizationConfig()\n",
@@ -257,7 +264,11 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"sample-featurizationconfig-remarks3"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"automl_settings = {\n", "automl_settings = {\n",
@@ -320,8 +331,6 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"#from azureml.train.automl.run import AutoMLRun\n", "#from azureml.train.automl.run import AutoMLRun\n",
"#experiment_name = 'automl-regression-hardware'\n",
"#experiment = Experiment(ws, experiment_name)\n",
"#remote_run = AutoMLRun(experiment=experiment, run_id='<run_ID_goes_here')\n", "#remote_run = AutoMLRun(experiment=experiment, run_id='<run_ID_goes_here')\n",
"#remote_run" "#remote_run"
] ]
@@ -618,7 +627,7 @@
"source": [ "source": [
"from azureml.explain.model._internal.explanation_client import ExplanationClient\n", "from azureml.explain.model._internal.explanation_client import ExplanationClient\n",
"client = ExplanationClient.from_run(automl_run)\n", "client = ExplanationClient.from_run(automl_run)\n",
"engineered_explanations = client.download_model_explanation(raw=False)\n", "engineered_explanations = client.download_model_explanation(raw=False, comment='engineered explanations')\n",
"print(engineered_explanations.get_feature_importance_dict())\n", "print(engineered_explanations.get_feature_importance_dict())\n",
"print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())" "print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())"
] ]
@@ -637,7 +646,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"raw_explanations = client.download_model_explanation(raw=True)\n", "raw_explanations = client.download_model_explanation(raw=True, comment='raw explanations')\n",
"print(raw_explanations.get_feature_importance_dict())\n", "print(raw_explanations.get_feature_importance_dict())\n",
"print(\"You can visualize the raw explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())" "print(\"You can visualize the raw explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())"
] ]

View File

@@ -0,0 +1,7 @@
name: auto-ml-regression-explanation-featurization
dependencies:
- pip:
- azureml-sdk
- azureml-train-automl
- azureml-widgets
- matplotlib

View File

@@ -7,7 +7,7 @@ import azureml.train.automl
import azureml.explain.model import azureml.explain.model
from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \ from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \
automl_setup_model_explanations automl_setup_model_explanations
from sklearn.externals import joblib import joblib
from azureml.core.model import Model from azureml.core.model import Model

View File

@@ -4,15 +4,14 @@ import os
from azureml.core.run import Run
from azureml.core.experiment import Experiment
-from sklearn.externals import joblib
from azureml.core.dataset import Dataset
from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \
automl_setup_model_explanations, automl_check_model_if_explainable
from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
from azureml.explain.model.mimic_wrapper import MimicWrapper
-from automl.client.core.common.constants import MODEL_PATH
+from azureml.automl.core.shared.constants import MODEL_PATH
-from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer, save
+from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer
+import joblib
OUTPUT_DIR = './outputs/'
os.makedirs(OUTPUT_DIR, exist_ok=True)
@@ -60,22 +59,22 @@ explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMEx
classes=automl_explainer_setup_obj.classes)
# Compute the engineered explanations
-engineered_explanations = explainer.explain(['local', 'global'],
+engineered_explanations = explainer.explain(['local', 'global'], tag='engineered explanations',
                                             eval_dataset=automl_explainer_setup_obj.X_test_transform)
# Compute the raw explanations
-raw_explanations = explainer.explain(['local', 'global'], get_raw=True,
+raw_explanations = explainer.explain(['local', 'global'], get_raw=True, tag='raw explanations',
                                      raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                      eval_dataset=automl_explainer_setup_obj.X_test_transform)
print("Engineered and raw explanations computed successfully")
# Initialize the ScoringExplainer
scoring_explainer = TreeScoringExplainer(explainer.explainer, feature_maps=[automl_explainer_setup_obj.feature_map])
# Pickle scoring explainer locally
-save(scoring_explainer, exist_ok=True)
+with open('scoring_explainer.pkl', 'wb') as stream:
+    joblib.dump(scoring_explainer, stream)
# Upload the scoring explainer to the automl run
automl_run.upload_file('outputs/scoring_explainer.pkl', 'scoring_explainer.pkl')
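The script pickles the ScoringExplainer with joblib and uploads it under outputs/. A minimal sketch of loading it back for inference-time explanations, assuming the file has been downloaded from the run and that X_test_transform is the featurized data prepared by automl_setup_model_explanations (both are assumptions, not shown in this diff):

import joblib

# Assumes the artifact was fetched first, e.g.:
# automl_run.download_file('outputs/scoring_explainer.pkl', 'scoring_explainer.pkl')
scoring_explainer = joblib.load('scoring_explainer.pkl')

# Per-row (local) feature importances for the featurized evaluation data (X_test_transform is assumed).
local_importance_values = scoring_explainer.explain(X_test_transform)
print(local_importance_values[0])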

View File

@@ -1,10 +0,0 @@
name: auto-ml-regression-hardware-performance-explanation-and-featurization
dependencies:
- pip:
- azureml-sdk
- azureml-train-automl
- azureml-widgets
- matplotlib
- azureml-explain-model
- azureml-explain-model
- azureml-contrib-interpret

View File

@@ -40,7 +40,7 @@
"## Introduction\n", "## Introduction\n",
"In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The Regression goal is to predict the performance of certain combinations of hardware parts.\n", "In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The Regression goal is to predict the performance of certain combinations of hardware parts.\n",
"\n", "\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n", "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n", "\n",
"In this notebook you will learn how to:\n", "In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n", "1. Create an `Experiment` in an existing `Workspace`.\n",
@@ -79,6 +79,23 @@
"from azureml.train.automl import AutoMLConfig" "from azureml.train.automl import AutoMLConfig"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -93,7 +110,6 @@
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
"output = {}\n", "output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
@@ -122,7 +138,7 @@
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your CPU cluster\n", "# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster-2\"\n", "cpu_cluster_name = \"reg-cluster\"\n",
"\n", "\n",
"# Verify that cluster does not exist already\n", "# Verify that cluster does not exist already\n",
"try:\n", "try:\n",

View File

@@ -1,23 +0,0 @@
-- This shows using the AutoMLForecast stored procedure to predict using a forecasting model for the nyc_energy dataset.
DECLARE @Model NVARCHAR(MAX) = (SELECT TOP 1 Model FROM dbo.aml_model
WHERE ExperimentName = 'automl-sql-forecast'
ORDER BY CreatedDate DESC)
DECLARE @max_horizon INT = 48
DECLARE @split_time NVARCHAR(22) = (SELECT DATEADD(hour, -@max_horizon, MAX(timeStamp)) FROM nyc_energy WHERE demand IS NOT NULL)
DECLARE @TestDataQuery NVARCHAR(MAX) = '
SELECT CAST(timeStamp AS NVARCHAR(30)) AS timeStamp,
demand,
precip,
temp
FROM nyc_energy
WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL
AND timeStamp > ''' + @split_time + ''''
EXEC dbo.AutoMLForecast @input_query=@TestDataQuery,
@label_column='demand',
@time_column_name='timeStamp',
@model=@model
WITH RESULT SETS ((timeStamp DATETIME, grain NVARCHAR(255), predicted_demand FLOAT, precip FLOAT, temp FLOAT, actual_demand FLOAT))

View File

@@ -1,10 +0,0 @@
-- This lists all the metrics for all iterations for the most recent run.
DECLARE @RunId NVARCHAR(43)
DECLARE @ExperimentName NVARCHAR(255)
SELECT TOP 1 @ExperimentName=ExperimentName, @RunId=SUBSTRING(RunId, 1, 43)
FROM aml_model
ORDER BY CreatedDate DESC
EXEC dbo.AutoMLGetMetrics @RunId, @ExperimentName

View File

@@ -1,25 +0,0 @@
-- This shows using the AutoMLTrain stored procedure to create a forecasting model for the nyc_energy dataset.
DECLARE @max_horizon INT = 48
DECLARE @split_time NVARCHAR(22) = (SELECT DATEADD(hour, -@max_horizon, MAX(timeStamp)) FROM nyc_energy WHERE demand IS NOT NULL)
DECLARE @TrainDataQuery NVARCHAR(MAX) = '
SELECT CAST(timeStamp as NVARCHAR(30)) as timeStamp,
demand,
precip,
temp
FROM nyc_energy
WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL
and timeStamp < ''' + @split_time + ''''
INSERT INTO dbo.aml_model(RunId, ExperimentName, Model, LogFileText, WorkspaceName)
EXEC dbo.AutoMLTrain @input_query= @TrainDataQuery,
@label_column='demand',
@task='forecasting',
@iterations=10,
@iteration_timeout_minutes=5,
@time_column_name='timeStamp',
@max_horizon=@max_horizon,
@experiment_name='automl-sql-forecast',
@primary_metric='normalized_root_mean_squared_error'

View File

@@ -1,161 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train a model and use it for prediction\r\n",
"\r\n",
"Before running this notebook, run the auto-ml-sql-setup.ipynb notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/sql-server/energy-demand/auto-ml-sql-energy-demand.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set the default database"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"USE [automl]\r\n",
"GO"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use the AutoMLTrain stored procedure to create a forecasting model for the nyc_energy dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"INSERT INTO dbo.aml_model(RunId, ExperimentName, Model, LogFileText, WorkspaceName)\r\n",
"EXEC dbo.AutoMLTrain @input_query='\r\n",
"SELECT CAST(timeStamp as NVARCHAR(30)) as timeStamp,\r\n",
" demand,\r\n",
"\t precip,\r\n",
"\t temp,\r\n",
"\t CASE WHEN timeStamp < ''2017-01-01'' THEN 0 ELSE 1 END AS is_validate_column\r\n",
"FROM nyc_energy\r\n",
"WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL\r\n",
"and timeStamp < ''2017-02-01''',\r\n",
"@label_column='demand',\r\n",
"@task='forecasting',\r\n",
"@iterations=10,\r\n",
"@iteration_timeout_minutes=5,\r\n",
"@time_column_name='timeStamp',\r\n",
"@is_validate_column='is_validate_column',\r\n",
"@experiment_name='automl-sql-forecast',\r\n",
"@primary_metric='normalized_root_mean_squared_error'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use the AutoMLPredict stored procedure to predict using the forecasting model for the nyc_energy dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"DECLARE @Model NVARCHAR(MAX) = (SELECT TOP 1 Model FROM dbo.aml_model\r\n",
" WHERE ExperimentName = 'automl-sql-forecast'\r\n",
"\t\t\t\t\t\t\t\tORDER BY CreatedDate DESC)\r\n",
"\r\n",
"EXEC dbo.AutoMLPredict @input_query='\r\n",
"SELECT CAST(timeStamp AS NVARCHAR(30)) AS timeStamp,\r\n",
" demand,\r\n",
"\t precip,\r\n",
"\t temp\r\n",
"FROM nyc_energy\r\n",
"WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL\r\n",
"AND timeStamp >= ''2017-02-01''',\r\n",
"@label_column='demand',\r\n",
"@model=@model\r\n",
"WITH RESULT SETS ((timeStamp NVARCHAR(30), actual_demand FLOAT, precip FLOAT, temp FLOAT, predicted_demand FLOAT))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## List all the metrics for all iterations for the most recent training run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"DECLARE @RunId NVARCHAR(43)\r\n",
"DECLARE @ExperimentName NVARCHAR(255)\r\n",
"\r\n",
"SELECT TOP 1 @ExperimentName=ExperimentName, @RunId=SUBSTRING(RunId, 1, 43)\r\n",
"FROM aml_model\r\n",
"ORDER BY CreatedDate DESC\r\n",
"\r\n",
"EXEC dbo.AutoMLGetMetrics @RunId, @ExperimentName"
]
}
],
"metadata": {
"authors": [
{
"name": "jeffshep"
}
],
"category": "tutorial",
"compute": [
"Local"
],
"datasets": [
"NYC Energy"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"Azure ML AutoML"
],
"tags": [
""
],
"friendly_name": "Forecasting with automated ML SQL integration",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.6",
"language": "sql",
"name": "python36"
},
"language_info": {
"name": "sql",
"version": ""
},
"task": "Forecasting"
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,92 +0,0 @@
-- This procedure forecast values based on a forecasting model returned by AutoMLTrain.
-- It returns a dataset with the forecasted values.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE OR ALTER PROCEDURE [dbo].[AutoMLForecast]
(
@input_query NVARCHAR(MAX), -- A SQL query returning data to predict on.
@model NVARCHAR(MAX), -- A model returned from AutoMLTrain.
@time_column_name NVARCHAR(255)='', -- The name of the timestamp column for forecasting.
@label_column NVARCHAR(255)='', -- Optional name of the column from input_query, which should be ignored when predicting
@y_query_column NVARCHAR(255)='', -- Optional value column that can be used for predicting.
-- If specified, this can contain values for past times (after the model was trained)
-- and contain Nan for future times.
@forecast_column_name NVARCHAR(255) = 'predicted'
-- The name of the output column containing the forecast value.
) AS
BEGIN
EXEC sp_execute_external_script @language = N'Python', @script = N'import pandas as pd
import azureml.core
import numpy as np
from azureml.train.automl import AutoMLConfig
import pickle
import codecs
model_obj = pickle.loads(codecs.decode(model.encode(), "base64"))
test_data = input_data.copy()
if label_column != "" and label_column is not None:
y_test = test_data.pop(label_column).values
else:
y_test = None
if y_query_column != "" and y_query_column is not None:
y_query = test_data.pop(y_query_column).values
else:
y_query = np.repeat(np.nan, len(test_data))
X_test = test_data
if time_column_name != "" and time_column_name is not None:
X_test[time_column_name] = pd.to_datetime(X_test[time_column_name])
y_fcst, X_trans = model_obj.forecast(X_test, y_query)
def align_outputs(y_forecast, X_trans, X_test, y_test, forecast_column_name):
    # Demonstrates how to get the output aligned to the inputs
    # using pandas indexes. Helps understand what happened if
    # the output shape differs from the input shape, or if
    # the data got re-sorted by time and grain during forecasting.
    # Typical causes of misalignment are:
    # * we predicted some periods that were missing in actuals -> drop from eval
    # * model was asked to predict past max_horizon -> increase max horizon
    # * data at start of X_test was needed for lags -> provide previous periods
    df_fcst = pd.DataFrame({forecast_column_name : y_forecast})
    # y and X outputs are aligned by forecast() function contract
    df_fcst.index = X_trans.index
    # align original X_test to y_test
    X_test_full = X_test.copy()
    if y_test is not None:
        X_test_full[label_column] = y_test
    # X_test_full does not include origin, so reset for merge
    df_fcst.reset_index(inplace=True)
    X_test_full = X_test_full.reset_index().drop(columns=''index'')
    together = df_fcst.merge(X_test_full, how=''right'')
    # drop rows where prediction or actuals are nan
    # happens because of missing actuals
    # or at edges of time due to lags/rolling windows
    clean = together[together[[label_column, forecast_column_name]].notnull().all(axis=1)]
    return(clean)
combined_output = align_outputs(y_fcst, X_trans, X_test, y_test, forecast_column_name)
'
, @input_data_1 = @input_query
, @input_data_1_name = N'input_data'
, @output_data_1_name = N'combined_output'
, @params = N'@model NVARCHAR(MAX), @time_column_name NVARCHAR(255), @label_column NVARCHAR(255), @y_query_column NVARCHAR(255), @forecast_column_name NVARCHAR(255)'
, @model = @model
, @time_column_name = @time_column_name
, @label_column = @label_column
, @y_query_column = @y_query_column
, @forecast_column_name = @forecast_column_name
END
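The calling script earlier in this diff surfaces the forecast through RESULT SETS with predicted_demand and actual_demand columns. A small, self-contained sketch (placeholder numbers, not real output) of how that result set could be scored once loaded into pandas:

import numpy as np
import pandas as pd

# Assumed shape of the AutoMLForecast result set once fetched into pandas,
# e.g. via pandas.read_sql; the values below are placeholders.
df = pd.DataFrame({
    "actual_demand": [6100.0, 5950.0, 6230.0],
    "predicted_demand": [6010.0, 6005.0, 6150.0],
})

# Mean absolute percentage error over the forecast horizon.
mape = np.mean(np.abs((df["actual_demand"] - df["predicted_demand"]) / df["actual_demand"])) * 100
print(f"MAPE: {mape:.2f}%")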

View File

@@ -1,70 +0,0 @@
-- This procedure returns a list of metrics for each iteration of a run.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE OR ALTER PROCEDURE [dbo].[AutoMLGetMetrics]
(
@run_id NVARCHAR(250), -- The RunId
@experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.
@connection_name NVARCHAR(255)='default' -- The AML connection to use.
) AS
BEGIN
DECLARE @tenantid NVARCHAR(255)
DECLARE @appid NVARCHAR(255)
DECLARE @password NVARCHAR(255)
DECLARE @config_file NVARCHAR(255)
SELECT @tenantid=TenantId, @appid=AppId, @password=Password, @config_file=ConfigFile
FROM aml_connection
WHERE ConnectionName = @connection_name;
EXEC sp_execute_external_script @language = N'Python', @script = N'import pandas as pd
import logging
import azureml.core
import numpy as np
from azureml.core.experiment import Experiment
from azureml.train.automl.run import AutoMLRun
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.workspace import Workspace
auth = ServicePrincipalAuthentication(tenantid, appid, password)
ws = Workspace.from_config(path=config_file, auth=auth)
experiment = Experiment(ws, experiment_name)
ml_run = AutoMLRun(experiment = experiment, run_id = run_id)
children = list(ml_run.get_children())
iterationlist = []
metricnamelist = []
metricvaluelist = []
for run in children:
    properties = run.get_properties()
    if "iteration" in properties:
        iteration = int(properties["iteration"])
        for metric_name, metric_value in run.get_metrics().items():
            if isinstance(metric_value, float):
                iterationlist.append(iteration)
                metricnamelist.append(metric_name)
                metricvaluelist.append(metric_value)
metrics = pd.DataFrame({"iteration": iterationlist, "metric_name": metricnamelist, "metric_value": metricvaluelist})
'
, @output_data_1_name = N'metrics'
, @params = N'@run_id NVARCHAR(250),
@experiment_name NVARCHAR(32),
@tenantid NVARCHAR(255),
@appid NVARCHAR(255),
@password NVARCHAR(255),
@config_file NVARCHAR(255)'
, @run_id = @run_id
, @experiment_name = @experiment_name
, @tenantid = @tenantid
, @appid = @appid
, @password = @password
, @config_file = @config_file
WITH RESULT SETS ((iteration INT, metric_name NVARCHAR(100), metric_value FLOAT))
END
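AutoMLGetMetrics returns one (iteration, metric_name, metric_value) row per metric. A short sketch, with placeholder rows, of pivoting that long format in pandas to compare iterations side by side:

import pandas as pd

# Placeholder rows shaped like the (iteration, metric_name, metric_value) result set above.
metrics = pd.DataFrame({
    "iteration": [0, 0, 1, 1],
    "metric_name": ["r2_score", "normalized_root_mean_squared_error"] * 2,
    "metric_value": [0.81, 0.12, 0.85, 0.10],
})

# One row per iteration, one column per metric.
wide = metrics.pivot(index="iteration", columns="metric_name", values="metric_value")
print(wide)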

View File

@@ -1,41 +0,0 @@
-- This procedure predicts values based on a model returned by AutoMLTrain and a dataset.
-- It returns the dataset with a new column added, which is the predicted value.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE OR ALTER PROCEDURE [dbo].[AutoMLPredict]
(
@input_query NVARCHAR(MAX), -- A SQL query returning data to predict on.
@model NVARCHAR(MAX), -- A model returned from AutoMLTrain.
@label_column NVARCHAR(255)='' -- Optional name of the column from input_query, which should be ignored when predicting
) AS
BEGIN
EXEC sp_execute_external_script @language = N'Python', @script = N'import pandas as pd
import azureml.core
import numpy as np
from azureml.train.automl import AutoMLConfig
import pickle
import codecs
model_obj = pickle.loads(codecs.decode(model.encode(), "base64"))
test_data = input_data.copy()
if label_column != "" and label_column is not None:
    y_test = test_data.pop(label_column).values
X_test = test_data
predicted = model_obj.predict(X_test)
combined_output = input_data.assign(predicted=predicted)
'
, @input_data_1 = @input_query
, @input_data_1_name = N'input_data'
, @output_data_1_name = N'combined_output'
, @params = N'@model NVARCHAR(MAX), @label_column NVARCHAR(255)'
, @model = @model
, @label_column = @label_column
END

View File

@@ -1,240 +0,0 @@
-- This stored procedure uses automated machine learning to train several models
-- and returns the best model.
--
-- The result set has several columns:
-- best_run - iteration ID for the best model
-- experiment_name - experiment name pass in with the @experiment_name parameter
-- fitted_model - best model found
-- log_file_text - AutoML debug_log contents
-- workspace - name of the Azure ML workspace where run history is stored
--
-- An example call for a classification problem is:
-- insert into dbo.aml_model(RunId, ExperimentName, Model, LogFileText, WorkspaceName)
-- exec dbo.AutoMLTrain @input_query='
-- SELECT top 100000
-- CAST([pickup_datetime] AS NVARCHAR(30)) AS pickup_datetime
-- ,CAST([dropoff_datetime] AS NVARCHAR(30)) AS dropoff_datetime
-- ,[passenger_count]
-- ,[trip_time_in_secs]
-- ,[trip_distance]
-- ,[payment_type]
-- ,[tip_class]
-- FROM [dbo].[nyctaxi_sample] order by [hack_license] ',
-- @label_column = 'tip_class',
-- @iterations=10
--
-- An example call for forecasting is:
-- insert into dbo.aml_model(RunId, ExperimentName, Model, LogFileText, WorkspaceName)
-- exec dbo.AutoMLTrain @input_query='
-- select cast(timeStamp as nvarchar(30)) as timeStamp,
-- demand,
-- precip,
-- temp,
-- case when timeStamp < ''2017-01-01'' then 0 else 1 end as is_validate_column
-- from nyc_energy
-- where demand is not null and precip is not null and temp is not null
-- and timeStamp < ''2017-02-01''',
-- @label_column='demand',
-- @task='forecasting',
-- @iterations=10,
-- @iteration_timeout_minutes=5,
-- @time_column_name='timeStamp',
-- @is_validate_column='is_validate_column',
-- @experiment_name='automl-sql-forecast',
-- @primary_metric='normalized_root_mean_squared_error'
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE OR ALTER PROCEDURE [dbo].[AutoMLTrain]
(
@input_query NVARCHAR(MAX), -- The SQL Query that will return the data to train and validate the model.
@label_column NVARCHAR(255)='Label', -- The name of the column in the result of @input_query that is the label.
@primary_metric NVARCHAR(40)='AUC_weighted', -- The metric to optimize.
@iterations INT=100, -- The maximum number of pipelines to train.
@task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.
@experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.
@iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline.
@experiment_timeout_hours FLOAT = 1, -- The maximum time in hours for training all pipelines.
@n_cross_validations INT = 3, -- The number of cross validations.
@blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.
-- The list of possible models can be found at:
-- https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings
@whitelist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that can be used.
-- The list of possible models can be found at:
-- https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings
@experiment_exit_score FLOAT = 0, -- Stop the experiment if this score is acheived.
@sample_weight_column NVARCHAR(255)='', -- The name of the column in the result of @input_query that gives a sample weight.
@is_validate_column NVARCHAR(255)='', -- The name of the column in the result of @input_query that indicates if the row is for training or validation.
-- In the values of the column, 0 means for training and 1 means for validation.
@time_column_name NVARCHAR(255)='', -- The name of the timestamp column for forecasting.
@connection_name NVARCHAR(255)='default', -- The AML connection to use.
@max_horizon INT = 0 -- A forecast horizon is a time span into the future (or just beyond the latest date in the training data)
-- where forecasts of the target quantity are needed.
-- For example, if data is recorded daily and max_horizon is 5, we will predict 5 days ahead.
) AS
BEGIN
DECLARE @tenantid NVARCHAR(255)
DECLARE @appid NVARCHAR(255)
DECLARE @password NVARCHAR(255)
DECLARE @config_file NVARCHAR(255)
SELECT @tenantid=TenantId, @appid=AppId, @password=Password, @config_file=ConfigFile
FROM aml_connection
WHERE ConnectionName = @connection_name;
EXEC sp_execute_external_script @language = N'Python', @script = N'import pandas as pd
import logging
import azureml.core
import pandas as pd
import numpy as np
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from sklearn import datasets
import pickle
import codecs
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.workspace import Workspace
if __name__.startswith("sqlindb"):
auth = ServicePrincipalAuthentication(tenantid, appid, password)
ws = Workspace.from_config(path=config_file, auth=auth)
project_folder = "./sample_projects/" + experiment_name
experiment = Experiment(ws, experiment_name)
data_train = input_data
X_valid = None
y_valid = None
sample_weight_valid = None
if is_validate_column != "" and is_validate_column is not None:
data_train = input_data[input_data[is_validate_column] <= 0]
data_valid = input_data[input_data[is_validate_column] > 0]
data_train.pop(is_validate_column)
data_valid.pop(is_validate_column)
y_valid = data_valid.pop(label_column).values
if sample_weight_column != "" and sample_weight_column is not None:
sample_weight_valid = data_valid.pop(sample_weight_column).values
X_valid = data_valid
n_cross_validations = None
y_train = data_train.pop(label_column).values
sample_weight = None
if sample_weight_column != "" and sample_weight_column is not None:
sample_weight = data_train.pop(sample_weight_column).values
X_train = data_train
if experiment_timeout_hours == 0:
experiment_timeout_hours = None
if experiment_exit_score == 0:
experiment_exit_score = None
if blacklist_models == "":
blacklist_models = None
if blacklist_models is not None:
blacklist_models = blacklist_models.replace(" ", "").split(",")
if whitelist_models == "":
whitelist_models = None
if whitelist_models is not None:
whitelist_models = whitelist_models.replace(" ", "").split(",")
automl_settings = {}
preprocess = True
if time_column_name != "" and time_column_name is not None:
automl_settings = { "time_column_name": time_column_name }
preprocess = False
if max_horizon > 0:
automl_settings["max_horizon"] = max_horizon
log_file_name = "automl_sqlindb_errors.log"
automl_config = AutoMLConfig(task = task,
debug_log = log_file_name,
primary_metric = primary_metric,
iteration_timeout_minutes = iteration_timeout_minutes,
experiment_timeout_hours = experiment_timeout_hours,
iterations = iterations,
n_cross_validations = n_cross_validations,
preprocess = preprocess,
verbosity = logging.INFO,
X = X_train,
y = y_train,
path = project_folder,
blacklist_models = blacklist_models,
whitelist_models = whitelist_models,
experiment_exit_score = experiment_exit_score,
sample_weight = sample_weight,
X_valid = X_valid,
y_valid = y_valid,
sample_weight_valid = sample_weight_valid,
**automl_settings)
local_run = experiment.submit(automl_config, show_output = True)
best_run, fitted_model = local_run.get_output()
pickled_model = codecs.encode(pickle.dumps(fitted_model), "base64").decode()
log_file_text = ""
try:
with open(log_file_name, "r") as log_file:
log_file_text = log_file.read()
except:
log_file_text = "Log file not found"
returned_model = pd.DataFrame({"best_run": [best_run.id], "experiment_name": [experiment_name], "fitted_model": [pickled_model], "log_file_text": [log_file_text], "workspace": [ws.name]}, dtype=np.dtype(np.str))
'
, @input_data_1 = @input_query
, @input_data_1_name = N'input_data'
, @output_data_1_name = N'returned_model'
, @params = N'@label_column NVARCHAR(255),
@primary_metric NVARCHAR(40),
@iterations INT, @task NVARCHAR(40),
@experiment_name NVARCHAR(32),
@iteration_timeout_minutes INT,
@experiment_timeout_hours FLOAT,
@n_cross_validations INT,
@blacklist_models NVARCHAR(MAX),
@whitelist_models NVARCHAR(MAX),
@experiment_exit_score FLOAT,
@sample_weight_column NVARCHAR(255),
@is_validate_column NVARCHAR(255),
@time_column_name NVARCHAR(255),
@tenantid NVARCHAR(255),
@appid NVARCHAR(255),
@password NVARCHAR(255),
@config_file NVARCHAR(255),
@max_horizon INT'
, @label_column = @label_column
, @primary_metric = @primary_metric
, @iterations = @iterations
, @task = @task
, @experiment_name = @experiment_name
, @iteration_timeout_minutes = @iteration_timeout_minutes
, @experiment_timeout_hours = @experiment_timeout_hours
, @n_cross_validations = @n_cross_validations
, @blacklist_models = @blacklist_models
, @whitelist_models = @whitelist_models
, @experiment_exit_score = @experiment_exit_score
, @sample_weight_column = @sample_weight_column
, @is_validate_column = @is_validate_column
, @time_column_name = @time_column_name
, @tenantid = @tenantid
, @appid = @appid
, @password = @password
, @config_file = @config_file
, @max_horizon = @max_horizon
WITH RESULT SETS ((best_run NVARCHAR(250), experiment_name NVARCHAR(100), fitted_model VARCHAR(MAX), log_file_text NVARCHAR(MAX), workspace NVARCHAR(100)))
END
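The fitted_model column returned here is the pickled pipeline encoded as base64 text (see the pickled_model line above). Mirroring what AutoMLPredict does, it can be decoded back into a usable model on the client side; a sketch assuming model_str holds the value read from dbo.aml_model.Model and X_test is a prepared feature DataFrame (both assumptions):

import codecs
import pickle

# model_str is assumed to be the NVARCHAR(MAX) value read from dbo.aml_model.Model,
# produced above by codecs.encode(pickle.dumps(fitted_model), "base64").decode().
model_obj = pickle.loads(codecs.decode(model_str.encode(), "base64"))

# The deserialized object is the AutoML fitted pipeline, so it can predict directly
# on a DataFrame with the same columns used for training (X_test is assumed here).
predictions = model_obj.predict(X_test)
print(predictions[:5])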

View File

@@ -1,18 +0,0 @@
-- This is a table to store the Azure ML connection information.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[aml_connection](
[Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[ConnectionName] [nvarchar](255) NULL,
[TenantId] [nvarchar](255) NULL,
[AppId] [nvarchar](255) NULL,
[Password] [nvarchar](255) NULL,
[ConfigFile] [nvarchar](255) NULL
) ON [PRIMARY]
GO

View File

@@ -1,22 +0,0 @@
-- This is a table to hold the results from the AutoMLTrain procedure.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[aml_model](
[Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[Model] [varchar](max) NOT NULL, -- The model, which can be passed to AutoMLPredict for testing or prediction.
[RunId] [nvarchar](250) NULL, -- The RunId, which can be used to view the model in the Azure Portal.
[CreatedDate] [datetime] NULL,
[ExperimentName] [nvarchar](100) NULL, -- Azure ML Experiment Name
[WorkspaceName] [nvarchar](100) NULL, -- Azure ML Workspace Name
[LogFileText] [nvarchar](max) NULL
)
GO
ALTER TABLE [dbo].[aml_model] ADD DEFAULT (getutcdate()) FOR [CreatedDate]
GO

View File

@@ -1,581 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Set up Azure ML Automated Machine Learning on SQL Server 2019 CTP 2.4 big data cluster\r\n",
"\r\n",
"\\# Prerequisites: \r\n",
"\\# - An Azure subscription and resource group \r\n",
"\\# - An Azure Machine Learning workspace \r\n",
"\\# - A SQL Server 2019 CTP 2.4 big data cluster with Internet access and a database named 'automl' \r\n",
"\\# - Azure CLI \r\n",
"\\# - kubectl command \r\n",
"\\# - The https://github.com/Azure/MachineLearningNotebooks repository downloaded (cloned) to your local machine\r\n",
"\r\n",
"\\# In the 'automl' database, create a table named 'dbo.nyc_energy' as follows: \r\n",
"\\# - In SQL Server Management Studio, right-click the 'automl' database, select Tasks, then Import Flat File. \r\n",
"\\# - Select the file AzureMlCli\\notebooks\\how-to-use-azureml\\automated-machine-learning\\forecasting-energy-demand\\nyc_energy.csv. \r\n",
"\\# - Using the \"Modify Columns\" page, allow nulls for all columns. \r\n",
"\r\n",
"\\# Create an Azure Machine Learning Workspace using the instructions at https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace \r\n",
"\r\n",
"\\# Create an Azure service principal. You can do this with the following commands: \r\n",
"\r\n",
"az login \r\n",
"az account set --subscription *subscriptionid* \r\n",
"\r\n",
"\\# The following command prints out the **appId** and **tenant**, \r\n",
"\\# which you insert into the indicated cell later in this notebook \r\n",
"\\# to allow AutoML to authenticate with Azure: \r\n",
"\r\n",
"az ad sp create-for-rbac --name *principlename* --password *password*\r\n",
"\r\n",
"\\# Log into the master instance of SQL Server 2019 CTP 2.4: \r\n",
"kubectl exec -it mssql-master-pool-0 -n *clustername* -c mssql-server -- /bin/bash\r\n",
"\r\n",
"mkdir /tmp/aml\r\n",
"\r\n",
"cd /tmp/aml\r\n",
"\r\n",
"\\# **Modify** the following with your subscription_id, resource_group, and workspace_name: \r\n",
"cat > config.json << EOF \r\n",
"{ \r\n",
" \"subscription_id\": \"123456ab-78cd-0123-45ef-abcd12345678\", \r\n",
" \"resource_group\": \"myrg1\", \r\n",
" \"workspace_name\": \"myws1\" \r\n",
"} \r\n",
"EOF\r\n",
"\r\n",
"\\# The directory referenced below is appropriate for the master instance of SQL Server 2019 CTP 2.4.\r\n",
"\r\n",
"cd /opt/mssql/mlservices/runtime/python/bin\r\n",
"\r\n",
"./python -m pip install azureml-sdk[automl]\r\n",
"\r\n",
"./python -m pip install --upgrade numpy \r\n",
"\r\n",
"./python -m pip install --upgrade sklearn\r\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/sql-server/setup/auto-ml-sql-setup.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"-- Enable external scripts to allow invoking Python\r\n",
"sp_configure 'external scripts enabled',1 \r\n",
"reconfigure with override \r\n",
"GO\r\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"-- Use database 'automl'\r\n",
"USE [automl]\r\n",
"GO"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"-- This is a table to hold the Azure ML connection information.\r\n",
"SET ANSI_NULLS ON\r\n",
"GO\r\n",
"\r\n",
"SET QUOTED_IDENTIFIER ON\r\n",
"GO\r\n",
"\r\n",
"CREATE TABLE [dbo].[aml_connection](\r\n",
" [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,\r\n",
"\t[ConnectionName] [nvarchar](255) NULL,\r\n",
"\t[TenantId] [nvarchar](255) NULL,\r\n",
"\t[AppId] [nvarchar](255) NULL,\r\n",
"\t[Password] [nvarchar](255) NULL,\r\n",
"\t[ConfigFile] [nvarchar](255) NULL\r\n",
") ON [PRIMARY]\r\n",
"GO"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Copy the values from create-for-rbac above into the cell below"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"-- Use the following values:\r\n",
"-- Leave the name as 'Default'\r\n",
"-- Insert <tenant> returned by create-for-rbac above\r\n",
"-- Insert <AppId> returned by create-for-rbac above\r\n",
"-- Insert <password> used in create-for-rbac above\r\n",
"-- Leave <path> as '/tmp/aml/config.json'\r\n",
"INSERT INTO [dbo].[aml_connection] \r\n",
"VALUES (\r\n",
" N'Default', -- Name\r\n",
" N'11111111-2222-3333-4444-555555555555', -- Tenant\r\n",
" N'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee', -- AppId\r\n",
" N'insertpasswordhere', -- Password\r\n",
" N'/tmp/aml/config.json' -- Path\r\n",
" );\r\n",
"GO"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"-- This is a table to hold the results from the AutoMLTrain procedure.\r\n",
"SET ANSI_NULLS ON\r\n",
"GO\r\n",
"\r\n",
"SET QUOTED_IDENTIFIER ON\r\n",
"GO\r\n",
"\r\n",
"CREATE TABLE [dbo].[aml_model](\r\n",
" [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,\r\n",
" [Model] [varchar](max) NOT NULL, -- The model, which can be passed to AutoMLPredict for testing or prediction.\r\n",
" [RunId] [nvarchar](250) NULL, -- The RunId, which can be used to view the model in the Azure Portal.\r\n",
" [CreatedDate] [datetime] NULL,\r\n",
" [ExperimentName] [nvarchar](100) NULL, -- Azure ML Experiment Name\r\n",
" [WorkspaceName] [nvarchar](100) NULL, -- Azure ML Workspace Name\r\n",
"\t[LogFileText] [nvarchar](max) NULL\r\n",
") \r\n",
"GO\r\n",
"\r\n",
"ALTER TABLE [dbo].[aml_model] ADD DEFAULT (getutcdate()) FOR [CreatedDate]\r\n",
"GO\r\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"-- This stored procedure uses automated machine learning to train several models\r\n",
"-- and return the best model.\r\n",
"--\r\n",
"-- The result set has several columns:\r\n",
"-- best_run - ID of the best model found\r\n",
"-- experiment_name - training run name\r\n",
"-- fitted_model - best model found\r\n",
"-- log_file_text - console output\r\n",
"-- workspace - name of the Azure ML workspace where run history is stored\r\n",
"--\r\n",
"-- An example call for a classification problem is:\r\n",
"-- insert into dbo.aml_model(RunId, ExperimentName, Model, LogFileText, WorkspaceName)\r\n",
"-- exec dbo.AutoMLTrain @input_query='\r\n",
"-- SELECT top 100000 \r\n",
"-- CAST([pickup_datetime] AS NVARCHAR(30)) AS pickup_datetime\r\n",
"-- ,CAST([dropoff_datetime] AS NVARCHAR(30)) AS dropoff_datetime\r\n",
"-- ,[passenger_count]\r\n",
"-- ,[trip_time_in_secs]\r\n",
"-- ,[trip_distance]\r\n",
"-- ,[payment_type]\r\n",
"-- ,[tip_class]\r\n",
"-- FROM [dbo].[nyctaxi_sample] order by [hack_license] ',\r\n",
"-- @label_column = 'tip_class',\r\n",
"-- @iterations=10\r\n",
"-- \r\n",
"-- An example call for forecasting is:\r\n",
"-- insert into dbo.aml_model(RunId, ExperimentName, Model, LogFileText, WorkspaceName)\r\n",
"-- exec dbo.AutoMLTrain @input_query='\r\n",
"-- select cast(timeStamp as nvarchar(30)) as timeStamp,\r\n",
"-- demand,\r\n",
"-- \t precip,\r\n",
"-- \t temp,\r\n",
"-- case when timeStamp < ''2017-01-01'' then 0 else 1 end as is_validate_column\r\n",
"-- from nyc_energy\r\n",
"-- where demand is not null and precip is not null and temp is not null\r\n",
"-- and timeStamp < ''2017-02-01''',\r\n",
"-- @label_column='demand',\r\n",
"-- @task='forecasting',\r\n",
"-- @iterations=10,\r\n",
"-- @iteration_timeout_minutes=5,\r\n",
"-- @time_column_name='timeStamp',\r\n",
"-- @is_validate_column='is_validate_column',\r\n",
"-- @experiment_name='automl-sql-forecast',\r\n",
"-- @primary_metric='normalized_root_mean_squared_error'\r\n",
"\r\n",
"SET ANSI_NULLS ON\r\n",
"GO\r\n",
"SET QUOTED_IDENTIFIER ON\r\n",
"GO\r\n",
"CREATE OR ALTER PROCEDURE [dbo].[AutoMLTrain]\r\n",
" (\r\n",
" @input_query NVARCHAR(MAX), -- The SQL Query that will return the data to train and validate the model.\r\n",
" @label_column NVARCHAR(255)='Label', -- The name of the column in the result of @input_query that is the label.\r\n",
" @primary_metric NVARCHAR(40)='AUC_weighted', -- The metric to optimize.\r\n",
" @iterations INT=100, -- The maximum number of pipelines to train.\r\n",
" @task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.\r\n",
" @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.\r\n",
" @iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline. \r\n",
" @experiment_timeout_hours FLOAT = 1, -- The maximum time in hours for training all pipelines.\r\n",
" @n_cross_validations INT = 3, -- The number of cross validations.\r\n",
" @blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.\r\n",
" -- The list of possible models can be found at:\r\n",
" -- https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings\r\n",
" @whitelist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that can be used.\r\n",
" -- The list of possible models can be found at:\r\n",
" -- https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings\r\n",
" @experiment_exit_score FLOAT = 0, -- Stop the experiment if this score is acheived.\r\n",
" @sample_weight_column NVARCHAR(255)='', -- The name of the column in the result of @input_query that gives a sample weight.\r\n",
" @is_validate_column NVARCHAR(255)='', -- The name of the column in the result of @input_query that indicates if the row is for training or validation.\r\n",
"\t -- In the values of the column, 0 means for training and 1 means for validation.\r\n",
" @time_column_name NVARCHAR(255)='', -- The name of the timestamp column for forecasting.\r\n",
"\t@connection_name NVARCHAR(255)='default' -- The AML connection to use.\r\n",
" ) AS\r\n",
"BEGIN\r\n",
"\r\n",
" DECLARE @tenantid NVARCHAR(255)\r\n",
" DECLARE @appid NVARCHAR(255)\r\n",
" DECLARE @password NVARCHAR(255)\r\n",
" DECLARE @config_file NVARCHAR(255)\r\n",
"\r\n",
"\tSELECT @tenantid=TenantId, @appid=AppId, @password=Password, @config_file=ConfigFile\r\n",
"\tFROM aml_connection\r\n",
"\tWHERE ConnectionName = @connection_name;\r\n",
"\r\n",
"\tEXEC sp_execute_external_script @language = N'Python', @script = N'import pandas as pd\r\n",
"import logging \r\n",
"import azureml.core \r\n",
"import pandas as pd\r\n",
"import numpy as np\r\n",
"from azureml.core.experiment import Experiment \r\n",
"from azureml.train.automl import AutoMLConfig \r\n",
"from sklearn import datasets \r\n",
"import pickle\r\n",
"import codecs\r\n",
"from azureml.core.authentication import ServicePrincipalAuthentication \r\n",
"from azureml.core.workspace import Workspace \r\n",
"\r\n",
"if __name__.startswith(\"sqlindb\"):\r\n",
" auth = ServicePrincipalAuthentication(tenantid, appid, password) \r\n",
" \r\n",
" ws = Workspace.from_config(path=config_file, auth=auth) \r\n",
" \r\n",
" project_folder = \"./sample_projects/\" + experiment_name\r\n",
" \r\n",
" experiment = Experiment(ws, experiment_name) \r\n",
"\r\n",
" data_train = input_data\r\n",
" X_valid = None\r\n",
" y_valid = None\r\n",
" sample_weight_valid = None\r\n",
"\r\n",
" if is_validate_column != \"\" and is_validate_column is not None:\r\n",
" data_train = input_data[input_data[is_validate_column] <= 0]\r\n",
" data_valid = input_data[input_data[is_validate_column] > 0]\r\n",
" data_train.pop(is_validate_column)\r\n",
" data_valid.pop(is_validate_column)\r\n",
" y_valid = data_valid.pop(label_column).values\r\n",
" if sample_weight_column != \"\" and sample_weight_column is not None:\r\n",
" sample_weight_valid = data_valid.pop(sample_weight_column).values\r\n",
" X_valid = data_valid\r\n",
" n_cross_validations = None\r\n",
"\r\n",
" y_train = data_train.pop(label_column).values\r\n",
"\r\n",
" sample_weight = None\r\n",
" if sample_weight_column != \"\" and sample_weight_column is not None:\r\n",
" sample_weight = data_train.pop(sample_weight_column).values\r\n",
"\r\n",
" X_train = data_train\r\n",
"\r\n",
" if experiment_timeout_hours == 0:\r\n",
" experiment_timeout_hours = None\r\n",
"\r\n",
" if experiment_exit_score == 0:\r\n",
" experiment_exit_score = None\r\n",
"\r\n",
" if blacklist_models == \"\":\r\n",
" blacklist_models = None\r\n",
"\r\n",
" if blacklist_models is not None:\r\n",
" blacklist_models = blacklist_models.replace(\" \", \"\").split(\",\")\r\n",
"\r\n",
" if whitelist_models == \"\":\r\n",
" whitelist_models = None\r\n",
"\r\n",
" if whitelist_models is not None:\r\n",
" whitelist_models = whitelist_models.replace(\" \", \"\").split(\",\")\r\n",
"\r\n",
" automl_settings = {}\r\n",
" preprocess = True\r\n",
" if time_column_name != \"\" and time_column_name is not None:\r\n",
" automl_settings = { \"time_column_name\": time_column_name }\r\n",
" preprocess = False\r\n",
"\r\n",
" log_file_name = \"automl_errors.log\"\r\n",
"\t \r\n",
" automl_config = AutoMLConfig(task = task, \r\n",
" debug_log = log_file_name, \r\n",
" primary_metric = primary_metric, \r\n",
" iteration_timeout_minutes = iteration_timeout_minutes, \r\n",
" experiment_timeout_hours = experiment_timeout_hours,\r\n",
" iterations = iterations, \r\n",
" n_cross_validations = n_cross_validations, \r\n",
" preprocess = preprocess,\r\n",
" verbosity = logging.INFO, \r\n",
" X = X_train, \r\n",
" y = y_train, \r\n",
" path = project_folder,\r\n",
" blacklist_models = blacklist_models,\r\n",
" whitelist_models = whitelist_models,\r\n",
" experiment_exit_score = experiment_exit_score,\r\n",
" sample_weight = sample_weight,\r\n",
" X_valid = X_valid,\r\n",
" y_valid = y_valid,\r\n",
" sample_weight_valid = sample_weight_valid,\r\n",
" **automl_settings) \r\n",
" \r\n",
" local_run = experiment.submit(automl_config, show_output = True) \r\n",
"\r\n",
" best_run, fitted_model = local_run.get_output()\r\n",
"\r\n",
" pickled_model = codecs.encode(pickle.dumps(fitted_model), \"base64\").decode()\r\n",
"\r\n",
" log_file_text = \"\"\r\n",
"\r\n",
" try:\r\n",
" with open(log_file_name, \"r\") as log_file:\r\n",
" log_file_text = log_file.read()\r\n",
" except:\r\n",
" log_file_text = \"Log file not found\"\r\n",
"\r\n",
" returned_model = pd.DataFrame({\"best_run\": [best_run.id], \"experiment_name\": [experiment_name], \"fitted_model\": [pickled_model], \"log_file_text\": [log_file_text], \"workspace\": [ws.name]}, dtype=np.dtype(np.str))\r\n",
"'\r\n",
"\t, @input_data_1 = @input_query\r\n",
"\t, @input_data_1_name = N'input_data'\r\n",
"\t, @output_data_1_name = N'returned_model'\r\n",
"\t, @params = N'@label_column NVARCHAR(255), \r\n",
"\t @primary_metric NVARCHAR(40),\r\n",
"\t\t\t\t @iterations INT, @task NVARCHAR(40),\r\n",
"\t\t\t\t @experiment_name NVARCHAR(32),\r\n",
"\t\t\t\t @iteration_timeout_minutes INT,\r\n",
"\t\t\t\t @experiment_timeout_hours FLOAT,\r\n",
"\t\t\t\t @n_cross_validations INT,\r\n",
"\t\t\t\t @blacklist_models NVARCHAR(MAX),\r\n",
"\t\t\t\t @whitelist_models NVARCHAR(MAX),\r\n",
"\t\t\t\t @experiment_exit_score FLOAT,\r\n",
"\t\t\t\t @sample_weight_column NVARCHAR(255),\r\n",
"\t\t\t\t @is_validate_column NVARCHAR(255),\r\n",
"\t\t\t\t @time_column_name NVARCHAR(255),\r\n",
"\t\t\t\t @tenantid NVARCHAR(255),\r\n",
"\t\t\t\t @appid NVARCHAR(255),\r\n",
"\t\t\t\t @password NVARCHAR(255),\r\n",
"\t\t\t\t @config_file NVARCHAR(255)'\r\n",
"\t, @label_column = @label_column\r\n",
"\t, @primary_metric = @primary_metric\r\n",
"\t, @iterations = @iterations\r\n",
"\t, @task = @task\r\n",
"\t, @experiment_name = @experiment_name\r\n",
"\t, @iteration_timeout_minutes = @iteration_timeout_minutes\r\n",
"\t, @experiment_timeout_hours = @experiment_timeout_hours\r\n",
"\t, @n_cross_validations = @n_cross_validations\r\n",
"\t, @blacklist_models = @blacklist_models\r\n",
"\t, @whitelist_models = @whitelist_models\r\n",
"\t, @experiment_exit_score = @experiment_exit_score\r\n",
"\t, @sample_weight_column = @sample_weight_column\r\n",
"\t, @is_validate_column = @is_validate_column\r\n",
"\t, @time_column_name = @time_column_name\r\n",
"\t, @tenantid = @tenantid\r\n",
"\t, @appid = @appid\r\n",
"\t, @password = @password\r\n",
"\t, @config_file = @config_file\r\n",
"WITH RESULT SETS ((best_run NVARCHAR(250), experiment_name NVARCHAR(100), fitted_model VARCHAR(MAX), log_file_text NVARCHAR(MAX), workspace NVARCHAR(100)))\r\n",
"END"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"-- This procedure returns a list of metrics for each iteration of a training run.\r\n",
"SET ANSI_NULLS ON\r\n",
"GO\r\n",
"SET QUOTED_IDENTIFIER ON\r\n",
"GO\r\n",
"CREATE OR ALTER PROCEDURE [dbo].[AutoMLGetMetrics]\r\n",
" (\r\n",
"\t@run_id NVARCHAR(250), -- The RunId\r\n",
" @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.\r\n",
" @connection_name NVARCHAR(255)='default' -- The AML connection to use.\r\n",
" ) AS\r\n",
"BEGIN\r\n",
" DECLARE @tenantid NVARCHAR(255)\r\n",
" DECLARE @appid NVARCHAR(255)\r\n",
" DECLARE @password NVARCHAR(255)\r\n",
" DECLARE @config_file NVARCHAR(255)\r\n",
"\r\n",
"\tSELECT @tenantid=TenantId, @appid=AppId, @password=Password, @config_file=ConfigFile\r\n",
"\tFROM aml_connection\r\n",
"\tWHERE ConnectionName = @connection_name;\r\n",
"\r\n",
" EXEC sp_execute_external_script @language = N'Python', @script = N'import pandas as pd\r\n",
"import logging \r\n",
"import azureml.core \r\n",
"import numpy as np\r\n",
"from azureml.core.experiment import Experiment \r\n",
"from azureml.train.automl.run import AutoMLRun\r\n",
"from azureml.core.authentication import ServicePrincipalAuthentication \r\n",
"from azureml.core.workspace import Workspace \r\n",
"\r\n",
"auth = ServicePrincipalAuthentication(tenantid, appid, password) \r\n",
" \r\n",
"ws = Workspace.from_config(path=config_file, auth=auth) \r\n",
" \r\n",
"experiment = Experiment(ws, experiment_name) \r\n",
"\r\n",
"ml_run = AutoMLRun(experiment = experiment, run_id = run_id)\r\n",
"\r\n",
"children = list(ml_run.get_children())\r\n",
"iterationlist = []\r\n",
"metricnamelist = []\r\n",
"metricvaluelist = []\r\n",
"\r\n",
"for run in children:\r\n",
" properties = run.get_properties()\r\n",
" if \"iteration\" in properties:\r\n",
" iteration = int(properties[\"iteration\"])\r\n",
" for metric_name, metric_value in run.get_metrics().items():\r\n",
" if isinstance(metric_value, float):\r\n",
" iterationlist.append(iteration)\r\n",
" metricnamelist.append(metric_name)\r\n",
" metricvaluelist.append(metric_value)\r\n",
" \r\n",
"metrics = pd.DataFrame({\"iteration\": iterationlist, \"metric_name\": metricnamelist, \"metric_value\": metricvaluelist})\r\n",
"'\r\n",
" , @output_data_1_name = N'metrics'\r\n",
"\t, @params = N'@run_id NVARCHAR(250), \r\n",
"\t\t\t\t @experiment_name NVARCHAR(32),\r\n",
" \t\t\t\t @tenantid NVARCHAR(255),\r\n",
"\t\t\t\t @appid NVARCHAR(255),\r\n",
"\t\t\t\t @password NVARCHAR(255),\r\n",
"\t\t\t\t @config_file NVARCHAR(255)'\r\n",
" , @run_id = @run_id\r\n",
"\t, @experiment_name = @experiment_name\r\n",
"\t, @tenantid = @tenantid\r\n",
"\t, @appid = @appid\r\n",
"\t, @password = @password\r\n",
"\t, @config_file = @config_file\r\n",
"WITH RESULT SETS ((iteration INT, metric_name NVARCHAR(100), metric_value FLOAT))\r\n",
"END"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"-- This procedure predicts values based on a model returned by AutoMLTrain and a dataset.\r\n",
"-- It returns the dataset with a new column added, which is the predicted value.\r\n",
"SET ANSI_NULLS ON\r\n",
"GO\r\n",
"SET QUOTED_IDENTIFIER ON\r\n",
"GO\r\n",
"CREATE OR ALTER PROCEDURE [dbo].[AutoMLPredict]\r\n",
" (\r\n",
" @input_query NVARCHAR(MAX), -- A SQL query returning data to predict on.\r\n",
" @model NVARCHAR(MAX), -- A model returned from AutoMLTrain.\r\n",
" @label_column NVARCHAR(255)='' -- Optional name of the column from input_query, which should be ignored when predicting\r\n",
" ) AS \r\n",
"BEGIN \r\n",
" \r\n",
" EXEC sp_execute_external_script @language = N'Python', @script = N'import pandas as pd \r\n",
"import azureml.core \r\n",
"import numpy as np \r\n",
"from azureml.train.automl import AutoMLConfig \r\n",
"import pickle \r\n",
"import codecs \r\n",
" \r\n",
"model_obj = pickle.loads(codecs.decode(model.encode(), \"base64\")) \r\n",
" \r\n",
"test_data = input_data.copy() \r\n",
"\r\n",
"if label_column != \"\" and label_column is not None:\r\n",
" y_test = test_data.pop(label_column).values \r\n",
"X_test = test_data \r\n",
" \r\n",
"predicted = model_obj.predict(X_test) \r\n",
" \r\n",
"combined_output = input_data.assign(predicted=predicted)\r\n",
" \r\n",
"' \r\n",
" , @input_data_1 = @input_query \r\n",
" , @input_data_1_name = N'input_data' \r\n",
" , @output_data_1_name = N'combined_output' \r\n",
" , @params = N'@model NVARCHAR(MAX), @label_column NVARCHAR(255)' \r\n",
" , @model = @model \r\n",
"\t, @label_column = @label_column\r\n",
"END"
]
}
],
"metadata": {
"authors": [
{
"name": "jeffshep"
}
],
"category": "tutorial",
"compute": [
"None"
],
"datasets": [
"None"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"Azure ML AutoML"
],
"friendly_name": "Setup automated ML SQL integration",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.6",
"language": "sql",
"name": "python36"
},
"language_info": {
"name": "sql",
"version": ""
},
"tags": [
""
],
"task": "None"
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -512,9 +512,11 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Retrieve the Best Model after the above run is complete \n", "## Deploy\n",
"\n", "\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." "### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
] ]
}, },
{ {
@@ -523,17 +525,15 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"best_run, fitted_model = local_run.get_output()\n", "best_run, fitted_model = local_run.get_output()"
"print(best_run)\n",
"print(fitted_model)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Best Model Based on Any Other Metric after the above run is complete based on the child run\n", "### Download the conda environment file\n",
"Show the run and the model that has the smallest `log_loss` value:" "From the *best_run* download the conda environment file that was used to train the AutoML model."
] ]
}, },
{ {
@@ -542,10 +542,34 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"lookup_metric = \"log_loss\"\n", "from azureml.automl.core.shared import constants\n",
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n", "conda_env_file_name = 'conda_env.yml'\n",
"print(best_run)\n", "best_run.download_file(name=\"outputs/conda_env_v_1_0_0.yml\", output_file_path=conda_env_file_name)\n",
"print(fitted_model)" "with open(conda_env_file_name, \"r\") as conda_file:\n",
" conda_file_contents = conda_file.read()\n",
" print(conda_file_contents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download the model scoring file\n",
"From the *best_run* download the scoring file to get the predictions from the AutoML model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.automl.core.shared import constants\n",
"script_file_name = 'scoring_file.py'\n",
"best_run.download_file(name=\"outputs/scoring_file_v_1_0_0.py\", output_file_path=script_file_name)\n",
"with open(script_file_name, \"r\") as scoring_file:\n",
" scoring_file_contents = scoring_file.read()\n",
" print(scoring_file_contents)"
]
},
{
@@ -572,8 +596,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Create Scoring Script\n",
- "Replace model_id with name of model from output of above register cell"
+ "### Deploy the model as a Web Service on Azure Container Instance\n",
+ "\n",
+ "Create the configuration needed for deploying the model as a web service service."
]
},
{
@@ -582,123 +607,17 @@
"metadata": {},
"outputs": [],
"source": [
- "%%writefile score.py\n",
- "import pickle\n",
- "import json\n",
- "import numpy as np\n",
- "import azureml.train.automl\n",
- "from sklearn.externals import joblib\n",
- "from azureml.core.model import Model\n",
- "import pandas as pd\n",
- "\n",
- "def init():\n",
- " global model\n",
- " model_path = Model.get_model_path(model_name = '<<model_id>>') # this name is model.id of model that we want to deploy\n",
- " # deserialize the model file back into a sklearn model\n",
- " model = joblib.load(model_path)\n",
- "\n",
- "def run(raw_data):\n",
- " try:\n",
- " data = (pd.DataFrame(np.array(json.loads(raw_data)['data']), columns=[str(i) for i in range(0,64)]))\n",
- " result = model.predict(data)\n",
- " except Exception as e:\n",
- " result = str(e)\n",
- " return json.dumps({\"error\": result})\n",
- " return json.dumps({\"result\":result.tolist()})"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "#Replace <<model_id>>\n",
- "content = \"\"\n",
- "with open(\"score.py\", \"r\") as fo:\n",
- " content = fo.read()\n",
- "\n",
- "new_content = content.replace(\"<<model_id>>\", local_run.model_id)\n",
- "with open(\"score.py\", \"w\") as fw:\n",
- " fw.write(new_content)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### Create a YAML File for the Environment"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.core.conda_dependencies import CondaDependencies\n",
- "\n",
- "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults', 'azureml-sdk[automl]'])\n",
- "\n",
- "conda_env_file_name = 'myenv.yml'\n",
- "myenv.save_to_file('.', conda_env_file_name)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Deploy the model as a Web Service on Azure Container Instance\n",
- "Replace servicename with any meaningful name of service"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# this will take 10-15 minutes to finish\n",
- "\n",
- "from azureml.core.webservice import AciWebservice, Webservice\n",
- "from azureml.exceptions import WebserviceException\n",
- "from azureml.core.model import InferenceConfig\n",
- "from azureml.core.model import Model\n",
- "from azureml.core.environment import Environment\n",
- "from azureml.core.conda_dependencies import CondaDependencies\n",
- "import uuid\n",
- "\n",
- "\n",
- "myaci_config = AciWebservice.deploy_configuration(\n",
- " cpu_cores = 2, \n",
- " memory_gb = 2, \n",
- " tags = {'name':'Databricks Azure ML ACI'}, \n",
- " description = 'This is for ADB and AutoML example.')\n",
- "\n",
- "myenv = Environment.get(ws, name='AzureML-PySpark-MmlSpark-0.15')\n",
- "# we need to add extra packages to procured environment\n",
- "# in order to deploy amended environment we need to rename it\n",
- "myenv.name = 'myenv'\n",
- "model_dependencies = CondaDependencies('myenv.yml')\n",
- "for pip_dep in model_dependencies.pip_packages:\n",
- " myenv.python.conda_dependencies.add_pip_package(pip_dep)\n",
- "for conda_dep in model_dependencies.conda_packages:\n",
- " myenv.python.conda_dependencies.add_conda_package(conda_dep)\n",
- "inference_config = InferenceConfig(entry_script='score_sparkml.py', environment=myenv)\n",
- "\n",
- "guid = str(uuid.uuid4()).split(\"-\")[0]\n",
- "service_name = \"myservice-{}\".format(guid)\n",
- "\n",
- "# Remove any existing service under the same name.\n",
- "try:\n",
- " Webservice(ws, service_name).delete()\n",
- "except WebserviceException:\n",
- " pass\n",
- "\n",
- "print(\"Creating service with name: {}\".format(service_name))\n",
- "\n",
- "myservice = Model.deploy(ws, service_name, [model], inference_config, myaci_config)\n",
- "myservice.wait_for_deployment(show_output=True)"
+ "from azureml.core.model import InferenceConfig\n",
+ "from azureml.core.webservice import AciWebservice\n",
+ "from azureml.core.environment import Environment\n",
+ "\n",
+ "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=conda_env_file_name)\n",
+ "inference_config = InferenceConfig(entry_script=script_file_name, environment=myenv)\n",
+ "\n",
+ "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
+ " memory_gb = 1, \n",
+ " tags = {'area': \"digits\", 'type': \"automl_classification\"}, \n",
+ " description = 'sample service for Automl Classification')"
]
},
{
@@ -707,8 +626,14 @@
"metadata": {},
"outputs": [],
"source": [
- "#for using the Web HTTP API \n",
- "print(myservice.scoring_uri)"
+ "from azureml.core.webservice import Webservice\n",
+ "from azureml.core.model import Model\n",
+ "\n",
+ "aci_service_name = 'automl-databricks-local'\n",
+ "print(aci_service_name)\n",
+ "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
+ "aci_service.wait_for_deployment(True)\n",
+ "print(aci_service.state)"
]
},
{
@@ -752,7 +677,7 @@
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" test_sample = json.dumps({'data':X_test[index:index + 1].values.tolist()})\n",
- " predicted = myservice.run(input_data = test_sample)\n",
+ " predicted = aci_service.run(input_data = test_sample)\n",
" label = y_test.values[index]\n",
" predictedDict = json.loads(predicted)\n",
" title = \"Label value = %d Predicted value = %s \" % ( label,predictedDict['result'][0]) \n",
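Besides calling `aci_service.run(...)` as above, the deployed service can also be exercised over plain HTTP. The sketch below is illustrative only: it reuses the `test_sample` payload built in the loop above and assumes the usual JSON content type for these scoring endpoints.

```
import requests

# POST the same JSON payload to the service's HTTP endpoint.
# The Content-Type header is an assumption (JSON is the usual default for these services).
headers = {'Content-Type': 'application/json'}
response = requests.post(aci_service.scoring_uri, data=test_sample, headers=headers)
print(response.status_code)
print(response.text)
```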

View File

@@ -1,497 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/accelerated-models/accelerated-models-object-detection.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure ML Hardware Accelerated Object Detection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial will show you how to deploy an object detection service based on the SSD-VGG model in just a few minutes using the Azure Machine Learning Accelerated AI service.\n",
"\n",
"We will use the SSD-VGG model accelerated on an FPGA. Our Accelerated Models Service handles translating deep neural networks (DNN) into an FPGA program.\n",
"\n",
"The steps in this notebook are: \n",
"1. [Setup Environment](#set-up-environment)\n",
"* [Construct Model](#construct-model)\n",
" * Image Preprocessing\n",
" * Featurizer\n",
" * Save Model\n",
" * Save input and output tensor names\n",
"* [Create Image](#create-image)\n",
"* [Deploy Image](#deploy-image)\n",
"* [Test the Service](#test-service)\n",
" * Create Client\n",
" * Serve the model\n",
"* [Cleanup](#cleanup)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"set-up-environment\"></a>\n",
"## 1. Set up Environment\n",
"### 1.a. Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.b. Retrieve Workspace\n",
"If you haven't created a Workspace, please follow [this notebook](\"../../../configuration.ipynb\") to do so. If you have, run the codeblock below to retrieve it. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"construct-model\"></a>\n",
"## 2. Construct model\n",
"### 2.a. Image preprocessing\n",
"We'd like our service to accept JPEG images as input. However the input to SSD-VGG is a float tensor of shape \\[1, 300, 300, 3\\]. The first dimension is batch, then height, width, and channels (i.e. NHWC). To bridge this gap, we need code that decodes JPEG images and resizes them appropriately for input to SSD-VGG. The Accelerated AI service can execute TensorFlow graphs as part of the service and we'll use that ability to do the image preprocessing. This code defines a TensorFlow graph that preprocesses an array of JPEG images (as TensorFlow strings) and produces a tensor that is ready to be featurized by SSD-VGG.\n",
"\n",
"**Note:** Expect to see TF deprecation warnings until we port our SDK over to use Tensorflow 2.0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Input images as a two-dimensional tensor containing an arbitrary number of images represented a strings\n",
"import azureml.accel.models.utils as utils\n",
"tf.reset_default_graph()\n",
"\n",
"in_images = tf.placeholder(tf.string)\n",
"image_tensors = utils.preprocess_array(in_images, output_width=300, output_height=300, preserve_aspect_ratio=False)\n",
"print(image_tensors.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.b. Featurizer\n",
"The SSD-VGG model is different from our other models in that it generates 12 tensor outputs. These corresponds to x,y displacements of the anchor boxes and the detection confidence (for 21 classes). Because these outputs are not convenient to work with, we will later use a pre-defined post-processing utility to transform the outputs into a simplified list of bounding boxes with their respective class and confidence.\n",
"\n",
"For more information about the output tensors, take this example: the output tensor 'ssd_300_vgg/block4_box/Reshape_1:0' has a shape of [None, 37, 37, 4, 21]. This gives the pre-softmax confidence for 4 anchor boxes situated at each site of a 37 x 37 grid imposed on the image, one confidence score for each of the 21 classes. The first dimension is the batch dimension. Likewise, 'ssd_300_vgg/block4_box/Reshape:0' has shape [None, 37, 37, 4, 4] and encodes the (cx, cy) center shift and rescaling (sw, sh) relative to each anchor box. Refer to the [SSD-VGG paper](https://arxiv.org/abs/1512.02325) to understand how these are computed. The other 10 tensors are defined similarly."
]
},
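Later in this notebook, `ssdvgg_utils.postprocess` turns these raw tensors into boxes for us, but a small sketch of how a single anchor's offsets decode into a box can make the shapes above easier to read. The anchor geometry and the prior-scaling constants below are illustrative assumptions (0.1/0.1/0.2/0.2 are common SSD defaults), not values taken from this model.

```
import numpy as np

def decode_anchor(loc, anchor_cx, anchor_cy, anchor_w, anchor_h,
                  prior_scaling=(0.1, 0.1, 0.2, 0.2)):
    # loc holds the 4 raw values (dx, dy, dw, dh) predicted for one anchor box,
    # e.g. one slice of ssd_300_vgg/block4_box/Reshape:0.
    cx = anchor_cx + loc[0] * prior_scaling[0] * anchor_w
    cy = anchor_cy + loc[1] * prior_scaling[1] * anchor_h
    w = anchor_w * np.exp(loc[2] * prior_scaling[2])
    h = anchor_h * np.exp(loc[3] * prior_scaling[3])
    # Return fractional (y1, x1, y2, x2), the same box format the
    # post-processing step produces later in this notebook.
    return np.array([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2])

# Example: a small predicted shift applied to a 0.2 x 0.2 anchor centred in the image.
print(decode_anchor([0.5, -0.5, 0.1, 0.1], 0.5, 0.5, 0.2, 0.2))
```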
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.accel.models import SsdVgg\n",
"\n",
"saved_model_dir = os.path.join(os.path.expanduser('~'), 'models')\n",
"model_graph = SsdVgg(saved_model_dir, is_frozen = True)\n",
"\n",
"print('SSD-VGG Input Tensors:')\n",
"for idx, input_name in enumerate(model_graph.input_tensor_list):\n",
" print('{}, {}'.format(input_name, model_graph.get_input_dims(idx)))\n",
" \n",
"print('SSD-VGG Output Tensors:')\n",
"for idx, output_name in enumerate(model_graph.output_tensor_list):\n",
" print('{}, {}'.format(output_name, model_graph.get_output_dims(idx)))\n",
"\n",
"ssd_outputs = model_graph.import_graph_def(image_tensors, is_training=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.c. Save Model\n",
"Now that we loaded both parts of the tensorflow graph (preprocessor and SSD-VGG featurizer), we can save the graph and associated variables to a directory which we can register as an Azure ML Model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_name = \"ssdvgg\"\n",
"model_save_path = os.path.join(saved_model_dir, model_name, \"saved_model\")\n",
"print(\"Saving model in {}\".format(model_save_path))\n",
"\n",
"output_map = {}\n",
"for i, output in enumerate(ssd_outputs):\n",
" output_map['out_{}'.format(i)] = output\n",
"\n",
"with tf.Session() as sess:\n",
" model_graph.restore_weights(sess)\n",
" tf.saved_model.simple_save(sess, \n",
" model_save_path, \n",
" inputs={'images': in_images}, \n",
" outputs=output_map)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.d. Important! Save names of input and output tensors\n",
"\n",
"These input and output tensors that were created during the preprocessing and classifier steps are also going to be used when **converting the model** to an Accelerated Model that can run on FPGA's and for **making an inferencing request**. It is very important to save this information!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"input_tensors = in_images.name\n",
"# We will use the list of output tensors during inferencing\n",
"output_tensors = [output.name for output in ssd_outputs]\n",
"# However, for multiple output tensors, our AccelOnnxConverter will \n",
"# accept comma-delimited strings (lists will cause error)\n",
"output_tensors_str = \",\".join(output_tensors)\n",
"\n",
"print(input_tensors)\n",
"print(output_tensors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"create-image\"></a>\n",
"## 3. Create AccelContainerImage\n",
"Below we will execute all the same steps as in the [Quickstart](./accelerated-models-quickstart.ipynb#create-image) to package the model we have saved locally into an accelerated Docker image saved in our workspace. To complete all the steps, it may take a few minutes. For more details on each step, check out the [Quickstart section on model registration](./accelerated-models-quickstart.ipynb#register-model)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core.model import Model\n",
"from azureml.core.image import Image\n",
"from azureml.accel import AccelOnnxConverter\n",
"from azureml.accel import AccelContainerImage\n",
"\n",
"# Retrieve workspace\n",
"ws = Workspace.from_config()\n",
"print(\"Successfully retrieved workspace:\", ws.name, ws.resource_group, ws.location, ws.subscription_id, '\\n')\n",
"\n",
"# Register model\n",
"registered_model = Model.register(workspace = ws,\n",
" model_path = model_save_path,\n",
" model_name = model_name)\n",
"print(\"Successfully registered: \", registered_model.name, registered_model.description, registered_model.version, '\\n', sep = '\\t')\n",
"\n",
"# Convert model\n",
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors_str)\n",
"if convert_request.wait_for_completion(show_output = False):\n",
" # If the above call succeeded, get the converted model\n",
" converted_model = convert_request.result\n",
" print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
" converted_model.id, converted_model.created_time, '\\n')\n",
"else:\n",
" print(\"Model conversion failed. Showing output.\")\n",
" convert_request.wait_for_completion(show_output = True)\n",
"\n",
"# Package into AccelContainerImage\n",
"image_config = AccelContainerImage.image_configuration()\n",
"# Image name must be lowercase\n",
"image_name = \"{}-image\".format(model_name)\n",
"image = Image.create(name = image_name,\n",
" models = [converted_model],\n",
" image_config = image_config, \n",
" workspace = ws)\n",
"image.wait_for_creation()\n",
"print(\"Created AccelContainerImage: {} {} {}\\n\".format(image.name, image.creation_state, image.image_location))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"deploy-image\"></a>\n",
"## 4. Deploy image\n",
"Once you have an Azure ML Accelerated Image in your Workspace, you can deploy it to two destinations, to a Databox Edge machine or to an AKS cluster. \n",
"\n",
"### 4.a. Deploy to Databox Edge Machine using IoT Hub\n",
"See the sample [here](https://github.com/Azure-Samples/aml-real-time-ai/) for using the Azure IoT CLI extension for deploying your Docker image to your Databox Edge Machine.\n",
"\n",
"### 4.b. Deploy to AKS Cluster\n",
"Same as in the [Quickstart section on image deployment](./accelerated-models-quickstart.ipynb#deploy-image), we are going to create an AKS cluster with FPGA-enabled machines, then deploy our service to it.\n",
"#### Create AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Uses the specific FPGA enabled VM (sku: Standard_PB6s)\n",
"# Standard_PB6s are available in: eastus, westus2, westeurope, southeastasia\n",
"prov_config = AksCompute.provisioning_configuration(vm_size = \"Standard_PB6s\",\n",
" agent_count = 1, \n",
" location = \"eastus\")\n",
"\n",
"aks_name = 'aks-pb6-obj'\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Provisioning an AKS cluster might take awhile (15 or so minutes), and we want to wait until it's successfully provisioned before we can deploy a service to it. If you interrupt this cell, provisioning of the cluster will continue. You can re-run it or check the status in your Workspace under Compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy AccelContainerImage to AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"\n",
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
"# Authentication is enabled by default, but for testing we specify False\n",
"aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
" num_replicas=1,\n",
" auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service-3'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-service\"></a>\n",
"## 5. Test the service\n",
"<a id=\"create-client\"></a>\n",
"### 5.a. Create Client\n",
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We will create a PredictionClient from the Webservice object that can call into the docker image to get predictions. If you do not have the Webservice object, you can also create [PredictionClient](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.predictionclient?view=azure-ml-py) directly.\n",
"\n",
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).\n",
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
]
},
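If you only know where the service is listening rather than having the `Webservice` object, the client can be constructed directly, as noted above. This is a hypothetical sketch: the host and port are placeholders, and it assumes the `PredictionClient` constructor accepts the address, port, SSL flag and optional access token described in the API reference linked above.

```
from azureml.accel import PredictionClient

# Placeholders -- substitute the IP/DNS name and port your service is listening on.
service_host = "<scoring-host>"
service_port = 80

# access_token is only needed when the service was deployed with auth_enabled=True.
client = PredictionClient(address=service_host,
                          port=service_port,
                          use_ssl=False,
                          access_token=None)
```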
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using the grpc client in AzureML Accelerated Models SDK\n",
"from azureml.accel import client_from_service\n",
"\n",
"# Initialize AzureML Accelerated Models client\n",
"client = client_from_service(aks_service)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can adapt the client [code](https://github.com/Azure/aml-real-time-ai/blob/master/pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](https://github.com/Azure/aml-real-time-ai/blob/master/sample-clients/csharp).\n",
"\n",
"The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"serve-model\"></a>\n",
"### 5.b. Serve the model\n",
"The SSD-VGG model returns the confidence and bounding boxes for all possible anchor boxes. As mentioned earlier, we will use a post-processing routine to transform this into a list of bounding boxes (y1, x1, y2, x2) where x, y are fractional coordinates measured from left and top respectively. A respective list of classes and scores is also returned to tag each bounding box. Below we make use of this information to draw the bounding boxes on top the original image. Note that in the post-processing routine we select a confidence threshold of 0.5."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cv2\n",
"from matplotlib import pyplot as plt\n",
"\n",
"colors_tableau = [(255, 255, 255), (31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),\n",
" (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),\n",
" (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),\n",
" (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),\n",
" (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]\n",
"\n",
"\n",
"def draw_boxes_on_img(img, classes, scores, bboxes, thickness=2):\n",
" shape = img.shape\n",
" for i in range(bboxes.shape[0]):\n",
" bbox = bboxes[i]\n",
" color = colors_tableau[classes[i]]\n",
" # Draw bounding box...\n",
" p1 = (int(bbox[0] * shape[0]), int(bbox[1] * shape[1]))\n",
" p2 = (int(bbox[2] * shape[0]), int(bbox[3] * shape[1]))\n",
" cv2.rectangle(img, p1[::-1], p2[::-1], color, thickness)\n",
" # Draw text...\n",
" s = '%s/%.3f' % (classes[i], scores[i])\n",
" p1 = (p1[0]-5, p1[1])\n",
" cv2.putText(img, s, p1[::-1], cv2.FONT_HERSHEY_DUPLEX, 0.4, color, 1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.accel._external.ssdvgg_utils as ssdvgg_utils\n",
"\n",
"result = client.score_file(path=\"meeting.jpg\", input_name=input_tensors, outputs=output_tensors)\n",
"classes, scores, bboxes = ssdvgg_utils.postprocess(result, select_threshold=0.5)\n",
"\n",
"img = cv2.imread('meeting.jpg', 1)\n",
"img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n",
"draw_boxes_on_img(img, classes, scores, bboxes)\n",
"plt.imshow(img)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"cleanup\"></a>\n",
"## 6. Cleanup\n",
"It's important to clean up your resources, so that you won't incur unnecessary costs. In the [next notebook](./accelerated-models-training.ipynb) you will learn how to train a classfier on a new dataset using transfer learning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.delete()\n",
"aks_target.delete()\n",
"image.delete()\n",
"registered_model.delete()\n",
"converted_model.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "coverste"
},
{
"name": "paledger"
},
{
"name": "sukha"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,7 +0,0 @@
name: accelerated-models-object-detection
dependencies:
- pip:
- azureml-sdk
- azureml-accel-models[cpu]
- opencv-python
- matplotlib

View File

@@ -1,555 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/accelerated-models/accelerated-models-quickstart.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure ML Hardware Accelerated Models Quickstart"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial will show you how to deploy an image recognition service based on the ResNet 50 classifier using the Azure Machine Learning Accelerated Models service. Get more information about our service from our [documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-accelerate-with-fpgas), [API reference](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel?view=azure-ml-py), or [forum](https://aka.ms/aml-forum).\n",
"\n",
"We will use an accelerated ResNet50 featurizer running on an FPGA. Our Accelerated Models Service handles translating deep neural networks (DNN) into an FPGA program.\n",
"\n",
"For more information about using other models besides Resnet50, see the [README](./README.md).\n",
"\n",
"The steps covered in this notebook are: \n",
"1. [Set up environment](#set-up-environment)\n",
"* [Construct model](#construct-model)\n",
" * Image Preprocessing\n",
" * Featurizer (Resnet50)\n",
" * Classifier\n",
" * Save Model\n",
"* [Register Model](#register-model)\n",
"* [Convert into Accelerated Model](#convert-model)\n",
"* [Create Image](#create-image)\n",
"* [Deploy](#deploy-image)\n",
"* [Test service](#test-service)\n",
"* [Clean-up](#clean-up)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"set-up-environment\"></a>\n",
"## 1. Set up environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve Workspace\n",
"If you haven't created a Workspace, please follow [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) to do so. If you have, run the codeblock below to retrieve it. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"construct-model\"></a>\n",
"## 2. Construct model\n",
"\n",
"There are three parts to the model we are deploying: pre-processing, featurizer with ResNet50, and classifier with ImageNet dataset. Then we will save this complete Tensorflow model graph locally before registering it to your Azure ML Workspace.\n",
"\n",
"### 2.a. Image preprocessing\n",
"We'd like our service to accept JPEG images as input. However the input to ResNet50 is a tensor. So we need code that decodes JPEG images and does the preprocessing required by ResNet50. The Accelerated AI service can execute TensorFlow graphs as part of the service and we'll use that ability to do the image preprocessing. This code defines a TensorFlow graph that preprocesses an array of JPEG images (as strings) and produces a tensor that is ready to be featurized by ResNet50.\n",
"\n",
"**Note:** Expect to see TF deprecation warnings until we port our SDK over to use Tensorflow 2.0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Input images as a two-dimensional tensor containing an arbitrary number of images represented a strings\n",
"import azureml.accel.models.utils as utils\n",
"tf.reset_default_graph()\n",
"\n",
"in_images = tf.placeholder(tf.string)\n",
"image_tensors = utils.preprocess_array(in_images)\n",
"print(image_tensors.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.b. Featurizer\n",
"We use ResNet50 as a featurizer. In this step we initialize the model. This downloads a TensorFlow checkpoint of the quantized ResNet50."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.accel.models import QuantizedResnet50\n",
"save_path = os.path.expanduser('~/models')\n",
"model_graph = QuantizedResnet50(save_path, is_frozen = True)\n",
"feature_tensor = model_graph.import_graph_def(image_tensors)\n",
"print(model_graph.version)\n",
"print(feature_tensor.name)\n",
"print(feature_tensor.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.c. Classifier\n",
"The model we downloaded includes a classifier which takes the output of the ResNet50 and identifies an image. This classifier is trained on the ImageNet dataset. We are going to use this classifier for our service. The next [notebook](./accelerated-models-training.ipynb) shows how to train a classifier for a different data set. The input to the classifier is a tensor matching the output of our ResNet50 featurizer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"classifier_output = model_graph.get_default_classifier(feature_tensor)\n",
"print(classifier_output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.d. Save Model\n",
"Now that we loaded all three parts of the tensorflow graph (preprocessor, resnet50 featurizer, and the classifier), we can save the graph and associated variables to a directory which we can register as an Azure ML Model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# model_name must be lowercase\n",
"model_name = \"resnet50\"\n",
"model_save_path = os.path.join(save_path, model_name)\n",
"print(\"Saving model in {}\".format(model_save_path))\n",
"\n",
"with tf.Session() as sess:\n",
" model_graph.restore_weights(sess)\n",
" tf.saved_model.simple_save(sess, model_save_path,\n",
" inputs={'images': in_images},\n",
" outputs={'output_alias': classifier_output})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.e. Important! Save names of input and output tensors\n",
"\n",
"These input and output tensors that were created during the preprocessing and classifier steps are also going to be used when **converting the model** to an Accelerated Model that can run on FPGA's and for **making an inferencing request**. It is very important to save this information! You can see our defaults for all the models in the [README](./README.md).\n",
"\n",
"By default for Resnet50, these are the values you should see when running the cell below: \n",
"* input_tensors = \"Placeholder:0\"\n",
"* output_tensors = \"classifier/resnet_v1_50/predictions/Softmax:0\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"input_tensors = in_images.name\n",
"output_tensors = classifier_output.name\n",
"\n",
"print(input_tensors)\n",
"print(output_tensors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"register-model\"></a>\n",
"## 3. Register Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can add tags and descriptions to your models. Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric."
]
},
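The registration cell below keeps things minimal and does not set any tags. If you do want to attach tags and a description, the call looks roughly like this sketch; the tag keys and values here are made-up examples.

```
from azureml.core.model import Model

registered_model = Model.register(workspace=ws,
                                   model_path=model_save_path,
                                   model_name=model_name,
                                   # illustrative tags -- remember they must be alphanumeric
                                   tags={'framework': 'tensorflow', 'featurizer': 'resnet50'},
                                   description='Quantized ResNet50 with the default ImageNet classifier')
```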
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"registered_model = Model.register(workspace = ws,\n",
" model_path = model_save_path,\n",
" model_name = model_name)\n",
"\n",
"print(\"Successfully registered: \", registered_model.name, registered_model.description, registered_model.version, sep = '\\t')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"convert-model\"></a>\n",
"## 4. Convert Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For conversion you need to provide names of input and output tensors. This information can be found from the model_graph you saved in step 2.e. above.\n",
"\n",
"**Note**: Conversion may take a while and on average for FPGA model it is about 1-3 minutes and it depends on model type."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"from azureml.accel import AccelOnnxConverter\n",
"\n",
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors)\n",
"\n",
"if convert_request.wait_for_completion(show_output = False):\n",
" # If the above call succeeded, get the converted model\n",
" converted_model = convert_request.result\n",
" print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
" converted_model.id, converted_model.created_time, '\\n')\n",
"else:\n",
" print(\"Model conversion failed. Showing output.\")\n",
" convert_request.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"create-image\"></a>\n",
"## 5. Package the model into an Image"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can add tags and descriptions to image. Also, for FPGA model an image can only contain **single** model.\n",
"\n",
"**Note**: The following command can take few minutes. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import Image\n",
"from azureml.accel import AccelContainerImage\n",
"\n",
"image_config = AccelContainerImage.image_configuration()\n",
"# Image name must be lowercase\n",
"image_name = \"{}-image\".format(model_name)\n",
"\n",
"image = Image.create(name = image_name,\n",
" models = [converted_model],\n",
" image_config = image_config, \n",
" workspace = ws)\n",
"image.wait_for_creation(show_output = False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"deploy-image\"></a>\n",
"## 6. Deploy\n",
"Once you have an Azure ML Accelerated Image in your Workspace, you can deploy it to two destinations, to a Databox Edge machine or to an AKS cluster. \n",
"\n",
"### 6.a. Databox Edge Machine using IoT Hub\n",
"See the sample [here](https://github.com/Azure-Samples/aml-real-time-ai/) for using the Azure IoT CLI extension for deploying your Docker image to your Databox Edge Machine.\n",
"\n",
"### 6.b. Azure Kubernetes Service (AKS) using Azure ML Service\n",
"We are going to create an AKS cluster with FPGA-enabled machines, then deploy our service to it. For more information, see [AKS official docs](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#aks).\n",
"\n",
"#### Create AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"sample-akscompute-provision"
]
},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Uses the specific FPGA enabled VM (sku: Standard_PB6s)\n",
"# Standard_PB6s are available in: eastus, westus2, westeurope, southeastasia\n",
"prov_config = AksCompute.provisioning_configuration(vm_size = \"Standard_PB6s\",\n",
" agent_count = 1, \n",
" location = \"eastus\")\n",
"\n",
"aks_name = 'my-aks-pb6'\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Provisioning an AKS cluster might take awhile (15 or so minutes), and we want to wait until it's successfully provisioned before we can deploy a service to it. If you interrupt this cell, provisioning of the cluster will continue. You can also check the status in your Workspace under Compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy AccelContainerImage to AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"\n",
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
"# Authentication is enabled by default, but for testing we specify False\n",
"aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
" num_replicas=1,\n",
" auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service-1'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-service\"></a>\n",
"## 7. Test the service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7.a. Create Client\n",
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We will create a PredictionClient from the Webservice object that can call into the docker image to get predictions. If you do not have the Webservice object, you can also create [PredictionClient](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.predictionclient?view=azure-ml-py) directly.\n",
"\n",
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice, see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).\n",
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using the grpc client in AzureML Accelerated Models SDK\n",
"from azureml.accel import client_from_service\n",
"\n",
"# Initialize AzureML Accelerated Models client\n",
"client = client_from_service(aks_service)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can adapt the client [code](https://github.com/Azure/aml-real-time-ai/blob/master/pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](https://github.com/Azure/aml-real-time-ai/blob/master/sample-clients/csharp).\n",
"\n",
"The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7.b. Serve the model\n",
"To understand the results we need a mapping to the human readable imagenet classes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"classes_entries = requests.get(\"https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt\").text.splitlines()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Score image with input and output tensor names\n",
"results = client.score_file(path=\"./snowleopardgaze.jpg\", \n",
" input_name=input_tensors, \n",
" outputs=output_tensors)\n",
"\n",
"# map results [class_id] => [confidence]\n",
"results = enumerate(results)\n",
"# sort results by confidence\n",
"sorted_results = sorted(results, key=lambda x: x[1], reverse=True)\n",
"# print top 5 results\n",
"for top in sorted_results[:5]:\n",
" print(classes_entries[top[0]], 'confidence:', top[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"clean-up\"></a>\n",
"## 8. Clean-up\n",
"Run the cell below to delete your webservice, image, and model (must be done in that order). In the [next notebook](./accelerated-models-training.ipynb) you will learn how to train a classfier on a new dataset using transfer learning and finetune the weights."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.delete()\n",
"aks_target.delete()\n",
"image.delete()\n",
"registered_model.delete()\n",
"converted_model.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "coverste"
},
{
"name": "paledger"
},
{
"name": "aibhalla"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,5 +0,0 @@
name: accelerated-models-quickstart
dependencies:
- pip:
- azureml-sdk
- azureml-accel-models[cpu]

View File

@@ -1,870 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/accelerated-models/accelerated-models-training.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training with the Azure Machine Learning Accelerated Models Service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook will introduce how to apply common machine learning techniques, like transfer learning, custom weights, and unquantized vs. quantized models, when working with our Azure Machine Learning Accelerated Models Service (Azure ML Accel Models).\n",
"\n",
"We will use Tensorflow for the preprocessing steps, ResNet50 for the featurizer, and the Keras API (built on Tensorflow backend) to build the classifier layers instead of the default ImageNet classifier used in Quickstart. Then we will train the model, evaluate it, and deploy it to run on an FPGA.\n",
"\n",
"#### Transfer Learning and Custom weights\n",
"We will walk you through two ways to build and train a ResNet50 model on the Kaggle Cats and Dogs dataset: transfer learning only and then transfer learning with custom weights.\n",
"\n",
"In using transfer learning, our goal is to re-purpose the ResNet50 model already trained on the [ImageNet image dataset](http://www.image-net.org/) as a basis for our training of the Kaggle Cats and Dogs dataset. The ResNet50 featurizer will be imported as frozen, so only the Keras classifier will be trained.\n",
"\n",
"With the addition of custom weights, we will build the model so that the ResNet50 featurizer weights as not frozen. This will let us retrain starting with custom weights trained with ImageNet on ResNet50 and then use the Kaggle Cats and Dogs dataset to retrain and fine-tune the quantized version of the model.\n",
"\n",
"#### Unquantized vs. Quantized models\n",
"The unquantized version of our models (ie. Resnet50, Resnet152, Densenet121, Vgg16, SsdVgg) uses native float precision (32-bit floats), which will be faster at training. We will use this for our first run through, then fine-tune the weights with the quantized version. The quantized version of our models (i.e. QuantizedResnet50, QuantizedResnet152, QuantizedDensenet121, QuantizedVgg16, QuantizedSsdVgg) will have the same node names as the unquantized version, but use quantized operations and will match the performance of the model when running on an FPGA.\n",
"\n",
"#### Contents\n",
"1. [Setup Environment](#setup)\n",
"* [Prepare Data](#prepare-data)\n",
"* [Construct Model](#construct-model)\n",
" * Preprocessor\n",
" * Classifier\n",
" * Model construction\n",
"* [Train Model](#train-model)\n",
"* [Test Model](#test-model)\n",
"* [Execution](#execution)\n",
" * [Transfer Learning](#transfer-learning)\n",
" * [Transfer Learning with Custom Weights](#custom-weights)\n",
"* [Create Image](#create-image)\n",
"* [Deploy Image](#deploy-image)\n",
"* [Test the service](#test-service)\n",
"* [Clean-up](#cleanup)\n",
"* [Appendix](#appendix)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"setup\"></a>\n",
"## 1. Setup Environment\n",
"#### 1.a. Please set up your environment as described in the [Quickstart](./accelerated-models-quickstart.ipynb), meaning:\n",
"* Make sure your Workspace config.json exists and has the correct info\n",
"* Install Tensorflow\n",
"\n",
"#### 1.b. Download dataset into ~/catsanddogs \n",
"The dataset we will be using for training can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=54765). Download the zip and extract to a directory named 'catsanddogs' under your user directory (\"~/catsanddogs\"). \n",
"\n"
]
},
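If you would rather script the extraction than unzip by hand, a minimal sketch is below. The zip path is a placeholder for wherever you saved the download; extracting it should leave a `PetImages` folder under `~/catsanddogs`, which is the layout the data-loading cell in section 2 expects.

```
import os
import zipfile

datadir = os.path.expanduser("~/catsanddogs")
zip_path = "/path/to/kagglecatsanddogs.zip"   # placeholder -- point this at your downloaded zip

os.makedirs(datadir, exist_ok=True)
with zipfile.ZipFile(zip_path) as archive:
    archive.extractall(datadir)

print(os.listdir(datadir))   # expect to see 'PetImages' here
```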
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.c. Import packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import sys\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"from keras import backend as K\n",
"import sklearn\n",
"import tqdm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.d. Create directories for later use\n",
"After you train your model in float32, you'll write the weights to a place on disk. We also need a location to store the models that get downloaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_weights_dir = os.path.expanduser(\"~/custom-weights\")\n",
"saved_model_dir = os.path.expanduser(\"~/models\")"
]
},
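The checkpoint-saving and model-download steps later in the notebook may create these paths on demand, but creating them explicitly avoids a missing-directory error; a small addition to the cell above:

```
import os

# Create both directories up front if they don't already exist.
os.makedirs(custom_weights_dir, exist_ok=True)
os.makedirs(saved_model_dir, exist_ok=True)
```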
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"prepare-data\"></a>\n",
"## 2. Prepare Data\n",
"Load the files we are going to use for training and testing. By default this notebook uses only a very small subset of the Cats and Dogs dataset. That makes it run relatively quickly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glob\n",
"import imghdr\n",
"datadir = os.path.expanduser(\"~/catsanddogs\")\n",
"\n",
"cat_files = glob.glob(os.path.join(datadir, 'PetImages', 'Cat', '*.jpg'))\n",
"dog_files = glob.glob(os.path.join(datadir, 'PetImages', 'Dog', '*.jpg'))\n",
"\n",
"# Limit the data set to make the notebook execute quickly.\n",
"cat_files = cat_files[:64]\n",
"dog_files = dog_files[:64]\n",
"\n",
"# The data set has a few images that are not jpeg. Remove them.\n",
"cat_files = [f for f in cat_files if imghdr.what(f) == 'jpeg']\n",
"dog_files = [f for f in dog_files if imghdr.what(f) == 'jpeg']\n",
"\n",
"if(not len(cat_files) or not len(dog_files)):\n",
" print(\"Please download the Kaggle Cats and Dogs dataset form https://www.microsoft.com/en-us/download/details.aspx?id=54765 and extract the zip to \" + datadir) \n",
" raise ValueError(\"Data not found\")\n",
"else:\n",
" print(cat_files[0])\n",
" print(dog_files[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Construct a numpy array as labels\n",
"image_paths = cat_files + dog_files\n",
"total_files = len(cat_files) + len(dog_files)\n",
"labels = np.zeros(total_files)\n",
"labels[len(cat_files):] = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Split images data as training data and test data\n",
"from sklearn.model_selection import train_test_split\n",
"onehot_labels = np.array([[0,1] if i else [1,0] for i in labels])\n",
"img_train, img_test, label_train, label_test = train_test_split(image_paths, onehot_labels, random_state=42, shuffle=True)\n",
"\n",
"print(len(img_train), len(img_test), label_train.shape, label_test.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"construct-model\"></a>\n",
"## 3. Construct Model\n",
"We will define the functions to handle creating the preprocessor and the classifier first, and then run them together to actually construct the model with the Resnet50 featurizer in a single Tensorflow session in a separate cell.\n",
"\n",
"We use ResNet50 for the featurizer and build our own classifier using Keras layers. We train the featurizer and the classifier as one model. We will provide parameters to determine whether we are using the quantized version and whether we are using custom weights in training or not."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.a. Define image preprocessing step\n",
"Same as in the Quickstart, before passing image dataset to the ResNet50 featurizer, we need to preprocess the input file to get it into the form expected by ResNet50. ResNet50 expects float tensors representing the images in BGR, channel last order. We've provided a default implementation of the preprocessing that you can use.\n",
"\n",
"**Note:** Expect to see TF deprecation warnings until we port our SDK over to use Tensorflow 2.0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.accel.models.utils as utils\n",
"\n",
"def preprocess_images(scaling_factor=1.0):\n",
" # Convert images to 3D tensors [width,height,channel] - channels are in BGR order.\n",
" in_images = tf.placeholder(tf.string)\n",
" image_tensors = utils.preprocess_array(in_images, 'RGB', scaling_factor)\n",
" return in_images, image_tensors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.b. Define classifier\n",
"We use Keras layer APIs to construct the classifier. Because we're using the tensorflow backend, we can train this classifier in one session with our Resnet50 model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def construct_classifier(in_tensor, seed=None):\n",
" from keras.layers import Dropout, Dense, Flatten\n",
" from keras.initializers import glorot_uniform\n",
" K.set_session(tf.get_default_session())\n",
"\n",
" FC_SIZE = 1024\n",
" NUM_CLASSES = 2\n",
"\n",
" x = Dropout(0.2, input_shape=(1, 1, int(in_tensor.shape[3]),), seed=seed)(in_tensor)\n",
" x = Dense(FC_SIZE, activation='relu', input_dim=(1, 1, int(in_tensor.shape[3]),),\n",
" kernel_initializer=glorot_uniform(seed=seed), bias_initializer='zeros')(x)\n",
" x = Flatten()(x)\n",
" preds = Dense(NUM_CLASSES, activation='softmax', input_dim=FC_SIZE, name='classifier_output',\n",
" kernel_initializer=glorot_uniform(seed=seed), bias_initializer='zeros')(x)\n",
" return preds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.c. Define model construction\n",
"Now that the preprocessor and classifier for the model are defined, we can define how we want to construct the model. \n",
"\n",
"Constructing the model has these steps: \n",
"1. Get preprocessing steps\n",
"* Get featurizer using the Azure ML Accel Models SDK:\n",
" * import the graph definition\n",
" * restore the weights of the model into a Tensorflow session\n",
"* Get classifier\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def construct_model(quantized, starting_weights_directory = None):\n",
" from azureml.accel.models import Resnet50, QuantizedResnet50\n",
" \n",
" # Convert images to 3D tensors [width,height,channel]\n",
" in_images, image_tensors = preprocess_images(1.0)\n",
"\n",
" # Construct featurizer using quantized or unquantized ResNet50 model\n",
" if not quantized:\n",
" featurizer = Resnet50(saved_model_dir)\n",
" else:\n",
" featurizer = QuantizedResnet50(saved_model_dir, custom_weights_directory = starting_weights_directory)\n",
"\n",
" features = featurizer.import_graph_def(input_tensor=image_tensors)\n",
" \n",
" # Construct classifier\n",
" preds = construct_classifier(features)\n",
" \n",
" # Initialize weights\n",
" sess = tf.get_default_session()\n",
" tf.global_variables_initializer().run()\n",
"\n",
" featurizer.restore_weights(sess)\n",
"\n",
" return in_images, image_tensors, features, preds, featurizer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"train-model\"></a>\n",
"## 4. Train Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def read_files(files):\n",
" \"\"\" Read files to array\"\"\"\n",
" contents = []\n",
" for path in files:\n",
" with open(path, 'rb') as f:\n",
" contents.append(f.read())\n",
" return contents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def train_model(preds, in_images, img_train, label_train, is_retrain = False, train_epoch = 10, learning_rate=None):\n",
" \"\"\" training model \"\"\"\n",
" from keras.objectives import binary_crossentropy\n",
" from tqdm import tqdm\n",
" \n",
" learning_rate = learning_rate if learning_rate else 0.001 if is_retrain else 0.01\n",
" \n",
" # Specify the loss function\n",
" in_labels = tf.placeholder(tf.float32, shape=(None, 2)) \n",
" cross_entropy = tf.reduce_mean(binary_crossentropy(in_labels, preds))\n",
" optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)\n",
"\n",
" def chunks(a, b, n):\n",
" \"\"\"Yield successive n-sized chunks from a and b.\"\"\"\n",
" if (len(a) != len(b)):\n",
" print(\"a and b are not equal in chunks(a,b,n)\")\n",
" raise ValueError(\"Parameter error\")\n",
"\n",
" for i in range(0, len(a), n):\n",
" yield a[i:i + n], b[i:i + n]\n",
"\n",
" chunk_size = 16\n",
" chunk_num = len(label_train) / chunk_size\n",
"\n",
" sess = tf.get_default_session()\n",
" for epoch in range(train_epoch):\n",
" avg_loss = 0\n",
" for img_chunk, label_chunk in tqdm(chunks(img_train, label_train, chunk_size)):\n",
" contents = read_files(img_chunk)\n",
" _, loss = sess.run([optimizer, cross_entropy],\n",
" feed_dict={in_images: contents,\n",
" in_labels: label_chunk,\n",
" K.learning_phase(): 1})\n",
" avg_loss += loss / chunk_num\n",
" print(\"Epoch:\", (epoch + 1), \"loss = \", \"{:.3f}\".format(avg_loss))\n",
" \n",
" # Reach desired performance\n",
" if (avg_loss < 0.001):\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-model\"></a>\n",
"## 5. Test Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def test_model(preds, in_images, img_test, label_test):\n",
" \"\"\"Test the model\"\"\"\n",
" from keras.metrics import categorical_accuracy\n",
"\n",
" in_labels = tf.placeholder(tf.float32, shape=(None, 2))\n",
" accuracy = tf.reduce_mean(categorical_accuracy(in_labels, preds))\n",
" contents = read_files(img_test)\n",
"\n",
" accuracy = accuracy.eval(feed_dict={in_images: contents,\n",
" in_labels: label_test,\n",
" K.learning_phase(): 0})\n",
" return accuracy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"execution\"></a>\n",
"## 6. Execute steps\n",
"You can run through the Transfer Learning section, then skip to Create AccelContainerImage. By default, because the custom weights section takes much longer for training twice, it is not saved as executable cells. You can copy the code or change cell type to 'Code'.\n",
"\n",
"<a id=\"transfer-learning\"></a>\n",
"### 6.a. Training using Transfer Learning"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Launch the training\n",
"tf.reset_default_graph()\n",
"sess = tf.Session(graph=tf.get_default_graph())\n",
"\n",
"with sess.as_default():\n",
" in_images, image_tensors, features, preds, featurizer = construct_model(quantized=True)\n",
" train_model(preds, in_images, img_train, label_train, is_retrain=False, train_epoch=10, learning_rate=0.01) \n",
" accuracy = test_model(preds, in_images, img_test, label_test) \n",
" print(\"Accuracy:\", accuracy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Save Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_name = 'resnet50-catsanddogs-tl'\n",
"model_save_path = os.path.join(saved_model_dir, model_name)\n",
"\n",
"tf.saved_model.simple_save(sess, model_save_path,\n",
" inputs={'images': in_images},\n",
" outputs={'output_alias': preds})\n",
"\n",
"input_tensors = in_images.name\n",
"output_tensors = preds.name\n",
"\n",
"print(input_tensors)\n",
"print(output_tensors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"custom-weights\"></a>\n",
"### 6.b. Traning using Custom Weights\n",
"\n",
"Because the quantized graph defintion and the float32 graph defintion share the same node names in the graph definitions, we can initally train the weights in float32, and then reload them with the quantized operations (which take longer) to fine-tune the model.\n",
"\n",
"First we train the model with custom weights but without quantization. Training is done with native float precision (32-bit floats). We load the training data set and batch the training with 10 epochs. When the performance reaches desired level or starts decredation, we stop the training iteration and save the weights as tensorflow checkpoint files. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Launch the training\n",
"```\n",
"tf.reset_default_graph()\n",
"sess = tf.Session(graph=tf.get_default_graph())\n",
"\n",
"with sess.as_default():\n",
" in_images, image_tensors, features, preds, featurizer = construct_model(quantized=False)\n",
" train_model(preds, in_images, img_train, label_train, is_retrain=False, train_epoch=10) \n",
" accuracy = test_model(preds, in_images, img_test, label_test) \n",
" print(\"Accuracy:\", accuracy)\n",
" featurizer.save_weights(custom_weights_dir + \"/rn50\", tf.get_default_session())\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Test Model\n",
"After training, we evaluate the trained model's accuracy on test dataset with quantization. So that we know the model's performance if it is deployed on the FPGA."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"tf.reset_default_graph()\n",
"sess = tf.Session(graph=tf.get_default_graph())\n",
"\n",
"with sess.as_default():\n",
" print(\"Testing trained model with quantization\")\n",
" in_images, image_tensors, features, preds, quantized_featurizer = construct_model(quantized=True, starting_weights_directory=custom_weights_dir)\n",
" accuracy = test_model(preds, in_images, img_test, label_test) \n",
" print(\"Accuracy:\", accuracy)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Fine-Tune Model\n",
"Sometimes, the model's accuracy can drop significantly after quantization. In those cases, we need to retrain the model enabled with quantization to get better model accuracy."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"if (accuracy < 0.93):\n",
" with sess.as_default():\n",
" print(\"Fine-tuning model with quantization\")\n",
" train_model(preds, in_images, img_train, label_train, is_retrain=True, train_epoch=10)\n",
" accuracy = test_model(preds, in_images, img_test, label_test) \n",
" print(\"Accuracy:\", accuracy)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Save Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"model_name = 'resnet50-catsanddogs-cw'\n",
"model_save_path = os.path.join(saved_model_dir, model_name)\n",
"\n",
"tf.saved_model.simple_save(sess, model_save_path,\n",
" inputs={'images': in_images},\n",
" outputs={'output_alias': preds})\n",
"\n",
"input_tensors = in_images.name\n",
"output_tensors = preds.name\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"create-image\"></a>\n",
"## 7. Create AccelContainerImage\n",
"\n",
"Below we will execute all the same steps as in the [Quickstart](./accelerated-models-quickstart.ipynb#create-image) to package the model we have saved locally into an accelerated Docker image saved in our workspace. To complete all the steps, it may take a few minutes. For more details on each step, check out the [Quickstart section on model registration](./accelerated-models-quickstart.ipynb#register-model)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core.model import Model\n",
"from azureml.core.image import Image\n",
"from azureml.accel import AccelOnnxConverter\n",
"from azureml.accel import AccelContainerImage\n",
"\n",
"# Retrieve workspace\n",
"ws = Workspace.from_config()\n",
"print(\"Successfully retrieved workspace:\", ws.name, ws.resource_group, ws.location, ws.subscription_id, '\\n')\n",
"\n",
"# Register model\n",
"registered_model = Model.register(workspace = ws,\n",
" model_path = model_save_path,\n",
" model_name = model_name)\n",
"print(\"Successfully registered: \", registered_model.name, registered_model.description, registered_model.version, '\\n', sep = '\\t')\n",
"\n",
"# Convert model\n",
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors)\n",
"if convert_request.wait_for_completion(show_output = False):\n",
" # If the above call succeeded, get the converted model\n",
" converted_model = convert_request.result\n",
" print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
" converted_model.id, converted_model.created_time, '\\n')\n",
"else:\n",
" print(\"Model conversion failed. Showing output.\")\n",
" convert_request.wait_for_completion(show_output = True)\n",
"\n",
"# Package into AccelContainerImage\n",
"image_config = AccelContainerImage.image_configuration()\n",
"# Image name must be lowercase\n",
"image_name = \"{}-image\".format(model_name)\n",
"image = Image.create(name = image_name,\n",
" models = [converted_model],\n",
" image_config = image_config, \n",
" workspace = ws)\n",
"image.wait_for_creation()\n",
"print(\"Created AccelContainerImage: {} {} {}\\n\".format(image.name, image.creation_state, image.image_location))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"deploy-image\"></a>\n",
"## 8. Deploy image\n",
"Once you have an Azure ML Accelerated Image in your Workspace, you can deploy it to two destinations, to a Databox Edge machine or to an AKS cluster. \n",
"\n",
"### 8.a. Deploy to Databox Edge Machine using IoT Hub\n",
"See the sample [here](https://github.com/Azure-Samples/aml-real-time-ai/) for using the Azure IoT CLI extension for deploying your Docker image to your Databox Edge Machine.\n",
"\n",
"### 8.b. Deploy to AKS Cluster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Uses the specific FPGA enabled VM (sku: Standard_PB6s)\n",
"# Standard_PB6s are available in: eastus, westus2, westeurope, southeastasia\n",
"prov_config = AksCompute.provisioning_configuration(vm_size = \"Standard_PB6s\",\n",
" agent_count = 1,\n",
" location = \"eastus\")\n",
"\n",
"aks_name = 'aks-pb6-tl'\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Provisioning an AKS cluster might take awhile (15 or so minutes), and we want to wait until it's successfully provisioned before we can deploy a service to it. If you interrupt this cell, provisioning of the cluster will continue. You can re-run it or check the status in your Workspace under Compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy AccelContainerImage to AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"sample-akswebservice-deploy-from-image"
]
},
"outputs": [],
"source": [
"%%time\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"\n",
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
"# Authentication is enabled by default, but for testing we specify False\n",
"aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
" num_replicas=1,\n",
" auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service-2'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-service\"></a>\n",
"## 9. Test the service\n",
"\n",
"<a id=\"create-client\"></a>\n",
"### 9.a. Create Client\n",
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We will create a PredictionClient from the Webservice object that can call into the docker image to get predictions. If you do not have the Webservice object, you can also create [PredictionClient](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.predictionclient?view=azure-ml-py) directly.\n",
"\n",
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).\n",
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using the grpc client in AzureML Accelerated Models SDK\n",
"from azureml.accel import client_from_service\n",
"\n",
"# Initialize AzureML Accelerated Models client\n",
"client = client_from_service(aks_service)"
]
},
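{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you do not have the `Webservice` object handy, a minimal sketch of constructing the client directly is shown below. This is not part of this notebook's flow: the address, port and key values are placeholders, and the exact constructor arguments should be checked against the PredictionClient documentation linked above.\n",
"```python\n",
"from azureml.accel import PredictionClient\n",
"\n",
"# Placeholders -- replace with your service's gRPC endpoint details.\n",
"address = \"<your-service-ip-or-dns>\"\n",
"port = 80\n",
"\n",
"# If you deployed with auth_enabled=True, retrieve a key from the Webservice\n",
"# (see the documentation link above) and pass it as access_token.\n",
"# primary_key, secondary_key = aks_service.get_keys()\n",
"\n",
"client = PredictionClient(address=address,\n",
"                          port=port,\n",
"                          use_ssl=False,\n",
"                          access_token=None)\n",
"```"
]
},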
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"serve-model\"></a>\n",
"### 9.b. Serve the model\n",
"Let's see how our service does on a few images. It may get a few wrong."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Specify an image to classify\n",
"print('CATS')\n",
"for image_file in cat_files[:8]:\n",
" results = client.score_file(path=image_file, \n",
" input_name=input_tensors, \n",
" outputs=output_tensors)\n",
" result = 'CORRECT ' if results[0] > results[1] else 'WRONG '\n",
" print(result + str(results))\n",
"print('DOGS')\n",
"for image_file in dog_files[:8]:\n",
" results = client.score_file(path=image_file, \n",
" input_name=input_tensors, \n",
" outputs=output_tensors)\n",
" result = 'CORRECT ' if results[1] > results[0] else 'WRONG '\n",
" print(result + str(results))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"cleanup\"></a>\n",
"## 10. Cleanup\n",
"It's important to clean up your resources, so that you won't incur unnecessary costs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.delete()\n",
"aks_target.delete()\n",
"image.delete()\n",
"registered_model.delete()\n",
"converted_model.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"appendix\"></a>\n",
"## 11. Appendix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"License for plot_confusion_matrix:\n",
"\n",
"New BSD License\n",
"\n",
"Copyright (c) 2007-2018 The scikit-learn developers.\n",
"All rights reserved.\n",
"\n",
"\n",
"Redistribution and use in source and binary forms, with or without\n",
"modification, are permitted provided that the following conditions are met:\n",
"\n",
" a. Redistributions of source code must retain the above copyright notice,\n",
" this list of conditions and the following disclaimer.\n",
" b. Redistributions in binary form must reproduce the above copyright\n",
" notice, this list of conditions and the following disclaimer in the\n",
" documentation and/or other materials provided with the distribution.\n",
" c. Neither the name of the Scikit-learn Developers nor the names of\n",
" its contributors may be used to endorse or promote products\n",
" derived from this software without specific prior written\n",
" permission. \n",
"\n",
"\n",
"THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n",
"AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n",
"IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n",
"ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR\n",
"ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n",
"DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n",
"SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n",
"CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\n",
"LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY\n",
"OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH\n",
"DAMAGE.\n"
]
}
],
"metadata": {
"authors": [
{
"name": "coverste"
},
{
"name": "paledger"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,8 +0,0 @@
name: accelerated-models-training
dependencies:
- pip:
- azureml-sdk
- azureml-accel-models[cpu]
- keras
- tqdm
- sklearn

Two binary image files removed (previously 74 KiB and 79 KiB); previews not shown.

View File

@@ -285,7 +285,7 @@
"from azureml.exceptions import WebserviceException\n", "from azureml.exceptions import WebserviceException\n",
"\n", "\n",
"deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n", "deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n",
"aci_service_name = 'aciservice1'\n", "aci_service_name = 'aciservice-multimodel'\n",
"\n", "\n",
"try:\n", "try:\n",
" # if you want to get existing service below is the command\n", " # if you want to get existing service below is the command\n",

View File

@@ -383,11 +383,21 @@
"- an inference configuration\n", "- an inference configuration\n",
"- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n", "- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n",
"\n", "\n",
"Please, note that profiling is a long running operation and can take up to 25 minutes depending on the size of the dataset.\n",
"\n",
"At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n", "At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n",
"\n", "\n",
"Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent." "Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may want to register datasets using the register() method to your workspace so they can be shared with others, reused and referred to by name in your script.\n",
"You can try get the dataset first to see if it's already registered."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -398,36 +408,45 @@
"from azureml.core.dataset import Dataset\n", "from azureml.core.dataset import Dataset\n",
"from azureml.data import dataset_type_definitions\n", "from azureml.data import dataset_type_definitions\n",
"\n", "\n",
"dataset_name='diabetes_sample_request_data'\n",
"\n", "\n",
"# create a string that can be utf-8 encoded and\n", "dataset_registered = False\n",
"# put in the body of the request\n", "try:\n",
"serialized_input_json = json.dumps({\n", " sample_request_data = Dataset.get_by_name(workspace = ws, name = dataset_name)\n",
" 'data': [\n", " dataset_registered = True\n",
" [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n", "except:\n",
" -0.03482076, -0.04340085, -0.00259226, 0.01990842, -0.01764613]\n", " print(\"The dataset {} is not registered in workspace yet.\".format(dataset_name))\n",
" ]\n",
"})\n",
"dataset_content = []\n",
"for i in range(100):\n",
" dataset_content.append(serialized_input_json)\n",
"dataset_content = '\\n'.join(dataset_content)\n",
"file_name = 'sample_request_data.txt'\n",
"f = open(file_name, 'w')\n",
"f.write(dataset_content)\n",
"f.close()\n",
"\n", "\n",
"# upload the txt file created above to the Datastore and create a dataset from it\n", "if not dataset_registered:\n",
"data_store = Datastore.get_default(ws)\n", " # create a string that can be utf-8 encoded and\n",
"data_store.upload_files(['./' + file_name], target_path='sample_request_data')\n", " # put in the body of the request\n",
"datastore_path = [(data_store, 'sample_request_data' +'/' + file_name)]\n", " serialized_input_json = json.dumps({\n",
"sample_request_data = Dataset.Tabular.from_delimited_files(\n", " 'data': [\n",
" datastore_path,\n", " [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n",
" separator='\\n',\n", " -0.03482076, -0.04340085, -0.00259226, 0.01990842, -0.01764613]\n",
" infer_column_types=True,\n", " ]\n",
" header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n", " })\n",
"sample_request_data = sample_request_data.register(workspace=ws,\n", " dataset_content = []\n",
" name='diabetes_sample_request_data',\n", " for i in range(100):\n",
" create_new_version=True)" " dataset_content.append(serialized_input_json)\n",
" dataset_content = '\\n'.join(dataset_content)\n",
" file_name = \"{}.txt\".format(dataset_name)\n",
" f = open(file_name, 'w')\n",
" f.write(dataset_content)\n",
" f.close()\n",
"\n",
" # upload the txt file created above to the Datastore and create a dataset from it\n",
" data_store = Datastore.get_default(ws)\n",
" data_store.upload_files(['./' + file_name], target_path='sample_request_data')\n",
" datastore_path = [(data_store, 'sample_request_data' +'/' + file_name)]\n",
" sample_request_data = Dataset.Tabular.from_delimited_files(\n",
" datastore_path,\n",
" separator='\\n',\n",
" infer_column_types=True,\n",
" header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n",
" sample_request_data = sample_request_data.register(workspace=ws,\n",
" name=dataset_name,\n",
" create_new_version=True)"
] ]
}, },
{ {
@@ -466,6 +485,7 @@
" cpu=1.0,\n", " cpu=1.0,\n",
" memory_in_gb=0.5)\n", " memory_in_gb=0.5)\n",
"\n", "\n",
"# profiling is a long running operation and may take up to 25 min\n",
"profile.wait_for_completion(True)\n", "profile.wait_for_completion(True)\n",
"details = profile.get_details()" "details = profile.get_details()"
] ]
@@ -512,7 +532,7 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "aashishb" "name": "vaidyas"
} }
], ],
"category": "deployment", "category": "deployment",

View File

@@ -3,6 +3,6 @@ dependencies:
- python=3.6.2 - python=3.6.2
- pip: - pip:
- azureml-defaults - azureml-defaults
- scikit-learn - scikit-learn==0.19.1
- numpy - numpy
- inference-schema[numpy-support] - inference-schema[numpy-support]

View File

@@ -86,7 +86,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"You can add tags and descriptions to your models. we are using `sklearn_regression_model.pkl` file in the current directory as a model with the name `sklearn_regression_model_local_adv` in the workspace.\n", "You can add tags and descriptions to your models. we are using `sklearn_regression_model.pkl` file in the current directory as a model with the name `sklearn_regression_model` in the workspace.\n",
"\n", "\n",
"Using tags, you can track useful information such as the name and version of the machine learning library used to train the model, framework, category, target customer etc. Note that tags must be alphanumeric." "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model, framework, category, target customer etc. Note that tags must be alphanumeric."
] ]
@@ -105,7 +105,7 @@
"from azureml.core.model import Model\n", "from azureml.core.model import Model\n",
"\n", "\n",
"model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n", "model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n",
" model_name=\"sklearn_regression_model_local_adv\",\n", " model_name=\"sklearn_regression_model\",\n",
" tags={'area': \"diabetes\", 'type': \"regression\"},\n", " tags={'area': \"diabetes\", 'type': \"regression\"},\n",
" description=\"Ridge regression model to predict diabetes\",\n", " description=\"Ridge regression model to predict diabetes\",\n",
" workspace=ws)" " workspace=ws)"
@@ -126,12 +126,12 @@
"source": [ "source": [
"import os\n", "import os\n",
"\n", "\n",
"source_directory = \"C:/abc\"\n", "source_directory = \"source_directory\"\n",
"\n", "\n",
"os.makedirs(source_directory, exist_ok=True)\n", "os.makedirs(source_directory, exist_ok=True)\n",
"os.makedirs(\"C:/abc/x/y\", exist_ok=True)\n", "os.makedirs(os.path.join(source_directory, \"x/y\"), exist_ok=True)\n",
"os.makedirs(\"C:/abc/env\", exist_ok=True)\n", "os.makedirs(os.path.join(source_directory, \"env\"), exist_ok=True)\n",
"os.makedirs(\"C:/abc/dockerstep\", exist_ok=True)" "os.makedirs(os.path.join(source_directory, \"dockerstep\"), exist_ok=True)"
] ]
}, },
{ {
@@ -147,7 +147,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"%%writefile C:/abc/x/y/score.py\n", "%%writefile source_directory/x/y/score.py\n",
"import os\n", "import os\n",
"import pickle\n", "import pickle\n",
"import json\n", "import json\n",
@@ -170,7 +170,7 @@
" global name\n", " global name\n",
" # note here, entire source directory on inference config gets added into image\n", " # note here, entire source directory on inference config gets added into image\n",
" # bellow is the example how you can use any extra files in image\n", " # bellow is the example how you can use any extra files in image\n",
" with open('./abc/extradata.json') as json_file: \n", " with open('./source_directory/extradata.json') as json_file:\n",
" data = json.load(json_file)\n", " data = json.load(json_file)\n",
" name = data[\"people\"][0][\"name\"]\n", " name = data[\"people\"][0][\"name\"]\n",
"\n", "\n",
@@ -191,9 +191,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"execution_count": null,
"metadata": {}, "metadata": {},
"outputs": [],
"source": [ "source": [
"Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency for your environemnt. This package contains the functionality needed to host the model as a web service." "Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency for your environemnt. This package contains the functionality needed to host the model as a web service."
] ]
@@ -204,7 +202,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"%%writefile C:/abc/env/myenv.yml\n", "%%writefile source_directory/env/myenv.yml\n",
"name: project_environment\n", "name: project_environment\n",
"dependencies:\n", "dependencies:\n",
" - python=3.6.2\n", " - python=3.6.2\n",
@@ -221,7 +219,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"%%writefile C:/abc/extradata.json\n", "%%writefile source_directory/extradata.json\n",
"{\n", "{\n",
" \"people\": [\n", " \"people\": [\n",
" {\n", " {\n",
@@ -255,13 +253,14 @@
"from azureml.core.model import InferenceConfig\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"\n", "\n",
"myenv = Environment.from_conda_specification(name='myenv', file_path='env/myenv.yml')\n", "myenv = Environment.from_conda_specification(name='myenv', file_path='myenv.yml')\n",
"\n", "\n",
"# explicitly set base_image to None when setting base_dockerfile\n", "# explicitly set base_image to None when setting base_dockerfile\n",
"myenv.docker.base_image = None\n", "myenv.docker.base_image = None\n",
"myenv.docker.base_dockerfile = \"RUN echo \\\"this is test\\\"\"\n", "myenv.docker.base_dockerfile = \"FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04\\nRUN echo \\\"this is test\\\"\"\n",
"myenv.inferencing_stack_version = \"latest\"\n",
"\n", "\n",
"inference_config = InferenceConfig(source_directory=\"C:/abc\",\n", "inference_config = InferenceConfig(source_directory=source_directory,\n",
" entry_script=\"x/y/score.py\",\n", " entry_script=\"x/y/score.py\",\n",
" environment=myenv)\n" " environment=myenv)\n"
] ]
@@ -379,7 +378,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"%%writefile C:/abc/x/y/score.py\n", "%%writefile source_directory/x/y/score.py\n",
"import os\n", "import os\n",
"import pickle\n", "import pickle\n",
"import json\n", "import json\n",
@@ -401,7 +400,7 @@
" global name, from_location\n", " global name, from_location\n",
" # note here, entire source directory on inference config gets added into image\n", " # note here, entire source directory on inference config gets added into image\n",
" # bellow is the example how you can use any extra files in image\n", " # bellow is the example how you can use any extra files in image\n",
" with open('./abc/extradata.json') as json_file: \n", " with open('source_directory/extradata.json') as json_file: \n",
" data = json.load(json_file)\n", " data = json.load(json_file)\n",
" name = data[\"people\"][0][\"name\"]\n", " name = data[\"people\"][0][\"name\"]\n",
" from_location = data[\"people\"][0][\"from\"]\n", " from_location = data[\"people\"][0][\"from\"]\n",

View File

@@ -82,7 +82,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"You can add tags and descriptions to your models. we are using `sklearn_regression_model.pkl` file in the current directory as a model with the name `sklearn_regression_model_local` in the workspace.\n", "You can add tags and descriptions to your models. we are using `sklearn_regression_model.pkl` file in the current directory as a model with the name `sklearn_regression_model` in the workspace.\n",
"\n", "\n",
"Using tags, you can track useful information such as the name and version of the machine learning library used to train the model, framework, category, target customer etc. Note that tags must be alphanumeric." "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model, framework, category, target customer etc. Note that tags must be alphanumeric."
] ]
@@ -100,7 +100,7 @@
"from azureml.core.model import Model\n", "from azureml.core.model import Model\n",
"\n", "\n",
"model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n", "model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n",
" model_name=\"sklearn_regression_model_local\",\n", " model_name=\"sklearn_regression_model\",\n",
" tags={'area': \"diabetes\", 'type': \"regression\"},\n", " tags={'area': \"diabetes\", 'type': \"regression\"},\n",
" description=\"Ridge regression model to predict diabetes\",\n", " description=\"Ridge regression model to predict diabetes\",\n",
" workspace=ws)" " workspace=ws)"
@@ -159,6 +159,8 @@
"- an inference configuration\n", "- an inference configuration\n",
"- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n", "- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n",
"\n", "\n",
"Please, note that profiling is a long running operation and can take up to 25 minutes depending on the size of the dataset.\n",
"\n",
"At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n", "At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n",
"\n", "\n",
"Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent." "Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent."
@@ -231,7 +233,8 @@
" 'inference-schema[numpy-support]',\n", " 'inference-schema[numpy-support]',\n",
" 'joblib',\n", " 'joblib',\n",
" 'numpy',\n", " 'numpy',\n",
" 'scikit-learn'\n", " 'scikit-learn==0.19.1',\n",
" 'scipy'\n",
"])\n", "])\n",
"inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n", "inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n",
"# if cpu and memory_in_gb parameters are not provided\n", "# if cpu and memory_in_gb parameters are not provided\n",
@@ -245,6 +248,7 @@
" cpu=1.0,\n", " cpu=1.0,\n",
" memory_in_gb=0.5)\n", " memory_in_gb=0.5)\n",
"\n", "\n",
"# profiling is a long running operation and may take up to 25 min\n",
"profile.wait_for_completion(True)\n", "profile.wait_for_completion(True)\n",
"details = profile.get_details()" "details = profile.get_details()"
] ]

View File

@@ -108,9 +108,9 @@
"environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n", "environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n",
" 'azureml-defaults',\n", " 'azureml-defaults',\n",
" 'inference-schema[numpy-support]',\n", " 'inference-schema[numpy-support]',\n",
" 'joblib',\n",
" 'numpy',\n", " 'numpy',\n",
" 'scikit-learn'\n", " 'scikit-learn==0.19.1',\n",
" 'scipy'\n",
"])" "])"
] ]
}, },

View File

@@ -5,7 +5,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Enabling App Insights for Services in Production\n", "# Enabling App Insights for Services in Production\n",
"With this notebook, you can learn how to enable App Insights for standard service monitoring, plus, we provide examples for doing custom logging within a scoring files in a model. \n", "With this notebook, you can learn how to enable App Insights for standard service monitoring, plus, we provide examples for doing custom logging within a scoring files in a model.\n",
"\n", "\n",
"\n", "\n",
"## What does Application Insights monitor?\n", "## What does Application Insights monitor?\n",
@@ -45,11 +45,13 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import azureml.core\n",
"import json\n",
"\n",
"from azureml.core import Workspace\n", "from azureml.core import Workspace\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n", "from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.webservice import AksWebservice\n", "from azureml.core.webservice import AksWebservice\n",
"import azureml.core\n", "\n",
"import json\n",
"print(azureml.core.VERSION)" "print(azureml.core.VERSION)"
] ]
}, },
@@ -67,7 +69,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"ws = Workspace.from_config()\n", "ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
] ]
}, },
{ {
@@ -84,13 +86,13 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#Register the model\n", "from azureml.core import Model\n",
"from azureml.core.model import Model\n", "\n",
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n", "model = Model.register(model_path=\"sklearn_regression_model.pkl\", # This points to a local file.\n",
" model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n", " model_name=\"sklearn_regression_model.pkl\", # This is the name the model is registered as.\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n", " tags={'area': \"diabetes\", 'type': \"regression\"},\n",
" description = \"Ridge regression model to predict diabetes\",\n", " description=\"Ridge regression model to predict diabetes\",\n",
" workspace = ws)\n", " workspace=ws)\n",
"\n", "\n",
"print(model.name, model.description, model.version)" "print(model.name, model.description, model.version)"
] ]
@@ -120,7 +122,7 @@
"import os\n", "import os\n",
"import pickle\n", "import pickle\n",
"import json\n", "import json\n",
"import numpy \n", "import numpy\n",
"from sklearn.externals import joblib\n", "from sklearn.externals import joblib\n",
"from sklearn.linear_model import Ridge\n", "from sklearn.linear_model import Ridge\n",
"import time\n", "import time\n",
@@ -129,15 +131,15 @@
" global model\n", " global model\n",
" #Print statement for appinsights custom traces:\n", " #Print statement for appinsights custom traces:\n",
" print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n", " print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
" \n", "\n",
" # AZUREML_MODEL_DIR is an environment variable created during deployment.\n", " # AZUREML_MODEL_DIR is an environment variable created during deployment.\n",
" # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n", " # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n",
" # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n", " # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n",
" model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_regression_model.pkl')\n", " model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_regression_model.pkl')\n",
" \n", "\n",
" # deserialize the model file back into a sklearn model\n", " # deserialize the model file back into a sklearn model\n",
" model = joblib.load(model_path)\n", " model = joblib.load(model_path)\n",
" \n", "\n",
"\n", "\n",
"# note you can pass in multiple rows for scoring\n", "# note you can pass in multiple rows for scoring\n",
"def run(raw_data):\n", "def run(raw_data):\n",
@@ -168,7 +170,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.conda_dependencies import CondaDependencies \n", "from azureml.core.conda_dependencies import CondaDependencies\n",
"\n", "\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'],\n", "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'],\n",
" pip_packages=['azureml-defaults'])\n", " pip_packages=['azureml-defaults'])\n",
@@ -190,9 +192,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.environment import Environment\n", "from azureml.core.environment import Environment\n",
"\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)" "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
@@ -213,11 +214,11 @@
"source": [ "source": [
"from azureml.core.webservice import AciWebservice\n", "from azureml.core.webservice import AciWebservice\n",
"\n", "\n",
"aci_deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, \n", "aci_deployment_config = AciWebservice.deploy_configuration(cpu_cores=1,\n",
" memory_gb = 1, \n", " memory_gb=1,\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}, \n", " tags={'area': \"diabetes\", 'type': \"regression\"},\n",
" description = 'Predict diabetes using regression model',\n", " description=\"Predict diabetes using regression model\",\n",
" enable_app_insights = True)" " enable_app_insights=True)"
] ]
}, },
{ {
@@ -226,29 +227,14 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.webservice import Webservice\n", "aci_service_name = \"aci-service-appinsights\"\n",
"\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aci_deployment_config, overwrite=True)\n",
"aci_service.wait_for_deployment(show_output=True)\n",
"\n", "\n",
"aci_service_name = 'my-aci-service-4'\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aci_deployment_config)\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)" "print(aci_service.state)"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"test_sample = json.dumps({'data': [\n",
" [1,28,13,45,54,6,57,8,8,10], \n",
" [101,9,8,37,6,45,4,3,2,41]\n",
"]})\n",
"test_sample = bytes(test_sample,encoding='utf8')"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -256,7 +242,15 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"if aci_service.state == \"Healthy\":\n", "if aci_service.state == \"Healthy\":\n",
" prediction = aci_service.run(input_data=test_sample)\n", " test_sample = json.dumps({\n",
" \"data\": [\n",
" [1,28,13,45,54,6,57,8,8,10],\n",
" [101,9,8,37,6,45,4,3,2,41]\n",
" ]\n",
" })\n",
"\n",
" prediction = aci_service.run(test_sample)\n",
"\n",
" print(prediction)\n", " print(prediction)\n",
"else:\n", "else:\n",
" raise ValueError(\"Service deployment isn't healthy, can't call the service. Error: \", aci_service.error)" " raise ValueError(\"Service deployment isn't healthy, can't call the service. Error: \", aci_service.error)"
@@ -282,14 +276,21 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Use the default configuration (can also provide parameters to customize)\n", "from azureml.exceptions import ComputeTargetException\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n", "\n",
"aks_name = 'my-aks-test3' \n", "aks_name = \"my-aks\"\n",
"# Create the cluster\n", "\n",
"aks_target = ComputeTarget.create(workspace = ws, \n", "try:\n",
" name = aks_name, \n", " aks_target = ComputeTarget(ws, aks_name)\n",
" provisioning_configuration = prov_config)" " print(\"Using existing AKS cluster {}.\".format(aks_name))\n",
"except ComputeTargetException:\n",
" print(\"Creating a new AKS cluster {}.\".format(aks_name))\n",
"\n",
" # Use the default configuration (can also provide parameters to customize).\n",
" prov_config = AksCompute.provisioning_configuration()\n",
" aks_target = ComputeTarget.create(workspace=ws,\n",
" name=aks_name,\n",
" provisioning_configuration=prov_config)"
] ]
}, },
{ {
@@ -299,7 +300,8 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"%%time\n", "%%time\n",
"aks_target.wait_for_completion(show_output = True)" "if aks_target.provisioning_state != \"Succeeded\":\n",
" aks_target.wait_for_completion(show_output=True)"
] ]
}, },
{ {
@@ -323,13 +325,13 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"```python \n", "```python\n",
"%%time\n", "%%time\n",
"resource_id = '/subscriptions/<subscriptionid>/resourcegroups/<resourcegroupname>/providers/Microsoft.ContainerService/managedClusters/<aksservername>'\n", "resource_id = '/subscriptions/<subscriptionid>/resourcegroups/<resourcegroupname>/providers/Microsoft.ContainerService/managedClusters/<aksservername>'\n",
"create_name= 'myaks4'\n", "create_name= 'myaks4'\n",
"attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n", "attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
"aks_target = ComputeTarget.attach(workspace = ws, \n", "aks_target = ComputeTarget.attach(workspace=ws,\n",
" name = create_name, \n", " name=create_name,\n",
" attach_configuration=attach_config)\n", " attach_configuration=attach_config)\n",
"## Wait for the operation to complete\n", "## Wait for the operation to complete\n",
"aks_target.wait_for_provisioning(True)```" "aks_target.wait_for_provisioning(True)```"
@@ -349,7 +351,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#Set the web service configuration\n", "# Set the web service configuration.\n",
"aks_deployment_config = AksWebservice.deploy_configuration(enable_app_insights=True)" "aks_deployment_config = AksWebservice.deploy_configuration(enable_app_insights=True)"
] ]
}, },
@@ -366,15 +368,16 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"if aks_target.provisioning_state== \"Succeeded\": \n", "if aks_target.provisioning_state == \"Succeeded\":\n",
" aks_service_name ='aks-w-dc5'\n", " aks_service_name = \"aks-service-appinsights\"\n",
" aks_service = Model.deploy(ws,\n", " aks_service = Model.deploy(ws,\n",
" aks_service_name, \n", " aks_service_name,\n",
" [model], \n", " [model],\n",
" inference_config, \n", " inference_config,\n",
" aks_deployment_config, \n", " aks_deployment_config,\n",
" deployment_target = aks_target) \n", " deployment_target=aks_target,\n",
" aks_service.wait_for_deployment(show_output = True)\n", " overwrite=True)\n",
" aks_service.wait_for_deployment(show_output=True)\n",
" print(aks_service.state)\n", " print(aks_service.state)\n",
"else:\n", "else:\n",
" raise ValueError(\"AKS provisioning failed. Error: \", aks_service.error)" " raise ValueError(\"AKS provisioning failed. Error: \", aks_service.error)"
@@ -395,13 +398,14 @@
"source": [ "source": [
"%%time\n", "%%time\n",
"\n", "\n",
"test_sample = json.dumps({'data': [\n",
" [1,28,13,45,54,6,57,8,8,10], \n",
" [101,9,8,37,6,45,4,3,2,41]\n",
"]})\n",
"test_sample = bytes(test_sample,encoding='utf8')\n",
"\n",
"if aks_service.state == \"Healthy\":\n", "if aks_service.state == \"Healthy\":\n",
" test_sample = json.dumps({\n",
" \"data\": [\n",
" [1,28,13,45,54,6,57,8,8,10],\n",
" [101,9,8,37,6,45,4,3,2,41]\n",
" ]\n",
" })\n",
"\n",
" prediction = aks_service.run(input_data=test_sample)\n", " prediction = aks_service.run(input_data=test_sample)\n",
" print(prediction)\n", " print(prediction)\n",
"else:\n", "else:\n",
@@ -435,7 +439,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"aks_service.update(enable_app_insights=False)\n", "aks_service.update(enable_app_insights=False)\n",
"aks_service.wait_for_deployment(show_output = True)" "aks_service.wait_for_deployment(show_output=True)"
] ]
}, },
{ {

View File

@@ -115,6 +115,11 @@
"# Convert from CoreML into ONNX\n", "# Convert from CoreML into ONNX\n",
"onnx_model = onnxmltools.convert_coreml(coreml_model, 'TinyYOLOv2')\n", "onnx_model = onnxmltools.convert_coreml(coreml_model, 'TinyYOLOv2')\n",
"\n", "\n",
"# Fix the preprocessor bias in the ImageScaler\n",
"for init in onnx_model.graph.initializer:\n",
" if init.name == 'scalerPreprocessor_bias':\n",
" init.dims[1] = 1\n",
"\n",
"# Save ONNX model\n", "# Save ONNX model\n",
"onnxmltools.utils.save_model(onnx_model, 'tinyyolov2.onnx')\n", "onnxmltools.utils.save_model(onnx_model, 'tinyyolov2.onnx')\n",
"\n", "\n",
@@ -255,7 +260,7 @@
"source": [ "source": [
"from azureml.core.conda_dependencies import CondaDependencies \n", "from azureml.core.conda_dependencies import CondaDependencies \n",
"\n", "\n",
"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime==0.4.0\", \"azureml-core\", \"azureml-defaults\"])\n", "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n",
"\n", "\n",
"with open(\"myenv.yml\",\"w\") as f:\n", "with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())" " f.write(myenv.serialize_to_string())"
@@ -316,7 +321,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"aci_service_name = 'my-aci-service-15ad'\n", "aci_service_name = 'my-aci-service-tiny-yolo'\n",
"print(\"Service\", aci_service_name)\n", "print(\"Service\", aci_service_name)\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n", "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"aci_service.wait_for_deployment(True)\n", "aci_service.wait_for_deployment(True)\n",

View File

@@ -4,4 +4,5 @@ dependencies:
- azureml-sdk - azureml-sdk
- numpy - numpy
- git+https://github.com/apple/coremltools@v2.1 - git+https://github.com/apple/coremltools@v2.1
- onnxmltools==1.3.1 - onnx<1.7.0
- onnxmltools

View File

@@ -5,5 +5,5 @@ dependencies:
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- numpy - numpy
- onnx - onnx<1.7.0
- opencv-python - opencv-python-headless

View File

@@ -5,5 +5,5 @@ dependencies:
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- numpy - numpy
- onnx - onnx<1.7.0
- opencv-python - opencv-python-headless

View File

@@ -202,7 +202,7 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "aashishb" "name": "vaidyas"
} }
], ],
"kernelspec": { "kernelspec": {

View File

@@ -59,8 +59,44 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Register the model\n", "# Download the model\n",
"Register an existing trained model, add descirption and tags. Prior to registering the model, you should have a TensorFlow [Saved Model](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md) in the `resnet50` directory. You can download a [pretrained resnet50](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NCHW_jpg.tar.gz) and unpack it to that directory." "\n",
"Prior to registering the model, you should have a TensorFlow [Saved Model](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md) in the `resnet50` directory. This cell will download a [pretrained resnet50](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NCHW_jpg.tar.gz) and unpack it to that directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"import shutil\n",
"import tarfile\n",
"import tempfile\n",
"\n",
"from io import BytesIO\n",
"\n",
"model_url = \"http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NCHW_jpg.tar.gz\"\n",
"\n",
"archive_prefix = \"./resnet_v1_fp32_savedmodel_NCHW_jpg/1538686758/\"\n",
"target_folder = \"resnet50\"\n",
"\n",
"if not os.path.exists(target_folder):\n",
" response = requests.get(model_url)\n",
" archive = tarfile.open(fileobj=BytesIO(response.content))\n",
" with tempfile.TemporaryDirectory() as temp_folder:\n",
" archive.extractall(temp_folder)\n",
" shutil.copytree(os.path.join(temp_folder, archive_prefix), target_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Register the model\n",
"Register an existing trained model, add description and tags."
] ]
}, },
{ {
@@ -69,13 +105,13 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#Register the model\n",
"from azureml.core.model import Model\n", "from azureml.core.model import Model\n",
"model = Model.register(model_path = \"resnet50\", # this points to a local file\n", "\n",
" model_name = \"resnet50\", # this is the name the model is registered as\n", "model = Model.register(model_path=\"resnet50\", # This points to the local directory to upload.\n",
" tags = {'area': \"Image classification\", 'type': \"classification\"},\n", " model_name=\"resnet50\", # This is the name the model is registered as.\n",
" description = \"Image classification trained on Imagenet Dataset\",\n", " tags={'area': \"Image classification\", 'type': \"classification\"},\n",
" workspace = ws)\n", " description=\"Image classification trained on Imagenet Dataset\",\n",
" workspace=ws)\n",
"\n", "\n",
"print(model.name, model.description, model.version)" "print(model.name, model.description, model.version)"
] ]
@@ -288,7 +324,7 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "aashishb" "name": "vaidyas"
} }
], ],
"kernelspec": { "kernelspec": {

View File

@@ -0,0 +1,354 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploying a web service to Azure Kubernetes Service (AKS)\n",
"This notebook shows the steps for deploying a service: registering a model, provisioning a cluster with ssl (one time action), and deploying a service to it. \n",
"We then test and delete the service, image and model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.core.model import Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"print(azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get workspace\n",
"Load existing workspace from the config file info."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Register the model\n",
"Register an existing trained model, add descirption and tags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Register the model\n",
"from azureml.core.model import Model\n",
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
" model_name = \"sklearn_model\", # this is the name the model is registered as\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
" description = \"Ridge regression model to predict diabetes\",\n",
" workspace = ws)\n",
"\n",
"print(model.name, model.description, model.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create the Environment\n",
"Create an environment that the model will be deployed with"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"conda_deps = CondaDependencies.create(conda_packages=['numpy', 'scikit-learn==0.19.1', 'scipy'], pip_packages=['azureml-defaults', 'inference-schema'])\n",
"myenv = Environment(name='myenv')\n",
"myenv.python.conda_dependencies = conda_deps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use a custom Docker image\n",
"\n",
"You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\\* and Python(3.5.\\* or 3.6.\\*).\n",
"\n",
"Only supported with `python` runtime.\n",
"```python\n",
"# use an image available in public Container Registry without authentication\n",
"myenv.docker.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n",
"\n",
"# or, use an image available in a private Container Registry\n",
"myenv.docker.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n",
"myenv.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n",
"myenv.docker.base_image_registry.username = \"username\"\n",
"myenv.docker.base_image_registry.password = \"password\"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Write the Entry Script\n",
"Write the script that will be used to predict on your model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import os\n",
"import pickle\n",
"import json\n",
"import numpy\n",
"from sklearn.externals import joblib\n",
"from sklearn.linear_model import Ridge\n",
"from inference_schema.schema_decorators import input_schema, output_schema\n",
"from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType\n",
"\n",
"def init():\n",
" global model\n",
" # AZUREML_MODEL_DIR is an environment variable created during deployment.\n",
" # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n",
" # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n",
" model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_regression_model.pkl')\n",
" # deserialize the model file back into a sklearn model\n",
" model = joblib.load(model_path)\n",
"\n",
"\n",
"standard_sample_input = {'a': 10, 'b': 9, 'c': 8, 'd': 7, 'e': 6, 'f': 5, 'g': 4, 'h': 3, 'i': 2, 'j': 1 }\n",
"standard_sample_output = {'outcome': 1}\n",
"\n",
"@input_schema('param', StandardPythonParameterType(standard_sample_input))\n",
"@output_schema(StandardPythonParameterType(standard_sample_output))\n",
"def run(param):\n",
" try:\n",
" raw_data = [param['a'], param['b'], param['c'], param['d'], param['e'], param['f'], param['g'], param['h'], param['i'], param['j']]\n",
" data = numpy.array([raw_data])\n",
" result = model.predict(data)\n",
" return { 'outcome' : result[0] }\n",
" except Exception as e:\n",
" error = str(e)\n",
" return error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create the InferenceConfig\n",
"Create the inference config that will be used when deploying the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"\n",
"inf_config = InferenceConfig(entry_script='score.py', environment=myenv)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Provision the AKS Cluster with SSL\n",
"This is a one time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, then you would have to recreate it.\n",
"\n",
"See code snippet below. Check the documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-secure-web-service) for more details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the default configuration (can also provide parameters to customize)\n",
"\n",
"provisioning_config = AksCompute.provisioning_configuration()\n",
"# Leaf domain label generates a name using the formula\n",
"# \"<leaf-domain-label>######.<azure-region>.cloudapp.azure.net\"\n",
"# where \"######\" is a random series of characters\n",
"provisioning_config.enable_ssl(leaf_domain_label = \"contoso\")\n",
"\n",
"aks_name = 'my-aks-ssl-1' \n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = provisioning_config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy web service to AKS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"sample-deploy-to-aks"
]
},
"outputs": [],
"source": [
"%%time\n",
"\n",
"aks_config = AksWebservice.deploy_configuration()\n",
"\n",
"aks_service_name ='aks-service-ssl-1'\n",
"\n",
"aks_service = Model.deploy(workspace=ws,\n",
" name=aks_service_name,\n",
" models=[model],\n",
" inference_config=inf_config,\n",
" deployment_config=aks_config,\n",
" deployment_target=aks_target,\n",
" overwrite=True)\n",
"\n",
"aks_service.wait_for_deployment(show_output = True)\n",
"print(aks_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test the web service using run method\n",
"We test the web sevice by passing data.\n",
"Run() method retrieves API keys behind the scenes to make sure that call is authenticated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"import json\n",
"\n",
"standard_sample_input = json.dumps({'param': {'a': 10, 'b': 9, 'c': 8, 'd': 7, 'e': 6, 'f': 5, 'g': 4, 'h': 3, 'i': 2, 'j': 1 }})\n",
"\n",
"aks_service.run(input_data=standard_sample_input)"
]
},
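{
"cell_type": "markdown",
"metadata": {},
"source": [
"The run() call above handles authentication for you. As a minimal sketch, the next cell sends the equivalent raw HTTPS request with the `requests` library; it assumes key-based authentication (the default for AKS web services) and reuses the `standard_sample_input` payload from the previous cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# Retrieve the service keys and call the scoring endpoint directly over HTTPS.\n",
"primary_key, secondary_key = aks_service.get_keys()\n",
"headers = {'Content-Type': 'application/json',\n",
"           'Authorization': 'Bearer ' + primary_key}\n",
"\n",
"resp = requests.post(aks_service.scoring_uri, data=standard_sample_input, headers=headers)\n",
"print(resp.status_code)\n",
"print(resp.text)"
]
},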
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Clean up\n",
"Delete the service, image and model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_service.delete()\n",
"model.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "vaidyas"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,8 @@
name: production-deploy-to-aks-ssl
dependencies:
- pip:
- azureml-sdk
- matplotlib
- tqdm
- scipy
- sklearn

View File

@@ -109,7 +109,7 @@
"from azureml.core import Environment\n", "from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies \n", "from azureml.core.conda_dependencies import CondaDependencies \n",
"\n", "\n",
"conda_deps = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults'])\n", "conda_deps = CondaDependencies.create(conda_packages=['numpy','scikit-learn==0.19.1','scipy'], pip_packages=['azureml-defaults'])\n",
"myenv = Environment(name='myenv')\n", "myenv = Environment(name='myenv')\n",
"myenv.python.conda_dependencies = conda_deps" "myenv.python.conda_dependencies = conda_deps"
] ]
@@ -212,11 +212,21 @@
"- an inference configuration\n", "- an inference configuration\n",
"- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n", "- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n",
"\n", "\n",
"Please, note that profiling is a long running operation and can take up to 25 minutes depending on the size of the dataset.\n",
"\n",
"At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n", "At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n",
"\n", "\n",
"Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent." "Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may want to register datasets using the register() method to your workspace so they can be shared with others, reused and referred to by name in your script.\n",
"You can try get the dataset first to see if it's already registered."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -228,31 +238,41 @@
"from azureml.core.dataset import Dataset\n", "from azureml.core.dataset import Dataset\n",
"from azureml.data import dataset_type_definitions\n", "from azureml.data import dataset_type_definitions\n",
"\n", "\n",
"input_json = {'data': [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n", "dataset_name='sample_request_data'\n",
" [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]}\n",
"# create a string that can be put in the body of the request\n",
"serialized_input_json = json.dumps(input_json)\n",
"dataset_content = []\n",
"for i in range(100):\n",
" dataset_content.append(serialized_input_json)\n",
"sample_request_data = '\\n'.join(dataset_content)\n",
"file_name = 'sample_request_data.txt'\n",
"f = open(file_name, 'w')\n",
"f.write(sample_request_data)\n",
"f.close()\n",
"\n", "\n",
"# upload the txt file created above to the Datastore and create a dataset from it\n", "dataset_registered = False\n",
"data_store = Datastore.get_default(ws)\n", "try:\n",
"data_store.upload_files(['./' + file_name], target_path='sample_request_data')\n", " sample_request_data = Dataset.get_by_name(workspace = ws, name = dataset_name)\n",
"datastore_path = [(data_store, 'sample_request_data' +'/' + file_name)]\n", " dataset_registered = True\n",
"sample_request_data = Dataset.Tabular.from_delimited_files(\n", "except:\n",
" datastore_path,\n", " print(\"The dataset {} is not registered in workspace yet.\".format(dataset_name))\n",
" separator='\\n',\n", "\n",
" infer_column_types=True,\n", "if not dataset_registered:\n",
" header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n", " input_json = {'data': [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n",
"sample_request_data = sample_request_data.register(workspace=ws,\n", " [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]}\n",
" name='sample_request_data',\n", " # create a string that can be put in the body of the request\n",
" create_new_version=True)" " serialized_input_json = json.dumps(input_json)\n",
" dataset_content = []\n",
" for i in range(100):\n",
" dataset_content.append(serialized_input_json)\n",
" sample_request_data = '\\n'.join(dataset_content)\n",
" file_name = \"{}.txt\".format(dataset_name)\n",
" f = open(file_name, 'w')\n",
" f.write(sample_request_data)\n",
" f.close()\n",
"\n",
" # upload the txt file created above to the Datastore and create a dataset from it\n",
" data_store = Datastore.get_default(ws)\n",
" data_store.upload_files(['./' + file_name], target_path='sample_request_data')\n",
" datastore_path = [(data_store, 'sample_request_data' +'/' + file_name)]\n",
" sample_request_data = Dataset.Tabular.from_delimited_files(\n",
" datastore_path,\n",
" separator='\\n',\n",
" infer_column_types=True,\n",
" header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n",
" sample_request_data = sample_request_data.register(workspace=ws,\n",
" name=dataset_name,\n",
" create_new_version=True)"
] ]
}, },
{ {
@@ -280,7 +300,8 @@
" 'inference-schema[numpy-support]',\n", " 'inference-schema[numpy-support]',\n",
" 'joblib',\n", " 'joblib',\n",
" 'numpy',\n", " 'numpy',\n",
" 'scikit-learn'\n", " 'scikit-learn==0.19.1',\n",
" 'scipy'\n",
"])\n", "])\n",
"inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n", "inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n",
"# if cpu and memory_in_gb parameters are not provided\n", "# if cpu and memory_in_gb parameters are not provided\n",
@@ -294,6 +315,7 @@
" cpu=1.0,\n", " cpu=1.0,\n",
" memory_in_gb=0.5)\n", " memory_in_gb=0.5)\n",
"\n", "\n",
"# profiling is a long running operation and may take up to 25 min\n",
"profile.wait_for_completion(True)\n", "profile.wait_for_completion(True)\n",
"details = profile.get_details()" "details = profile.get_details()"
] ]
@@ -560,7 +582,7 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "aashishb" "name": "vaidyas"
} }
], ],
"kernelspec": { "kernelspec": {

View File

@@ -302,7 +302,7 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "aashishb" "name": "vaidyas"
} }
], ],
"category": "deployment", "category": "deployment",

File diff suppressed because one or more lines are too long

View File

@@ -1,260 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/tensorflow/tensorflow-model-register-and-deploy.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Register TensorFlow SavedModel and deploy as webservice\n",
"\n",
"Following this notebook, you will:\n",
"\n",
" - Learn how to register a TF SavedModel in your Azure Machine Learning Workspace.\n",
" - Deploy your model as a web service in an Azure Container Instance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) to install the Azure Machine Learning Python SDK and create a workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"\n",
"# Check core SDK version number.\n",
"print('SDK version:', azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"\n",
"Create a [Workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace%28class%29?view=azure-ml-py) object from your persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download the Model\n",
"\n",
"Download and extract the model from https://amlsamplenotebooksdata.blob.core.windows.net/data/flowers_model.tar.gz to \"models\" directory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tarfile\n",
"import urllib.request\n",
"\n",
"# create directory for model\n",
"model_dir = 'models'\n",
"if not os.path.isdir(model_dir):\n",
" os.mkdir(model_dir)\n",
"\n",
"url=\"https://amlsamplenotebooksdata.blob.core.windows.net/data/flowers_model.tar.gz\"\n",
"response = urllib.request.urlretrieve(url, model_dir + \"/flowers_model.tar.gz\")\n",
"tar = tarfile.open(model_dir + \"/flowers_model.tar.gz\", \"r:gz\")\n",
"tar.extractall(model_dir)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Register model\n",
"\n",
"Register a file or folder as a model by calling [Model.register()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#register-workspace--model-path--model-name--tags-none--properties-none--description-none--datasets-none--model-framework-none--model-framework-version-none--child-paths-none-). For this example, we have provided a TensorFlow SavedModel (`flowers_model` in the notebook's directory).\n",
"\n",
"In addition to the content of the model file itself, your registered model will also store model metadata -- model description, tags, and framework information -- that will be useful when managing and deploying models in your workspace. Using tags, for instance, you can categorize your models and apply filters when listing models in your workspace. Also, marking this model with the scikit-learn framework will simplify deploying it as a web service, as we'll see later."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"from azureml.core import Model\n",
"\n",
"model = Model.register(workspace=ws,\n",
" model_name='flowers', # Name of the registered model in your workspace.\n",
" model_path= model_dir + '/flowers_model', # Local Tensorflow SavedModel folder to upload and register as a model.\n",
" model_framework=Model.Framework.TENSORFLOW, # Framework used to create the model.\n",
" model_framework_version='1.14.0', # Version of Tensorflow used to create the model.\n",
" description='Flowers model')\n",
"\n",
"print('Name:', model.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy model\n",
"\n",
"Deploy your model as a web service using [Model.deploy()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#deploy-workspace--name--models--inference-config--deployment-config-none--deployment-target-none-). Web services take one or more models, load them in an environment, and run them on one of several supported deployment targets.\n",
"\n",
"For this example, we will deploy your TensorFlow SavedModel to an Azure Container Instance (ACI)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use a default environment (for supported models)\n",
"\n",
"The Azure Machine Learning service provides a default environment for supported model frameworks, including TensorFlow, based on the metadata you provided when registering your model. This is the easiest way to deploy your model.\n",
"\n",
"**Note**: This step can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Webservice\n",
"from azureml.exceptions import WebserviceException\n",
"\n",
"service_name = 'tensorflow-flower-service'\n",
"\n",
"# Remove any existing service under the same name.\n",
"try:\n",
" Webservice(ws, service_name).delete()\n",
"except WebserviceException:\n",
" pass\n",
"\n",
"service = Model.deploy(ws, service_name, [model])\n",
"service.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After your model is deployed, perform a call to the web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"headers = {'Content-Type': 'application/json'}\n",
"\n",
"if service.auth_enabled:\n",
" headers['Authorization'] = 'Bearer '+ service.get_keys()[0]\n",
"elif service.token_auth_enabled:\n",
" headers['Authorization'] = 'Bearer '+ service.get_token()[0]\n",
"\n",
"scoring_uri = service.scoring_uri # If you have a SavedModel with classify and regress, \n",
" # you can change the scoring_uri from 'uri:predict' to 'uri:classify' or 'uri:regress'.\n",
"print(scoring_uri)\n",
"\n",
"with open('tensorflow-flower-predict-input.json', 'rb') as data_file:\n",
" response = requests.post(\n",
" scoring_uri, data=data_file, headers=headers)\n",
"print(response.status_code)\n",
"print(response.elapsed)\n",
"print(response.json())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you are finished testing your service, clean up the deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"service.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "aashishb"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,4 +0,0 @@
name: tensorflow-model-register-and-deploy
dependencies:
- pip:
- azureml-sdk

View File

@@ -58,7 +58,7 @@
"\n", "\n",
"Problem: Boston Housing Price Prediction with scikit-learn (train a model and run an explainer remotely via AMLCompute, and download and visualize the remotely-calculated explanations.)\n", "Problem: Boston Housing Price Prediction with scikit-learn (train a model and run an explainer remotely via AMLCompute, and download and visualize the remotely-calculated explanations.)\n",
"\n", "\n",
"| ![explanations-run-history](./img/explanations-run-history.PNG) |\n", "| ![explanations-run-history](./img/explanations-run-history.png) |\n",
"|:--:|\n" "|:--:|\n"
] ]
}, },
@@ -243,8 +243,29 @@
" 'azureml-interpret', 'sklearn-pandas', 'azureml-dataprep'\n", " 'azureml-interpret', 'sklearn-pandas', 'azureml-dataprep'\n",
"]\n", "]\n",
"\n", "\n",
"# Note: this is to pin the scikit-learn and pandas versions to be same as notebook.\n",
"# In production scenario user would choose their dependencies\n",
"import pkg_resources\n",
"available_packages = pkg_resources.working_set\n",
"sklearn_ver = None\n",
"pandas_ver = None\n",
"for dist in available_packages:\n",
" if dist.key == 'scikit-learn':\n",
" sklearn_ver = dist.version\n",
" elif dist.key == 'pandas':\n",
" pandas_ver = dist.version\n",
"sklearn_dep = 'scikit-learn'\n",
"pandas_dep = 'pandas'\n",
"if sklearn_ver:\n",
" sklearn_dep = 'scikit-learn=={}'.format(sklearn_ver)\n",
"if pandas_ver:\n",
" pandas_dep = 'pandas=={}'.format(pandas_ver)\n",
"# specify CondaDependencies obj\n", "# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", "# The CondaDependencies specifies the conda and pip packages that are installed in the environment\n",
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=azureml_pip_packages)\n", " pip_packages=azureml_pip_packages)\n",
"\n", "\n",
"# Now submit a run on AmlCompute\n", "# Now submit a run on AmlCompute\n",
@@ -344,8 +365,29 @@
" 'azureml-interpret', 'azureml-dataprep'\n", " 'azureml-interpret', 'azureml-dataprep'\n",
"]\n", "]\n",
"\n", "\n",
"# Note: this is to pin the scikit-learn and pandas versions to be same as notebook.\n",
"# In production scenario user would choose their dependencies\n",
"import pkg_resources\n",
"available_packages = pkg_resources.working_set\n",
"sklearn_ver = None\n",
"pandas_ver = None\n",
"for dist in available_packages:\n",
" if dist.key == 'scikit-learn':\n",
" sklearn_ver = dist.version\n",
" elif dist.key == 'pandas':\n",
" pandas_ver = dist.version\n",
"sklearn_dep = 'scikit-learn'\n",
"pandas_dep = 'pandas'\n",
"if sklearn_ver:\n",
" sklearn_dep = 'scikit-learn=={}'.format(sklearn_ver)\n",
"if pandas_ver:\n",
" pandas_dep = 'pandas=={}'.format(pandas_ver)\n",
"# specify CondaDependencies obj\n", "# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", "# The CondaDependencies specifies the conda and pip packages that are installed in the environment\n",
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=azureml_pip_packages)\n", " pip_packages=azureml_pip_packages)\n",
"\n", "\n",
"from azureml.core import Run\n", "from azureml.core import Run\n",
@@ -457,8 +499,29 @@
"\n", "\n",
"\n", "\n",
"\n", "\n",
"# Note: this is to pin the scikit-learn and pandas versions to be same as notebook.\n",
"# In production scenario user would choose their dependencies\n",
"import pkg_resources\n",
"available_packages = pkg_resources.working_set\n",
"sklearn_ver = None\n",
"pandas_ver = None\n",
"for dist in available_packages:\n",
" if dist.key == 'scikit-learn':\n",
" sklearn_ver = dist.version\n",
" elif dist.key == 'pandas':\n",
" pandas_ver = dist.version\n",
"sklearn_dep = 'scikit-learn'\n",
"pandas_dep = 'pandas'\n",
"if sklearn_ver:\n",
" sklearn_dep = 'scikit-learn=={}'.format(sklearn_ver)\n",
"if pandas_ver:\n",
" pandas_dep = 'pandas=={}'.format(pandas_ver)\n",
"# specify CondaDependencies obj\n", "# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", "# The CondaDependencies specifies the conda and pip packages that are installed in the environment\n",
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=azureml_pip_packages)\n", " pip_packages=azureml_pip_packages)\n",
"\n", "\n",
"from azureml.core import Run\n", "from azureml.core import Run\n",
@@ -621,7 +684,7 @@
"source": [ "source": [
"# retrieve model for visualization and deployment\n", "# retrieve model for visualization and deployment\n",
"from azureml.core.model import Model\n", "from azureml.core.model import Model\n",
"from sklearn.externals import joblib\n", "import joblib\n",
"original_model = Model(ws, 'model_explain_model_on_amlcomp')\n", "original_model = Model(ws, 'model_explain_model_on_amlcomp')\n",
"model_path = original_model.download(exist_ok=True)\n", "model_path = original_model.download(exist_ok=True)\n",
"original_model = joblib.load(model_path)" "original_model = joblib.load(model_path)"
@@ -641,7 +704,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# retrieve x_test for visualization\n", "# retrieve x_test for visualization\n",
"from sklearn.externals import joblib\n", "import joblib\n",
"x_test_path = './x_test_boston_housing.pkl'\n", "x_test_path = './x_test_boston_housing.pkl'\n",
"run.download_file('x_test_boston_housing.pkl', output_file_path=x_test_path)" "run.download_file('x_test_boston_housing.pkl', output_file_path=x_test_path)"
] ]
@@ -687,15 +750,16 @@
"source": [ "source": [
"## Next\n", "## Next\n",
"Learn about other use cases of the explain package on a:\n", "Learn about other use cases of the explain package on a:\n",
"1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n", "1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb) \n",
"1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n", "1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n",
"1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n", "1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n",
"1. Explain models with engineered features:\n", "1. Explain models with engineered features:\n",
" 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n", " 1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n",
" 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n", " 1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n",
"1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n", "1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
"1. Inferencing time: deploy a classification model and explainer:\n", "1. Inferencing time: deploy a classification model and explainer:\n",
" 1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n", " 1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
" 1. [Deploy a locally-trained keras model and explainer](../scoring-time/train-explain-model-keras-locally-and-deploy.ipynb)\n",
" 1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)" " 1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
] ]
}, },

View File

@@ -3,6 +3,8 @@ dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- azureml-interpret - azureml-interpret
- interpret-community[visualization]
- matplotlib
- azureml-contrib-interpret - azureml-contrib-interpret
- sklearn-pandas - sklearn-pandas
- azureml-dataprep - azureml-dataprep

View File

@@ -7,7 +7,7 @@ from interpret.ext.blackbox import TabularExplainer
from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient
from sklearn.model_selection import train_test_split from sklearn.model_selection import train_test_split
from azureml.core.run import Run from azureml.core.run import Run
from sklearn.externals import joblib import joblib
import os import os
import numpy as np import numpy as np

View File

@@ -582,15 +582,16 @@
"source": [ "source": [
"## Next\n", "## Next\n",
"Learn about other use cases of the explain package on a:\n", "Learn about other use cases of the explain package on a:\n",
"1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n", "1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb) \n",
"1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n", "1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n",
"1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n", "1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n",
"1. Explain models with engineered features:\n", "1. Explain models with engineered features:\n",
" 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n", " 1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n",
" 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n", " 1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n",
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n", "1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n",
"1. Inferencing time: deploy a classification model and explainer:\n", "1. Inferencing time: deploy a classification model and explainer:\n",
" 1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n", " 1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
" 1. [Deploy a locally-trained keras model and explainer](../scoring-time/train-explain-model-keras-locally-and-deploy.ipynb)\n",
" 1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)" " 1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
] ]
}, },

View File

@@ -3,5 +3,7 @@ dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- azureml-interpret - azureml-interpret
- interpret-community[visualization]
- matplotlib
- azureml-contrib-interpret - azureml-contrib-interpret
- ipywidgets - ipywidgets

View File

@@ -3,7 +3,7 @@ import numpy as np
import pandas as pd import pandas as pd
import os import os
import pickle import pickle
from sklearn.externals import joblib import joblib
from sklearn.linear_model import LogisticRegression from sklearn.linear_model import LogisticRegression
from azureml.core.model import Model from azureml.core.model import Model

View File

@@ -3,7 +3,7 @@ import numpy as np
import pandas as pd import pandas as pd
import os import os
import pickle import pickle
from sklearn.externals import joblib import joblib
from sklearn.linear_model import LogisticRegression from sklearn.linear_model import LogisticRegression
from azureml.core.model import Model from azureml.core.model import Model

View File

@@ -165,7 +165,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"from sklearn.model_selection import train_test_split\n", "from sklearn.model_selection import train_test_split\n",
"from sklearn.externals import joblib\n", "import joblib\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn.impute import SimpleImputer\n", "from sklearn.impute import SimpleImputer\n",
"from sklearn.pipeline import Pipeline\n", "from sklearn.pipeline import Pipeline\n",
@@ -328,8 +328,29 @@
"]\n", "]\n",
" \n", " \n",
"\n", "\n",
"# Note: this is to pin the scikit-learn and pandas versions to be same as notebook.\n",
"# In production scenario user would choose their dependencies\n",
"import pkg_resources\n",
"available_packages = pkg_resources.working_set\n",
"sklearn_ver = None\n",
"pandas_ver = None\n",
"for dist in available_packages:\n",
" if dist.key == 'scikit-learn':\n",
" sklearn_ver = dist.version\n",
" elif dist.key == 'pandas':\n",
" pandas_ver = dist.version\n",
"sklearn_dep = 'scikit-learn'\n",
"pandas_dep = 'pandas'\n",
"if sklearn_ver:\n",
" sklearn_dep = 'scikit-learn=={}'.format(sklearn_ver)\n",
"if pandas_ver:\n",
" pandas_dep = 'pandas=={}'.format(pandas_ver)\n",
"# specify CondaDependencies obj\n", "# specify CondaDependencies obj\n",
"myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas'],\n", "# The CondaDependencies specifies the conda and pip packages that are installed in the environment\n",
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"myenv = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=['sklearn-pandas', 'pyyaml'] + azureml_pip_packages,\n", " pip_packages=['sklearn-pandas', 'pyyaml'] + azureml_pip_packages,\n",
" pin_sdk_version=False)\n", " pin_sdk_version=False)\n",
"\n", "\n",
@@ -396,9 +417,9 @@
"headers = {'Content-Type':'application/json'}\n", "headers = {'Content-Type':'application/json'}\n",
"\n", "\n",
"# send request to service\n", "# send request to service\n",
"print(\"POST to url\", service.scoring_uri)\n",
"resp = requests.post(service.scoring_uri, sample_data, headers=headers)\n", "resp = requests.post(service.scoring_uri, sample_data, headers=headers)\n",
"\n", "\n",
"print(\"POST to url\", service.scoring_uri)\n",
"# can covert back to Python objects from json string if desired\n", "# can covert back to Python objects from json string if desired\n",
"print(\"prediction:\", resp.text)\n", "print(\"prediction:\", resp.text)\n",
"result = json.loads(resp.text)" "result = json.loads(resp.text)"
@@ -445,15 +466,16 @@
"source": [ "source": [
"## Next\n", "## Next\n",
"Learn about other use cases of the explain package on a:\n", "Learn about other use cases of the explain package on a:\n",
"1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n", "1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb) \n",
"1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n", "1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n",
"1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n", "1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n",
"1. Explain models with engineered features:\n", "1. Explain models with engineered features:\n",
" 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n", " 1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n",
" 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n", " 1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n",
"1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n", "1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n", "1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n",
"1. [Inferencing time: deploy a remotely-trained model and explainer](./train-explain-model-on-amlcompute-and-deploy.ipynb)" "1. [Inferencing time: deploy a remotely-trained model and explainer](./train-explain-model-on-amlcompute-and-deploy.ipynb)\n",
"1. [Inferencing time: deploy a locally-trained keras model and explainer](./train-explain-model-keras-locally-and-deploy.ipynb)"
] ]
}, },
{ {

View File

@@ -3,6 +3,8 @@ dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- azureml-interpret - azureml-interpret
- interpret-community[visualization]
- matplotlib
- azureml-contrib-interpret - azureml-contrib-interpret
- sklearn-pandas - sklearn-pandas
- ipywidgets - ipywidgets

View File

@@ -63,7 +63,7 @@
"7.\tCreate an image and register it in the image registry.\n", "7.\tCreate an image and register it in the image registry.\n",
"8.\tDeploy the image as a web service in Azure.\n", "8.\tDeploy the image as a web service in Azure.\n",
"\n", "\n",
"| ![azure-machine-learning-cycle](./img/azure-machine-learning-cycle.PNG) |\n", "| ![azure-machine-learning-cycle](./img/azure-machine-learning-cycle.png) |\n",
"|:--:|" "|:--:|"
] ]
}, },
@@ -246,8 +246,29 @@
" \n", " \n",
"\n", "\n",
"\n", "\n",
"# Note: this is to pin the scikit-learn version to be same as notebook.\n",
"# In production scenario user would choose their dependencies\n",
"import pkg_resources\n",
"available_packages = pkg_resources.working_set\n",
"sklearn_ver = None\n",
"pandas_ver = None\n",
"for dist in available_packages:\n",
" if dist.key == 'scikit-learn':\n",
" sklearn_ver = dist.version\n",
" elif dist.key == 'pandas':\n",
" pandas_ver = dist.version\n",
"sklearn_dep = 'scikit-learn'\n",
"pandas_dep = 'pandas'\n",
"if sklearn_ver:\n",
" sklearn_dep = 'scikit-learn=={}'.format(sklearn_ver)\n",
"if pandas_ver:\n",
" pandas_dep = 'pandas=={}'.format(pandas_ver)\n",
"# specify CondaDependencies obj\n", "# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n", "# The CondaDependencies specifies the conda and pip packages that are installed in the environment\n",
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=['sklearn_pandas', 'pyyaml'] + azureml_pip_packages,\n", " pip_packages=['sklearn_pandas', 'pyyaml'] + azureml_pip_packages,\n",
" pin_sdk_version=False)\n", " pin_sdk_version=False)\n",
"# Now submit a run on AmlCompute\n", "# Now submit a run on AmlCompute\n",
@@ -308,7 +329,7 @@
"source": [ "source": [
"# retrieve model for visualization and deployment\n", "# retrieve model for visualization and deployment\n",
"from azureml.core.model import Model\n", "from azureml.core.model import Model\n",
"from sklearn.externals import joblib\n", "import joblib\n",
"original_model = Model(ws, 'amlcompute_deploy_model')\n", "original_model = Model(ws, 'amlcompute_deploy_model')\n",
"model_path = original_model.download(exist_ok=True)\n", "model_path = original_model.download(exist_ok=True)\n",
"original_svm_model = joblib.load(model_path)" "original_svm_model = joblib.load(model_path)"
@@ -335,7 +356,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# retrieve x_test for visualization\n", "# retrieve x_test for visualization\n",
"from sklearn.externals import joblib\n", "import joblib\n",
"x_test_path = './x_test.pkl'\n", "x_test_path = './x_test.pkl'\n",
"run.download_file('x_test_ibm.pkl', output_file_path=x_test_path)\n", "run.download_file('x_test_ibm.pkl', output_file_path=x_test_path)\n",
"x_test = joblib.load(x_test_path)" "x_test = joblib.load(x_test_path)"
@@ -397,8 +418,29 @@
"]\n", "]\n",
" \n", " \n",
"\n", "\n",
"# Note: this is to pin the scikit-learn and pandas versions to be same as notebook.\n",
"# In production scenario user would choose their dependencies\n",
"import pkg_resources\n",
"available_packages = pkg_resources.working_set\n",
"sklearn_ver = None\n",
"pandas_ver = None\n",
"for dist in available_packages:\n",
" if dist.key == 'scikit-learn':\n",
" sklearn_ver = dist.version\n",
" elif dist.key == 'pandas':\n",
" pandas_ver = dist.version\n",
"sklearn_dep = 'scikit-learn'\n",
"pandas_dep = 'pandas'\n",
"if sklearn_ver:\n",
" sklearn_dep = 'scikit-learn=={}'.format(sklearn_ver)\n",
"if pandas_ver:\n",
" pandas_dep = 'pandas=={}'.format(pandas_ver)\n",
"# specify CondaDependencies obj\n", "# specify CondaDependencies obj\n",
"myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas'],\n", "# The CondaDependencies specifies the conda and pip packages that are installed in the environment\n",
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"myenv = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=['sklearn-pandas', 'pyyaml'] + azureml_pip_packages,\n", " pip_packages=['sklearn-pandas', 'pyyaml'] + azureml_pip_packages,\n",
" pin_sdk_version=False)\n", " pin_sdk_version=False)\n",
"\n", "\n",
@@ -461,9 +503,9 @@
"headers = {'Content-Type':'application/json'}\n", "headers = {'Content-Type':'application/json'}\n",
"\n", "\n",
"# send request to service\n", "# send request to service\n",
"print(\"POST to url\", service.scoring_uri)\n",
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
"\n", "\n",
"print(\"POST to url\", service.scoring_uri)\n",
"# can covert back to Python objects from json string if desired\n", "# can covert back to Python objects from json string if desired\n",
"print(\"prediction:\", resp.text)" "print(\"prediction:\", resp.text)"
] ]
@@ -483,16 +525,16 @@
"source": [ "source": [
"## Next\n", "## Next\n",
"Learn about other use cases of the explain package on a:\n", "Learn about other use cases of the explain package on a:\n",
"1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n", "1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb) \n",
"1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n", "1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n",
"1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n", "1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n",
"1. Explain models with engineered features:\n", "1. Explain models with engineered features:\n",
" 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n", " 1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n",
" 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n", " 1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n",
"1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n", "1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n", "1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n",
"1. [Inferencing time: deploy a locally-trained model and explainer](./train-explain-model-locally-and-deploy.ipynb)\n", "1. [Inferencing time: deploy a locally-trained model and explainer](./train-explain-model-locally-and-deploy.ipynb)\n",
" " "1. [Inferencing time: deploy a locally-trained keras model and explainer](./train-explain-model-keras-locally-and-deploy.ipynb)"
] ]
}, },
{ {

View File

@@ -3,6 +3,8 @@ dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- azureml-interpret - azureml-interpret
- interpret-community[visualization]
- matplotlib
- azureml-contrib-interpret - azureml-contrib-interpret
- sklearn-pandas - sklearn-pandas
- azureml-dataprep - azureml-dataprep

View File

@@ -6,7 +6,7 @@ import os
import pandas as pd import pandas as pd
import zipfile import zipfile
from sklearn.model_selection import train_test_split from sklearn.model_selection import train_test_split
from sklearn.externals import joblib import joblib
from sklearn.preprocessing import StandardScaler, OneHotEncoder from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline from sklearn.pipeline import Pipeline

View File

@@ -252,7 +252,7 @@
"source": [ "source": [
"binaries_folder = \"azurebatch/job_binaries\"\n", "binaries_folder = \"azurebatch/job_binaries\"\n",
"if not os.path.isdir(binaries_folder):\n", "if not os.path.isdir(binaries_folder):\n",
" os.mkdir(binaries_folder)\n", " os.makedirs(binaries_folder)\n",
"\n", "\n",
"file_name=\"azurebatch.cmd\"\n", "file_name=\"azurebatch.cmd\"\n",
"with open(path.join(binaries_folder, file_name), 'w') as f:\n", "with open(path.join(binaries_folder, file_name), 'w') as f:\n",

View File

@@ -537,259 +537,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Deploy the model in ACI\n", "For model deployment, please refer to [Training, hyperparameter tune, and deploy with TensorFlow](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb)."
"Now we are ready to deploy the model as a web service running in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/). \n",
"### Create score.py\n",
"First, we will create a scoring script that will be invoked by the web service call. \n",
"\n",
"* Note that the scoring script must have two required functions, `init()` and `run(input_data)`. \n",
" * In `init()` function, you typically load the model into a global object. This function is executed only once when the Docker container is started. \n",
" * In `run(input_data)` function, the model is used to predict a value based on the input data. The input and output to `run` typically use JSON as serialization and de-serialization format but you are not limited to that."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import json\n",
"import numpy as np\n",
"import os\n",
"import tensorflow as tf\n",
"\n",
"def init():\n",
" global X, output, sess\n",
" tf.reset_default_graph()\n",
" # AZUREML_MODEL_DIR is an environment variable created during deployment.\n",
" # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n",
" # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n",
" model_root = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model')\n",
" saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))\n",
" X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n",
" output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n",
" \n",
" sess = tf.Session()\n",
" saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))\n",
"\n",
"def run(raw_data):\n",
" data = np.array(json.loads(raw_data)['data'])\n",
" # make prediction\n",
" out = output.eval(session=sess, feed_dict={X: data})\n",
" y_hat = np.argmax(out, axis=1)\n",
" return y_hat.tolist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create myenv.yml\n",
"We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify packages `numpy`, `tensorflow`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import CondaDependencies\n",
"\n",
"cd = CondaDependencies.create()\n",
"cd.add_conda_package('numpy')\n",
"cd.add_tensorflow_conda_package()\n",
"cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n",
"\n",
"print(cd.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy to ACI\n",
"Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scene, AzureML will build a Docker container image with the given configuration, if already not available. This image will be deployed to the ACI infrastructure and the scoring script and model will be mounted on the container. The model will then be available as a web service with an HTTP endpoint to accept REST client calls."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"from azureml.core.environment import Environment\n",
"from azureml.core.model import Model, InferenceConfig\n",
"from azureml.core.webservice import AciWebservice\n",
"\n",
"\n",
"myenv = Environment.from_conda_specification(name=\"env\", file_path=\"myenv.yml\")\n",
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
" memory_gb=1, \n",
" tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n",
" description='Tensorflow DNN on MNIST')\n",
"\n",
"service = Model.deploy(ws, 'tf-mnist-svc', [model], inference_config, aciconfig)\n",
"service.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(service.get_logs())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the scoring web service endpoint:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(service.scoring_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test the deployed model\n",
"Let's test the deployed model. Pick 30 random samples from the test set, and send it to the web service hosted in ACI. Note here we are using the `run` API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool such as curl.\n",
"\n",
"After the invocation, we print the returned predictions and plot them along with the input images. Use red font color and inversed image (white on black) to highlight the misclassified samples. Note since the model accuracy is pretty high, you might have to run the below cell a few times before you can see a misclassified sample."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# find 30 random samples from test set\n",
"n = 30\n",
"sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
"\n",
"test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n",
"test_samples = bytes(test_samples, encoding='utf8')\n",
"\n",
"# predict using the deployed model\n",
"result = service.run(input_data=test_samples)\n",
"\n",
"# compare actual value vs. the predicted values:\n",
"i = 0\n",
"plt.figure(figsize = (20, 1))\n",
"\n",
"for s in sample_indices:\n",
" plt.subplot(1, n, i + 1)\n",
" plt.axhline('')\n",
" plt.axvline('')\n",
" \n",
" # use different color for misclassified sample\n",
" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
" \n",
" plt.text(x=10, y=-10, s=y_hat[s], fontsize=18, color=font_color)\n",
" plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
" \n",
" i = i + 1\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also send raw HTTP request to the service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# send a random row from the test set to score\n",
"random_index = np.random.randint(0, len(X_test)-1)\n",
"input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n",
"\n",
"headers = {'Content-Type':'application/json'}\n",
"\n",
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
"\n",
"print(\"POST to url\", service.scoring_uri)\n",
"print(\"input data:\", input_data)\n",
"print(\"label:\", y_test[random_index])\n",
"print(\"prediction:\", resp.text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the workspace after the web service was deployed. You should see \n",
"* a registered model named 'model' and with the id 'model:1'\n",
"* an image called 'tf-mnist' and with a docker image location pointing to your workspace's Azure Container Registry (ACR) \n",
"* a webservice called 'tf-mnist' with some scoring URL"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"models = ws.models\n",
"for name, model in models.items():\n",
" print(\"Model: {}, ID: {}\".format(name, model.id))\n",
" \n",
"images = ws.images\n",
"for name, image in images.items():\n",
" print(\"Image: {}, location: {}\".format(name, image.image_location))\n",
" \n",
"webservices = ws.webservices\n",
"for name, webservice in webservices.items():\n",
" print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"You can delete the ACI deployment with a simple delete API call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"service.delete()"
] ]
} }
], ],

View File

@@ -0,0 +1,510 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Showcasing Dataset and PipelineParameter\n",
"\n",
"This notebook demonstrates how a [**FileDataset**](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) or [**TabularDataset**](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) can be parametrized with [**PipelineParameters**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelineparameter?view=azure-ml-py) in an AML [Pipeline](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline(class)?view=azure-ml-py). By parametrizing datasets, you can dynamically run pipeline experiments with different datasets without any code change.\n",
"\n",
"A common use case is building a training pipeline with a sample of your training data for quick iterative development. When you're ready to test and deploy your pipeline at scale, you can pass in your full training dataset to the pipeline experiment without making any changes to your training script. \n",
" \n",
"To see more about how parameters work between steps, please refer [aml-pipelines-with-data-dependency-steps](https://aka.ms/pl-data-dep).\n",
"\n",
"* [How to create a Pipeline with a Dataset PipelineParameter](#index1)\n",
"* [How to submit a Pipeline with a Dataset PipelineParameter](#index2)\n",
"* [How to submit a Pipeline and change the Dataset PipelineParameter value from the sdk](#index3)\n",
"* [How to submit a Pipeline and change the Dataset PipelineParameter value using a REST call](#index4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Azure Machine Learning and Pipeline SDK-specific imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"from azureml.core import Workspace, Experiment, Dataset\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.data.dataset_consumption_config import DatasetConsumptionConfig\n",
"from azureml.widgets import RunDetails\n",
"\n",
"from azureml.pipeline.core import PipelineParameter\n",
"from azureml.pipeline.core import Pipeline, PipelineRun\n",
"from azureml.pipeline.steps import PythonScriptStep\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json\n",
"\n",
"If you don't have a config.json file, go through the [configuration Notebook](https://aka.ms/pl-config) first.\n",
"\n",
"This sets you up with a working config file that has information on your workspace, subscription id, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Azure ML experiment\n",
"\n",
"Let's create an experiment named \"showcasing-dataset\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'showcasing-dataset'\n",
"source_directory = '.'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"experiment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach an AmlCompute cluster\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you get the default `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
" \n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 4)\n",
"\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, timeout_in_minutes = 10)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
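{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can inspect the cluster's current provisioning state and node counts. The cell below is an illustrative check (not required for the rest of the notebook) that uses `get_status()`, as suggested in the comment above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: detailed view of the current AmlCompute status.\n",
"status = compute_target.get_status()\n",
"print(status.serialize())"
]
},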
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dataset Configuration\n",
"\n",
"The following steps detail how to create a [FileDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) and [TabularDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) from an external CSV file, and configure them to be used by a [Pipeline](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline(class)?view=azure-ml-py):\n",
"\n",
"1. Create a dataset from a csv file\n",
"2. Create a [PipelineParameter](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelineparameter?view=azure-ml-py) object and set the `default_value` to the dataset. [PipelineParameter](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelineparameter?view=azure-ml-py) objects enabled arguments to be passed into Pipelines when they are resubmitted after creation. The `name` is referenced later on when we submit additional pipeline runs with different input datasets. \n",
"3. Create a [DatasetConsumptionConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_consumption_config.datasetconsumptionconfig?view=azure-ml-py) object from the [PiepelineParameter](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelineparameter?view=azure-ml-py). The [DatasetConsumptionConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_consumption_config.datasetconsumptionconfig?view=azure-ml-py) object specifies how the dataset should be used by the remote compute where the pipeline is run. **NOTE** only [DatasetConsumptionConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_consumption_config.datasetconsumptionconfig?view=azure-ml-py) objects built on [FileDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) can be set `as_mount()` or `as_download()` on the remote compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"datapath-remarks-sample"
]
},
"outputs": [],
"source": [
"file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')\n",
"file_pipeline_param = PipelineParameter(name=\"file_ds_param\", default_value=file_dataset)\n",
"file_ds_consumption = DatasetConsumptionConfig(\"file_dataset\", file_pipeline_param).as_mount()\n",
"\n",
"tabular_dataset = Dataset.Tabular.from_delimited_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')\n",
"tabular_pipeline_param = PipelineParameter(name=\"tabular_ds_param\", default_value=tabular_dataset)\n",
"tabular_ds_consumption = DatasetConsumptionConfig(\"tabular_dataset\", tabular_pipeline_param)"
]
},
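{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check (illustrative, not required by the pipeline), you can preview the default datasets behind the two PipelineParameters locally. This assumes `pandas` is installed in your local environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: preview the default datasets locally before wiring them into a step.\n",
"print(file_dataset.to_path())  # file(s) referenced by the FileDataset\n",
"tabular_dataset.take(3).to_pandas_dataframe()  # first rows of the TabularDataset"
]
},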
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will setup a training script to ingest our passed-in datasets and print their contents. **NOTE** the names of the datasets referenced inside the training script correspond to the `name` of their respective [DatasetConsumptionConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_consumption_config.datasetconsumptionconfig?view=azure-ml-py) objects we defined above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile train_with_dataset.py\n",
"from azureml.core import Run\n",
"\n",
"input_file_ds_path = Run.get_context().input_datasets['file_dataset']\n",
"with open(input_file_ds_path, 'r') as f:\n",
" content = f.read()\n",
" print(content)\n",
"\n",
"input_tabular_ds = Run.get_context().input_datasets['tabular_dataset']\n",
"tabular_df = input_tabular_ds.to_pandas_dataframe()\n",
"print(tabular_df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='index1'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Pipeline with a Dataset PipelineParameter\n",
"\n",
"Note that the ```file_ds_consumption``` and ```tabular_ds_consumption``` are specified as both arguments and inputs to create a step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_step = PythonScriptStep(\n",
" name=\"train_step\",\n",
" script_name=\"train_with_dataset.py\",\n",
" arguments=[\"--param1\", file_ds_consumption, \"--param2\", tabular_ds_consumption],\n",
" inputs=[file_ds_consumption, tabular_ds_consumption],\n",
" compute_target=compute_target,\n",
" source_directory=source_directory)\n",
"\n",
"print(\"train_step created\")\n",
"\n",
"pipeline = Pipeline(workspace=ws, steps=[train_step])\n",
"print(\"pipeline with the train_step created\")"
]
},
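{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration only: because the consumption configs are also passed in `arguments`, a training script could read them from the command line as well. A minimal sketch (using the `--param1`/`--param2` names from the step above) is shown below; with `as_mount()` the file dataset argument resolves to a mount path, while the tabular dataset argument resolves to a dataset identifier. The `train_with_dataset.py` script above uses `Run.get_context().input_datasets` instead, so this is not required for this notebook.\n",
"\n",
"```python\n",
"import argparse\n",
"\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument('--param1', dest='file_ds_path')   # mount path of the FileDataset\n",
"parser.add_argument('--param2', dest='tabular_ds_id')  # identifier of the TabularDataset\n",
"args, _ = parser.parse_known_args()\n",
"print(args.file_ds_path, args.tabular_ds_id)\n",
"```"
]
},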
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='index2'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit a Pipeline with a Dataset PipelineParameter\n",
"\n",
"Pipelines can be submitted with default values of PipelineParameters by not specifying any parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Pipeline will run with default file_ds and tabular_ds\n",
"pipeline_run = experiment.submit(pipeline)\n",
"print(\"Pipeline is submitted for execution\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"RunDetails(pipeline_run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='index3'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit a Pipeline with a different Dataset PipelineParameter value from the SDK\n",
"\n",
"The training pipeline can be reused with different input datasets by passing them in as PipelineParameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iris_file_ds = Dataset.File.from_files('https://raw.githubusercontent.com/Azure/MachineLearningNotebooks/'\n",
" '4e7b3784d50e81c313c62bcdf9a330194153d9cd/how-to-use-azureml/work-with-data/'\n",
" 'datasets-tutorial/train-with-datasets/train-dataset/iris.csv')\n",
"\n",
"iris_tabular_ds = Dataset.Tabular.from_delimited_files('https://raw.githubusercontent.com/Azure/MachineLearningNotebooks/'\n",
" '4e7b3784d50e81c313c62bcdf9a330194153d9cd/how-to-use-azureml/work-with-data/'\n",
" 'datasets-tutorial/train-with-datasets/train-dataset/iris.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run_with_params = experiment.submit(pipeline, pipeline_parameters={'file_ds_param': iris_file_ds, 'tabular_ds_param': iris_tabular_ds}) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"RunDetails(pipeline_run_with_params).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run_with_params.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='index4'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dynamically Set the Dataset PipelineParameter Values using a REST Call\n",
"\n",
"Let's publish the pipeline we created previously, so we can generate a pipeline endpoint. We can then submit the iris datasets to the pipeline REST endpoint by passing in their IDs. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"published_pipeline = pipeline.publish(name=\"Dataset_Pipeline\", description=\"Pipeline to test Dataset PipelineParameter\", continue_on_step_failure=True)\n",
"published_pipeline"
]
},
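{
"cell_type": "markdown",
"metadata": {},
"source": [
"Published pipelines are persisted in the workspace, so you can retrieve this one later by its ID. The cell below is an optional, illustrative lookup using `PublishedPipeline.get`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: retrieve the published pipeline later by its ID.\n",
"from azureml.pipeline.core import PublishedPipeline\n",
"\n",
"retrieved_pipeline = PublishedPipeline.get(ws, id=published_pipeline.id)\n",
"print(retrieved_pipeline.name, retrieved_pipeline.endpoint)"
]
},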
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"published_pipeline.submit(ws, experiment_name=\"publishedexperiment\", pipeline_parameters={'file_ds_param': iris_file_ds, 'tabular_ds_param': iris_tabular_ds})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"import requests\n",
"\n",
"auth = InteractiveLoginAuthentication()\n",
"aad_token = auth.get_authentication_header()\n",
"\n",
"rest_endpoint = published_pipeline.endpoint\n",
"\n",
"print(\"You can perform HTTP POST on URL {} to trigger this pipeline\".format(rest_endpoint))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# specify the param when running the pipeline\n",
"response = requests.post(rest_endpoint, \n",
" headers=aad_token, \n",
" json={\"ExperimentName\": \"MyRestPipeline\",\n",
" \"RunSource\": \"SDK\",\n",
" \"DataSetDefinitionValueAssignments\": {\"file_ds_param\": {\"SavedDataSetReference\": {\"Id\": iris_file_ds.id}},\n",
" \"tabular_ds_param\": {\"SavedDataSetReference\": {\"Id\": iris_tabular_ds.id}}}\n",
" }\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" response.raise_for_status()\n",
"except Exception: \n",
" raise Exception('Received bad response from the endpoint: {}\\n'\n",
" 'Response Code: {}\\n'\n",
" 'Headers: {}\\n'\n",
" 'Content: {}'.format(rest_endpoint, response.status_code, response.headers, response.content))\n",
"\n",
"run_id = response.json().get('Id')\n",
"print('Submitted pipeline run: ', run_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"published_pipeline_run_via_rest = PipelineRun(ws.experiments[\"MyRestPipeline\"], run_id)\n",
"RunDetails(published_pipeline_run_via_rest).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"published_pipeline_run_via_rest.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='index5'></a>"
]
}
],
"metadata": {
"authors": [
{
"name": "rafarmah"
}
],
"category": "tutorial",
"compute": [
"AML Compute"
],
"datasets": [
"Custom"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"Azure ML"
],
"friendly_name": "How to use Dataset as a PipelineParameter",
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"order_index": 13,
"star_tag": [
"featured"
],
"tags": [
"None"
],
"task": "Demonstrates the use of Dataset as a PipelineParameter"
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,5 @@
name: aml-pipelines-showcasing-dataset-and-pipelineparameter
dependencies:
- pip:
- azureml-sdk
- azureml-widgets


@@ -510,7 +510,7 @@
" inputs=[step_1_input],\n", " inputs=[step_1_input],\n",
" num_workers=1,\n", " num_workers=1,\n",
" python_script_path=python_script_path,\n", " python_script_path=python_script_path,\n",
" python_script_params={'arg1', pipeline_param, 'arg2},\n", " python_script_params={'arg1', pipeline_param, 'arg2'},\n",
" run_name='DB_Python_demo',\n", " run_name='DB_Python_demo',\n",
" compute_target=databricks_compute,\n", " compute_target=databricks_compute,\n",
" allow_reuse=True\n", " allow_reuse=True\n",


@@ -70,11 +70,7 @@
"from azureml.core.experiment import Experiment\n", "from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n", "from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n", "from azureml.train.automl import AutoMLConfig\n",
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"from azureml.core.dataset import Dataset\n", "from azureml.core.dataset import Dataset\n",
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n", "\n",
"from azureml.pipeline.steps import AutoMLStep\n", "from azureml.pipeline.steps import AutoMLStep\n",
"\n", "\n",
@@ -105,7 +101,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Create an Azure ML experiment\n", "## Create an Azure ML experiment\n",
"Let's create an experiment named \"automl-classification\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n", "Let's create an experiment named \"automlstep-classification\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n",
"\n", "\n",
"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step." "The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
] ]
@@ -138,45 +134,25 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Choose a name for your cluster.\n", "from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n", "amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n", "\n",
"found = False\n", "# Verify that cluster does not exist already\n",
"# Check if this compute target already exists in the workspace.\n", "try:\n",
"cts = ws.compute_targets\n", " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " print('Found existing cluster, use it.')\n",
" found = True\n", "except ComputeTargetException:\n",
" print('Found existing compute target.')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',# for GPU, use \"STANDARD_NC6\"\n",
" compute_target = cts[amlcompute_cluster_name]\n", " #vm_priority = 'lowpriority', # optional\n",
" \n", " max_nodes=4)\n",
"if not found:\n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 4)\n",
"\n", "\n",
" # Create the cluster.\n", "compute_target.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 10)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", "# For a more detailed view of current AmlCompute status, use get_status()."
" \n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = 1, timeout_in_minutes = 10)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'])\n",
"conda_run_config.environment.python.conda_dependencies = cd\n",
"\n",
"print('run config is ready')"
] ]
}, },
{ {
@@ -192,19 +168,30 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n", "# Try to load the dataset from the Workspace. Otherwise, create it from the file\n",
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n", "found = False\n",
"dataset = Dataset.Tabular.from_delimited_files(example_data)\n", "key = \"Crime-Dataset\"\n",
"dataset.to_pandas_dataframe().describe()" "description_text = \"Crime Dataset (used in the the aml-pipelines-with-automated-machine-learning-step.ipynb notebook)\"\n",
] "\n",
}, "if key in ws.datasets.keys(): \n",
{ " found = True\n",
"cell_type": "code", " dataset = ws.datasets[key] \n",
"execution_count": null, "\n",
"metadata": {}, "if not found:\n",
"outputs": [], " # Create AML Dataset and register it into Workspace\n",
"source": [ " # The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
"dataset.take(5).to_pandas_dataframe()" " example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
" dataset = Dataset.Tabular.from_delimited_files(example_data)\n",
" dataset = dataset.drop_columns(['FBI Code'])\n",
" \n",
" #Register Dataset in Workspace\n",
" dataset = dataset.register(workspace=ws,\n",
" name=key,\n",
" description=description_text)\n",
"\n",
"\n",
"df = dataset.to_pandas_dataframe()\n",
"df.describe()"
] ]
}, },
{ {
@@ -224,9 +211,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X = dataset.drop_columns(columns=['Primary Type', 'FBI Code'])\n", "dataset.take(5).to_pandas_dataframe()"
"y = dataset.keep_columns(columns=['Primary Type'], validate=True)\n",
"print('X and y are ready!')"
] ]
}, },
{ {
@@ -244,19 +229,18 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"automl_settings = {\n", "automl_settings = {\n",
" \"iteration_timeout_minutes\" : 5,\n", " \"experiment_timeout_minutes\": 20,\n",
" \"iterations\" : 2,\n", " \"max_concurrent_iterations\": 4,\n",
" \"primary_metric\" : 'AUC_weighted',\n", " \"primary_metric\" : 'AUC_weighted'\n",
" \"preprocess\" : True,\n",
" \"verbosity\" : logging.INFO\n",
"}\n", "}\n",
"automl_config = AutoMLConfig(task = 'classification',\n", "automl_config = AutoMLConfig(compute_target=compute_target,\n",
" debug_log = 'automl_errors.log',\n", " task = \"classification\",\n",
" training_data=dataset,\n",
" label_column_name=\"Primary Type\", \n",
" path = project_folder,\n", " path = project_folder,\n",
" compute_target=compute_target,\n", " enable_early_stopping= True,\n",
" run_configuration=conda_run_config,\n", " featurization= 'auto',\n",
" X = X,\n", " debug_log = \"automl_errors.log\",\n",
" y = y,\n",
" **automl_settings\n", " **automl_settings\n",
" )" " )"
] ]
@@ -265,6 +249,8 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Create Pipeline and AutoMLStep\n",
"\n",
"You can define outputs for the AutoMLStep using TrainingOutput." "You can define outputs for the AutoMLStep using TrainingOutput."
] ]
}, },
@@ -300,7 +286,11 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"automlstep-remarks-sample1"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"automl_step = AutoMLStep(\n", "automl_step = AutoMLStep(\n",
@@ -313,7 +303,11 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"automlstep-remarks-sample2"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.pipeline.core import Pipeline\n", "from azureml.pipeline.core import Pipeline\n",
@@ -378,8 +372,8 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"import json\n", "import json\n",
"with open(metrics_output._path_on_datastore) as f: \n", "with open(metrics_output._path_on_datastore) as f:\n",
" metrics_output_result = f.read()\n", " metrics_output_result = f.read()\n",
" \n", " \n",
"deserialized_metrics_output = json.loads(metrics_output_result)\n", "deserialized_metrics_output = json.loads(metrics_output_result)\n",
"df = pd.DataFrame(deserialized_metrics_output)\n", "df = pd.DataFrame(deserialized_metrics_output)\n",
@@ -399,6 +393,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Retrieve best model from Pipeline Run\n",
"best_model_output = pipeline_run.get_pipeline_output(best_model_output_name)\n", "best_model_output = pipeline_run.get_pipeline_output(best_model_output_name)\n",
"num_file_downloaded = best_model_output.download('.', show_progress=True)" "num_file_downloaded = best_model_output.download('.', show_progress=True)"
] ]
@@ -416,6 +411,15 @@
"best_model" "best_model"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_model.steps"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -431,11 +435,11 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"dataset = Dataset.Tabular.from_delimited_files(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv')\n", "dataset_test = Dataset.Tabular.from_delimited_files(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv')\n",
"df_test = dataset_test.to_pandas_dataframe()\n", "df_test = dataset_test.to_pandas_dataframe()\n",
"df_test = df_test[pd.notnull(df['Primary Type'])]\n", "df_test = df_test[pd.notnull(df_test['Primary Type'])]\n",
"\n", "\n",
"y_test = df_test[['Primary Type']]\n", "y_test = df_test['Primary Type']\n",
"X_test = df_test.drop(['Primary Type', 'FBI Code'], axis=1)" "X_test = df_test.drop(['Primary Type', 'FBI Code'], axis=1)"
] ]
}, },
@@ -454,15 +458,19 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from pandas_ml import ConfusionMatrix\n", "from sklearn.metrics import confusion_matrix\n",
"\n",
"ypred = best_model.predict(X_test)\n", "ypred = best_model.predict(X_test)\n",
"\n", "cm = confusion_matrix(y_test, ypred)"
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n", ]
"\n", },
"print(cm)\n", {
"\n", "cell_type": "code",
"cm.plot()" "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Visualize the confusion matrix\n",
"pd.DataFrame(cm).style.background_gradient(cmap='Blues', low=0, high=0.9)"
] ]
} }
], ],


@@ -16,16 +16,12 @@
"\n", "\n",
"You can combine the two part tutorial into one using AzureML Pipelines as Pipelines provide a way to stitch together various steps involved (like data preparation and training in this case) in a machine learning workflow.\n", "You can combine the two part tutorial into one using AzureML Pipelines as Pipelines provide a way to stitch together various steps involved (like data preparation and training in this case) in a machine learning workflow.\n",
"\n", "\n",
"In this notebook, you learn how to prepare data for regression modeling by using the [Azure Machine Learning Data Prep SDK](https://aka.ms/data-prep-sdk) for Python. You run various transformations to filter and combine two different NYC taxi data sets. Once you prepare the NYC taxi data for regression modeling, then you will use [AutoMLStep](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py) available with [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines) to define your machine learning goals and constraints as well as to launch the automated machine learning process. The automated machine learning technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.\n", "In this notebook, you learn how to prepare data for regression modeling by using open source library [pandas](https://pandas.pydata.org/). You run various transformations to filter and combine two different NYC taxi datasets. Once you prepare the NYC taxi data for regression modeling, then you will use [AutoMLStep](https://docs.microsoft.com/python/api/azureml-train-automl-runtime/azureml.train.automl.runtime.automl_step.automlstep?view=azure-ml-py) available with [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines) to define your machine learning goals and constraints as well as to launch the automated machine learning process. The automated machine learning technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.\n",
"\n", "\n",
"After you complete building the model, you can predict the cost of a taxi trip by training a model on data features. These features include the pickup day and time, the number of passengers, and the pickup location.\n", "After you complete building the model, you can predict the cost of a taxi trip by training a model on data features. These features include the pickup day and time, the number of passengers, and the pickup location.\n",
"\n", "\n",
"## Prerequisite\n", "## Prerequisite\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc."
"\n",
"We will run various transformations to filter and combine two different NYC taxi data sets. We will use DataPrep SDK for this preparing data. \n",
"\n",
"Perform `pip install azureml-dataprep` if you have't already done so."
] ]
}, },
{ {
@@ -108,7 +104,6 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import azureml.dataprep as dprep\n",
"from IPython.display import display\n", "from IPython.display import display\n",
"\n", "\n",
"display(green_df_raw.head(5))\n", "display(green_df_raw.head(5))\n",
@@ -144,8 +139,8 @@
"if not os.path.exists(yelloDir):\n", "if not os.path.exists(yelloDir):\n",
" os.mkdir(yelloDir)\n", " os.mkdir(yelloDir)\n",
" \n", " \n",
"greenTaxiData = greenDir + \"/part-00000\"\n", "greenTaxiData = greenDir + \"/unprepared.parquet\"\n",
"yellowTaxiData = yelloDir + \"/part-00000\"\n", "yellowTaxiData = yelloDir + \"/unprepared.parquet\"\n",
"\n", "\n",
"green_df_raw.to_csv(greenTaxiData, index=False)\n", "green_df_raw.to_csv(greenTaxiData, index=False)\n",
"yellow_df_raw.to_csv(yellowTaxiData, index=False)\n", "yellow_df_raw.to_csv(yellowTaxiData, index=False)\n",
@@ -169,17 +164,54 @@
"\n", "\n",
"default_store.upload_files([greenTaxiData], \n", "default_store.upload_files([greenTaxiData], \n",
" target_path = 'green', \n", " target_path = 'green', \n",
" overwrite = False, \n", " overwrite = True, \n",
" show_progress = True)\n", " show_progress = True)\n",
"\n", "\n",
"default_store.upload_files([yellowTaxiData], \n", "default_store.upload_files([yellowTaxiData], \n",
" target_path = 'yellow', \n", " target_path = 'yellow', \n",
" overwrite = False, \n", " overwrite = True, \n",
" show_progress = True)\n", " show_progress = True)\n",
"\n", "\n",
"print(\"Upload calls completed.\")" "print(\"Upload calls completed.\")"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create and register datasets\n",
"\n",
"By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. You can learn more about the what subsetting capabilities are supported by referring to [our documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset?view=azure-ml-py#remarks). The data remains in its existing location, so no extra storage cost is incurred."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Dataset\n",
"green_taxi_data = Dataset.Tabular.from_delimited_files(default_store.path('green/unprepared.parquet'))\n",
"yellow_taxi_data = Dataset.Tabular.from_delimited_files(default_store.path('yellow/unprepared.parquet'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Register the taxi datasets with the workspace so that you can reuse them in other experiments or share with your colleagues who have access to your workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"green_taxi_data = green_taxi_data.register(ws, 'green_taxi_data')\n",
"yellow_taxi_data = yellow_taxi_data.register(ws, 'yellow_taxi_data')"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -194,20 +226,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"aml_compute = ws.get_default_compute_target(\"CPU\")\n", "# Choose a name for your CPU cluster\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n", "\n",
"if aml_compute is None:\n", "# Verify that cluster does not exist already\n",
" amlcompute_cluster_name = \"cpu-cluster\"\n", "try:\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", " aml_compute = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
" max_nodes = 4)\n", " print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" max_nodes=4)\n",
" aml_compute = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n", "\n",
" aml_compute = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", "aml_compute.wait_for_completion(show_output=True)"
" aml_compute.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"aml_compute"
] ]
}, },
{ {
@@ -215,7 +249,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Define RunConfig for the compute\n", "#### Define RunConfig for the compute\n",
"We need `azureml-dataprep` SDK for all the steps below. We will also use `pandas`, `scikit-learn` and `automl` for the training step. Defining the `runconfig` for that." "We will also use `pandas`, `scikit-learn` and `automl`, `pyarrow` for the pipeline steps. Defining the `runconfig` for that."
] ]
}, },
{ {
@@ -242,14 +276,10 @@
"# Use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", "# Use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"aml_run_config.environment.python.user_managed_dependencies = False\n", "aml_run_config.environment.python.user_managed_dependencies = False\n",
"\n", "\n",
"# Auto-prepare the Docker image when used for execution (if it is not already prepared)\n",
"aml_run_config.auto_prepare_environment = True\n",
"\n",
"# Specify CondaDependencies obj, add necessary packages\n", "# Specify CondaDependencies obj, add necessary packages\n",
"aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n", "aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n",
" conda_packages=['pandas','scikit-learn'], \n", " conda_packages=['pandas','scikit-learn'], \n",
" pip_packages=['azureml-sdk', 'azureml-dataprep', 'azureml-train-automl'], \n", " pip_packages=['azureml-sdk[automl,explain]', 'pyarrow'])\n",
" pin_sdk_version=False)\n",
"\n", "\n",
"print (\"Run configuration created.\")" "print (\"Run configuration created.\")"
] ]
@@ -259,7 +289,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Prepare data\n", "### Prepare data\n",
"Now we will prepare for regression modeling by using the `Azure Machine Learning Data Prep SDK for Python`. We run various transformations to filter and combine two different NYC taxi data sets.\n", "Now we will prepare for regression modeling by using `pandas`. We run various transformations to filter and combine two different NYC taxi datasets.\n",
"\n", "\n",
"We achieve this by creating a separate step for each transformation as this allows us to reuse the steps and saves us from running all over again in case of any change. We will keep data preparation scripts in one subfolder and training scripts in another.\n", "We achieve this by creating a separate step for each transformation as this allows us to reuse the steps and saves us from running all over again in case of any change. We will keep data preparation scripts in one subfolder and training scripts in another.\n",
"\n", "\n",
@@ -270,7 +300,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Define Useful Colums\n", "#### Define Useful Columns\n",
"Here we are defining a set of \"useful\" columns for both Green and Yellow taxi data." "Here we are defining a set of \"useful\" columns for both Green and Yellow taxi data."
] ]
}, },
@@ -304,18 +334,12 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.data.data_reference import DataReference \n",
"from azureml.pipeline.core import PipelineData\n", "from azureml.pipeline.core import PipelineData\n",
"from azureml.pipeline.steps import PythonScriptStep\n", "from azureml.pipeline.steps import PythonScriptStep\n",
"\n", "\n",
"# python scripts folder\n", "# python scripts folder\n",
"prepare_data_folder = './scripts/prepdata'\n", "prepare_data_folder = './scripts/prepdata'\n",
"\n", "\n",
"blob_green_data = DataReference(\n",
" datastore=default_store,\n",
" data_reference_name=\"green_taxi_data\",\n",
" path_on_datastore=\"green/part-00000\")\n",
"\n",
"# rename columns as per Azure Machine Learning NYC Taxi tutorial\n", "# rename columns as per Azure Machine Learning NYC Taxi tutorial\n",
"green_columns = str({ \n", "green_columns = str({ \n",
" \"vendorID\": \"vendor\",\n", " \"vendorID\": \"vendor\",\n",
@@ -332,7 +356,7 @@
"}).replace(\",\", \";\")\n", "}).replace(\",\", \";\")\n",
"\n", "\n",
"# Define output after cleansing step\n", "# Define output after cleansing step\n",
"cleansed_green_data = PipelineData(\"green_taxi_data\", datastore=default_store)\n", "cleansed_green_data = PipelineData(\"cleansed_green_data\", datastore=default_store).as_dataset()\n",
"\n", "\n",
"print('Cleanse script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "print('Cleanse script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n",
"\n", "\n",
@@ -341,11 +365,10 @@
"cleansingStepGreen = PythonScriptStep(\n", "cleansingStepGreen = PythonScriptStep(\n",
" name=\"Cleanse Green Taxi Data\",\n", " name=\"Cleanse Green Taxi Data\",\n",
" script_name=\"cleanse.py\", \n", " script_name=\"cleanse.py\", \n",
" arguments=[\"--input_cleanse\", blob_green_data, \n", " arguments=[\"--useful_columns\", useful_columns,\n",
" \"--useful_columns\", useful_columns,\n",
" \"--columns\", green_columns,\n", " \"--columns\", green_columns,\n",
" \"--output_cleanse\", cleansed_green_data],\n", " \"--output_cleanse\", cleansed_green_data],\n",
" inputs=[blob_green_data],\n", " inputs=[green_taxi_data.as_named_input('raw_data')],\n",
" outputs=[cleansed_green_data],\n", " outputs=[cleansed_green_data],\n",
" compute_target=aml_compute,\n", " compute_target=aml_compute,\n",
" runconfig=aml_run_config,\n", " runconfig=aml_run_config,\n",
@@ -369,11 +392,6 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"blob_yellow_data = DataReference(\n",
" datastore=default_store,\n",
" data_reference_name=\"yellow_taxi_data\",\n",
" path_on_datastore=\"yellow/part-00000\")\n",
"\n",
"yellow_columns = str({\n", "yellow_columns = str({\n",
" \"vendorID\": \"vendor\",\n", " \"vendorID\": \"vendor\",\n",
" \"tpepPickupDateTime\": \"pickup_datetime\",\n", " \"tpepPickupDateTime\": \"pickup_datetime\",\n",
@@ -389,7 +407,7 @@
"}).replace(\",\", \";\")\n", "}).replace(\",\", \";\")\n",
"\n", "\n",
"# Define output after cleansing step\n", "# Define output after cleansing step\n",
"cleansed_yellow_data = PipelineData(\"yellow_taxi_data\", datastore=default_store)\n", "cleansed_yellow_data = PipelineData(\"cleansed_yellow_data\", datastore=default_store).as_dataset()\n",
"\n", "\n",
"print('Cleanse script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "print('Cleanse script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n",
"\n", "\n",
@@ -398,11 +416,10 @@
"cleansingStepYellow = PythonScriptStep(\n", "cleansingStepYellow = PythonScriptStep(\n",
" name=\"Cleanse Yellow Taxi Data\",\n", " name=\"Cleanse Yellow Taxi Data\",\n",
" script_name=\"cleanse.py\", \n", " script_name=\"cleanse.py\", \n",
" arguments=[\"--input_cleanse\", blob_yellow_data, \n", " arguments=[\"--useful_columns\", useful_columns,\n",
" \"--useful_columns\", useful_columns,\n",
" \"--columns\", yellow_columns,\n", " \"--columns\", yellow_columns,\n",
" \"--output_cleanse\", cleansed_yellow_data],\n", " \"--output_cleanse\", cleansed_yellow_data],\n",
" inputs=[blob_yellow_data],\n", " inputs=[yellow_taxi_data.as_named_input('raw_data')],\n",
" outputs=[cleansed_yellow_data],\n", " outputs=[cleansed_yellow_data],\n",
" compute_target=aml_compute,\n", " compute_target=aml_compute,\n",
" runconfig=aml_run_config,\n", " runconfig=aml_run_config,\n",
@@ -428,7 +445,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# Define output after merging step\n", "# Define output after merging step\n",
"merged_data = PipelineData(\"merged_data\", datastore=default_store)\n", "merged_data = PipelineData(\"merged_data\", datastore=default_store).as_dataset()\n",
"\n", "\n",
"print('Merge script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "print('Merge script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n",
"\n", "\n",
@@ -437,10 +454,9 @@
"mergingStep = PythonScriptStep(\n", "mergingStep = PythonScriptStep(\n",
" name=\"Merge Taxi Data\",\n", " name=\"Merge Taxi Data\",\n",
" script_name=\"merge.py\", \n", " script_name=\"merge.py\", \n",
" arguments=[\"--input_green_merge\", cleansed_green_data, \n", " arguments=[\"--output_merge\", merged_data],\n",
" \"--input_yellow_merge\", cleansed_yellow_data,\n", " inputs=[cleansed_green_data.parse_parquet_files(file_extension=None),\n",
" \"--output_merge\", merged_data],\n", " cleansed_yellow_data.parse_parquet_files(file_extension=None)],\n",
" inputs=[cleansed_green_data, cleansed_yellow_data],\n",
" outputs=[merged_data],\n", " outputs=[merged_data],\n",
" compute_target=aml_compute,\n", " compute_target=aml_compute,\n",
" runconfig=aml_run_config,\n", " runconfig=aml_run_config,\n",
@@ -466,7 +482,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# Define output after merging step\n", "# Define output after merging step\n",
"filtered_data = PipelineData(\"filtered_data\", datastore=default_store)\n", "filtered_data = PipelineData(\"filtered_data\", datastore=default_store).as_dataset()\n",
"\n", "\n",
"print('Filter script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "print('Filter script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n",
"\n", "\n",
@@ -475,9 +491,8 @@
"filterStep = PythonScriptStep(\n", "filterStep = PythonScriptStep(\n",
" name=\"Filter Taxi Data\",\n", " name=\"Filter Taxi Data\",\n",
" script_name=\"filter.py\", \n", " script_name=\"filter.py\", \n",
" arguments=[\"--input_filter\", merged_data, \n", " arguments=[\"--output_filter\", filtered_data],\n",
" \"--output_filter\", filtered_data],\n", " inputs=[merged_data.parse_parquet_files(file_extension=None)],\n",
" inputs=[merged_data],\n",
" outputs=[filtered_data],\n", " outputs=[filtered_data],\n",
" compute_target=aml_compute,\n", " compute_target=aml_compute,\n",
" runconfig = aml_run_config,\n", " runconfig = aml_run_config,\n",
@@ -503,7 +518,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# Define output after normalize step\n", "# Define output after normalize step\n",
"normalized_data = PipelineData(\"normalized_data\", datastore=default_store)\n", "normalized_data = PipelineData(\"normalized_data\", datastore=default_store).as_dataset()\n",
"\n", "\n",
"print('Normalize script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "print('Normalize script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n",
"\n", "\n",
@@ -512,9 +527,8 @@
"normalizeStep = PythonScriptStep(\n", "normalizeStep = PythonScriptStep(\n",
" name=\"Normalize Taxi Data\",\n", " name=\"Normalize Taxi Data\",\n",
" script_name=\"normalize.py\", \n", " script_name=\"normalize.py\", \n",
" arguments=[\"--input_normalize\", filtered_data, \n", " arguments=[\"--output_normalize\", normalized_data],\n",
" \"--output_normalize\", normalized_data],\n", " inputs=[filtered_data.parse_parquet_files(file_extension=None)],\n",
" inputs=[filtered_data],\n",
" outputs=[normalized_data],\n", " outputs=[normalized_data],\n",
" compute_target=aml_compute,\n", " compute_target=aml_compute,\n",
" runconfig = aml_run_config,\n", " runconfig = aml_run_config,\n",
@@ -544,8 +558,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Define output after transforme step\n", "# Define output after transform step\n",
"transformed_data = PipelineData(\"transformed_data\", datastore=default_store)\n", "transformed_data = PipelineData(\"transformed_data\", datastore=default_store).as_dataset()\n",
"\n", "\n",
"print('Transform script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "print('Transform script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n",
"\n", "\n",
@@ -554,9 +568,8 @@
"transformStep = PythonScriptStep(\n", "transformStep = PythonScriptStep(\n",
" name=\"Transform Taxi Data\",\n", " name=\"Transform Taxi Data\",\n",
" script_name=\"transform.py\", \n", " script_name=\"transform.py\", \n",
" arguments=[\"--input_transform\", normalized_data,\n", " arguments=[\"--output_transform\", transformed_data],\n",
" \"--output_transform\", transformed_data],\n", " inputs=[normalized_data.parse_parquet_files(file_extension=None)],\n",
" inputs=[normalized_data],\n",
" outputs=[transformed_data],\n", " outputs=[transformed_data],\n",
" compute_target=aml_compute,\n", " compute_target=aml_compute,\n",
" runconfig = aml_run_config,\n", " runconfig = aml_run_config,\n",
@@ -571,8 +584,8 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Extract features\n", "### Split the data into train and test sets\n",
"Add the following columns to be features for our model creation. The prediction value will be *cost*." "This function segregates the data into dataset for model training and dataset for testing."
] ]
}, },
{ {
@@ -581,92 +594,11 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"feature_columns = str(['pickup_weekday','pickup_hour', 'distance','passengers', 'vendor']).replace(\",\", \";\")\n",
"\n",
"train_model_folder = './scripts/trainmodel'\n", "train_model_folder = './scripts/trainmodel'\n",
"\n", "\n",
"print('Extract script is in {}.'.format(os.path.realpath(train_model_folder)))\n",
"\n",
"# features data after transform step\n",
"features_data = PipelineData(\"features_data\", datastore=default_store)\n",
"\n",
"# featurization step creation\n",
"# See the featurization.py for details about input and output\n",
"featurizationStep = PythonScriptStep(\n",
" name=\"Extract Features\",\n",
" script_name=\"featurization.py\", \n",
" arguments=[\"--input_featurization\", transformed_data, \n",
" \"--useful_columns\", feature_columns,\n",
" \"--output_featurization\", features_data],\n",
" inputs=[transformed_data],\n",
" outputs=[features_data],\n",
" compute_target=aml_compute,\n",
" runconfig = aml_run_config,\n",
" source_directory=train_model_folder,\n",
" allow_reuse=True\n",
")\n",
"\n",
"print(\"featurizationStep created.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extract label"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"label_columns = str(['cost']).replace(\",\", \";\")\n",
"\n",
"# label data after transform step\n",
"label_data = PipelineData(\"label_data\", datastore=default_store)\n",
"\n",
"print('Extract script is in {}.'.format(os.path.realpath(train_model_folder)))\n",
"\n",
"# label step creation\n",
"# See the featurization.py for details about input and output\n",
"labelStep = PythonScriptStep(\n",
" name=\"Extract Labels\",\n",
" script_name=\"featurization.py\", \n",
" arguments=[\"--input_featurization\", transformed_data, \n",
" \"--useful_columns\", label_columns,\n",
" \"--output_featurization\", label_data],\n",
" inputs=[transformed_data],\n",
" outputs=[label_data],\n",
" compute_target=aml_compute,\n",
" runconfig = aml_run_config,\n",
" source_directory=train_model_folder,\n",
" allow_reuse=True\n",
")\n",
"\n",
"print(\"labelStep created.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Split the data into train and test sets\n",
"This function segregates the data into the **x**, features, dataset for model training and **y**, values to predict, dataset for testing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# train and test splits output\n", "# train and test splits output\n",
"output_split_train_x = PipelineData(\"output_split_train_x\", datastore=default_store)\n", "output_split_train = PipelineData(\"output_split_train\", datastore=default_store).as_dataset()\n",
"output_split_train_y = PipelineData(\"output_split_train_y\", datastore=default_store)\n", "output_split_test = PipelineData(\"output_split_test\", datastore=default_store).as_dataset()\n",
"output_split_test_x = PipelineData(\"output_split_test_x\", datastore=default_store)\n",
"output_split_test_y = PipelineData(\"output_split_test_y\", datastore=default_store)\n",
"\n", "\n",
"print('Data spilt script is in {}.'.format(os.path.realpath(train_model_folder)))\n", "print('Data spilt script is in {}.'.format(os.path.realpath(train_model_folder)))\n",
"\n", "\n",
@@ -675,14 +607,10 @@
"testTrainSplitStep = PythonScriptStep(\n", "testTrainSplitStep = PythonScriptStep(\n",
" name=\"Train Test Data Split\",\n", " name=\"Train Test Data Split\",\n",
" script_name=\"train_test_split.py\", \n", " script_name=\"train_test_split.py\", \n",
" arguments=[\"--input_split_features\", features_data, \n", " arguments=[\"--output_split_train\", output_split_train,\n",
" \"--input_split_labels\", label_data,\n", " \"--output_split_test\", output_split_test],\n",
" \"--output_split_train_x\", output_split_train_x,\n", " inputs=[transformed_data.parse_parquet_files(file_extension=None)],\n",
" \"--output_split_train_y\", output_split_train_y,\n", " outputs=[output_split_train, output_split_test],\n",
" \"--output_split_test_x\", output_split_test_x,\n",
" \"--output_split_test_y\", output_split_test_y],\n",
" inputs=[features_data, label_data],\n",
" outputs=[output_split_train_x, output_split_train_y, output_split_test_x, output_split_test_y],\n",
" compute_target=aml_compute,\n", " compute_target=aml_compute,\n",
" runconfig = aml_run_config,\n", " runconfig = aml_run_config,\n",
" source_directory=train_model_folder,\n", " source_directory=train_model_folder,\n",
@@ -697,7 +625,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Use automated machine learning to build regression model\n", "## Use automated machine learning to build regression model\n",
"Now we will use **automated machine learning** to build the regression model. We will use [AutoMLStep](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py) in AML Pipelines for this part. These functions use various features from the data set and allow an automated model to build relationships between the features and the price of a taxi trip." "Now we will use **automated machine learning** to build the regression model. We will use [AutoMLStep](https://docs.microsoft.com/python/api/azureml-train-automl-runtime/azureml.train.automl.runtime.automl_step.automlstep?view=azure-ml-py) in AML Pipelines for this part. Perform `pip install azureml-sdk[automl]`to get the automated machine learning package. These functions use various features from the data set and allow an automated model to build relationships between the features and the price of a taxi trip."
] ]
}, },
{ {
@@ -727,52 +655,13 @@
"print(\"Experiment created\")" "print(\"Experiment created\")"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create get_data script\n",
"\n",
"A script with `get_data()` function is necessary to fetch training features(X) and labels(Y) on remote compute, from input data. Here we use mounted path of `train_test_split` step to get the x and y train values. They are added as environment variable on compute machine by default\n",
"\n",
"Note: Every DataReference are added as environment variable on compute machine since the defualt mode is mount"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('get_data.py will be written to {}.'.format(os.path.realpath(train_model_folder)))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile $train_model_folder/get_data.py\n",
"import os\n",
"import pandas as pd\n",
"\n",
"def get_data():\n",
" print(\"In get_data\")\n",
" print(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'])\n",
" X_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'] + \"/part-00000\", header=0)\n",
" y_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_y'] + \"/part-00000\", header=0)\n",
" \n",
" return { \"X\" : X_train.values, \"y\" : y_train.values.flatten() }"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Define settings for autogeneration and tuning\n", "#### Define settings for autogeneration and tuning\n",
"\n", "\n",
"Here we define the experiment parameter and model settings for autogeneration and tuning. We can specify automl_settings as **kwargs as well. Also note that we have to use a get_data() function for remote excutions. See get_data script for more details.\n", "Here we define the experiment parameter and model settings for autogeneration and tuning. We can specify automl_settings as **kwargs as well.\n",
"\n", "\n",
"Use your defined training settings as a parameter to an `AutoMLConfig` object. Additionally, specify your training data and the type of model, which is `regression` in this case.\n", "Use your defined training settings as a parameter to an `AutoMLConfig` object. Additionally, specify your training data and the type of model, which is `regression` in this case.\n",
"\n", "\n",
@@ -793,17 +682,18 @@
" \"iteration_timeout_minutes\" : 10,\n", " \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n", " \"iterations\" : 2,\n",
" \"primary_metric\" : 'spearman_correlation',\n", " \"primary_metric\" : 'spearman_correlation',\n",
" \"preprocess\" : True,\n",
" \"verbosity\" : logging.INFO,\n",
" \"n_cross_validations\": 5\n", " \"n_cross_validations\": 5\n",
"}\n", "}\n",
"\n", "\n",
"training_dataset = output_split_train.parse_parquet_files(file_extension=None).keep_columns(['pickup_weekday','pickup_hour', 'distance','passengers', 'vendor', 'cost'])\n",
"\n",
"automl_config = AutoMLConfig(task = 'regression',\n", "automl_config = AutoMLConfig(task = 'regression',\n",
" debug_log = 'automated_ml_errors.log',\n", " debug_log = 'automated_ml_errors.log',\n",
" path = train_model_folder,\n", " path = train_model_folder,\n",
" compute_target=aml_compute,\n", " compute_target = aml_compute,\n",
" run_configuration=aml_run_config,\n", " featurization = 'auto',\n",
" data_script = train_model_folder + \"/get_data.py\",\n", " training_data = training_dataset,\n",
" label_column_name = 'cost',\n",
" **automl_settings)\n", " **automl_settings)\n",
" \n", " \n",
"print(\"AutoML config created.\")" "print(\"AutoML config created.\")"
@@ -822,15 +712,12 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.train.automl.runtime import AutoMLStep\n", "from azureml.pipeline.steps import AutoMLStep\n",
"\n",
"trainWithAutomlStep = AutoMLStep(\n",
" name='AutoML_Regression',\n",
" automl_config=automl_config,\n",
" inputs=[output_split_train_x, output_split_train_y],\n",
" allow_reuse=True,\n",
" hash_paths=[os.path.realpath(train_model_folder)])\n",
"\n", "\n",
"trainWithAutomlStep = AutoMLStep(name='AutoML_Regression',\n",
" automl_config=automl_config,\n",
" passthru_automl_config=False,\n",
" allow_reuse=True)\n",
"print(\"trainWithAutomlStep created.\")" "print(\"trainWithAutomlStep created.\")"
] ]
}, },
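The step now comes from `azureml.pipeline.steps` and no longer takes the split `PipelineData` objects or `hash_paths`; the training data travels inside `automl_config`. A sketch of how the new step is built and, illustratively, submitted in a pipeline; `ws` and the experiment name are placeholders, not taken from this diff:

    from azureml.core import Experiment
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import AutoMLStep

    trainWithAutomlStep = AutoMLStep(name='AutoML_Regression',
                                     automl_config=automl_config,
                                     passthru_automl_config=False,
                                     allow_reuse=True)

    # Illustrative wiring: the notebook submits this step together with the
    # earlier data-prep steps as a single pipeline.
    pipeline = Pipeline(workspace=ws, steps=[trainWithAutomlStep])
    pipeline_run = Experiment(ws, 'nyc-taxi-automl-pipeline').submit(pipeline)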
@@ -892,12 +779,11 @@
" return path\n", " return path\n",
"\n", "\n",
"def fetch_df(step, output_name):\n", "def fetch_df(step, output_name):\n",
" output_data = step.get_output_data(output_name)\n", " output_data = step.get_output_data(output_name) \n",
" \n",
" download_path = './outputs/' + output_name\n", " download_path = './outputs/' + output_name\n",
" output_data.download(download_path)\n", " output_data.download(download_path, overwrite=True)\n",
" df_path = get_download_path(download_path, output_name) + '/part-00000'\n", " df_path = get_download_path(download_path, output_name) + '/processed.parquet'\n",
" return dprep.auto_read_file(path=df_path)" " return pd.read_parquet(df_path)"
] ]
}, },
{ {
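Spelled out with its imports, the updated helper looks roughly like this; the body of `get_download_path` is mostly outside this hunk, so the version below is a guess at what it does (locate the folder that `PortDataReference.download()` creates) rather than the notebook's exact code:

    import os
    import pandas as pd

    def get_download_path(download_path, output_name):
        # Hypothetical reconstruction: only "return path" is visible in the hunk.
        output_folder = os.listdir(os.path.join(download_path, 'azureml'))[0]
        return os.path.join(download_path, 'azureml', output_folder, output_name)

    def fetch_df(step, output_name):
        # step is a StepRun; download its named output locally and load the
        # parquet file the step wrote (previously a CSV part file).
        output_data = step.get_output_data(output_name)
        download_path = './outputs/' + output_name
        output_data.download(download_path, overwrite=True)
        df_path = get_download_path(download_path, output_name) + '/processed.parquet'
        return pd.read_parquet(df_path)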
@@ -939,7 +825,7 @@
"merge_step = pipeline_run.find_step_run(mergingStep.name)[0]\n", "merge_step = pipeline_run.find_step_run(mergingStep.name)[0]\n",
"combined_df = fetch_df(merge_step, merged_data.name)\n", "combined_df = fetch_df(merge_step, merged_data.name)\n",
"\n", "\n",
"display(combined_df.get_profile())" "display(combined_df.describe())"
] ]
}, },
{ {
@@ -958,7 +844,7 @@
"filter_step = pipeline_run.find_step_run(filterStep.name)[0]\n", "filter_step = pipeline_run.find_step_run(filterStep.name)[0]\n",
"filtered_df = fetch_df(filter_step, filtered_data.name)\n", "filtered_df = fetch_df(filter_step, filtered_data.name)\n",
"\n", "\n",
"display(filtered_df.get_profile())" "display(filtered_df.describe())"
] ]
}, },
{ {
@@ -996,7 +882,7 @@
"transform_step = pipeline_run.find_step_run(transformStep.name)[0]\n", "transform_step = pipeline_run.find_step_run(transformStep.name)[0]\n",
"transformed_df = fetch_df(transform_step, transformed_data.name)\n", "transformed_df = fetch_df(transform_step, transformed_data.name)\n",
"\n", "\n",
"display(transformed_df.get_profile())\n", "display(transformed_df.describe())\n",
"display(transformed_df.head(5))" "display(transformed_df.head(5))"
] ]
}, },
@@ -1014,16 +900,10 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n", "split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n",
"train_split_x = fetch_df(split_step, output_split_train_x.name)\n", "train_split = fetch_df(split_step, output_split_train.name)\n",
"train_split_y = fetch_df(split_step, output_split_train_y.name)\n",
"\n", "\n",
"display_x_train = train_split_x.keep_columns(columns=[\"vendor\", \"pickup_weekday\", \"pickup_hour\", \"passengers\", \"distance\"])\n", "display(train_split.describe())\n",
"display_y_train = train_split_y.rename_columns(column_pairs={\"Column1\": \"cost\"})\n", "display(train_split.head(5))"
"\n",
"display(display_x_train.get_profile())\n",
"display(display_x_train.head(5))\n",
"display(display_y_train.get_profile())\n",
"display(display_y_train.head(5))"
] ]
}, },
{ {
@@ -1125,14 +1005,11 @@
"source": [ "source": [
"# split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n", "# split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n",
"\n", "\n",
"# x_test = fetch_df(split_step, output_split_test_x.name)\n", "# x_test = fetch_df(split_step, output_split_test.name)[['distance','passengers', 'vendor','pickup_weekday','pickup_hour']]\n",
"# y_test = fetch_df(split_step, output_split_test_y.name)\n", "# y_test = fetch_df(split_step, output_split_test.name)[['cost']]\n",
"\n", "\n",
"# display(x_test.keep_columns(columns=[\"vendor\", \"pickup_weekday\", \"pickup_hour\", \"passengers\", \"distance\"]).head(5))\n", "# display(x_test.head(5))\n",
"# display(y_test.rename_columns(column_pairs={\"Column1\": \"cost\"}).head(5))\n", "# display(y_test.head(5))"
"\n",
"# x_test = x_test.to_pandas_dataframe()\n",
"# y_test = y_test.to_pandas_dataframe()"
] ]
}, },
{ {
@@ -1150,9 +1027,9 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# y_predict = fitted_model.predict(x_test.values)\n", "# y_predict = fitted_model.predict(x_test)\n",
"\n", "\n",
"# y_actual = y_test.iloc[:,0].values.tolist()\n", "# y_actual = y_test.values.tolist()\n",
"\n", "\n",
"# display(pd.DataFrame({'Actual':y_actual, 'Predicted':y_predict}).head(5))" "# display(pd.DataFrame({'Actual':y_actual, 'Predicted':y_predict}).head(5))"
] ]
@@ -1168,7 +1045,7 @@
"# fig = plt.figure(figsize=(14, 10))\n", "# fig = plt.figure(figsize=(14, 10))\n",
"# ax1 = fig.add_subplot(111)\n", "# ax1 = fig.add_subplot(111)\n",
"\n", "\n",
"# distance_vals = [x[4] for x in x_test.values]\n", "# distance_vals = [x[0] for x in x_test.values]\n",
"\n", "\n",
"# ax1.scatter(distance_vals[:100], y_predict[:100], s=18, c='b', marker=\"s\", label='Predicted')\n", "# ax1.scatter(distance_vals[:100], y_predict[:100], s=18, c='b', marker=\"s\", label='Predicted')\n",
"# ax1.scatter(distance_vals[:100], y_actual[:100], s=18, c='r', marker=\"o\", label='Actual')\n", "# ax1.scatter(distance_vals[:100], y_actual[:100], s=18, c='r', marker=\"o\", label='Actual')\n",
@@ -1204,7 +1081,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.6.7" "version": "3.6.9"
} }
}, },
"nbformat": 4, "nbformat": 4,

View File

@@ -4,6 +4,7 @@ dependencies:
 - azureml-sdk
 - azureml-widgets
 - azureml-opendatasets
-- azureml-dataprep
 - azureml-train-automl
 - matplotlib
+- pandas
+- pyarrow
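`pandas` and `pyarrow` replace `azureml-dataprep` because the pipeline scripts below now exchange intermediate data as parquet, and `DataFrame.to_parquet` / `read_parquet` need a parquet engine such as pyarrow. A quick round-trip check using the same file name the scripts write:

    import pandas as pd

    df = pd.DataFrame({"distance": [1.2, 3.4], "cost": [6.5, 12.0]})
    df.to_parquet("processed.parquet")   # uses pyarrow when it is installed
    print(pd.read_parquet("processed.parquet"))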

View File

@@ -3,15 +3,14 @@
 import argparse
 import os
-import pandas as pd
-import azureml.dataprep as dprep
+from azureml.core import Run
 def get_dict(dict_str):
     pairs = dict_str.strip("{}").split("\;")
     new_dict = {}
     for pair in pairs:
-        key, value = pair.strip('\\').split(":")
+        key, value = pair.strip().split(":")
         new_dict[key.strip().strip("'")] = value.strip().strip("'")
     return new_dict
@@ -19,40 +18,37 @@ def get_dict(dict_str):
 print("Cleans the input data")
+# Get the input green_taxi_data. To learn more about how to access dataset in your script, please
+# see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.
+run = Run.get_context()
+raw_data = run.input_datasets["raw_data"]
 parser = argparse.ArgumentParser("cleanse")
-parser.add_argument("--input_cleanse", type=str, help="raw taxi data")
 parser.add_argument("--output_cleanse", type=str, help="cleaned taxi data directory")
 parser.add_argument("--useful_columns", type=str, help="useful columns to keep")
 parser.add_argument("--columns", type=str, help="rename column pattern")
 args = parser.parse_args()
-print("Argument 1(input taxi data path): %s" % args.input_cleanse)
-print("Argument 2(columns to keep): %s" % str(args.useful_columns.strip("[]").split("\;")))
-print("Argument 3(columns renaming mapping): %s" % str(args.columns.strip("{}").split("\;")))
-print("Argument 4(output cleansed taxi data path): %s" % args.output_cleanse)
+print("Argument 1(columns to keep): %s" % str(args.useful_columns.strip("[]").split("\;")))
+print("Argument 2(columns renaming mapping): %s" % str(args.columns.strip("{}").split("\;")))
+print("Argument 3(output cleansed taxi data path): %s" % args.output_cleanse)
-raw_df = dprep.read_csv(path=args.input_cleanse, header=dprep.PromoteHeadersMode.GROUPED)
-# These functions ensure that null data is removed from the data set,
+# These functions ensure that null data is removed from the dataset,
 # which will help increase machine learning model accuracy.
-# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep
-# for more details
 useful_columns = [s.strip().strip("'") for s in args.useful_columns.strip("[]").split("\;")]
 columns = get_dict(args.columns)
-all_columns = dprep.ColumnSelector(term=".*", use_regex=True)
-drop_if_all_null = [all_columns, dprep.ColumnRelationship(dprep.ColumnRelationship.ALL)]
-new_df = (raw_df
-          .replace_na(columns=all_columns)
-          .drop_nulls(*drop_if_all_null)
-          .rename_columns(column_pairs=columns)
-          .keep_columns(columns=useful_columns))
+new_df = (raw_data.to_pandas_dataframe()
+          .dropna(how='all')
+          .rename(columns=columns))[useful_columns]
+new_df.reset_index(inplace=True, drop=True)
 if not (args.output_cleanse is None):
     os.makedirs(args.output_cleanse, exist_ok=True)
     print("%s created" % args.output_cleanse)
-    write_df = new_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_cleanse))
-    write_df.run_local()
+    path = args.output_cleanse + "/processed.parquet"
+    write_df = new_df.to_parquet(path)
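The script now expects its raw input as a named dataset on the run (`run.input_datasets["raw_data"]`) instead of an `--input_cleanse` path. The pipeline-side wiring is not part of this diff; a sketch of how such a step would presumably be constructed, with `raw_taxi_dataset`, `datastore`, `useful_columns`, `columns`, `aml_compute`, `aml_run_config`, and `script_folder` all assumed to exist in the notebook:

    from azureml.pipeline.core import PipelineData
    from azureml.pipeline.steps import PythonScriptStep

    # The named input "raw_data" is what populates run.input_datasets["raw_data"]
    # inside cleanse.py; the step's output is a PipelineData directory into which
    # the script writes processed.parquet.
    cleansed_data = PipelineData("cleansed_data", datastore=datastore)

    cleanse_step = PythonScriptStep(
        name="Cleanse Taxi Data",
        script_name="cleanse.py",
        inputs=[raw_taxi_dataset.as_named_input("raw_data")],
        outputs=[cleansed_data],
        arguments=["--output_cleanse", cleansed_data,
                   "--useful_columns", useful_columns,
                   "--columns", columns],
        compute_target=aml_compute,
        runconfig=aml_run_config,
        source_directory=script_folder,
        allow_reuse=True)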

View File

@@ -1,55 +1,47 @@
 import argparse
 import os
-import azureml.dataprep as dprep
+from azureml.core import Run
 print("Filters out coordinates for locations that are outside the city border.",
       "Chain the column filter commands within the filter() function",
       "and define the minimum and maximum bounds for each field.")
+run = Run.get_context()
+# To learn more about how to access dataset in your script, please
+# see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.
+merged_data = run.input_datasets["merged_data"]
+combined_df = merged_data.to_pandas_dataframe()
 parser = argparse.ArgumentParser("filter")
-parser.add_argument("--input_filter", type=str, help="merged taxi data directory")
 parser.add_argument("--output_filter", type=str, help="filter out out of city locations")
 args = parser.parse_args()
-print("Argument 1(input taxi data path): %s" % args.input_filter)
-print("Argument 2(output filtered taxi data path): %s" % args.output_filter)
+print("Argument (output filtered taxi data path): %s" % args.output_filter)
-combined_df = dprep.read_csv(args.input_filter + '/part-*')
 # These functions filter out coordinates for locations that are outside the city border.
-# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details
-# Create a condensed view of the dataflow to just show the lat/long fields,
-# which makes it easier to evaluate missing or out-of-scope coordinates
-decimal_type = dprep.TypeConverter(data_type=dprep.FieldType.DECIMAL)
-combined_df = combined_df.set_column_types(type_conversions={
-    "pickup_longitude": decimal_type,
-    "pickup_latitude": decimal_type,
-    "dropoff_longitude": decimal_type,
-    "dropoff_latitude": decimal_type
-})
 # Filter out coordinates for locations that are outside the city border.
 # Chain the column filter commands within the filter() function
 # and define the minimum and maximum bounds for each field
-latlong_filtered_df = (combined_df
-                       .drop_nulls(columns=["pickup_longitude",
-                                            "pickup_latitude",
-                                            "dropoff_longitude",
-                                            "dropoff_latitude"],
-                                   column_relationship=dprep.ColumnRelationship(dprep.ColumnRelationship.ANY))
-                       .filter(dprep.f_and(dprep.col("pickup_longitude") <= -73.72,
-                                           dprep.col("pickup_longitude") >= -74.09,
-                                           dprep.col("pickup_latitude") <= 40.88,
-                                           dprep.col("pickup_latitude") >= 40.53,
-                                           dprep.col("dropoff_longitude") <= -73.72,
-                                           dprep.col("dropoff_longitude") >= -74.09,
-                                           dprep.col("dropoff_latitude") <= 40.88,
-                                           dprep.col("dropoff_latitude") >= 40.53)))
+combined_df = combined_df.astype({"pickup_longitude": 'float64', "pickup_latitude": 'float64',
+                                  "dropoff_longitude": 'float64', "dropoff_latitude": 'float64'})
+latlong_filtered_df = combined_df[(combined_df.pickup_longitude <= -73.72) &
+                                  (combined_df.pickup_longitude >= -74.09) &
+                                  (combined_df.pickup_latitude <= 40.88) &
+                                  (combined_df.pickup_latitude >= 40.53) &
+                                  (combined_df.dropoff_longitude <= -73.72) &
+                                  (combined_df.dropoff_longitude >= -74.72) &
+                                  (combined_df.dropoff_latitude <= 40.88) &
+                                  (combined_df.dropoff_latitude >= 40.53)]
+latlong_filtered_df.reset_index(inplace=True, drop=True)
 if not (args.output_filter is None):
     os.makedirs(args.output_filter, exist_ok=True)
     print("%s created" % args.output_filter)
-    write_df = latlong_filtered_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_filter))
-    write_df.run_local()
+    path = args.output_filter + "/processed.parquet"
+    write_df = latlong_filtered_df.to_parquet(path)
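The dataprep filter chain becomes a plain pandas boolean mask. Note that the new lower bound for `dropoff_longitude` reads `-74.72` where the old code (and the matching pickup bound) used `-74.09`, which looks like a transposition. A small standalone check of the mask, using `-74.09` with the discrepancy flagged:

    import pandas as pd

    # Toy frame to exercise the same boolean mask the new filter script uses.
    combined_df = pd.DataFrame({
        "pickup_longitude": [-73.95, -75.00], "pickup_latitude": [40.75, 40.75],
        "dropoff_longitude": [-73.98, -73.98], "dropoff_latitude": [40.70, 40.70],
    })
    combined_df = combined_df.astype({"pickup_longitude": 'float64', "pickup_latitude": 'float64',
                                      "dropoff_longitude": 'float64', "dropoff_latitude": 'float64'})

    # -74.09 mirrors the pickup bound; the script in the diff itself reads -74.72 here.
    latlong_filtered_df = combined_df[(combined_df.pickup_longitude <= -73.72) &
                                      (combined_df.pickup_longitude >= -74.09) &
                                      (combined_df.pickup_latitude <= 40.88) &
                                      (combined_df.pickup_latitude >= 40.53) &
                                      (combined_df.dropoff_longitude <= -73.72) &
                                      (combined_df.dropoff_longitude >= -74.09) &
                                      (combined_df.dropoff_latitude <= 40.88) &
                                      (combined_df.dropoff_latitude >= 40.53)]
    print(latlong_filtered_df)   # the second, out-of-bounds row is dropped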

Some files were not shown because too many files have changed in this diff.