Compare commits

...

89 Commits

Author SHA1 Message Date
vizhur
59fcb54998 update samples from Release-42 as a part of SDK release 2020-03-20 18:10:08 +00:00
Harneet Virk
e0ea99a6bb Merge pull request #862 from Azure/release_update/Release-41
update samples from Release-41 as a part of  SDK release
2020-03-13 14:57:58 -07:00
vizhur
b06f5ce269 update samples from Release-41 as a part of SDK release 2020-03-13 21:57:04 +00:00
Harneet Virk
ed0ce9e895 Merge pull request #856 from Azure/release_update/Release-40
update samples from Release-40 as a part of  SDK release
2020-03-12 12:28:18 -07:00
vizhur
71053d705b update samples from Release-40 as a part of SDK release 2020-03-12 19:25:26 +00:00
Harneet Virk
77f98bf75f Merge pull request #852 from Azure/release_update_stable/Release-6
update samples from Release-6 as a part of 1.1.5 SDK stable release
2020-03-11 15:37:59 -06:00
vizhur
e443fd1342 update samples from Release-6 as a part of 1.1.5rc0 SDK stable release 2020-03-11 19:51:02 +00:00
Harneet Virk
2165cf308e update samples from Release-25 as a part of 1.1.2rc0 SDK experimental release (#829)
Co-authored-by: vizhur <vizhur@live.com>
2020-03-02 15:42:04 -05:00
Harneet Virk
3d6caa10a3 Merge pull request #801 from Azure/release_update/Release-39
update samples from Release-39 as a part of  SDK release
2020-02-13 19:03:36 -07:00
vizhur
4df079db1c update samples from Release-39 as a part of SDK release 2020-02-14 02:01:41 +00:00
Sander Vanhove
67d0b02ef9 Fix broken link in README (#797) 2020-02-13 08:20:28 -05:00
Harneet Virk
4e7b3784d5 Merge pull request #788 from Azure/release_update/Release-38
update samples from Release-38 as a part of  SDK release
2020-02-11 13:16:15 -07:00
vizhur
ed91e39d7e update samples from Release-38 as a part of SDK release 2020-02-11 20:00:16 +00:00
Harneet Virk
a09a1a16a7 Merge pull request #780 from Azure/release_update/Release-37
update samples from Release-37 as a part of  SDK release
2020-02-07 21:52:34 -07:00
vizhur
9662505517 update samples from Release-37 as a part of SDK release 2020-02-08 04:49:27 +00:00
Harneet Virk
8e103c02ff Merge pull request #779 from Azure/release_update/Release-36
update samples from Release-36 as a part of  SDK release
2020-02-07 21:40:57 -07:00
vizhur
ecb5157add update samples from Release-36 as a part of SDK release 2020-02-08 04:35:14 +00:00
Shané Winner
d7d23d5e7c Update index.md 2020-02-05 22:41:22 -08:00
Harneet Virk
83a21ba53a update samples from Release-35 as a part of SDK release (#765)
Co-authored-by: vizhur <vizhur@live.com>
2020-02-05 20:03:41 -05:00
Harneet Virk
3c9cb89c1a update samples from Release-18 as a part of 1.1.0rc0 SDK experimental release (#760)
Co-authored-by: vizhur <vizhur@live.com>
2020-02-04 22:19:52 -05:00
Sheri Gilley
cca7c2e26f add cell metadata 2020-02-04 11:31:07 -06:00
Harneet Virk
e895d7c2bf update samples - test (#758)
Co-authored-by: vizhur <vizhur@live.com>
2020-01-31 15:19:58 -05:00
Shané Winner
3588eb9665 Update index.md 2020-01-23 15:46:43 -08:00
Harneet Virk
a09e726f31 update samples - test (#748)
Co-authored-by: vizhur <vizhur@live.com>
2020-01-23 16:50:29 -05:00
Shané Winner
4fb1d9ee5b Update index.md 2020-01-22 11:38:24 -08:00
Harneet Virk
b05ff80e9d update samples from Release-169 as a part of 1.0.85 SDK release (#742)
Co-authored-by: vizhur <vizhur@live.com>
2020-01-21 18:00:15 -05:00
Shané Winner
512630472b Update index.md 2020-01-08 14:52:23 -08:00
vizhur
ae1337fe70 Merge pull request #724 from Azure/release_update/Release-167
update samples from Release-167 as a part of 1.0.83 SDK release
2020-01-06 15:38:25 -05:00
vizhur
c95f970dc8 update samples from Release-167 as a part of 1.0.83 SDK release 2020-01-06 20:16:21 +00:00
Shané Winner
9b9d112719 Update index.md 2019-12-24 07:40:48 -08:00
vizhur
fe8fcd4b48 Merge pull request #712 from Azure/release_update/Release-31
update samples - test
2019-12-23 20:28:02 -05:00
vizhur
296ae01587 update samples - test 2019-12-24 00:42:48 +00:00
Shané Winner
8f4efe15eb Update index.md 2019-12-10 09:05:23 -08:00
vizhur
d179080467 Merge pull request #690 from Azure/release_update/Release-163
update samples from Release-163 as a part of 1.0.79 SDK release
2019-12-09 15:41:03 -05:00
vizhur
0040644e7a update samples from Release-163 as a part of 1.0.79 SDK release 2019-12-09 20:09:30 +00:00
Shané Winner
8aa04307fb Update index.md 2019-12-03 10:24:18 -08:00
Shané Winner
a525da4488 Update index.md 2019-11-27 13:08:21 -08:00
Shané Winner
e149565a8a Merge pull request #679 from Azure/release_update/Release-30
update samples - test
2019-11-27 13:05:00 -08:00
vizhur
75610ec31c update samples - test 2019-11-27 21:02:21 +00:00
Shané Winner
0c2c450b6b Update index.md 2019-11-25 14:34:48 -08:00
Shané Winner
0d548eabff Merge pull request #677 from Azure/release_update/Release-29
update samples - test
2019-11-25 14:31:50 -08:00
vizhur
e4029801e6 update samples - test 2019-11-25 22:24:09 +00:00
Shané Winner
156974ee7b Update index.md 2019-11-25 11:42:53 -08:00
Shané Winner
1f05157d24 Merge pull request #676 from Azure/release_update/Release-160
update samples from Release-160 as a part of 1.0.76 SDK release
2019-11-25 11:39:27 -08:00
vizhur
2214ea8616 update samples from Release-160 as a part of 1.0.76 SDK release 2019-11-25 19:28:19 +00:00
Sheri Gilley
b54b2566de Merge pull request #667 from Azure/sdk-codetest
remove deprecated auto_prepare_environment
2019-11-21 09:25:15 -06:00
Sheri Gilley
57b0f701f8 remove deprecated auto_prepare_environment 2019-11-20 17:28:44 -06:00
Shané Winner
d658c85208 Update index.md 2019-11-12 14:59:15 -08:00
vizhur
a5f627a9b6 Merge pull request #655 from Azure/release_update/Release-28
update samples - test
2019-11-12 17:11:45 -05:00
vizhur
a8b08bdff0 update samples - test 2019-11-12 21:53:12 +00:00
Shané Winner
0dc3f34b86 Update index.md 2019-11-11 14:49:44 -08:00
Shané Winner
9ba7d5e5bb Update index.md 2019-11-11 14:48:05 -08:00
Shané Winner
c6ad2f8ec0 Merge pull request #654 from Azure/release_update/Release-158
update samples from Release-158 as a part of 1.0.74 SDK release
2019-11-11 10:25:18 -08:00
vizhur
33d6def8c3 update samples from Release-158 as a part of 1.0.74 SDK release 2019-11-11 16:57:02 +00:00
Shané Winner
69d4344dff Update index.md 2019-11-04 10:09:41 -08:00
Shané Winner
34aeec1439 Update index.md 2019-11-04 10:08:10 -08:00
Shané Winner
a9b9ebbf7d Merge pull request #641 from Azure/release_update/Release-27
update samples - test
2019-11-04 10:02:25 -08:00
vizhur
41fa508d53 update samples - test 2019-11-04 17:57:28 +00:00
Shané Winner
e1bfa98844 Update index.md 2019-11-04 08:41:15 -08:00
Shané Winner
2bcee9aa20 Update index.md 2019-11-04 08:40:29 -08:00
Shané Winner
37541b1071 Merge pull request #638 from Azure/release_update/Release-26
update samples - test
2019-11-04 08:31:59 -08:00
Shané Winner
4aff1310a7 Merge branch 'master' into release_update/Release-26 2019-11-04 08:31:37 -08:00
Shané Winner
51ecb7c54f Update index.md 2019-11-01 10:38:46 -07:00
Shané Winner
4e7fc7c82c Update index.md 2019-11-01 10:36:02 -07:00
Sheri Gilley
7db93bcb1d update comments 2019-01-22 17:18:19 -06:00
Sheri Gilley
fcbe925640 Merge branch 'sdk-codetest' of https://github.com/Azure/MachineLearningNotebooks into sdk-codetest 2019-01-07 13:06:12 -06:00
Sheri Gilley
bedfbd649e fix files 2019-01-07 13:06:02 -06:00
Sheri Gilley
fb760f648d Delete temp.py 2019-01-07 12:58:32 -06:00
Sheri Gilley
a9a0713d2f Delete donotupload.py 2019-01-07 12:57:58 -06:00
Sheri Gilley
c9d018b52c remove prepare environment 2019-01-07 12:56:54 -06:00
Sheri Gilley
53dbd0afcf hdi run config code 2019-01-07 11:29:40 -06:00
Sheri Gilley
e3a64b1f16 code for remote vm 2019-01-04 12:51:11 -06:00
Sheri Gilley
732eecfc7c update names 2019-01-04 12:45:28 -06:00
Sheri Gilley
6995c086ff change snippet names 2019-01-03 22:39:06 -06:00
Sheri Gilley
80bba4c7ae code for amlcompute section 2019-01-03 18:55:31 -06:00
Sheri Gilley
3c581b533f for local computer 2019-01-03 18:07:12 -06:00
Sheri Gilley
cc688caa4e change names 2019-01-03 08:53:49 -06:00
Sheri Gilley
da225e116e new code 2019-01-03 08:02:35 -06:00
Sheri Gilley
73c5d02880 Update quickstart.py 2018-12-17 12:23:03 -06:00
Sheri Gilley
e472b54f1b Update quickstart.py 2018-12-17 12:22:40 -06:00
Sheri Gilley
716c6d8bb1 add quickstart code 2018-11-06 11:27:58 -06:00
Sheri Gilley
23189c6f40 move folder 2018-10-17 16:24:46 -05:00
Sheri Gilley
361b57ed29 change all names to camelCase 2018-10-17 11:47:09 -05:00
Sheri Gilley
3f531fd211 try camelCase 2018-10-17 11:09:46 -05:00
Sheri Gilley
111f5e8d73 playing around 2018-10-17 10:46:33 -05:00
Sheri Gilley
96c59d5c2b testing 2018-10-17 09:56:04 -05:00
Sheri Gilley
ce3214b7c6 fix name 2018-10-16 17:33:24 -05:00
Sheri Gilley
53199d17de add delete 2018-10-16 16:54:08 -05:00
Sheri Gilley
54c883412c add test service 2018-10-16 16:49:41 -05:00
236 changed files with 8479 additions and 8492 deletions

View File

@@ -2,7 +2,7 @@
This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.
-![Azure ML Workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/service/media/concept-azure-machine-learning-architecture/workflow.png)
+![Azure ML Workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/media/concept-azure-machine-learning-architecture/workflow.png)
## Quick installation
@@ -13,15 +13,15 @@ Read more detailed instructions on [how to set up your environment](./NBSETUP.md
## How to navigate and use the example notebooks?
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
-This [index](.index.md) should assist in navigating the Azure Machine Learning notebook samples and encourage efficient retrieval of topics and content.
+This [index](./index.md) should assist in navigating the Azure Machine Learning notebook samples and encourage efficient retrieval of topics and content.
If you want to...
-* ...try out and explore Azure ML, start with image classification tutorials: [Part 1 (Training)](./tutorials/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/img-classification-part2-deploy.ipynb).
+* ...try out and explore Azure ML, start with image classification tutorials: [Part 1 (Training)](./tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/image-classification-mnist-data/img-classification-part2-deploy.ipynb).
* ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
* ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
-* ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
+* ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
-* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
+* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
* ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb).
## Tutorials

View File

@@ -103,7 +103,7 @@
"source": [ "source": [
"import azureml.core\n", "import azureml.core\n",
"\n", "\n",
"print(\"This notebook was created using version 1.0.72 of the Azure ML SDK\")\n", "print(\"This notebook was created using version 1.1.5 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
] ]
}, },

View File

@@ -9,7 +9,6 @@ As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) not
* [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
* [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs.
* [logging-api](./track-and-monitor-experiments/logging-api): Learn about the details of logging metrics to run history.
-* [register-model-create-image-deploy-service](./deployment/register-model-create-image-deploy-service): Learn about the details of model management.
* [production-deploy-to-aks](./deployment/production-deploy-to-aks) Deploy a model to production at scale on Azure Kubernetes Service.
* [enable-app-insights-in-production-service](./deployment/enable-app-insights-in-production-service) Learn how to use App Insights with production web service.

View File

@@ -144,7 +144,7 @@ jupyter notebook
- Dataset: forecasting for a bike-sharing
- Example of training an automated ML forecasting model on multiple time-series
-- [automl-forecasting-function.ipynb](forecasting-high-frequency/automl-forecasting-function.ipynb)
+- [auto-ml-forecasting-function.ipynb](forecasting-high-frequency/auto-ml-forecasting-function.ipynb)
- Example of training an automated ML forecasting model on multiple time-series
- [auto-ml-forecasting-beer-remote.ipynb](forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb)
@@ -154,6 +154,12 @@ jupyter notebook
- [auto-ml-continuous-retraining.ipynb](continuous-retraining/auto-ml-continuous-retraining.ipynb)
- Continuous retraining using Pipelines and Time-Series TabularDataset
+- [auto-ml-classification-text-dnn.ipynb](classification-text-dnn/auto-ml-classification-text-dnn.ipynb)
+ - Classification with text data using deep learning in AutoML
+ - AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data.
+ - Depending on the compute cluster the user provides, AutoML tries out Bidirectional Encoder Representations from Transformers (BERT) when a GPU compute is used, and a Bidirectional Long Short-Term Memory (BiLSTM) network when a CPU compute is used, thereby optimizing the choice of DNN for the user's setup.
<a name="documentation"></a>
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
@@ -191,6 +197,17 @@ If automl_setup_linux.sh fails on Ubuntu Linux with the error: `unable to execut
4) Check that the region is one of the supported regions: `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`
5) Check that you have access to the region using the Azure Portal.
+## import AutoMLConfig fails after upgrade from before 1.0.76 to 1.0.76 or later
+There were package changes in automated machine learning version 1.0.76, which require the previous version to be uninstalled before upgrading to the new version.
+If you have manually upgraded from a version of automated machine learning before 1.0.76 to 1.0.76 or later, you may get the error:
+`ImportError: cannot import name 'AutoMLConfig'`
+This can be resolved by running:
+`pip uninstall azureml-train-automl` and then
+`pip install azureml-train-automl`
+The automl_setup.cmd script does this automatically.
## workspace.from_config fails
If the call `ws = Workspace.from_config()` fails:
1) Make sure that you have run the `configuration.ipynb` notebook successfully.
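For orientation, here is a minimal sketch of what the configuration step leaves behind and how `Workspace.from_config()` consumes it (the workspace name, subscription id, and resource group below are illustrative placeholders, not values from this repository):

```python
from azureml.core import Workspace

try:
    # Workspace.from_config() reads the config.json written earlier
    # (by default under .azureml/), e.g. by the configuration.ipynb notebook.
    ws = Workspace.from_config()
except Exception:
    # Fall back to identifying the workspace explicitly, then cache the
    # details locally so that later calls to from_config() succeed.
    ws = Workspace.get(name="myworkspace",                   # illustrative
                       subscription_id="<my-subscription-id>",
                       resource_group="myresourcegroup")     # illustrative
    ws.write_config()

print(ws.name, ws.location, ws.resource_group, sep="\t")
```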

View File

@@ -2,8 +2,9 @@ name: azure_automl
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
-  - pip
+  - pip<=19.3.1
  - python>=3.5.2,<3.6.8
+  - wheel==0.30.0
  - nb_conda
  - matplotlib==2.1.0
  - numpy>=1.16.0,<=1.16.2
@@ -13,16 +14,24 @@ dependencies:
  - scikit-learn>=0.19.0,<=0.20.3
  - pandas>=0.22.0,<=0.23.4
  - py-xgboost<=0.80
-  - pyarrow>=0.11.0
-  - conda-forge::fbprophet==0.5
+  - fbprophet==0.5
+  - pytorch=1.1.0
+  - cudatoolkit=9.0
  - pip:
    # Required packages for AzureML execution, history, and data preparation.
    - azureml-defaults
+    - azureml-dataprep[pandas]
    - azureml-train-automl
    - azureml-train
    - azureml-widgets
-    - azureml-explain-model
+    - azureml-pipeline
    - azureml-contrib-interpret
-    - pandas_ml
+    - pytorch-transformers==1.0.0
+    - spacy==2.1.8
+    - onnxruntime==1.0.0
+    - https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
+channels:
+  - conda-forge
+  - pytorch

View File

@@ -2,9 +2,10 @@ name: azure_automl
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
-  - pip
+  - pip<=19.3.1
  - nomkl
  - python>=3.5.2,<3.6.8
+  - wheel==0.30.0
  - nb_conda
  - matplotlib==2.1.0
  - numpy>=1.16.0,<=1.16.2
@@ -14,16 +15,24 @@ dependencies:
  - scikit-learn>=0.19.0,<=0.20.3
  - pandas>=0.22.0,<0.23.0
  - py-xgboost<=0.80
-  - pyarrow>=0.11.0
-  - conda-forge::fbprophet==0.5
+  - fbprophet==0.5
+  - pytorch=1.1.0
+  - cudatoolkit=9.0
  - pip:
    # Required packages for AzureML execution, history, and data preparation.
    - azureml-defaults
+    - azureml-dataprep[pandas]
    - azureml-train-automl
    - azureml-train
    - azureml-widgets
-    - azureml-explain-model
+    - azureml-pipeline
    - azureml-contrib-interpret
-    - pandas_ml
+    - pytorch-transformers==1.0.0
+    - spacy==2.1.8
+    - onnxruntime==1.0.0
+    - https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
+channels:
+  - conda-forge
+  - pytorch

View File

@@ -14,8 +14,9 @@ IF "%CONDA_EXE%"=="" GOTO CondaMissing
call conda activate %conda_env_name% 2>nul:
if not errorlevel 1 (
-  echo Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment %conda_env_name%
-  call pip install --upgrade azureml-sdk[automl,notebooks,explain]
+  echo Upgrading existing conda environment %conda_env_name%
+  call pip uninstall azureml-train-automl -y -q
+  call conda env update --name %conda_env_name% --file %automl_env_file%
  if errorlevel 1 goto ErrorExit
) else (
  call conda env create -f %automl_env_file% -n %conda_env_name%

View File

@@ -22,8 +22,9 @@ fi
if source activate $CONDA_ENV_NAME 2> /dev/null
then
-   echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
-   pip install --upgrade azureml-sdk[automl,notebooks,explain] &&
+   echo "Upgrading existing conda environment" $CONDA_ENV_NAME
+   pip uninstall azureml-train-automl -y -q
+   conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
   jupyter nbextension uninstall --user --py azureml.widgets
else
   conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&

View File

@@ -22,8 +22,9 @@ fi
if source activate $CONDA_ENV_NAME 2> /dev/null
then
-   echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
-   pip install --upgrade azureml-sdk[automl,notebooks,explain] &&
+   echo "Upgrading existing conda environment" $CONDA_ENV_NAME
+   pip uninstall azureml-train-automl -y -q
+   conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
   jupyter nbextension uninstall --user --py azureml.widgets
else
   conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&

View File

@@ -92,6 +92,32 @@
"from azureml.explain.model._internal.explanation_client import ExplanationClient" "from azureml.explain.model._internal.explanation_client import ExplanationClient"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Accessing the Azure ML workspace requires authentication with Azure.\n",
"\n",
"The default authentication is interactive authentication using the default tenant. Executing the `ws = Workspace.from_config()` line in the cell below will prompt for authentication the first time that it is run.\n",
"\n",
"If you have multiple Azure tenants, you can specify the tenant by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
"\n",
"```\n",
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"auth = InteractiveLoginAuthentication(tenant_id = 'mytenantid')\n",
"ws = Workspace.from_config(auth = auth)\n",
"```\n",
"\n",
"If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
"\n",
"```\n",
"from azureml.core.authentication import ServicePrincipalAuthentication\n",
"auth = auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')\n",
"ws = Workspace.from_config(auth = auth)\n",
"```\n",
"For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth)"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -285,15 +311,15 @@
"|**task**|classification or regression or forecasting|\n", "|**task**|classification or regression or forecasting|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n", "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**blacklist_models** or **whitelist_models** |*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><br>Allowed values for **Forecasting**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><i>Arima</i><br><i>Prophet</i>|\n", "|**blacklist_models** | *List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run. <br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><br>Allowed values for **Forecasting**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><i>Arima</i><br><i>Prophet</i>|\n",
"| **whitelist_models** | *List* of *strings* indicating machine learning algorithms for AutoML to use in this run. Same values listed above for **blacklist_models** allowed for **whitelist_models**.|\n",
"|**experiment_exit_score**| Value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n", "|**experiment_exit_score**| Value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
"|**experiment_timeout_minutes**| Maximum amount of time in minutes that all iterations combined can take before the experiment terminates.|\n", "|**experiment_timeout_hours**| Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|\n",
"|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n", "|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n",
"|**featurization**| 'auto' / 'off' Indicator for whether featurization step should be done automatically or not. Note: If the input data is sparse, featurization cannot be turned on.|\n", "|**featurization**| 'auto' / 'off' Indicator for whether featurization step should be done automatically or not. Note: If the input data is sparse, featurization cannot be turned on.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n",
"|**training_data**|Input dataset, containing both features and label column.|\n", "|**training_data**|Input dataset, containing both features and label column.|\n",
"|**label_column_name**|The name of the label column.|\n", "|**label_column_name**|The name of the label column.|\n",
"|**model_explainability**|Indicate to explain each trained pipeline or not.|\n",
"\n", "\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)" "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
] ]
@@ -305,7 +331,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"automl_settings = {\n", "automl_settings = {\n",
" \"experiment_timeout_minutes\" : 20,\n", " \"experiment_timeout_hours\" : 0.3,\n",
" \"enable_early_stopping\" : True,\n", " \"enable_early_stopping\" : True,\n",
" \"iteration_timeout_minutes\": 5,\n", " \"iteration_timeout_minutes\": 5,\n",
" \"max_concurrent_iterations\": 4,\n", " \"max_concurrent_iterations\": 4,\n",
@@ -325,7 +351,6 @@
" training_data = train_data,\n", " training_data = train_data,\n",
" label_column_name = label,\n", " label_column_name = label,\n",
" validation_data = validation_dataset,\n", " validation_data = validation_dataset,\n",
" model_explainability=True,\n",
" **automl_settings\n", " **automl_settings\n",
" )" " )"
] ]
@@ -334,8 +359,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n", "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while."
"In this example, we specify `show_output = True` to print currently running iterations to the console."
] ]
}, },
{ {
@@ -382,6 +406,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Wait for the remote run to complete\n",
"remote_run.wait_for_completion()" "remote_run.wait_for_completion()"
] ]
}, },
@@ -463,8 +488,31 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Retrieve the Best Model's explanation\n", "### Retrieve the Best Model's explanation\n",
"Retrieve the explanation from the best_run which includes explanations for engineered features and raw features.\n", "Retrieve the explanation from the best_run which includes explanations for engineered features and raw features. Make sure that the run for generating explanations for the best model is completed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait for the best model explanation run to complete\n",
"from azureml.core.run import Run\n",
"model_explainability_run_id = remote_run.get_properties().get('ModelExplainRunId')\n",
"print(model_explainability_run_id)\n",
"if model_explainability_run_id is not None:\n",
" model_explainability_run = Run(experiment=experiment, run_id=model_explainability_run_id)\n",
" model_explainability_run.wait_for_completion()\n",
"\n", "\n",
"# Get the best run object\n",
"best_run, fitted_model = remote_run.get_output()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download engineered feature importance from artifact store\n", "#### Download engineered feature importance from artifact store\n",
"You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run." "You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run."
] ]
@@ -475,13 +523,32 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"best_run, fitted_model = remote_run.get_output()\n",
"client = ExplanationClient.from_run(best_run)\n", "client = ExplanationClient.from_run(best_run)\n",
"engineered_explanations = client.download_model_explanation(raw=False)\n", "engineered_explanations = client.download_model_explanation(raw=False)\n",
"exp_data = engineered_explanations.get_feature_importance_dict()\n", "exp_data = engineered_explanations.get_feature_importance_dict()\n",
"exp_data" "exp_data"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download raw feature importance from artifact store\n",
"You can use ExplanationClient to download the raw feature explanations from the artifact store of the best_run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = ExplanationClient.from_run(best_run)\n",
"engineered_explanations = client.download_model_explanation(raw=True)\n",
"exp_data = engineered_explanations.get_feature_importance_dict()\n",
"exp_data"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -515,7 +582,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.automl.core.onnx_convert import OnnxConverter\n", "from azureml.automl.runtime.onnx_convert import OnnxConverter\n",
"onnx_fl_path = \"./best_model.onnx\"\n", "onnx_fl_path = \"./best_model.onnx\"\n",
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)" "OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
] ]
@@ -527,15 +594,6 @@
"### Predict with the ONNX model, using onnxruntime package" "### Predict with the ONNX model, using onnxruntime package"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_df = test_dataset.to_pandas_dataframe()"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -552,12 +610,8 @@
"else:\n", "else:\n",
" python_version_compatible = False\n", " python_version_compatible = False\n",
"\n", "\n",
"try:\n", "import onnxruntime\n",
" import onnxruntime\n", "from azureml.automl.runtime.onnx_convert import OnnxInferenceHelper\n",
" from azureml.automl.core.onnx_convert import OnnxInferenceHelper \n",
" onnxrt_present = True\n",
"except ImportError:\n",
" onnxrt_present = False\n",
"\n", "\n",
"def get_onnx_res(run):\n", "def get_onnx_res(run):\n",
" res_path = 'onnx_resource.json'\n", " res_path = 'onnx_resource.json'\n",
@@ -566,7 +620,8 @@
" onnx_res = json.load(f)\n", " onnx_res = json.load(f)\n",
" return onnx_res\n", " return onnx_res\n",
"\n", "\n",
"if onnxrt_present and python_version_compatible: \n", "if python_version_compatible:\n",
" test_df = test_dataset.to_pandas_dataframe()\n",
" mdl_bytes = onnx_mdl.SerializeToString()\n", " mdl_bytes = onnx_mdl.SerializeToString()\n",
" onnx_res = get_onnx_res(best_run)\n", " onnx_res = get_onnx_res(best_run)\n",
"\n", "\n",
@@ -576,10 +631,7 @@
" print(pred_onnx)\n", " print(pred_onnx)\n",
" print(pred_prob_onnx)\n", " print(pred_prob_onnx)\n",
"else:\n", "else:\n",
" if not python_version_compatible:\n", " print('Please use Python version 3.6 or 3.7 to run the inference helper.')"
" print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n",
" if not onnxrt_present:\n",
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
] ]
}, },
{ {
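The prediction cell above relies on `OnnxInferenceHelper`, which wraps the `onnxruntime` package. As a minimal stand-alone sketch (not part of the notebook's diff), inspecting the `best_model.onnx` file saved earlier with onnxruntime directly looks roughly like this:

```python
import onnxruntime as ort

# Load the ONNX model written earlier by OnnxConverter.save_onnx_model.
sess = ort.InferenceSession("best_model.onnx")

# AutoML-exported models often expose one named input per original column,
# so inspect the graph before trying to score it.
for i in sess.get_inputs():
    print("input: ", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)

# Scoring then takes a feed dict keyed by input name:
# predictions = sess.run(None, {input_name: numpy_array, ...})
```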
@@ -613,20 +665,6 @@
"best_run, fitted_model = remote_run.get_output()" "best_run, fitted_model = remote_run.get_output()"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import shutil\n",
"\n",
"sript_folder = os.path.join(os.getcwd(), 'inference')\n",
"project_folder = '/inference'\n",
"os.makedirs(project_folder, exist_ok=True)"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -680,10 +718,10 @@
"from azureml.core.webservice import AciWebservice\n", "from azureml.core.webservice import AciWebservice\n",
"from azureml.core.webservice import Webservice\n", "from azureml.core.webservice import Webservice\n",
"from azureml.core.model import Model\n", "from azureml.core.model import Model\n",
"from azureml.core.environment import Environment\n",
"\n", "\n",
"inference_config = InferenceConfig(runtime = \"python\", \n", "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=conda_env_file_name)\n",
" entry_script = script_file_name,\n", "inference_config = InferenceConfig(entry_script=script_file_name, environment=myenv)\n",
" conda_file = conda_env_file_name)\n",
"\n", "\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n", " memory_gb = 1, \n",
@@ -826,13 +864,6 @@
"\n", "\n",
"[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014" "[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014"
] ]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
} }
], ],
"metadata": { "metadata": {

View File

@@ -2,12 +2,10 @@ name: auto-ml-classification-bank-marketing-all-features
dependencies:
  - pip:
    - azureml-sdk
-    - interpret
-    - azureml-defaults
    - azureml-train-automl
    - azureml-widgets
    - matplotlib
-    - pandas_ml
-    - onnxruntime
+    - interpret
+    - onnxruntime==1.0.0
    - azureml-explain-model
    - azureml-contrib-interpret

View File

@@ -210,10 +210,9 @@
"automl_settings = {\n", "automl_settings = {\n",
" \"n_cross_validations\": 3,\n", " \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'average_precision_score_weighted',\n", " \"primary_metric\": 'average_precision_score_weighted',\n",
" \"preprocess\": True,\n",
" \"enable_early_stopping\": True,\n", " \"enable_early_stopping\": True,\n",
" \"max_concurrent_iterations\": 2, # This is a limit for testing purpose, please increase it as per cluster size\n", " \"max_concurrent_iterations\": 2, # This is a limit for testing purpose, please increase it as per cluster size\n",
" \"experiment_timeout_minutes\": 10, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ablity to find the best model possible\n", " \"experiment_timeout_hours\": 0.25, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ablity to find the best model possible\n",
" \"verbosity\": logging.INFO,\n", " \"verbosity\": logging.INFO,\n",
"}\n", "}\n",
"\n", "\n",
@@ -230,8 +229,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations this can run for a while.\n", "Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations this can run for a while."
"In this example, we specify `show_output = True` to print currently running iterations to the console."
] ]
}, },
{ {
@@ -284,7 +282,11 @@
{
"cell_type": "code",
"execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"widget-rundetails-sample"
+]
+},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
@@ -306,7 +308,7 @@
"source": [ "source": [
"#### Explain model\n", "#### Explain model\n",
"\n", "\n",
"Automated ML models can be explained and visualized using the SDK Explainability library. [Learn how to use the explainer](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/auto-ml-model-explanations-remote-compute.ipynb)." "Automated ML models can be explained and visualized using the SDK Explainability library. "
] ]
}, },
{ {
@@ -335,17 +337,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Print the properties of the model\n", "#### Print the properties of the model\n",
"The fitted_model is a python object and you can read the different properties of the object.\n", "The fitted_model is a python object and you can read the different properties of the object.\n"
"See *Print the properties of the model* section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy\n",
"\n",
"To deploy the model into a web service endpoint, see _Deploy_ section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb)"
] ]
}, },
{ {
@@ -452,7 +444,7 @@
"AML Compute" "AML Compute"
], ],
"datasets": [ "datasets": [
"creditcard" "Creditcard"
], ],
"deployment": [ "deployment": [
"None" "None"

View File

@@ -2,10 +2,8 @@ name: auto-ml-classification-credit-card-fraud
dependencies:
  - pip:
    - azureml-sdk
-    - interpret
-    - azureml-defaults
-    - azureml-explain-model
    - azureml-train-automl
    - azureml-widgets
    - matplotlib
-    - pandas_ml
+    - interpret
+    - azureml-explain-model

View File

@@ -0,0 +1,579 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Text Classification Using Deep Learning**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Evaluate](#Evaluate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"This notebook demonstrates classification with text data using deep learning in AutoML.\n",
"\n",
"AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data. Depending on the compute cluster the user provides, AutoML tried out Bidirectional Encoder Representations from Transformers (BERT) when a GPU compute is used, and Bidirectional Long-Short Term neural network (BiLSTM) when a CPU compute is used, thereby optimizing the choice of DNN for the uesr's setup.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"An Enterprise workspace is required for this notebook. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade).\n",
"\n",
"Notebook synopsis:\n",
"1. Creating an Experiment in an existing Workspace\n",
"2. Configuration and remote run of AutoML for a text dataset (20 Newsgroups dataset from scikit-learn) for classification\n",
"3. Evaluating the final model on a test set\n",
"4. Deploying the model on ACI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import shutil\n",
"\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"from azureml.core.run import Run\n",
"from azureml.widgets import RunDetails\n",
"from azureml.core.model import Model \n",
"from helper import run_inference, get_result_df\n",
"from azureml.train.automl import AutoMLConfig\n",
"from sklearn.datasets import fetch_20newsgroups"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose an experiment name.\n",
"experiment_name = 'automl-classification-text-dnn'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up a compute cluster\n",
"This section uses a user-provided compute cluster (named \"dnntext-cluster\" in this example). If a cluster with this name does not exist in the user's workspace, the below code will create a new cluster. You can choose the parameters of the cluster as mentioned in the comments.\n",
"\n",
"Whether you provide/select a CPU or GPU cluster, AutoML will choose the appropriate DNN for that setup - BiLSTM or BERT text featurizer will be included in the candidate featurizers on CPU and GPU respectively. If your goal is to obtain the most accurate model, we recommend you use GPU clusters since BERT featurizers usually outperform BiLSTM featurizers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"dnntext-cluster\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
"\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\", # CPU for BiLSTM, such as \"STANDARD_D2_V2\" \n",
" # To use BERT (this is recommended for best performance), select a GPU such as \"STANDARD_NC6\" \n",
" # or similar GPU option\n",
" # available in your workspace\n",
" max_nodes = 1)\n",
"\n",
" # Create the cluster\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"# For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get data\n",
"For this notebook we will use 20 Newsgroups data from scikit-learn. We filter the data to contain four classes and take a sample as training data. Please note that for accuracy improvement, more data is needed. For this notebook we provide a small-data example so that you can use this template to use with your larger sized data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_dir = \"text-dnn-data\" # Local directory to store data\n",
"blobstore_datadir = data_dir # Blob store directory to store data in\n",
"target_column_name = 'y'\n",
"feature_column_name = 'X'\n",
"\n",
"def get_20newsgroups_data():\n",
" '''Fetches 20 Newsgroups data from scikit-learn\n",
" Returns them in form of pandas dataframes\n",
" '''\n",
" remove = ('headers', 'footers', 'quotes')\n",
" categories = [\n",
" 'alt.atheism',\n",
" 'talk.religion.misc',\n",
" 'comp.graphics',\n",
" 'sci.space',\n",
" ]\n",
"\n",
" data = fetch_20newsgroups(subset = 'train', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
" data = pd.DataFrame({feature_column_name: data.data, target_column_name: data.target})\n",
"\n",
" data_train = data[:200]\n",
" data_test = data[200:300] \n",
"\n",
" data_train = remove_blanks_20news(data_train, feature_column_name, target_column_name)\n",
" data_test = remove_blanks_20news(data_test, feature_column_name, target_column_name)\n",
" \n",
" return data_train, data_test\n",
" \n",
"def remove_blanks_20news(data, feature_column_name, target_column_name):\n",
" \n",
" data[feature_column_name] = data[feature_column_name].replace(r'\\n', ' ', regex=True).apply(lambda x: x.strip())\n",
" data = data[data[feature_column_name] != '']\n",
" \n",
" return data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Fetch data and upload to datastore for use in training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_train, data_test = get_20newsgroups_data()\n",
"\n",
"if not os.path.isdir(data_dir):\n",
" os.mkdir(data_dir)\n",
" \n",
"train_data_fname = data_dir + '/train_data.csv'\n",
"test_data_fname = data_dir + '/test_data.csv'\n",
"\n",
"data_train.to_csv(train_data_fname, index=False)\n",
"data_test.to_csv(test_data_fname, index=False)\n",
"\n",
"datastore = ws.get_default_datastore()\n",
"datastore.upload(src_dir=data_dir, target_path=blobstore_datadir,\n",
" overwrite=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, blobstore_datadir + '/train_data.csv')])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare AutoML run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This step requires an Enterprise workspace to gain access to this feature. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"experiment_timeout_minutes\": 20,\n",
" \"primary_metric\": 'accuracy',\n",
" \"max_concurrent_iterations\": 4, \n",
" \"max_cores_per_iteration\": -1,\n",
" \"enable_dnn\": True,\n",
" \"enable_early_stopping\": True,\n",
" \"validation_size\": 0.3,\n",
" \"verbosity\": logging.INFO,\n",
" \"enable_voting_ensemble\": False,\n",
" \"enable_stack_ensemble\": False,\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" compute_target=compute_target,\n",
" training_data=train_dataset,\n",
" label_column_name=target_column_name,\n",
" **automl_settings\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit AutoML Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_run = experiment.submit(automl_config, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Displaying the run objects gives you links to the visual tools in the Azure Portal. Go try them!"
]
},
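{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also monitor progress from inside the notebook with the `RunDetails` widget. The cell below is a minimal sketch, assuming the `azureml-widgets` package used later in this notebook is installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(automl_run).show()"
]
},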
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"Below we select the best model pipeline from our iterations, use it to test on test data on the same compute cluster."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can test the model locally to get a feel of the input/output. This step may require additional package installations such as pytorch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = automl_run.get_output()"
]
},
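{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a minimal local sanity check. It assumes `data_test` from earlier is still in memory and that the extra packages (e.g. PyTorch) are installed locally; the authoritative evaluation runs remotely later in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Score a few held-out rows with the fitted pipeline (local sketch only)\n",
"sample_X = data_test[[feature_column_name]].head(3)\n",
"fitted_model.predict(sample_X)"
]
},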
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can now see what text transformations are used to convert text data to features for this dataset, including deep learning transformations based on BiLSTM or Transformer (BERT is one implementation of a Transformer) models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text_transformations_used = []\n",
"for column_group in fitted_model.named_steps['datatransformer'].get_featurization_summary():\n",
" text_transformations_used.extend(column_group['Transformations'])\n",
"text_transformations_used"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploying the model\n",
"We now use the best fitted model from the AutoML Run to make predictions on the test set. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get results stats, extract the best model from AutoML run, download and register the resultant best model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"summary_df = get_result_df(automl_run)\n",
"best_dnn_run_id = summary_df['run_id'].iloc[0]\n",
"best_dnn_run = Run(experiment, best_dnn_run_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_dir = 'Model' # Local folder where the model will be stored temporarily\n",
"if not os.path.isdir(model_dir):\n",
" os.mkdir(model_dir)\n",
" \n",
"best_dnn_run.download_file('outputs/model.pkl', model_dir + '/model.pkl')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Register the model in your Azure Machine Learning Workspace. If you previously registered a model, please make sure to delete it so as to replace it with this new model."
]
},
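{
"cell_type": "markdown",
"metadata": {},
"source": [
"The optional cell below sketches one way to do that cleanup. It assumes a model may already be registered under the name used here; `Model.list` and `Model.delete` are azureml-core APIs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional cleanup (sketch): remove any prior registration with this name\n",
"from azureml.core.model import Model\n",
"for old_model in Model.list(ws, name='textDNN-20News'):\n",
"    old_model.delete()"
]
},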
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Register the model\n",
"model_name = 'textDNN-20News'\n",
"model = Model.register(model_path = model_dir + '/model.pkl',\n",
" model_name = model_name,\n",
" tags=None,\n",
" workspace=ws)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate on Test Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now use the best fitted model from the AutoML Run to make predictions on the test set. \n",
"\n",
"Test set schema should match that of the training set."
]
},
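{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick schema check (a sketch, assuming `data_train` and `data_test` from earlier are still in memory), you can compare the column lists directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The test set must expose the same columns as the training set\n",
"assert list(data_test.columns) == list(data_train.columns)"
]
},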
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, blobstore_datadir + '/test_data.csv')])\n",
"\n",
"# preview the first 3 rows of the dataset\n",
"test_dataset.take(3).to_pandas_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_experiment = Experiment(ws, experiment_name + \"_test\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"script_folder = os.path.join(os.getcwd(), 'inference')\n",
"os.makedirs(script_folder, exist_ok=True)\n",
"shutil.copy2('infer.py', script_folder)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_run = run_inference(test_experiment, compute_target, script_folder, best_dnn_run, test_dataset,\n",
" target_column_name, model_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Display computed metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"RunDetails(test_run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_run.wait_for_completion()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pd.Series(test_run.get_metrics())"
]
}
],
"metadata": {
"authors": [
{
"name": "anshirga"
}
],
"compute": [
"AML Compute"
],
"datasets": [
"None"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"None"
],
"friendly_name": "DNN Text Featurization",
"index_order": 2,
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"tags": [
"None"
],
"task": "Text featurization using DNNs for classification"
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,13 @@
name: auto-ml-classification-text-dnn
dependencies:
- pip:
- azureml-sdk
- azureml-train-automl
- azureml-widgets
- matplotlib
- azureml-train
- https://download.pytorch.org/whl/cpu/torch-1.1.0-cp35-cp35m-win_amd64.whl
- sentencepiece==0.1.82
- pytorch-transformers==1.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz

View File

@@ -0,0 +1,60 @@
import pandas as pd
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.train.estimator import Estimator
from azureml.core.run import Run


def run_inference(test_experiment, compute_target, script_folder, train_run,
                  test_dataset, target_column_name, model_name):
    # Reuse the conda environment captured by the training run so inference
    # sees the same package versions.
    train_run.download_file('outputs/conda_env_v_1_0_0.yml',
                            'inference/condafile.yml')

    inference_env = Environment("myenv")
    inference_env.docker.enabled = True
    inference_env.python.conda_dependencies = CondaDependencies(
        conda_dependencies_file_path='inference/condafile.yml')

    est = Estimator(source_directory=script_folder,
                    entry_script='infer.py',
                    script_params={
                        '--target_column_name': target_column_name,
                        '--model_name': model_name
                    },
                    inputs=[test_dataset.as_named_input('test_data')],
                    compute_target=compute_target,
                    environment_definition=inference_env)

    # Tag the test run with details of the training run that produced the model.
    run = test_experiment.submit(
        est, tags={
            'training_run_id': train_run.id,
            'run_algorithm': train_run.properties['run_algorithm'],
            'valid_score': train_run.properties['score'],
            'primary_metric': train_run.properties['primary_metric']
        })

    run.log("run_algorithm", run.tags['run_algorithm'])
    return run


def get_result_df(remote_run):
    """Summarize AutoML child runs: one row per algorithm (its best run),
    sorted so the best run overall comes first."""
    children = list(remote_run.get_children(recursive=True))
    summary_df = pd.DataFrame(index=['run_id', 'run_algorithm',
                                     'primary_metric', 'Score'])
    goal_minimize = False
    for run in children:
        if 'run_algorithm' in run.properties and 'score' in run.properties:
            summary_df[run.id] = [run.id, run.properties['run_algorithm'],
                                  run.properties['primary_metric'],
                                  float(run.properties['score'])]
            if 'goal' in run.properties:
                goal_minimize = run.properties['goal'].split('_')[-1] == 'min'

    # Sort direction depends on whether the primary metric is minimized.
    summary_df = summary_df.T.sort_values(
        'Score',
        ascending=goal_minimize).drop_duplicates(['run_algorithm'])
    summary_df = summary_df.set_index('run_algorithm')
    return summary_df

View File

@@ -0,0 +1,54 @@
import numpy as np
import argparse
from azureml.core import Run
from sklearn.externals import joblib
from azureml.automl.core._vendor.automl.client.core.common import metrics
from automl.client.core.common import constants
from azureml.core.model import Model

parser = argparse.ArgumentParser()
parser.add_argument(
'--target_column_name', type=str, dest='target_column_name',
help='Target Column Name')
parser.add_argument(
'--model_name', type=str, dest='model_name',
help='Name of registered model')
args = parser.parse_args()

target_column_name = args.target_column_name
model_name = args.model_name

print('args passed are: ')
print('Target column name: ', target_column_name)
print('Name of registered model: ', model_name)
model_path = Model.get_model_path(model_name)
# deserialize the model file back into a sklearn model
model = joblib.load(model_path)
run = Run.get_context()
# get input dataset by name
test_dataset = run.input_datasets['test_data']
X_test_df = test_dataset.drop_columns(columns=[target_column_name]) \
.to_pandas_dataframe()
y_test_df = test_dataset.with_timestamp_columns(None) \
.keep_columns(columns=[target_column_name]) \
.to_pandas_dataframe()
predicted = model.predict_proba(X_test_df)
# use automl metrics module
scores = metrics.compute_metrics_classification(
np.array(predicted),
np.array(y_test_df),
class_labels=model.classes_,
metrics=list(constants.Metric.SCALAR_CLASSIFICATION_SET)
)
print("scores:")
print(scores)
for key, value in scores.items():
run.log(key, value)

View File

@@ -197,7 +197,7 @@
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", "conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n", "\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', 'applicationinsights', 'azureml-opendatasets'], \n", "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', 'applicationinsights', 'azureml-opendatasets'], \n",
" conda_packages=['numpy', 'py-xgboost'], \n", " conda_packages=['numpy==1.16.2'], \n",
" pin_sdk_version=False)\n", " pin_sdk_version=False)\n",
"#cd.add_pip_package('azureml-explain-model')\n", "#cd.add_pip_package('azureml-explain-model')\n",
"conda_run_config.environment.python.conda_dependencies = cd\n", "conda_run_config.environment.python.conda_dependencies = cd\n",
@@ -210,7 +210,24 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Data Ingestion Pipeline \n", "## Data Ingestion Pipeline \n",
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You can replace this with your own dataset, or you can skip this pipeline if you already have a time-series based `TabularDataset`.\n", "For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You can replace this with your own dataset, or you can skip this pipeline if you already have a time-series based `TabularDataset`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The name and target column of the Dataset to create \n",
"dataset = \"NOAA-Weather-DS4\"\n",
"target_column_name = \"temperature\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n", "\n",
"### Upload Data Step\n", "### Upload Data Step\n",
"The data ingestion pipeline has a single step with a script to query the latest weather data and upload it to the blob store. During the first run, the script will create and register a time-series based `TabularDataset` with the past one week of weather data. For each subsequent run, the script will create a partition in the blob store by querying NOAA for new weather data since the last modified time of the dataset (`dataset.data_changed_time`) and creating a data.csv file." "The data ingestion pipeline has a single step with a script to query the latest weather data and upload it to the blob store. During the first run, the script will create and register a time-series based `TabularDataset` with the past one week of weather data. For each subsequent run, the script will create a partition in the blob store by querying NOAA for new weather data since the last modified time of the dataset (`dataset.data_changed_time`) and creating a data.csv file."
@@ -225,8 +242,6 @@
"from azureml.pipeline.core import Pipeline, PipelineParameter\n", "from azureml.pipeline.core import Pipeline, PipelineParameter\n",
"from azureml.pipeline.steps import PythonScriptStep\n", "from azureml.pipeline.steps import PythonScriptStep\n",
"\n", "\n",
"# The name of the Dataset to create \n",
"dataset = \"NOAA-Weather-DS4\"\n",
"ds_name = PipelineParameter(name=\"ds_name\", default_value=dataset)\n", "ds_name = PipelineParameter(name=\"ds_name\", default_value=dataset)\n",
"upload_data_step = PythonScriptStep(script_name=\"upload_weather_data.py\", \n", "upload_data_step = PythonScriptStep(script_name=\"upload_weather_data.py\", \n",
" allow_reuse=False,\n", " allow_reuse=False,\n",
@@ -262,7 +277,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"data_pipeline_run.wait_for_completion()" "data_pipeline_run.wait_for_completion(show_output=False)"
] ]
}, },
{ {
@@ -272,7 +287,7 @@
"## Training Pipeline\n", "## Training Pipeline\n",
"### Prepare Training Data Step\n", "### Prepare Training Data Step\n",
"\n", "\n",
"Script to bring data into common X,y format. We need to set allow_reuse flag to False to allow the pipeline to run even when inputs don't change. We also need the name of the model to check the time the model was last trained." "Script to check if new data is available since the model was last trained. If no new data is available, we cancel the remaining pipeline steps. We need to set allow_reuse flag to False to allow the pipeline to run even when inputs don't change. We also need the name of the model to check the time the model was last trained."
] ]
}, },
{ {
@@ -283,11 +298,8 @@
"source": [ "source": [
"from azureml.pipeline.core import PipelineData\n", "from azureml.pipeline.core import PipelineData\n",
"\n", "\n",
"target_column = PipelineParameter(\"target_column\", default_value=\"y\")\n",
"# The model name with which to register the trained model in the workspace.\n", "# The model name with which to register the trained model in the workspace.\n",
"model_name = PipelineParameter(\"model_name\", default_value=\"y\")\n", "model_name = PipelineParameter(\"model_name\", default_value=\"noaaweatherds\")"
"output_x = PipelineData(\"output_x\", datastore=dstor)\n",
"output_y = PipelineData(\"output_y\", datastore=dstor)"
] ]
}, },
{ {
@@ -299,16 +311,23 @@
"data_prep_step = PythonScriptStep(script_name=\"check_data.py\", \n", "data_prep_step = PythonScriptStep(script_name=\"check_data.py\", \n",
" allow_reuse=False,\n", " allow_reuse=False,\n",
" name=\"check_data\",\n", " name=\"check_data\",\n",
" arguments=[\"--target_column\", target_column,\n", " arguments=[\"--ds_name\", ds_name,\n",
" \"--output_x\", output_x,\n",
" \"--output_y\", output_y,\n",
" \"--ds_name\", ds_name,\n",
" \"--model_name\", model_name],\n", " \"--model_name\", model_name],\n",
" outputs=[output_x, output_y], \n",
" compute_target=compute_target, \n", " compute_target=compute_target, \n",
" runconfig=conda_run_config)" " runconfig=conda_run_config)"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Dataset\n",
"train_ds = Dataset.get_by_name(ws, dataset)\n",
"train_ds = train_ds.drop_columns([\"partition_date\"])"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -323,14 +342,14 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.train.automl import AutoMLStep, AutoMLConfig\n", "from azureml.train.automl import AutoMLConfig\n",
"from azureml.pipeline.steps import AutoMLStep\n",
"\n", "\n",
"automl_settings = {\n", "automl_settings = {\n",
" \"iteration_timeout_minutes\": 20,\n", " \"iteration_timeout_minutes\": 10,\n",
" \"experiment_timeout_minutes\": 30,\n", " \"experiment_timeout_hours\": 0.25,\n",
" \"n_cross_validations\": 3,\n", " \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'r2_score',\n", " \"primary_metric\": 'r2_score',\n",
" \"preprocess\": True,\n",
" \"max_concurrent_iterations\": 3,\n", " \"max_concurrent_iterations\": 3,\n",
" \"max_cores_per_iteration\": -1,\n", " \"max_cores_per_iteration\": -1,\n",
" \"verbosity\": logging.INFO,\n", " \"verbosity\": logging.INFO,\n",
@@ -341,8 +360,8 @@
" debug_log = 'automl_errors.log',\n", " debug_log = 'automl_errors.log',\n",
" path = \".\",\n", " path = \".\",\n",
" compute_target=compute_target,\n", " compute_target=compute_target,\n",
" run_configuration=conda_run_config,\n", " training_data = train_ds,\n",
" data_script = \"get_data.py\",\n", " label_column_name = target_column_name,\n",
" **automl_settings\n", " **automl_settings\n",
" )" " )"
] ]
@@ -358,7 +377,7 @@
"metrics_output_name = 'metrics_output'\n", "metrics_output_name = 'metrics_output'\n",
"best_model_output_name = 'best_model_output'\n", "best_model_output_name = 'best_model_output'\n",
"\n", "\n",
"metirics_data = PipelineData(name='metrics_data',\n", "metrics_data = PipelineData(name='metrics_data',\n",
" datastore=dstor,\n", " datastore=dstor,\n",
" pipeline_output_name=metrics_output_name,\n", " pipeline_output_name=metrics_output_name,\n",
" training_output=TrainingOutput(type='Metrics'))\n", " training_output=TrainingOutput(type='Metrics'))\n",
@@ -377,8 +396,7 @@
"automl_step = AutoMLStep(\n", "automl_step = AutoMLStep(\n",
" name='automl_module',\n", " name='automl_module',\n",
" automl_config=automl_config,\n", " automl_config=automl_config,\n",
" inputs=[output_x, output_y],\n", " outputs=[metrics_data, model_data],\n",
" outputs=[metirics_data, model_data],\n",
" allow_reuse=False)" " allow_reuse=False)"
] ]
}, },
@@ -431,7 +449,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"training_pipeline_run = experiment.submit(training_pipeline, pipeline_parameters={\n", "training_pipeline_run = experiment.submit(training_pipeline, pipeline_parameters={\n",
" \"target_column\": \"temperature\", \"ds_name\": dataset, \"model_name\": \"noaaweatherds\"})" " \"ds_name\": dataset, \"model_name\": \"noaaweatherds\"})"
] ]
}, },
{ {
@@ -440,7 +458,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"training_pipeline_run.wait_for_completion()" "training_pipeline_run.wait_for_completion(show_output=False)"
] ]
}, },
{ {
@@ -474,7 +492,7 @@
"source": [ "source": [
"from azureml.pipeline.core import Schedule\n", "from azureml.pipeline.core import Schedule\n",
"schedule = Schedule.create(workspace=ws, name=\"RetrainingSchedule\",\n", "schedule = Schedule.create(workspace=ws, name=\"RetrainingSchedule\",\n",
" pipeline_parameters={\"target_column\": \"temperature\",\"ds_name\": dataset, \"model_name\": \"noaaweatherds\"},\n", " pipeline_parameters={\"ds_name\": dataset, \"model_name\": \"noaaweatherds\"},\n",
" pipeline_id=published_pipeline.id, \n", " pipeline_id=published_pipeline.id, \n",
" experiment_name=experiment_name, \n", " experiment_name=experiment_name, \n",
" datastore=dstor,\n", " datastore=dstor,\n",

View File

@@ -3,7 +3,6 @@ dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- azureml-train-automl - azureml-train-automl
- azureml-pipeline
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- pandas_ml - azureml-pipeline

View File

@@ -15,32 +15,16 @@ if type(run) == _OfflineRun:
else: else:
ws = run.experiment.workspace ws = run.experiment.workspace
print("Check for new data.")
def write_output(df, path):
os.makedirs(path, exist_ok=True)
print("%s created" % path)
df.to_csv(path + "/part-00000", index=False)
print("Check for new data and prepare the data")
parser = argparse.ArgumentParser("split") parser = argparse.ArgumentParser("split")
parser.add_argument("--target_column", type=str, help="input split features")
parser.add_argument("--ds_name", help="input dataset name") parser.add_argument("--ds_name", help="input dataset name")
parser.add_argument("--model_name", help="name of the deployed model") parser.add_argument("--model_name", help="name of the deployed model")
parser.add_argument("--output_x", type=str,
help="output features")
parser.add_argument("--output_y", type=str,
help="output labels")
args = parser.parse_args() args = parser.parse_args()
print("Argument 1(ds_name): %s" % args.ds_name) print("Argument 1(ds_name): %s" % args.ds_name)
print("Argument 2(target_column): %s" % args.target_column) print("Argument 2(model_name): %s" % args.model_name)
print("Argument 3(model_name): %s" % args.model_name)
print("Argument 4(output_x): %s" % args.output_x)
print("Argument 5(output_y): %s" % args.output_y)
# Get the latest registered model # Get the latest registered model
try: try:
@@ -54,22 +38,9 @@ except Exception as e:
train_ds = Dataset.get_by_name(ws, args.ds_name) train_ds = Dataset.get_by_name(ws, args.ds_name)
dataset_changed_time = train_ds.data_changed_time dataset_changed_time = train_ds.data_changed_time
if dataset_changed_time > last_train_time: if not dataset_changed_time > last_train_time:
# New data is available since the model was last trained
print("Dataset was last updated on {0}. Retraining...".format(dataset_changed_time))
train_ds = train_ds.drop_columns(["partition_date"])
X_train = train_ds.drop_columns(
columns=[args.target_column]).to_pandas_dataframe()
y_train = train_ds.keep_columns(
columns=[args.target_column]).to_pandas_dataframe()
non_null = y_train[args.target_column].notnull()
y = y_train[non_null]
X = X_train[non_null]
if not (args.output_x is None and args.output_y is None):
write_output(X, args.output_x)
write_output(y, args.output_y)
else:
print("Cancelling run since there is no new data.") print("Cancelling run since there is no new data.")
run.parent.cancel() run.parent.cancel()
else:
# New data is available since the model was last trained
print("Dataset was last updated on {0}. Retraining...".format(dataset_changed_time))

View File

@@ -1,15 +0,0 @@
import os
import pandas as pd
def get_data():
print("In get_data")
print(os.environ['AZUREML_DATAREFERENCE_output_x'])
X_train = pd.read_csv(
os.environ['AZUREML_DATAREFERENCE_output_x'] + "/part-00000")
y_train = pd.read_csv(
os.environ['AZUREML_DATAREFERENCE_output_y'] + "/part-00000")
print(X_train.head(3))
return {"X": X_train.values, "y": y_train.values.flatten()}

View File

@@ -58,7 +58,7 @@ except Exception as e:
print(traceback.format_exc()) print(traceback.format_exc())
print("Dataset with name {0} not found, registering new dataset.".format(args.ds_name)) print("Dataset with name {0} not found, registering new dataset.".format(args.ds_name))
register_dataset = True register_dataset = True
end_time_last_slice = datetime.today() - relativedelta(weeks=1) end_time_last_slice = datetime.today() - relativedelta(weeks=2)
end_time = datetime.utcnow() end_time = datetime.utcnow()
train_df = get_noaa_data(end_time_last_slice, end_time) train_df = get_noaa_data(end_time_last_slice, end_time)
@@ -80,10 +80,10 @@ if train_df.size > 0:
target_path=folder_name, target_path=folder_name,
overwrite=True, overwrite=True,
show_progress=True) show_progress=True)
if register_dataset:
ds = Dataset.Tabular.from_delimited_files(dstor.path("{}/**/*.csv".format(
args.ds_name)), partition_format='/{partition_date:yyyy/MM/dd/hh/mm/ss}/data.csv')
ds.register(ws, name=args.ds_name)
else: else:
print("No new data since {0}.".format(end_time_last_slice)) print("No new data since {0}.".format(end_time_last_slice))
if register_dataset:
ds = Dataset.Tabular.from_delimited_files(dstor.path("{}/**/*.csv".format(
args.ds_name)), partition_format='/{partition_date:yyyy/MM/dd/HH/mm/ss}/data.csv')
ds.register(ws, name=args.ds_name)

View File

@@ -301,7 +301,7 @@
"source": [ "source": [
"### Setting forecaster maximum horizon \n", "### Setting forecaster maximum horizon \n",
"\n", "\n",
"The forecast horizon is the number of periods into the future that the model should predict. Here, we set the horizon to 4 periods (i.e. 4 months). Notice that this is much shorter than the number of days in the test set; we will need to use a rolling test to evaluate the performance on the whole test set. For more discussion of forecast horizons and guiding principles for setting them, please see the [energy demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand). " "The forecast horizon is the number of periods into the future that the model should predict. Here, we set the horizon to 12 periods (i.e. 12 months). Notice that this is much shorter than the number of months in the test set; we will need to use a rolling test to evaluate the performance on the whole test set. For more discussion of forecast horizons and guiding principles for setting them, please see the [energy demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand). "
] ]
}, },
{ {
@@ -358,12 +358,12 @@
"\n", "\n",
"automl_config = AutoMLConfig(task='forecasting', \n", "automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n", " primary_metric='normalized_root_mean_squared_error',\n",
" experiment_timeout_minutes = 60,\n", " experiment_timeout_hours = 1,\n",
" training_data=train_dataset,\n", " training_data=train_dataset,\n",
" label_column_name=target_column_name,\n", " label_column_name=target_column_name,\n",
" validation_data=valid_dataset, \n", " validation_data=valid_dataset, \n",
" verbosity=logging.INFO,\n", " verbosity=logging.INFO,\n",
" compute_target = compute_target,\n", " compute_target=compute_target,\n",
" max_concurrent_iterations=4,\n", " max_concurrent_iterations=4,\n",
" max_cores_per_iteration=-1,\n", " max_cores_per_iteration=-1,\n",
" **automl_settings)" " **automl_settings)"
@@ -376,7 +376,7 @@
"hidePrompt": false "hidePrompt": false
}, },
"source": [ "source": [
"We will now run the experiment, starting with 10 iterations of model search. The experiment can be continued for more iterations if more accurate results are required. You will see the currently running iterations printing to the console." "We will now run the experiment, starting with 10 iterations of model search. The experiment can be continued for more iterations if more accurate results are required."
] ]
}, },
{ {
@@ -555,10 +555,8 @@
"import shutil\n", "import shutil\n",
"\n", "\n",
"script_folder = os.path.join(os.getcwd(), 'inference')\n", "script_folder = os.path.join(os.getcwd(), 'inference')\n",
"project_folder = './inference'\n", "os.makedirs(script_folder, exist_ok=True)\n",
"os.makedirs(project_folder, exist_ok=True)\n", "shutil.copy2('infer.py', script_folder)"
"\n",
"!copy infer.py inference"
] ]
}, },
{ {

View File

@@ -4,9 +4,8 @@ dependencies:
- py-xgboost<=0.80 - py-xgboost<=0.80
- pip: - pip:
- azureml-sdk - azureml-sdk
- numpy==1.16.2
- azureml-train-automl - azureml-train-automl
- azureml-train
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- pandas_ml - azureml-train
- statsmodels

View File

@@ -76,9 +76,12 @@ def get_result_df(remote_run):
def run_inference(test_experiment, compute_target, script_folder, train_run, def run_inference(test_experiment, compute_target, script_folder, train_run,
test_dataset, lookback_dataset, max_horizon, test_dataset, lookback_dataset, max_horizon,
target_column_name, time_column_name, freq): target_column_name, time_column_name, freq):
train_run.download_file('outputs/model.pkl', 'inference/model.pkl') model_base_name = 'model.pkl'
train_run.download_file('outputs/conda_env_v_1_0_0.yml', if 'model_data_location' in train_run.properties:
'inference/condafile.yml') model_location = train_run.properties['model_data_location']
_, model_base_name = model_location.rsplit('/', 1)
train_run.download_file('outputs/{}'.format(model_base_name), 'inference/{}'.format(model_base_name))
train_run.download_file('outputs/conda_env_v_1_0_0.yml', 'inference/condafile.yml')
inference_env = Environment("myenv") inference_env = Environment("myenv")
inference_env.docker.enabled = True inference_env.docker.enabled = True
@@ -91,7 +94,8 @@ def run_inference(test_experiment, compute_target, script_folder, train_run,
'--max_horizon': max_horizon, '--max_horizon': max_horizon,
'--target_column_name': target_column_name, '--target_column_name': target_column_name,
'--time_column_name': time_column_name, '--time_column_name': time_column_name,
'--frequency': freq '--frequency': freq,
'--model_path': model_base_name
}, },
inputs=[test_dataset.as_named_input('test_data'), inputs=[test_dataset.as_named_input('test_data'),
lookback_dataset.as_named_input('lookback_data')], lookback_dataset.as_named_input('lookback_data')],

View File

@@ -232,6 +232,9 @@ parser.add_argument(
parser.add_argument( parser.add_argument(
'--frequency', type=str, dest='freq', '--frequency', type=str, dest='freq',
help='Frequency of prediction') help='Frequency of prediction')
parser.add_argument(
'--model_path', type=str, dest='model_path',
default='model.pkl', help='Filename of model to be loaded')
args = parser.parse_args() args = parser.parse_args()
@@ -239,6 +242,7 @@ max_horizon = args.max_horizon
target_column_name = args.target_column_name target_column_name = args.target_column_name
time_column_name = args.time_column_name time_column_name = args.time_column_name
freq = args.freq freq = args.freq
model_path = args.model_path
print('args passed are: ') print('args passed are: ')
@@ -246,6 +250,7 @@ print(max_horizon)
print(target_column_name) print(target_column_name)
print(time_column_name) print(time_column_name)
print(freq) print(freq)
print(model_path)
run = Run.get_context() run = Run.get_context()
# get input dataset by name # get input dataset by name
@@ -267,7 +272,8 @@ X_lookback_df = lookback_dataset.drop_columns(columns=[target_column_name])
y_lookback_df = lookback_dataset.with_timestamp_columns( y_lookback_df = lookback_dataset.with_timestamp_columns(
None).keep_columns(columns=[target_column_name]) None).keep_columns(columns=[target_column_name])
fitted_model = joblib.load('model.pkl') fitted_model = joblib.load(model_path)
if hasattr(fitted_model, 'get_lookback'): if hasattr(fitted_model, 'get_lookback'):
lookback = fitted_model.get_lookback() lookback = fitted_model.get_lookback()

View File

@@ -42,7 +42,7 @@
"\n", "\n",
"AutoML highlights here include built-in holiday featurization, accessing engineered feature names, and working with the `forecast` function. Please also look at the additional forecasting notebooks, which document lagging, rolling windows, forecast quantiles, other ways to use the forecast function, and forecaster deployment.\n", "AutoML highlights here include built-in holiday featurization, accessing engineered feature names, and working with the `forecast` function. Please also look at the additional forecasting notebooks, which document lagging, rolling windows, forecast quantiles, other ways to use the forecast function, and forecaster deployment.\n",
"\n", "\n",
"Make sure you have executed the [configuration](../configuration.ipynb) before running this notebook.\n", "Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
"\n", "\n",
"Notebook synopsis:\n", "Notebook synopsis:\n",
"1. Creating an Experiment in an existing Workspace\n", "1. Creating an Experiment in an existing Workspace\n",
@@ -98,6 +98,7 @@
"output['SDK version'] = azureml.core.VERSION\n", "output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['SKU'] = ws.sku\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n", "output['Location'] = ws.location\n",
"output['Run History Name'] = experiment_name\n", "output['Run History Name'] = experiment_name\n",
@@ -127,7 +128,7 @@
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute import ComputeTarget\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpu-cluster-6\"\n", "amlcompute_cluster_name = \"cpu-cluster-bike\"\n",
"\n", "\n",
"found = False\n", "found = False\n",
"# Check if this compute target already exists in the workspace.\n", "# Check if this compute target already exists in the workspace.\n",
@@ -160,7 +161,7 @@
"source": [ "source": [
"## Data\n", "## Data\n",
"\n", "\n",
"The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace), is paired with the storage account, which contains the default data store. We will use it to upload the bike share data and create [tabular dataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation." "The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace) is paired with the storage account, which contains the default data store. We will use it to upload the bike share data and create [tabular dataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation."
] ]
}, },
{ {
@@ -201,7 +202,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'dataset/bike-no.csv')]).with_timestamp_columns(fine_grain_timestamp=time_column_name) \n", "dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'dataset/bike-no.csv')]).with_timestamp_columns(fine_grain_timestamp=time_column_name) \n",
"dataset.take(5).to_pandas_dataframe()" "dataset.take(5).to_pandas_dataframe().reset_index(drop=True)"
] ]
}, },
{ {
@@ -220,8 +221,8 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# select data that occurs before a specified date\n", "# select data that occurs before a specified date\n",
"train = dataset.time_before(datetime(2012, 9, 1))\n", "train = dataset.time_before(datetime(2012, 8, 31), include_boundary=True)\n",
"train.to_pandas_dataframe().tail(5)" "train.to_pandas_dataframe().tail(5).reset_index(drop=True)"
] ]
}, },
{ {
@@ -230,8 +231,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"test = dataset.time_after(datetime(2012, 8, 31))\n", "test = dataset.time_after(datetime(2012, 9, 1), include_boundary=True)\n",
"test.to_pandas_dataframe().head(5)" "test.to_pandas_dataframe().head(5).reset_index(drop=True)"
] ]
}, },
{ {
@@ -246,8 +247,8 @@
"|-|-|\n", "|-|-|\n",
"|**task**|forecasting|\n", "|**task**|forecasting|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n", "|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
"|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found at [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.constants.supportedmodels.regression?view=azure-ml-py).|\n", "|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found at [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py).|\n",
"|**experiment_timeout_minutes**|Experimentation timeout in minutes.|\n", "|**experiment_timeout_hours**|Experimentation timeout in hours.|\n",
"|**training_data**|Input dataset, containing both features and label column.|\n", "|**training_data**|Input dataset, containing both features and label column.|\n",
"|**label_column_name**|The name of the label column.|\n", "|**label_column_name**|The name of the label column.|\n",
"|**compute_target**|The remote compute for training.|\n", "|**compute_target**|The remote compute for training.|\n",
@@ -259,7 +260,7 @@
"|**target_lags**|The target_lags specifies how far back we will construct the lags of the target variable.|\n", "|**target_lags**|The target_lags specifies how far back we will construct the lags of the target variable.|\n",
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n", "|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
"\n", "\n",
"This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_minutes parameter value to get results." "This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
] ]
}, },
{ {
@@ -304,11 +305,11 @@
"automl_config = AutoMLConfig(task='forecasting', \n", "automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n", " primary_metric='normalized_root_mean_squared_error',\n",
" blacklist_models = ['ExtremeRandomTrees'], \n", " blacklist_models = ['ExtremeRandomTrees'], \n",
" experiment_timeout_minutes=20,\n", " experiment_timeout_hours=0.3,\n",
" training_data=train,\n", " training_data=train,\n",
" label_column_name=target_column_name,\n", " label_column_name=target_column_name,\n",
" compute_target=compute_target,\n", " compute_target=compute_target,\n",
" enable_early_stopping = True,\n", " enable_early_stopping=True,\n",
" n_cross_validations=3, \n", " n_cross_validations=3, \n",
" max_concurrent_iterations=4,\n", " max_concurrent_iterations=4,\n",
" max_cores_per_iteration=-1,\n", " max_cores_per_iteration=-1,\n",
@@ -445,13 +446,12 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"import os\n", "import os\n",
"import shutil\n",
"\n", "\n",
"script_folder = os.path.join(os.getcwd(), 'forecast')\n", "script_folder = os.path.join(os.getcwd(), 'forecast')\n",
"project_folder = './forecast'\n", "os.makedirs(script_folder, exist_ok=True)\n",
"os.makedirs(project_folder, exist_ok=True)\n", "shutil.copy2('forecasting_script.py', script_folder)\n",
"\n", "shutil.copy2('forecasting_helper.py', script_folder)"
"!copy forecasting_script.py forecast\n",
"!copy forecasting_helper.py forecast"
] ]
}, },
{ {
@@ -586,7 +586,7 @@
], ],
"category": "tutorial", "category": "tutorial",
"compute": [ "compute": [
"remote" "Remote"
], ],
"datasets": [ "datasets": [
"BikeShare" "BikeShare"
@@ -625,7 +625,7 @@
"tags": [ "tags": [
"Forecasting" "Forecasting"
], ],
"task": "forecasting", "task": "Forecasting",
"version": 3 "version": 3
}, },
"nbformat": 4, "nbformat": 4,

View File

@@ -4,8 +4,7 @@ dependencies:
- py-xgboost<=0.80 - py-xgboost<=0.80
- pip: - pip:
- azureml-sdk - azureml-sdk
- numpy==1.16.2
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- pandas_ml
- statsmodels

View File

@@ -1,6 +1,6 @@
import argparse import argparse
import azureml.train.automl import azureml.train.automl
from azureml.automl.core._vendor.automl.client.core.runtime import forecasting_models from azureml.automl.runtime._vendor.automl.client.core.runtime import forecasting_models
from azureml.core import Run from azureml.core import Run
from sklearn.externals import joblib from sklearn.externals import joblib
import forecasting_helper import forecasting_helper
@@ -32,18 +32,17 @@ test_dataset = run.input_datasets['test_data']
grain_column_names = [] grain_column_names = []
df = test_dataset.to_pandas_dataframe() df = test_dataset.to_pandas_dataframe().reset_index(drop=True)
X_test_df = test_dataset.drop_columns(columns=[target_column_name]) X_test_df = test_dataset.drop_columns(columns=[target_column_name]).to_pandas_dataframe().reset_index(drop=True)
y_test_df = test_dataset.with_timestamp_columns( y_test_df = test_dataset.with_timestamp_columns(None).keep_columns(columns=[target_column_name]).to_pandas_dataframe()
None).keep_columns(columns=[target_column_name])
fitted_model = joblib.load('model.pkl') fitted_model = joblib.load('model.pkl')
df_all = forecasting_helper.do_rolling_forecast( df_all = forecasting_helper.do_rolling_forecast(
fitted_model, fitted_model,
X_test_df.to_pandas_dataframe(), X_test_df,
y_test_df.to_pandas_dataframe().values.T[0], y_test_df.values.T[0],
target_column_name, target_column_name,
time_column_name, time_column_name,
max_horizon, max_horizon,

View File

@@ -31,8 +31,8 @@
"1. [Results](#Results)\n", "1. [Results](#Results)\n",
"\n", "\n",
"Advanced Forecasting\n", "Advanced Forecasting\n",
"1. [Advanced Training](#Advanced Training)\n", "1. [Advanced Training](#advanced_training)\n",
"1. [Advanced Results](#Advanced Results)" "1. [Advanced Results](#advanced_results)"
] ]
}, },
{ {
@@ -211,7 +211,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"dataset = Dataset.Tabular.from_delimited_files(path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/nyc_energy.csv\").with_timestamp_columns(fine_grain_timestamp=time_column_name) \n", "dataset = Dataset.Tabular.from_delimited_files(path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/nyc_energy.csv\").with_timestamp_columns(fine_grain_timestamp=time_column_name) \n",
"dataset.take(5).to_pandas_dataframe()" "dataset.take(5).to_pandas_dataframe().reset_index(drop=True)"
] ]
}, },
{ {
@@ -253,7 +253,7 @@
"source": [ "source": [
"# split into train based on time\n", "# split into train based on time\n",
"train = dataset.time_before(datetime(2017, 8, 8, 5), include_boundary=True)\n", "train = dataset.time_before(datetime(2017, 8, 8, 5), include_boundary=True)\n",
"train.to_pandas_dataframe().sort_values(time_column_name).tail(5)" "train.to_pandas_dataframe().reset_index(drop=True).sort_values(time_column_name).tail(5)"
] ]
}, },
{ {
@@ -263,8 +263,8 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# split into test based on time\n", "# split into test based on time\n",
"test = dataset.time_between(datetime(2017, 8, 8, 5), datetime(2017, 8, 10, 5))\n", "test = dataset.time_between(datetime(2017, 8, 8, 6), datetime(2017, 8, 10, 5))\n",
"test.to_pandas_dataframe().head(5)" "test.to_pandas_dataframe().reset_index(drop=True).head(5)"
] ]
}, },
{ {
@@ -301,8 +301,8 @@
"|-|-|\n", "|-|-|\n",
"|**task**|forecasting|\n", "|**task**|forecasting|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n", "|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
"|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found at [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.constants.supportedmodels.regression?view=azure-ml-py).|\n", "|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found at [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py).|\n",
"|**experiment_timeout_minutes**|Maximum amount of time in minutes that the experiment take before it terminates.|\n", "|**experiment_timeout_hours**|Maximum amount of time in hours that the experiment take before it terminates.|\n",
"|**training_data**|The training data to be used within the experiment.|\n", "|**training_data**|The training data to be used within the experiment.|\n",
"|**label_column_name**|The name of the label column.|\n", "|**label_column_name**|The name of the label column.|\n",
"|**compute_target**|The remote compute for training.|\n", "|**compute_target**|The remote compute for training.|\n",
@@ -316,7 +316,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_minutes parameter value to get results." "This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
] ]
}, },
{ {
@@ -333,11 +333,11 @@
"automl_config = AutoMLConfig(task='forecasting', \n", "automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n", " primary_metric='normalized_root_mean_squared_error',\n",
" blacklist_models = ['ExtremeRandomTrees', 'AutoArima', 'Prophet'], \n", " blacklist_models = ['ExtremeRandomTrees', 'AutoArima', 'Prophet'], \n",
" experiment_timeout_minutes=20,\n", " experiment_timeout_hours=0.3,\n",
" training_data=train,\n", " training_data=train,\n",
" label_column_name=target_column_name,\n", " label_column_name=target_column_name,\n",
" compute_target=compute_target,\n", " compute_target=compute_target,\n",
" enable_early_stopping = True,\n", " enable_early_stopping=True,\n",
" n_cross_validations=3, \n", " n_cross_validations=3, \n",
" verbosity=logging.INFO,\n", " verbosity=logging.INFO,\n",
" **automl_settings)" " **automl_settings)"
@@ -454,7 +454,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X_test = test.to_pandas_dataframe()\n", "X_test = test.to_pandas_dataframe().reset_index(drop=True)\n",
"y_test = X_test.pop(target_column_name).values" "y_test = X_test.pop(target_column_name).values"
] ]
}, },
@@ -463,11 +463,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Forecast Function\n", "### Forecast Function\n",
"For forecasting, we will use the forecast function instead of the predict function. There are two reasons for this.\n", "For forecasting, we will use the forecast function instead of the predict function. Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use. Forecast function also can handle more complicated scenarios, see notebook on [high frequency forecasting](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb)."
"\n",
"We need to pass the recent values of the target variable y, whereas the scikit-compatible predict function only takes the non-target variables 'test'. In our case, the test data immediately follows the training data, and we fill the target variable with NaN. The NaN serves as a question mark for the forecaster to fill with the actuals. Using the forecast function will produce forecasts using the shortest possible forecast horizon. The last time at which a definite (non-NaN) value is seen is the forecast origin - the last time when the value of the target is known.\n",
"\n",
"Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use."
] ]
}, },
{ {
@@ -476,15 +472,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Replace ALL values in y by NaN.\n",
"# The forecast origin will be at the beginning of the first forecast period.\n",
"# (Which is the same time as the end of the last training period.)\n",
"y_query = y_test.copy().astype(np.float)\n",
"y_query.fill(np.nan)\n",
"# The featurized data, aligned to y, will also be returned.\n", "# The featurized data, aligned to y, will also be returned.\n",
"# This contains the assumptions that were made in the forecast\n", "# This contains the assumptions that were made in the forecast\n",
"# and helps align the forecast to the original data\n", "# and helps align the forecast to the original data\n",
"y_predictions, X_trans = fitted_model.forecast(X_test, y_query)" "y_predictions, X_trans = fitted_model.forecast(X_test)"
] ]
}, },
{ {
@@ -557,7 +548,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Advanced Training\n", "## Advanced Training <a id=\"advanced_training\"></a>\n",
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation." "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
] ]
}, },
@@ -587,7 +578,7 @@
"automl_config = AutoMLConfig(task='forecasting', \n", "automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n", " primary_metric='normalized_root_mean_squared_error',\n",
" blacklist_models = ['ElasticNet','ExtremeRandomTrees','GradientBoosting','XGBoostRegressor','ExtremeRandomTrees', 'AutoArima', 'Prophet'], #These models are blacklisted for tutorial purposes, remove this for real use cases. \n", " blacklist_models = ['ElasticNet','ExtremeRandomTrees','GradientBoosting','XGBoostRegressor','ExtremeRandomTrees', 'AutoArima', 'Prophet'], #These models are blacklisted for tutorial purposes, remove this for real use cases. \n",
" experiment_timeout_minutes=20,\n", " experiment_timeout_hours=0.3,\n",
" training_data=train,\n", " training_data=train,\n",
" label_column_name=target_column_name,\n", " label_column_name=target_column_name,\n",
" compute_target=compute_target,\n", " compute_target=compute_target,\n",
@@ -642,7 +633,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Advanced Results\n", "## Advanced Results<a id=\"advanced_results\"></a>\n",
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation." "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
] ]
}, },
@@ -652,15 +643,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Replace ALL values in y by NaN.\n",
"# The forecast origin will be at the beginning of the first forecast period.\n",
"# (Which is the same time as the end of the last training period.)\n",
"y_query = y_test.copy().astype(np.float)\n",
"y_query.fill(np.nan)\n",
"# The featurized data, aligned to y, will also be returned.\n", "# The featurized data, aligned to y, will also be returned.\n",
"# This contains the assumptions that were made in the forecast\n", "# This contains the assumptions that were made in the forecast\n",
"# and helps align the forecast to the original data\n", "# and helps align the forecast to the original data\n",
"y_predictions, X_trans = fitted_model_lags.forecast(X_test, y_query)" "y_predictions, X_trans = fitted_model_lags.forecast(X_test)"
] ]
}, },
{ {
@@ -730,14 +716,7 @@
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.6.8" "version": "3.6.8"
}, }
"star_tag": [
"featured"
],
"tags": [
""
],
"task": "Forecasting"
}, },
"nbformat": 4, "nbformat": 4,
"nbformat_minor": 2 "nbformat_minor": 2

View File

@@ -2,11 +2,10 @@ name: auto-ml-forecasting-energy-demand
dependencies: dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- interpret - numpy==1.16.2
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- pandas_ml - interpret
- statsmodels
- azureml-explain-model - azureml-explain-model
- azureml-contrib-interpret - azureml-contrib-interpret

View File

@@ -103,6 +103,7 @@
"output['SDK version'] = azureml.core.VERSION\n", "output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['SKU'] = ws.sku\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n", "output['Location'] = ws.location\n",
"output['Run History Name'] = experiment_name\n", "output['Run History Name'] = experiment_name\n",
@@ -257,7 +258,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"amlcompute_cluster_name = \"cpu-cluster-8\"\n", "amlcompute_cluster_name = \"cpu-cluster-fcfn\"\n",
" \n", " \n",
"found = False\n", "found = False\n",
"# Check if this compute target already exists in the workspace.\n", "# Check if this compute target already exists in the workspace.\n",
@@ -334,7 +335,7 @@
"automl_config = AutoMLConfig(task='forecasting',\n", "automl_config = AutoMLConfig(task='forecasting',\n",
" debug_log='automl_forecasting_function.log',\n", " debug_log='automl_forecasting_function.log',\n",
" primary_metric='normalized_root_mean_squared_error',\n", " primary_metric='normalized_root_mean_squared_error',\n",
" experiment_timeout_minutes=15,\n", " experiment_timeout_hours=0.25,\n",
" enable_early_stopping=True,\n", " enable_early_stopping=True,\n",
" training_data=train_data,\n", " training_data=train_data,\n",
" compute_target=compute_target,\n", " compute_target=compute_target,\n",
@@ -376,9 +377,7 @@
"\n", "\n",
"![Forecasting after training](forecast_function_at_train.png)\n", "![Forecasting after training](forecast_function_at_train.png)\n",
"\n", "\n",
"The `X_test` and `y_query` below, taken together, form the **forecast request**. The two are interpreted as aligned - `y_query` could actally be a column in `X_test`. `NaN`s in `y_query` are the question marks. These will be filled with the forecasts.\n", "We use `X_test` as a **forecast request** to generate the predictions."
"\n",
"When the forecast period immediately follows the training period, the models retain the last few points of data. You can simply fill `y_query` filled with question marks - the model has the data for the lookback already.\n"
] ]
}, },
{ {
@@ -407,8 +406,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"y_query = np.repeat(np.NaN, X_test.shape[0])\n", "y_pred_no_gap, xy_nogap = fitted_model.forecast(X_test)\n",
"y_pred_no_gap, xy_nogap = fitted_model.forecast(X_test, y_query)\n",
"\n", "\n",
"# xy_nogap contains the predictions in the _automl_target_col column.\n", "# xy_nogap contains the predictions in the _automl_target_col column.\n",
"# Those same numbers are output in y_pred_no_gap\n", "# Those same numbers are output in y_pred_no_gap\n",
@@ -436,7 +434,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"quantiles = fitted_model.forecast_quantiles(X_test, y_query)\n", "quantiles = fitted_model.forecast_quantiles(X_test)\n",
"quantiles" "quantiles"
] ]
}, },
@@ -447,7 +445,7 @@
"#### Distribution forecasts\n", "#### Distribution forecasts\n",
"\n", "\n",
"Often the figure of interest is not just the point prediction, but the prediction at some quantile of the distribution. \n", "Often the figure of interest is not just the point prediction, but the prediction at some quantile of the distribution. \n",
"This arises when the forecast is used to control some kind of inventory, for example of grocery items of virtual machines for a cloud service. In such case, the control point is usually something like \"we want the item to be in stock and not run out 99% of the time\". This is called a \"service level\". Here is how you get quantile forecasts." "This arises when the forecast is used to control some kind of inventory, for example of grocery items or virtual machines for a cloud service. In such case, the control point is usually something like \"we want the item to be in stock and not run out 99% of the time\". This is called a \"service level\". Here is how you get quantile forecasts."
] ]
}, },
{ {
@@ -459,10 +457,10 @@
"# specify which quantiles you would like \n", "# specify which quantiles you would like \n",
"fitted_model.quantiles = [0.01, 0.5, 0.95]\n", "fitted_model.quantiles = [0.01, 0.5, 0.95]\n",
"# use forecast_quantiles function, not the forecast() one\n", "# use forecast_quantiles function, not the forecast() one\n",
"y_pred_quantiles = fitted_model.forecast_quantiles(X_test, y_query)\n", "y_pred_quantiles = fitted_model.forecast_quantiles(X_test)\n",
"\n", "\n",
"# it all nicely aligns column-wise\n", "# quantile forecasts returned in a Dataframe along with the time and grain columns \n",
"pd.concat([X_test.reset_index(), pd.DataFrame({'query' : y_query}), y_pred_quantiles], axis=1)" "y_pred_quantiles"
] ]
}, },
{ {
@@ -471,7 +469,7 @@
"source": [ "source": [
"#### Destination-date forecast: \"just do something\"\n", "#### Destination-date forecast: \"just do something\"\n",
"\n", "\n",
"In some scenarios, the X_test is not known. The forecast is likely to be weak, becaus it is missing contemporaneous predictors, which we will need to impute. If you still wish to predict forward under the assumption that the last known values will be carried forward, you can forecast out to \"destination date\". The destination date still needs to fit within the maximum horizon from training." "In some scenarios, the X_test is not known. The forecast is likely to be weak, because it is missing contemporaneous predictors, which we will need to impute. If you still wish to predict forward under the assumption that the last known values will be carried forward, you can forecast out to \"destination date\". The destination date still needs to fit within the maximum horizon from training."
] ]
}, },
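
A minimal sketch of such a destination-date call, assuming `fitted_model` from the cells above; the date is illustrative and must lie within the maximum horizon:

    import pandas as pd

    # No X_test is supplied; contemporaneous features are imputed by
    # carrying the last known values forward up to the destination date.
    dest = pd.Timestamp('2017-02-01')  # illustrative date
    y_pred_dest, xy_dest = fitted_model.forecast(forecast_destination=dest)
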
{ {
@@ -538,9 +536,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"try: \n", "try: \n",
" y_query = y_away.copy()\n", " y_pred_away, xy_away = fitted_model.forecast(X_away)\n",
" y_query.fill(np.NaN)\n",
" y_pred_away, xy_away = fitted_model.forecast(X_away, y_query)\n",
" xy_away\n", " xy_away\n",
"except Exception as e:\n", "except Exception as e:\n",
" print(e)" " print(e)"
@@ -550,7 +546,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"How should we read that eror message? The forecast origin is at the last time themodel saw an actual values of `y` (the target). That was at the end of the training data! Because the model received all `NaN` (and not an actual target value), it is attempting to forecast from the end of training data. But the requested forecast periods are past the maximum horizon. We need to provide a define `y` value to establish the forecast origin.\n", "How should we read that eror message? The forecast origin is at the last time the model saw an actual value of `y` (the target). That was at the end of the training data! The model is attempting to forecast from the end of training data. But the requested forecast periods are past the maximum horizon. We need to provide a define `y` value to establish the forecast origin.\n",
"\n", "\n",
"We will use this helper function to take the required amount of context from the data preceding the testing data. It's definition is intentionally simplified to keep the idea in the clear." "We will use this helper function to take the required amount of context from the data preceding the testing data. It's definition is intentionally simplified to keep the idea in the clear."
] ]
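
The helper's own definition lies outside this hunk; a hypothetical sketch of the idea (all names here are illustrative, not the notebook's): prepend the last few known target values so the forecast origin moves to the end of the known data, and mark the periods to predict with `NaN`:

    import numpy as np
    import pandas as pd

    def make_forecast_query(X_known, y_known, X_away, lookback):
        # Keep the last `lookback` periods of known data as context,
        # then append the away period whose target is unknown.
        X_query = pd.concat([X_known.iloc[-lookback:], X_away])
        y_query = np.concatenate([np.asarray(y_known, dtype=float)[-lookback:],
                                  np.full(X_away.shape[0], np.nan)])
        return X_query, y_query

    # The NaN tail marks the periods to be forecast.
    X_query, y_query = make_forecast_query(X_train, y_train, X_away, lookback=4)
    y_pred_away, xy_away = fitted_model.forecast(X_query, y_query)
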
@@ -705,12 +701,12 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "erwright, nirovins" "name": "erwright"
} }
], ],
"category": "tutorial", "category": "tutorial",
"compute": [ "compute": [
"remote" "Remote"
], ],
"datasets": [ "datasets": [
"None" "None"
@@ -739,13 +735,13 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.6.7" "version": "3.6.8"
}, },
"tags": [ "tags": [
"Forecasting", "Forecasting",
"Confidence Intervals" "Confidence Intervals"
], ],
"task": "forecasting" "task": "Forecasting"
}, },
"nbformat": 4, "nbformat": 4,
"nbformat_minor": 2 "nbformat_minor": 2


@@ -1,11 +1,10 @@
name: automl-forecasting-function name: auto-ml-forecasting-function
dependencies: dependencies:
- fbprophet==0.5 - fbprophet==0.5
- py-xgboost<=0.80 - py-xgboost<=0.80
- pip: - pip:
- azureml-sdk - azureml-sdk
- numpy==1.16.2
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- pandas_ml
- statsmodels
- matplotlib - matplotlib


@@ -40,7 +40,7 @@
"## Introduction\n", "## Introduction\n",
"In this example, we use AutoML to train, select, and operationalize a time-series forecasting model for multiple time-series.\n", "In this example, we use AutoML to train, select, and operationalize a time-series forecasting model for multiple time-series.\n",
"\n", "\n",
"Make sure you have executed the [configuration notebook](../configuration.ipynb) before running this notebook.\n", "Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
"\n", "\n",
"The examples in the follow code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area." "The examples in the follow code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
] ]
@@ -92,6 +92,7 @@
"output['SDK version'] = azureml.core.VERSION\n", "output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n", "output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n", "output['Workspace'] = ws.name\n",
"output['SKU'] = ws.sku\n",
"output['Resource Group'] = ws.resource_group\n", "output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n", "output['Location'] = ws.location\n",
"output['Run History Name'] = experiment_name\n", "output['Run History Name'] = experiment_name\n",
@@ -121,7 +122,7 @@
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute import ComputeTarget\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpu-cluster-7\"\n", "amlcompute_cluster_name = \"cpu-cluster-oj\"\n",
"\n", "\n",
"found = False\n", "found = False\n",
"# Check if this compute target already exists in the workspace.\n", "# Check if this compute target already exists in the workspace.\n",
@@ -135,7 +136,7 @@
" print('Creating a new compute target...')\n", " print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n", " #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 4)\n", " max_nodes = 6)\n",
"\n", "\n",
" # Create the cluster.\n", " # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
@@ -324,9 +325,9 @@
"\n", "\n",
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n", "For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
"\n", "\n",
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up-to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organizaion that needs to estimate the next month of sales would set the horizon accordingly. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n", "The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organizaion that needs to estimate the next month of sales would set the horizon accordingly. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
"\n", "\n",
"Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *X_valid* and *y_valid* parameters of AutoMLConfig.\n", "Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *validation_data* parameter of AutoMLConfig.\n",
"\n", "\n",
"Here is a summary of AutoMLConfig parameters used for training the OJ model:\n", "Here is a summary of AutoMLConfig parameters used for training the OJ model:\n",
"\n", "\n",
@@ -334,7 +335,7 @@
"|-|-|\n", "|-|-|\n",
"|**task**|forecasting|\n", "|**task**|forecasting|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n", "|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
"|**experiment_timeout_minutes**|Experimentation timeout in minutes.|\n", "|**experiment_timeout_hours**|Experimentation timeout in hours.|\n",
"|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n", "|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n",
"|**training_data**|Input dataset, containing both features and label column.|\n", "|**training_data**|Input dataset, containing both features and label column.|\n",
"|**label_column_name**|The name of the label column.|\n", "|**label_column_name**|The name of the label column.|\n",
@@ -365,14 +366,12 @@
"automl_config = AutoMLConfig(task='forecasting',\n", "automl_config = AutoMLConfig(task='forecasting',\n",
" debug_log='automl_oj_sales_errors.log',\n", " debug_log='automl_oj_sales_errors.log',\n",
" primary_metric='normalized_mean_absolute_error',\n", " primary_metric='normalized_mean_absolute_error',\n",
" experiment_timeout_minutes=15,\n", " experiment_timeout_hours=0.25,\n",
" training_data=train_dataset,\n", " training_data=train_dataset,\n",
" label_column_name=target_column_name,\n", " label_column_name=target_column_name,\n",
" compute_target=compute_target,\n", " compute_target=compute_target,\n",
" enable_early_stopping = True,\n", " enable_early_stopping=True,\n",
" n_cross_validations=3,\n", " n_cross_validations=3,\n",
" max_concurrent_iterations=4,\n",
" max_cores_per_iteration=-1,\n",
" verbosity=logging.INFO,\n", " verbosity=logging.INFO,\n",
" **time_series_settings)" " **time_series_settings)"
] ]
@@ -455,9 +454,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data. \n", "To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data."
"\n",
"We will first create a query `y_query`, which is aligned index-for-index to `X_test`. This is a vector of target values where each `NaN` serves the function of the question mark to be replaced by forecast. Passing definite values in the `y` argument allows the `forecast` function to make predictions on data that does not immediately follow the train data which contains `y`. In each grain, the last time point where the model sees a definite value of `y` is that grain's _forecast origin_."
] ]
}, },
{ {
@@ -466,15 +463,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Replace ALL values in y by NaN.\n",
"# The forecast origin will be at the beginning of the first forecast period.\n",
"# (Which is the same time as the end of the last training period.)\n",
"y_query = y_test.copy().astype(np.float)\n",
"y_query.fill(np.nan)\n",
"# The featurized data, aligned to y, will also be returned.\n", "# The featurized data, aligned to y, will also be returned.\n",
"# This contains the assumptions that were made in the forecast\n", "# This contains the assumptions that were made in the forecast\n",
"# and helps align the forecast to the original data\n", "# and helps align the forecast to the original data\n",
"y_predictions, X_trans = fitted_model.forecast(X_test, y_query)" "y_predictions, X_trans = fitted_model.forecast(X_test)"
] ]
}, },
{ {
@@ -570,138 +562,7 @@
"source": [ "source": [
"### Develop the scoring script\n", "### Develop the scoring script\n",
"\n", "\n",
"Serializing and deserializing complex data frames may be tricky. We first develop the ```run()``` function of the scoring script locally, then write it into a scoring script. It is much easier to debug any quirks of the scoring function without crossing two compute environments. For this exercise, we handle a common quirk of how pandas dataframes serialize time stamp values." "For the deployment we need a function which will run the forecast on serialized data. It can be obtained from the best_run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this is where we test the run function of the scoring script interactively\n",
"# before putting it in the scoring script\n",
"\n",
"timestamp_columns = ['WeekStarting']\n",
"\n",
"def run(rawdata, test_model = None):\n",
" \"\"\"\n",
" Intended to process 'rawdata' string produced by\n",
" \n",
" {'X': X_test.to_json(), y' : y_test.to_json()}\n",
" \n",
" Don't convert the X payload to numpy.array, use it as pandas.DataFrame\n",
" \"\"\"\n",
" try:\n",
" # unpack the data frame with timestamp \n",
" rawobj = json.loads(rawdata) # rawobj is now a dict of strings \n",
" X_pred = pd.read_json(rawobj['X'], convert_dates=False) # load the pandas DF from a json string\n",
" for col in timestamp_columns: # fix timestamps\n",
" X_pred[col] = pd.to_datetime(X_pred[col], unit='ms') \n",
" \n",
" y_pred = np.array(rawobj['y']) # reconstitute numpy array from serialized list\n",
" \n",
" if test_model is None:\n",
" result = model.forecast(X_pred, y_pred) # use the global model from init function\n",
" else:\n",
" result = test_model.forecast(X_pred, y_pred) # use the model on which we are testing\n",
" \n",
" except Exception as e:\n",
" result = str(e)\n",
" return json.dumps({\"error\": result})\n",
" \n",
" forecast_as_list = result[0].tolist()\n",
" index_as_df = result[1].index.to_frame().reset_index(drop=True)\n",
" \n",
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
" })"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# test the run function here before putting in the scoring script\n",
"import json\n",
"\n",
"test_sample = json.dumps({'X': X_test.to_json(), 'y' : y_query.tolist()})\n",
"response = run(test_sample, fitted_model)\n",
"\n",
"# unpack the response, dealing with the timestamp serialization again\n",
"res_dict = json.loads(response)\n",
"y_fcst_all = pd.read_json(res_dict['index'])\n",
"y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
"y_fcst_all['forecast'] = res_dict['forecast']\n",
"y_fcst_all.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that the function works locally in the notebook, let's write it down into the scoring script. The scoring script is authored by the data scientist. Adjust it to taste, adding inputs, outputs and processing as needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score_fcast.py\n",
"import pickle\n",
"import json\n",
"import numpy as np\n",
"import pandas as pd\n",
"import azureml.train.automl\n",
"from sklearn.externals import joblib\n",
"from azureml.core.model import Model\n",
"\n",
"\n",
"def init():\n",
" global model\n",
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
" # deserialize the model file back into a sklearn model\n",
" model = joblib.load(model_path)\n",
"\n",
"timestamp_columns = ['WeekStarting']\n",
"\n",
"def run(rawdata, test_model = None):\n",
" \"\"\"\n",
" Intended to process 'rawdata' string produced by\n",
" \n",
" {'X': X_test.to_json(), y' : y_test.to_json()}\n",
" \n",
" Don't convert the X payload to numpy.array, use it as pandas.DataFrame\n",
" \"\"\"\n",
" try:\n",
" # unpack the data frame with timestamp \n",
" rawobj = json.loads(rawdata) # rawobj is now a dict of strings \n",
" X_pred = pd.read_json(rawobj['X'], convert_dates=False) # load the pandas DF from a json string\n",
" for col in timestamp_columns: # fix timestamps\n",
" X_pred[col] = pd.to_datetime(X_pred[col], unit='ms') \n",
" \n",
" y_pred = np.array(rawobj['y']) # reconstitute numpy array from serialized list\n",
" \n",
" if test_model is None:\n",
" result = model.forecast(X_pred, y_pred) # use the global model from init function\n",
" else:\n",
" result = test_model.forecast(X_pred, y_pred) # use the model on which we are testing\n",
" \n",
" except Exception as e:\n",
" result = str(e)\n",
" return json.dumps({\"error\": result})\n",
" \n",
" # prepare to send over wire as json\n",
" forecast_as_list = result[0].tolist()\n",
" index_as_df = result[1].index.to_frame().reset_index(drop=True)\n",
" \n",
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
" })"
] ]
}, },
{ {
@@ -711,11 +572,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"script_file_name = 'score_fcast.py'\n", "script_file_name = 'score_fcast.py'\n",
"with open(script_file_name, 'r') as cefr:\n", "best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_file_name)"
" content = cefr.read()\n",
"\n",
"with open(script_file_name, 'w') as cefw:\n",
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
] ]
}, },
{ {
@@ -773,14 +630,18 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# we send the data to the service serialized into a json string\n", "import json\n",
"test_sample = json.dumps({'X':X_test.to_json(), 'y' : y_query.tolist()})\n", "X_query = X_test.copy()\n",
"# We have to convert datetime to string, because Timestamps cannot be serialized to JSON.\n",
"X_query[time_column_name] = X_query[time_column_name].astype(str)\n",
"# The Service object accept the complex dictionary, which is internally converted to JSON string.\n",
"# The section 'data' contains the data frame in the form of dictionary.\n",
"test_sample = json.dumps({'data': X_query.to_dict(orient='records')})\n",
"response = aci_service.run(input_data = test_sample)\n", "response = aci_service.run(input_data = test_sample)\n",
"\n",
"# translate from networkese to datascientese\n", "# translate from networkese to datascientese\n",
"try: \n", "try: \n",
" res_dict = json.loads(response)\n", " res_dict = json.loads(response)\n",
" y_fcst_all = pd.read_json(res_dict['index'])\n", " y_fcst_all = pd.DataFrame(res_dict['index'])\n",
" y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n", " y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
" y_fcst_all['forecast'] = res_dict['forecast'] \n", " y_fcst_all['forecast'] = res_dict['forecast'] \n",
"except:\n", "except:\n",
@@ -823,7 +684,7 @@
"category": "tutorial", "category": "tutorial",
"celltoolbar": "Raw Cell Format", "celltoolbar": "Raw Cell Format",
"compute": [ "compute": [
"remote" "Remote"
], ],
"datasets": [ "datasets": [
"Orange Juice Sales" "Orange Juice Sales"
@@ -852,8 +713,11 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.6.7" "version": "3.6.8"
}, },
"tags": [
"None"
],
"task": "Forecasting" "task": "Forecasting"
}, },
"nbformat": 4, "nbformat": 4,


@@ -4,8 +4,8 @@ dependencies:
- py-xgboost<=0.80 - py-xgboost<=0.80
- pip: - pip:
- azureml-sdk - azureml-sdk
- numpy==1.16.2
- pandas==0.23.4
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- pandas_ml
- statsmodels


@@ -155,8 +155,7 @@
"automl_settings = {\n", "automl_settings = {\n",
" \"n_cross_validations\": 3,\n", " \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'average_precision_score_weighted',\n", " \"primary_metric\": 'average_precision_score_weighted',\n",
" \"preprocess\": True,\n", " \"experiment_timeout_hours\": 0.25, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ability to find the best model possible\n",
" \"experiment_timeout_minutes\": 10, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ablity to find the best model possible\n",
" \"verbosity\": logging.INFO,\n", " \"verbosity\": logging.INFO,\n",
" \"enable_stack_ensemble\": False\n", " \"enable_stack_ensemble\": False\n",
"}\n", "}\n",
@@ -260,17 +259,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Print the properties of the model\n", "#### Print the properties of the model\n",
"The fitted_model is a python object and you can read the different properties of the object.\n", "The fitted_model is a python object and you can read the different properties of the object.\n"
"See *Print the properties of the model* section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy\n",
"\n",
"To deploy the model into a web service endpoint, see _Deploy_ section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb)"
] ]
}, },
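
For instance, a quick inspection sketch, assuming the fitted model is the scikit-learn pipeline that AutoML returns:

    # Print the names and types of the pipeline steps.
    for step_name, step in fitted_model.named_steps.items():
        print(step_name, type(step).__name__)
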
{ {


@@ -2,10 +2,8 @@ name: auto-ml-classification-credit-card-fraud-local
dependencies: dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- interpret
- azureml-defaults
- azureml-explain-model
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- pandas_ml - interpret
- azureml-explain-model


@@ -1,756 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Regression with Deployment using Hardware Performance Dataset**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n",
"1. [Acknowledgements](#Acknowledgements)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use the Predicting Compressive Strength of Concrete Dataset to showcase how you can use AutoML for a regression problem. The regression goal is to predict the compressive strength of concrete based off of different ingredient combinations and the quantities of those ingredients.\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local compute.\n",
"4. Explore the results.\n",
"5. Test the best fitted model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
" \n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"experiment_name = 'automl-regression-concrete'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlcl\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
" \n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
"# For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data\n",
"\n",
"Create a run configuration for the remote run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"import pkg_resources\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy', 'py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Data\n",
"\n",
"Load the concrete strength dataset into X and y. X contains the training features, which are inputs to the model. y contains the training labels, which are the expected output of the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/compresive_strength_concrete.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"X = dataset.drop_columns(columns=['CONCRETE'])\n",
"y = dataset.keep_columns(columns=['CONCRETE'], validate=True)\n",
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
"y_train, y_test = y.random_split(percentage=0.8, seed=223) \n",
"dataset.take(5).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### If you would like to see even better results increase \"iteration_time_out minutes\" to 10+ mins and increase \"iterations\" to a minimum of 30"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\": 5,\n",
" \"iterations\": 10,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'spearman_correlation',\n",
" \"preprocess\": True,\n",
" \"max_concurrent_iterations\": 5,\n",
" \"verbosity\": logging.INFO,\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'regression',\n",
" debug_log = 'automl.log',\n",
" run_configuration=conda_run_config,\n",
" X = X_train,\n",
" y = y_train,\n",
" **automl_settings\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"Widget for Monitoring Runs\n",
"The widget will first report a \u00e2\u20ac\u0153loading status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"Note: The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(remote_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieve the Best Model\n",
"Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = remote_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest root_mean_squared_error value (which turned out to be the same as the one with largest spearman_correlation value):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"root_mean_squared_error\"\n",
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 3\n",
"third_run, third_model = remote_run.get_output(iteration = iteration)\n",
"print(third_run)\n",
"print(third_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Register the Fitted Model for Deployment\n",
"If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"description = 'AutoML Model'\n",
"tags = None\n",
"model = remote_run.register_model(description = description, tags = tags)\n",
"\n",
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Scoring Script\n",
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import pickle\n",
"import json\n",
"import numpy\n",
"import azureml.train.automl\n",
"from sklearn.externals import joblib\n",
"from azureml.core.model import Model\n",
"\n",
"def init():\n",
" global model\n",
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
" # deserialize the model file back into a sklearn model\n",
" model = joblib.load(model_path)\n",
"\n",
"def run(rawdata):\n",
" try:\n",
" data = json.loads(rawdata)['data']\n",
" data = numpy.array(data)\n",
" result = model.predict(data)\n",
" except Exception as e:\n",
" result = str(e)\n",
" return json.dumps({\"error\": result})\n",
" return json.dumps({\"result\":result.tolist()})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a YAML File for the Environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for p in ['azureml-train-automl', 'azureml-core']:\n",
" print('{}\\t{}'.format(p, dependencies[p]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-defaults','azureml-train-automl'])\n",
"\n",
"conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Substitute the actual version number in the environment file.\n",
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
"\n",
"with open(conda_env_file_name, 'r') as cefr:\n",
" content = cefr.read()\n",
"\n",
"with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
"\n",
"# Substitute the actual model id in the script file.\n",
"\n",
"script_file_name = 'score.py'\n",
"\n",
"with open(script_file_name, 'r') as cefr:\n",
" content = cefr.read()\n",
"\n",
"with open(script_file_name, 'w') as cefw:\n",
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy the model as a Web Service on Azure Container Instance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.webservice import AciWebservice\n",
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import Model\n",
"\n",
"inference_config = InferenceConfig(runtime = \"python\", \n",
" entry_script = script_file_name,\n",
" conda_file = conda_env_file_name)\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n",
" tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
" description = 'sample service for Automl Regression')\n",
"\n",
"aci_service_name = 'automl-sample-concrete'\n",
"print(aci_service_name)\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete a Web Service\n",
"\n",
"Deletes the specified web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Logs from a Deployed Web Service\n",
"\n",
"Gets logs from a deployed web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.get_logs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test\n",
"\n",
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_test = X_test.to_pandas_dataframe()\n",
"y_test = y_test.to_pandas_dataframe()\n",
"y_test = np.array(y_test)\n",
"y_test = y_test[:,0]\n",
"X_train = X_train.to_pandas_dataframe()\n",
"y_train = y_train.to_pandas_dataframe()\n",
"y_train = np.array(y_train)\n",
"y_train = y_train[:,0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Predict on training and test set, and calculate residual values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_train = fitted_model.predict(X_train)\n",
"y_residual_train = y_train - y_pred_train\n",
"\n",
"y_pred_test = fitted_model.predict(X_test)\n",
"y_residual_test = y_test - y_pred_test\n",
"\n",
"y_residual_train.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"\n",
"# Set up a multi-plot chart.\n",
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
"f.set_figheight(6)\n",
"f.set_figwidth(16)\n",
"\n",
"# Plot residual values of training set.\n",
"a0.axis([0, 360, -200, 200])\n",
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
"a0.set_xlabel('Training samples', fontsize = 12)\n",
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
"\n",
"# Plot a histogram.\n",
"#a0.hist(y_residual_train, orientation = 'horizontal', color = ['b']*len(y_residual_train), bins = 10, histtype = 'step')\n",
"#a0.hist(y_residual_train, orientation = 'horizontal', color = ['b']*len(y_residual_train), alpha = 0.2, bins = 10)\n",
"\n",
"# Plot residual values of test set.\n",
"a1.axis([0, 90, -200, 200])\n",
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
"a1.set_xlabel('Test samples', fontsize = 12)\n",
"a1.set_yticklabels([])\n",
"\n",
"# Plot a histogram.\n",
"#a1.hist(y_residual_test, orientation = 'horizontal', color = ['b']*len(y_residual_test), bins = 10, histtype = 'step')\n",
"#a1.hist(y_residual_test, orientation = 'horizontal', color = ['b']*len(y_residual_test), alpha = 0.2, bins = 10)\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculate metrics for the prediction\n",
"\n",
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
"from the trained model that was returned."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plot outputs\n",
"%matplotlib notebook\n",
"test_pred = plt.scatter(y_test, y_pred_test, color='b')\n",
"test_test = plt.scatter(y_test, y_test, color='g')\n",
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Acknowledgements\n",
"\n",
"This Predicting Compressive Strength of Concrete Dataset is made available under the CC0 1.0 Universal (CC0 1.0)\n",
"Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the CC0 1.0 Universal (CC0 1.0)\n",
"Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/ . The dataset itself can be found here: https://www.kaggle.com/pavanraj159/concrete-compressive-strength-data-set and http://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength\n",
"\n",
"I-Cheng Yeh, \"Modeling of strength of high performance concrete using artificial neural networks,\" Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998). \n",
"\n",
"Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science."
]
}
],
"metadata": {
"authors": [
{
"name": "v-rasav"
}
],
"category": "tutorial",
"compute": [
"AML Compute"
],
"datasets": [
"Concrete"
],
"deployment": [
"Azure Container Instance"
],
"exclude_from_index": false,
"framework": [
"Azure ML AutoML"
],
"friendly_name": "Regression with deployment using concrete dataset",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
},
"tags": [
""
],
"task": "Regression"
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -1,12 +0,0 @@
name: auto-ml-regression-concrete-strength
dependencies:
- pip:
- azureml-sdk
- interpret
- azureml-defaults
- azureml-explain-model
- azureml-train-automl
- azureml-widgets
- matplotlib
- pandas_ml
- azureml-dataprep[pandas]


@@ -206,9 +206,9 @@
"|-|-|\n", "|-|-|\n",
"|**task**|classification, regression or forecasting|\n", "|**task**|classification, regression or forecasting|\n",
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n", "|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
"|**experiment_timeout_minutes**| Maximum amount of time in minutes that all iterations combined can take before the experiment terminates.|\n", "|**experiment_timeout_hours**| Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|\n",
"|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n", "|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n",
"|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Note: If the input data is sparse, featurization cannot be turned on.|\n", "|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Setting this enables AutoML to perform featurization on the input to handle *missing data*, and to perform some common *feature extraction*. Note: If the input data is sparse, featurization cannot be turned on.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n",
"|**training_data**|(sparse) array-like, shape = [n_samples, n_features]|\n", "|**training_data**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**label_column_name**|(sparse) array-like, shape = [n_samples, ], targets values.|" "|**label_column_name**|(sparse) array-like, shape = [n_samples, ], targets values.|"
@@ -244,7 +244,7 @@
"source": [ "source": [
"featurization_config = FeaturizationConfig()\n", "featurization_config = FeaturizationConfig()\n",
"featurization_config.blocked_transformers = ['LabelEncoder']\n", "featurization_config.blocked_transformers = ['LabelEncoder']\n",
"#featurization_config.drop_columns = ['ERP', 'MMIN']\n", "#featurization_config.drop_columns = ['MMIN']\n",
"featurization_config.add_column_purpose('MYCT', 'Numeric')\n", "featurization_config.add_column_purpose('MYCT', 'Numeric')\n",
"featurization_config.add_column_purpose('VendorName', 'CategoricalHash')\n", "featurization_config.add_column_purpose('VendorName', 'CategoricalHash')\n",
"#default strategy mean, add transformer param for for 3 columns\n", "#default strategy mean, add transformer param for for 3 columns\n",
@@ -262,7 +262,7 @@
"source": [ "source": [
"automl_settings = {\n", "automl_settings = {\n",
" \"enable_early_stopping\": True, \n", " \"enable_early_stopping\": True, \n",
" \"experiment_timeout_minutes\" : 10,\n", " \"experiment_timeout_hours\" : 0.25,\n",
" \"max_concurrent_iterations\": 4,\n", " \"max_concurrent_iterations\": 4,\n",
" \"max_cores_per_iteration\": -1,\n", " \"max_cores_per_iteration\": -1,\n",
" \"n_cross_validations\": 5,\n", " \"n_cross_validations\": 5,\n",
@@ -514,7 +514,7 @@
" content = cefr.read()\n", " content = cefr.read()\n",
"\n", "\n",
"# Replace the values in train_explainer.py file with the appropriate values\n", "# Replace the values in train_explainer.py file with the appropriate values\n",
"content = content.replace('<<experimnet_name>>', automl_run.experiment.name) # your experiment name.\n", "content = content.replace('<<experiment_name>>', automl_run.experiment.name) # your experiment name.\n",
"content = content.replace('<<run_id>>', automl_run.id) # Run-id of the AutoML run for which you want to explain the model.\n", "content = content.replace('<<run_id>>', automl_run.id) # Run-id of the AutoML run for which you want to explain the model.\n",
"content = content.replace('<<target_column_name>>', 'ERP') # Your target column name\n", "content = content.replace('<<target_column_name>>', 'ERP') # Your target column name\n",
"content = content.replace('<<task>>', 'regression') # Training task type\n", "content = content.replace('<<task>>', 'regression') # Training task type\n",
@@ -532,8 +532,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Create conda configuration for model explanations experiment\n", "#### Create conda configuration for model explanations experiment from automl_run object"
"We need `azureml-explain-model`, `azureml-train-automl` and `azureml-core` packages for computing model explanations for your AutoML model on remote compute."
] ]
}, },
{ {
@@ -552,14 +551,9 @@
"# Set compute target to AmlCompute\n", "# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n", "conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n", "conda_run_config.environment.docker.enabled = True\n",
"azureml_pip_packages = [\n",
" 'azureml-train-automl', 'azureml-core', 'azureml-explain-model'\n",
"]\n",
"\n", "\n",
"# specify CondaDependencies obj\n", "# specify CondaDependencies obj\n",
"conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n", "conda_run_config.environment.python.conda_dependencies = automl_run.get_environment().python.conda_dependencies"
" conda_packages=['scikit-learn', 'numpy','py-xgboost<=0.80'],\n",
" pip_packages=azureml_pip_packages)"
] ]
}, },
{ {
@@ -634,7 +628,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations\n", "from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations\n",
"explainer_setup_class = automl_setup_model_explanations(fitted_model, 'regression', X_test=X_test)" "explainer_setup_class = automl_setup_model_explanations(fitted_model, 'regression', X_test=X_test)"
] ]
}, },
@@ -653,11 +647,11 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.explain.model._internal.explanation_client import ExplanationClient\n", "from azureml.explain.model._internal.explanation_client import ExplanationClient\n",
"from azureml.contrib.interpret.visualize import ExplanationDashboard\n", "from interpret_community.widget import ExplanationDashboard\n",
"client = ExplanationClient.from_run(automl_run)\n", "client = ExplanationClient.from_run(automl_run)\n",
"engineered_explanations = client.download_model_explanation(raw=False)\n", "engineered_explanations = client.download_model_explanation(raw=False)\n",
"print(engineered_explanations.get_feature_importance_dict())\n", "print(engineered_explanations.get_feature_importance_dict())\n",
"ExplanationDashboard(engineered_explanations, explainer_setup_class.automl_estimator, explainer_setup_class.X_test_transform)" "ExplanationDashboard(engineered_explanations, explainer_setup_class.automl_estimator, datasetX=explainer_setup_class.X_test_transform)"
] ]
}, },
{ {
@@ -676,7 +670,7 @@
"source": [ "source": [
"raw_explanations = client.download_model_explanation(raw=True)\n", "raw_explanations = client.download_model_explanation(raw=True)\n",
"print(raw_explanations.get_feature_importance_dict())\n", "print(raw_explanations.get_feature_importance_dict())\n",
"ExplanationDashboard(raw_explanations, explainer_setup_class.automl_pipeline, explainer_setup_class.X_test_raw)" "ExplanationDashboard(raw_explanations, explainer_setup_class.automl_pipeline, datasetX=explainer_setup_class.X_test_raw)"
] ]
}, },
{ {
@@ -718,20 +712,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.conda_dependencies import CondaDependencies \n", "conda_dep = automl_run.get_environment().python.conda_dependencies\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-explain-model', 'azureml-train-automl', 'azureml-defaults'\n",
"]\n",
" \n",
"\n",
"# specify CondaDependencies obj\n",
"myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas', 'numpy', 'py-xgboost<=0.80'],\n",
" pip_packages=azureml_pip_packages,\n",
" pin_sdk_version=True)\n",
"\n", "\n",
"with open(\"myenv.yml\",\"w\") as f:\n", "with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())\n", " f.write(conda_dep.serialize_to_string())\n",
"\n", "\n",
"with open(\"myenv.yml\",\"r\") as f:\n", "with open(\"myenv.yml\",\"r\") as f:\n",
" print(f.read())" " print(f.read())"
@@ -772,6 +756,7 @@
"from azureml.core.model import InferenceConfig\n", "from azureml.core.model import InferenceConfig\n",
"from azureml.core.webservice import AciWebservice\n", "from azureml.core.webservice import AciWebservice\n",
"from azureml.core.model import Model\n", "from azureml.core.model import Model\n",
"from azureml.core.environment import Environment\n",
"\n", "\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
" memory_gb=1, \n", " memory_gb=1, \n",
@@ -779,9 +764,8 @@
" \"method\" : \"local_explanation\"}, \n", " \"method\" : \"local_explanation\"}, \n",
" description='Get local explanations for Machine test data')\n", " description='Get local explanations for Machine test data')\n",
"\n", "\n",
"inference_config = InferenceConfig(runtime= \"python\", \n", "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
" entry_script=\"score_explain.py\",\n", "inference_config = InferenceConfig(entry_script=\"score_explain.py\", environment=myenv)\n",
" conda_file=\"myenv.yml\")\n",
"\n", "\n",
"# Use configs and models generated above\n", "# Use configs and models generated above\n",
"service = Model.deploy(ws, 'model-scoring', [scoring_explainer_model, original_model], inference_config, aciconfig)\n", "service = Model.deploy(ws, 'model-scoring', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",


@@ -2,12 +2,10 @@ name: auto-ml-regression-hardware-performance-explanation-and-featurization
dependencies: dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- interpret
- azureml-defaults
- azureml-explain-model
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- pandas_ml - interpret
- azureml-explain-model
- azureml-explain-model - azureml-explain-model
- azureml-contrib-interpret - azureml-contrib-interpret


@@ -5,7 +5,8 @@ import os
import pickle import pickle
import azureml.train.automl import azureml.train.automl
import azureml.explain.model import azureml.explain.model
from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \
automl_setup_model_explanations
from sklearn.externals import joblib from sklearn.externals import joblib
from azureml.core.model import Model from azureml.core.model import Model


@@ -6,7 +6,8 @@ from azureml.core.run import Run
from azureml.core.experiment import Experiment from azureml.core.experiment import Experiment
from sklearn.externals import joblib from sklearn.externals import joblib
from azureml.core.dataset import Dataset from azureml.core.dataset import Dataset
from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \
automl_setup_model_explanations, automl_check_model_if_explainable
from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
from azureml.explain.model.mimic_wrapper import MimicWrapper from azureml.explain.model.mimic_wrapper import MimicWrapper
from automl.client.core.common.constants import MODEL_PATH from automl.client.core.common.constants import MODEL_PATH
@@ -21,9 +22,14 @@ run = Run.get_context()
ws = run.experiment.workspace ws = run.experiment.workspace
# Get the AutoML run object from the experiment name and the workspace # Get the AutoML run object from the experiment name and the workspace
experiment = Experiment(ws, '<<experimnet_name>>') experiment = Experiment(ws, '<<experiment_name>>')
automl_run = Run(experiment=experiment, run_id='<<run_id>>') automl_run = Run(experiment=experiment, run_id='<<run_id>>')
# Check if this AutoML model is explainable
if not automl_check_model_if_explainable(automl_run):
raise Exception("Model explanations is currently not supported for " + automl_run.get_properties().get(
'run_algorithm'))
# Download the best model from the artifact store # Download the best model from the artifact store
automl_run.download_file(name=MODEL_PATH, output_file_path='model.pkl') automl_run.download_file(name=MODEL_PATH, output_file_path='model.pkl')
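The added guard calls `automl_check_model_if_explainable` before doing any work, so unsupported algorithms fail fast with a clear message. A minimal sketch of the same pattern outside the script, with placeholder experiment name and run id:

```python
from azureml.core import Experiment, Workspace
from azureml.core.run import Run
from azureml.train.automl.runtime.automl_explain_utilities import \
    automl_check_model_if_explainable

ws = Workspace.from_config()
experiment = Experiment(ws, 'my-automl-experiment')            # placeholder name
automl_run = Run(experiment=experiment, run_id='AutoML_1234')  # placeholder run id

# Fail fast if the winning algorithm cannot be explained.
if not automl_check_model_if_explainable(automl_run):
    raise Exception("Model explanations are not supported for "
                    + automl_run.get_properties().get('run_algorithm'))
```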


@@ -1,761 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Regression with Deployment using Hardware Performance Dataset**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n",
"1. [Acknowledgements](#Acknowledgements)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The Regression goal is to predict the performance of certain combinations of hardware parts.\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local compute.\n",
"4. Explore the results.\n",
"5. Test the best fitted model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
" \n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"experiment_name = 'automl-regression-hardware'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlcl\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
" \n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
"# For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data\n",
"\n",
"Create a run configuration for the remote run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"import pkg_resources\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy', 'py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Data\n",
"\n",
"Load the hardware performance dataset into X and y. X contains the training features, which are inputs to the model. y contains the training labels, which are the expected output of the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"X = dataset.drop_columns(columns=['ERP'])\n",
"y = dataset.keep_columns(columns=['ERP'], validate=True)\n",
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
"y_train, y_test = y.random_split(percentage=0.8, seed=223)\n",
"dataset.take(5).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Train\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### If you would like to see even better results increase \"iteration_time_out minutes\" to 10+ mins and increase \"iterations\" to a minimum of 30"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\": 5,\n",
" \"iterations\": 10,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'spearman_correlation',\n",
" \"preprocess\": True,\n",
" \"max_concurrent_iterations\": 5,\n",
" \"verbosity\": logging.INFO,\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'regression',\n",
" debug_log = 'automl_errors.log',\n",
" run_configuration=conda_run_config,\n",
" X = X_train,\n",
" y = y_train,\n",
" **automl_settings\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output = False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait until the run finishes.\n",
"remote_run.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(remote_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieve the Best Model\n",
"Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = remote_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"root_mean_squared_error\"\n",
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 3\n",
"third_run, third_model = remote_run.get_output(iteration = iteration)\n",
"print(third_run)\n",
"print(third_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Register the Fitted Model for Deployment\n",
"If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"description = 'AutoML Model'\n",
"tags = None\n",
"model = remote_run.register_model(description = description, tags = tags)\n",
"\n",
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Scoring Script\n",
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import pickle\n",
"import json\n",
"import numpy\n",
"import azureml.train.automl\n",
"from sklearn.externals import joblib\n",
"from azureml.core.model import Model\n",
"\n",
"def init():\n",
" global model\n",
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
" # deserialize the model file back into a sklearn model\n",
" model = joblib.load(model_path)\n",
"\n",
"def run(rawdata):\n",
" try:\n",
" data = json.loads(rawdata)['data']\n",
" data = numpy.array(data)\n",
" result = model.predict(data)\n",
" except Exception as e:\n",
" result = str(e)\n",
" return json.dumps({\"error\": result})\n",
" return json.dumps({\"result\":result.tolist()})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a YAML File for the Environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for p in ['azureml-train-automl', 'azureml-core']:\n",
" print('{}\\t{}'.format(p, dependencies[p]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-defaults','azureml-train-automl'])\n",
"\n",
"conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Substitute the actual version number in the environment file.\n",
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
"\n",
"with open(conda_env_file_name, 'r') as cefr:\n",
" content = cefr.read()\n",
"\n",
"with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
"\n",
"# Substitute the actual model id in the script file.\n",
"\n",
"script_file_name = 'score.py'\n",
"\n",
"with open(script_file_name, 'r') as cefr:\n",
" content = cefr.read()\n",
"\n",
"with open(script_file_name, 'w') as cefw:\n",
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy the model as a Web Service on Azure Container Instance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.webservice import AciWebservice\n",
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import Model\n",
"\n",
"inference_config = InferenceConfig(runtime = \"python\", \n",
" entry_script = script_file_name,\n",
" conda_file = conda_env_file_name)\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n",
" tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
" description = 'sample service for Automl Regression')\n",
"\n",
"aci_service_name = 'automl-sample-hardware'\n",
"print(aci_service_name)\n",
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete a Web Service\n",
"\n",
"Deletes the specified web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Logs from a Deployed Web Service\n",
"\n",
"Gets logs from a deployed web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.get_logs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test\n",
"\n",
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_test = X_test.to_pandas_dataframe()\n",
"y_test = y_test.to_pandas_dataframe()\n",
"y_test = np.array(y_test)\n",
"y_test = y_test[:,0]\n",
"X_train = X_train.to_pandas_dataframe()\n",
"y_train = y_train.to_pandas_dataframe()\n",
"y_train = np.array(y_train)\n",
"y_train = y_train[:,0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Predict on training and test set, and calculate residual values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_train = fitted_model.predict(X_train)\n",
"y_residual_train = y_train - y_pred_train\n",
"\n",
"y_pred_test = fitted_model.predict(X_test)\n",
"y_residual_test = y_test - y_pred_test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculate metrics for the prediction\n",
"\n",
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
"from the trained model that was returned."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"\n",
"# Set up a multi-plot chart.\n",
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
"f.set_figheight(6)\n",
"f.set_figwidth(16)\n",
"\n",
"# Plot residual values of training set.\n",
"a0.axis([0, 360, -200, 200])\n",
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)),fontsize = 12)\n",
"a0.set_xlabel('Training samples', fontsize = 12)\n",
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
"\n",
"# Plot residual values of test set.\n",
"a1.axis([0, 90, -200, 200])\n",
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)),fontsize = 12)\n",
"a1.set_xlabel('Test samples', fontsize = 12)\n",
"a1.set_yticklabels([])\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib notebook\n",
"test_pred = plt.scatter(y_test, y_pred_test, color='')\n",
"test_test = plt.scatter(y_test, y_test, color='g')\n",
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Acknowledgements\n",
"This Predicting Hardware Performance Dataset is made available under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/ . The dataset itself can be found here: https://www.kaggle.com/faizunnabi/comp-hardware-performance and https://archive.ics.uci.edu/ml/datasets/Computer+Hardware\n",
"\n",
"_**Citation Found Here**_\n"
]
}
],
"metadata": {
"authors": [
{
"name": "v-rasav"
}
],
"category": "tutorial",
"compute": [
"AML Compute"
],
"datasets": [
"Concrete"
],
"deployment": [
"Azure Container Instance"
],
"exclude_from_index": false,
"framework": [
"Azure ML AutoML"
],
"friendly_name": "Regression with deployment using hardware performance dataset",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
},
"star_tag": [
"featured"
],
"tags": [
""
],
"task": "Regression"
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -1,12 +0,0 @@
name: auto-ml-regression-hardware-performance
dependencies:
- pip:
- azureml-sdk
- interpret
- azureml-defaults
- azureml-explain-model
- azureml-train-automl
- azureml-widgets
- matplotlib
- pandas_ml
- azureml-dataprep[pandas]


@@ -188,15 +188,18 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"automlconfig-remarks-sample"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"automl_settings = {\n", "automl_settings = {\n",
" \"n_cross_validations\": 3,\n", " \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'r2_score',\n", " \"primary_metric\": 'r2_score',\n",
" \"preprocess\": True,\n",
" \"enable_early_stopping\": True, \n", " \"enable_early_stopping\": True, \n",
" \"experiment_timeout_minutes\": 20, #for real scenarios we reccommend a timeout of at least one hour \n", " \"experiment_timeout_hours\": 0.3, #for real scenarios we reccommend a timeout of at least one hour \n",
" \"max_concurrent_iterations\": 4,\n", " \"max_concurrent_iterations\": 4,\n",
" \"max_cores_per_iteration\": -1,\n", " \"max_cores_per_iteration\": -1,\n",
" \"verbosity\": logging.INFO,\n", " \"verbosity\": logging.INFO,\n",


@@ -2,8 +2,7 @@ name: auto-ml-regression
dependencies: dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- pandas==0.23.4
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib
- pandas_ml
- paramiko<2.5.0


@@ -140,6 +140,9 @@
"framework": [ "framework": [
"Azure ML AutoML" "Azure ML AutoML"
], ],
"tags": [
""
],
"friendly_name": "Forecasting with automated ML SQL integration", "friendly_name": "Forecasting with automated ML SQL integration",
"index_order": 1, "index_order": 1,
"kernelspec": { "kernelspec": {
@@ -151,9 +154,6 @@
"name": "sql", "name": "sql",
"version": "" "version": ""
}, },
"tags": [
""
],
"task": "Forecasting" "task": "Forecasting"
}, },
"nbformat": 4, "nbformat": 4,


@@ -56,7 +56,7 @@ CREATE OR ALTER PROCEDURE [dbo].[AutoMLTrain]
@task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting. @task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.
@experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal. @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.
@iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline. @iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline.
@experiment_timeout_minutes INT = 60, -- The maximum time in minutes for training all pipelines. @experiment_timeout_hours FLOAT = 1, -- The maximum time in hours for training all pipelines.
@n_cross_validations INT = 3, -- The number of cross validations. @n_cross_validations INT = 3, -- The number of cross validations.
@blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used. @blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.
-- The list of possible models can be found at: -- The list of possible models can be found at:
@@ -131,8 +131,8 @@ if __name__.startswith("sqlindb"):
X_train = data_train X_train = data_train
if experiment_timeout_minutes == 0: if experiment_timeout_hours == 0:
experiment_timeout_minutes = None experiment_timeout_hours = None
if experiment_exit_score == 0: if experiment_exit_score == 0:
experiment_exit_score = None experiment_exit_score = None
@@ -163,7 +163,7 @@ if __name__.startswith("sqlindb"):
debug_log = log_file_name, debug_log = log_file_name,
primary_metric = primary_metric, primary_metric = primary_metric,
iteration_timeout_minutes = iteration_timeout_minutes, iteration_timeout_minutes = iteration_timeout_minutes,
experiment_timeout_minutes = experiment_timeout_minutes, experiment_timeout_hours = experiment_timeout_hours,
iterations = iterations, iterations = iterations,
n_cross_validations = n_cross_validations, n_cross_validations = n_cross_validations,
preprocess = preprocess, preprocess = preprocess,
@@ -204,7 +204,7 @@ if __name__.startswith("sqlindb"):
@iterations INT, @task NVARCHAR(40), @iterations INT, @task NVARCHAR(40),
@experiment_name NVARCHAR(32), @experiment_name NVARCHAR(32),
@iteration_timeout_minutes INT, @iteration_timeout_minutes INT,
@experiment_timeout_minutes INT, @experiment_timeout_hours FLOAT,
@n_cross_validations INT, @n_cross_validations INT,
@blacklist_models NVARCHAR(MAX), @blacklist_models NVARCHAR(MAX),
@whitelist_models NVARCHAR(MAX), @whitelist_models NVARCHAR(MAX),
@@ -223,7 +223,7 @@ if __name__.startswith("sqlindb"):
, @task = @task , @task = @task
, @experiment_name = @experiment_name , @experiment_name = @experiment_name
, @iteration_timeout_minutes = @iteration_timeout_minutes , @iteration_timeout_minutes = @iteration_timeout_minutes
, @experiment_timeout_minutes = @experiment_timeout_minutes , @experiment_timeout_hours = @experiment_timeout_hours
, @n_cross_validations = @n_cross_validations , @n_cross_validations = @n_cross_validations
, @blacklist_models = @blacklist_models , @blacklist_models = @blacklist_models
, @whitelist_models = @whitelist_models , @whitelist_models = @whitelist_models
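The stored-procedure change mirrors a rename in the Python SDK itself: the overall experiment budget is now expressed in hours, while the per-pipeline cap stays in minutes. A minimal sketch of the settings under the new name (the commented `AutoMLConfig` call uses placeholder data arguments):

```python
from azureml.train.automl import AutoMLConfig

# Overall budget is now expressed in hours (was experiment_timeout_minutes).
automl_settings = {
    "iteration_timeout_minutes": 15,   # per-pipeline cap, unchanged
    "experiment_timeout_hours": 1.0,   # was experiment_timeout_minutes = 60
    "n_cross_validations": 3,
}

# Data arguments are supplied as in the notebooks, e.g. (placeholders):
# automl_config = AutoMLConfig(task='regression', training_data=train_data,
#                              label_column_name='ERP', **automl_settings)
```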


@@ -235,7 +235,7 @@
" @task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.\r\n", " @task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.\r\n",
" @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.\r\n", " @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.\r\n",
" @iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline. \r\n", " @iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline. \r\n",
" @experiment_timeout_minutes INT = 60, -- The maximum time in minutes for training all pipelines.\r\n", " @experiment_timeout_hours FLOAT = 1, -- The maximum time in hours for training all pipelines.\r\n",
" @n_cross_validations INT = 3, -- The number of cross validations.\r\n", " @n_cross_validations INT = 3, -- The number of cross validations.\r\n",
" @blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.\r\n", " @blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.\r\n",
" -- The list of possible models can be found at:\r\n", " -- The list of possible models can be found at:\r\n",
@@ -307,8 +307,8 @@
"\r\n", "\r\n",
" X_train = data_train\r\n", " X_train = data_train\r\n",
"\r\n", "\r\n",
" if experiment_timeout_minutes == 0:\r\n", " if experiment_timeout_hours == 0:\r\n",
" experiment_timeout_minutes = None\r\n", " experiment_timeout_hours = None\r\n",
"\r\n", "\r\n",
" if experiment_exit_score == 0:\r\n", " if experiment_exit_score == 0:\r\n",
" experiment_exit_score = None\r\n", " experiment_exit_score = None\r\n",
@@ -337,7 +337,7 @@
" debug_log = log_file_name, \r\n", " debug_log = log_file_name, \r\n",
" primary_metric = primary_metric, \r\n", " primary_metric = primary_metric, \r\n",
" iteration_timeout_minutes = iteration_timeout_minutes, \r\n", " iteration_timeout_minutes = iteration_timeout_minutes, \r\n",
" experiment_timeout_minutes = experiment_timeout_minutes,\r\n", " experiment_timeout_hours = experiment_timeout_hours,\r\n",
" iterations = iterations, \r\n", " iterations = iterations, \r\n",
" n_cross_validations = n_cross_validations, \r\n", " n_cross_validations = n_cross_validations, \r\n",
" preprocess = preprocess,\r\n", " preprocess = preprocess,\r\n",
@@ -378,7 +378,7 @@
"\t\t\t\t @iterations INT, @task NVARCHAR(40),\r\n", "\t\t\t\t @iterations INT, @task NVARCHAR(40),\r\n",
"\t\t\t\t @experiment_name NVARCHAR(32),\r\n", "\t\t\t\t @experiment_name NVARCHAR(32),\r\n",
"\t\t\t\t @iteration_timeout_minutes INT,\r\n", "\t\t\t\t @iteration_timeout_minutes INT,\r\n",
"\t\t\t\t @experiment_timeout_minutes INT,\r\n", "\t\t\t\t @experiment_timeout_hours FLOAT,\r\n",
"\t\t\t\t @n_cross_validations INT,\r\n", "\t\t\t\t @n_cross_validations INT,\r\n",
"\t\t\t\t @blacklist_models NVARCHAR(MAX),\r\n", "\t\t\t\t @blacklist_models NVARCHAR(MAX),\r\n",
"\t\t\t\t @whitelist_models NVARCHAR(MAX),\r\n", "\t\t\t\t @whitelist_models NVARCHAR(MAX),\r\n",
@@ -396,7 +396,7 @@
"\t, @task = @task\r\n", "\t, @task = @task\r\n",
"\t, @experiment_name = @experiment_name\r\n", "\t, @experiment_name = @experiment_name\r\n",
"\t, @iteration_timeout_minutes = @iteration_timeout_minutes\r\n", "\t, @iteration_timeout_minutes = @iteration_timeout_minutes\r\n",
"\t, @experiment_timeout_minutes = @experiment_timeout_minutes\r\n", "\t, @experiment_timeout_hours = @experiment_timeout_hours\r\n",
"\t, @n_cross_validations = @n_cross_validations\r\n", "\t, @n_cross_validations = @n_cross_validations\r\n",
"\t, @blacklist_models = @blacklist_models\r\n", "\t, @blacklist_models = @blacklist_models\r\n",
"\t, @whitelist_models = @whitelist_models\r\n", "\t, @whitelist_models = @whitelist_models\r\n",


@@ -11,6 +11,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Register Azure Databricks trained model and deploy it to ACI\n"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -161,9 +168,9 @@
"source": [ "source": [
"from azureml.core.conda_dependencies import CondaDependencies \n", "from azureml.core.conda_dependencies import CondaDependencies \n",
"\n", "\n",
"myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) #showing how to add libs as an eg. - not needed for this model.\n", "myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) # showing how to add libs as an eg. - not needed for this model.\n",
"\n", "\n",
"with open(\"mydeployenv.yml\",\"w\") as f:\n", "with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myacienv.serialize_to_string())" " f.write(myacienv.serialize_to_string())"
] ]
}, },
@@ -175,18 +182,37 @@
"source": [ "source": [
"#deploy to ACI\n", "#deploy to ACI\n",
"from azureml.core.webservice import AciWebservice, Webservice\n", "from azureml.core.webservice import AciWebservice, Webservice\n",
"from azureml.exceptions import WebserviceException\n",
"from azureml.core.model import InferenceConfig\n", "from azureml.core.model import InferenceConfig\n",
"from azureml.core.environment import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"\n", "\n",
"myaci_config = AciWebservice.deploy_configuration(cpu_cores = 2, \n", "myaci_config = AciWebservice.deploy_configuration(cpu_cores = 2, \n",
" memory_gb = 2, \n", " memory_gb = 2, \n",
" tags = {'name':'Databricks Azure ML ACI'}, \n", " tags = {'name':'Databricks Azure ML ACI'}, \n",
" description = 'This is for ADB and AML example.')\n", " description = 'This is for ADB and AML example.')\n",
"\n", "\n",
"inference_config = InferenceConfig(runtime= 'spark-py', \n", "service_name = 'aciws'\n",
" entry_script='score_sparkml.py',\n",
" conda_file='mydeployenv.yml')\n",
"\n", "\n",
"myservice = Model.deploy(ws, 'aciws', [mymodel], inference_config, myaci_config)\n", "# Remove any existing service under the same name.\n",
"try:\n",
" Webservice(ws, service_name).delete()\n",
"except WebserviceException:\n",
" pass\n",
"\n",
"myenv = Environment.get(ws, name='AzureML-PySpark-MmlSpark-0.15')\n",
"# we need to add extra packages to procured environment\n",
"# in order to deploy amended environment we need to rename it\n",
"myenv.name = 'myenv'\n",
"model_dependencies = CondaDependencies('myenv.yml')\n",
"for pip_dep in model_dependencies.pip_packages:\n",
" myenv.python.conda_dependencies.add_pip_package(pip_dep)\n",
"for conda_dep in model_dependencies.conda_packages:\n",
" myenv.python.conda_dependencies.add_conda_package(conda_dep)\n",
"inference_config = InferenceConfig(entry_script='score_sparkml.py', environment=myenv)\n",
"\n",
"myservice = Model.deploy(ws, service_name, [mymodel], inference_config, myaci_config)\n",
"myservice.wait_for_deployment(show_output=True)" "myservice.wait_for_deployment(show_output=True)"
] ]
}, },
@@ -199,18 +225,6 @@
"help(Webservice)" "help(Webservice)"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List images by ws\n",
"\n",
"for i in ContainerImage.list(workspace = ws):\n",
" print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -258,6 +272,15 @@
"myservice.delete()" "myservice.delete()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploying to other types of computes\n",
"\n",
"In order to learn how to deploy to other types of compute targets, such as AKS, please take a look at the set of notebooks in the [deployment](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/deployment) folder."
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},


@@ -1,298 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Azure ML & Azure Databricks notebooks by Parashar Shah.\n",
"\n",
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook uses image from ACI notebook for deploying to AKS."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set auth to be used by workspace related APIs.\n",
"# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
"# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
"auth = None"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config(auth = auth)\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Register the model\n",
"import os\n",
"from azureml.core.model import Model\n",
"\n",
"model_name = \"AdultCensus_runHistory_aks.mml\" # \n",
"model_name_dbfs = os.path.join(\"/dbfs\", model_name)\n",
"\n",
"print(\"copy model from dbfs to local\")\n",
"model_local = \"file:\" + os.getcwd() + \"/\" + model_name\n",
"dbutils.fs.cp(model_name, model_local, True)\n",
"\n",
"mymodel = Model.register(model_path = model_name, # this points to a local file\n",
" model_name = model_name, # this is the name the model is registered as, am using same name for both path and name. \n",
" description = \"ADB trained model by Parashar\",\n",
" workspace = ws)\n",
"\n",
"print(mymodel.name, mymodel.description, mymodel.version)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#%%writefile score_sparkml.py\n",
"score_sparkml = \"\"\"\n",
" \n",
"import json\n",
" \n",
"def init():\n",
" # One-time initialization of PySpark and predictive model\n",
" import pyspark\n",
" from azureml.core.model import Model\n",
" from pyspark.ml import PipelineModel\n",
" \n",
" global trainedModel\n",
" global spark\n",
" \n",
" spark = pyspark.sql.SparkSession.builder.appName(\"ADB and AML notebook by Parashar\").getOrCreate()\n",
" model_name = \"{model_name}\" #interpolated\n",
" model_path = Model.get_model_path(model_name)\n",
" trainedModel = PipelineModel.load(model_path)\n",
" \n",
"def run(input_json):\n",
" if isinstance(trainedModel, Exception):\n",
" return json.dumps({{\"trainedModel\":str(trainedModel)}})\n",
" \n",
" try:\n",
" sc = spark.sparkContext\n",
" input_list = json.loads(input_json)\n",
" input_rdd = sc.parallelize(input_list)\n",
" input_df = spark.read.json(input_rdd)\n",
" \n",
" # Compute prediction\n",
" prediction = trainedModel.transform(input_df)\n",
" #result = prediction.first().prediction\n",
" predictions = prediction.collect()\n",
" \n",
" #Get each scored result\n",
" preds = [str(x['prediction']) for x in predictions]\n",
" result = \",\".join(preds)\n",
" # you can return any data type as long as it is JSON-serializable\n",
" return result.tolist()\n",
" except Exception as e:\n",
" result = str(e)\n",
" return result\n",
" \n",
"\"\"\".format(model_name=model_name)\n",
" \n",
"exec(score_sparkml)\n",
" \n",
"with open(\"score_sparkml.py\", \"w\") as file:\n",
" file.write(score_sparkml)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) #showing how to add libs as an eg. - not needed for this model.\n",
"\n",
"with open(\"mydeployenv.yml\",\"w\") as f:\n",
" f.write(myacienv.serialize_to_string())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#create AKS compute\n",
"#it may take 20-25 minutes to create a new cluster\n",
"\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Use the default configuration (can also provide parameters to customize)\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'ps-aks-demo2' \n",
"\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)\n",
"\n",
"aks_target.wait_for_completion(show_output = True)\n",
"\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#deploy to AKS\n",
"from azureml.core.webservice import AksWebservice, Webservice\n",
"from azureml.core.model import InferenceConfig\n",
"\n",
"aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)\n",
"\n",
"inference_config = InferenceConfig(runtime = 'spark-py', \n",
" entry_script ='score_sparkml.py',\n",
" conda_file ='mydeployenv.yml')\n",
"\n",
"aks_service = Model.deploy(ws, 'ps-aks-service', [mymodel], inference_config, aks_config, aks_target)\n",
"aks_service.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.deployment_status"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#for using the Web HTTP API \n",
"print(aks_service.scoring_uri)\n",
"print(aks_service.get_keys())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"#get the some sample data\n",
"test_data_path = \"AdultCensusIncomeTest\"\n",
"test = spark.read.parquet(test_data_path).limit(5)\n",
"\n",
"test_json = json.dumps(test.toJSON().collect())\n",
"\n",
"print(test_json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#using data defined above predict if income is >50K (1) or <=50K (0)\n",
"aks_service.run(input_data=test_json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#comment to not delete the web service\n",
"aks_service.delete()\n",
"#model.delete()\n",
"aks_target.delete() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.png)"
]
}
],
"metadata": {
"authors": [
{
"name": "pasha"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"name": "deploy-to-aks-existingimage-05",
"notebookId": 1030695628045968
},
"nbformat": 4,
"nbformat_minor": 1
}


@@ -640,7 +640,7 @@
"\n", "\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults', 'azureml-sdk[automl]'])\n", "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults', 'azureml-sdk[automl]'])\n",
"\n", "\n",
"conda_env_file_name = 'mydeployenv.yml'\n", "conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)" "myenv.save_to_file('.', conda_env_file_name)"
] ]
}, },
@@ -661,22 +661,40 @@
"# this will take 10-15 minutes to finish\n", "# this will take 10-15 minutes to finish\n",
"\n", "\n",
"from azureml.core.webservice import AciWebservice, Webservice\n", "from azureml.core.webservice import AciWebservice, Webservice\n",
"from azureml.exceptions import WebserviceException\n",
"from azureml.core.model import InferenceConfig\n", "from azureml.core.model import InferenceConfig\n",
"from azureml.core.model import Model\n", "from azureml.core.model import Model\n",
"from azureml.core.environment import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"import uuid\n", "import uuid\n",
"\n", "\n",
"\n",
"myaci_config = AciWebservice.deploy_configuration(\n", "myaci_config = AciWebservice.deploy_configuration(\n",
" cpu_cores = 2, \n", " cpu_cores = 2, \n",
" memory_gb = 2, \n", " memory_gb = 2, \n",
" tags = {'name':'Databricks Azure ML ACI'}, \n", " tags = {'name':'Databricks Azure ML ACI'}, \n",
" description = 'This is for ADB and AutoML example.')\n", " description = 'This is for ADB and AutoML example.')\n",
"\n", "\n",
"inference_config = InferenceConfig(runtime= 'spark-py', \n", "myenv = Environment.get(ws, name='AzureML-PySpark-MmlSpark-0.15')\n",
" entry_script='score.py',\n", "# we need to add extra packages to procured environment\n",
" conda_file='mydeployenv.yml')\n", "# in order to deploy amended environment we need to rename it\n",
"myenv.name = 'myenv'\n",
"model_dependencies = CondaDependencies('myenv.yml')\n",
"for pip_dep in model_dependencies.pip_packages:\n",
" myenv.python.conda_dependencies.add_pip_package(pip_dep)\n",
"for conda_dep in model_dependencies.conda_packages:\n",
" myenv.python.conda_dependencies.add_conda_package(conda_dep)\n",
"inference_config = InferenceConfig(entry_script='score_sparkml.py', environment=myenv)\n",
"\n", "\n",
"guid = str(uuid.uuid4()).split(\"-\")[0]\n", "guid = str(uuid.uuid4()).split(\"-\")[0]\n",
"service_name = \"myservice-{}\".format(guid)\n", "service_name = \"myservice-{}\".format(guid)\n",
"\n",
"# Remove any existing service under the same name.\n",
"try:\n",
" Webservice(ws, service_name).delete()\n",
"except WebserviceException:\n",
" pass\n",
"\n",
"print(\"Creating service with name: {}\".format(service_name))\n", "print(\"Creating service with name: {}\".format(service_name))\n",
"\n", "\n",
"myservice = Model.deploy(ws, service_name, [model], inference_config, myaci_config)\n", "myservice = Model.deploy(ws, service_name, [model], inference_config, myaci_config)\n",
@@ -795,7 +813,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.6.5" "version": "3.6.8"
}, },
"name": "auto-ml-classification-local-adb", "name": "auto-ml-classification-local-adb",
"notebookId": 2733885892129020 "notebookId": 2733885892129020


@@ -0,0 +1,36 @@
## Examples to get started with Azure Machine Learning SDK for R
Learn how to use Azure Machine Learning SDK for R for experimentation and model management.
As a pre-requisite, go through the [Installation](vignettes/installation.Rmd) and [Configuration](vignettes/configuration.Rmd) vignettes to first install the package and set up your Azure Machine Learning Workspace unless you are running these examples on an Azure Machine Learning compute instance. Azure Machine Learning compute instances have the Azure Machine Learning SDK pre-installed and your workspace details pre-configured.
### Samples
* Deployment
* [deploy-to-aci](./samples/deployment/deploy-to-aci): Deploy a model as a web service to Azure Container Instances (ACI).
* [deploy-to-local](./samples/deployment/deploy-to-local): Deploy a model as a web service locally.
* Training
* [train-on-amlcompute](./samples/training/train-on-amlcompute): Train a model on a remote AmlCompute cluster.
* [train-on-local](./samples/training/train-on-local): Train a model locally with Docker.
### Vignettes
* [deploy-to-aks](./vignettes/deploy-to-aks): Production deploy a model as a web service to Azure Kubernetes Service (AKS).
* [hyperparameter-tune-with-keras](./vignettes/hyperparameter-tune-with-keras): Hyperparameter tune a Keras model using HyperDrive, Azure ML's hyperparameter tuning functionality.
* [train-and-deploy-to-aci](./vignettes/train-and-deploy-to-aci): Train a caret model and deploy as a web service to Azure Container Instances (ACI).
* [train-with-tensorflow](./vignettes/train-with-tensorflow): Train a deep learning TensorFlow model with Azure ML.
Find more information on the [official documentation site for Azure Machine Learning SDK for R](https://azure.github.io/azureml-sdk-for-r/).
### Troubleshooting
- If the following error occurs when submitting an experiment using RStudio:
```R
Error in py_call_impl(callable, dots$args, dots$keywords) :
PermissionError: [Errno 13] Permission denied
```
Move the files for your project into a subdirectory and reset the working directory to that directory before re-submitting.
In order to submit an experiment, the Azure ML SDK must create a .zip file of the project directory to send to the service. However,
the SDK does not have permission to write into the .Rproj.user subdirectory that is automatically created during an RStudio
session. For this reason, the recommended best practice is to isolate project files into their own directory.
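For example, with illustrative file names:
```R
dir.create("my_project", showWarnings = FALSE)
file.copy(c("train.R", "config.json"), "my_project")  # your project files
setwd("my_project")  # re-submit the experiment from here
```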


@@ -0,0 +1,11 @@
## Azure Machine Learning samples
These samples are short code examples for using Azure Machine Learning SDK for R. If you are new to the R SDK, we recommend that you first take a look at the more detailed end-to-end [vignettes](../vignettes).
Before running a sample in RStudio, set the working directory to the folder that contains the sample script, either with `setwd(dirname)` or via Session -> Set Working Directory -> To Source File Location. Each sample assumes that the data and scripts are in the current working directory.
1. [train-on-amlcompute](training/train-on-amlcompute): Train a model on a remote AmlCompute cluster.
2. [train-on-local](training/train-on-local): Train a model locally with Docker.
3. [deploy-to-aci](deployment/deploy-to-aci): Deploy a model as a web service to Azure Container Instances (ACI).
4. [deploy-to-local](deployment/deploy-to-local): Deploy a model as a web service locally.
> Before you run these samples, make sure you have an Azure Machine Learning workspace. You can follow the [configuration vignette](../vignettes/configuration.Rmd) to set up a workspace. (You do not need to do this if you are running these examples on an Azure Machine Learning compute instance).


@@ -0,0 +1,59 @@
# Copyright(c) Microsoft Corporation.
# Licensed under the MIT license.
library(azuremlsdk)
library(jsonlite)
ws <- load_workspace_from_config()
# Register the model
model <- register_model(ws,
                        model_path = "project_files/model.rds",
                        model_name = "model.rds")
# Create environment
r_env <- r_environment(name = "r_env")
# Create inference config
inference_config <- inference_config(
  entry_script = "score.R",
  source_directory = "project_files",
  environment = r_env)
# Create ACI deployment config
deployment_config <- aci_webservice_deployment_config(cpu_cores = 1,
                                                      memory_gb = 1)
# Deploy the web service
service <- deploy_model(ws,
                        'rservice',
                        list(model),
                        inference_config,
                        deployment_config)
wait_for_deployment(service, show_output = TRUE)
# If you encounter any issue in deploying the webservice, please visit
# https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment
# Inferencing
# Only one sample is scored at a time; uncomment the flower you want to test.
# versicolor
# plant <- data.frame(Sepal.Length = 6.4,
#                     Sepal.Width = 2.8,
#                     Petal.Length = 4.6,
#                     Petal.Width = 1.8)
# setosa
# plant <- data.frame(Sepal.Length = 5.1,
#                     Sepal.Width = 3.5,
#                     Petal.Length = 1.4,
#                     Petal.Width = 0.2)
# virginica
plant <- data.frame(Sepal.Length = 6.7,
                    Sepal.Width = 3.3,
                    Petal.Length = 5.2,
                    Petal.Width = 2.3)
# Test the web service
predicted_val <- invoke_webservice(service, toJSON(plant))
predicted_val
# Delete the web service
delete_webservice(service)


@@ -0,0 +1,17 @@
# Copyright(c) Microsoft Corporation.
# Licensed under the MIT license.
library(jsonlite)
init <- function() {
  model_path <- Sys.getenv("AZUREML_MODEL_DIR")
  model <- readRDS(file.path(model_path, "model.rds"))
  message("model is loaded")

  function(data) {
    plant <- as.data.frame(fromJSON(data))
    prediction <- predict(model, plant)
    result <- as.character(prediction)
    toJSON(result)
  }
}


@@ -0,0 +1,112 @@
# Copyright(c) Microsoft Corporation.
# Licensed under the MIT license.
# Register model and deploy locally
# This example shows how to deploy a web service in step-by-step fashion:
#
# 1) Register model
# 2) Deploy the model as a web service in a local Docker container.
# 3) Invoke web service with SDK or call web service with raw HTTP call.
# 4) Quickly test changes to your entry script by reloading the local service.
# 5) Optionally, you can also make changes to the model and update the local service.
library(azuremlsdk)
library(jsonlite)
ws <- load_workspace_from_config()
# Register the model
model <- register_model(ws, model_path = "project_files/model.rds",
                        model_name = "model.rds")
# Create environment
r_env <- r_environment(name = "r_env")
# Create inference config
inference_config <- inference_config(
  entry_script = "score.R",
  source_directory = "project_files",
  environment = r_env)
# Create local deployment config
local_deployment_config <- local_webservice_deployment_config()
# Deploy the web service
# NOTE:
# The Docker image runs as a Linux container. If you are running Docker for Windows, you need to ensure the Linux Engine is running:
# # PowerShell command to switch to the Linux engine
# & 'C:\Program Files\Docker\Docker\DockerCli.exe' -SwitchLinuxEngine
service <- deploy_model(ws,
                        'rservice-local',
                        list(model),
                        inference_config,
                        local_deployment_config)
# Wait for deployment
wait_for_deployment(service, show_output = TRUE)
# Show the port of local service
message(service$port)
# If you encounter any issue in deploying the webservice, please visit
# https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment
# Inferencing
# versicolor
# plant <- data.frame(Sepal.Length = 6.4,
# Sepal.Width = 2.8,
# Petal.Length = 4.6,
# Petal.Width = 1.8)
# setosa
plant <- data.frame(Sepal.Length = 5.1,
Sepal.Width = 3.5,
Petal.Length = 1.4,
Petal.Width = 0.2)
# # virginica
# plant <- data.frame(Sepal.Length = 6.7,
# Sepal.Width = 3.3,
# Petal.Length = 5.2,
# Petal.Width = 2.3)
#Test the web service
invoke_webservice(service, toJSON(plant))
## The last few lines of the logs should have the correct prediction and should display -> R[write to console]: "setosa"
cat(gsub(pattern = "\n", replacement = " \n", x = get_webservice_logs(service)))
## Test the web service with a raw HTTP request
#
# NOTE:
# To test the service locally, use the http://localhost:<service$port> URL
# Import the request library
library(httr)
# Get the scoring URL from the service object; this URL is for local testing only
local_service_url <- service$scoring_uri  # Same as http://localhost:<service$port>
# POST request to the web service
resp <- POST(local_service_url, body = plant, encode = "json", verbose())
## The last few lines of the logs should have the correct prediction and should display -> R[write to console]: "setosa"
cat(gsub(pattern = "\n", replacement = " \n", x = get_webservice_logs(service)))
# Optional, use a new scoring script
inference_config <- inference_config(
  entry_script = "score_new.R",
  source_directory = "project_files",
  environment = r_env)
## Then reload the service to see the changes made
reload_local_webservice_assets(service)
## Check the reloaded service; the last line of the logs should say "this is a new scoring script! I was reloaded"
invoke_webservice(service, toJSON(plant))
cat(gsub(pattern = "\n", replacement = " \n", x = get_webservice_logs(service)))
# Update service
# If you want to change your model(s), environment, or deployment configuration, call update_local_webservice() to rebuild the Docker image.
# update_local_webservice(service, models = list(new_model_object), deployment_config = deployment_config, wait = FALSE, inference_config = inference_config)
# Delete service
delete_local_webservice(service)


@@ -0,0 +1,18 @@
# Copyright(c) Microsoft Corporation.
# Licensed under the MIT license.
library(jsonlite)
init <- function() {
  model_path <- Sys.getenv("AZUREML_MODEL_DIR")
  model <- readRDS(file.path(model_path, "model.rds"))
  message("model is loaded")

  function(data) {
    plant <- as.data.frame(fromJSON(data))
    prediction <- predict(model, plant)
    result <- as.character(prediction)
    message(result)
    toJSON(result)
  }
}


@@ -0,0 +1,19 @@
# Copyright(c) Microsoft Corporation.
# Licensed under the MIT license.
library(jsonlite)
init <- function() {
  model_path <- Sys.getenv("AZUREML_MODEL_DIR")
  model <- readRDS(file.path(model_path, "model.rds"))
  message("model is loaded")

  function(data) {
    plant <- as.data.frame(fromJSON(data))
    prediction <- predict(model, plant)
    result <- as.character(prediction)
    message(result)
    message("this is a new scoring script! I was reloaded")
    toJSON(result)
  }
}


@@ -0,0 +1,34 @@
# This script loads a dataset whose last column is the class label,
# trains a classifier, and logs the accuracy
library(azuremlsdk)
library(caret)
library(optparse)
library(datasets)
data(iris)
iris_data <- iris  # data() loads the dataset by name; assign it explicitly
summary(iris_data)
in_train <- createDataPartition(y = iris_data$Species, p = .8, list = FALSE)
train_data <- iris_data[in_train, ]
test_data <- iris_data[-in_train, ]
# Run algorithms using 10-fold cross validation
control <- trainControl(method = "cv", number = 10)
metric <- "Accuracy"
set.seed(7)
model <- train(Species ~ .,
               data = train_data,
               method = "lda",
               metric = metric,
               trControl = control)
predictions <- predict(model, test_data)
conf_matrix <- confusionMatrix(predictions, test_data$Species)
message(conf_matrix)
log_metric_to_run(metric, conf_matrix$overall["Accuracy"])
# ./outputs is uploaded to the run record; create it in case it does not exist locally
if (!dir.exists("outputs")) dir.create("outputs")
saveRDS(model, file = "./outputs/model.rds")
message("Model saved")


@@ -0,0 +1,41 @@
# Copyright(c) Microsoft Corporation.
# Licensed under the MIT license.
# Reminder: set working directory to current file location prior to running this script
library(azuremlsdk)
ws <- load_workspace_from_config()
# Create AmlCompute cluster
cluster_name <- "r-cluster"
compute_target <- get_compute(ws, cluster_name = cluster_name)
if (is.null(compute_target)) {
  vm_size <- "STANDARD_D2_V2"
  compute_target <- create_aml_compute(workspace = ws,
                                       cluster_name = cluster_name,
                                       vm_size = vm_size,
                                       max_nodes = 1)
  wait_for_provisioning_completion(compute_target, show_output = TRUE)
}
# Define estimator
est <- estimator(source_directory = "scripts",
                 entry_script = "train.R",
                 compute_target = compute_target)
experiment_name <- "train-r-script-on-amlcompute"
exp <- experiment(ws, experiment_name)
# Submit job and display the run details
run <- submit_experiment(exp, est)
view_run_details(run)
wait_for_run_completion(run, show_output = TRUE)
# Get the run metrics
metrics <- get_run_metrics(run)
metrics
# Delete cluster
delete_compute(compute_target)


@@ -0,0 +1,28 @@
# This script loads a dataset whose last column is the class label,
# trains a classifier, and logs the accuracy
library(azuremlsdk)
library(caret)
library(datasets)
data(iris)
iris_data <- iris  # data() loads the dataset by name; assign it explicitly
summary(iris_data)
in_train <- createDataPartition(y = iris_data$Species, p = .8, list = FALSE)
train_data <- iris_data[in_train, ]
test_data <- iris_data[-in_train, ]
# Run algorithms using 10-fold cross validation
control <- trainControl(method = "cv", number = 10)
metric <- "Accuracy"
set.seed(7)
model <- train(Species ~ .,
               data = train_data,
               method = "lda",
               metric = metric,
               trControl = control)
predictions <- predict(model, test_data)
conf_matrix <- confusionMatrix(predictions, test_data$Species)
message(conf_matrix)
log_metric_to_run(metric, conf_matrix$overall["Accuracy"])


@@ -0,0 +1,26 @@
# Copyright(c) Microsoft Corporation.
# Licensed under the MIT license.
# Reminder: set working directory to current file location prior to running this script
library(azuremlsdk)
ws <- load_workspace_from_config()
# Define estimator
est <- estimator(source_directory = "scripts",
                 entry_script = "train.R",
                 compute_target = "local")
# Initialize experiment
experiment_name <- "train-r-script-on-local"
exp <- experiment(ws, experiment_name)
# Submit job and display the run details
run <- submit_experiment(exp, est)
view_run_details(run)
wait_for_run_completion(run, show_output = TRUE)
# Get the run metrics
metrics <- get_run_metrics(run)
metrics


@@ -0,0 +1,17 @@
## Azure Machine Learning vignettes
These vignettes are end-to-end tutorials for using Azure Machine Learning SDK for R.
Before running a vignette in RStudio, set the working directory to the folder that contains the vignette file (.Rmd), either with `setwd(dirname)` or via Session -> Set Working Directory -> To Source File Location. Each vignette assumes that the data and scripts are in the current working directory.
The following vignettes are included:
1. [installation](installation.Rmd): Install the Azure ML SDK for R.
2. [configuration](configuration.Rmd): Set up an Azure ML workspace.
3. [train-and-deploy-to-aci](train-and-deploy-to-aci): Train a caret model and deploy as a web service to Azure Container Instances (ACI).
4. [train-with-tensorflow](train-with-tensorflow/): Train a deep learning TensorFlow model with Azure ML.
5. [hyperparameter-tune-with-keras](hyperparameter-tune-with-keras/): Hyperparameter tune a Keras model using HyperDrive, Azure ML's hyperparameter tuning functionality.
6. [deploy-to-aks](deploy-to-aks/): Deploy a model as a web service to Azure Kubernetes Service (AKS) for production workloads.
> Before you run these samples, make sure you have an Azure Machine Learning workspace. You can follow the [configuration vignette](../vignettes/configuration.Rmd) to set up a workspace. (You do not need to do this if you are running these examples on an Azure Machine Learning compute instance).
For additional examples on using the R SDK, see the [samples](../samples) folder.


@@ -0,0 +1,108 @@
---
title: "Set up an Azure ML workspace"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Set up an Azure ML workspace}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
This tutorial gets you started with the Azure Machine Learning service by walking through the requirements and instructions for setting up a workspace, the top-level resource for Azure ML.
You do not need to run this if you are working on an Azure Machine Learning compute instance, as the compute instance is already associated with an existing workspace.
## What is an Azure ML workspace?
The workspace is the top-level resource for Azure ML, providing a centralized place to work with all the artifacts you create when you use Azure ML. The workspace keeps a history of all training runs, including logs, metrics, output, and a snapshot of your scripts.
When you create a new workspace, it automatically creates several Azure resources that are used by the workspace:
* Azure Container Registry: Registers Docker containers that you use during training and when you deploy a model. To minimize costs, ACR is lazy-loaded until deployment images are created.
* Azure Storage account: Used as the default datastore for the workspace.
* Azure Application Insights: Stores monitoring information about your models.
* Azure Key Vault: Stores secrets that are used by compute targets and other sensitive information that's needed by the workspace.
## Setup
This section describes the steps required before you can access any Azure ML service functionality.
### Azure subscription
In order to create an Azure ML workspace, first you need access to an Azure subscription. An Azure subscription allows you to manage storage, compute, and other assets in the Azure cloud. You can [create a new subscription](https://azure.microsoft.com/en-us/free/) or access existing subscription information from the [Azure portal](https://portal.azure.com/). Later in this tutorial you will need information such as your subscription ID in order to create and access workspaces.
### Azure ML SDK installation
Follow the [installation guide](https://azure.github.io/azureml-sdk-for-r/articles/installation.html) to install **azuremlsdk** on your machine.
## Configure your workspace
### Workspace parameters
To use an Azure ML workspace, you will need to supply the following information:
* Your subscription ID
* A resource group name
* (Optional) The region that will host your workspace
* A name for your workspace
You can get your subscription ID from the [Azure portal](https://portal.azure.com/).
You will also need access to a [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups), which organizes Azure resources and provides a default region for the resources in a group. You can see which resource groups you have access to, or create a new one, in the Azure portal. If you don't have a resource group, the `create_workspace()` method will create one for you using the name you provide.
The region to host your workspace will be used if you are creating a new workspace. You do not need to specify this if you are using an existing workspace. You can find the list of supported regions [here](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=machine-learning-service). You should pick a region that is close to your location or that contains your data.
The name for your workspace is unique within the subscription and should be descriptive enough to discern among other workspaces. The subscription may be used only by you, or it may be used by your department or your entire enterprise, so choose a name that makes sense for your situation.
The following code chunk allows you to specify your workspace parameters. It uses `Sys.getenv` to read values from environment variables, which is useful for automation. If no environment variable exists, the parameters will be set to the specified default values. Replace the default values in the code below with your default parameter values.
``` {r configure_parameters, eval=FALSE}
subscription_id <- Sys.getenv("SUBSCRIPTION_ID", unset = "<my-subscription-id>")
resource_group <- Sys.getenv("RESOURCE_GROUP", unset = "<my-resource-group>")
workspace_name <- Sys.getenv("WORKSPACE_NAME", unset = "<my-workspace-name>")
workspace_region <- Sys.getenv("WORKSPACE_REGION", unset = "eastus2")
```
### Create a new workspace
If you don't have an existing workspace and are the owner of the subscription or resource group, you can create a new workspace. If you don't have a resource group, `create_workspace()` will create one for you using the name you provide. If you don't want it to do so, set the `create_resource_group = FALSE` parameter.
Note: As with other Azure services, there are limits on certain resources (e.g. AmlCompute quota) associated with the Azure ML service. Please read this [article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.
This code chunk will create an Azure ML workspace for you in a subscription, provided you have the correct permissions.
This will fail if:
* You do not have permission to create a workspace in the resource group.
* You do not have permission to create a resource group if it does not exist.
* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription.
If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources.
There are additional parameters that are not shown below that can be configured when creating a workspace. Please see [`create_workspace()`](https://azure.github.io/azureml-sdk-for-r/reference/create_workspace.html) for more details.
``` {r create_workspace, eval=FALSE}
library(azuremlsdk)
ws <- create_workspace(name = workspace_name,
                       subscription_id = subscription_id,
                       resource_group = resource_group,
                       location = workspace_region,
                       exist_ok = TRUE)
```
You can write out the workspace ARM properties to a config file with [`write_workspace_config()`](https://azure.github.io/azureml-sdk-for-r/reference/write_workspace_config.html). The method provides a simple way of reusing the same workspace across multiple files or projects. Users can save the workspace details with `write_workspace_config()`, and use [`load_workspace_from_config()`](https://azure.github.io/azureml-sdk-for-r/reference/load_workspace_from_config.html) to load the same workspace in different files or projects without retyping the workspace ARM properties. The method defaults to writing out the config file to the current working directory with "config.json" as the file name. To specify a different path or file name, set the `path` and `file_name` parameters.
``` {r write_config, eval=FALSE}
write_workspace_config(ws)
```
### Access an existing workspace
You can access an existing workspace in a couple of ways. If your workspace properties were previously saved to a config file, you can load the workspace as follows:
``` {r load_config, eval=FALSE}
ws <- load_workspace_from_config()
```
If Azure ML cannot find the config file, specify the path to the config file with the `path` parameter. The method defaults to starting the search in the current directory.
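If the file is not in the current directory, point the `path` parameter at the directory that contains it (the path below is illustrative):
``` {r load_config_path, eval=FALSE}
ws <- load_workspace_from_config(path = "./my_project")
```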
You can also initialize a workspace using the [`get_workspace()`](https://azure.github.io/azureml-sdk-for-r/reference/get_workspace.html) method.
``` {r get_workspace, eval=FALSE}
ws <- get_workspace(name = workspace_name,
                    subscription_id = subscription_id,
                    resource_group = resource_group)
```


@@ -0,0 +1,188 @@
---
title: "Deploy a web service to Azure Kubernetes Service"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Deploy a web service to Azure Kubernetes Service}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
This tutorial demonstrates how to deploy a model as a web service on [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service/) (AKS). AKS is good for high-scale production deployments; use it if you need one or more of the following capabilities:
* Fast response time
* Autoscaling of the deployed service
* Hardware acceleration options such as GPU
You will learn to:
* Set up your testing environment
* Register a model
* Provision an AKS cluster
* Deploy the model to AKS
* Test the deployed service
## Prerequisites
If you don't have access to an Azure ML workspace, follow the [setup tutorial](https://azure.github.io/azureml-sdk-for-r/articles/configuration.html) to configure and create a workspace.
## Set up your testing environment
Start by setting up your environment. This includes importing the **azuremlsdk** package and connecting to your workspace.
### Import package
```{r import_package, eval=FALSE}
library(azuremlsdk)
```
### Load your workspace
Instantiate a workspace object from your existing workspace. The following code will load the workspace details from a **config.json** file if you previously wrote one out with `write_workspace_config()`.
```{r load_workspace, eval=FALSE}
ws <- load_workspace_from_config()
```
Or, you can retrieve a workspace by directly specifying your workspace details:
```{r get_workspace, eval=FALSE}
ws <- get_workspace("<your workspace name>", "<your subscription ID>", "<your resource group>")
```
## Register the model
In this tutorial we will deploy a model that was trained in one of the [samples](https://github.com/Azure/azureml-sdk-for-r/blob/master/samples/training/train-on-amlcompute/train-on-amlcompute.R). The model was trained with the Iris dataset and can be used to determine if a flower is one of three Iris flower species (setosa, versicolor, virginica). We have provided the model file (`model.rds`) for the tutorial; it is located in the "project_files" directory of this vignette.
First, register the model to your workspace with [`register_model()`](https://azure.github.io/azureml-sdk-for-r/reference/register_model.html). A registered model can be any collection of files, but in this case the R model file is sufficient. Azure ML will use the registered model for deployment.
```{r register_model, eval=FALSE}
model <- register_model(ws,
                        model_path = "project_files/model.rds",
                        model_name = "iris_model",
                        description = "Predict an Iris flower type")
```
## Provision an AKS cluster
When deploying a web service to AKS, you deploy to an AKS cluster that is connected to your workspace. There are two ways to connect an AKS cluster to your workspace:
* Create the AKS cluster. The process automatically connects the cluster to the workspace.
* Attach an existing AKS cluster to your workspace. You can attach a cluster with the [`attach_aks_compute()`](https://azure.github.io/azureml-sdk-for-r/reference/attach_aks_compute.html) method.
Creating or attaching an AKS cluster is a one-time process for your workspace. You can reuse this cluster for multiple deployments. If you delete the cluster or the resource group that contains it, you must create a new cluster the next time you need to deploy.
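If you attach an existing cluster instead, a minimal sketch looks like the following (the resource group and cluster names are placeholders):
``` {r attach_cluster, eval=FALSE}
aks_target <- attach_aks_compute(ws,
                                 resource_group = "<your resource group>",
                                 cluster_name = "<your existing AKS cluster>")
```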
In this tutorial, we will go with the first method of provisioning a new cluster. See the [`create_aks_compute()`](https://azure.github.io/azureml-sdk-for-r/reference/create_aks_compute.html) reference for the full set of configurable parameters. If you pick custom values for `agent_count` and `vm_size`, make sure the number of agents multiplied by the number of virtual CPUs in each VM size is greater than or equal to 12.
``` {r provision_cluster, eval=FALSE}
aks_target <- create_aks_compute(ws, cluster_name = 'myakscluster')
wait_for_provisioning_completion(aks_target, show_output = TRUE)
```
The Azure ML SDK does not provide support for scaling an AKS cluster. To scale the nodes in the cluster, use the UI for your AKS cluster in the Azure portal. You can only change the node count, not the VM size of the cluster.
## Deploy as a web service
### Define the inference dependencies
To deploy a model, you need an **inference configuration**, which describes the environment needed to host the model and web service. To create an inference config, you will first need a scoring script and an Azure ML environment.
The scoring script (`entry_script`) is an R script that will take as input variable values (in JSON format) and output a prediction from your model. For this tutorial, use the provided scoring file `score.R`. The scoring script must contain an `init()` method that loads your model and returns a function that uses the model to make a prediction based on the input data. See the [documentation](https://azure.github.io/azureml-sdk-for-r/reference/inference_config.html#details) for more details.
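The provided `score.R` for this tutorial boils down to the following shape:
```{r score_skeleton, eval=FALSE}
init <- function() {
  # Load the registered model from the directory Azure ML mounts it into
  model <- readRDS(file.path(Sys.getenv("AZUREML_MODEL_DIR"), "model.rds"))
  # Return the function that will be called on each scoring request
  function(data) {
    plant <- as.data.frame(jsonlite::fromJSON(data))
    jsonlite::toJSON(as.character(predict(model, plant)))
  }
}
```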
Next, define an Azure ML **environment** for your script's package dependencies. With an environment, you specify R packages (from CRAN or elsewhere) that are needed for your script to run. You can also provide the values of environment variables that your script can reference to modify its behavior.
By default, Azure ML will build a default Docker image that includes R, the Azure ML SDK, and additional required dependencies for deployment. See the [`r_environment()`](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html) reference for the full list of dependencies that will be installed in the default container. You can also specify additional packages to be installed at runtime, or even a custom Docker image to be used instead of the base image that will be built, using the other available parameters to `r_environment()`.
```{r create_env, eval=FALSE}
r_env <- r_environment(name = "deploy_env")
```
Now you have everything you need to create an inference config for encapsulating your scoring script and environment dependencies.
``` {r create_inference_config, eval=FALSE}
inference_config <- inference_config(
  entry_script = "score.R",
  source_directory = "project_files",
  environment = r_env)
```
### Deploy to AKS
Now, define the deployment configuration that describes the compute resources needed, for example, the number of cores and memory. See the [`aks_webservice_deployment_config()`](https://azure.github.io/azureml-sdk-for-r/reference/aks_webservice_deployment_config.html) reference for the full set of configurable parameters.
``` {r deploy_config, eval=FALSE}
aks_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)
```
Now, deploy your model as a web service to the AKS cluster you created earlier.
```{r deploy_service, eval=FALSE}
aks_service <- deploy_model(ws,
                            'my-new-aksservice',
                            models = list(model),
                            inference_config = inference_config,
                            deployment_config = aks_config,
                            deployment_target = aks_target)
wait_for_deployment(aks_service, show_output = TRUE)
```
To inspect the logs from the deployment:
```{r get_logs, eval=FALSE}
get_webservice_logs(aks_service)
```
If you encounter any issue in deploying the web service, please visit the [troubleshooting guide](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment).
## Test the deployed service
Now that your model is deployed as a service, you can test the service from R using [`invoke_webservice()`](https://azure.github.io/azureml-sdk-for-r/reference/invoke_webservice.html). Provide a new set of data to predict from, convert it to JSON, and send it to the service.
``` {r test_service, eval=FALSE}
library(jsonlite)
# versicolor
plant <- data.frame(Sepal.Length = 6.4,
                    Sepal.Width = 2.8,
                    Petal.Length = 4.6,
                    Petal.Width = 1.8)
# setosa
# plant <- data.frame(Sepal.Length = 5.1,
#                     Sepal.Width = 3.5,
#                     Petal.Length = 1.4,
#                     Petal.Width = 0.2)
# virginica
# plant <- data.frame(Sepal.Length = 6.7,
#                     Sepal.Width = 3.3,
#                     Petal.Length = 5.2,
#                     Petal.Width = 2.3)
predicted_val <- invoke_webservice(aks_service, toJSON(plant))
message(predicted_val)
```
You can also get the web service's HTTP endpoint, which accepts REST client calls. You can share this endpoint with anyone who wants to test the web service or integrate it into an application.
``` {r eval=FALSE}
aks_service$scoring_uri
```
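For example, a raw REST call from R might look like the following (a sketch assuming key-based authentication; treating the first element returned by `get_webservice_keys()` as the primary key is an assumption):
``` {r call_endpoint, eval=FALSE}
library(httr)
library(jsonlite)
key <- get_webservice_keys(aks_service)[[1]]  # assumed: primary key first
resp <- POST(aks_service$scoring_uri,
             add_headers(Authorization = paste("Bearer", key),
                         `Content-Type` = "application/json"),
             body = toJSON(plant))
content(resp, as = "text")
```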
## Web service authentication
When deploying to AKS, key-based authentication is enabled by default. You can also enable token-based authentication. Token-based authentication requires clients to use an Azure Active Directory account to request an authentication token, which is used to make requests to the deployed service.
To disable key-based auth, set the `auth_enabled = FALSE` parameter when creating the deployment configuration with [`aks_webservice_deployment_config()`](https://azure.github.io/azureml-sdk-for-r/reference/aks_webservice_deployment_config.html).
To enable token-based auth, set `token_auth_enabled = TRUE` when creating the deployment config.
### Key-based authentication
If key authentication is enabled, you can use the [`get_webservice_keys()`](https://azure.github.io/azureml-sdk-for-r/reference/get_webservice_keys.html) method to retrieve a primary and secondary authentication key. To generate a new key, use [`generate_new_webservice_key()`](https://azure.github.io/azureml-sdk-for-r/reference/generate_new_webservice_key.html).
### Token-based authentication
If token authentication is enabled, you can use the [`get_webservice_token()`](https://azure.github.io/azureml-sdk-for-r/reference/get_webservice_token.html) method to retrieve a JWT token and that token's expiration time. Make sure to request a new token after the token's expiration time.
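A minimal sketch of retrieving a token (this requires `token_auth_enabled = TRUE` in the deployment config):
``` {r get_token, eval=FALSE}
token_info <- get_webservice_token(aks_service)
```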
## Clean up resources
Delete the resources once you no longer need them. Do not delete any resource you plan on still using.
Delete the web service:
```{r delete_service, eval=FALSE}
delete_webservice(aks_service)
```
Delete the registered model:
```{r delete_model, eval=FALSE}
delete_model(model)
```
Delete the AKS cluster:
```{r delete_cluster, eval=FALSE}
delete_compute(aks_target)
```


@@ -0,0 +1,17 @@
#' Copyright(c) Microsoft Corporation.
#' Licensed under the MIT license.
library(jsonlite)
init <- function() {
  model_path <- Sys.getenv("AZUREML_MODEL_DIR")
  model <- readRDS(file.path(model_path, "model.rds"))
  message("model is loaded")

  function(data) {
    plant <- as.data.frame(fromJSON(data))
    prediction <- predict(model, plant)
    result <- as.character(prediction)
    toJSON(result)
  }
}


@@ -0,0 +1,242 @@
---
title: "Hyperparameter tune a Keras model"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Hyperparameter tune a Keras model}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
This tutorial demonstrates how you can efficiently tune hyperparameters for a model using HyperDrive, Azure ML's hyperparameter tuning functionality. You will train a Keras model on the CIFAR10 dataset, automate hyperparameter exploration, launch parallel jobs, log your results, and find the best run.
### What are hyperparameters?
Hyperparameters are variable parameters chosen to train a model. Learning rate, number of epochs, and batch size are all examples of hyperparameters.
Using brute-force methods to find the optimal values for parameters can be time-consuming, and poor-performing runs can result in wasted money. To avoid this, HyperDrive automates hyperparameter exploration in a time-saving and cost-effective manner by launching several parallel runs with different configurations and finding the configuration that results in best performance on your primary metric.
Let's get started with the example to see how it works!
## Prerequisites
If you don't have access to an Azure ML workspace, follow the [setup tutorial](https://azure.github.io/azureml-sdk-for-r/articles/configuration.html) to configure and create a workspace.
## Set up development environment
The setup for your development work in this tutorial includes the following actions:
* Import required packages
* Connect to a workspace
* Create an experiment to track your runs
* Create a remote compute target to use for training
### Import **azuremlsdk** package
```{r eval=FALSE}
library(azuremlsdk)
```
### Load your workspace
Instantiate a workspace object from your existing workspace. The following code will load the workspace details from a **config.json** file if you previously wrote one out with [`write_workspace_config()`](https://azure.github.io/azureml-sdk-for-r/reference/write_workspace_config.html).
```{r load_workpace, eval=FALSE}
ws <- load_workspace_from_config()
```
Or, you can retrieve a workspace by directly specifying your workspace details:
```{r get_workpace, eval=FALSE}
ws <- get_workspace("<your workspace name>", "<your subscription ID>", "<your resource group>")
```
### Create an experiment
An Azure ML **experiment** tracks a grouping of runs, typically from the same training script. Create an experiment to track hyperparameter tuning runs for the Keras model.
```{r create_experiment, eval=FALSE}
exp <- experiment(workspace = ws, name = 'hyperdrive-cifar10')
```
If you would like to track your runs in an existing experiment, simply specify that experiment's name to the `name` parameter of `experiment()`.
### Create a compute target
By using Azure Machine Learning Compute (AmlCompute), a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. In this tutorial, you create a GPU-enabled cluster as your training environment. The code below creates the compute cluster for you if it doesn't already exist in your workspace.
You may need to wait a few minutes for your compute cluster to be provisioned if it doesn't already exist.
```{r create_cluster, eval=FALSE}
cluster_name <- "gpucluster"
compute_target <- get_compute(ws, cluster_name = cluster_name)
if (is.null(compute_target)) {
  vm_size <- "STANDARD_NC6"
  compute_target <- create_aml_compute(workspace = ws,
                                       cluster_name = cluster_name,
                                       vm_size = vm_size,
                                       max_nodes = 4)
  wait_for_provisioning_completion(compute_target, show_output = TRUE)
}
```
## Prepare the training script
A training script called `cifar10_cnn.R` has been provided for you in the "project_files" directory of this tutorial.
In order to leverage HyperDrive, the training script for your model must log the relevant metrics during model training. When you configure the hyperparameter tuning run, you specify the primary metric to use for evaluating run performance. You must log this metric so it is available to the hyperparameter tuning process.
In order to log the required metrics, you need to do the following **inside the training script**:
* Import the **azuremlsdk** package
```
library(azuremlsdk)
```
* Take the hyperparameters as command-line arguments to the script. This is necessary so that when HyperDrive carries out the hyperparameter sweep, it can run the training script with different values for the hyperparameters, as defined by the search space. The provided `cifar10_cnn.R` reads them positionally, for example:
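```
args <- commandArgs(trailingOnly = TRUE)
batch_size <- as.numeric(args[2])  # value following the "--batch_size" flag
```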
* Use the [`log_metric_to_run()`](https://azure.github.io/azureml-sdk-for-r/reference/log_metric_to_run.html) function to log the hyperparameters and the primary metric.
```
log_metric_to_run("batch_size", batch_size)
...
log_metric_to_run("epochs", epochs)
...
log_metric_to_run("lr", lr)
...
log_metric_to_run("decay", decay)
...
log_metric_to_run("Loss", results[[1]])
```
## Create an estimator
An Azure ML **estimator** encapsulates the run configuration information needed for executing a training script on the compute target. Azure ML runs are run as containerized jobs on the specified compute target. By default, the Docker image built for your training job will include R, the Azure ML SDK, and a set of commonly used R packages. See the full list of default packages included [here](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html). The estimator is used to define the configuration for each of the child runs that the parent HyperDrive run will kick off.
To create the estimator, define the following:
* The directory that contains your scripts needed for training (`source_directory`). All the files in this directory are uploaded to the cluster node(s) for execution. The directory must contain your training script and any additional scripts required.
* The training script that will be executed (`entry_script`).
* The compute target (`compute_target`), in this case the AmlCompute cluster you created earlier.
* Any environment dependencies required for training. Since the training script requires the Keras package, which is not included in the image by default, pass the package name to the `cran_packages` parameter to have it installed in the Docker container where the job will run. See the [`estimator()`](https://azure.github.io/azureml-sdk-for-r/reference/estimator.html) reference for the full set of configurable options.
* Set the `use_gpu = TRUE` flag so the default base GPU Docker image will be built, since the job will be run on a GPU cluster.
```{r create_estimator, eval=FALSE}
est <- estimator(source_directory = "project_files",
                 entry_script = "cifar10_cnn.R",
                 compute_target = compute_target,
                 cran_packages = c("keras"),
                 use_gpu = TRUE)
```
## Configure the HyperDrive run
To kick off hyperparameter tuning in Azure ML, you will need to configure a HyperDrive run, which will in turn launch individual child runs of the training script with the corresponding hyperparameter values.
### Define search space
In this experiment, we will use four hyperparameters: batch size, number of epochs, learning rate, and decay. In order to begin tuning, we must define the range of values we would like to explore and how they will be distributed. This is called a parameter space definition and can be created with discrete or continuous ranges.
__Discrete hyperparameters__ are specified as a choice among discrete values represented as a list.
Advanced discrete hyperparameters can also be specified using a distribution. The following distributions are supported:
* `quniform(low, high, q)`
* `qloguniform(low, high, q)`
* `qnormal(mu, sigma, q)`
* `qlognormal(mu, sigma, q)`
__Continuous hyperparameters__ are specified as a distribution over a continuous range of values. The following distributions are supported:
* `uniform(low, high)`
* `loguniform(low, high)`
* `normal(mu, sigma)`
* `lognormal(mu, sigma)`
Here, we will use the [`random_parameter_sampling()`](https://azure.github.io/azureml-sdk-for-r/reference/random_parameter_sampling.html) function to define the search space for each hyperparameter. `batch_size` and `epochs` will be chosen from discrete sets while `lr` and `decay` will be drawn from continuous distributions.
Other available sampling function options are:
* [`grid_parameter_sampling()`](https://azure.github.io/azureml-sdk-for-r/reference/grid_parameter_sampling.html)
* [`bayesian_parameter_sampling()`](https://azure.github.io/azureml-sdk-for-r/reference/bayesian_parameter_sampling.html)
```{r search_space, eval=FALSE}
sampling <- random_parameter_sampling(list(batch_size = choice(c(16, 32, 64)),
                                           epochs = choice(c(200, 350, 500)),
                                           lr = normal(0.0001, 0.005),
                                           decay = uniform(1e-6, 3e-6)))
```
### Define termination policy
To prevent resource waste, Azure ML can detect and terminate poorly performing runs. HyperDrive will do this automatically if you specify an early termination policy.
Here, you will use the [`bandit_policy()`](https://azure.github.io/azureml-sdk-for-r/reference/bandit_policy.html), which terminates any runs where the primary metric is not within the specified slack factor with respect to the best performing training run.
```{r termination_policy, eval=FALSE}
policy <- bandit_policy(slack_factor = 0.15)
```
Other termination policy options are:
* [`median_stopping_policy()`](https://azure.github.io/azureml-sdk-for-r/reference/median_stopping_policy.html)
* [`truncation_selection_policy()`](https://azure.github.io/azureml-sdk-for-r/reference/truncation_selection_policy.html)
If no policy is provided, all runs will continue to completion regardless of performance.
### Finalize configuration
Now, you can create a `HyperDriveConfig` object to define your HyperDrive run. Along with the sampling and policy definitions, you need to specify the name of the primary metric that you want to track and whether we want to maximize it or minimize it. The `primary_metric_name` must correspond with the name of the primary metric you logged in your training script. `max_total_runs` specifies the total number of child runs to launch. See the [hyperdrive_config()](https://azure.github.io/azureml-sdk-for-r/reference/hyperdrive_config.html) reference for the full set of configurable parameters.
```{r create_config, eval=FALSE}
hyperdrive_config <- hyperdrive_config(hyperparameter_sampling = sampling,
                                       primary_metric_goal = primary_metric_goal("MINIMIZE"),
                                       primary_metric_name = "Loss",
                                       max_total_runs = 4,
                                       policy = policy,
                                       estimator = est)
```
## Submit the HyperDrive run
Finally, submit the experiment to run on your cluster. The parent HyperDrive run will launch the individual child runs. `submit_experiment()` will return a `HyperDriveRun` object that you will use to interface with the run. In this tutorial, since the cluster we created scales to a maximum of `4` nodes, all 4 child runs will be launched in parallel.
```{r submit_run, eval=FALSE}
hyperdrive_run <- submit_experiment(exp, hyperdrive_config)
```
You can view the HyperDrive run's details as a table. Clicking the "Web View" link provided will bring you to Azure Machine Learning studio, where you can monitor the run in the UI.
```{r eval=FALSE}
view_run_details(hyperdrive_run)
```
Wait until hyperparameter tuning is complete before you run more code.
```{r eval=FALSE}
wait_for_run_completion(hyperdrive_run, show_output = TRUE)
```
## Analyze runs by performance
Finally, you can view and compare the metrics collected during all of the child runs!
```{r analyse_runs, eval=FALSE}
# Get the metrics of all the child runs
child_run_metrics <- get_child_run_metrics(hyperdrive_run)
child_run_metrics
# Get the child run objects sorted in descending order by the best primary metric
child_runs <- get_child_runs_sorted_by_primary_metric(hyperdrive_run)
child_runs
# Directly get the run object of the best performing run
best_run <- get_best_run_by_primary_metric(hyperdrive_run)
# Get the metrics of the best performing run
metrics <- get_run_metrics(best_run)
metrics
```
The `metrics` variable will include the values of the hyperparameters that resulted in the best performing run.
## Clean up resources
Delete the resources once you no longer need them. Don't delete any resource you plan to still use.
Delete the compute cluster:
```{r delete_compute, eval=FALSE}
delete_compute(compute_target)
```


@@ -0,0 +1,124 @@
#' Modified from:
#' https://github.com/rstudio/keras/blob/master/vignettes/examples/cifar10_cnn.R
#'
#' Train a simple deep CNN on the CIFAR10 small images dataset.
#'
#' It gets down to 0.65 test logloss in 25 epochs, and down to 0.55 after 50
#' epochs, though it is still underfitting at that point.
library(keras)
install_keras()
library(azuremlsdk)
# Parameters --------------------------------------------------------------
# Hyperparameters arrive as "--name value" command-line pairs, so the values
# sit at the even positions of the argument vector.
args <- commandArgs(trailingOnly = TRUE)
batch_size <- as.numeric(args[2])
log_metric_to_run("batch_size", batch_size)
epochs <- as.numeric(args[4])
log_metric_to_run("epochs", epochs)
lr <- as.numeric(args[6])
log_metric_to_run("lr", lr)
decay <- as.numeric(args[8])
log_metric_to_run("decay", decay)
data_augmentation <- TRUE
# Data Preparation --------------------------------------------------------
# See ?dataset_cifar10 for more info
cifar10 <- dataset_cifar10()
# Feature scale RGB values in test and train inputs
x_train <- cifar10$train$x / 255
x_test <- cifar10$test$x / 255
y_train <- to_categorical(cifar10$train$y, num_classes = 10)
y_test <- to_categorical(cifar10$test$y, num_classes = 10)
# Defining Model ----------------------------------------------------------
# Initialize sequential model
model <- keras_model_sequential()
model %>%
  # Start with hidden 2D convolutional layer being fed 32x32 pixel images
  layer_conv_2d(
    filter = 32, kernel_size = c(3, 3), padding = "same",
    input_shape = c(32, 32, 3)
  ) %>%
  layer_activation("relu") %>%
  # Second hidden layer
  layer_conv_2d(filter = 32, kernel_size = c(3, 3)) %>%
  layer_activation("relu") %>%
  # Use max pooling
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(0.25) %>%
  # 2 additional hidden 2D convolutional layers
  layer_conv_2d(filter = 32, kernel_size = c(3, 3), padding = "same") %>%
  layer_activation("relu") %>%
  layer_conv_2d(filter = 32, kernel_size = c(3, 3)) %>%
  layer_activation("relu") %>%
  # Use max pooling once more
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(0.25) %>%
  # Flatten max filtered output into feature vector
  # and feed into dense layer
  layer_flatten() %>%
  layer_dense(512) %>%
  layer_activation("relu") %>%
  layer_dropout(0.5) %>%
  # Outputs from dense layer are projected onto 10 unit output layer
  layer_dense(10) %>%
  layer_activation("softmax")
# Name the arguments: the second positional argument of optimizer_rmsprop() is rho, not decay
opt <- optimizer_rmsprop(lr = lr, decay = decay)
model %>%
  compile(loss = "categorical_crossentropy",
          optimizer = opt,
          metrics = "accuracy")
# Training ----------------------------------------------------------------
if (!data_augmentation) {
  model %>%
    fit(x_train,
        y_train,
        batch_size = batch_size,
        epochs = epochs,
        validation_data = list(x_test, y_test),
        shuffle = TRUE)
} else {
  datagen <- image_data_generator(rotation_range = 20,
                                  width_shift_range = 0.2,
                                  height_shift_range = 0.2,
                                  horizontal_flip = TRUE)
  datagen %>% fit_image_data_generator(x_train)
  results <- evaluate(model, x_train, y_train, batch_size)
  log_metric_to_run("Loss", results[[1]])
  cat("Loss: ", results[[1]], "\n")
  cat("Accuracy: ", results[[2]], "\n")
}


@@ -0,0 +1,100 @@
---
title: "Install the Azure ML SDK for R"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Install the Azure ML SDK for R}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
This article covers the step-by-step instructions for installing the Azure ML SDK for R.
You do not need to run this if you are working on an Azure Machine Learning compute instance, as the compute instance already has the Azure ML SDK preinstalled.
## Install Conda
If you do not have Conda already installed on your machine, you will first need to install it, since the Azure ML R SDK uses **reticulate** to bind to the Python SDK. We recommend installing [Miniconda](https://docs.conda.io/en/latest/miniconda.html), which is a smaller, lightweight version of Anaconda. Choose the 64-bit binary for Python 3.5 or later.
## Install the **azuremlsdk** R package
You will need the **remotes** package to install **azuremlsdk** from CRAN.
``` {r install_remotes, eval=FALSE}
install.packages('remotes')
```
Then, you can use the `install_cran` function to install the package.
``` {r install_azuremlsdk, eval=FALSE}
remotes::install_cran('azuremlsdk', repos = 'https://cloud.r-project.org/')
```
If you are using R installed from CRAN, which comes with 32-bit and 64-bit binaries, you may need to specify the parameter `INSTALL_opts=c("--no-multiarch")` to only build for the current 64-bit architecture.
``` {r eval=FALSE}
remotes::install_cran('azuremlsdk', repos = 'https://cloud.r-project.org/', INSTALL_opts=c("--no-multiarch"))
```
## Install the Azure ML Python SDK
Lastly, use the **azuremlsdk** R library to install the Python SDK. By default, `azuremlsdk::install_azureml()` will install the [latest version of the Python SDK](https://pypi.org/project/azureml-sdk/) in a conda environment called `r-azureml` if reticulate < 1.14 or `r-reticulate` if reticulate ≥ 1.14.
``` {r install_pythonsdk, eval=FALSE}
azuremlsdk::install_azureml()
```
If you would like to override the default SDK version, environment name, or Python version, you can pass in those arguments. You can also restart the R session after installation, or delete and recreate the conda environment if it already exists:
``` {r eval=FALSE}
azuremlsdk::install_azureml(version = NULL,
custom_envname = "<your conda environment name>",
conda_python_version = "<desired python version>",
restart_session = TRUE,
remove_existing_env = TRUE)
```
## Test installation
You can confirm your installation worked by loading the library and successfully retrieving a run.
``` {r test_installation, eval=FALSE}
library(azuremlsdk)
get_current_run()
```
## Troubleshooting
- In step 3 of the installation, if you get SSL errors on Windows, it is due to an
outdated OpenSSL binary. Install the latest OpenSSL binaries from
[here](https://wiki.openssl.org/index.php/Binaries).
- If installation fails due to this error:
```R
Error in strptime(xx, f, tz = tz) :
(converted from warning) unable to identify current timezone 'C':
please set environment variable 'TZ'
In R CMD INSTALL
Error in i.p(...) :
(converted from warning) installation of package C:/.../azureml_0.4.0.tar.gz had non-zero exit
status
```
You will need to set your time zone environment variable to GMT and restart the installation process.
```R
Sys.setenv(TZ='GMT')
```
- If the following permission error occurs while installing in RStudio,
change your RStudio session to administrator mode, and re-run the installation command.
```R
Downloading GitHub repo Azure/azureml-sdk-for-r@master
Skipping 2 packages ahead of CRAN: reticulate, rlang
Running `R CMD build`...
Error: (converted from warning) invalid package
'C:/.../file2b441bf23631'
In R CMD INSTALL
Error in i.p(...) :
(converted from warning) installation of package
C:/.../file2b441bf23631 had non-zero exit status
In addition: Warning messages:
1: In file(con, "r") :
cannot open file 'C:...\file2b44144a540f': Permission denied
2: In file(con, "r") :
cannot open file 'C:...\file2b4463c21577': Permission denied
```


@@ -0,0 +1,16 @@
#' Copyright(c) Microsoft Corporation.
#' Licensed under the MIT license.
library(jsonlite)
init <- function() {
  model_path <- Sys.getenv("AZUREML_MODEL_DIR")
  model <- readRDS(file.path(model_path, "model.rds"))
  message("logistic regression model loaded")

  function(data) {
    vars <- as.data.frame(fromJSON(data))
    prediction <- as.numeric(predict(model, vars, type = "response") * 100)
    toJSON(prediction)
  }
}


@@ -0,0 +1,33 @@
#' Copyright(c) Microsoft Corporation.
#' Licensed under the MIT license.
library(azuremlsdk)
library(optparse)
library(caret)
options <- list(
  make_option(c("-d", "--data_folder"))
)
opt_parser <- OptionParser(option_list = options)
opt <- parse_args(opt_parser)
paste(opt$data_folder)
accidents <- readRDS(file.path(opt$data_folder, "accidents.Rd"))
summary(accidents)
mod <- glm(dead ~ dvcat + seatbelt + frontal + sex + ageOFocc + yearVeh + airbag + occRole, family = binomial, data = accidents)
summary(mod)
predictions <- factor(ifelse(predict(mod) > 0.1, "dead", "alive"))
conf_matrix <- confusionMatrix(predictions, accidents$dead)
message(conf_matrix)
log_metric_to_run("Accuracy", conf_matrix$overall["Accuracy"])
output_dir <- "outputs"
if (!dir.exists(output_dir)) {
  dir.create(output_dir)
}
saveRDS(mod, file = "./outputs/model.rds")
message("Model saved")


@@ -0,0 +1,326 @@
---
title: "Train and deploy your first model with Azure ML"
author: "David Smith"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Train and deploy your first model with Azure ML}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
In this tutorial, you learn the foundational design patterns in Azure Machine Learning. You'll train and deploy a **caret** model to predict the likelihood of a fatality in an automobile accident. After completing this tutorial, you'll have the practical knowledge of the R SDK to scale up to developing more complex experiments and workflows.
In this tutorial, you learn the following tasks:
* Connect your workspace
* Load data and prepare for training
* Upload data to the datastore so it is available for remote training
* Create a compute resource
* Train a caret model to predict probability of fatality
* Deploy a prediction endpoint
* Test the model from R
## Prerequisites
If you don't have access to an Azure ML workspace, follow the [setup tutorial](https://azure.github.io/azureml-sdk-for-r/articles/configuration.html) to configure and create a workspace.
## Set up your development environment
The setup for your development work in this tutorial includes the following actions:
* Install required packages
* Connect to a workspace, so that your local computer can communicate with remote resources
* Create an experiment to track your runs
* Create a remote compute target to use for training
### Install required packages
This tutorial assumes you already have the Azure ML SDK installed. Go ahead and import the **azuremlsdk** package.
```{r eval=FALSE}
library(azuremlsdk)
```
The tutorial uses data from the [**DAAG** package](https://cran.r-project.org/package=DAAG). Install the package if you don't have it.
```{r eval=FALSE}
install.packages("DAAG")
```
The training and scoring scripts (`accidents.R` and `accident_predict.R`) have some additional dependencies. If you plan on running those scripts locally, make sure you have those required packages as well.
### Load your workspace
Instantiate a workspace object from your existing workspace. The following code will load the workspace details from the **config.json** file. You can also retrieve a workspace using [`get_workspace()`](https://azure.github.io/azureml-sdk-for-r/reference/get_workspace.html).
```{r load_workpace, eval=FALSE}
ws <- load_workspace_from_config()
```
### Create an experiment
An Azure ML experiment tracks a grouping of runs, typically from the same training script. Create an experiment to track the runs for training the caret model on the accidents data.
```{r create_experiment, eval=FALSE}
experiment_name <- "accident-logreg"
exp <- experiment(ws, experiment_name)
```
### Create a compute target
By using Azure Machine Learning Compute (AmlCompute), a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create a single-node AmlCompute cluster as your training environment. The code below creates the compute cluster for you if it doesn't already exist in your workspace.
You may need to wait a few minutes for your compute cluster to be provisioned if it doesn't already exist.
```{r create_cluster, eval=FALSE}
cluster_name <- "rcluster"
compute_target <- get_compute(ws, cluster_name = cluster_name)
if (is.null(compute_target)) {
  vm_size <- "STANDARD_D2_V2"
  compute_target <- create_aml_compute(workspace = ws,
                                       cluster_name = cluster_name,
                                       vm_size = vm_size,
                                       max_nodes = 1)
  wait_for_provisioning_completion(compute_target, show_output = TRUE)
}
```
## Prepare data for training
This tutorial uses data from the **DAAG** package. This dataset includes data from over 25,000 car crashes in the US, with variables you can use to predict the likelihood of a fatality. First, import the data into R, transform it into a new data frame `accidents` for analysis, and save it to a file with `saveRDS()`.
```{r load_data, eval=FALSE}
library(DAAG)
data(nassCDS)
accidents <- na.omit(nassCDS[,c("dead","dvcat","seatbelt","frontal","sex","ageOFocc","yearVeh","airbag","occRole")])
accidents$frontal <- factor(accidents$frontal, labels=c("notfrontal","frontal"))
accidents$occRole <- factor(accidents$occRole)
saveRDS(accidents, file="accidents.Rd")
```
### Upload data to the datastore
Upload data to the cloud so that it can be accessed by your remote training environment. Each Azure ML workspace comes with a default datastore, which stores the connection information for the Azure blob container provisioned in the storage account attached to the workspace. The following code uploads the accidents data you created above to that datastore.
```{r upload_data, eval=FALSE}
ds <- get_default_datastore(ws)
target_path <- "accidentdata"
upload_files_to_datastore(ds,
list("./project_files/accidents.Rd"),
target_path = target_path,
overwrite = TRUE)
```
## Train a model
For this tutorial, fit a logistic regression model on your uploaded data using your remote compute cluster. To submit a job, you need to:
* Prepare the training script
* Create an estimator
* Submit the job
### Prepare the training script
A training script called `accidents.R` has been provided for you in the "project_files" directory of this tutorial. Notice the following details **inside the training script**, which enable it to leverage Azure ML for training:
* The training script takes an argument `-d` to find the directory that contains the training data. When you define and submit your job later, you point to the datastore for this argument. Azure ML will mount the storage folder to the remote cluster for the training job.
* The training script logs the final accuracy as a metric to the run record in Azure ML using `log_metric_to_run()`. The Azure ML SDK provides a set of logging APIs for logging various metrics during training runs. These metrics are recorded and persisted in the experiment run record. The metrics can then be accessed at any time or viewed in the run details page in [Azure Machine Learning studio](http://ml.azure.com). See the [reference](https://azure.github.io/azureml-sdk-for-r/reference/index.html#section-training-experimentation) for the full set of logging methods `log_*()`.
* The training script saves your model into a directory named **outputs**. The `./outputs` folder receives special treatment by Azure ML. During training, files written to `./outputs` are automatically uploaded to your run record by Azure ML and persisted as artifacts. By saving the trained model to `./outputs`, you'll be able to access and retrieve your model file even after the run is over and you no longer have access to your remote training environment.
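Putting these pieces together, here is a minimal sketch of how such a training script could be structured. This is a hypothetical outline, not the provided `accidents.R` itself; the actual script may differ in its details:
```{r eval=FALSE}
library(optparse)
library(caret)
library(azuremlsdk)

# parse the data directory argument passed in by the estimator
opts <- list(make_option(c("-d", "--data_folder")))
args <- parse_args(OptionParser(option_list = opts))

# load the accidents data from the mounted datastore folder
accidents <- readRDS(file.path(args$data_folder, "accidents.Rd"))

# fit a logistic regression model with caret
mod <- train(dead ~ ., data = accidents,
             method = "glm", family = "binomial")

# log the final accuracy to the run record
log_metric_to_run("Accuracy", max(mod$results$Accuracy))

# save the model to ./outputs so Azure ML persists it as an artifact
dir.create("outputs", showWarnings = FALSE)
saveRDS(mod$finalModel, file = "./outputs/model.rds")
```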
### Create an estimator
An Azure ML estimator encapsulates the run configuration information needed for executing a training script on the compute target. Azure ML executes runs as containerized jobs on the specified compute target. By default, the Docker image built for your training job will include R, the Azure ML SDK, and a set of commonly used R packages. See the full list of default packages included [here](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html).
To create the estimator, define:
* The directory that contains your scripts needed for training (`source_directory`). All the files in this directory are uploaded to the cluster node(s) for execution. The directory must contain your training script and any additional scripts required.
* The training script that will be executed (`entry_script`).
* The compute target (`compute_target`), in this case the AmlCompute cluster you created earlier.
* The parameters required from the training script (`script_params`). Azure ML will run your training script as a command-line script with `Rscript`. In this tutorial you specify one argument to the script, the data directory mounting point, which you can access with `ds$path(target_path)`.
* Any environment dependencies required for training. The default Docker image built for training already contains the three packages (`caret`, `e1071`, and `optparse`) needed in the training script. So you don't need to specify additional information. If you are using R packages that are not included by default, use the estimator's `cran_packages` parameter to add additional CRAN packages. See the [`estimator()`](https://azure.github.io/azureml-sdk-for-r/reference/estimator.html) reference for the full set of configurable options.
```{r create_estimator, eval=FALSE}
est <- estimator(source_directory = "project_files",
entry_script = "accidents.R",
script_params = list("--data_folder" = ds$path(target_path)),
compute_target = compute_target
)
```
### Submit the job on the remote cluster
Finally, submit the job to run on your cluster. `submit_experiment()` returns a Run object that you then use to interface with the run. In total, the first run takes **about 10 minutes**. For later runs, as long as the script dependencies don't change, the same Docker image is reused; because the image is cached, container startup is much faster.
```{r submit_job, eval=FALSE}
run <- submit_experiment(exp, est)
```
You can view a table of the run's details. Clicking the "Web View" link provided will bring you to Azure Machine Learning studio, where you can monitor the run in the UI.
```{r view_run, eval=FALSE}
view_run_details(run)
```
Model training happens in the background. Wait until the model has finished training before you run more code.
```{r wait_run, eval=FALSE}
wait_for_run_completion(run, show_output = TRUE)
```
You, and colleagues with access to the workspace, can submit multiple experiments in parallel, and Azure ML will take care of scheduling the tasks on the compute cluster. You can even configure the cluster to automatically scale up to multiple nodes and scale back when there are no more compute tasks in the queue. This configuration is a cost-effective way for teams to share compute resources.
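For example, a sketch of an autoscaling cluster configuration (the cluster name here is hypothetical; `min_nodes` and `max_nodes` are parameters of `create_aml_compute()`):
```{r eval=FALSE}
# a cluster that scales between 0 and 4 nodes on demand and
# releases nodes when the queue is empty
compute_target <- create_aml_compute(workspace = ws,
                                     cluster_name = "rcluster-auto",
                                     vm_size = "STANDARD_D2_V2",
                                     min_nodes = 0,
                                     max_nodes = 4)
```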
## Retrieve training results
Once your model has finished training, you can access the artifacts of your job that were persisted to the run record, including any metrics logged and the final trained model.
### Get the logged metrics
In the training script `accidents.R`, you logged a metric from your model: the accuracy of the predictions in the training data. You can see metrics in the [studio](https://ml.azure.com), or extract them to the local session as an R list as follows:
```{r metrics, eval=FALSE}
metrics <- get_run_metrics(run)
metrics
```
If you've run multiple experiments (say, using differing variables, algorithms, or hyperparameters), you can use the metrics from each run to compare and choose the model you'll use in production.
### Get the trained model
You can retrieve the trained model and look at the results in your local R session. The following code will download the contents of the `./outputs` directory, which includes the model file.
```{r retrieve_model, eval=FALSE}
download_files_from_run(run, prefix="outputs/")
accident_model <- readRDS("project_files/outputs/model.rds")
summary(accident_model)
```
You see some factors that contribute to an increase in the estimated probability of death:
* higher impact speed
* male driver
* older occupant
* passenger
You see lower probabilities of death with:
* presence of airbags
* presence of seatbelts
* frontal collision
The vehicle year of manufacture does not have a significant effect.
You can use this model to make new predictions:
```{r manual_predict, eval=FALSE}
newdata <- data.frame( # valid values shown below
dvcat="10-24", # "1-9km/h" "10-24" "25-39" "40-54" "55+"
seatbelt="none", # "none" "belted"
frontal="frontal", # "notfrontal" "frontal"
sex="f", # "f" "m"
ageOFocc=16, # age in years, 16-97
yearVeh=2002, # year of vehicle, 1955-2003
airbag="none", # "none" "airbag"
occRole="pass" # "driver" "pass"
)
## predicted probability of death for these variables, as a percentage
as.numeric(predict(accident_model,newdata, type="response")*100)
```
## Deploy as a web service
With your model, you can predict the risk of death from a collision. Use Azure ML to deploy your model as a prediction service. In this tutorial, you will deploy the web service in [Azure Container Instances](https://docs.microsoft.com/en-us/azure/container-instances/) (ACI).
### Register the model
First, register the model you downloaded to your workspace with [`register_model()`](https://azure.github.io/azureml-sdk-for-r/reference/register_model.html). A registered model can be any collection of files, but in this case the R model object is sufficient. Azure ML will use the registered model for deployment.
```{r register_model, eval=FALSE}
model <- register_model(ws,
model_path = "project_files/outputs/model.rds",
model_name = "accidents_model",
description = "Predict probablity of auto accident")
```
### Define the inference dependencies
To create a web service for your model, you first need to create a scoring script (`entry_script`), an R script that takes variable values as input (in JSON format) and outputs a prediction from your model. For this tutorial, use the provided scoring file `accident_predict.R`. The scoring script must contain an `init()` method that loads your model and returns a function that uses the model to make a prediction based on the input data. See the [documentation](https://azure.github.io/azureml-sdk-for-r/reference/inference_config.html#details) for more details.
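As a rough sketch, a scoring script of this shape might look like the following. This is illustrative only, not the provided `accident_predict.R`; it assumes the model file is resolved through the `AZUREML_MODEL_DIR` environment variable:
```{r eval=FALSE}
library(jsonlite)

init <- function() {
  # load the registered model from the directory Azure ML provides
  model_path <- file.path(Sys.getenv("AZUREML_MODEL_DIR"), "model.rds")
  model <- readRDS(model_path)

  # return the function used to score each incoming request
  function(data) {
    newdata <- fromJSON(data)
    prob <- predict(model, newdata, type = "response")
    toJSON(prob)
  }
}
```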
Next, define an Azure ML **environment** for your script's package dependencies. With an environment, you specify R packages (from CRAN or elsewhere) that are needed for your script to run. You can also provide the values of environment variables that your script can reference to modify its behavior. By default, Azure ML will build the same default Docker image used with the estimator for training. Since the tutorial has no special requirements, create an environment with no special attributes.
```{r create_environment, eval=FALSE}
r_env <- r_environment(name = "basic_env")
```
If you want to use your own Docker image for deployment instead, specify the `custom_docker_image` parameter. See the [`r_environment()`](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html) reference for the full set of configurable options for defining an environment.
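For instance (the image name below is hypothetical):
```{r eval=FALSE}
# deploy with your own image from a container registry
r_env <- r_environment(name = "custom_env",
                       custom_docker_image = "myregistry.azurecr.io/r-inference:latest")
```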
Now you have everything you need to create an **inference config** for encapsulating your scoring script and environment dependencies.
``` {r create_inference_config, eval=FALSE}
inference_config <- inference_config(
entry_script = "accident_predict.R",
source_directory = "project_files",
environment = r_env)
```
### Deploy to ACI
In this tutorial, you will deploy your service to ACI. This code provisions a single container to respond to inbound requests, which is suitable for testing and light loads. See [`aci_webservice_deployment_config()`](https://azure.github.io/azureml-sdk-for-r/reference/aci_webservice_deployment_config.html) for additional configurable options. (For production-scale deployments, you can also [deploy to Azure Kubernetes Service](https://azure.github.io/azureml-sdk-for-r/articles/deploy-to-aks/deploy-to-aks.html).)
``` {r create_aci_config, eval=FALSE}
aci_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 0.5)
```
Now you deploy your model as a web service. Deployment **can take several minutes**.
```{r deploy_service, eval=FALSE}
aci_service <- deploy_model(ws,
'accident-pred',
list(model),
inference_config,
aci_config)
wait_for_deployment(aci_service, show_output = TRUE)
```
If you encounter any issues deploying the web service, please visit the [troubleshooting guide](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment).
## Test the deployed service
Now that your model is deployed as a service, you can test the service from R using [`invoke_webservice()`](https://azure.github.io/azureml-sdk-for-r/reference/invoke_webservice.html). Provide a new set of data to predict from, convert it to JSON, and send it to the service.
```{r test_deployment, eval=FALSE}
library(jsonlite)
newdata <- data.frame( # valid values shown below
dvcat="10-24", # "1-9km/h" "10-24" "25-39" "40-54" "55+"
seatbelt="none", # "none" "belted"
frontal="frontal", # "notfrontal" "frontal"
sex="f", # "f" "m"
ageOFocc=22, # age in years, 16-97
yearVeh=2002, # year of vehicle, 1955-2003
airbag="none", # "none" "airbag"
occRole="pass" # "driver" "pass"
)
prob <- invoke_webservice(aci_service, toJSON(newdata))
prob
```
You can also get the web service's HTTP endpoint, which accepts REST client calls. You can share this endpoint with anyone who wants to test the web service or integrate it into an application.
```{r get_endpoint, eval=FALSE}
aci_service$scoring_uri
```
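For example, here is a sketch of calling the endpoint from R with the **httr** package, assuming the same JSON payload as above:
```{r eval=FALSE}
library(httr)

resp <- POST(aci_service$scoring_uri,
             body = toJSON(newdata),
             content_type_json())
content(resp)
```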
## Clean up resources
Delete the resources once you no longer need them. Don't delete any resource you plan to still use.
Delete the web service:
```{r delete_service, eval=FALSE}
delete_webservice(aci_service)
```
Delete the registered model:
```{r delete_model, eval=FALSE}
delete_model(model)
```
Delete the compute cluster:
```{r delete_compute, eval=FALSE}
delete_compute(compute_target)
```

View File

@@ -0,0 +1,62 @@
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
# Copyright 2016 RStudio, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
library(tensorflow)
install_tensorflow(version = "1.13.2-gpu")
library(azuremlsdk)
# Create the model
x <- tf$placeholder(tf$float32, shape(NULL, 784L))
W <- tf$Variable(tf$zeros(shape(784L, 10L)))
b <- tf$Variable(tf$zeros(shape(10L)))
y <- tf$nn$softmax(tf$matmul(x, W) + b)
# Define loss and optimizer
y_ <- tf$placeholder(tf$float32, shape(NULL, 10L))
cross_entropy <- tf$reduce_mean(-tf$reduce_sum(y_ * log(y),
reduction_indices = 1L))
train_step <- tf$train$GradientDescentOptimizer(0.5)$minimize(cross_entropy)
# Create session and initialize variables
sess <- tf$Session()
sess$run(tf$global_variables_initializer())
# Load MNIST data
datasets <- tf$contrib$learn$datasets
mnist <- datasets$mnist$read_data_sets("MNIST-data", one_hot = TRUE)
# Train
for (i in 1:1000) {
batches <- mnist$train$next_batch(100L)
batch_xs <- batches[[1]]
batch_ys <- batches[[2]]
sess$run(train_step,
feed_dict = dict(x = batch_xs, y_ = batch_ys))
}
# Test trained model
correct_prediction <- tf$equal(tf$argmax(y, 1L), tf$argmax(y_, 1L))
accuracy <- tf$reduce_mean(tf$cast(correct_prediction, tf$float32))
cat("Accuracy: ", sess$run(accuracy,
feed_dict = dict(x = mnist$test$images,
y_ = mnist$test$labels)))
log_metric_to_run("accuracy",
sess$run(accuracy, feed_dict = dict(x = mnist$test$images,
y_ = mnist$test$labels)))

View File

@@ -0,0 +1,143 @@
---
title: "Train a TensorFlow model"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Train a TensorFlow model}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
This tutorial demonstrates how to run a TensorFlow job at scale using Azure ML. You will train a TensorFlow model to classify handwritten digits (MNIST) using a deep neural network (DNN) and log your results to the Azure ML service.
## Prerequisites
If you don't have access to an Azure ML workspace, follow the [setup tutorial](https://azure.github.io/azureml-sdk-for-r/articles/configuration.html) to configure and create a workspace.
## Set up development environment
The setup for your development work in this tutorial includes the following actions:
* Import required packages
* Connect to a workspace
* Create an experiment to track your runs
* Create a remote compute target to use for training
### Import **azuremlsdk** package
```{r eval=FALSE}
library(azuremlsdk)
```
### Load your workspace
Instantiate a workspace object from your existing workspace. The following code will load the workspace details from a **config.json** file if you previously wrote one out with [`write_workspace_config()`](https://azure.github.io/azureml-sdk-for-r/reference/write_workspace_config.html).
```{r load_workspace, eval=FALSE}
ws <- load_workspace_from_config()
```
Or, you can retrieve a workspace by directly specifying your workspace details:
```{r get_workpace, eval=FALSE}
ws <- get_workspace("<your workspace name>", "<your subscription ID>", "<your resource group>")
```
### Create an experiment
An Azure ML **experiment** tracks a grouping of runs, typically from the same training script. Create an experiment to track the runs for training the TensorFlow model on the MNIST data.
```{r create_experiment, eval=FALSE}
exp <- experiment(workspace = ws, name = "tf-mnist")
```
If you would like to track your runs in an existing experiment, simply pass that experiment's name to the `name` parameter of `experiment()`.
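For example (the experiment name here is hypothetical):
```{r eval=FALSE}
exp <- experiment(workspace = ws, name = "my-existing-experiment")
```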
### Create a compute target
By using Azure Machine Learning Compute (AmlCompute), a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. In this tutorial, you create a GPU-enabled cluster as your training environment. The code below creates the compute cluster for you if it doesn't already exist in your workspace.
You may need to wait a few minutes for your compute cluster to be provisioned if it doesn't already exist.
```{r create_cluster, eval=FALSE}
cluster_name <- "gpucluster"
compute_target <- get_compute(ws, cluster_name = cluster_name)
if (is.null(compute_target))
{
vm_size <- "STANDARD_NC6"
compute_target <- create_aml_compute(workspace = ws,
cluster_name = cluster_name,
vm_size = vm_size,
max_nodes = 4)
wait_for_provisioning_completion(compute_target, show_output = TRUE)
}
```
## Prepare the training script
A training script called `tf_mnist.R` has been provided for you in the "project_files" directory of this tutorial. The Azure ML SDK provides a set of logging APIs for logging various metrics during training runs. These metrics are recorded and persisted in the experiment run record, and can be accessed at any time or viewed in the run details page in [Azure Machine Learning studio](http://ml.azure.com/).
In order to collect and upload run metrics, you need to do the following **inside the training script**:
* Import the **azuremlsdk** package
```
library(azuremlsdk)
```
* Add the [`log_metric_to_run()`](https://azure.github.io/azureml-sdk-for-r/reference/log_metric_to_run.html) function to track our primary metric, "accuracy", for this experiment. If you have your own training script with several important metrics, simply create a logging call for each one within the script.
```
log_metric_to_run("accuracy",
sess$run(accuracy,
feed_dict = dict(x = mnist$test$images, y_ = mnist$test$labels)))
```
See the [reference](https://azure.github.io/azureml-sdk-for-r/reference/index.html#section-training-experimentation) for the full set of logging methods `log_*()` available from the R SDK.
## Create an estimator
An Azure ML **estimator** encapsulates the run configuration information needed for executing a training script on the compute target. Azure ML executes runs as containerized jobs on the specified compute target. By default, the Docker image built for your training job will include R, the Azure ML SDK, and a set of commonly used R packages. See the full list of default packages included [here](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html).
To create the estimator, define the following:
* The directory that contains your scripts needed for training (`source_directory`). All the files in this directory are uploaded to the cluster node(s) for execution. The directory must contain your training script and any additional scripts required.
* The training script that will be executed (`entry_script`).
* The compute target (`compute_target`), in this case the AmlCompute cluster you created earlier.
* Any environment dependencies required for training. Since the training script requires the TensorFlow package, which is not included in the image by default, pass the package name to the `cran_packages` parameter to have it installed in the Docker container where the job will run. See the [`estimator()`](https://azure.github.io/azureml-sdk-for-r/reference/estimator.html) reference for the full set of configurable options.
* Set the `use_gpu = TRUE` flag so the default base GPU Docker image will be built, since the job will be run on a GPU cluster.
```{r create_estimator, eval=FALSE}
est <- estimator(source_directory = "project_files",
entry_script = "tf_mnist.R",
compute_target = compute_target,
cran_packages = c("tensorflow"),
use_gpu = TRUE)
```
## Submit the job
Finally submit the job to run on your cluster. [`submit_experiment()`](https://azure.github.io/azureml-sdk-for-r/reference/submit_experiment.html) returns a `Run` object that you can then use to interface with the run.
```{r submit_job, eval=FALSE}
run <- submit_experiment(exp, est)
```
You can view the run's details as a table. Clicking the "Web View" link provided will bring you to Azure Machine Learning studio, where you can monitor the run in the UI.
```{r eval=FALSE}
view_run_details(run)
```
Model training happens in the background. Wait until the model has finished training before you run more code.
```{r eval=FALSE}
wait_for_run_completion(run, show_output = TRUE)
```
## View run metrics
Once your job has finished, you can view the metrics collected during your TensorFlow run.
```{r get_metrics, eval=FALSE}
metrics <- get_run_metrics(run)
metrics
```
## Clean up resources
Delete the resources once you no longer need them. Don't delete any resource you plan to still use.
Delete the compute cluster:
```{r delete_compute, eval=FALSE}
delete_compute(compute_target)
```

View File

@@ -345,7 +345,11 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"sample-akscompute-provision"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n", "from azureml.core.compute import AksCompute, ComputeTarget\n",

View File

@@ -682,7 +682,11 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"sample-akswebservice-deploy-from-image"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"%%time\n", "%%time\n",

View File

@@ -195,7 +195,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment. Only Environments that were created using azureml-defaults version 1.0.48 or later will work with this new handling however.\n", "You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment. Please note that your environment must include azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service.\n",
"\n", "\n",
"More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)." "More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)."
] ]
@@ -221,23 +221,30 @@
"## Create Inference Configuration\n", "## Create Inference Configuration\n",
"\n", "\n",
"There is now support for a source directory, you can upload an entire folder from your local machine as dependencies for the Webservice.\n", "There is now support for a source directory, you can upload an entire folder from your local machine as dependencies for the Webservice.\n",
"Note: in that case, your entry_script, conda_file, and extra_docker_file_steps paths are relative paths to the source_directory path.\n", "Note: in that case, environments's entry_script and file_path are relative paths to the source_directory path; myenv.docker.base_dockerfile is a string containing extra docker steps or contents of the docker file.\n",
"\n", "\n",
"Sample code for using a source directory:\n", "Sample code for using a source directory:\n",
"\n", "\n",
"```python\n", "```python\n",
"from azureml.core.environment import Environment\n",
"from azureml.core.model import InferenceConfig\n",
"\n",
"myenv = Environment.from_conda_specification(name='myenv', file_path='env/myenv.yml')\n",
"\n",
"# explicitly set base_image to None when setting base_dockerfile\n",
"myenv.docker.base_image = None\n",
"# add extra docker commends to execute\n",
"myenv.docker.base_dockerfile = \"FROM ubuntu\\n RUN echo \\\"hello\\\"\"\n",
"\n",
"inference_config = InferenceConfig(source_directory=\"C:/abc\",\n", "inference_config = InferenceConfig(source_directory=\"C:/abc\",\n",
" runtime= \"python\", \n",
" entry_script=\"x/y/score.py\",\n", " entry_script=\"x/y/score.py\",\n",
" conda_file=\"env/myenv.yml\", \n", " environment=myenv)\n",
" extra_docker_file_steps=\"helloworld.txt\")\n",
"```\n", "```\n",
"\n", "\n",
" - source_directory = holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n", " - file_path: input parameter to Environment constructor. Manages conda and python package dependencies.\n",
" - runtime = Which runtime to use for the image. Current supported runtimes are 'spark-py' and 'python\n", " - env.docker.base_dockerfile: any extra steps you want to inject into docker file\n",
" - entry_script = contains logic specific to initializing your model and running predictions\n", " - source_directory: holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n",
" - conda_file = manages conda and python package dependencies.\n", " - entry_script: contains logic specific to initializing your model and running predictions"
" - extra_docker_file_steps = optional: any extra steps you want to inject into docker file"
] ]
}, },
{ {

View File

@@ -20,7 +20,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Register model and deploy as webservice\n", "# Register model and deploy as webservice in ACI\n",
"\n", "\n",
"Following this notebook, you will:\n", "Following this notebook, you will:\n",
"\n", "\n",
@@ -45,6 +45,7 @@
"source": [ "source": [
"import azureml.core\n", "import azureml.core\n",
"\n", "\n",
"\n",
"# Check core SDK version number.\n", "# Check core SDK version number.\n",
"print('SDK version:', azureml.core.VERSION)" "print('SDK version:', azureml.core.VERSION)"
] ]
@@ -70,6 +71,7 @@
"source": [ "source": [
"from azureml.core import Workspace\n", "from azureml.core import Workspace\n",
"\n", "\n",
"\n",
"ws = Workspace.from_config()\n", "ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
] ]
@@ -91,6 +93,7 @@
"source": [ "source": [
"from azureml.core import Dataset\n", "from azureml.core import Dataset\n",
"\n", "\n",
"\n",
"datastore = ws.get_default_datastore()\n", "datastore = ws.get_default_datastore()\n",
"datastore.upload_files(files=['./features.csv', './labels.csv'],\n", "datastore.upload_files(files=['./features.csv', './labels.csv'],\n",
" target_path='sklearn_regression/',\n", " target_path='sklearn_regression/',\n",
@@ -116,7 +119,8 @@
"execution_count": null, "execution_count": null,
"metadata": { "metadata": {
"tags": [ "tags": [
"register model from file" "register model from file",
"sample-model-register"
] ]
}, },
"outputs": [], "outputs": [],
@@ -124,6 +128,7 @@
"from azureml.core import Model\n", "from azureml.core import Model\n",
"from azureml.core.resource_configuration import ResourceConfiguration\n", "from azureml.core.resource_configuration import ResourceConfiguration\n",
"\n", "\n",
"\n",
"model = Model.register(workspace=ws,\n", "model = Model.register(workspace=ws,\n",
" model_name='my-sklearn-model', # Name of the registered model in your workspace.\n", " model_name='my-sklearn-model', # Name of the registered model in your workspace.\n",
" model_path='./sklearn_regression_model.pkl', # Local file to upload and register as a model.\n", " model_path='./sklearn_regression_model.pkl', # Local file to upload and register as a model.\n",
@@ -158,6 +163,8 @@
"\n", "\n",
"The Azure Machine Learning service provides a default environment for supported model frameworks, including scikit-learn, based on the metadata you provided when registering your model. This is the easiest way to deploy your model.\n", "The Azure Machine Learning service provides a default environment for supported model frameworks, including scikit-learn, based on the metadata you provided when registering your model. This is the easiest way to deploy your model.\n",
"\n", "\n",
"Even when you deploy your model to ACI with a default environment you can still customize the deploy configuration (i.e. the number of cores and amount of memory made available for the deployment) using the [AciWebservice.deploy_configuration()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.webservice.aci.aciwebservice#deploy-configuration-cpu-cores-none--memory-gb-none--tags-none--properties-none--description-none--location-none--auth-enabled-none--ssl-enabled-none--enable-app-insights-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--ssl-cname-none--dns-name-label-none--). Look at the \"Use a custom environment\" section of this notebook for more information on deploy configuration.\n",
"\n",
"**Note**: This step can take several minutes." "**Note**: This step can take several minutes."
] ]
}, },
@@ -170,6 +177,7 @@
"from azureml.core import Webservice\n", "from azureml.core import Webservice\n",
"from azureml.exceptions import WebserviceException\n", "from azureml.exceptions import WebserviceException\n",
"\n", "\n",
"\n",
"service_name = 'my-sklearn-service'\n", "service_name = 'my-sklearn-service'\n",
"\n", "\n",
"# Remove any existing service under the same name.\n", "# Remove any existing service under the same name.\n",
@@ -197,6 +205,7 @@
"source": [ "source": [
"import json\n", "import json\n",
"\n", "\n",
"\n",
"input_payload = json.dumps({\n", "input_payload = json.dumps({\n",
" 'data': [\n", " 'data': [\n",
" [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n", " [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n",
@@ -230,9 +239,9 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Use a custom environment (for all models)\n", "### Use a custom environment\n",
"\n", "\n",
"If you want more control over how your model is run, if it uses another framework, or if it has special runtime requirements, you can instead specify your own environment and scoring method.\n", "If you want more control over how your model is run, if it uses another framework, or if it has special runtime requirements, you can instead specify your own environment and scoring method. Custom environments can be used for any model you want to deploy.\n",
"\n", "\n",
"Specify the model's runtime environment by creating an [Environment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment%28class%29?view=azure-ml-py) object and providing the [CondaDependencies](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py) needed by your model." "Specify the model's runtime environment by creating an [Environment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment%28class%29?view=azure-ml-py) object and providing the [CondaDependencies](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py) needed by your model."
] ]
@@ -246,6 +255,7 @@
"from azureml.core import Environment\n", "from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n", "from azureml.core.conda_dependencies import CondaDependencies\n",
"\n", "\n",
"\n",
"environment = Environment('my-sklearn-environment')\n", "environment = Environment('my-sklearn-environment')\n",
"environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n", "environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n",
" 'azureml-defaults',\n", " 'azureml-defaults',\n",
@@ -277,7 +287,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Deploy your model in the custom environment by providing an [InferenceConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py) object to [Model.deploy()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#deploy-workspace--name--models--inference-config--deployment-config-none--deployment-target-none-).\n", "Deploy your model in the custom environment by providing an [InferenceConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py) object to [Model.deploy()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#deploy-workspace--name--models--inference-config--deployment-config-none--deployment-target-none-). In this case we are also using the [AciWebservice.deploy_configuration()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.webservice.aci.aciwebservice#deploy-configuration-cpu-cores-none--memory-gb-none--tags-none--properties-none--description-none--location-none--auth-enabled-none--ssl-enabled-none--enable-app-insights-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--ssl-cname-none--dns-name-label-none--) method to generate a custom deploy configuration.\n",
"\n", "\n",
"**Note**: This step can take several minutes." "**Note**: This step can take several minutes."
] ]
@@ -287,15 +297,18 @@
"execution_count": null, "execution_count": null,
"metadata": { "metadata": {
"tags": [ "tags": [
"azuremlexception-remarks-sample" "azuremlexception-remarks-sample",
"sample-aciwebservice-deploy-config"
] ]
}, },
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core import Webservice\n", "from azureml.core import Webservice\n",
"from azureml.core.model import InferenceConfig\n", "from azureml.core.model import InferenceConfig\n",
"from azureml.core.webservice import AciWebservice\n",
"from azureml.exceptions import WebserviceException\n", "from azureml.exceptions import WebserviceException\n",
"\n", "\n",
"\n",
"service_name = 'my-custom-env-service'\n", "service_name = 'my-custom-env-service'\n",
"\n", "\n",
"# Remove any existing service under the same name.\n", "# Remove any existing service under the same name.\n",
@@ -304,11 +317,14 @@
"except WebserviceException:\n", "except WebserviceException:\n",
" pass\n", " pass\n",
"\n", "\n",
"inference_config = InferenceConfig(entry_script='score.py',\n", "inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n",
" source_directory='.',\n", "aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n",
" environment=environment)\n",
"\n", "\n",
"service = Model.deploy(ws, service_name, [model], inference_config)\n", "service = Model.deploy(workspace=ws,\n",
" name=service_name,\n",
" models=[model],\n",
" inference_config=inference_config,\n",
" deployment_config=aci_config)\n",
"service.wait_for_deployment(show_output=True)" "service.wait_for_deployment(show_output=True)"
] ]
}, },
@@ -325,8 +341,6 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import json\n",
"\n",
"input_payload = json.dumps({\n", "input_payload = json.dumps({\n",
" 'data': [\n", " 'data': [\n",
" [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n", " [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n",
@@ -359,16 +373,101 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Model profiling\n", "### Model Profiling\n",
"\n", "\n",
"You can also take advantage of the profiling feature to estimate CPU and memory requirements for models.\n", "Profile your model to understand how much CPU and memory the service, created as a result of its deployment, will need. Profiling returns information such as CPU usage, memory usage, and response latency. It also provides a CPU and memory recommendation based on the resource usage. You can profile your model (or more precisely the service built based on your model) on any CPU and/or memory combination where 0.1 <= CPU <= 3.5 and 0.1GB <= memory <= 15GB. If you do not provide a CPU and/or memory requirement, we will test it on the default configuration of 3.5 CPU and 15GB memory.\n",
"\n", "\n",
"```python\n", "In order to profile your model you will need:\n",
"profile = Model.profile(ws, \"profilename\", [model], inference_config, test_sample)\n", "- a registered model\n",
"profile.wait_for_profiling(True)\n", "- an entry script\n",
"profiling_results = profile.get_results()\n", "- an inference configuration\n",
"print(profiling_results)\n", "- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n",
"```" "\n",
"At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n",
"\n",
"Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Datastore\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.data import dataset_type_definitions\n",
"\n",
"\n",
"# create a string that can be utf-8 encoded and\n",
"# put in the body of the request\n",
"serialized_input_json = json.dumps({\n",
" 'data': [\n",
" [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n",
" -0.03482076, -0.04340085, -0.00259226, 0.01990842, -0.01764613]\n",
" ]\n",
"})\n",
"dataset_content = []\n",
"for i in range(100):\n",
" dataset_content.append(serialized_input_json)\n",
"dataset_content = '\\n'.join(dataset_content)\n",
"file_name = 'sample_request_data.txt'\n",
"f = open(file_name, 'w')\n",
"f.write(dataset_content)\n",
"f.close()\n",
"\n",
"# upload the txt file created above to the Datastore and create a dataset from it\n",
"data_store = Datastore.get_default(ws)\n",
"data_store.upload_files(['./' + file_name], target_path='sample_request_data')\n",
"datastore_path = [(data_store, 'sample_request_data' +'/' + file_name)]\n",
"sample_request_data = Dataset.Tabular.from_delimited_files(\n",
" datastore_path,\n",
" separator='\\n',\n",
" infer_column_types=True,\n",
" header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n",
"sample_request_data = sample_request_data.register(workspace=ws,\n",
" name='diabetes_sample_request_data',\n",
" create_new_version=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have an input dataset we are ready to go ahead with profiling. In this case we are testing the previously introduced sklearn regression model on 1 CPU and 0.5 GB memory. The memory usage and recommendation presented in the result is measured in Gigabytes. The CPU usage and recommendation is measured in CPU cores."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
"\n",
"environment = Environment('my-sklearn-environment')\n",
"environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n",
" 'azureml-defaults',\n",
" 'inference-schema[numpy-support]',\n",
" 'joblib',\n",
" 'numpy',\n",
" 'scikit-learn'\n",
"])\n",
"inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n",
"# if cpu and memory_in_gb parameters are not provided\n",
"# the model will be profiled on default configuration of\n",
"# 3.5CPU and 15GB memory\n",
"profile = Model.profile(ws,\n",
" 'rgrsn-%s' % datetime.now().strftime('%m%d%Y-%H%M%S'),\n",
" [model],\n",
" inference_config,\n",
" input_dataset=sample_request_data,\n",
" cpu=1.0,\n",
" memory_in_gb=0.5)\n",
"\n",
"profile.wait_for_completion(True)\n",
"details = profile.get_details()"
] ]
}, },
{ {
@@ -404,7 +503,7 @@
"\n", "\n",
" - To run a production-ready web service, see the [notebook on deployment to Azure Kubernetes Service](../production-deploy-to-aks/production-deploy-to-aks.ipynb).\n", " - To run a production-ready web service, see the [notebook on deployment to Azure Kubernetes Service](../production-deploy-to-aks/production-deploy-to-aks.ipynb).\n",
" - To run a local web service, see the [notebook on deployment to a local Docker container](../deploy-to-local/register-model-deploy-local.ipynb).\n", " - To run a local web service, see the [notebook on deployment to a local Docker container](../deploy-to-local/register-model-deploy-local.ipynb).\n",
" - For more information on datasets, see the [notebook on training with datasets](../../work-with-data/datasets-tutorial/train-with-datasets.ipynb).\n", " - For more information on datasets, see the [notebook on training with datasets](../../work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.ipynb).\n",
" - For more information on environments, see the [notebook on using environments](../../training/using-environments/using-environments.ipynb).\n", " - For more information on environments, see the [notebook on using environments](../../training/using-environments/using-environments.ipynb).\n",
" - For information on all the available deployment targets, see [&ldquo;How and where to deploy models&rdquo;](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#choose-a-compute-target)." " - For information on all the available deployment targets, see [&ldquo;How and where to deploy models&rdquo;](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#choose-a-compute-target)."
] ]

View File

@@ -96,7 +96,8 @@
"execution_count": null, "execution_count": null,
"metadata": { "metadata": {
"tags": [ "tags": [
"register model from file" "register model from file",
"sample-model-register"
] ]
}, },
"outputs": [], "outputs": [],
@@ -188,6 +189,15 @@
" return error" " return error"
] ]
}, },
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency for your environemnt. This package contains the functionality needed to host the model as a web service."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -205,16 +215,6 @@
" - inference-schema[numpy-support]" " - inference-schema[numpy-support]"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile C:/abc/dockerstep/customDockerStep.txt\n",
"RUN echo \"this is test\""
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -239,11 +239,10 @@
"source": [ "source": [
"## Create Inference Configuration\n", "## Create Inference Configuration\n",
"\n", "\n",
" - source_directory = holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n", " - file_path: input parameter to Environment constructor. Manages conda and python package dependencies.\n",
" - runtime = Which runtime to use for the image. Current supported runtimes are 'spark-py' and 'python\n", " - env.docker.base_dockerfile: any extra steps you want to inject into docker file\n",
" - entry_script = contains logic specific to initializing your model and running predictions\n", " - source_directory: holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n",
" - conda_file = manages conda and python package dependencies.\n", " - entry_script: contains logic specific to initializing your model and running predictions"
" - extra_docker_file_steps = optional: any extra steps you want to inject into docker file"
] ]
}, },
{ {
@@ -252,13 +251,19 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.environment import Environment\n",
"from azureml.core.model import InferenceConfig\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"\n",
"myenv = Environment.from_conda_specification(name='myenv', file_path='env/myenv.yml')\n",
"\n",
"# explicitly set base_image to None when setting base_dockerfile\n",
"myenv.docker.base_image = None\n",
"myenv.docker.base_dockerfile = \"RUN echo \\\"this is test\\\"\"\n",
"\n",
"inference_config = InferenceConfig(source_directory=\"C:/abc\",\n", "inference_config = InferenceConfig(source_directory=\"C:/abc\",\n",
" runtime=\"python\", \n",
" entry_script=\"x/y/score.py\",\n", " entry_script=\"x/y/score.py\",\n",
" conda_file=\"env/myenv.yml\", \n", " environment=myenv)\n"
" extra_docker_file_steps=\"dockerstep/customDockerStep.txt\")"
] ]
}, },
{ {

View File

@@ -145,6 +145,110 @@
" environment=environment)" " environment=environment)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Profiling\n",
"\n",
"Profile your model to understand how much CPU and memory the service, created as a result of its deployment, will need. Profiling returns information such as CPU usage, memory usage, and response latency. It also provides a CPU and memory recommendation based on the resource usage. You can profile your model (or more precisely the service built based on your model) on any CPU and/or memory combination where 0.1 <= CPU <= 3.5 and 0.1GB <= memory <= 15GB. If you do not provide a CPU and/or memory requirement, we will test it on the default configuration of 3.5 CPU and 15GB memory.\n",
"\n",
"In order to profile your model you will need:\n",
"- a registered model\n",
"- an entry script\n",
"- an inference configuration\n",
"- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n",
"\n",
"At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n",
"\n",
"Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from azureml.core import Datastore\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.data import dataset_type_definitions\n",
"\n",
"\n",
"# create a string that can be put in the body of the request\n",
"serialized_input_json = json.dumps({\n",
" 'data': [\n",
" [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n",
" [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]\n",
" ]\n",
"})\n",
"dataset_content = []\n",
"for i in range(100):\n",
" dataset_content.append(serialized_input_json)\n",
"dataset_content = '\\n'.join(dataset_content)\n",
"file_name = 'sample_request_data_diabetes.txt'\n",
"f = open(file_name, 'w')\n",
"f.write(dataset_content)\n",
"f.close()\n",
"\n",
"# upload the txt file created above to the Datastore and create a dataset from it\n",
"data_store = Datastore.get_default(ws)\n",
"data_store.upload_files(['./' + file_name], target_path='sample_request_data_diabetes')\n",
"datastore_path = [(data_store, 'sample_request_data_diabetes' +'/' + file_name)]\n",
"sample_request_data_diabetes = Dataset.Tabular.from_delimited_files(\n",
" datastore_path,\n",
" separator='\\n',\n",
" infer_column_types=True,\n",
" header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n",
"sample_request_data_diabetes = sample_request_data_diabetes.register(workspace=ws,\n",
" name='sample_request_data_diabetes',\n",
" create_new_version=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have an input dataset we are ready to go ahead with profiling. In this case we are testing the previously introduced sklearn regression model on 1 CPU and 0.5 GB memory. The memory usage and recommendation presented in the result is measured in Gigabytes. The CPU usage and recommendation is measured in CPU cores."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.core.model import Model, InferenceConfig\n",
"\n",
"\n",
"environment = Environment('my-sklearn-environment')\n",
"environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n",
" 'azureml-defaults',\n",
" 'inference-schema[numpy-support]',\n",
" 'joblib',\n",
" 'numpy',\n",
" 'scikit-learn'\n",
"])\n",
"inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n",
"# if cpu and memory_in_gb parameters are not provided\n",
"# the model will be profiled on default configuration of\n",
"# 3.5CPU and 15GB memory\n",
"profile = Model.profile(ws,\n",
" 'profile-%s' % datetime.now().strftime('%m%d%Y-%H%M%S'),\n",
" [model],\n",
" inference_config,\n",
" input_dataset=sample_request_data_diabetes,\n",
" cpu=1.0,\n",
" memory_in_gb=0.5)\n",
"\n",
"profile.wait_for_completion(True)\n",
"details = profile.get_details()"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -166,7 +270,11 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"sample-localwebservice-deploy"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.webservice import LocalWebservice\n", "from azureml.core.webservice import LocalWebservice\n",
@@ -341,9 +449,11 @@
], ],
"category": "tutorial", "category": "tutorial",
"compute": [ "compute": [
"local" "Local"
],
"datasets": [
"None"
], ],
"datasets": [],
"deployment": [ "deployment": [
"Local" "Local"
], ],

View File

@@ -0,0 +1,369 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy models to Azure Kubernetes Service (AKS) using controlled roll out\n",
"This notebook will show you how to deploy mulitple AKS webservices with the same scoring endpoint and how to roll out your models in a controlled manner by configuring % of scoring traffic going to each webservice. If you are using a Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to install the Azure Machine Learning Python SDK and create an Azure ML Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check for latest version\n",
"import azureml.core\n",
"print(azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"Create a [Workspace](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace%28class%29?view=azure-ml-py) object from your persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Register the model\n",
"Register a file or folder as a model by calling [Model.register()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#register-workspace--model-path--model-name--tags-none--properties-none--description-none--datasets-none--model-framework-none--model-framework-version-none--child-paths-none-).\n",
"In addition to the content of the model file itself, your registered model will also store model metadata -- model description, tags, and framework information -- that will be useful when managing and deploying models in your workspace. Using tags, for instance, you can categorize your models and apply filters when listing models in your workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Model\n",
"\n",
"model = Model.register(workspace=ws,\n",
" model_name='sklearn_regression_model.pkl', # Name of the registered model in your workspace.\n",
" model_path='./sklearn_regression_model.pkl', # Local file to upload and register as a model.\n",
" model_framework=Model.Framework.SCIKITLEARN, # Framework used to create the model.\n",
" model_framework_version='0.19.1', # Version of scikit-learn used to create the model.\n",
" description='Ridge regression model to predict diabetes progression.',\n",
" tags={'area': 'diabetes', 'type': 'regression'})\n",
"\n",
"print('Name:', model.name)\n",
"print('Version:', model.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Register an environment (for all models)\n",
"\n",
"If you control over how your model is run, or if it has special runtime requirements, you can specify your own environment and scoring method.\n",
"\n",
"Specify the model's runtime environment by creating an [Environment](https://docs.microsoft.com/python/api/azureml-core/azureml.core.environment%28class%29?view=azure-ml-py) object and providing the [CondaDependencies](https://docs.microsoft.com/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py) needed by your model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"environment=Environment('my-sklearn-environment')\n",
"environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n",
" 'azureml-defaults',\n",
" 'inference-schema[numpy-support]',\n",
" 'joblib',\n",
" 'numpy',\n",
" 'scikit-learn'\n",
"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using a custom environment, you must also provide Python code for initializing and running your model. An example script is included with this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open('score.py') as f:\n",
" print(f.read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the InferenceConfig\n",
"Create the inference configuration to reference your environment and entry script during deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"\n",
"inference_config = InferenceConfig(entry_script='score.py', \n",
" source_directory='.',\n",
" environment=environment)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Provision the AKS Cluster\n",
"If you already have an AKS cluster attached to this workspace, skip the step below and provide the name of the cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"# Use the default configuration (can also provide parameters to customize)\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'my-aks' \n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config) \n",
"aks_target.wait_for_completion(show_output=True)"
]
},
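{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, if you already have an AKS cluster that lives outside this workspace, you can attach it instead of provisioning a new one. The commented sketch below uses AksCompute.attach_configuration(); the resource group and cluster name are hypothetical placeholders."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: attach an existing AKS cluster (resource group and cluster name below are placeholders).\n",
"# attach_config = AksCompute.attach_configuration(resource_group='my-resource-group',\n",
"#                                                 cluster_name='my-existing-aks')\n",
"# aks_target = ComputeTarget.attach(ws, 'my-aks', attach_config)\n",
"# aks_target.wait_for_completion(show_output=True)"
]
},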
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Endpoint and add a version (AKS service)\n",
"This creates a new endpoint and adds a version behind it. By default the first version added is the default version. You can specify the traffic percentile a version takes behind an endpoint. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# deploying the model and create a new endpoint\n",
"from azureml.core.webservice import AksEndpoint\n",
"# from azureml.core.compute import ComputeTarget\n",
"\n",
"#select a created compute\n",
"compute = ComputeTarget(ws, 'my-aks')\n",
"namespace_name=\"endpointnamespace\"\n",
"# define the endpoint name\n",
"endpoint_name = \"myendpoint1\"\n",
"# define the service name\n",
"version_name= \"versiona\"\n",
"\n",
"endpoint_deployment_config = AksEndpoint.deploy_configuration(tags = {'modelVersion':'firstversion', 'department':'finance'}, \n",
" description = \"my first version\", namespace = namespace_name, \n",
" version_name = version_name, traffic_percentile = 40)\n",
"\n",
"endpoint = Model.deploy(ws, endpoint_name, [model], inference_config, endpoint_deployment_config, compute)\n",
"endpoint.wait_for_deployment(True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"endpoint.get_logs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add another version of the service to an existing endpoint\n",
"This adds another version behind an existing endpoint. You can specify the traffic percentile the new version takes. If no traffic_percentile is specified then it defaults to 0. All the unspecified traffic percentile (in this example 50) across all versions goes to default version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Adding a new version to an existing Endpoint.\n",
"version_name_add=\"versionb\" \n",
"\n",
"endpoint.create_version(version_name = version_name_add, inference_config=inference_config, models=[model], tags = {'modelVersion':'secondversion', 'department':'finance'}, \n",
" description = \"my second version\", traffic_percentile = 10)\n",
"endpoint.wait_for_deployment(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Update an existing version in an endpoint\n",
"There are two types of versions: control and treatment. An endpoint contains one or more treatment versions but only one control version. This categorization helps compare the different versions against the defined control version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"endpoint.update_version(version_name=endpoint.versions[version_name_add].name, description=\"my second version update\", traffic_percentile=40, is_default=True, is_control_version_type=True)\n",
"endpoint.wait_for_deployment(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test the web service using run method\n",
"Test the web sevice by passing in data. Run() method retrieves API keys behind the scenes to make sure that call is authenticated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Scoring on endpoint\n",
"import json\n",
"test_sample = json.dumps({'data': [\n",
" [1,2,3,4,5,6,7,8,9,10], \n",
" [10,9,8,7,6,5,4,3,2,1]\n",
"]})\n",
"\n",
"test_sample_encoded = bytes(test_sample, encoding='utf8')\n",
"prediction = endpoint.run(input_data=test_sample_encoded)\n",
"print(prediction)"
]
},
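{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also score over raw HTTP instead of run(). The sketch below follows the key-based authentication pattern used elsewhere in these notebooks: fetch an API key for the endpoint, then POST the payload to its scoring URI."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: call the endpoint over HTTP with key-based auth (the default for AKS).\n",
"import requests\n",
"\n",
"key1, key2 = endpoint.get_keys()\n",
"headers = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + key1}\n",
"resp = requests.post(endpoint.scoring_uri, test_sample_encoded, headers=headers)\n",
"print(resp.text)"
]
},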
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete Resources"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# deleting a version in an endpoint\n",
"endpoint.delete_version(version_name=version_name)\n",
"endpoint.wait_for_deployment(True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# deleting an endpoint, this will delete all versions in the endpoint and the endpoint itself\n",
"endpoint.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "shipatel"
}
],
"category": "deployment",
"compute": [
"None"
],
"datasets": [
"Diabetes"
],
"deployment": [
"Azure Kubernetes Service"
],
"exclude_from_index": false,
"framework": [
"Scikit-learn"
],
"friendly_name": "Deploy models to AKS using controlled roll out",
"index_order": 3,
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
},
"star_tag": [
"featured"
],
"tags": [
"None"
],
"task": "Deploy a model with Azure Machine Learning"
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,4 @@
name: deploy-aks-with-controlled-rollout
dependencies:
- pip:
- azureml-sdk


@@ -0,0 +1,28 @@
import pickle
import json
import numpy
from sklearn.externals import joblib
from sklearn.linear_model import Ridge
from azureml.core.model import Model


def init():
    global model
    # note: 'sklearn_regression_model.pkl' is the name the model was registered under in the workspace.
    # this is a different behavior than before when the code is run locally, even though the code is the same.
    model_path = Model.get_model_path('sklearn_regression_model.pkl')
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)


# note you can pass in multiple rows for scoring
def run(raw_data):
    try:
        data = json.loads(raw_data)['data']
        data = numpy.array(data)
        result = model.predict(data)
        # you can return any data type as long as it is JSON-serializable
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error


@@ -158,7 +158,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## 5. *Create myenv.yml file*"
+"## 5. *Create myenv.yml file*\n",
+"Please note that you must indicate azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
 ]
 },
 {
@@ -169,7 +170,8 @@
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies \n",
 "\n",
-"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
+"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'],\n",
+"                                 pip_packages=['azureml-defaults'])\n",
 "\n",
 "with open(\"myenv.yml\",\"w\") as f:\n",
 "    f.write(myenv.serialize_to_string())"
@@ -189,10 +191,11 @@
 "outputs": [],
 "source": [
 "from azureml.core.model import InferenceConfig\n",
+"from azureml.core.environment import Environment\n",
 "\n",
-"inference_config = InferenceConfig(runtime= \"python\", \n",
-"                                   entry_script=\"score.py\",\n",
-"                                   conda_file=\"myenv.yml\")"
+"\n",
+"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
+"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
 ]
 },
 {
@@ -431,7 +434,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"aks_service.update(enable_app_insights=False)"
+"aks_service.update(enable_app_insights=False)\n",
+"aks_service.wait_for_deployment(show_output = True)"
 ]
 },
 {


@@ -244,7 +244,7 @@
 "metadata": {},
 "source": [
 "### Setting up inference configuration\n",
-"First we create a YAML file that specifies which dependencies we would like to see in our container."
+"First we create a YAML file that specifies which dependencies we would like to see in our container. Please note that you must include azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
 ]
 },
 {
@@ -255,7 +255,7 @@
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies \n",
 "\n",
-"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime==0.4.0\",\"azureml-core\"])\n",
+"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime==0.4.0\", \"azureml-core\", \"azureml-defaults\"])\n",
 "\n",
 "with open(\"myenv.yml\",\"w\") as f:\n",
 "    f.write(myenv.serialize_to_string())"
@@ -275,11 +275,11 @@
 "outputs": [],
 "source": [
 "from azureml.core.model import InferenceConfig\n",
+"from azureml.core.environment import Environment\n",
 "\n",
-"inference_config = InferenceConfig(runtime= \"python\", \n",
-"                                   entry_script=\"score.py\",\n",
-"                                   conda_file=\"myenv.yml\",\n",
-"                                   extra_docker_file_steps = \"Dockerfile\")"
+"\n",
+"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
+"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
 ]
 },
 {
@@ -316,9 +316,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from azureml.core.webservice import Webservice\n",
-"from random import randint\n",
-"\n",
 "aci_service_name = 'my-aci-service-15ad'\n",
 "print(\"Service\", aci_service_name)\n",
 "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
@@ -376,7 +373,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"#aci_service.delete()"
+"aci_service.delete()"
 ]
 }
 ],
@@ -386,6 +383,22 @@
 "name": "viswamy"
 }
 ],
+"category": "deployment",
+"compute": [
+"local"
+],
+"datasets": [
+"PASCAL VOC"
+],
+"deployment": [
+"Azure Container Instance"
+],
+"exclude_from_index": false,
+"framework": [
+"ONNX"
+],
+"friendly_name": "Convert and deploy TinyYolo with ONNX Runtime",
+"index_order": 5,
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -402,7 +415,14 @@
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
 "version": "3.6.5"
-}
+},
+"star_tag": [
+"featured"
+],
+"tags": [
+"ONNX Converter"
+],
+"task": "Object Detection"
 },
 "nbformat": 4,
 "nbformat_minor": 2


@@ -2,5 +2,6 @@ name: onnx-convert-aml-deploy-tinyyolo
 dependencies:
 - pip:
   - azureml-sdk
+  - numpy
   - git+https://github.com/apple/coremltools@v2.1
   - onnxmltools==1.3.1


@@ -319,7 +319,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Write Environment File"
+"### Write Environment File\n",
+"Please note that you must indicate azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
 ]
 },
 {
@@ -330,7 +331,8 @@
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies \n",
 "\n",
-"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n",
+"\n",
+"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n",
 "\n",
 "with open(\"myenv.yml\",\"w\") as f:\n",
 "    f.write(myenv.serialize_to_string())"
@@ -350,11 +352,11 @@
 "outputs": [],
 "source": [
 "from azureml.core.model import InferenceConfig\n",
+"from azureml.core.environment import Environment\n",
 "\n",
-"inference_config = InferenceConfig(runtime= \"python\", \n",
-"                                   entry_script=\"score.py\",\n",
-"                                   conda_file=\"myenv.yml\",\n",
-"                                   extra_docker_file_steps = \"Dockerfile\")"
+"\n",
+"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
+"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
 ]
 },
 {
@@ -391,8 +393,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from azureml.core.webservice import Webservice\n",
-"\n",
 "aci_service_name = 'onnx-demo-emotion'\n",
 "print(\"Service\", aci_service_name)\n",
 "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
@@ -726,7 +726,7 @@
 "source": [
 "# remember to delete your service after you are done using it!\n",
 "\n",
-"# aci_service.delete()"
+"aci_service.delete()"
 ]
 },
 {
@@ -755,6 +755,22 @@
 "name": "viswamy"
 }
 ],
+"category": "deployment",
+"compute": [
+"Local"
+],
+"datasets": [
+"Emotion FER"
+],
+"deployment": [
+"Azure Container Instance"
+],
+"exclude_from_index": false,
+"framework": [
+"ONNX"
+],
+"friendly_name": "Deploy Facial Expression Recognition (FER+) with ONNX Runtime",
+"index_order": 2,
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -772,7 +788,12 @@
 "pygments_lexer": "ipython3",
 "version": "3.6.5"
 },
-"msauthor": "vinitra.swamy"
+"msauthor": "vinitra.swamy",
+"star_tag": [],
+"tags": [
+"ONNX Model Zoo"
+],
+"task": "Facial Expression Recognition"
 },
 "nbformat": 4,
 "nbformat_minor": 2


@@ -306,7 +306,7 @@
 "source": [
 "### Write Environment File\n",
 "\n",
-"This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine."
+"This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine. Please note that you must indicate azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
 ]
 },
 {
@@ -317,7 +317,7 @@
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies \n",
 "\n",
-"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n",
+"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n",
 "\n",
 "with open(\"myenv.yml\",\"w\") as f:\n",
 "    f.write(myenv.serialize_to_string())"
@@ -337,11 +337,11 @@
 "outputs": [],
 "source": [
 "from azureml.core.model import InferenceConfig\n",
+"from azureml.core.environment import Environment\n",
 "\n",
-"inference_config = InferenceConfig(runtime= \"python\", \n",
-"                                   entry_script=\"score.py\",\n",
-"                                   extra_docker_file_steps = \"Dockerfile\",\n",
-"                                   conda_file=\"myenv.yml\")"
+"\n",
+"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
+"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
 ]
 },
 {
@@ -378,8 +378,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from azureml.core.webservice import Webservice\n",
-"\n",
 "aci_service_name = 'onnx-demo-mnist'\n",
 "print(\"Service\", aci_service_name)\n",
 "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
@@ -735,7 +733,7 @@
 "source": [
 "# remember to delete your service after you are done using it!\n",
 "\n",
-"# aci_service.delete()"
+"aci_service.delete()"
 ]
 },
 {
@@ -763,6 +761,22 @@
 "name": "viswamy"
 }
 ],
+"category": "deployment",
+"compute": [
+"Local"
+],
+"datasets": [
+"MNIST"
+],
+"deployment": [
+"Azure Container Instance"
+],
+"exclude_from_index": false,
+"framework": [
+"ONNX"
+],
+"friendly_name": "Deploy MNIST digit recognition with ONNX Runtime",
+"index_order": 1,
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -780,7 +794,12 @@
 "pygments_lexer": "ipython3",
 "version": "3.6.5"
 },
-"msauthor": "vinitra.swamy"
+"msauthor": "vinitra.swamy",
+"star_tag": [],
+"tags": [
+"ONNX Model Zoo"
+],
+"task": "Image Classification"
 },
 "nbformat": 4,
 "nbformat_minor": 2


@@ -241,7 +241,8 @@
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies \n",
 "\n",
-"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
+"\n",
+"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n",
 "\n",
 "with open(\"myenv.yml\",\"w\") as f:\n",
 "    f.write(myenv.serialize_to_string())"
@@ -251,7 +252,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Create the inference configuration object"
+"Create the inference configuration object. Please note that you must indicate azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
 ]
 },
 {
@@ -261,11 +262,11 @@
 "outputs": [],
 "source": [
 "from azureml.core.model import InferenceConfig\n",
+"from azureml.core.environment import Environment\n",
 "\n",
-"inference_config = InferenceConfig(runtime= \"python\", \n",
-"                                   entry_script=\"score.py\",\n",
-"                                   conda_file=\"myenv.yml\",\n",
-"                                   extra_docker_file_steps = \"Dockerfile\")"
+"\n",
+"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
+"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
 ]
 },
 {
@@ -302,7 +303,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from azureml.core.webservice import Webservice\n",
 "from random import randint\n",
 "\n",
 "aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n",
@@ -362,7 +362,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"#aci_service.delete()"
+"aci_service.delete()"
 ]
 }
 ],
@@ -372,6 +372,22 @@
 "name": "viswamy"
 }
 ],
+"category": "deployment",
+"compute": [
+"Local"
+],
+"datasets": [
+"ImageNet"
+],
+"deployment": [
+"Azure Container Instance"
+],
+"exclude_from_index": false,
+"framework": [
+"ONNX"
+],
+"friendly_name": "Deploy ResNet50 with ONNX Runtime",
+"index_order": 4,
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -388,7 +404,12 @@
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
 "version": "3.6.5"
-}
+},
+"star_tag": [],
+"tags": [
+"ONNX Model Zoo"
+],
+"task": "Image Classification"
 },
 "nbformat": 4,
 "nbformat_minor": 2


@@ -405,7 +405,7 @@
 "metadata": {},
 "source": [
 "### Create inference configuration\n",
-"First we create a YAML file that specifies which dependencies we would like to see in our container."
+"First we create a YAML file that specifies which dependencies we would like to see in our container. Please note that you must indicate azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
 ]
 },
 {
@@ -416,7 +416,7 @@
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies \n",
 "\n",
-"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
+"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\", \"azureml-defaults\"])\n",
 "\n",
 "with open(\"myenv.yml\",\"w\") as f:\n",
 "    f.write(myenv.serialize_to_string())"
@@ -436,11 +436,11 @@
 "outputs": [],
 "source": [
 "from azureml.core.model import InferenceConfig\n",
+"from azureml.core.environment import Environment\n",
 "\n",
-"inference_config = InferenceConfig(runtime= \"python\", \n",
-"                                   entry_script=\"score.py\",\n",
-"                                   conda_file=\"myenv.yml\",\n",
-"                                   extra_docker_file_steps = \"Dockerfile\")"
+"\n",
+"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
+"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
 ]
 },
 {
@@ -477,7 +477,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from azureml.core.webservice import Webservice\n",
 "from azureml.core.model import Model\n",
 "from random import randint\n",
 "\n",
@@ -538,7 +537,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"#aci_service.delete()"
+"aci_service.delete()"
 ]
 }
 ],
@@ -548,6 +547,22 @@
 "name": "viswamy"
 }
 ],
+"category": "deployment",
+"compute": [
+"AML Compute"
+],
+"datasets": [
+"MNIST"
+],
+"deployment": [
+"Azure Container Instance"
+],
+"exclude_from_index": false,
+"framework": [
+"ONNX"
+],
+"friendly_name": "Train MNIST in PyTorch, convert, and deploy with ONNX Runtime",
+"index_order": 3,
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -565,6 +580,11 @@
 "pygments_lexer": "ipython3",
 "version": "3.6.6"
 },
+"star_tag": [],
+"tags": [
+"ONNX Converter"
+],
+"task": "Image Classification",
 "widgets": {
 "application/vnd.jupyter.widget-state+json": {
 "state": {


@@ -0,0 +1,314 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploying a web service to Azure Kubernetes Service (AKS)\n",
"This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (one time action), and deploying a service to it. \n",
"We then test and delete the service, image and model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"print(azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get workspace\n",
"Load existing workspace from the config file info."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Register the model\n",
"Register an existing trained model, add descirption and tags. Prior to registering the model, you should have a TensorFlow [Saved Model](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md) in the `resnet50` directory. You can download a [pretrained resnet50](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NCHW_jpg.tar.gz) and unpack it to that directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Register the model\n",
"from azureml.core.model import Model\n",
"model = Model.register(model_path = \"resnet50\", # this points to a local file\n",
" model_name = \"resnet50\", # this is the name the model is registered as\n",
" tags = {'area': \"Image classification\", 'type': \"classification\"},\n",
" description = \"Image classification trained on Imagenet Dataset\",\n",
" workspace = ws)\n",
"\n",
"print(model.name, model.description, model.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Provision the AKS Cluster\n",
"This is a one time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, then you would have to recreate it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AksCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your GPU cluster\n",
"gpu_cluster_name = \"aks-gpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
" print(\"Found existing gpu cluster\")\n",
"except ComputeTargetException:\n",
" print(\"Creating new gpu-cluster\")\n",
" \n",
" # Specify the configuration for the new cluster\n",
" compute_config = AksCompute.provisioning_configuration(cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST,\n",
" agent_count=1,\n",
" vm_size=\"Standard_NV6\")\n",
" # Create the cluster with the specified name and configuration\n",
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
"\n",
" # Wait for the cluster to complete, show the output log\n",
" gpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy the model as a web service to AKS\n",
"\n",
"First create a scoring script"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"import json\n",
"import os\n",
"from azureml.contrib.services.aml_request import AMLRequest, rawhttp\n",
"from azureml.contrib.services.aml_response import AMLResponse\n",
"\n",
"def init():\n",
" global session\n",
" global input_name\n",
" global output_name\n",
" \n",
" session = tf.Session()\n",
"\n",
" # AZUREML_MODEL_DIR is an environment variable created during deployment.\n",
" # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n",
" # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n",
" model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'resnet50')\n",
" model = tf.saved_model.loader.load(session, ['serve'], model_path)\n",
" if len(model.signature_def['serving_default'].inputs) > 1:\n",
" raise ValueError(\"This score.py only supports one input\")\n",
" input_name = [tensor.name for tensor in model.signature_def['serving_default'].inputs.values()][0]\n",
" output_name = [tensor.name for tensor in model.signature_def['serving_default'].outputs.values()]\n",
" \n",
"\n",
"@rawhttp\n",
"def run(request):\n",
" if request.method == 'POST':\n",
" reqBody = request.get_data(False)\n",
" resp = score(reqBody)\n",
" return AMLResponse(resp, 200)\n",
" if request.method == 'GET':\n",
" respBody = str.encode(\"GET is not supported\")\n",
" return AMLResponse(respBody, 405)\n",
" return AMLResponse(\"bad request\", 500)\n",
"\n",
"def score(data):\n",
" result = session.run(output_name, {input_name: [data]})\n",
" return json.dumps(result[1].tolist())\n",
"\n",
"if __name__ == \"__main__\":\n",
" init()\n",
" with open(\"test_image.jpg\", 'rb') as f:\n",
" content = f.read()\n",
" print(score(content))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now create the deployment configuration objects and deploy the model as a webservice."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set the web service configuration (using default here)\n",
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.webservice import AksWebservice\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.core.environment import Environment, DEFAULT_GPU_IMAGE\n",
"\n",
"env = Environment('deploytocloudenv')\n",
"# Please see [Azure ML Containers repository](https://github.com/Azure/AzureML-Containers#featured-tags)\n",
"# for open-sourced GPU base images.\n",
"env.docker.base_image = DEFAULT_GPU_IMAGE\n",
"env.python.conda_dependencies = CondaDependencies.create(conda_packages=['tensorflow-gpu==1.12.0','numpy'],\n",
" pip_packages=['azureml-contrib-services', 'azureml-defaults'])\n",
"\n",
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)\n",
"aks_config = AksWebservice.deploy_configuration()\n",
"\n",
"# # Enable token auth and disable (key) auth on the webservice\n",
"# aks_config = AksWebservice.deploy_configuration(token_auth_enabled=True, auth_enabled=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_service_name ='gpu-rn50'\n",
"\n",
"aks_service = Model.deploy(workspace=ws,\n",
" name=aks_service_name,\n",
" models=[model],\n",
" inference_config=inference_config,\n",
" deployment_config=aks_config,\n",
" deployment_target=gpu_cluster)\n",
"\n",
"aks_service.wait_for_deployment(show_output = True)\n",
"print(aks_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test the web service\n",
"We test the web sevice by passing the test images content."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"import requests\n",
"\n",
"# if (key) auth is enabled, fetch keys and include in the request\n",
"key1, key2 = aks_service.get_keys()\n",
"\n",
"headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
"\n",
"# # if token auth is enabled, fetch token and include in the request\n",
"# access_token, fetch_after = aks_service.get_token()\n",
"# headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + access_token}\n",
"\n",
"test_sample = open('snowleopardgaze.jpg', 'rb').read()\n",
"resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)"
]
},
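{
"cell_type": "markdown",
"metadata": {},
"source": [
"The entry script above returns a JSON-serialized list of scores, so the response body can be parsed and the top class index read off with argmax. A minimal sketch, assuming the response shape produced by score.py:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: parse the JSON score list returned by score.py and report the top class index.\n",
"import json\n",
"import numpy as np\n",
"\n",
"scores = np.array(json.loads(resp.text))\n",
"print('Top class index:', int(np.argmax(scores)))"
]
},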
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Clean up\n",
"Delete the service, image, model and compute target"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_service.delete()\n",
"model.delete()\n",
"gpu_cluster.delete()\n"
]
}
],
"metadata": {
"authors": [
{
"name": "aashishb"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,5 @@
name: production-deploy-to-aks-gpu
dependencies:
- pip:
- azureml-sdk
- tensorflow

Binary file not shown (image added, 61 KiB).


@@ -198,6 +198,106 @@
 "inf_config = InferenceConfig(entry_script='score.py', environment=myenv)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Model Profiling\n",
+"\n",
+"Profile your model to understand how much CPU and memory the service, created as a result of its deployment, will need. Profiling returns information such as CPU usage, memory usage, and response latency. It also provides a CPU and memory recommendation based on the resource usage. You can profile your model (or more precisely the service built based on your model) on any CPU and/or memory combination where 0.1 <= CPU <= 3.5 and 0.1GB <= memory <= 15GB. If you do not provide a CPU and/or memory requirement, we will test it on the default configuration of 3.5 CPU and 15GB memory.\n",
+"\n",
+"In order to profile your model you will need:\n",
+"- a registered model\n",
+"- an entry script\n",
+"- an inference configuration\n",
+"- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n",
+"\n",
+"At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n",
+"\n",
+"Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based on one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"import json\n",
+"from azureml.core import Datastore\n",
+"from azureml.core.dataset import Dataset\n",
+"from azureml.data import dataset_type_definitions\n",
+"\n",
+"input_json = {'data': [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n",
+"                       [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]}\n",
+"# create a string that can be put in the body of the request\n",
+"serialized_input_json = json.dumps(input_json)\n",
+"dataset_content = []\n",
+"for i in range(100):\n",
+"    dataset_content.append(serialized_input_json)\n",
+"sample_request_data = '\\n'.join(dataset_content)\n",
+"file_name = 'sample_request_data.txt'\n",
+"f = open(file_name, 'w')\n",
+"f.write(sample_request_data)\n",
+"f.close()\n",
+"\n",
+"# upload the txt file created above to the Datastore and create a dataset from it\n",
+"data_store = Datastore.get_default(ws)\n",
+"data_store.upload_files(['./' + file_name], target_path='sample_request_data')\n",
+"datastore_path = [(data_store, 'sample_request_data' + '/' + file_name)]\n",
+"sample_request_data = Dataset.Tabular.from_delimited_files(\n",
+"    datastore_path,\n",
+"    separator='\\n',\n",
+"    infer_column_types=True,\n",
+"    header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n",
+"sample_request_data = sample_request_data.register(workspace=ws,\n",
+"                                                   name='sample_request_data',\n",
+"                                                   create_new_version=True)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Now that we have an input dataset we are ready to go ahead with profiling. In this case we are testing the previously introduced sklearn regression model on 1 CPU and 0.5 GB memory. The memory usage and recommendation presented in the result is measured in Gigabytes. The CPU usage and recommendation is measured in CPU cores."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"from datetime import datetime\n",
+"from azureml.core import Environment\n",
+"from azureml.core.conda_dependencies import CondaDependencies\n",
+"from azureml.core.model import Model, InferenceConfig\n",
+"\n",
+"\n",
+"environment = Environment('my-sklearn-environment')\n",
+"environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n",
+"    'azureml-defaults',\n",
+"    'inference-schema[numpy-support]',\n",
+"    'joblib',\n",
+"    'numpy',\n",
+"    'scikit-learn'\n",
+"])\n",
+"inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n",
+"# if cpu and memory_in_gb parameters are not provided\n",
+"# the model will be profiled on default configuration of\n",
+"# 3.5CPU and 15GB memory\n",
+"profile = Model.profile(ws,\n",
+"                        'sklearn-%s' % datetime.now().strftime('%m%d%Y-%H%M%S'),\n",
+"                        [model],\n",
+"                        inference_config,\n",
+"                        input_dataset=sample_request_data,\n",
+"                        cpu=1.0,\n",
+"                        memory_in_gb=0.5)\n",
+"\n",
+"profile.wait_for_completion(True)\n",
+"details = profile.get_details()"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -318,7 +418,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"sample-deploy-to-aks"
+]
+},
 "outputs": [],
 "source": [
 "# Set the web service configuration (using default here)\n",
@@ -331,7 +435,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"sample-deploy-to-aks"
+]
+},
 "outputs": [],
 "source": [
 "%%time\n",
Some files were not shown because too many files have changed in this diff.