mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-20 09:37:04 -05:00
Compare commits
75 Commits
azureml-sd
...
azureml-sd
| Author | SHA1 | Date |
|---|---|---|
| | 9662505517 | |
| | 8e103c02ff | |
| | ecb5157add | |
| | d7d23d5e7c | |
| | 83a21ba53a | |
| | 3c9cb89c1a | |
| | cca7c2e26f | |
| | e895d7c2bf | |
| | 3588eb9665 | |
| | a09e726f31 | |
| | 4fb1d9ee5b | |
| | b05ff80e9d | |
| | 512630472b | |
| | ae1337fe70 | |
| | c95f970dc8 | |
| | 9b9d112719 | |
| | fe8fcd4b48 | |
| | 296ae01587 | |
| | 8f4efe15eb | |
| | d179080467 | |
| | 0040644e7a | |
| | 8aa04307fb | |
| | a525da4488 | |
| | e149565a8a | |
| | 75610ec31c | |
| | 0c2c450b6b | |
| | 0d548eabff | |
| | e4029801e6 | |
| | 156974ee7b | |
| | 1f05157d24 | |
| | 2214ea8616 | |
| | b54b2566de | |
| | 57b0f701f8 | |
| | d658c85208 | |
| | a5f627a9b6 | |
| | a8b08bdff0 | |
| | 0dc3f34b86 | |
| | 9ba7d5e5bb | |
| | c6ad2f8ec0 | |
| | 33d6def8c3 | |
| | 69d4344dff | |
| | 34aeec1439 | |
| | a9b9ebbf7d | |
| | 41fa508d53 | |
| | e1bfa98844 | |
| | 2bcee9aa20 | |
| | 37541b1071 | |
| | 4aff1310a7 | |
| | 51ecb7c54f | |
| | 4e7fc7c82c | |
| | 7db93bcb1d | |
| | fcbe925640 | |
| | bedfbd649e | |
| | fb760f648d | |
| | a9a0713d2f | |
| | c9d018b52c | |
| | 53dbd0afcf | |
| | e3a64b1f16 | |
| | 732eecfc7c | |
| | 6995c086ff | |
| | 80bba4c7ae | |
| | 3c581b533f | |
| | cc688caa4e | |
| | da225e116e | |
| | 73c5d02880 | |
| | e472b54f1b | |
| | 716c6d8bb1 | |
| | 23189c6f40 | |
| | 361b57ed29 | |
| | 3f531fd211 | |
| | 111f5e8d73 | |
| | 96c59d5c2b | |
| | ce3214b7c6 | |
| | 53199d17de | |
| | 54c883412c | |
@@ -2,7 +2,7 @@
|
|||||||
|
|
||||||
This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.
|
This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
|
|
||||||
## Quick installation
|
## Quick installation
|
||||||
@@ -17,11 +17,11 @@ This [index](.index.md) should assist in navigating the Azure Machine Learning n
|
|||||||
|
|
||||||
If you want to...
|
If you want to...
|
||||||
|
|
||||||
* ...try out and explore Azure ML, start with image classification tutorials: [Part 1 (Training)](./tutorials/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/img-classification-part2-deploy.ipynb).
|
* ...try out and explore Azure ML, start with image classification tutorials: [Part 1 (Training)](./tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/image-classification-mnist-data/img-classification-part2-deploy.ipynb).
|
||||||
* ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
|
* ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
|
||||||
* ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
|
* ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
|
||||||
* ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
|
* ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
|
||||||
* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
|
* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
|
||||||
* ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb).
|
* ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb).
|
||||||
|
|
||||||
## Tutorials
|
## Tutorials
|
||||||
|
|||||||
@@ -103,7 +103,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"This notebook was created using version 1.0.72 of the Azure ML SDK\")\n",
|
"print(\"This notebook was created using version 1.1.0rc0 of the Azure ML SDK\")\n",
|
||||||
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
|
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -9,7 +9,6 @@ As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) not
|
|||||||
* [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
|
* [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
|
||||||
* [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs.
|
* [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs.
|
||||||
* [logging-api](./track-and-monitor-experiments/logging-api): Learn about the details of logging metrics to run history.
|
* [logging-api](./track-and-monitor-experiments/logging-api): Learn about the details of logging metrics to run history.
|
||||||
* [register-model-create-image-deploy-service](./deployment/register-model-create-image-deploy-service): Learn about the details of model management.
|
|
||||||
* [production-deploy-to-aks](./deployment/production-deploy-to-aks): Deploy a model to production at scale on Azure Kubernetes Service.
|
* [production-deploy-to-aks](./deployment/production-deploy-to-aks): Deploy a model to production at scale on Azure Kubernetes Service.
|
||||||
* [enable-app-insights-in-production-service](./deployment/enable-app-insights-in-production-service): Learn how to use App Insights with a production web service.
|
* [enable-app-insights-in-production-service](./deployment/enable-app-insights-in-production-service): Learn how to use App Insights with a production web service.
|
||||||
|
|
||||||
|
|||||||
@@ -154,6 +154,12 @@ jupyter notebook
|
|||||||
- [auto-ml-continuous-retraining.ipynb](continuous-retraining/auto-ml-continuous-retraining.ipynb)
|
- [auto-ml-continuous-retraining.ipynb](continuous-retraining/auto-ml-continuous-retraining.ipynb)
|
||||||
- Continuous retraining using Pipelines and Time-Series TabularDataset
|
- Continuous retraining using Pipelines and Time-Series TabularDataset
|
||||||
|
|
||||||
|
- [auto-ml-classification-text-dnn.ipynb](classification-text-dnn/auto-ml-classification-text-dnn.ipynb)
|
||||||
|
- Classification with text data using deep learning in AutoML
|
||||||
|
- AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data.
|
||||||
|
- Depending on the compute cluster the user provides, AutoML tries Bidirectional Encoder Representations from Transformers (BERT) when GPU compute is used.
|
||||||
|
- AutoML tries a Bidirectional Long Short-Term Memory (BiLSTM) network when CPU compute is used, thereby optimizing the choice of DNN for the user's setup.
|
||||||
|
|
||||||
<a name="documentation"></a>
|
<a name="documentation"></a>
|
||||||
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
|
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
|
||||||
|
|
||||||
@@ -191,6 +197,17 @@ If automl_setup_linux.sh fails on Ubuntu Linux with the error: `unable to execut
|
|||||||
4) Check that the region is one of the supported regions: `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`
|
4) Check that the region is one of the supported regions: `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`
|
||||||
5) Check that you have access to the region using the Azure Portal.
|
5) Check that you have access to the region using the Azure Portal.
|
||||||
|
|
||||||
|
## import AutoMLConfig fails after upgrade from before 1.0.76 to 1.0.76 or later
|
||||||
|
There were package changes in automated machine learning version 1.0.76, which require the previous version to be uninstalled before upgrading to the new version.
|
||||||
|
If you have manually upgraded from a version of automated machine learning before 1.0.76 to 1.0.76 or later, you may get the error:
|
||||||
|
`ImportError: cannot import name 'AutoMLConfig'`
|
||||||
|
|
||||||
|
This can be resolved by running:
|
||||||
|
`pip uninstall azureml-train-automl` and then
|
||||||
|
`pip install azureml-train-automl`
|
||||||
|
|
||||||
|
The automl_setup.cmd script does this automatically.
|
||||||
|
|
||||||
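The upgrade rule described in this troubleshooting entry can be sketched as a small check. This is only an illustration under stated assumptions: `parse_version` and `needs_clean_reinstall` are hypothetical helper names, not part of the Azure ML SDK, and the version strings are passed in by hand rather than read from the installed package.

```python
import re

def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

# Version named in the troubleshooting entry as the point where packages changed.
PACKAGE_CHANGE_VERSION = (1, 0, 76)

def needs_clean_reinstall(installed, target):
    """True when an upgrade crosses the 1.0.76 package change (hypothetical helper)."""
    return parse_version(installed) < PACKAGE_CHANGE_VERSION <= parse_version(target)

if needs_clean_reinstall("1.0.72", "1.1.0rc0"):
    print("Run: pip uninstall azureml-train-automl, then pip install azureml-train-automl")
```

A pre-release suffix such as `rc0` is ignored by this parser, which is enough for comparing against 1.0.76 but is not a full version-ordering implementation.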
## workspace.from_config fails
|
## workspace.from_config fails
|
||||||
If the call `ws = Workspace.from_config()` fails:
|
If the call `ws = Workspace.from_config()` fails:
|
||||||
1) Make sure that you have run the `configuration.ipynb` notebook successfully.
|
1) Make sure that you have run the `configuration.ipynb` notebook successfully.
|
||||||
|
|||||||
@@ -2,7 +2,7 @@ name: azure_automl
|
|||||||
dependencies:
|
dependencies:
|
||||||
# The python interpreter version.
|
# The python interpreter version.
|
||||||
# Currently Azure ML only supports 3.5.2 and later.
|
# Currently Azure ML only supports 3.5.2 and later.
|
||||||
- pip
|
- pip<=19.3.1
|
||||||
- python>=3.5.2,<3.6.8
|
- python>=3.5.2,<3.6.8
|
||||||
- nb_conda
|
- nb_conda
|
||||||
- matplotlib==2.1.0
|
- matplotlib==2.1.0
|
||||||
@@ -13,8 +13,9 @@ dependencies:
|
|||||||
- scikit-learn>=0.19.0,<=0.20.3
|
- scikit-learn>=0.19.0,<=0.20.3
|
||||||
- pandas>=0.22.0,<=0.23.4
|
- pandas>=0.22.0,<=0.23.4
|
||||||
- py-xgboost<=0.80
|
- py-xgboost<=0.80
|
||||||
- pyarrow>=0.11.0
|
- fbprophet==0.5
|
||||||
- conda-forge::fbprophet==0.5
|
- pytorch=1.1.0
|
||||||
|
- cudatoolkit=9.0
|
||||||
|
|
||||||
- pip:
|
- pip:
|
||||||
# Required packages for AzureML execution, history, and data preparation.
|
# Required packages for AzureML execution, history, and data preparation.
|
||||||
@@ -23,6 +24,14 @@ dependencies:
|
|||||||
- azureml-train
|
- azureml-train
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- azureml-explain-model
|
- azureml-explain-model
|
||||||
|
- azureml-pipeline
|
||||||
- azureml-contrib-interpret
|
- azureml-contrib-interpret
|
||||||
- pandas_ml
|
- pytorch-transformers==1.0.0
|
||||||
|
- spacy==2.1.8
|
||||||
|
- joblib
|
||||||
|
- onnxruntime==1.0.0
|
||||||
|
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
|
||||||
|
|
||||||
|
channels:
|
||||||
|
- conda-forge
|
||||||
|
- pytorch
|
||||||
|
|||||||
@@ -2,7 +2,7 @@ name: azure_automl
|
|||||||
dependencies:
|
dependencies:
|
||||||
# The python interpreter version.
|
# The python interpreter version.
|
||||||
# Currently Azure ML only supports 3.5.2 and later.
|
# Currently Azure ML only supports 3.5.2 and later.
|
||||||
- pip
|
- pip<=19.3.1
|
||||||
- nomkl
|
- nomkl
|
||||||
- python>=3.5.2,<3.6.8
|
- python>=3.5.2,<3.6.8
|
||||||
- nb_conda
|
- nb_conda
|
||||||
@@ -14,8 +14,9 @@ dependencies:
|
|||||||
- scikit-learn>=0.19.0,<=0.20.3
|
- scikit-learn>=0.19.0,<=0.20.3
|
||||||
- pandas>=0.22.0,<0.23.0
|
- pandas>=0.22.0,<0.23.0
|
||||||
- py-xgboost<=0.80
|
- py-xgboost<=0.80
|
||||||
- pyarrow>=0.11.0
|
- fbprophet==0.5
|
||||||
- conda-forge::fbprophet==0.5
|
- pytorch=1.1.0
|
||||||
|
- cudatoolkit=9.0
|
||||||
|
|
||||||
- pip:
|
- pip:
|
||||||
# Required packages for AzureML execution, history, and data preparation.
|
# Required packages for AzureML execution, history, and data preparation.
|
||||||
@@ -24,6 +25,14 @@ dependencies:
|
|||||||
- azureml-train
|
- azureml-train
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- azureml-explain-model
|
- azureml-explain-model
|
||||||
|
- azureml-pipeline
|
||||||
- azureml-contrib-interpret
|
- azureml-contrib-interpret
|
||||||
- pandas_ml
|
- pytorch-transformers==1.0.0
|
||||||
|
- spacy==2.1.8
|
||||||
|
- joblib
|
||||||
|
- onnxruntime==1.0.0
|
||||||
|
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
|
||||||
|
|
||||||
|
channels:
|
||||||
|
- conda-forge
|
||||||
|
- pytorch
|
||||||
|
|||||||
@@ -14,8 +14,9 @@ IF "%CONDA_EXE%"=="" GOTO CondaMissing
|
|||||||
call conda activate %conda_env_name% 2>nul:
|
call conda activate %conda_env_name% 2>nul:
|
||||||
|
|
||||||
if not errorlevel 1 (
|
if not errorlevel 1 (
|
||||||
echo Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment %conda_env_name%
|
echo Upgrading existing conda environment %conda_env_name%
|
||||||
call pip install --upgrade azureml-sdk[automl,notebooks,explain]
|
call pip uninstall azureml-train-automl -y -q
|
||||||
|
call conda env update --name %conda_env_name% --file %automl_env_file%
|
||||||
if errorlevel 1 goto ErrorExit
|
if errorlevel 1 goto ErrorExit
|
||||||
) else (
|
) else (
|
||||||
call conda env create -f %automl_env_file% -n %conda_env_name%
|
call conda env create -f %automl_env_file% -n %conda_env_name%
|
||||||
|
|||||||
@@ -22,8 +22,9 @@ fi
|
|||||||
|
|
||||||
if source activate $CONDA_ENV_NAME 2> /dev/null
|
if source activate $CONDA_ENV_NAME 2> /dev/null
|
||||||
then
|
then
|
||||||
echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
|
echo "Upgrading existing conda environment" $CONDA_ENV_NAME
|
||||||
pip install --upgrade azureml-sdk[automl,notebooks,explain] &&
|
pip uninstall azureml-train-automl -y -q
|
||||||
|
conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
|
||||||
jupyter nbextension uninstall --user --py azureml.widgets
|
jupyter nbextension uninstall --user --py azureml.widgets
|
||||||
else
|
else
|
||||||
conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
|
conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
|
||||||
|
|||||||
@@ -22,8 +22,9 @@ fi
|
|||||||
|
|
||||||
if source activate $CONDA_ENV_NAME 2> /dev/null
|
if source activate $CONDA_ENV_NAME 2> /dev/null
|
||||||
then
|
then
|
||||||
echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
|
echo "Upgrading existing conda environment" $CONDA_ENV_NAME
|
||||||
pip install --upgrade azureml-sdk[automl,notebooks,explain] &&
|
pip uninstall azureml-train-automl -y -q
|
||||||
|
conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
|
||||||
jupyter nbextension uninstall --user --py azureml.widgets
|
jupyter nbextension uninstall --user --py azureml.widgets
|
||||||
else
|
else
|
||||||
conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
|
conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
|
||||||
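The new control flow in the setup scripts (uninstall `azureml-train-automl` and update the environment if it already exists, otherwise create it) can be summarized with a short sketch. It only builds the command lists and does not execute anything. The function name `setup_commands` and the file name `automl_env.yml` are placeholders chosen for illustration, and later steps such as the `jupyter nbextension` calls are left out.

```python
def setup_commands(env_exists, env_name, env_file):
    """Return the argv lists the updated scripts would run, without executing them."""
    if env_exists:
        return [
            # Remove the old package first, as the 1.0.76 package change requires.
            ["pip", "uninstall", "azureml-train-automl", "-y", "-q"],
            # Then bring the existing environment in line with the yml file.
            ["conda", "env", "update", "--name", env_name, "--file", env_file],
        ]
    # No existing environment: create it from the yml file.
    return [["conda", "env", "create", "-f", env_file, "-n", env_name]]

for argv in setup_commands(True, "azure_automl", "automl_env.yml"):
    print(" ".join(argv))
```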
|
|||||||
@@ -92,6 +92,32 @@
|
|||||||
"from azureml.explain.model._internal.explanation_client import ExplanationClient"
|
"from azureml.explain.model._internal.explanation_client import ExplanationClient"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Accessing the Azure ML workspace requires authentication with Azure.\n",
|
||||||
|
"\n",
|
||||||
|
"The default authentication is interactive authentication using the default tenant. Executing the `ws = Workspace.from_config()` line in the cell below will prompt for authentication the first time that it is run.\n",
|
||||||
|
"\n",
|
||||||
|
"If you have multiple Azure tenants, you can specify the tenant by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
|
||||||
|
"\n",
|
||||||
|
"```\n",
|
||||||
|
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
|
||||||
|
"auth = InteractiveLoginAuthentication(tenant_id = 'mytenantid')\n",
|
||||||
|
"ws = Workspace.from_config(auth = auth)\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
|
||||||
|
"\n",
|
||||||
|
"```\n",
|
||||||
|
"from azureml.core.authentication import ServicePrincipalAuthentication\n",
|
||||||
|
"auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')\n",
|
||||||
|
"ws = Workspace.from_config(auth = auth)\n",
|
||||||
|
"```\n",
|
||||||
|
"For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth)"
|
||||||
|
]
|
||||||
|
},
|
||||||
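The three authentication paths described in the added markdown cell could be combined into one helper, roughly as follows. This is a sketch under assumptions: the `AML_TENANT_ID`, `AML_APP_ID`, and `AML_APP_SECRET` environment variable names are hypothetical and are not defined by the SDK, and `load_workspace` needs `azureml-core` installed, so it imports the SDK lazily.

```python
import os

def choose_auth_mode(env):
    """Pick an authentication mode from a mapping of environment variables.

    The AML_* variable names are hypothetical; the SDK does not define them.
    """
    has_service_principal = all(
        env.get(key) for key in ("AML_TENANT_ID", "AML_APP_ID", "AML_APP_SECRET")
    )
    if has_service_principal:
        return "service_principal"
    if env.get("AML_TENANT_ID"):
        return "interactive_tenant"
    return "interactive_default"

def load_workspace(env=os.environ):
    """Load the workspace using the mode chosen above (requires azureml-core)."""
    # Imported lazily so choose_auth_mode works without the SDK installed.
    from azureml.core import Workspace
    from azureml.core.authentication import (
        InteractiveLoginAuthentication,
        ServicePrincipalAuthentication,
    )

    mode = choose_auth_mode(env)
    if mode == "service_principal":
        auth = ServicePrincipalAuthentication(
            env["AML_TENANT_ID"], env["AML_APP_ID"], env["AML_APP_SECRET"]
        )
        return Workspace.from_config(auth=auth)
    if mode == "interactive_tenant":
        auth = InteractiveLoginAuthentication(tenant_id=env["AML_TENANT_ID"])
        return Workspace.from_config(auth=auth)
    # Default: interactive login against the default tenant.
    return Workspace.from_config()
```

Reading the secret from the environment rather than hard-coding it keeps credentials out of the notebook file, which matters once the notebook is committed.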
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -285,9 +311,10 @@
|
|||||||
"|**task**|classification or regression or forecasting|\n",
|
"|**task**|classification or regression or forecasting|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
"|**blacklist_models** or **whitelist_models** |*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><br>Allowed values for **Forecasting**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><i>Arima</i><br><i>Prophet</i>|\n",
|
"|**blacklist_models** | *List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run. <br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><br>Allowed values for **Forecasting**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><i>Arima</i><br><i>Prophet</i>|\n",
|
||||||
|
"| **whitelist_models** | *List* of *strings* indicating machine learning algorithms for AutoML to use in this run. Same values listed above for **blacklist_models** allowed for **whitelist_models**.|\n",
|
||||||
"|**experiment_exit_score**| Value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
|
"|**experiment_exit_score**| Value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
|
||||||
"|**experiment_timeout_minutes**| Maximum amount of time in minutes that all iterations combined can take before the experiment terminates.|\n",
|
"|**experiment_timeout_hours**| Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|\n",
|
||||||
"|**enable_early_stopping**| Flag to enable early termination if the score is not improving in the short term.|\n",
|
"|**enable_early_stopping**| Flag to enable early termination if the score is not improving in the short term.|\n",
|
||||||
"|**featurization**| 'auto' / 'off' Indicator for whether featurization step should be done automatically or not. Note: If the input data is sparse, featurization cannot be turned on.|\n",
|
"|**featurization**| 'auto' / 'off' Indicator for whether featurization step should be done automatically or not. Note: If the input data is sparse, featurization cannot be turned on.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
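The difference between `blacklist_models` (algorithms to avoid) and `whitelist_models` (algorithms to use) can be illustrated with a plain-Python filter. This shows only the meaning of the two lists as described in the table. It is not how AutoML selects models internally, `candidate_models` is a hypothetical helper, and applying both lists together is an assumption made for the sketch.

```python
def candidate_models(supported, whitelist=None, blacklist=None):
    """Return the supported model names left after applying the two lists, in order."""
    unknown = set(whitelist or []) | set(blacklist or [])
    unknown -= set(supported)
    if unknown:
        raise ValueError("Unsupported model names: " + ", ".join(sorted(unknown)))
    # A whitelist narrows the candidates; no whitelist means every supported model.
    allowed = [m for m in supported if whitelist is None or m in whitelist]
    # A blacklist then removes names from whatever remains.
    return [m for m in allowed if m not in (blacklist or [])]

# A few of the classification names listed in the table.
CLASSIFICATION = ["LogisticRegression", "SGD", "KNN", "RandomForest", "LightGBM"]

print(candidate_models(CLASSIFICATION, blacklist=["KNN", "SGD"]))
print(candidate_models(CLASSIFICATION, whitelist=["LightGBM", "RandomForest"]))
```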
@@ -305,7 +332,7 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"experiment_timeout_minutes\" : 20,\n",
|
" \"experiment_timeout_hours\" : 0.3,\n",
|
||||||
" \"enable_early_stopping\" : True,\n",
|
" \"enable_early_stopping\" : True,\n",
|
||||||
" \"iteration_timeout_minutes\": 5,\n",
|
" \"iteration_timeout_minutes\": 5,\n",
|
||||||
" \"max_concurrent_iterations\": 4,\n",
|
" \"max_concurrent_iterations\": 4,\n",
|
||||||
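This hunk replaces `experiment_timeout_minutes` with `experiment_timeout_hours`. A settings dictionary written for the old key could be converted along these lines. `migrate_timeout` is a hypothetical helper, not an SDK function. Note that 20 minutes is one third of an hour, so the notebook's value of 0.3 corresponds to 18 minutes rather than an exact conversion.

```python
def migrate_timeout(settings):
    """Return a copy of settings with experiment_timeout_minutes replaced by hours."""
    migrated = dict(settings)
    minutes = migrated.pop("experiment_timeout_minutes", None)
    # Keep an explicit hours value if the caller already supplied one.
    if minutes is not None and "experiment_timeout_hours" not in migrated:
        migrated["experiment_timeout_hours"] = minutes / 60
    return migrated

old_settings = {"experiment_timeout_minutes": 20, "iteration_timeout_minutes": 5}
print(migrate_timeout(old_settings))
```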
@@ -334,8 +361,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while."
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -382,6 +408,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
|
"# Wait for the remote run to complete\n",
|
||||||
"remote_run.wait_for_completion()"
|
"remote_run.wait_for_completion()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -463,8 +490,31 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model's explanation\n",
|
"### Retrieve the Best Model's explanation\n",
|
||||||
"Retrieve the explanation from the best_run which includes explanations for engineered features and raw features.\n",
|
"Retrieve the explanation from the best_run which includes explanations for engineered features and raw features. Make sure that the run for generating explanations for the best model is completed."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Wait for the best model explanation run to complete\n",
|
||||||
|
"from azureml.train.automl.run import AutoMLRun\n",
|
||||||
|
"model_explainability_run_id = remote_run.get_properties().get('ModelExplainRunId')\n",
|
||||||
|
"print(model_explainability_run_id)\n",
|
||||||
|
"if model_explainability_run_id is not None:\n",
|
||||||
|
" model_explainability_run = AutoMLRun(experiment=experiment, run_id=model_explainability_run_id)\n",
|
||||||
|
" model_explainability_run.wait_for_completion()\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"# Get the best run object\n",
|
||||||
|
"best_run, fitted_model = remote_run.get_output()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
"#### Download engineered feature importance from artifact store\n",
|
"#### Download engineered feature importance from artifact store\n",
|
||||||
"You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run."
|
"You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run."
|
||||||
]
|
]
|
||||||
@@ -475,13 +525,32 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = remote_run.get_output()\n",
|
|
||||||
"client = ExplanationClient.from_run(best_run)\n",
|
"client = ExplanationClient.from_run(best_run)\n",
|
||||||
"engineered_explanations = client.download_model_explanation(raw=False)\n",
|
"engineered_explanations = client.download_model_explanation(raw=False)\n",
|
||||||
"exp_data = engineered_explanations.get_feature_importance_dict()\n",
|
"exp_data = engineered_explanations.get_feature_importance_dict()\n",
|
||||||
"exp_data"
|
"exp_data"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
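Once `exp_data` has been downloaded, a common next step is to look at only the most influential features. The sketch below assumes `exp_data` is a plain mapping from feature name to a numeric importance. The sample values are made up for illustration and `top_features` is a hypothetical helper.

```python
def top_features(importance, k=5):
    """Return the k (feature, importance) pairs with the largest absolute importance."""
    ranked = sorted(importance.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]

# Hypothetical values standing in for the exp_data dictionary.
sample = {"age": 0.12, "balance": 0.40, "duration": 0.31, "campaign": 0.02}
for name, score in top_features(sample, k=2):
    print(name, score)
```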
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Download raw feature importance from artifact store\n",
|
||||||
|
"You can use ExplanationClient to download the raw feature explanations from the artifact store of the best_run."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"client = ExplanationClient.from_run(best_run)\n",
|
||||||
|
"raw_explanations = client.download_model_explanation(raw=True)\n",
|
||||||
|
"exp_data = raw_explanations.get_feature_importance_dict()\n",
|
||||||
|
"exp_data"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
@@ -515,7 +584,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.automl.core.onnx_convert import OnnxConverter\n",
|
"from azureml.automl.runtime.onnx_convert import OnnxConverter\n",
|
||||||
"onnx_fl_path = \"./best_model.onnx\"\n",
|
"onnx_fl_path = \"./best_model.onnx\"\n",
|
||||||
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
|
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
|
||||||
]
|
]
|
||||||
@@ -527,15 +596,6 @@
|
|||||||
"### Predict with the ONNX model, using onnxruntime package"
|
"### Predict with the ONNX model, using onnxruntime package"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"test_df = test_dataset.to_pandas_dataframe()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -552,12 +612,8 @@
|
|||||||
"else:\n",
|
"else:\n",
|
||||||
" python_version_compatible = False\n",
|
" python_version_compatible = False\n",
|
||||||
"\n",
|
"\n",
|
||||||
"try:\n",
|
"import onnxruntime\n",
|
||||||
" import onnxruntime\n",
|
"from azureml.automl.runtime.onnx_convert import OnnxInferenceHelper\n",
|
||||||
" from azureml.automl.core.onnx_convert import OnnxInferenceHelper \n",
|
|
||||||
" onnxrt_present = True\n",
|
|
||||||
"except ImportError:\n",
|
|
||||||
" onnxrt_present = False\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"def get_onnx_res(run):\n",
|
"def get_onnx_res(run):\n",
|
||||||
" res_path = 'onnx_resource.json'\n",
|
" res_path = 'onnx_resource.json'\n",
|
||||||
@@ -566,7 +622,8 @@
|
|||||||
" onnx_res = json.load(f)\n",
|
" onnx_res = json.load(f)\n",
|
||||||
" return onnx_res\n",
|
" return onnx_res\n",
|
||||||
"\n",
|
"\n",
|
||||||
"if onnxrt_present and python_version_compatible: \n",
|
"if python_version_compatible:\n",
|
||||||
|
" test_df = test_dataset.to_pandas_dataframe()\n",
|
||||||
" mdl_bytes = onnx_mdl.SerializeToString()\n",
|
" mdl_bytes = onnx_mdl.SerializeToString()\n",
|
||||||
" onnx_res = get_onnx_res(best_run)\n",
|
" onnx_res = get_onnx_res(best_run)\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -576,10 +633,7 @@
|
|||||||
" print(pred_onnx)\n",
|
" print(pred_onnx)\n",
|
||||||
" print(pred_prob_onnx)\n",
|
" print(pred_prob_onnx)\n",
|
||||||
"else:\n",
|
"else:\n",
|
||||||
" if not python_version_compatible:\n",
|
" print('Please use Python version 3.6 or 3.7 to run the inference helper.')"
|
||||||
" print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n",
|
|
||||||
" if not onnxrt_present:\n",
|
|
||||||
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -613,20 +667,6 @@
|
|||||||
"best_run, fitted_model = remote_run.get_output()"
|
"best_run, fitted_model = remote_run.get_output()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import os\n",
|
|
||||||
"import shutil\n",
|
|
||||||
"\n",
|
|
||||||
"sript_folder = os.path.join(os.getcwd(), 'inference')\n",
|
|
||||||
"project_folder = '/inference'\n",
|
|
||||||
"os.makedirs(project_folder, exist_ok=True)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -680,10 +720,10 @@
|
|||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime = \"python\", \n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=conda_env_file_name)\n",
|
||||||
" entry_script = script_file_name,\n",
|
"inference_config = InferenceConfig(entry_script=script_file_name, environment=myenv)\n",
|
||||||
" conda_file = conda_env_file_name)\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
" memory_gb = 1, \n",
|
" memory_gb = 1, \n",
|
||||||
@@ -826,13 +866,6 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014"
|
"[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014"
|
||||||
]
|
]
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": []
|
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
|||||||
@@ -2,12 +2,10 @@ name: auto-ml-classification-bank-marketing-all-features
|
|||||||
dependencies:
|
dependencies:
|
||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- interpret
|
|
||||||
- azureml-defaults
|
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- interpret
|
||||||
- onnxruntime
|
- onnxruntime==1.0.0
|
||||||
- azureml-explain-model
|
- azureml-explain-model
|
||||||
- azureml-contrib-interpret
|
- azureml-contrib-interpret
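The ONNX inference cell in the notebook diff above gates on `python_version_compatible` and asks the user to run Python 3.6 or 3.7. The hunk does not show how that flag is computed, so the following is only a minimal sketch, assuming a check against `sys.version_info`; the name `is_supported_python` is hypothetical and does not come from the notebook.

```python
import sys

# Hypothetical helper: the notebook's real check sits outside the visible hunk.
SUPPORTED_VERSIONS = {(3, 6), (3, 7)}


def is_supported_python(version_info=None):
    """Return True when the (major, minor) version is 3.6 or 3.7."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


python_version_compatible = is_supported_python()
if not python_version_compatible:
    print('Please use Python version 3.6 or 3.7 to run the inference helper.')
```

Passing the version tuple as a parameter keeps the check easy to exercise without changing interpreters.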
|
||||||
|
|||||||
@@ -210,10 +210,9 @@
|
|||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"n_cross_validations\": 3,\n",
|
" \"n_cross_validations\": 3,\n",
|
||||||
" \"primary_metric\": 'average_precision_score_weighted',\n",
|
" \"primary_metric\": 'average_precision_score_weighted',\n",
|
||||||
" \"preprocess\": True,\n",
|
|
||||||
" \"enable_early_stopping\": True,\n",
|
" \"enable_early_stopping\": True,\n",
|
||||||
" \"max_concurrent_iterations\": 2, # This is a limit for testing purpose, please increase it as per cluster size\n",
|
" \"max_concurrent_iterations\": 2, # This is a limit for testing purpose, please increase it as per cluster size\n",
|
||||||
" \"experiment_timeout_minutes\": 10, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ablity to find the best model possible\n",
|
" \"experiment_timeout_hours\": 0.2, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ablity to find the best model possible\n",
|
||||||
" \"verbosity\": logging.INFO,\n",
|
" \"verbosity\": logging.INFO,\n",
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -230,8 +229,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations this can run for a while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations this can run for a while."
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -284,7 +282,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"widget-rundetails-sample"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
@@ -306,7 +308,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"#### Explain model\n",
|
"#### Explain model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Automated ML models can be explained and visualized using the SDK Explainability library. [Learn how to use the explainer](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/auto-ml-model-explanations-remote-compute.ipynb)."
|
"Automated ML models can be explained and visualized using the SDK Explainability library. "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -335,17 +337,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Print the properties of the model\n",
|
"#### Print the properties of the model\n",
|
||||||
"The fitted_model is a python object and you can read the different properties of the object.\n",
|
"The fitted_model is a python object and you can read the different properties of the object.\n"
|
||||||
"See *Print the properties of the model* section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb)."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Deploy\n",
|
|
||||||
"\n",
|
|
||||||
"To deploy the model into a web service endpoint, see _Deploy_ section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb)"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -452,7 +444,7 @@
|
|||||||
"AML Compute"
|
"AML Compute"
|
||||||
],
|
],
|
||||||
"datasets": [
|
"datasets": [
|
||||||
"creditcard"
|
"Creditcard"
|
||||||
],
|
],
|
||||||
"deployment": [
|
"deployment": [
|
||||||
"None"
|
"None"
|
||||||
|
|||||||
@@ -2,10 +2,8 @@ name: auto-ml-classification-credit-card-fraud
|
|||||||
dependencies:
|
dependencies:
|
||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- interpret
|
|
||||||
- azureml-defaults
|
|
||||||
- azureml-explain-model
|
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- interpret
|
||||||
|
- azureml-explain-model
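The settings hunk above replaces `experiment_timeout_minutes: 10` with `experiment_timeout_hours: 0.2`. As a quick arithmetic check, 0.2 hours is 12 minutes, so the new limit is slightly longer than the old one; an exact 10-minute equivalent would be 10/60 hours. A minimal sketch of the conversion follows; the helper names are illustrative and are not part of the SDK.

```python
def minutes_to_hours(minutes):
    """Convert a timeout expressed in minutes to hours."""
    return minutes / 60.0


def hours_to_minutes(hours):
    """Convert a timeout expressed in hours to minutes."""
    return hours * 60.0


old_timeout_minutes = 10
new_timeout_hours = 0.2

# Rounded for display because of floating-point representation.
print(round(hours_to_minutes(new_timeout_hours), 6))    # 12.0
print(round(minutes_to_hours(old_timeout_minutes), 4))  # 0.1667
```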
|
||||||
|
|||||||
@@ -0,0 +1,560 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Automated Machine Learning\n",
|
||||||
|
"_**Text Classification Using Deep Learning**_\n",
|
||||||
|
"\n",
|
||||||
|
"## Contents\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
"1. [Data](#Data)\n",
|
||||||
|
"1. [Train](#Train)\n",
|
||||||
|
"1. [Evaluate](#Evaluate)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Introduction\n",
|
||||||
|
"This notebook demonstrates classification with text data using deep learning in AutoML.\n",
|
||||||
|
"\n",
|
||||||
|
"AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data. Depending on the compute cluster the user provides, AutoML tried out Bidirectional Encoder Representations from Transformers (BERT) when a GPU compute is used, and Bidirectional Long-Short Term neural network (BiLSTM) when a CPU compute is used, thereby optimizing the choice of DNN for the uesr's setup.\n",
|
||||||
|
"\n",
|
||||||
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
|
"\n",
|
||||||
|
"An Enterprise workspace is required for this notebook. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade).\n",
|
||||||
|
"\n",
|
||||||
|
"Notebook synopsis:\n",
|
||||||
|
"1. Creating an Experiment in an existing Workspace\n",
|
||||||
|
"2. Configuration and remote run of AutoML for a text dataset (20 Newsgroups dataset from scikit-learn) for classification\n",
|
||||||
|
"3. Evaluating the final model on a test set\n",
|
||||||
|
"4. Deploying the model on ACI"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import logging\n",
|
||||||
|
"import os\n",
|
||||||
|
"import shutil\n",
|
||||||
|
"\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"from azureml.core.dataset import Dataset\n",
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"from azureml.core.run import Run\n",
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"from azureml.core.model import Model \n",
|
||||||
|
"from helper import run_inference, get_result_df\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
|
"from sklearn.datasets import fetch_20newsgroups"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose an experiment name.\n",
|
||||||
|
"experiment_name = 'automl-classification-text-dnn'\n",
|
||||||
|
"\n",
|
||||||
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
|
"\n",
|
||||||
|
"output = {}\n",
|
||||||
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
|
"output['Workspace Name'] = ws.name\n",
|
||||||
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
|
"output['Location'] = ws.location\n",
|
||||||
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
|
"outputDf.T"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Set up a compute cluster\n",
|
||||||
|
"This section uses a user-provided compute cluster (named \"cpu-cluster\" in this example). If a cluster with this name does not exist in the user's workspace, the below code will create a new cluster. You can choose the parameters of the cluster as mentioned in the comments.\n",
|
||||||
|
"\n",
|
||||||
|
"Whether you provide/select a CPU or GPU cluster, AutoML will choose the appropriate DNN for that setup - BiLSTM or BERT text featurizer will be included in the candidate featurizers on CPU and GPU respectively."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Choose a name for your cluster.\n",
|
||||||
|
"amlcompute_cluster_name = \"cpu-dnntext\"\n",
|
||||||
|
"\n",
|
||||||
|
"found = False\n",
|
||||||
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||||
|
" found = True\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||||
|
"\n",
|
||||||
|
"if not found:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # CPU for BiLSTM\n",
|
||||||
|
" # To use BERT, select a GPU such as \"STANDARD_NC6\" \n",
|
||||||
|
" # or similar GPU option\n",
|
||||||
|
" # available in your workspace\n",
|
||||||
|
" max_nodes = 6)\n",
|
||||||
|
"\n",
|
||||||
|
" # Create the cluster\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
|
"\n",
|
||||||
|
"print('Checking cluster status...')\n",
|
||||||
|
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
|
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
|
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
|
"\n",
|
||||||
|
"# For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Get data\n",
|
||||||
|
"For this notebook we will use 20 Newsgroups data from scikit-learn. We filter the data to contain four classes and take a sample as training data. Please note that for accuracy improvement, more data is needed. For this notebook we provide a small-data example so that you can use this template to use with your larger sized data."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"data_dir = \"text-dnn-data\" # Local directory to store data\n",
|
||||||
|
"blobstore_datadir = data_dir # Blob store directory to store data in\n",
|
||||||
|
"target_column_name = 'y'\n",
|
||||||
|
"feature_column_name = 'X'\n",
|
||||||
|
"\n",
|
||||||
|
"def get_20newsgroups_data():\n",
|
||||||
|
" '''Fetches 20 Newsgroups data from scikit-learn\n",
|
||||||
|
" Returns them in form of pandas dataframes\n",
|
||||||
|
" '''\n",
|
||||||
|
" remove = ('headers', 'footers', 'quotes')\n",
|
||||||
|
" categories = [\n",
|
||||||
|
" 'alt.atheism',\n",
|
||||||
|
" 'talk.religion.misc',\n",
|
||||||
|
" 'comp.graphics',\n",
|
||||||
|
" 'sci.space',\n",
|
||||||
|
" ]\n",
|
||||||
|
"\n",
|
||||||
|
" data = fetch_20newsgroups(subset = 'train', categories = categories,\n",
|
||||||
|
" shuffle = True, random_state = 42,\n",
|
||||||
|
" remove = remove)\n",
|
||||||
|
" data = pd.DataFrame({feature_column_name: data.data, target_column_name: data.target})\n",
|
||||||
|
"\n",
|
||||||
|
" data_train = data[:200]\n",
|
||||||
|
" data_test = data[200:300] \n",
|
||||||
|
"\n",
|
||||||
|
" data_train = remove_blanks_20news(data_train, feature_column_name, target_column_name)\n",
|
||||||
|
" data_test = remove_blanks_20news(data_test, feature_column_name, target_column_name)\n",
|
||||||
|
" \n",
|
||||||
|
" return data_train, data_test\n",
|
||||||
|
" \n",
|
||||||
|
"def remove_blanks_20news(data, feature_column_name, target_column_name):\n",
|
||||||
|
" \n",
|
||||||
|
" data[feature_column_name] = data[feature_column_name].replace(r'\\n', ' ', regex=True).apply(lambda x: x.strip())\n",
|
||||||
|
" data = data[data[feature_column_name] != '']\n",
|
||||||
|
" \n",
|
||||||
|
" return data"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Featch data and upload to datastore for use in training"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"data_train, data_test = get_20newsgroups_data()\n",
|
||||||
|
"\n",
|
||||||
|
"if not os.path.isdir(data_dir):\n",
|
||||||
|
" os.mkdir(data_dir)\n",
|
||||||
|
" \n",
|
||||||
|
"train_data_fname = data_dir + '/train_data.csv'\n",
|
||||||
|
"test_data_fname = data_dir + '/test_data.csv'\n",
|
||||||
|
"\n",
|
||||||
|
"data_train.to_csv(train_data_fname, index=False)\n",
|
||||||
|
"data_test.to_csv(test_data_fname, index=False)\n",
|
||||||
|
"\n",
|
||||||
|
"datastore = ws.get_default_datastore()\n",
|
||||||
|
"datastore.upload(src_dir=data_dir, target_path=blobstore_datadir,\n",
|
||||||
|
" overwrite=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"train_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, blobstore_datadir + '/train_data.csv')])"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Prepare AutoML run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"This step requires an Enterprise workspace to gain access to this feature. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
" \"experiment_timeout_minutes\": 20,\n",
|
||||||
|
" \"primary_metric\": 'accuracy',\n",
|
||||||
|
" \"max_concurrent_iterations\": 4, \n",
|
||||||
|
" \"max_cores_per_iteration\": -1,\n",
|
||||||
|
" \"enable_dnn\": True,\n",
|
||||||
|
" \"enable_early_stopping\": True,\n",
|
||||||
|
" \"validation_size\": 0.3,\n",
|
||||||
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
" \"enable_voting_ensemble\": False,\n",
|
||||||
|
" \"enable_stack_ensemble\": False,\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
|
" debug_log = 'automl_errors.log',\n",
|
||||||
|
" compute_target=compute_target,\n",
|
||||||
|
" training_data=train_dataset,\n",
|
||||||
|
" label_column_name=target_column_name,\n",
|
||||||
|
" **automl_settings\n",
|
||||||
|
" )"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Submit AutoML Run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"automl_run = experiment.submit(automl_config, show_output=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"automl_run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Displaying the run objects gives you links to the visual tools in the Azure Portal. Go try them!"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Retrieve the Best Model\n",
|
||||||
|
"Below we select the best model pipeline from our iterations, use it to test on test data on the same compute cluster."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You can test the model locally to get a feel of the input/output. This step may require additional package installations such as pytorch."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#best_run, fitted_model = automl_run.get_output()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploying the model\n",
|
||||||
|
"We now use the best fitted model from the AutoML Run to make predictions on the test set. "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Get results stats, extract the best model from AutoML run, download and register the resultant best model"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"summary_df = get_result_df(automl_run)\n",
|
||||||
|
"best_dnn_run_id = summary_df['run_id'].iloc[0]\n",
|
||||||
|
"best_dnn_run = Run(experiment, best_dnn_run_id)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model_dir = 'Model' # Local folder where the model will be stored temporarily\n",
|
||||||
|
"if not os.path.isdir(model_dir):\n",
|
||||||
|
" os.mkdir(model_dir)\n",
|
||||||
|
" \n",
|
||||||
|
"best_dnn_run.download_file('outputs/model.pkl', model_dir + '/model.pkl')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Register the model in your Azure Machine Learning Workspace. If you previously registered a model, please make sure to delete it so as to replace it with this new model."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Register the model\n",
|
||||||
|
"model_name = 'textDNN-20News'\n",
|
||||||
|
"model = Model.register(model_path = model_dir + '/model.pkl',\n",
|
||||||
|
" model_name = model_name,\n",
|
||||||
|
" tags=None,\n",
|
||||||
|
" workspace=ws)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Evaluate on Test Data"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We now use the best fitted model from the AutoML Run to make predictions on the test set. \n",
|
||||||
|
"\n",
|
||||||
|
"Test set schema should match that of the training set."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"test_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, blobstore_datadir + '/test_data.csv')])\n",
|
||||||
|
"\n",
|
||||||
|
"# preview the first 3 rows of the dataset\n",
|
||||||
|
"test_dataset.take(3).to_pandas_dataframe()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"test_experiment = Experiment(ws, experiment_name + \"_test\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"script_folder = os.path.join(os.getcwd(), 'inference')\n",
|
||||||
|
"os.makedirs(script_folder, exist_ok=True)\n",
|
||||||
|
"shutil.copy2('infer.py', script_folder)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"test_run = run_inference(test_experiment, compute_target, script_folder, best_dnn_run, test_dataset,\n",
|
||||||
|
" target_column_name, model_name)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Display computed metrics"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"test_run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"RunDetails(test_run).show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"test_run.wait_for_completion()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"pd.Series(test_run.get_metrics())"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "anshirga"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"compute": [
|
||||||
|
"AML Compute"
|
||||||
|
],
|
||||||
|
"datasets": [
|
||||||
|
"None"
|
||||||
|
],
|
||||||
|
"deployment": [
|
||||||
|
"None"
|
||||||
|
],
|
||||||
|
"exclude_from_index": false,
|
||||||
|
"framework": [
|
||||||
|
"None"
|
||||||
|
],
|
||||||
|
"friendly_name": "DNN Text Featurization",
|
||||||
|
"index_order": 2,
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.7"
|
||||||
|
},
|
||||||
|
"tags": [
|
||||||
|
"None"
|
||||||
|
],
|
||||||
|
"task": "Text featurization using DNNs for classification"
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
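The `remove_blanks_20news` helper in the notebook above replaces newlines with spaces, strips surrounding whitespace, and drops rows whose text is then empty. The same idea can be sketched without pandas. This is an illustration on made-up sample rows, not the notebook's implementation, and `clean_rows` is a hypothetical name.

```python
import re


def clean_rows(rows):
    """Replace newlines with spaces, strip, and drop rows with empty text.

    `rows` is a list of (text, label) pairs.
    """
    cleaned = []
    for text, label in rows:
        text = re.sub(r'\n', ' ', text).strip()
        if text != '':
            cleaned.append((text, label))
    return cleaned


# Made-up rows: the second one becomes empty after cleaning and is dropped.
sample = [("Hello\nworld\n", 0), ("\n\n", 1), ("  space probe ", 3)]
print(clean_rows(sample))  # [('Hello world', 0), ('space probe', 3)]
```

Filtering after cleaning matters here: a post that contains only newlines is not empty until the replacement and strip have run.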
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-classification-text-dnn
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- azureml-train
|
||||||
@@ -0,0 +1,60 @@
|
|||||||
|
import pandas as pd
|
||||||
|
from azureml.core import Environment
|
||||||
|
from azureml.core.conda_dependencies import CondaDependencies
|
||||||
|
from azureml.train.estimator import Estimator
|
||||||
|
from azureml.core.run import Run
|
||||||
|
|
||||||
|
|
||||||
|
def run_inference(test_experiment, compute_target, script_folder, train_run,
|
||||||
|
test_dataset, target_column_name, model_name):
|
||||||
|
|
||||||
|
train_run.download_file('outputs/conda_env_v_1_0_0.yml',
|
||||||
|
'inference/condafile.yml')
|
||||||
|
|
||||||
|
inference_env = Environment("myenv")
|
||||||
|
inference_env.docker.enabled = True
|
||||||
|
inference_env.python.conda_dependencies = CondaDependencies(
|
||||||
|
conda_dependencies_file_path='inference/condafile.yml')
|
||||||
|
|
||||||
|
est = Estimator(source_directory=script_folder,
|
||||||
|
entry_script='infer.py',
|
||||||
|
script_params={
|
||||||
|
'--target_column_name': target_column_name,
|
||||||
|
'--model_name': model_name
|
||||||
|
},
|
||||||
|
inputs=[test_dataset.as_named_input('test_data')],
|
||||||
|
compute_target=compute_target,
|
||||||
|
environment_definition=inference_env)
|
||||||
|
|
||||||
|
run = test_experiment.submit(
|
||||||
|
est, tags={
|
||||||
|
'training_run_id': train_run.id,
|
||||||
|
'run_algorithm': train_run.properties['run_algorithm'],
|
||||||
|
'valid_score': train_run.properties['score'],
|
||||||
|
'primary_metric': train_run.properties['primary_metric']
|
||||||
|
})
|
||||||
|
|
||||||
|
run.log("run_algorithm", run.tags['run_algorithm'])
|
||||||
|
return run
|
||||||
|
|
||||||
|
|
||||||
|
def get_result_df(remote_run):
|
||||||
|
|
||||||
|
children = list(remote_run.get_children(recursive=True))
|
||||||
|
summary_df = pd.DataFrame(index=['run_id', 'run_algorithm',
|
||||||
|
'primary_metric', 'Score'])
|
||||||
|
goal_minimize = False
|
||||||
|
for run in children:
|
||||||
|
if('run_algorithm' in run.properties and 'score' in run.properties):
|
||||||
|
summary_df[run.id] = [run.id, run.properties['run_algorithm'],
|
||||||
|
run.properties['primary_metric'],
|
||||||
|
float(run.properties['score'])]
|
||||||
|
if('goal' in run.properties):
|
||||||
|
goal_minimize = run.properties['goal'].split('_')[-1] == 'min'
|
||||||
|
|
||||||
|
summary_df = summary_df.T.sort_values(
|
||||||
|
'Score',
|
||||||
|
ascending=goal_minimize).drop_duplicates(['run_algorithm'])
|
||||||
|
summary_df = summary_df.set_index('run_algorithm')
|
||||||
|
|
||||||
|
return summary_df
|
||||||
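The selection logic in `get_result_df` (sort by score in the direction of the goal, then keep one row per algorithm) can be tried without a workspace. The following is a minimal sketch under stated assumptions: the `records` list is hypothetical data standing in for `run.properties` of the child runs, and the frame is built row-wise rather than transposed as in the helper.

```python
import pandas as pd

# Hypothetical child-run records standing in for run.properties.
records = [
    {"run_id": "r1", "run_algorithm": "LightGBM",
     "primary_metric": "AUC_weighted", "Score": 0.91},
    {"run_id": "r2", "run_algorithm": "LightGBM",
     "primary_metric": "AUC_weighted", "Score": 0.94},
    {"run_id": "r3", "run_algorithm": "LogisticRegression",
     "primary_metric": "AUC_weighted", "Score": 0.88},
]

# A goal ending in "_max" means higher scores are better, so sort descending.
goal_minimize = False

summary_df = (
    pd.DataFrame(records)
    .sort_values("Score", ascending=goal_minimize)
    # After sorting, the first row per algorithm is its best run.
    .drop_duplicates(["run_algorithm"])
    .set_index("run_algorithm")
)
print(summary_df)
```

Because `drop_duplicates` keeps the first occurrence, the sort direction decides which run survives for each algorithm.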
@@ -0,0 +1,54 @@
|
|||||||
|
import numpy as np
|
||||||
|
import argparse
|
||||||
|
from azureml.core import Run
|
||||||
|
from sklearn.externals import joblib
|
||||||
|
from azureml.automl.core._vendor.automl.client.core.common import metrics
|
||||||
|
from automl.client.core.common import constants
|
||||||
|
from azureml.core.model import Model
|
||||||
|
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser()
|
||||||
|
parser.add_argument(
|
||||||
|
'--target_column_name', type=str, dest='target_column_name',
|
||||||
|
help='Target Column Name')
|
||||||
|
parser.add_argument(
|
||||||
|
'--model_name', type=str, dest='model_name',
|
||||||
|
help='Name of registered model')
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
target_column_name = args.target_column_name
|
||||||
|
model_name = args.model_name
|
||||||
|
|
||||||
|
print('args passed are: ')
|
||||||
|
print('Target column name: ', target_column_name)
|
||||||
|
print('Name of registered model: ', model_name)
|
||||||
|
|
||||||
|
model_path = Model.get_model_path(model_name)
|
||||||
|
# deserialize the model file back into a sklearn model
|
||||||
|
model = joblib.load(model_path)
|
||||||
|
|
||||||
|
run = Run.get_context()
|
||||||
|
# get input dataset by name
|
||||||
|
test_dataset = run.input_datasets['test_data']
|
||||||
|
|
||||||
|
X_test_df = test_dataset.drop_columns(columns=[target_column_name]) \
|
||||||
|
.to_pandas_dataframe()
|
||||||
|
y_test_df = test_dataset.with_timestamp_columns(None) \
|
||||||
|
.keep_columns(columns=[target_column_name]) \
|
||||||
|
.to_pandas_dataframe()
|
||||||
|
|
||||||
|
predicted = model.predict_proba(X_test_df)
|
||||||
|
|
||||||
|
# use automl metrics module
|
||||||
|
scores = metrics.compute_metrics_classification(
|
||||||
|
np.array(predicted),
|
||||||
|
np.array(y_test_df),
|
||||||
|
class_labels=model.classes_,
|
||||||
|
metrics=list(constants.Metric.SCALAR_CLASSIFICATION_SET)
|
||||||
|
)
|
||||||
|
|
||||||
|
print("scores:")
|
||||||
|
print(scores)
|
||||||
|
|
||||||
|
for key, value in scores.items():
|
||||||
|
run.log(key, value)
|
||||||
@@ -210,7 +210,24 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data Ingestion Pipeline \n",
|
"## Data Ingestion Pipeline \n",
|
||||||
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You can replace this with your own dataset, or you can skip this pipeline if you already have a time-series based `TabularDataset`.\n",
|
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You can replace this with your own dataset, or you can skip this pipeline if you already have a time-series based `TabularDataset`.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# The name and target column of the Dataset to create \n",
|
||||||
|
"dataset = \"NOAA-Weather-DS4\"\n",
|
||||||
|
"target_column_name = \"temperature\""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"### Upload Data Step\n",
|
"### Upload Data Step\n",
|
||||||
"The data ingestion pipeline has a single step with a script to query the latest weather data and upload it to the blob store. During the first run, the script will create and register a time-series based `TabularDataset` with the past one week of weather data. For each subsequent run, the script will create a partition in the blob store by querying NOAA for new weather data since the last modified time of the dataset (`dataset.data_changed_time`) and creating a data.csv file."
|
"The data ingestion pipeline has a single step with a script to query the latest weather data and upload it to the blob store. During the first run, the script will create and register a time-series based `TabularDataset` with the past one week of weather data. For each subsequent run, the script will create a partition in the blob store by querying NOAA for new weather data since the last modified time of the dataset (`dataset.data_changed_time`) and creating a data.csv file."
|
||||||
@@ -225,8 +242,6 @@
|
|||||||
"from azureml.pipeline.core import Pipeline, PipelineParameter\n",
|
"from azureml.pipeline.core import Pipeline, PipelineParameter\n",
|
||||||
"from azureml.pipeline.steps import PythonScriptStep\n",
|
"from azureml.pipeline.steps import PythonScriptStep\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# The name of the Dataset to create \n",
|
|
||||||
"dataset = \"NOAA-Weather-DS4\"\n",
|
|
||||||
"ds_name = PipelineParameter(name=\"ds_name\", default_value=dataset)\n",
|
"ds_name = PipelineParameter(name=\"ds_name\", default_value=dataset)\n",
|
||||||
"upload_data_step = PythonScriptStep(script_name=\"upload_weather_data.py\", \n",
|
"upload_data_step = PythonScriptStep(script_name=\"upload_weather_data.py\", \n",
|
||||||
" allow_reuse=False,\n",
|
" allow_reuse=False,\n",
|
||||||
@@ -272,7 +287,7 @@
|
|||||||
"## Training Pipeline\n",
|
"## Training Pipeline\n",
|
||||||
"### Prepare Training Data Step\n",
|
"### Prepare Training Data Step\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Script to bring data into common X,y format. We need to set allow_reuse flag to False to allow the pipeline to run even when inputs don't change. We also need the name of the model to check the time the model was last trained."
|
"Script to check if new data is available since the model was last trained. If no new data is available, we cancel the remaining pipeline steps. We need to set allow_reuse flag to False to allow the pipeline to run even when inputs don't change. We also need the name of the model to check the time the model was last trained."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -283,11 +298,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.pipeline.core import PipelineData\n",
|
"from azureml.pipeline.core import PipelineData\n",
|
||||||
"\n",
|
"\n",
|
||||||
"target_column = PipelineParameter(\"target_column\", default_value=\"y\")\n",
|
|
||||||
"# The model name with which to register the trained model in the workspace.\n",
|
"# The model name with which to register the trained model in the workspace.\n",
|
||||||
"model_name = PipelineParameter(\"model_name\", default_value=\"y\")\n",
|
"model_name = PipelineParameter(\"model_name\", default_value=\"noaaweatherds\")"
|
||||||
"output_x = PipelineData(\"output_x\", datastore=dstor)\n",
|
|
||||||
"output_y = PipelineData(\"output_y\", datastore=dstor)"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -299,16 +311,23 @@
|
|||||||
"data_prep_step = PythonScriptStep(script_name=\"check_data.py\", \n",
|
"data_prep_step = PythonScriptStep(script_name=\"check_data.py\", \n",
|
||||||
" allow_reuse=False,\n",
|
" allow_reuse=False,\n",
|
||||||
" name=\"check_data\",\n",
|
" name=\"check_data\",\n",
|
||||||
" arguments=[\"--target_column\", target_column,\n",
|
" arguments=[\"--ds_name\", ds_name,\n",
|
||||||
" \"--output_x\", output_x,\n",
|
" \"--model_name\", model_name],\n",
|
||||||
" \"--output_y\", output_y,\n",
|
|
||||||
" \"--ds_name\", ds_name,\n",
|
|
||||||
" \"--model_name\", model_name],\n",
|
|
||||||
" outputs=[output_x, output_y], \n",
|
|
||||||
" compute_target=compute_target, \n",
|
" compute_target=compute_target, \n",
|
||||||
" runconfig=conda_run_config)"
|
" runconfig=conda_run_config)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Dataset\n",
|
||||||
|
"train_ds = Dataset.get_by_name(ws, dataset)\n",
|
||||||
|
"train_ds = train_ds.drop_columns([\"partition_date\"])"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
@@ -323,14 +342,14 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.train.automl import AutoMLStep, AutoMLConfig\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
|
"from azureml.train.automl import AutoMLStep\n",
|
||||||
"\n",
|
"\n",
|
||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"iteration_timeout_minutes\": 20,\n",
|
" \"iteration_timeout_minutes\": 10,\n",
|
||||||
" \"experiment_timeout_minutes\": 30,\n",
|
" \"experiment_timeout_hours\": 0.2,\n",
|
||||||
" \"n_cross_validations\": 3,\n",
|
" \"n_cross_validations\": 3,\n",
|
||||||
" \"primary_metric\": 'r2_score',\n",
|
" \"primary_metric\": 'r2_score',\n",
|
||||||
" \"preprocess\": True,\n",
|
|
||||||
" \"max_concurrent_iterations\": 3,\n",
|
" \"max_concurrent_iterations\": 3,\n",
|
||||||
" \"max_cores_per_iteration\": -1,\n",
|
" \"max_cores_per_iteration\": -1,\n",
|
||||||
" \"verbosity\": logging.INFO,\n",
|
" \"verbosity\": logging.INFO,\n",
|
||||||
@@ -341,8 +360,8 @@
|
|||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
" path = \".\",\n",
|
" path = \".\",\n",
|
||||||
" compute_target=compute_target,\n",
|
" compute_target=compute_target,\n",
|
||||||
" run_configuration=conda_run_config,\n",
|
" training_data = train_ds,\n",
|
||||||
" data_script = \"get_data.py\",\n",
|
" label_column_name = target_column_name,\n",
|
||||||
" **automl_settings\n",
|
" **automl_settings\n",
|
||||||
" )"
|
" )"
|
||||||
]
|
]
|
||||||
@@ -358,7 +377,7 @@
|
|||||||
"metrics_output_name = 'metrics_output'\n",
|
"metrics_output_name = 'metrics_output'\n",
|
||||||
"best_model_output_name = 'best_model_output'\n",
|
"best_model_output_name = 'best_model_output'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"metirics_data = PipelineData(name='metrics_data',\n",
|
"metrics_data = PipelineData(name='metrics_data',\n",
|
||||||
" datastore=dstor,\n",
|
" datastore=dstor,\n",
|
||||||
" pipeline_output_name=metrics_output_name,\n",
|
" pipeline_output_name=metrics_output_name,\n",
|
||||||
" training_output=TrainingOutput(type='Metrics'))\n",
|
" training_output=TrainingOutput(type='Metrics'))\n",
|
||||||
@@ -377,8 +396,7 @@
|
|||||||
"automl_step = AutoMLStep(\n",
|
"automl_step = AutoMLStep(\n",
|
||||||
" name='automl_module',\n",
|
" name='automl_module',\n",
|
||||||
" automl_config=automl_config,\n",
|
" automl_config=automl_config,\n",
|
||||||
" inputs=[output_x, output_y],\n",
|
" outputs=[metrics_data, model_data],\n",
|
||||||
" outputs=[metirics_data, model_data],\n",
|
|
||||||
" allow_reuse=False)"
|
" allow_reuse=False)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -431,7 +449,7 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"training_pipeline_run = experiment.submit(training_pipeline, pipeline_parameters={\n",
|
"training_pipeline_run = experiment.submit(training_pipeline, pipeline_parameters={\n",
|
||||||
" \"target_column\": \"temperature\", \"ds_name\": dataset, \"model_name\": \"noaaweatherds\"})"
|
" \"ds_name\": dataset, \"model_name\": \"noaaweatherds\"})"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -440,7 +458,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"training_pipeline_run.wait_for_completion()"
|
"training_pipeline_run.wait_for_completion(show_output=False)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -474,7 +492,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.pipeline.core import Schedule\n",
|
"from azureml.pipeline.core import Schedule\n",
|
||||||
"schedule = Schedule.create(workspace=ws, name=\"RetrainingSchedule\",\n",
|
"schedule = Schedule.create(workspace=ws, name=\"RetrainingSchedule\",\n",
|
||||||
" pipeline_parameters={\"target_column\": \"temperature\",\"ds_name\": dataset, \"model_name\": \"noaaweatherds\"},\n",
|
" pipeline_parameters={\"ds_name\": dataset, \"model_name\": \"noaaweatherds\"},\n",
|
||||||
" pipeline_id=published_pipeline.id, \n",
|
" pipeline_id=published_pipeline.id, \n",
|
||||||
" experiment_name=experiment_name, \n",
|
" experiment_name=experiment_name, \n",
|
||||||
" datastore=dstor,\n",
|
" datastore=dstor,\n",
|
||||||
|
|||||||
@@ -3,7 +3,6 @@ dependencies:
|
|||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-pipeline
|
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- azureml-pipeline
|
||||||
|
|||||||
@@ -15,32 +15,16 @@ if type(run) == _OfflineRun:
|
|||||||
else:
|
else:
|
||||||
ws = run.experiment.workspace
|
ws = run.experiment.workspace
|
||||||
|
|
||||||
|
print("Check for new data.")
|
||||||
def write_output(df, path):
|
|
||||||
os.makedirs(path, exist_ok=True)
|
|
||||||
print("%s created" % path)
|
|
||||||
df.to_csv(path + "/part-00000", index=False)
|
|
||||||
|
|
||||||
|
|
||||||
print("Check for new data and prepare the data")
|
|
||||||
|
|
||||||
parser = argparse.ArgumentParser("split")
|
parser = argparse.ArgumentParser("split")
|
||||||
parser.add_argument("--target_column", type=str, help="input split features")
|
|
||||||
parser.add_argument("--ds_name", help="input dataset name")
|
parser.add_argument("--ds_name", help="input dataset name")
|
||||||
parser.add_argument("--model_name", help="name of the deployed model")
|
parser.add_argument("--model_name", help="name of the deployed model")
|
||||||
parser.add_argument("--output_x", type=str,
|
|
||||||
help="output features")
|
|
||||||
parser.add_argument("--output_y", type=str,
|
|
||||||
help="output labels")
|
|
||||||
|
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
||||||
print("Argument 1(ds_name): %s" % args.ds_name)
|
print("Argument 1(ds_name): %s" % args.ds_name)
|
||||||
print("Argument 2(target_column): %s" % args.target_column)
|
print("Argument 2(model_name): %s" % args.model_name)
|
||||||
print("Argument 3(model_name): %s" % args.model_name)
|
|
||||||
print("Argument 4(output_x): %s" % args.output_x)
|
|
||||||
print("Argument 5(output_y): %s" % args.output_y)
|
|
||||||
|
|
||||||
# Get the latest registered model
|
# Get the latest registered model
|
||||||
try:
|
try:
|
||||||
@@ -54,22 +38,9 @@ except Exception as e:
|
|||||||
train_ds = Dataset.get_by_name(ws, args.ds_name)
|
train_ds = Dataset.get_by_name(ws, args.ds_name)
|
||||||
dataset_changed_time = train_ds.data_changed_time
|
dataset_changed_time = train_ds.data_changed_time
|
||||||
|
|
||||||
if dataset_changed_time > last_train_time:
|
if not dataset_changed_time > last_train_time:
|
||||||
# New data is available since the model was last trained
|
|
||||||
print("Dataset was last updated on {0}. Retraining...".format(dataset_changed_time))
|
|
||||||
train_ds = train_ds.drop_columns(["partition_date"])
|
|
||||||
X_train = train_ds.drop_columns(
|
|
||||||
columns=[args.target_column]).to_pandas_dataframe()
|
|
||||||
y_train = train_ds.keep_columns(
|
|
||||||
columns=[args.target_column]).to_pandas_dataframe()
|
|
||||||
|
|
||||||
non_null = y_train[args.target_column].notnull()
|
|
||||||
y = y_train[non_null]
|
|
||||||
X = X_train[non_null]
|
|
||||||
|
|
||||||
if not (args.output_x is None and args.output_y is None):
|
|
||||||
write_output(X, args.output_x)
|
|
||||||
write_output(y, args.output_y)
|
|
||||||
else:
|
|
||||||
print("Cancelling run since there is no new data.")
|
print("Cancelling run since there is no new data.")
|
||||||
run.parent.cancel()
|
run.parent.cancel()
|
||||||
|
else:
|
||||||
|
# New data is available since the model was last trained
|
||||||
|
print("Dataset was last updated on {0}. Retraining...".format(dataset_changed_time))
|
||||||
|
|||||||
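The reworked branch in `check_data.py` reduces to a single comparison between two timestamps. A minimal sketch of that decision, assuming both values are timezone-aware `datetime` objects; `should_retrain` is a hypothetical helper name that does not appear in the script:

```python
from datetime import datetime, timezone


def should_retrain(dataset_changed_time, last_train_time):
    """Return True only when the dataset changed after the last training."""
    # Hypothetical helper: the script cancels the parent run when this is False.
    return dataset_changed_time > last_train_time


last_train_time = datetime(2020, 1, 10, tzinfo=timezone.utc)
newer_data = datetime(2020, 1, 11, tzinfo=timezone.utc)

print(should_retrain(newer_data, last_train_time))
print(should_retrain(last_train_time, last_train_time))
```

An unchanged timestamp counts as "no new data", which matches the `if not dataset_changed_time > last_train_time` test in the hunk above.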
@@ -1,15 +0,0 @@
|
|||||||
import os
|
|
||||||
import pandas as pd
|
|
||||||
|
|
||||||
|
|
||||||
def get_data():
|
|
||||||
print("In get_data")
|
|
||||||
print(os.environ['AZUREML_DATAREFERENCE_output_x'])
|
|
||||||
X_train = pd.read_csv(
|
|
||||||
os.environ['AZUREML_DATAREFERENCE_output_x'] + "/part-00000")
|
|
||||||
y_train = pd.read_csv(
|
|
||||||
os.environ['AZUREML_DATAREFERENCE_output_y'] + "/part-00000")
|
|
||||||
|
|
||||||
print(X_train.head(3))
|
|
||||||
|
|
||||||
return {"X": X_train.values, "y": y_train.values.flatten()}
|
|
||||||
@@ -58,7 +58,7 @@ except Exception as e:
|
|||||||
print(traceback.format_exc())
|
print(traceback.format_exc())
|
||||||
print("Dataset with name {0} not found, registering new dataset.".format(args.ds_name))
|
print("Dataset with name {0} not found, registering new dataset.".format(args.ds_name))
|
||||||
register_dataset = True
|
register_dataset = True
|
||||||
end_time_last_slice = datetime.today() - relativedelta(weeks=1)
|
end_time_last_slice = datetime.today() - relativedelta(weeks=2)
|
||||||
|
|
||||||
end_time = datetime.utcnow()
|
end_time = datetime.utcnow()
|
||||||
train_df = get_noaa_data(end_time_last_slice, end_time)
|
train_df = get_noaa_data(end_time_last_slice, end_time)
|
||||||
@@ -80,10 +80,10 @@ if train_df.size > 0:
|
|||||||
target_path=folder_name,
|
target_path=folder_name,
|
||||||
overwrite=True,
|
overwrite=True,
|
||||||
show_progress=True)
|
show_progress=True)
|
||||||
|
|
||||||
if register_dataset:
|
|
||||||
ds = Dataset.Tabular.from_delimited_files(dstor.path("{}/**/*.csv".format(
|
|
||||||
args.ds_name)), partition_format='/{partition_date:yyyy/MM/dd/hh/mm/ss}/data.csv')
|
|
||||||
ds.register(ws, name=args.ds_name)
|
|
||||||
else:
|
else:
|
||||||
print("No new data since {0}.".format(end_time_last_slice))
|
print("No new data since {0}.".format(end_time_last_slice))
|
||||||
|
|
||||||
|
if register_dataset:
|
||||||
|
ds = Dataset.Tabular.from_delimited_files(dstor.path("{}/**/*.csv".format(
|
||||||
|
args.ds_name)), partition_format='/{partition_date:yyyy/MM/dd/HH/mm/ss}/data.csv')
|
||||||
|
ds.register(ws, name=args.ds_name)
|
||||||
|
|||||||
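The hunk above changes the hour token in `partition_format` from `hh` to `HH`. As an illustration only, Python's `strftime` has the same kind of distinction between a 12-hour token (`%I`) and a 24-hour token (`%H`); the sketch below assumes the partition folders follow a 24-hour layout, and `partition_path` is a hypothetical helper rather than part of the upload script.

```python
from datetime import datetime


def partition_path(ds_name, ts):
    # Hypothetical helper: 24-hour folder layout, analogous to
    # yyyy/MM/dd/HH/mm/ss in the partition_format string.
    return "{}/{}/data.csv".format(ds_name, ts.strftime("%Y/%m/%d/%H/%M/%S"))


morning = datetime(2020, 1, 15, 3, 0, 0)
afternoon = datetime(2020, 1, 15, 15, 0, 0)

print(partition_path("NOAA-Weather-DS4", morning))
print(partition_path("NOAA-Weather-DS4", afternoon))

# With a 12-hour token the two timestamps become indistinguishable.
print(morning.strftime("%I") == afternoon.strftime("%I"))
```

A 12-hour token cannot tell 03:00 from 15:00, so a timestamp parsed back from such a path may be ambiguous.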
@@ -301,7 +301,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Setting forecaster maximum horizon \n",
|
"### Setting forecaster maximum horizon \n",
|
||||||
"\n",
|
"\n",
|
||||||
"The forecast horizon is the number of periods into the future that the model should predict. Here, we set the horizon to 4 periods (i.e. 4 months). Notice that this is much shorter than the number of days in the test set; we will need to use a rolling test to evaluate the performance on the whole test set. For more discussion of forecast horizons and guiding principles for setting them, please see the [energy demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand). "
|
"The forecast horizon is the number of periods into the future that the model should predict. Here, we set the horizon to 12 periods (i.e. 12 months). Notice that this is much shorter than the number of months in the test set; we will need to use a rolling test to evaluate the performance on the whole test set. For more discussion of forecast horizons and guiding principles for setting them, please see the [energy demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand). "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
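The rolling test mentioned in the forecast-horizon text above can be pictured as slicing the test period into consecutive windows no longer than the horizon. A rough sketch, assuming the forecast origin advances by one full horizon at a time; `rolling_windows` is a hypothetical helper and not an AutoML function:

```python
def rolling_windows(n_test_periods, horizon):
    """Split a test set of n_test_periods into [start, end) index windows."""
    # Each window is at most `horizon` periods long; the last may be shorter.
    return [
        (start, min(start + horizon, n_test_periods))
        for start in range(0, n_test_periods, horizon)
    ]


# Example: a 30-period test set evaluated with a horizon of 12 periods.
windows = rolling_windows(30, 12)
print(windows)
```

Each window would be forecast separately, with the actuals from earlier windows supplied as context before moving on to the next one.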
@@ -358,12 +358,12 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"automl_config = AutoMLConfig(task='forecasting', \n",
|
"automl_config = AutoMLConfig(task='forecasting', \n",
|
||||||
" primary_metric='normalized_root_mean_squared_error',\n",
|
" primary_metric='normalized_root_mean_squared_error',\n",
|
||||||
" experiment_timeout_minutes = 60,\n",
|
" experiment_timeout_hours = 1,\n",
|
||||||
" training_data=train_dataset,\n",
|
" training_data=train_dataset,\n",
|
||||||
" label_column_name=target_column_name,\n",
|
" label_column_name=target_column_name,\n",
|
||||||
" validation_data=valid_dataset, \n",
|
" validation_data=valid_dataset, \n",
|
||||||
" verbosity=logging.INFO,\n",
|
" verbosity=logging.INFO,\n",
|
||||||
" compute_target = compute_target,\n",
|
" compute_target=compute_target,\n",
|
||||||
" max_concurrent_iterations=4,\n",
|
" max_concurrent_iterations=4,\n",
|
||||||
" max_cores_per_iteration=-1,\n",
|
" max_cores_per_iteration=-1,\n",
|
||||||
" **automl_settings)"
|
" **automl_settings)"
|
||||||
@@ -376,7 +376,7 @@
|
|||||||
"hidePrompt": false
|
"hidePrompt": false
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"We will now run the experiment, starting with 10 iterations of model search. The experiment can be continued for more iterations if more accurate results are required. You will see the currently running iterations printing to the console."
|
"We will now run the experiment, starting with 10 iterations of model search. The experiment can be continued for more iterations if more accurate results are required."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -555,10 +555,8 @@
|
|||||||
"import shutil\n",
|
"import shutil\n",
|
||||||
"\n",
|
"\n",
|
||||||
"script_folder = os.path.join(os.getcwd(), 'inference')\n",
|
"script_folder = os.path.join(os.getcwd(), 'inference')\n",
|
||||||
"project_folder = './inference'\n",
|
"os.makedirs(script_folder, exist_ok=True)\n",
|
||||||
"os.makedirs(project_folder, exist_ok=True)\n",
|
"shutil.copy2('infer.py', script_folder)"
|
||||||
"\n",
|
|
||||||
"!copy infer.py inference"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -5,8 +5,6 @@ dependencies:
|
|||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-train
|
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- azureml-train
|
||||||
- statsmodels
|
|
||||||
|
|||||||
@@ -76,9 +76,12 @@ def get_result_df(remote_run):
|
|||||||
def run_inference(test_experiment, compute_target, script_folder, train_run,
|
def run_inference(test_experiment, compute_target, script_folder, train_run,
|
||||||
test_dataset, lookback_dataset, max_horizon,
|
test_dataset, lookback_dataset, max_horizon,
|
||||||
target_column_name, time_column_name, freq):
|
target_column_name, time_column_name, freq):
|
||||||
train_run.download_file('outputs/model.pkl', 'inference/model.pkl')
|
model_base_name = 'model.pkl'
|
||||||
train_run.download_file('outputs/conda_env_v_1_0_0.yml',
|
if 'model_data_location' in train_run.properties:
|
||||||
'inference/condafile.yml')
|
model_location = train_run.properties['model_data_location']
|
||||||
|
_, model_base_name = model_location.rsplit('/', 1)
|
||||||
|
train_run.download_file('outputs/{}'.format(model_base_name), 'inference/{}'.format(model_base_name))
|
||||||
|
train_run.download_file('outputs/conda_env_v_1_0_0.yml', 'inference/condafile.yml')
|
||||||
|
|
||||||
inference_env = Environment("myenv")
|
inference_env = Environment("myenv")
|
||||||
inference_env.docker.enabled = True
|
inference_env.docker.enabled = True
|
||||||
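The new lines in `run_inference` derive the model file name from the `model_data_location` property and fall back to `model.pkl` when the property is absent. A minimal sketch of that fallback; `model_file_name` is a hypothetical helper and the example location string is made up for illustration:

```python
def model_file_name(properties, default="model.pkl"):
    # Hypothetical helper mirroring the fallback in the hunk above.
    location = properties.get("model_data_location")
    if location is None:
        return default
    # Take everything after the last "/"; indexing with [-1] also copes
    # with a location that contains no slash at all.
    return location.rsplit("/", 1)[-1]


# Made-up property values for illustration.
print(model_file_name({"model_data_location": "some/artifact/outputs/model.pt"}))
print(model_file_name({}))
```

The returned name is what gets passed to the inference script as `--model_path`, so the script loads whichever file the training run actually produced.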
@@ -91,7 +94,8 @@ def run_inference(test_experiment, compute_target, script_folder, train_run,
|
|||||||
'--max_horizon': max_horizon,
|
'--max_horizon': max_horizon,
|
||||||
'--target_column_name': target_column_name,
|
'--target_column_name': target_column_name,
|
||||||
'--time_column_name': time_column_name,
|
'--time_column_name': time_column_name,
|
||||||
'--frequency': freq
|
'--frequency': freq,
|
||||||
|
'--model_path': model_base_name
|
||||||
},
|
},
|
||||||
inputs=[test_dataset.as_named_input('test_data'),
|
inputs=[test_dataset.as_named_input('test_data'),
|
||||||
lookback_dataset.as_named_input('lookback_data')],
|
lookback_dataset.as_named_input('lookback_data')],
|
||||||
|
|||||||
@@ -232,6 +232,9 @@ parser.add_argument(
|
|||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'--frequency', type=str, dest='freq',
|
'--frequency', type=str, dest='freq',
|
||||||
help='Frequency of prediction')
|
help='Frequency of prediction')
|
||||||
|
parser.add_argument(
|
||||||
|
'--model_path', type=str, dest='model_path',
|
||||||
|
default='model.pkl', help='Filename of model to be loaded')
|
||||||
|
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
@@ -239,6 +242,7 @@ max_horizon = args.max_horizon
|
|||||||
target_column_name = args.target_column_name
|
target_column_name = args.target_column_name
|
||||||
time_column_name = args.time_column_name
|
time_column_name = args.time_column_name
|
||||||
freq = args.freq
|
freq = args.freq
|
||||||
|
model_path = args.model_path
|
||||||
|
|
||||||
|
|
||||||
print('args passed are: ')
|
print('args passed are: ')
|
||||||
@@ -246,6 +250,7 @@ print(max_horizon)
|
|||||||
print(target_column_name)
|
print(target_column_name)
|
||||||
print(time_column_name)
|
print(time_column_name)
|
||||||
print(freq)
|
print(freq)
|
||||||
|
print(model_path)
|
||||||
|
|
||||||
run = Run.get_context()
|
run = Run.get_context()
|
||||||
# get input dataset by name
|
# get input dataset by name
|
||||||
@@ -267,7 +272,8 @@ X_lookback_df = lookback_dataset.drop_columns(columns=[target_column_name])
|
|||||||
y_lookback_df = lookback_dataset.with_timestamp_columns(
|
y_lookback_df = lookback_dataset.with_timestamp_columns(
|
||||||
None).keep_columns(columns=[target_column_name])
|
None).keep_columns(columns=[target_column_name])
|
||||||
|
|
||||||
fitted_model = joblib.load('model.pkl')
|
fitted_model = joblib.load(model_path)
|
||||||
|
|
||||||
|
|
||||||
if hasattr(fitted_model, 'get_lookback'):
|
if hasattr(fitted_model, 'get_lookback'):
|
||||||
lookback = fitted_model.get_lookback()
|
lookback = fitted_model.get_lookback()
|
||||||
|
|||||||
@@ -42,7 +42,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"AutoML highlights here include built-in holiday featurization, accessing engineered feature names, and working with the `forecast` function. Please also look at the additional forecasting notebooks, which document lagging, rolling windows, forecast quantiles, other ways to use the forecast function, and forecaster deployment.\n",
|
"AutoML highlights here include built-in holiday featurization, accessing engineered feature names, and working with the `forecast` function. Please also look at the additional forecasting notebooks, which document lagging, rolling windows, forecast quantiles, other ways to use the forecast function, and forecaster deployment.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [configuration](../configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Notebook synopsis:\n",
|
"Notebook synopsis:\n",
|
||||||
"1. Creating an Experiment in an existing Workspace\n",
|
"1. Creating an Experiment in an existing Workspace\n",
|
||||||
@@ -98,6 +98,7 @@
|
|||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"output['Workspace'] = ws.name\n",
|
"output['Workspace'] = ws.name\n",
|
||||||
|
"output['SKU'] = ws.sku\n",
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"output['Location'] = ws.location\n",
|
"output['Location'] = ws.location\n",
|
||||||
"output['Run History Name'] = experiment_name\n",
|
"output['Run History Name'] = experiment_name\n",
|
||||||
@@ -127,7 +128,7 @@
|
|||||||
"from azureml.core.compute import ComputeTarget\n",
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Choose a name for your cluster.\n",
|
"# Choose a name for your cluster.\n",
|
||||||
"amlcompute_cluster_name = \"cpu-cluster-6\"\n",
|
"amlcompute_cluster_name = \"cpu-cluster-bike\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"found = False\n",
|
"found = False\n",
|
||||||
"# Check if this compute target already exists in the workspace.\n",
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
@@ -160,7 +161,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Data\n",
|
"## Data\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace), is paired with the storage account, which contains the default data store. We will use it to upload the bike share data and create [tabular dataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation."
|
"The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace) is paired with the storage account, which contains the default data store. We will use it to upload the bike share data and create [tabular dataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -201,7 +202,7 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'dataset/bike-no.csv')]).with_timestamp_columns(fine_grain_timestamp=time_column_name) \n",
|
"dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'dataset/bike-no.csv')]).with_timestamp_columns(fine_grain_timestamp=time_column_name) \n",
|
||||||
"dataset.take(5).to_pandas_dataframe()"
|
"dataset.take(5).to_pandas_dataframe().reset_index(drop=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -220,8 +221,8 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# select data that occurs before a specified date\n",
|
"# select data that occurs before a specified date\n",
|
||||||
"train = dataset.time_before(datetime(2012, 9, 1))\n",
|
"train = dataset.time_before(datetime(2012, 8, 31), include_boundary=True)\n",
|
||||||
"train.to_pandas_dataframe().tail(5)"
|
"train.to_pandas_dataframe().tail(5).reset_index(drop=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -230,8 +231,8 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"test = dataset.time_after(datetime(2012, 8, 31))\n",
|
"test = dataset.time_after(datetime(2012, 9, 1), include_boundary=True)\n",
|
||||||
"test.to_pandas_dataframe().head(5)"
|
"test.to_pandas_dataframe().head(5).reset_index(drop=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
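The two hunks above move the train/test boundary so that each side includes its own boundary date and the partitions no longer share a day. A minimal sketch of the same idea in plain pandas, assuming daily data and a hypothetical `date` column (this is not the `TabularDataset` API itself):

```python
from datetime import datetime

import pandas as pd

# Hypothetical daily series spanning the split boundary.
df = pd.DataFrame({
    "date": pd.date_range("2012-08-29", "2012-09-03", freq="D"),
    "cnt": [10, 11, 12, 13, 14, 15],
})

# Inclusive on both sides, mirroring include_boundary=True in the hunks.
train = df[df["date"] <= datetime(2012, 8, 31)]
test = df[df["date"] >= datetime(2012, 9, 1)]

# Every row should land in exactly one partition.
overlap = set(train["date"]) & set(test["date"])
print(len(train), len(test), len(overlap))
```

With daily data the two inclusive boundaries are adjacent, so no row is dropped and none is counted twice.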
@@ -246,8 +247,8 @@
|
|||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
"|**task**|forecasting|\n",
|
"|**task**|forecasting|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
"|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.constants.supportedmodels.regression?view=azure-ml-py).|\n",
|
"|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py).|\n",
|
||||||
"|**experiment_timeout_minutes**|Experimentation timeout in minutes.|\n",
|
"|**experiment_timeout_hours**|Experimentation timeout in hours.|\n",
|
||||||
"|**training_data**|Input dataset, containing both features and label column.|\n",
|
"|**training_data**|Input dataset, containing both features and label column.|\n",
|
||||||
"|**label_column_name**|The name of the label column.|\n",
|
"|**label_column_name**|The name of the label column.|\n",
|
||||||
"|**compute_target**|The remote compute for training.|\n",
|
"|**compute_target**|The remote compute for training.|\n",
|
||||||
@@ -259,7 +260,7 @@
|
|||||||
"|**target_lags**|The target_lags specifies how far back we will construct the lags of the target variable.|\n",
|
"|**target_lags**|The target_lags specifies how far back we will construct the lags of the target variable.|\n",
|
||||||
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
|
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_minutes parameter value to get results."
|
"This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -304,11 +305,11 @@
|
|||||||
"automl_config = AutoMLConfig(task='forecasting', \n",
|
"automl_config = AutoMLConfig(task='forecasting', \n",
|
||||||
" primary_metric='normalized_root_mean_squared_error',\n",
|
" primary_metric='normalized_root_mean_squared_error',\n",
|
||||||
" blacklist_models = ['ExtremeRandomTrees'], \n",
|
" blacklist_models = ['ExtremeRandomTrees'], \n",
|
||||||
" experiment_timeout_minutes=20,\n",
|
" experiment_timeout_hours=0.3,\n",
|
||||||
" training_data=train,\n",
|
" training_data=train,\n",
|
||||||
" label_column_name=target_column_name,\n",
|
" label_column_name=target_column_name,\n",
|
||||||
" compute_target=compute_target,\n",
|
" compute_target=compute_target,\n",
|
||||||
" enable_early_stopping = True,\n",
|
" enable_early_stopping=True,\n",
|
||||||
" n_cross_validations=3, \n",
|
" n_cross_validations=3, \n",
|
||||||
" max_concurrent_iterations=4,\n",
|
" max_concurrent_iterations=4,\n",
|
||||||
" max_cores_per_iteration=-1,\n",
|
" max_cores_per_iteration=-1,\n",
|
||||||
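The hunk above swaps `experiment_timeout_minutes=20` for `experiment_timeout_hours=0.3`. The two values are close but not identical: 0.3 hours is 18 minutes. A small sketch of the conversion, using hypothetical helper names that are not part of the SDK:

```python
def minutes_to_hours(minutes: float) -> float:
    """Convert a timeout expressed in minutes to hours."""
    return minutes / 60.0


def hours_to_minutes(hours: float) -> float:
    """Convert a timeout expressed in hours to minutes."""
    return hours * 60.0


# The old setting, expressed exactly in the new unit.
exact_hours = minutes_to_hours(20)
# The new setting, expressed in the old unit.
effective_minutes = hours_to_minutes(0.3)

print(round(exact_hours, 4), round(effective_minutes, 2))
```

If the original 20-minute budget matters, passing `20 / 60` instead of the rounded `0.3` would preserve it.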
@@ -445,13 +446,12 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import os\n",
|
"import os\n",
|
||||||
|
"import shutil\n",
|
||||||
"\n",
|
"\n",
|
||||||
"script_folder = os.path.join(os.getcwd(), 'forecast')\n",
|
"script_folder = os.path.join(os.getcwd(), 'forecast')\n",
|
||||||
"project_folder = './forecast'\n",
|
"os.makedirs(script_folder, exist_ok=True)\n",
|
||||||
"os.makedirs(project_folder, exist_ok=True)\n",
|
"shutil.copy2('forecasting_script.py', script_folder)\n",
|
||||||
"\n",
|
"shutil.copy2('forecasting_helper.py', script_folder)"
|
||||||
"!copy forecasting_script.py forecast\n",
|
|
||||||
"!copy forecasting_helper.py forecast"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
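The hunk above replaces the Windows-only `!copy` shell commands with `shutil.copy2`, which behaves the same on every platform. A self-contained sketch of that pattern, run in a temporary directory with a stand-in file so it has no side effects:

```python
import os
import shutil
import tempfile

# Work in a throwaway directory; the file below is only a stand-in.
workdir = tempfile.mkdtemp()
source_file = os.path.join(workdir, "forecasting_script.py")
with open(source_file, "w") as handle:
    handle.write("print('placeholder')\n")

script_folder = os.path.join(workdir, "forecast")
os.makedirs(script_folder, exist_ok=True)  # no error if it already exists

# copy2 copies contents plus metadata and returns the destination path.
copied_path = shutil.copy2(source_file, script_folder)
copied_ok = os.path.isfile(copied_path)
print(os.path.basename(copied_path), copied_ok)

shutil.rmtree(workdir)  # clean up the temporary directory
```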
@@ -586,7 +586,7 @@
|
|||||||
],
|
],
|
||||||
"category": "tutorial",
|
"category": "tutorial",
|
||||||
"compute": [
|
"compute": [
|
||||||
"remote"
|
"Remote"
|
||||||
],
|
],
|
||||||
"datasets": [
|
"datasets": [
|
||||||
"BikeShare"
|
"BikeShare"
|
||||||
@@ -625,7 +625,7 @@
|
|||||||
"tags": [
|
"tags": [
|
||||||
"Forecasting"
|
"Forecasting"
|
||||||
],
|
],
|
||||||
"task": "forecasting",
|
"task": "Forecasting",
|
||||||
"version": 3
|
"version": 3
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
|||||||
@@ -7,5 +7,3 @@ dependencies:
|
|||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
|
||||||
- statsmodels
|
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
import argparse
|
import argparse
|
||||||
import azureml.train.automl
|
import azureml.train.automl
|
||||||
from azureml.automl.core._vendor.automl.client.core.runtime import forecasting_models
|
from azureml.automl.runtime._vendor.automl.client.core.runtime import forecasting_models
|
||||||
from azureml.core import Run
|
from azureml.core import Run
|
||||||
from sklearn.externals import joblib
|
from sklearn.externals import joblib
|
||||||
import forecasting_helper
|
import forecasting_helper
|
||||||
@@ -32,18 +32,17 @@ test_dataset = run.input_datasets['test_data']
|
|||||||
|
|
||||||
grain_column_names = []
|
grain_column_names = []
|
||||||
|
|
||||||
df = test_dataset.to_pandas_dataframe()
|
df = test_dataset.to_pandas_dataframe().reset_index(drop=True)
|
||||||
|
|
||||||
X_test_df = test_dataset.drop_columns(columns=[target_column_name])
|
X_test_df = test_dataset.drop_columns(columns=[target_column_name]).to_pandas_dataframe().reset_index(drop=True)
|
||||||
y_test_df = test_dataset.with_timestamp_columns(
|
y_test_df = test_dataset.with_timestamp_columns(None).keep_columns(columns=[target_column_name]).to_pandas_dataframe()
|
||||||
None).keep_columns(columns=[target_column_name])
|
|
||||||
|
|
||||||
fitted_model = joblib.load('model.pkl')
|
fitted_model = joblib.load('model.pkl')
|
||||||
|
|
||||||
df_all = forecasting_helper.do_rolling_forecast(
|
df_all = forecasting_helper.do_rolling_forecast(
|
||||||
fitted_model,
|
fitted_model,
|
||||||
X_test_df.to_pandas_dataframe(),
|
X_test_df,
|
||||||
y_test_df.to_pandas_dataframe().values.T[0],
|
y_test_df.values.T[0],
|
||||||
target_column_name,
|
target_column_name,
|
||||||
time_column_name,
|
time_column_name,
|
||||||
max_horizon,
|
max_horizon,
|
||||||
|
|||||||
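Several hunks in this compare add `.reset_index(drop=True)` after `to_pandas_dataframe()`. The diff does not state the motivation; one plausible reason (an assumption here) is that pandas aligns rows by index label, so a frame with a non-default index will not line up with predictions that carry a fresh 0..n-1 index. A sketch with made-up data:

```python
import pandas as pd

# Features with a non-default index, as might survive a filter or split.
features = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 11, 12])
# Predictions created separately get the default 0..2 index.
predictions = pd.DataFrame({"y_pred": [0.1, 0.2, 0.3]})

# Column-wise concat aligns on index labels: no labels match, so rows double.
misaligned = pd.concat([features, predictions], axis=1)

# Resetting the index first makes the rows pair up positionally.
aligned = pd.concat([features.reset_index(drop=True), predictions], axis=1)

print(misaligned.shape, aligned.shape)
```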
@@ -31,8 +31,8 @@
|
|||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Advanced Forecasting\n",
|
"Advanced Forecasting\n",
|
||||||
"1. [Advanced Training](#Advanced Training)\n",
|
"1. [Advanced Training](#advanced_training)\n",
|
||||||
"1. [Advanced Results](#Advanced Results)"
|
"1. [Advanced Results](#advanced_results)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -211,7 +211,7 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"dataset = Dataset.Tabular.from_delimited_files(path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/nyc_energy.csv\").with_timestamp_columns(fine_grain_timestamp=time_column_name) \n",
|
"dataset = Dataset.Tabular.from_delimited_files(path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/nyc_energy.csv\").with_timestamp_columns(fine_grain_timestamp=time_column_name) \n",
|
||||||
"dataset.take(5).to_pandas_dataframe()"
|
"dataset.take(5).to_pandas_dataframe().reset_index(drop=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -253,7 +253,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"# split into train based on time\n",
|
"# split into train based on time\n",
|
||||||
"train = dataset.time_before(datetime(2017, 8, 8, 5), include_boundary=True)\n",
|
"train = dataset.time_before(datetime(2017, 8, 8, 5), include_boundary=True)\n",
|
||||||
"train.to_pandas_dataframe().sort_values(time_column_name).tail(5)"
|
"train.to_pandas_dataframe().reset_index(drop=True).sort_values(time_column_name).tail(5)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -263,8 +263,8 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# split into test based on time\n",
|
"# split into test based on time\n",
|
||||||
"test = dataset.time_between(datetime(2017, 8, 8, 5), datetime(2017, 8, 10, 5))\n",
|
"test = dataset.time_between(datetime(2017, 8, 8, 6), datetime(2017, 8, 10, 5))\n",
|
||||||
"test.to_pandas_dataframe().head(5)"
|
"test.to_pandas_dataframe().reset_index(drop=True).head(5)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -301,8 +301,8 @@
|
|||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
"|**task**|forecasting|\n",
|
"|**task**|forecasting|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
"|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.constants.supportedmodels.regression?view=azure-ml-py).|\n",
|
"|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py).|\n",
|
||||||
"|**experiment_timeout_minutes**|Maximum amount of time in minutes that the experiment can take before it terminates.|\n",
|
"|**experiment_timeout_hours**|Maximum amount of time in hours that the experiment can take before it terminates.|\n",
|
||||||
"|**training_data**|The training data to be used within the experiment.|\n",
|
"|**training_data**|The training data to be used within the experiment.|\n",
|
||||||
"|**label_column_name**|The name of the label column.|\n",
|
"|**label_column_name**|The name of the label column.|\n",
|
||||||
"|**compute_target**|The remote compute for training.|\n",
|
"|**compute_target**|The remote compute for training.|\n",
|
||||||
@@ -316,7 +316,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_minutes parameter value to get results."
|
"This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -333,11 +333,11 @@
|
|||||||
"automl_config = AutoMLConfig(task='forecasting', \n",
|
"automl_config = AutoMLConfig(task='forecasting', \n",
|
||||||
" primary_metric='normalized_root_mean_squared_error',\n",
|
" primary_metric='normalized_root_mean_squared_error',\n",
|
||||||
" blacklist_models = ['ExtremeRandomTrees', 'AutoArima', 'Prophet'], \n",
|
" blacklist_models = ['ExtremeRandomTrees', 'AutoArima', 'Prophet'], \n",
|
||||||
" experiment_timeout_minutes=20,\n",
|
" experiment_timeout_hours=0.3,\n",
|
||||||
" training_data=train,\n",
|
" training_data=train,\n",
|
||||||
" label_column_name=target_column_name,\n",
|
" label_column_name=target_column_name,\n",
|
||||||
" compute_target=compute_target,\n",
|
" compute_target=compute_target,\n",
|
||||||
" enable_early_stopping = True,\n",
|
" enable_early_stopping=True,\n",
|
||||||
" n_cross_validations=3, \n",
|
" n_cross_validations=3, \n",
|
||||||
" verbosity=logging.INFO,\n",
|
" verbosity=logging.INFO,\n",
|
||||||
" **automl_settings)"
|
" **automl_settings)"
|
||||||
@@ -454,7 +454,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"X_test = test.to_pandas_dataframe()\n",
|
"X_test = test.to_pandas_dataframe().reset_index(drop=True)\n",
|
||||||
"y_test = X_test.pop(target_column_name).values"
|
"y_test = X_test.pop(target_column_name).values"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -463,11 +463,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Forecast Function\n",
|
"### Forecast Function\n",
|
||||||
"For forecasting, we will use the forecast function instead of the predict function. There are two reasons for this.\n",
|
"For forecasting, we will use the forecast function instead of the predict function. Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use. The forecast function can also handle more complicated scenarios; see the notebook on [high frequency forecasting](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb)."
|
||||||
"\n",
|
|
||||||
"We need to pass the recent values of the target variable y, whereas the scikit-compatible predict function only takes the non-target variables 'test'. In our case, the test data immediately follows the training data, and we fill the target variable with NaN. The NaN serves as a question mark for the forecaster to fill with the actuals. Using the forecast function will produce forecasts using the shortest possible forecast horizon. The last time at which a definite (non-NaN) value is seen is the forecast origin - the last time when the value of the target is known.\n",
|
|
||||||
"\n",
|
|
||||||
"Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use."
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -476,15 +472,10 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Replace ALL values in y by NaN.\n",
|
|
||||||
"# The forecast origin will be at the beginning of the first forecast period.\n",
|
|
||||||
"# (Which is the same time as the end of the last training period.)\n",
|
|
||||||
"y_query = y_test.copy().astype(np.float)\n",
|
|
||||||
"y_query.fill(np.nan)\n",
|
|
||||||
"# The featurized data, aligned to y, will also be returned.\n",
|
"# The featurized data, aligned to y, will also be returned.\n",
|
||||||
"# This contains the assumptions that were made in the forecast\n",
|
"# This contains the assumptions that were made in the forecast\n",
|
||||||
"# and helps align the forecast to the original data\n",
|
"# and helps align the forecast to the original data\n",
|
||||||
"y_predictions, X_trans = fitted_model.forecast(X_test, y_query)"
|
"y_predictions, X_trans = fitted_model.forecast(X_test)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -557,7 +548,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Advanced Training\n",
|
"## Advanced Training <a id=\"advanced_training\"></a>\n",
|
||||||
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
|
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -587,7 +578,7 @@
|
|||||||
"automl_config = AutoMLConfig(task='forecasting', \n",
|
"automl_config = AutoMLConfig(task='forecasting', \n",
|
||||||
" primary_metric='normalized_root_mean_squared_error',\n",
|
" primary_metric='normalized_root_mean_squared_error',\n",
|
||||||
" blacklist_models = ['ElasticNet','ExtremeRandomTrees','GradientBoosting','XGBoostRegressor','ExtremeRandomTrees', 'AutoArima', 'Prophet'], #These models are blacklisted for tutorial purposes, remove this for real use cases. \n",
|
" blacklist_models = ['ElasticNet','ExtremeRandomTrees','GradientBoosting','XGBoostRegressor','ExtremeRandomTrees', 'AutoArima', 'Prophet'], #These models are blacklisted for tutorial purposes, remove this for real use cases. \n",
|
||||||
" experiment_timeout_minutes=20,\n",
|
" experiment_timeout_hours=0.3,\n",
|
||||||
" training_data=train,\n",
|
" training_data=train,\n",
|
||||||
" label_column_name=target_column_name,\n",
|
" label_column_name=target_column_name,\n",
|
||||||
" compute_target=compute_target,\n",
|
" compute_target=compute_target,\n",
|
||||||
@@ -642,7 +633,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Advanced Results\n",
|
"## Advanced Results<a id=\"advanced_results\"></a>\n",
|
||||||
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
|
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -652,15 +643,10 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Replace ALL values in y by NaN.\n",
|
|
||||||
"# The forecast origin will be at the beginning of the first forecast period.\n",
|
|
||||||
"# (Which is the same time as the end of the last training period.)\n",
|
|
||||||
"y_query = y_test.copy().astype(np.float)\n",
|
|
||||||
"y_query.fill(np.nan)\n",
|
|
||||||
"# The featurized data, aligned to y, will also be returned.\n",
|
"# The featurized data, aligned to y, will also be returned.\n",
|
||||||
"# This contains the assumptions that were made in the forecast\n",
|
"# This contains the assumptions that were made in the forecast\n",
|
||||||
"# and helps align the forecast to the original data\n",
|
"# and helps align the forecast to the original data\n",
|
||||||
"y_predictions, X_trans = fitted_model_lags.forecast(X_test, y_query)"
|
"y_predictions, X_trans = fitted_model_lags.forecast(X_test)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -730,14 +716,7 @@
|
|||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.8"
|
"version": "3.6.8"
|
||||||
},
|
}
|
||||||
"star_tag": [
|
|
||||||
"featured"
|
|
||||||
],
|
|
||||||
"tags": [
|
|
||||||
""
|
|
||||||
],
|
|
||||||
"task": "Forecasting"
|
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
|
|||||||
@@ -2,11 +2,9 @@ name: auto-ml-forecasting-energy-demand
|
|||||||
dependencies:
|
dependencies:
|
||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- interpret
|
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- interpret
|
||||||
- statsmodels
|
|
||||||
- azureml-explain-model
|
- azureml-explain-model
|
||||||
- azureml-contrib-interpret
|
- azureml-contrib-interpret
|
||||||
|
|||||||
@@ -103,6 +103,7 @@
|
|||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"output['Workspace'] = ws.name\n",
|
"output['Workspace'] = ws.name\n",
|
||||||
|
"output['SKU'] = ws.sku\n",
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"output['Location'] = ws.location\n",
|
"output['Location'] = ws.location\n",
|
||||||
"output['Run History Name'] = experiment_name\n",
|
"output['Run History Name'] = experiment_name\n",
|
||||||
@@ -257,7 +258,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"amlcompute_cluster_name = \"cpu-cluster-8\"\n",
|
"amlcompute_cluster_name = \"cpu-cluster-fcfn\"\n",
|
||||||
" \n",
|
" \n",
|
||||||
"found = False\n",
|
"found = False\n",
|
||||||
"# Check if this compute target already exists in the workspace.\n",
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
@@ -334,7 +335,7 @@
|
|||||||
"automl_config = AutoMLConfig(task='forecasting',\n",
|
"automl_config = AutoMLConfig(task='forecasting',\n",
|
||||||
" debug_log='automl_forecasting_function.log',\n",
|
" debug_log='automl_forecasting_function.log',\n",
|
||||||
" primary_metric='normalized_root_mean_squared_error',\n",
|
" primary_metric='normalized_root_mean_squared_error',\n",
|
||||||
" experiment_timeout_minutes=15,\n",
|
" experiment_timeout_hours=0.25,\n",
|
||||||
" enable_early_stopping=True,\n",
|
" enable_early_stopping=True,\n",
|
||||||
" training_data=train_data,\n",
|
" training_data=train_data,\n",
|
||||||
" compute_target=compute_target,\n",
|
" compute_target=compute_target,\n",
|
||||||
@@ -376,9 +377,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The `X_test` and `y_query` below, taken together, form the **forecast request**. The two are interpreted as aligned - `y_query` could actally be a column in `X_test`. `NaN`s in `y_query` are the question marks. These will be filled with the forecasts.\n",
|
"We use `X_test` as a **forecast request** to generate the predictions."
|
||||||
"\n",
|
|
||||||
"When the forecast period immediately follows the training period, the models retain the last few points of data. You can simply fill `y_query` filled with question marks - the model has the data for the lookback already.\n"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -407,8 +406,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"y_query = np.repeat(np.NaN, X_test.shape[0])\n",
|
"y_pred_no_gap, xy_nogap = fitted_model.forecast(X_test)\n",
|
||||||
"y_pred_no_gap, xy_nogap = fitted_model.forecast(X_test, y_query)\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"# xy_nogap contains the predictions in the _automl_target_col column.\n",
|
"# xy_nogap contains the predictions in the _automl_target_col column.\n",
|
||||||
"# Those same numbers are output in y_pred_no_gap\n",
|
"# Those same numbers are output in y_pred_no_gap\n",
|
||||||
@@ -436,7 +434,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"quantiles = fitted_model.forecast_quantiles(X_test, y_query)\n",
|
"quantiles = fitted_model.forecast_quantiles(X_test)\n",
|
||||||
"quantiles"
|
"quantiles"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -447,7 +445,7 @@
|
|||||||
"#### Distribution forecasts\n",
|
"#### Distribution forecasts\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Often the figure of interest is not just the point prediction, but the prediction at some quantile of the distribution. \n",
|
"Often the figure of interest is not just the point prediction, but the prediction at some quantile of the distribution. \n",
|
||||||
"This arises when the forecast is used to control some kind of inventory, for example of grocery items of virtual machines for a cloud service. In such case, the control point is usually something like \"we want the item to be in stock and not run out 99% of the time\". This is called a \"service level\". Here is how you get quantile forecasts."
|
"This arises when the forecast is used to control some kind of inventory, for example of grocery items or virtual machines for a cloud service. In such case, the control point is usually something like \"we want the item to be in stock and not run out 99% of the time\". This is called a \"service level\". Here is how you get quantile forecasts."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
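The "service level" idea in the cell above can be illustrated without the AutoML model. The sketch below assumes, purely for illustration, that demand is normally distributed around a point forecast; the numbers are hypothetical, and this is not how `forecast_quantiles` computes its output.

```python
from statistics import NormalDist

point_forecast = 100.0  # hypothetical mean demand for one period
forecast_std = 20.0     # hypothetical standard deviation of forecast error

demand = NormalDist(mu=point_forecast, sigma=forecast_std)

# Stock level that covers demand with the given probability.
quantiles = {q: demand.inv_cdf(q) for q in (0.01, 0.5, 0.95, 0.99)}
for level, value in quantiles.items():
    print(f"{level:.2f} -> {value:.1f}")
```

The 0.5 quantile coincides with the point forecast under this assumption, while a 99% service level requires stocking well above it.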
@@ -459,10 +457,10 @@
|
|||||||
"# specify which quantiles you would like \n",
|
"# specify which quantiles you would like \n",
|
||||||
"fitted_model.quantiles = [0.01, 0.5, 0.95]\n",
|
"fitted_model.quantiles = [0.01, 0.5, 0.95]\n",
|
||||||
"# use forecast_quantiles function, not the forecast() one\n",
|
"# use forecast_quantiles function, not the forecast() one\n",
|
||||||
"y_pred_quantiles = fitted_model.forecast_quantiles(X_test, y_query)\n",
|
"y_pred_quantiles = fitted_model.forecast_quantiles(X_test)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# it all nicely aligns column-wise\n",
|
"# it all nicely aligns column-wise\n",
|
||||||
"pd.concat([X_test.reset_index(), pd.DataFrame({'query' : y_query}), y_pred_quantiles], axis=1)"
|
"pd.concat([X_test.reset_index(), y_pred_quantiles], axis=1)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -471,7 +469,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"#### Destination-date forecast: \"just do something\"\n",
|
"#### Destination-date forecast: \"just do something\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In some scenarios, the X_test is not known. The forecast is likely to be weak, becaus it is missing contemporaneous predictors, which we will need to impute. If you still wish to predict forward under the assumption that the last known values will be carried forward, you can forecast out to \"destination date\". The destination date still needs to fit within the maximum horizon from training."
|
"In some scenarios, the X_test is not known. The forecast is likely to be weak, because it is missing contemporaneous predictors, which we will need to impute. If you still wish to predict forward under the assumption that the last known values will be carried forward, you can forecast out to \"destination date\". The destination date still needs to fit within the maximum horizon from training."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -538,9 +536,7 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"try: \n",
|
"try: \n",
|
||||||
" y_query = y_away.copy()\n",
|
" y_pred_away, xy_away = fitted_model.forecast(X_away)\n",
|
||||||
" y_query.fill(np.NaN)\n",
|
|
||||||
" y_pred_away, xy_away = fitted_model.forecast(X_away, y_query)\n",
|
|
||||||
" xy_away\n",
|
" xy_away\n",
|
||||||
"except Exception as e:\n",
|
"except Exception as e:\n",
|
||||||
" print(e)"
|
" print(e)"
|
||||||
@@ -550,7 +546,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"How should we read that error message? The forecast origin is at the last time themodel saw an actual values of `y` (the target). That was at the end of the training data! Because the model received all `NaN` (and not an actual target value), it is attempting to forecast from the end of training data. But the requested forecast periods are past the maximum horizon. We need to provide a defined `y` value to establish the forecast origin.\n",
|
"How should we read that error message? The forecast origin is at the last time the model saw an actual value of `y` (the target). That was at the end of the training data! The model is attempting to forecast from the end of training data. But the requested forecast periods are past the maximum horizon. We need to provide a defined `y` value to establish the forecast origin.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"We will use this helper function to take the required amount of context from the data preceding the testing data. Its definition is intentionally simplified to keep the idea in the clear."
|
"We will use this helper function to take the required amount of context from the data preceding the testing data. Its definition is intentionally simplified to keep the idea in the clear."
|
||||||
]
|
]
|
||||||
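The error discussed above comes down to a simple relationship: a requested time is forecastable only if it falls within the maximum horizon counted from the forecast origin. A rough sketch of that check, with a hypothetical function name and dates (not the SDK's internal validation):

```python
import pandas as pd


def within_max_horizon(forecast_origin, requested_times, step, max_horizon):
    """Return one bool per requested time: True if it lies after the origin
    and no more than max_horizon steps beyond it."""
    last_allowed = forecast_origin + max_horizon * step
    return [forecast_origin < t <= last_allowed for t in requested_times]


origin = pd.Timestamp("2000-01-30")  # hypothetical end of training data
step = pd.Timedelta(days=1)          # assumed daily frequency

near = pd.date_range("2000-01-31", periods=3, freq="D")
away = pd.date_range("2000-02-20", periods=3, freq="D")

near_ok = within_max_horizon(origin, near, step, max_horizon=6)
away_ok = within_max_horizon(origin, away, step, max_horizon=6)
print(near_ok, away_ok)
```

Supplying recent actual `y` values moves the origin forward, which is what brings the far-away dates back inside the horizon.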
@@ -710,7 +706,7 @@
|
|||||||
],
|
],
|
||||||
"category": "tutorial",
|
"category": "tutorial",
|
||||||
"compute": [
|
"compute": [
|
||||||
"remote"
|
"Remote"
|
||||||
],
|
],
|
||||||
"datasets": [
|
"datasets": [
|
||||||
"None"
|
"None"
|
||||||
@@ -739,13 +735,13 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.7"
|
"version": "3.6.8"
|
||||||
},
|
},
|
||||||
"tags": [
|
"tags": [
|
||||||
"Forecasting",
|
"Forecasting",
|
||||||
"Confidence Intervals"
|
"Confidence Intervals"
|
||||||
],
|
],
|
||||||
"task": "forecasting"
|
"task": "Forecasting"
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
|
|||||||
@@ -6,6 +6,4 @@ dependencies:
|
|||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- pandas_ml
|
|
||||||
- statsmodels
|
|
||||||
- matplotlib
|
- matplotlib
|
||||||
|
|||||||
@@ -40,7 +40,7 @@
|
|||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
"In this example, we use AutoML to train, select, and operationalize a time-series forecasting model for multiple time-series.\n",
|
"In this example, we use AutoML to train, select, and operationalize a time-series forecasting model for multiple time-series.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [configuration notebook](../configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The examples in the following code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
|
"The examples in the following code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
|
||||||
]
|
]
|
||||||
@@ -92,6 +92,7 @@
|
|||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"output['Workspace'] = ws.name\n",
|
"output['Workspace'] = ws.name\n",
|
||||||
|
"output['SKU'] = ws.sku\n",
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"output['Location'] = ws.location\n",
|
"output['Location'] = ws.location\n",
|
||||||
"output['Run History Name'] = experiment_name\n",
|
"output['Run History Name'] = experiment_name\n",
|
||||||
@@ -121,7 +122,7 @@
|
|||||||
"from azureml.core.compute import ComputeTarget\n",
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Choose a name for your cluster.\n",
|
"# Choose a name for your cluster.\n",
|
||||||
"amlcompute_cluster_name = \"cpu-cluster-7\"\n",
|
"amlcompute_cluster_name = \"cpu-cluster-oj\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"found = False\n",
|
"found = False\n",
|
||||||
"# Check if this compute target already exists in the workspace.\n",
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
@@ -135,7 +136,7 @@
|
|||||||
" print('Creating a new compute target...')\n",
|
" print('Creating a new compute target...')\n",
|
||||||
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
" #vm_priority = 'lowpriority', # optional\n",
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
" max_nodes = 4)\n",
|
" max_nodes = 6)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" # Create the cluster.\n",
|
" # Create the cluster.\n",
|
||||||
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
@@ -324,9 +325,9 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
|
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up-to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organization that needs to estimate the next month of sales would set the horizon accordingly. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
|
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organization that needs to estimate the next month of sales would set the horizon accordingly. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *X_valid* and *y_valid* parameters of AutoMLConfig.\n",
|
"Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *validation_data* parameter of AutoMLConfig.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Here is a summary of AutoMLConfig parameters used for training the OJ model:\n",
|
"Here is a summary of AutoMLConfig parameters used for training the OJ model:\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -334,7 +335,7 @@
|
|||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
"|**task**|forecasting|\n",
|
"|**task**|forecasting|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
|
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
|
||||||
"|**experiment_timeout_minutes**|Experimentation timeout in minutes.|\n",
|
"|**experiment_timeout_hours**|Experimentation timeout in hours.|\n",
|
||||||
"|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n",
|
"|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n",
|
||||||
"|**training_data**|Input dataset, containing both features and label column.|\n",
|
"|**training_data**|Input dataset, containing both features and label column.|\n",
|
||||||
"|**label_column_name**|The name of the label column.|\n",
|
"|**label_column_name**|The name of the label column.|\n",
|
||||||
@@ -365,14 +366,12 @@
|
|||||||
"automl_config = AutoMLConfig(task='forecasting',\n",
|
"automl_config = AutoMLConfig(task='forecasting',\n",
|
||||||
" debug_log='automl_oj_sales_errors.log',\n",
|
" debug_log='automl_oj_sales_errors.log',\n",
|
||||||
" primary_metric='normalized_mean_absolute_error',\n",
|
" primary_metric='normalized_mean_absolute_error',\n",
|
||||||
" experiment_timeout_minutes=15,\n",
|
" experiment_timeout_hours=0.25,\n",
|
||||||
" training_data=train_dataset,\n",
|
" training_data=train_dataset,\n",
|
||||||
" label_column_name=target_column_name,\n",
|
" label_column_name=target_column_name,\n",
|
||||||
" compute_target=compute_target,\n",
|
" compute_target=compute_target,\n",
|
||||||
" enable_early_stopping = True,\n",
|
" enable_early_stopping=True,\n",
|
||||||
" n_cross_validations=3,\n",
|
" n_cross_validations=3,\n",
|
||||||
" max_concurrent_iterations=4,\n",
|
|
||||||
" max_cores_per_iteration=-1,\n",
|
|
||||||
" verbosity=logging.INFO,\n",
|
" verbosity=logging.INFO,\n",
|
||||||
" **time_series_settings)"
|
" **time_series_settings)"
|
||||||
]
|
]
|
||||||
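The rolling origin validation mentioned in the configuration notes above can be sketched in plain Python. This is a minimal illustration of the general technique, assuming equally sized validation windows of `horizon` periods at the end of a single series; the function name `rolling_origin_folds` is hypothetical, and the exact fold construction used inside AutoML may differ.

```python
def rolling_origin_folds(n_samples, n_folds, horizon):
    """Build (train_indices, valid_indices) pairs in time order.

    Each fold trains on everything before its origin and validates on the
    next `horizon` periods, so no fold ever trains on future data.
    """
    folds = []
    for k in range(n_folds, 0, -1):
        train_end = n_samples - k * horizon  # the origin of this fold
        if train_end <= 0:
            raise ValueError("not enough samples for the requested folds")
        train_idx = list(range(train_end))
        valid_idx = list(range(train_end, train_end + horizon))
        folds.append((train_idx, valid_idx))
    return folds


# Example: 10 weekly observations, 3 folds, 2-week validation windows.
folds = rolling_origin_folds(n_samples=10, n_folds=3, horizon=2)
```

Unlike shuffled K-Fold splitting, every validation window here lies strictly after its training window, which is the property that matters for out-of-sample error estimates on time-series data.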
@@ -455,9 +454,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data. \n",
|
"To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data."
|
||||||
"\n",
|
|
||||||
"We will first create a query `y_query`, which is aligned index-for-index to `X_test`. This is a vector of target values where each `NaN` serves the function of the question mark to be replaced by forecast. Passing definite values in the `y` argument allows the `forecast` function to make predictions on data that does not immediately follow the train data which contains `y`. In each grain, the last time point where the model sees a definite value of `y` is that grain's _forecast origin_."
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -466,15 +463,10 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Replace ALL values in y by NaN.\n",
|
|
||||||
"# The forecast origin will be at the beginning of the first forecast period.\n",
|
|
||||||
"# (Which is the same time as the end of the last training period.)\n",
|
|
||||||
"y_query = y_test.copy().astype(np.float)\n",
|
|
||||||
"y_query.fill(np.nan)\n",
|
|
||||||
"# The featurized data, aligned to y, will also be returned.\n",
|
"# The featurized data, aligned to y, will also be returned.\n",
|
||||||
"# This contains the assumptions that were made in the forecast\n",
|
"# This contains the assumptions that were made in the forecast\n",
|
||||||
"# and helps align the forecast to the original data\n",
|
"# and helps align the forecast to the original data\n",
|
||||||
"y_predictions, X_trans = fitted_model.forecast(X_test, y_query)"
|
"y_predictions, X_trans = fitted_model.forecast(X_test)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -570,138 +562,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Develop the scoring script\n",
|
"### Develop the scoring script\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Serializing and deserializing complex data frames may be tricky. We first develop the ```run()``` function of the scoring script locally, then write it into a scoring script. It is much easier to debug any quirks of the scoring function without crossing two compute environments. For this exercise, we handle a common quirk of how pandas dataframes serialize time stamp values."
|
"For the deployment we need a function which will run the forecast on serialized data. It can be obtained from the best_run."
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# this is where we test the run function of the scoring script interactively\n",
|
|
||||||
"# before putting it in the scoring script\n",
|
|
||||||
"\n",
|
|
||||||
"timestamp_columns = ['WeekStarting']\n",
|
|
||||||
"\n",
|
|
||||||
"def run(rawdata, test_model = None):\n",
|
|
||||||
" \"\"\"\n",
|
|
||||||
" Intended to process 'rawdata' string produced by\n",
|
|
||||||
" \n",
|
|
||||||
" {'X': X_test.to_json(), y' : y_test.to_json()}\n",
|
|
||||||
" \n",
|
|
||||||
" Don't convert the X payload to numpy.array, use it as pandas.DataFrame\n",
|
|
||||||
" \"\"\"\n",
|
|
||||||
" try:\n",
|
|
||||||
" # unpack the data frame with timestamp \n",
|
|
||||||
" rawobj = json.loads(rawdata) # rawobj is now a dict of strings \n",
|
|
||||||
" X_pred = pd.read_json(rawobj['X'], convert_dates=False) # load the pandas DF from a json string\n",
|
|
||||||
" for col in timestamp_columns: # fix timestamps\n",
|
|
||||||
" X_pred[col] = pd.to_datetime(X_pred[col], unit='ms') \n",
|
|
||||||
" \n",
|
|
||||||
" y_pred = np.array(rawobj['y']) # reconstitute numpy array from serialized list\n",
|
|
||||||
" \n",
|
|
||||||
" if test_model is None:\n",
|
|
||||||
" result = model.forecast(X_pred, y_pred) # use the global model from init function\n",
|
|
||||||
" else:\n",
|
|
||||||
" result = test_model.forecast(X_pred, y_pred) # use the model on which we are testing\n",
|
|
||||||
" \n",
|
|
||||||
" except Exception as e:\n",
|
|
||||||
" result = str(e)\n",
|
|
||||||
" return json.dumps({\"error\": result})\n",
|
|
||||||
" \n",
|
|
||||||
" forecast_as_list = result[0].tolist()\n",
|
|
||||||
" index_as_df = result[1].index.to_frame().reset_index(drop=True)\n",
|
|
||||||
" \n",
|
|
||||||
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
|
||||||
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
|
||||||
" })"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# test the run function here before putting in the scoring script\n",
|
|
||||||
"import json\n",
|
|
||||||
"\n",
|
|
||||||
"test_sample = json.dumps({'X': X_test.to_json(), 'y' : y_query.tolist()})\n",
|
|
||||||
"response = run(test_sample, fitted_model)\n",
|
|
||||||
"\n",
|
|
||||||
"# unpack the response, dealing with the timestamp serialization again\n",
|
|
||||||
"res_dict = json.loads(response)\n",
|
|
||||||
"y_fcst_all = pd.read_json(res_dict['index'])\n",
|
|
||||||
"y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
|
|
||||||
"y_fcst_all['forecast'] = res_dict['forecast']\n",
|
|
||||||
"y_fcst_all.head()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Now that the function works locally in the notebook, let's write it down into the scoring script. The scoring script is authored by the data scientist. Adjust it to taste, adding inputs, outputs and processing as needed."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%%writefile score_fcast.py\n",
|
|
||||||
"import pickle\n",
|
|
||||||
"import json\n",
|
|
||||||
"import numpy as np\n",
|
|
||||||
"import pandas as pd\n",
|
|
||||||
"import azureml.train.automl\n",
|
|
||||||
"from sklearn.externals import joblib\n",
|
|
||||||
"from azureml.core.model import Model\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"def init():\n",
|
|
||||||
" global model\n",
|
|
||||||
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
|
||||||
" # deserialize the model file back into a sklearn model\n",
|
|
||||||
" model = joblib.load(model_path)\n",
|
|
||||||
"\n",
|
|
||||||
"timestamp_columns = ['WeekStarting']\n",
|
|
||||||
"\n",
|
|
||||||
"def run(rawdata, test_model = None):\n",
|
|
||||||
" \"\"\"\n",
|
|
||||||
" Intended to process 'rawdata' string produced by\n",
|
|
||||||
" \n",
|
|
||||||
" {'X': X_test.to_json(), y' : y_test.to_json()}\n",
|
|
||||||
" \n",
|
|
||||||
" Don't convert the X payload to numpy.array, use it as pandas.DataFrame\n",
|
|
||||||
" \"\"\"\n",
|
|
||||||
" try:\n",
|
|
||||||
" # unpack the data frame with timestamp \n",
|
|
||||||
" rawobj = json.loads(rawdata) # rawobj is now a dict of strings \n",
|
|
||||||
" X_pred = pd.read_json(rawobj['X'], convert_dates=False) # load the pandas DF from a json string\n",
|
|
||||||
" for col in timestamp_columns: # fix timestamps\n",
|
|
||||||
" X_pred[col] = pd.to_datetime(X_pred[col], unit='ms') \n",
|
|
||||||
" \n",
|
|
||||||
" y_pred = np.array(rawobj['y']) # reconstitute numpy array from serialized list\n",
|
|
||||||
" \n",
|
|
||||||
" if test_model is None:\n",
|
|
||||||
" result = model.forecast(X_pred, y_pred) # use the global model from init function\n",
|
|
||||||
" else:\n",
|
|
||||||
" result = test_model.forecast(X_pred, y_pred) # use the model on which we are testing\n",
|
|
||||||
" \n",
|
|
||||||
" except Exception as e:\n",
|
|
||||||
" result = str(e)\n",
|
|
||||||
" return json.dumps({\"error\": result})\n",
|
|
||||||
" \n",
|
|
||||||
" # prepare to send over wire as json\n",
|
|
||||||
" forecast_as_list = result[0].tolist()\n",
|
|
||||||
" index_as_df = result[1].index.to_frame().reset_index(drop=True)\n",
|
|
||||||
" \n",
|
|
||||||
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
|
||||||
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
|
||||||
" })"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -711,11 +572,7 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"script_file_name = 'score_fcast.py'\n",
|
"script_file_name = 'score_fcast.py'\n",
|
||||||
"with open(script_file_name, 'r') as cefr:\n",
|
"best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_file_name)"
|
||||||
" content = cefr.read()\n",
|
|
||||||
"\n",
|
|
||||||
"with open(script_file_name, 'w') as cefw:\n",
|
|
||||||
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -773,14 +630,18 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# we send the data to the service serialized into a json string\n",
|
"import json\n",
|
||||||
"test_sample = json.dumps({'X':X_test.to_json(), 'y' : y_query.tolist()})\n",
|
"X_query = X_test.copy()\n",
|
||||||
|
"# We have to convert datetime to string, because Timestamps cannot be serialized to JSON.\n",
|
||||||
|
"X_query[time_column_name] = X_query[time_column_name].astype(str)\n",
|
||||||
|
"# The Service object accepts a complex dictionary, which is internally converted to a JSON string.\n",
|
||||||
|
"# The section 'data' contains the data frame in the form of a dictionary.\n",
|
||||||
|
"test_sample = json.dumps({'data': X_query.to_dict(orient='records')})\n",
|
||||||
"response = aci_service.run(input_data = test_sample)\n",
|
"response = aci_service.run(input_data = test_sample)\n",
|
||||||
"\n",
|
|
||||||
"# translate from networkese to datascientese\n",
|
"# translate from networkese to datascientese\n",
|
||||||
"try: \n",
|
"try: \n",
|
||||||
" res_dict = json.loads(response)\n",
|
" res_dict = json.loads(response)\n",
|
||||||
" y_fcst_all = pd.read_json(res_dict['index'])\n",
|
" y_fcst_all = pd.DataFrame(res_dict['index'])\n",
|
||||||
" y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
|
" y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
|
||||||
" y_fcst_all['forecast'] = res_dict['forecast'] \n",
|
" y_fcst_all['forecast'] = res_dict['forecast'] \n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
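The reason the new cell converts the time column to strings before building the request can be shown with the standard library alone. The sketch below is an illustration under stated assumptions: the rows are invented sample values (only the `WeekStarting` column name comes from the notebook), `to_payload` is a hypothetical helper, and plain `datetime` objects stand in for pandas Timestamps.

```python
import json
from datetime import datetime

# Invented rows standing in for X_query.to_dict(orient='records').
records = [
    {"WeekStarting": datetime(1992, 6, 4), "Store": 2, "Price": 2.5},
    {"WeekStarting": datetime(1992, 6, 11), "Store": 2, "Price": 2.7},
]

# Serializing datetime values directly fails with a TypeError.
try:
    json.dumps({"data": records})
    raw_dump_failed = False
except TypeError:
    raw_dump_failed = True


def to_payload(rows, time_column):
    """Stringify the time column, then wrap the rows in a 'data' section."""
    safe_rows = [{**row, time_column: str(row[time_column])} for row in rows]
    return json.dumps({"data": safe_rows})


payload = to_payload(records, "WeekStarting")
```

The receiving side then has to parse the strings back into timestamps, which is why the response handling in the cell above calls `pd.to_datetime` on the time column.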
@@ -823,7 +684,7 @@
|
|||||||
"category": "tutorial",
|
"category": "tutorial",
|
||||||
"celltoolbar": "Raw Cell Format",
|
"celltoolbar": "Raw Cell Format",
|
||||||
"compute": [
|
"compute": [
|
||||||
"remote"
|
"Remote"
|
||||||
],
|
],
|
||||||
"datasets": [
|
"datasets": [
|
||||||
"Orange Juice Sales"
|
"Orange Juice Sales"
|
||||||
@@ -852,8 +713,11 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.7"
|
"version": "3.6.8"
|
||||||
},
|
},
|
||||||
|
"tags": [
|
||||||
|
"None"
|
||||||
|
],
|
||||||
"task": "Forecasting"
|
"task": "Forecasting"
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
|||||||
@@ -7,5 +7,3 @@ dependencies:
|
|||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
|
||||||
- statsmodels
|
|
||||||
|
|||||||
@@ -155,8 +155,7 @@
|
|||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"n_cross_validations\": 3,\n",
|
" \"n_cross_validations\": 3,\n",
|
||||||
" \"primary_metric\": 'average_precision_score_weighted',\n",
|
" \"primary_metric\": 'average_precision_score_weighted',\n",
|
||||||
" \"preprocess\": True,\n",
|
" \"experiment_timeout_hours\": 0.2, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ability to find the best model possible\n",
|
||||||
" \"experiment_timeout_minutes\": 10, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ablity to find the best model possible\n",
|
|
||||||
" \"verbosity\": logging.INFO,\n",
|
" \"verbosity\": logging.INFO,\n",
|
||||||
" \"enable_stack_ensemble\": False\n",
|
" \"enable_stack_ensemble\": False\n",
|
||||||
"}\n",
|
"}\n",
|
||||||
@@ -260,17 +259,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Print the properties of the model\n",
|
"#### Print the properties of the model\n",
|
||||||
"The fitted_model is a python object and you can read the different properties of the object.\n",
|
"The fitted_model is a python object and you can read the different properties of the object.\n"
|
||||||
"See *Print the properties of the model* section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb)."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Deploy\n",
|
|
||||||
"\n",
|
|
||||||
"To deploy the model into a web service endpoint, see _Deploy_ section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb)"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -2,10 +2,8 @@ name: auto-ml-classification-credit-card-fraud-local
|
|||||||
dependencies:
|
dependencies:
|
||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- interpret
|
|
||||||
- azureml-defaults
|
|
||||||
- azureml-explain-model
|
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- interpret
|
||||||
|
- azureml-explain-model
|
||||||
|
|||||||
@@ -1,756 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Automated Machine Learning\n",
|
|
||||||
"_**Regression with Deployment using Hardware Performance Dataset**_\n",
|
|
||||||
"\n",
|
|
||||||
"## Contents\n",
|
|
||||||
"1. [Introduction](#Introduction)\n",
|
|
||||||
"1. [Setup](#Setup)\n",
|
|
||||||
"1. [Data](#Data)\n",
|
|
||||||
"1. [Train](#Train)\n",
|
|
||||||
"1. [Results](#Results)\n",
|
|
||||||
"1. [Test](#Test)\n",
|
|
||||||
"1. [Acknowledgements](#Acknowledgements)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Introduction\n",
|
|
||||||
"In this example we use the Predicting Compressive Strength of Concrete Dataset to showcase how you can use AutoML for a regression problem. The regression goal is to predict the compressive strength of concrete based off of different ingredient combinations and the quantities of those ingredients.\n",
|
|
||||||
"\n",
|
|
||||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
|
||||||
"\n",
|
|
||||||
"In this notebook you will learn how to:\n",
|
|
||||||
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
|
||||||
"2. Configure AutoML using `AutoMLConfig`.\n",
|
|
||||||
"3. Train the model using local compute.\n",
|
|
||||||
"4. Explore the results.\n",
|
|
||||||
"5. Test the best fitted model."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Setup\n",
|
|
||||||
"As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import logging\n",
|
|
||||||
"\n",
|
|
||||||
"from matplotlib import pyplot as plt\n",
|
|
||||||
"import numpy as np\n",
|
|
||||||
"import pandas as pd\n",
|
|
||||||
"import os\n",
|
|
||||||
" \n",
|
|
||||||
"\n",
|
|
||||||
"import azureml.core\n",
|
|
||||||
"from azureml.core.experiment import Experiment\n",
|
|
||||||
"from azureml.core.workspace import Workspace\n",
|
|
||||||
"from azureml.core.dataset import Dataset\n",
|
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"ws = Workspace.from_config()\n",
|
|
||||||
"\n",
|
|
||||||
"# Choose a name for the experiment.\n",
|
|
||||||
"experiment_name = 'automl-regression-concrete'\n",
|
|
||||||
"\n",
|
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
|
||||||
"\n",
|
|
||||||
"output = {}\n",
|
|
||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
|
||||||
"output['Workspace Name'] = ws.name\n",
|
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
|
||||||
"output['Location'] = ws.location\n",
|
|
||||||
"output['Experiment Name'] = experiment.name\n",
|
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
|
||||||
"outputDf.T"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Create or Attach existing AmlCompute\n",
|
|
||||||
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
|
||||||
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
|
||||||
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
|
||||||
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.compute import AmlCompute\n",
|
|
||||||
"from azureml.core.compute import ComputeTarget\n",
|
|
||||||
"\n",
|
|
||||||
"# Choose a name for your cluster.\n",
|
|
||||||
"amlcompute_cluster_name = \"automlcl\"\n",
|
|
||||||
"\n",
|
|
||||||
"found = False\n",
|
|
||||||
"# Check if this compute target already exists in the workspace.\n",
|
|
||||||
"cts = ws.compute_targets\n",
|
|
||||||
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
|
||||||
" found = True\n",
|
|
||||||
" print('Found existing compute target.')\n",
|
|
||||||
" compute_target = cts[amlcompute_cluster_name]\n",
|
|
||||||
" \n",
|
|
||||||
"if not found:\n",
|
|
||||||
" print('Creating a new compute target...')\n",
|
|
||||||
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
|
||||||
" #vm_priority = 'lowpriority', # optional\n",
|
|
||||||
" max_nodes = 6)\n",
|
|
||||||
"\n",
|
|
||||||
" # Create the cluster.\n",
|
|
||||||
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
|
||||||
" \n",
|
|
||||||
"print('Checking cluster status...')\n",
|
|
||||||
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
|
|
||||||
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
|
||||||
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
|
||||||
" \n",
|
|
||||||
"# For a more detailed view of current AmlCompute status, use get_status()."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Data\n",
|
|
||||||
"\n",
|
|
||||||
"Create a run configuration for the remote run."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.runconfig import RunConfiguration\n",
|
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
|
||||||
"import pkg_resources\n",
|
|
||||||
"\n",
|
|
||||||
"# create a new RunConfig object\n",
|
|
||||||
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
|
||||||
"\n",
|
|
||||||
"# Set compute target to AmlCompute\n",
|
|
||||||
"conda_run_config.target = compute_target\n",
|
|
||||||
"conda_run_config.environment.docker.enabled = True\n",
|
|
||||||
"\n",
|
|
||||||
"cd = CondaDependencies.create(conda_packages=['numpy', 'py-xgboost<=0.80'])\n",
|
|
||||||
"conda_run_config.environment.python.conda_dependencies = cd"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Load Data\n",
|
|
||||||
"\n",
|
|
||||||
"Load the concrete strength dataset into X and y. X contains the training features, which are inputs to the model. y contains the training labels, which are the expected output of the model."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/compresive_strength_concrete.csv\"\n",
|
|
||||||
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
|
|
||||||
"X = dataset.drop_columns(columns=['CONCRETE'])\n",
|
|
||||||
"y = dataset.keep_columns(columns=['CONCRETE'], validate=True)\n",
|
|
||||||
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
|
|
||||||
"y_train, y_test = y.random_split(percentage=0.8, seed=223) \n",
|
|
||||||
"dataset.take(5).to_pandas_dataframe()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Train\n",
|
|
||||||
"\n",
|
|
||||||
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
|
||||||
"\n",
|
|
||||||
"|Property|Description|\n",
|
|
||||||
"|-|-|\n",
|
|
||||||
"|**task**|classification or regression|\n",
|
|
||||||
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
|
||||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
|
||||||
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
|
||||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], target values.|\n",
|
|
||||||
"\n",
|
|
||||||
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"##### If you would like to see even better results, increase \"iteration_timeout_minutes\" to 10 or more and \"iterations\" to at least 30."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"automl_settings = {\n",
|
|
||||||
" \"iteration_timeout_minutes\": 5,\n",
|
|
||||||
" \"iterations\": 10,\n",
|
|
||||||
" \"n_cross_validations\": 5,\n",
|
|
||||||
" \"primary_metric\": 'spearman_correlation',\n",
|
|
||||||
" \"preprocess\": True,\n",
|
|
||||||
" \"max_concurrent_iterations\": 5,\n",
|
|
||||||
" \"verbosity\": logging.INFO,\n",
|
|
||||||
"}\n",
|
|
||||||
"\n",
|
|
||||||
"automl_config = AutoMLConfig(task = 'regression',\n",
|
|
||||||
" debug_log = 'automl.log',\n",
|
|
||||||
" run_configuration=conda_run_config,\n",
|
|
||||||
" X = X_train,\n",
|
|
||||||
" y = y_train,\n",
|
|
||||||
" **automl_settings\n",
|
|
||||||
" )"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"remote_run = experiment.submit(automl_config, show_output = True)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"remote_run"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Results\n",
|
|
||||||
"Widget for Monitoring Runs\n",
|
|
||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
|
||||||
"Note: The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.widgets import RunDetails\n",
|
|
||||||
"RunDetails(remote_run).show() "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"\n",
|
|
||||||
"Retrieve All Child Runs\n",
|
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"children = list(remote_run.get_children())\n",
|
|
||||||
"metricslist = {}\n",
|
|
||||||
"for run in children:\n",
|
|
||||||
" properties = run.get_properties()\n",
|
|
||||||
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
|
||||||
" metricslist[int(properties['iteration'])] = metrics\n",
|
|
||||||
"\n",
|
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
|
|
||||||
"rundata"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Retrieve the Best Model\n",
|
|
||||||
"Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"best_run, fitted_model = remote_run.get_output()\n",
|
|
||||||
"print(best_run)\n",
|
|
||||||
"print(fitted_model)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Best Model Based on Any Other Metric\n",
|
|
||||||
"Show the run and the model that has the smallest root_mean_squared_error value (which turned out to be the same as the one with largest spearman_correlation value):"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"lookup_metric = \"root_mean_squared_error\"\n",
|
|
||||||
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
|
||||||
"print(best_run)\n",
|
|
||||||
"print(fitted_model)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"iteration = 3\n",
|
|
||||||
"third_run, third_model = remote_run.get_output(iteration = iteration)\n",
|
|
||||||
"print(third_run)\n",
|
|
||||||
"print(third_model)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Register the Fitted Model for Deployment\n",
|
|
||||||
"If neither metric nor iteration is specified in the register_model call, the iteration with the best primary metric is registered."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"description = 'AutoML Model'\n",
|
|
||||||
"tags = None\n",
|
|
||||||
"model = remote_run.register_model(description = description, tags = tags)\n",
|
|
||||||
"\n",
|
|
||||||
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Create Scoring Script\n",
|
|
||||||
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%%writefile score.py\n",
|
|
||||||
"import pickle\n",
|
|
||||||
"import json\n",
|
|
||||||
"import numpy\n",
|
|
||||||
"import azureml.train.automl\n",
|
|
||||||
"from sklearn.externals import joblib\n",
|
|
||||||
"from azureml.core.model import Model\n",
|
|
||||||
"\n",
|
|
||||||
"def init():\n",
|
|
||||||
"    global model\n",
|
|
||||||
"    model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
|
||||||
"    # deserialize the model file back into a sklearn model\n",
|
|
||||||
"    model = joblib.load(model_path)\n",
|
|
||||||
"\n",
|
|
||||||
"def run(rawdata):\n",
|
|
||||||
"    try:\n",
|
|
||||||
"        data = json.loads(rawdata)['data']\n",
|
|
||||||
"        data = numpy.array(data)\n",
|
|
||||||
"        result = model.predict(data)\n",
|
|
||||||
"    except Exception as e:\n",
|
|
||||||
"        result = str(e)\n",
|
|
||||||
"        return json.dumps({\"error\": result})\n",
|
|
||||||
"    return json.dumps({\"result\":result.tolist()})"
|
|
||||||
]
|
|
||||||
},
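The scoring contract above (JSON payload with a `data` key in, JSON with either `result` or `error` out) can be exercised locally before building an image. The following is a minimal sketch under stated assumptions: `StubModel` is a hypothetical stand-in for the fitted AutoML model, and plain Python lists are used in place of `numpy` arrays so the snippet has no third-party dependencies.

```python
import json


class StubModel:
    """Hypothetical stand-in for the fitted model; not part of the Azure ML SDK."""

    def predict(self, rows):
        # Pretend the prediction is simply the sum of each feature row.
        return [float(sum(row)) for row in rows]


model = StubModel()


def run(rawdata):
    """Same request/response shape as score.py: JSON string in, JSON string out."""
    try:
        data = json.loads(rawdata)["data"]
        result = model.predict(data)
    except Exception as e:
        # Failures are reported in the response body instead of being raised.
        return json.dumps({"error": str(e)})
    return json.dumps({"result": result})


ok = run(json.dumps({"data": [[1, 2, 3], [4, 5, 6]]}))
bad = run("not json")
print(ok)
print(bad)
```

A malformed payload takes the `except` branch, so a caller should check for the `error` key before reading `result`.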
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Create a YAML File for the Environment"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"for p in ['azureml-train-automl', 'azureml-core']:\n",
|
|
||||||
" print('{}\\t{}'.format(p, dependencies[p]))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-defaults','azureml-train-automl'])\n",
|
|
||||||
"\n",
|
|
||||||
"conda_env_file_name = 'myenv.yml'\n",
|
|
||||||
"myenv.save_to_file('.', conda_env_file_name)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Substitute the actual version number in the environment file.\n",
|
|
||||||
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
|
||||||
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
|
||||||
"\n",
|
|
||||||
"with open(conda_env_file_name, 'r') as cefr:\n",
|
|
||||||
" content = cefr.read()\n",
|
|
||||||
"\n",
|
|
||||||
"with open(conda_env_file_name, 'w') as cefw:\n",
|
|
||||||
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
|
|
||||||
"\n",
|
|
||||||
"# Substitute the actual model id in the script file.\n",
|
|
||||||
"\n",
|
|
||||||
"script_file_name = 'score.py'\n",
|
|
||||||
"\n",
|
|
||||||
"with open(script_file_name, 'r') as cefr:\n",
|
|
||||||
" content = cefr.read()\n",
|
|
||||||
"\n",
|
|
||||||
"with open(script_file_name, 'w') as cefw:\n",
|
|
||||||
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
|
||||||
]
|
|
||||||
},
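The read, replace, write pattern used above for both `myenv.yml` and `score.py` can be tried in isolation. This is a sketch only: the helper name `substitute_placeholder` and the model id `AutoML0123abcd` are hypothetical, and a temporary directory is used so no real file is touched.

```python
import os
import tempfile

PLACEHOLDER = "<<modelid>>"


def substitute_placeholder(path, placeholder, value):
    """Read a text file, replace every occurrence of placeholder, write it back."""
    with open(path, "r") as reader:
        content = reader.read()
    with open(path, "w") as writer:
        writer.write(content.replace(placeholder, value))


with tempfile.TemporaryDirectory() as workdir:
    script_path = os.path.join(workdir, "score.py")
    with open(script_path, "w") as f:
        f.write("model_name = '<<modelid>>'\n")

    # Hypothetical model id, standing in for remote_run.model_id.
    substitute_placeholder(script_path, PLACEHOLDER, "AutoML0123abcd")

    with open(script_path) as f:
        patched = f.read()

print(patched)
```

Because `str.replace` returns the input unchanged when the placeholder is absent, running the substitution a second time is harmless.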
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Deploy the model as a Web Service on Azure Container Instance"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
|
||||||
"from azureml.core.webservice import Webservice\n",
|
|
||||||
"from azureml.core.model import Model\n",
|
|
||||||
"\n",
|
|
||||||
"inference_config = InferenceConfig(runtime = \"python\", \n",
|
|
||||||
" entry_script = script_file_name,\n",
|
|
||||||
" conda_file = conda_env_file_name)\n",
|
|
||||||
"\n",
|
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
|
||||||
" memory_gb = 1, \n",
|
|
||||||
" tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
|
|
||||||
" description = 'sample service for Automl Regression')\n",
|
|
||||||
"\n",
|
|
||||||
"aci_service_name = 'automl-sample-concrete'\n",
|
|
||||||
"print(aci_service_name)\n",
|
|
||||||
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
|
||||||
"print(aci_service.state)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Delete a Web Service\n",
|
|
||||||
"\n",
|
|
||||||
"Deletes the specified web service."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#aci_service.delete()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Get Logs from a Deployed Web Service\n",
|
|
||||||
"\n",
|
|
||||||
"Gets logs from a deployed web service."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#aci_service.get_logs()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Test\n",
|
|
||||||
"\n",
|
|
||||||
"Now that the model is trained, convert the training and test splits created earlier into local pandas DataFrames and NumPy arrays, then run the data through the trained model to get the predicted values."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"X_test = X_test.to_pandas_dataframe()\n",
|
|
||||||
"y_test = y_test.to_pandas_dataframe()\n",
|
|
||||||
"y_test = np.array(y_test)\n",
|
|
||||||
"y_test = y_test[:,0]\n",
|
|
||||||
"X_train = X_train.to_pandas_dataframe()\n",
|
|
||||||
"y_train = y_train.to_pandas_dataframe()\n",
|
|
||||||
"y_train = np.array(y_train)\n",
|
|
||||||
"y_train = y_train[:,0]"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"##### Predict on training and test set, and calculate residual values."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"y_pred_train = fitted_model.predict(X_train)\n",
|
|
||||||
"y_residual_train = y_train - y_pred_train\n",
|
|
||||||
"\n",
|
|
||||||
"y_pred_test = fitted_model.predict(X_test)\n",
|
|
||||||
"y_residual_test = y_test - y_pred_test\n",
|
|
||||||
"\n",
|
|
||||||
"y_residual_train.shape"
|
|
||||||
]
|
|
||||||
},
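The residuals computed above are actual minus predicted values, and the plot that follows annotates them with RMSE and R2. As a rough illustration of what those two numbers measure, here is a dependency-free sketch using the standard definitions; the four target and prediction values are made up for the example and are not taken from the concrete dataset.

```python
import math

# Made-up actual and predicted values, for illustration only.
y_true = [30.0, 42.5, 55.0, 61.0]
y_pred = [28.0, 45.0, 54.0, 63.5]

# Residual = actual - predicted, matching y_train - y_pred_train above.
residuals = [t - p for t, p in zip(y_true, y_pred)]

# RMSE: square root of the mean squared residual.
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))

# R2: 1 - (residual sum of squares / total sum of squares around the mean).
mean_true = sum(y_true) / len(y_true)
ss_res = sum(r * r for r in residuals)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1.0 - ss_res / ss_tot

print(residuals, round(rmse, 4), round(r2, 4))
```

A large gap between the training and test values of these metrics is the usual sign of overfitting, which is why the notebook reports both side by side.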
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%matplotlib inline\n",
|
|
||||||
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
|
||||||
"\n",
|
|
||||||
"# Set up a multi-plot chart.\n",
|
|
||||||
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
|
||||||
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
|
||||||
"f.set_figheight(6)\n",
|
|
||||||
"f.set_figwidth(16)\n",
|
|
||||||
"\n",
|
|
||||||
"# Plot residual values of training set.\n",
|
|
||||||
"a0.axis([0, 360, -200, 200])\n",
|
|
||||||
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
|
||||||
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
|
||||||
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
|
||||||
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
|
|
||||||
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
|
||||||
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
|
||||||
"\n",
|
|
||||||
"# Plot a histogram.\n",
|
|
||||||
"#a0.hist(y_residual_train, orientation = 'horizontal', color = ['b']*len(y_residual_train), bins = 10, histtype = 'step')\n",
|
|
||||||
"#a0.hist(y_residual_train, orientation = 'horizontal', color = ['b']*len(y_residual_train), alpha = 0.2, bins = 10)\n",
|
|
||||||
"\n",
|
|
||||||
"# Plot residual values of test set.\n",
|
|
||||||
"a1.axis([0, 90, -200, 200])\n",
|
|
||||||
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
|
||||||
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
|
||||||
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
|
||||||
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
|
|
||||||
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
|
||||||
"a1.set_yticklabels([])\n",
|
|
||||||
"\n",
|
|
||||||
"# Plot a histogram.\n",
|
|
||||||
"#a1.hist(y_residual_test, orientation = 'horizontal', color = ['b']*len(y_residual_test), bins = 10, histtype = 'step')\n",
|
|
||||||
"#a1.hist(y_residual_test, orientation = 'horizontal', color = ['b']*len(y_residual_test), alpha = 0.2, bins = 10)\n",
|
|
||||||
"\n",
|
|
||||||
"plt.show()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Calculate metrics for the prediction\n",
|
|
||||||
"\n",
|
|
||||||
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
|
||||||
"from the trained model that was returned."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Plot outputs\n",
|
|
||||||
"%matplotlib notebook\n",
|
|
||||||
"test_pred = plt.scatter(y_test, y_pred_test, color='b')\n",
|
|
||||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
|
||||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
|
||||||
"plt.show()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Acknowledgements\n",
|
|
||||||
"\n",
|
|
||||||
"This Predicting Compressive Strength of Concrete Dataset is made available under the CC0 1.0 Universal (CC0 1.0)\n",
|
|
||||||
"Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the CC0 1.0 Universal (CC0 1.0)\n",
|
|
||||||
"Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/ . The dataset itself can be found here: https://www.kaggle.com/pavanraj159/concrete-compressive-strength-data-set and http://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength\n",
|
|
||||||
"\n",
|
|
||||||
"I-Cheng Yeh, \"Modeling of strength of high performance concrete using artificial neural networks,\" Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998). \n",
|
|
||||||
"\n",
|
|
||||||
"Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science."
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "v-rasav"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"category": "tutorial",
|
|
||||||
"compute": [
|
|
||||||
"AML Compute"
|
|
||||||
],
|
|
||||||
"datasets": [
|
|
||||||
"Concrete"
|
|
||||||
],
|
|
||||||
"deployment": [
|
|
||||||
"Azure Container Instance"
|
|
||||||
],
|
|
||||||
"exclude_from_index": false,
|
|
||||||
"framework": [
|
|
||||||
"Azure ML AutoML"
|
|
||||||
],
|
|
||||||
"friendly_name": "Regression with deployment using concrete dataset",
|
|
||||||
"index_order": 1,
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.7.1"
|
|
||||||
},
|
|
||||||
"tags": [
|
|
||||||
""
|
|
||||||
],
|
|
||||||
"task": "Regression"
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,12 +0,0 @@
|
|||||||
name: auto-ml-regression-concrete-strength
|
|
||||||
dependencies:
|
|
||||||
- pip:
|
|
||||||
- azureml-sdk
|
|
||||||
- interpret
|
|
||||||
- azureml-defaults
|
|
||||||
- azureml-explain-model
|
|
||||||
- azureml-train-automl
|
|
||||||
- azureml-widgets
|
|
||||||
- matplotlib
|
|
||||||
- pandas_ml
|
|
||||||
- azureml-dataprep[pandas]
|
|
||||||
@@ -206,9 +206,9 @@
|
|||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
"|**task**|classification, regression or forecasting|\n",
|
"|**task**|classification, regression or forecasting|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
"|**experiment_timeout_minutes**| Maximum amount of time in minutes that all iterations combined can take before the experiment terminates.|\n",
|
"|**experiment_timeout_hours**| Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|\n",
|
||||||
"|**enable_early_stopping**| Flag to enable early termination if the score is not improving in the short term.|\n",
|
"|**enable_early_stopping**| Flag to enable early termination if the score is not improving in the short term.|\n",
|
||||||
"|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Note: If the input data is sparse, featurization cannot be turned on.|\n",
|
"|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Setting this enables AutoML to perform featurization on the input to handle *missing data*, and to perform some common *feature extraction*. Note: If the input data is sparse, featurization cannot be turned on.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"|**training_data**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**training_data**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**label_column_name**|(sparse) array-like, shape = [n_samples, ], target values.|"
|
"|**label_column_name**|(sparse) array-like, shape = [n_samples, ], target values.|"
|
||||||
@@ -244,7 +244,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"featurization_config = FeaturizationConfig()\n",
|
"featurization_config = FeaturizationConfig()\n",
|
||||||
"featurization_config.blocked_transformers = ['LabelEncoder']\n",
|
"featurization_config.blocked_transformers = ['LabelEncoder']\n",
|
||||||
"#featurization_config.drop_columns = ['ERP', 'MMIN']\n",
|
"#featurization_config.drop_columns = ['MMIN']\n",
|
||||||
"featurization_config.add_column_purpose('MYCT', 'Numeric')\n",
|
"featurization_config.add_column_purpose('MYCT', 'Numeric')\n",
|
||||||
"featurization_config.add_column_purpose('VendorName', 'CategoricalHash')\n",
|
"featurization_config.add_column_purpose('VendorName', 'CategoricalHash')\n",
|
||||||
"#default strategy mean, add transformer param for 3 columns\n",
|
"#default strategy mean, add transformer param for 3 columns\n",
|
||||||
@@ -262,7 +262,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"enable_early_stopping\": True, \n",
|
" \"enable_early_stopping\": True, \n",
|
||||||
" \"experiment_timeout_minutes\" : 10,\n",
|
" \"experiment_timeout_hours\" : 0.2,\n",
|
||||||
" \"max_concurrent_iterations\": 4,\n",
|
" \"max_concurrent_iterations\": 4,\n",
|
||||||
" \"max_cores_per_iteration\": -1,\n",
|
" \"max_cores_per_iteration\": -1,\n",
|
||||||
" \"n_cross_validations\": 5,\n",
|
" \"n_cross_validations\": 5,\n",
|
||||||
@@ -558,7 +558,6 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# specify CondaDependencies obj\n",
|
"# specify CondaDependencies obj\n",
|
||||||
"conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n",
|
"conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n",
|
||||||
" conda_packages=['scikit-learn', 'numpy','py-xgboost<=0.80'],\n",
|
|
||||||
" pip_packages=azureml_pip_packages)"
|
" pip_packages=azureml_pip_packages)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -634,7 +633,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations\n",
|
"from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations\n",
|
||||||
"explainer_setup_class = automl_setup_model_explanations(fitted_model, 'regression', X_test=X_test)"
|
"explainer_setup_class = automl_setup_model_explanations(fitted_model, 'regression', X_test=X_test)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -653,11 +652,11 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.explain.model._internal.explanation_client import ExplanationClient\n",
|
"from azureml.explain.model._internal.explanation_client import ExplanationClient\n",
|
||||||
"from azureml.contrib.interpret.visualize import ExplanationDashboard\n",
|
"from interpret_community.widget import ExplanationDashboard\n",
|
||||||
"client = ExplanationClient.from_run(automl_run)\n",
|
"client = ExplanationClient.from_run(automl_run)\n",
|
||||||
"engineered_explanations = client.download_model_explanation(raw=False)\n",
|
"engineered_explanations = client.download_model_explanation(raw=False)\n",
|
||||||
"print(engineered_explanations.get_feature_importance_dict())\n",
|
"print(engineered_explanations.get_feature_importance_dict())\n",
|
||||||
"ExplanationDashboard(engineered_explanations, explainer_setup_class.automl_estimator, explainer_setup_class.X_test_transform)"
|
"ExplanationDashboard(engineered_explanations, explainer_setup_class.automl_estimator, datasetX=explainer_setup_class.X_test_transform)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -676,7 +675,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"raw_explanations = client.download_model_explanation(raw=True)\n",
|
"raw_explanations = client.download_model_explanation(raw=True)\n",
|
||||||
"print(raw_explanations.get_feature_importance_dict())\n",
|
"print(raw_explanations.get_feature_importance_dict())\n",
|
||||||
"ExplanationDashboard(raw_explanations, explainer_setup_class.automl_pipeline, explainer_setup_class.X_test_raw)"
|
"ExplanationDashboard(raw_explanations, explainer_setup_class.automl_pipeline, datasetX=explainer_setup_class.X_test_raw)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -718,20 +717,10 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"conda_dep = automl_run.get_environment().python.conda_dependencies\n",
|
||||||
"\n",
|
|
||||||
"azureml_pip_packages = [\n",
|
|
||||||
" 'azureml-explain-model', 'azureml-train-automl', 'azureml-defaults'\n",
|
|
||||||
"]\n",
|
|
||||||
" \n",
|
|
||||||
"\n",
|
|
||||||
"# specify CondaDependencies obj\n",
|
|
||||||
"myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas', 'numpy', 'py-xgboost<=0.80'],\n",
|
|
||||||
" pip_packages=azureml_pip_packages,\n",
|
|
||||||
" pin_sdk_version=True)\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())\n",
|
" f.write(conda_dep.serialize_to_string())\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"r\") as f:\n",
|
"with open(\"myenv.yml\",\"r\") as f:\n",
|
||||||
" print(f.read())"
|
" print(f.read())"
|
||||||
@@ -772,6 +761,7 @@
|
|||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
||||||
" memory_gb=1, \n",
|
" memory_gb=1, \n",
|
||||||
@@ -779,9 +769,8 @@
|
|||||||
" \"method\" : \"local_explanation\"}, \n",
|
" \"method\" : \"local_explanation\"}, \n",
|
||||||
" description='Get local explanations for Machine test data')\n",
|
" description='Get local explanations for Machine test data')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
||||||
" entry_script=\"score_explain.py\",\n",
|
"inference_config = InferenceConfig(entry_script=\"score_explain.py\", environment=myenv)\n",
|
||||||
" conda_file=\"myenv.yml\")\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"# Use configs and models generated above\n",
|
"# Use configs and models generated above\n",
|
||||||
"service = Model.deploy(ws, 'model-scoring', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
|
"service = Model.deploy(ws, 'model-scoring', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
|
||||||
|
|||||||
@@ -2,12 +2,10 @@ name: auto-ml-regression-hardware-performance-explanation-and-featurization
|
|||||||
dependencies:
|
dependencies:
|
||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- interpret
|
|
||||||
- azureml-defaults
|
|
||||||
- azureml-explain-model
|
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- interpret
|
||||||
|
- azureml-explain-model
|
||||||
- azureml-explain-model
|
- azureml-explain-model
|
||||||
- azureml-contrib-interpret
|
- azureml-contrib-interpret
|
||||||
|
|||||||
@@ -5,7 +5,8 @@ import os
|
|||||||
import pickle
|
import pickle
|
||||||
import azureml.train.automl
|
import azureml.train.automl
|
||||||
import azureml.explain.model
|
import azureml.explain.model
|
||||||
from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations
|
from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \
|
||||||
|
automl_setup_model_explanations
|
||||||
from sklearn.externals import joblib
|
from sklearn.externals import joblib
|
||||||
from azureml.core.model import Model
|
from azureml.core.model import Model
|
||||||
|
|
||||||
|
|||||||
@@ -6,7 +6,8 @@ from azureml.core.run import Run
|
|||||||
from azureml.core.experiment import Experiment
|
from azureml.core.experiment import Experiment
|
||||||
from sklearn.externals import joblib
|
from sklearn.externals import joblib
|
||||||
from azureml.core.dataset import Dataset
|
from azureml.core.dataset import Dataset
|
||||||
from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations
|
from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \
|
||||||
|
automl_setup_model_explanations, automl_check_model_if_explainable
|
||||||
from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
|
from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
|
||||||
from azureml.explain.model.mimic_wrapper import MimicWrapper
|
from azureml.explain.model.mimic_wrapper import MimicWrapper
|
||||||
from automl.client.core.common.constants import MODEL_PATH
|
from automl.client.core.common.constants import MODEL_PATH
|
||||||
@@ -24,6 +25,11 @@ ws = run.experiment.workspace
|
|||||||
experiment = Experiment(ws, '<<experimnet_name>>')
|
experiment = Experiment(ws, '<<experimnet_name>>')
|
||||||
automl_run = Run(experiment=experiment, run_id='<<run_id>>')
|
automl_run = Run(experiment=experiment, run_id='<<run_id>>')
|
||||||
|
|
||||||
|
# Check if this AutoML model is explainable
|
||||||
|
if not automl_check_model_if_explainable(automl_run):
|
||||||
|
    raise Exception("Model explanations are currently not supported for " + automl_run.get_properties().get(
|
||||||
|
'run_algorithm'))
|
||||||
|
|
||||||
# Download the best model from the artifact store
|
# Download the best model from the artifact store
|
||||||
automl_run.download_file(name=MODEL_PATH, output_file_path='model.pkl')
|
automl_run.download_file(name=MODEL_PATH, output_file_path='model.pkl')
|
||||||
|
|
||||||
|
|||||||
@@ -1,761 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Automated Machine Learning\n",
|
|
||||||
"_**Regression with Deployment using Hardware Performance Dataset**_\n",
|
|
||||||
"\n",
|
|
||||||
"## Contents\n",
|
|
||||||
"1. [Introduction](#Introduction)\n",
|
|
||||||
"1. [Setup](#Setup)\n",
|
|
||||||
"1. [Data](#Data)\n",
|
|
||||||
"1. [Train](#Train)\n",
|
|
||||||
"1. [Results](#Results)\n",
|
|
||||||
"1. [Test](#Test)\n",
|
|
||||||
"1. [Acknowledgements](#Acknowledgements)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Introduction\n",
|
|
||||||
"In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The Regression goal is to predict the performance of certain combinations of hardware parts.\n",
|
|
||||||
"\n",
|
|
||||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
|
||||||
"\n",
|
|
||||||
"In this notebook you will learn how to:\n",
|
|
||||||
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
|
||||||
"2. Configure AutoML using `AutoMLConfig`.\n",
|
|
||||||
"3. Train the model using local compute.\n",
|
|
||||||
"4. Explore the results.\n",
|
|
||||||
"5. Test the best fitted model."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Setup\n",
|
|
||||||
"As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import logging\n",
|
|
||||||
"\n",
|
|
||||||
"from matplotlib import pyplot as plt\n",
|
|
||||||
"import numpy as np\n",
|
|
||||||
"import pandas as pd\n",
|
|
||||||
"import os\n",
|
|
||||||
" \n",
|
|
||||||
"\n",
|
|
||||||
"import azureml.core\n",
|
|
||||||
"from azureml.core.experiment import Experiment\n",
|
|
||||||
"from azureml.core.workspace import Workspace\n",
|
|
||||||
"from azureml.core.dataset import Dataset\n",
|
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"ws = Workspace.from_config()\n",
|
|
||||||
"\n",
|
|
||||||
"# Choose a name for the experiment.\n",
|
|
||||||
"experiment_name = 'automl-regression-hardware'\n",
|
|
||||||
"\n",
|
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
|
||||||
"\n",
|
|
||||||
"output = {}\n",
|
|
||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
|
||||||
"output['Workspace Name'] = ws.name\n",
|
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
|
||||||
"output['Location'] = ws.location\n",
|
|
||||||
"output['Experiment Name'] = experiment.name\n",
|
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
|
||||||
"outputDf.T"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Create or Attach existing AmlCompute\n",
|
|
||||||
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
|
||||||
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
|
||||||
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
|
||||||
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.compute import AmlCompute\n",
|
|
||||||
"from azureml.core.compute import ComputeTarget\n",
|
|
||||||
"\n",
|
|
||||||
"# Choose a name for your cluster.\n",
|
|
||||||
"amlcompute_cluster_name = \"automlcl\"\n",
|
|
||||||
"\n",
|
|
||||||
"found = False\n",
|
|
||||||
"# Check if this compute target already exists in the workspace.\n",
|
|
||||||
"cts = ws.compute_targets\n",
|
|
||||||
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
|
||||||
" found = True\n",
|
|
||||||
" print('Found existing compute target.')\n",
|
|
||||||
" compute_target = cts[amlcompute_cluster_name]\n",
|
|
||||||
" \n",
|
|
||||||
"if not found:\n",
|
|
||||||
" print('Creating a new compute target...')\n",
|
|
||||||
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
|
||||||
" #vm_priority = 'lowpriority', # optional\n",
|
|
||||||
" max_nodes = 6)\n",
|
|
||||||
"\n",
|
|
||||||
" # Create the cluster.\n",
|
|
||||||
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
|
||||||
" \n",
|
|
||||||
"print('Checking cluster status...')\n",
|
|
||||||
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
|
|
||||||
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
|
||||||
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
|
||||||
" \n",
|
|
||||||
"# For a more detailed view of current AmlCompute status, use get_status()."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Data\n",
|
|
||||||
"\n",
|
|
||||||
"Create a run configuration for the remote run."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.runconfig import RunConfiguration\n",
|
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
|
||||||
"import pkg_resources\n",
|
|
||||||
"\n",
|
|
||||||
"# create a new RunConfig object\n",
|
|
||||||
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
|
||||||
"\n",
|
|
||||||
"# Set compute target to AmlCompute\n",
|
|
||||||
"conda_run_config.target = compute_target\n",
|
|
||||||
"conda_run_config.environment.docker.enabled = True\n",
|
|
||||||
"\n",
|
|
||||||
"cd = CondaDependencies.create(conda_packages=['numpy', 'py-xgboost<=0.80'])\n",
|
|
||||||
"conda_run_config.environment.python.conda_dependencies = cd"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Load Data\n",
|
|
||||||
"\n",
|
|
||||||
"Load the hardware performance dataset into X and y. X contains the training features, which are inputs to the model. y contains the training labels, which are the expected output of the model."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n",
|
|
||||||
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
|
|
||||||
"X = dataset.drop_columns(columns=['ERP'])\n",
|
|
||||||
"y = dataset.keep_columns(columns=['ERP'], validate=True)\n",
|
|
||||||
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
|
|
||||||
"y_train, y_test = y.random_split(percentage=0.8, seed=223)\n",
|
|
||||||
"dataset.take(5).to_pandas_dataframe()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"\n",
|
|
||||||
"## Train\n",
|
|
||||||
"\n",
|
|
||||||
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
|
||||||
"\n",
|
|
||||||
"|Property|Description|\n",
|
|
||||||
"|-|-|\n",
|
|
||||||
"|**task**|classification or regression|\n",
|
|
||||||
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
|
||||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
|
||||||
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
|
||||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
|
|
||||||
"\n",
|
|
||||||
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"##### If you would like to see even better results increase \"iteration_time_out minutes\" to 10+ mins and increase \"iterations\" to a minimum of 30"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"automl_settings = {\n",
|
|
||||||
" \"iteration_timeout_minutes\": 5,\n",
|
|
||||||
" \"iterations\": 10,\n",
|
|
||||||
" \"n_cross_validations\": 5,\n",
|
|
||||||
" \"primary_metric\": 'spearman_correlation',\n",
|
|
||||||
" \"preprocess\": True,\n",
|
|
||||||
" \"max_concurrent_iterations\": 5,\n",
|
|
||||||
" \"verbosity\": logging.INFO,\n",
|
|
||||||
"}\n",
|
|
||||||
"\n",
|
|
||||||
"automl_config = AutoMLConfig(task = 'regression',\n",
|
|
||||||
" debug_log = 'automl_errors.log',\n",
|
|
||||||
" run_configuration=conda_run_config,\n",
|
|
||||||
" X = X_train,\n",
|
|
||||||
" y = y_train,\n",
|
|
||||||
" **automl_settings\n",
|
|
||||||
" )"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"remote_run = experiment.submit(automl_config, show_output = False)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"remote_run"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Results"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"#### Widget for Monitoring Runs\n",
|
|
||||||
"\n",
|
|
||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
|
||||||
"\n",
|
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.widgets import RunDetails\n",
|
|
||||||
"RunDetails(remote_run).show() "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Wait until the run finishes.\n",
|
|
||||||
"remote_run.wait_for_completion(show_output = True)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Retrieve All Child Runs\n",
|
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"children = list(remote_run.get_children())\n",
|
|
||||||
"metricslist = {}\n",
|
|
||||||
"for run in children:\n",
|
|
||||||
" properties = run.get_properties()\n",
|
|
||||||
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
|
||||||
" metricslist[int(properties['iteration'])] = metrics\n",
|
|
||||||
"\n",
|
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
|
||||||
"rundata"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Retrieve the Best Model\n",
|
|
||||||
"Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"best_run, fitted_model = remote_run.get_output()\n",
|
|
||||||
"print(best_run)\n",
|
|
||||||
"print(fitted_model)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
|
||||||
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"lookup_metric = \"root_mean_squared_error\"\n",
|
|
||||||
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
|
||||||
"print(best_run)\n",
|
|
||||||
"print(fitted_model)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"iteration = 3\n",
|
|
||||||
"third_run, third_model = remote_run.get_output(iteration = iteration)\n",
|
|
||||||
"print(third_run)\n",
|
|
||||||
"print(third_model)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Register the Fitted Model for Deployment\n",
|
|
||||||
"If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"description = 'AutoML Model'\n",
|
|
||||||
"tags = None\n",
|
|
||||||
"model = remote_run.register_model(description = description, tags = tags)\n",
|
|
||||||
"\n",
|
|
||||||
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Create Scoring Script\n",
|
|
||||||
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%%writefile score.py\n",
|
|
||||||
"import pickle\n",
|
|
||||||
"import json\n",
|
|
||||||
"import numpy\n",
|
|
||||||
"import azureml.train.automl\n",
|
|
||||||
"from sklearn.externals import joblib\n",
|
|
||||||
"from azureml.core.model import Model\n",
|
|
||||||
"\n",
|
|
||||||
"def init():\n",
|
|
||||||
" global model\n",
|
|
||||||
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
|
||||||
" # deserialize the model file back into a sklearn model\n",
|
|
||||||
" model = joblib.load(model_path)\n",
|
|
||||||
"\n",
|
|
||||||
"def run(rawdata):\n",
|
|
||||||
" try:\n",
|
|
||||||
" data = json.loads(rawdata)['data']\n",
|
|
||||||
" data = numpy.array(data)\n",
|
|
||||||
" result = model.predict(data)\n",
|
|
||||||
" except Exception as e:\n",
|
|
||||||
" result = str(e)\n",
|
|
||||||
" return json.dumps({\"error\": result})\n",
|
|
||||||
" return json.dumps({\"result\":result.tolist()})"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Create a YAML File for the Environment"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"for p in ['azureml-train-automl', 'azureml-core']:\n",
|
|
||||||
" print('{}\\t{}'.format(p, dependencies[p]))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-defaults','azureml-train-automl'])\n",
|
|
||||||
"\n",
|
|
||||||
"conda_env_file_name = 'myenv.yml'\n",
|
|
||||||
"myenv.save_to_file('.', conda_env_file_name)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Substitute the actual version number in the environment file.\n",
|
|
||||||
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
|
||||||
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
|
||||||
"\n",
|
|
||||||
"with open(conda_env_file_name, 'r') as cefr:\n",
|
|
||||||
" content = cefr.read()\n",
|
|
||||||
"\n",
|
|
||||||
"with open(conda_env_file_name, 'w') as cefw:\n",
|
|
||||||
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
|
|
||||||
"\n",
|
|
||||||
"# Substitute the actual model id in the script file.\n",
|
|
||||||
"\n",
|
|
||||||
"script_file_name = 'score.py'\n",
|
|
||||||
"\n",
|
|
||||||
"with open(script_file_name, 'r') as cefr:\n",
|
|
||||||
" content = cefr.read()\n",
|
|
||||||
"\n",
|
|
||||||
"with open(script_file_name, 'w') as cefw:\n",
|
|
||||||
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Deploy the model as a Web Service on Azure Container Instance"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
|
||||||
"from azureml.core.webservice import Webservice\n",
|
|
||||||
"from azureml.core.model import Model\n",
|
|
||||||
"\n",
|
|
||||||
"inference_config = InferenceConfig(runtime = \"python\", \n",
|
|
||||||
" entry_script = script_file_name,\n",
|
|
||||||
" conda_file = conda_env_file_name)\n",
|
|
||||||
"\n",
|
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
|
||||||
" memory_gb = 1, \n",
|
|
||||||
" tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
|
|
||||||
" description = 'sample service for Automl Regression')\n",
|
|
||||||
"\n",
|
|
||||||
"aci_service_name = 'automl-sample-hardware'\n",
|
|
||||||
"print(aci_service_name)\n",
|
|
||||||
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
|
||||||
"print(aci_service.state)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Delete a Web Service\n",
|
|
||||||
"\n",
|
|
||||||
"Deletes the specified web service."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#aci_service.delete()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Get Logs from a Deployed Web Service\n",
|
|
||||||
"\n",
|
|
||||||
"Gets logs from a deployed web service."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#aci_service.get_logs()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Test\n",
|
|
||||||
"\n",
|
|
||||||
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"X_test = X_test.to_pandas_dataframe()\n",
|
|
||||||
"y_test = y_test.to_pandas_dataframe()\n",
|
|
||||||
"y_test = np.array(y_test)\n",
|
|
||||||
"y_test = y_test[:,0]\n",
|
|
||||||
"X_train = X_train.to_pandas_dataframe()\n",
|
|
||||||
"y_train = y_train.to_pandas_dataframe()\n",
|
|
||||||
"y_train = np.array(y_train)\n",
|
|
||||||
"y_train = y_train[:,0]"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"##### Predict on training and test set, and calculate residual values."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"y_pred_train = fitted_model.predict(X_train)\n",
|
|
||||||
"y_residual_train = y_train - y_pred_train\n",
|
|
||||||
"\n",
|
|
||||||
"y_pred_test = fitted_model.predict(X_test)\n",
|
|
||||||
"y_residual_test = y_test - y_pred_test"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Calculate metrics for the prediction\n",
|
|
||||||
"\n",
|
|
||||||
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
|
||||||
"from the trained model that was returned."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%matplotlib inline\n",
|
|
||||||
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
|
||||||
"\n",
|
|
||||||
"# Set up a multi-plot chart.\n",
|
|
||||||
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
|
||||||
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
|
||||||
"f.set_figheight(6)\n",
|
|
||||||
"f.set_figwidth(16)\n",
|
|
||||||
"\n",
|
|
||||||
"# Plot residual values of training set.\n",
|
|
||||||
"a0.axis([0, 360, -200, 200])\n",
|
|
||||||
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
|
||||||
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
|
||||||
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
|
||||||
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)),fontsize = 12)\n",
|
|
||||||
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
|
||||||
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
|
||||||
"\n",
|
|
||||||
"# Plot residual values of test set.\n",
|
|
||||||
"a1.axis([0, 90, -200, 200])\n",
|
|
||||||
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
|
||||||
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
|
||||||
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
|
||||||
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)),fontsize = 12)\n",
|
|
||||||
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
|
||||||
"a1.set_yticklabels([])\n",
|
|
||||||
"\n",
|
|
||||||
"plt.show()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%matplotlib notebook\n",
|
|
||||||
"test_pred = plt.scatter(y_test, y_pred_test, color='')\n",
|
|
||||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
|
||||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
|
||||||
"plt.show()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Acknowledgements\n",
|
|
||||||
"This Predicting Hardware Performance Dataset is made available under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/ . The dataset itself can be found here: https://www.kaggle.com/faizunnabi/comp-hardware-performance and https://archive.ics.uci.edu/ml/datasets/Computer+Hardware\n",
|
|
||||||
"\n",
|
|
||||||
"_**Citation Found Here**_\n"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "v-rasav"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"category": "tutorial",
|
|
||||||
"compute": [
|
|
||||||
"AML Compute"
|
|
||||||
],
|
|
||||||
"datasets": [
|
|
||||||
"Concrete"
|
|
||||||
],
|
|
||||||
"deployment": [
|
|
||||||
"Azure Container Instance"
|
|
||||||
],
|
|
||||||
"exclude_from_index": false,
|
|
||||||
"framework": [
|
|
||||||
"Azure ML AutoML"
|
|
||||||
],
|
|
||||||
"friendly_name": "Regression with deployment using hardware performance dataset",
|
|
||||||
"index_order": 1,
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.7.1"
|
|
||||||
},
|
|
||||||
"star_tag": [
|
|
||||||
"featured"
|
|
||||||
],
|
|
||||||
"tags": [
|
|
||||||
""
|
|
||||||
],
|
|
||||||
"task": "Regression"
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
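The removed notebook computes residuals as `y - y_pred` and annotates each plot with RMSE and R2. As a minimal sketch of those three quantities, assuming only NumPy is available: the helper name `regression_summary` and the sample values are hypothetical and do not come from the notebook or the hardware performance dataset.

```python
import numpy as np


def regression_summary(y_true, y_pred):
    """Return residuals, RMSE and R^2 for one set of predictions.

    Hypothetical helper for illustration; the notebook itself uses
    sklearn.metrics.mean_squared_error and r2_score.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Residual = actual value minus predicted value, as in the notebook.
    residuals = y_true - y_pred

    # Root mean squared error.
    rmse = float(np.sqrt(np.mean(residuals ** 2)))

    # Coefficient of determination: 1 - SS_res / SS_tot.
    # Assumes y_true is not constant, otherwise SS_tot is zero.
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return residuals, rmse, r2


# Illustrative values only.
residuals, rmse, r2 = regression_summary([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print("RMSE = {0:.2f}".format(rmse))
print("R2 score = {0:.2f}".format(r2))
```

Running the same helper on the training and test splits gives the two pairs of numbers that the notebook prints onto its residual plots.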
@@ -1,12 +0,0 @@
|
|||||||
name: auto-ml-regression-hardware-performance
|
|
||||||
dependencies:
|
|
||||||
- pip:
|
|
||||||
- azureml-sdk
|
|
||||||
- interpret
|
|
||||||
- azureml-defaults
|
|
||||||
- azureml-explain-model
|
|
||||||
- azureml-train-automl
|
|
||||||
- azureml-widgets
|
|
||||||
- matplotlib
|
|
||||||
- pandas_ml
|
|
||||||
- azureml-dataprep[pandas]
|
|
||||||
@@ -188,15 +188,18 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"automlconfig-remarks-sample"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"n_cross_validations\": 3,\n",
|
" \"n_cross_validations\": 3,\n",
|
||||||
" \"primary_metric\": 'r2_score',\n",
|
" \"primary_metric\": 'r2_score',\n",
|
||||||
" \"preprocess\": True,\n",
|
|
||||||
" \"enable_early_stopping\": True, \n",
|
" \"enable_early_stopping\": True, \n",
|
||||||
" \"experiment_timeout_minutes\": 20, #for real scenarios we reccommend a timeout of at least one hour \n",
|
" \"experiment_timeout_hours\": 0.3, #for real scenarios we reccommend a timeout of at least one hour \n",
|
||||||
" \"max_concurrent_iterations\": 4,\n",
|
" \"max_concurrent_iterations\": 4,\n",
|
||||||
" \"max_cores_per_iteration\": -1,\n",
|
" \"max_cores_per_iteration\": -1,\n",
|
||||||
" \"verbosity\": logging.INFO,\n",
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
|||||||
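This hunk swaps `experiment_timeout_minutes` for `experiment_timeout_hours` in the settings dictionary. Note that 0.3 hours is 18 minutes, so the new value is close to, but not the same as, the 20 minutes it replaces. The sketch below shows one way such a dictionary could be migrated; `migrate_timeout` is a hypothetical helper written for this note, not part of the Azure ML SDK.

```python
def migrate_timeout(settings):
    """Return a copy of settings with the timeout expressed in hours.

    Hypothetical helper: only the two key names are taken from the diff.
    """
    migrated = dict(settings)  # leave the caller's dictionary untouched
    if "experiment_timeout_minutes" in migrated:
        minutes = migrated.pop("experiment_timeout_minutes")
        migrated["experiment_timeout_hours"] = minutes / 60.0
    return migrated


old_settings = {
    "n_cross_validations": 3,
    "primary_metric": "r2_score",
    "experiment_timeout_minutes": 20,
}
new_settings = migrate_timeout(old_settings)
print(new_settings)
```

An exact conversion of 20 minutes gives one third of an hour; rounding it to 0.3, as the diff does, shortens the budget slightly.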
@@ -5,5 +5,3 @@ dependencies:
|
|||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
|
||||||
- paramiko<2.5.0
|
|
||||||
|
|||||||
@@ -140,6 +140,9 @@
|
|||||||
"framework": [
|
"framework": [
|
||||||
"Azure ML AutoML"
|
"Azure ML AutoML"
|
||||||
],
|
],
|
||||||
|
"tags": [
|
||||||
|
""
|
||||||
|
],
|
||||||
"friendly_name": "Forecasting with automated ML SQL integration",
|
"friendly_name": "Forecasting with automated ML SQL integration",
|
||||||
"index_order": 1,
|
"index_order": 1,
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
@@ -151,9 +154,6 @@
|
|||||||
"name": "sql",
|
"name": "sql",
|
||||||
"version": ""
|
"version": ""
|
||||||
},
|
},
|
||||||
"tags": [
|
|
||||||
""
|
|
||||||
],
|
|
||||||
"task": "Forecasting"
|
"task": "Forecasting"
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
|||||||
@@ -56,7 +56,7 @@ CREATE OR ALTER PROCEDURE [dbo].[AutoMLTrain]
|
|||||||
@task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.
|
@task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.
|
||||||
@experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.
|
@experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.
|
||||||
@iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline.
|
@iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline.
|
||||||
@experiment_timeout_minutes INT = 60, -- The maximum time in minutes for training all pipelines.
|
@experiment_timeout_hours FLOAT = 1, -- The maximum time in hours for training all pipelines.
|
||||||
@n_cross_validations INT = 3, -- The number of cross validations.
|
@n_cross_validations INT = 3, -- The number of cross validations.
|
||||||
@blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.
|
@blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.
|
||||||
-- The list of possible models can be found at:
|
-- The list of possible models can be found at:
|
||||||
@@ -131,8 +131,8 @@ if __name__.startswith("sqlindb"):
|
|||||||
|
|
||||||
X_train = data_train
|
X_train = data_train
|
||||||
|
|
||||||
if experiment_timeout_minutes == 0:
|
if experiment_timeout_hours == 0:
|
||||||
experiment_timeout_minutes = None
|
experiment_timeout_hours = None
|
||||||
|
|
||||||
if experiment_exit_score == 0:
|
if experiment_exit_score == 0:
|
||||||
experiment_exit_score = None
|
experiment_exit_score = None
|
||||||
@@ -163,7 +163,7 @@ if __name__.startswith("sqlindb"):
|
|||||||
debug_log = log_file_name,
|
debug_log = log_file_name,
|
||||||
primary_metric = primary_metric,
|
primary_metric = primary_metric,
|
||||||
iteration_timeout_minutes = iteration_timeout_minutes,
|
iteration_timeout_minutes = iteration_timeout_minutes,
|
||||||
experiment_timeout_minutes = experiment_timeout_minutes,
|
experiment_timeout_hours = experiment_timeout_hours,
|
||||||
iterations = iterations,
|
iterations = iterations,
|
||||||
n_cross_validations = n_cross_validations,
|
n_cross_validations = n_cross_validations,
|
||||||
preprocess = preprocess,
|
preprocess = preprocess,
|
||||||
@@ -204,7 +204,7 @@ if __name__.startswith("sqlindb"):
|
|||||||
@iterations INT, @task NVARCHAR(40),
|
@iterations INT, @task NVARCHAR(40),
|
||||||
@experiment_name NVARCHAR(32),
|
@experiment_name NVARCHAR(32),
|
||||||
@iteration_timeout_minutes INT,
|
@iteration_timeout_minutes INT,
|
||||||
@experiment_timeout_minutes INT,
|
@experiment_timeout_hours FLOAT,
|
||||||
@n_cross_validations INT,
|
@n_cross_validations INT,
|
||||||
@blacklist_models NVARCHAR(MAX),
|
@blacklist_models NVARCHAR(MAX),
|
||||||
@whitelist_models NVARCHAR(MAX),
|
@whitelist_models NVARCHAR(MAX),
|
||||||
@@ -223,7 +223,7 @@ if __name__.startswith("sqlindb"):
|
|||||||
, @task = @task
|
, @task = @task
|
||||||
, @experiment_name = @experiment_name
|
, @experiment_name = @experiment_name
|
||||||
, @iteration_timeout_minutes = @iteration_timeout_minutes
|
, @iteration_timeout_minutes = @iteration_timeout_minutes
|
||||||
, @experiment_timeout_minutes = @experiment_timeout_minutes
|
, @experiment_timeout_hours = @experiment_timeout_hours
|
||||||
, @n_cross_validations = @n_cross_validations
|
, @n_cross_validations = @n_cross_validations
|
||||||
, @blacklist_models = @blacklist_models
|
, @blacklist_models = @blacklist_models
|
||||||
, @whitelist_models = @whitelist_models
|
, @whitelist_models = @whitelist_models
|
||||||
|
|||||||
@@ -235,7 +235,7 @@
|
|||||||
" @task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.\r\n",
|
" @task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.\r\n",
|
||||||
" @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.\r\n",
|
" @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.\r\n",
|
||||||
" @iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline. \r\n",
|
" @iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline. \r\n",
|
||||||
" @experiment_timeout_minutes INT = 60, -- The maximum time in minutes for training all pipelines.\r\n",
|
" @experiment_timeout_hours FLOAT = 1, -- The maximum time in hours for training all pipelines.\r\n",
|
||||||
" @n_cross_validations INT = 3, -- The number of cross validations.\r\n",
|
" @n_cross_validations INT = 3, -- The number of cross validations.\r\n",
|
||||||
" @blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.\r\n",
|
" @blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.\r\n",
|
||||||
" -- The list of possible models can be found at:\r\n",
|
" -- The list of possible models can be found at:\r\n",
|
||||||
@@ -307,8 +307,8 @@
|
|||||||
"\r\n",
|
"\r\n",
|
||||||
" X_train = data_train\r\n",
|
" X_train = data_train\r\n",
|
||||||
"\r\n",
|
"\r\n",
|
||||||
" if experiment_timeout_minutes == 0:\r\n",
|
" if experiment_timeout_hours == 0:\r\n",
|
||||||
" experiment_timeout_minutes = None\r\n",
|
" experiment_timeout_hours = None\r\n",
|
||||||
"\r\n",
|
"\r\n",
|
||||||
" if experiment_exit_score == 0:\r\n",
|
" if experiment_exit_score == 0:\r\n",
|
||||||
" experiment_exit_score = None\r\n",
|
" experiment_exit_score = None\r\n",
|
||||||
@@ -337,7 +337,7 @@
|
|||||||
" debug_log = log_file_name, \r\n",
|
" debug_log = log_file_name, \r\n",
|
||||||
" primary_metric = primary_metric, \r\n",
|
" primary_metric = primary_metric, \r\n",
|
||||||
" iteration_timeout_minutes = iteration_timeout_minutes, \r\n",
|
" iteration_timeout_minutes = iteration_timeout_minutes, \r\n",
|
||||||
" experiment_timeout_minutes = experiment_timeout_minutes,\r\n",
|
" experiment_timeout_hours = experiment_timeout_hours,\r\n",
|
||||||
" iterations = iterations, \r\n",
|
" iterations = iterations, \r\n",
|
||||||
" n_cross_validations = n_cross_validations, \r\n",
|
" n_cross_validations = n_cross_validations, \r\n",
|
||||||
" preprocess = preprocess,\r\n",
|
" preprocess = preprocess,\r\n",
|
||||||
@@ -378,7 +378,7 @@
|
|||||||
"\t\t\t\t @iterations INT, @task NVARCHAR(40),\r\n",
|
"\t\t\t\t @iterations INT, @task NVARCHAR(40),\r\n",
|
||||||
"\t\t\t\t @experiment_name NVARCHAR(32),\r\n",
|
"\t\t\t\t @experiment_name NVARCHAR(32),\r\n",
|
||||||
"\t\t\t\t @iteration_timeout_minutes INT,\r\n",
|
"\t\t\t\t @iteration_timeout_minutes INT,\r\n",
|
||||||
"\t\t\t\t @experiment_timeout_minutes INT,\r\n",
|
"\t\t\t\t @experiment_timeout_hours FLOAT,\r\n",
|
||||||
"\t\t\t\t @n_cross_validations INT,\r\n",
|
"\t\t\t\t @n_cross_validations INT,\r\n",
|
||||||
"\t\t\t\t @blacklist_models NVARCHAR(MAX),\r\n",
|
"\t\t\t\t @blacklist_models NVARCHAR(MAX),\r\n",
|
||||||
"\t\t\t\t @whitelist_models NVARCHAR(MAX),\r\n",
|
"\t\t\t\t @whitelist_models NVARCHAR(MAX),\r\n",
|
||||||
@@ -396,7 +396,7 @@
|
|||||||
"\t, @task = @task\r\n",
|
"\t, @task = @task\r\n",
|
||||||
"\t, @experiment_name = @experiment_name\r\n",
|
"\t, @experiment_name = @experiment_name\r\n",
|
||||||
"\t, @iteration_timeout_minutes = @iteration_timeout_minutes\r\n",
|
"\t, @iteration_timeout_minutes = @iteration_timeout_minutes\r\n",
|
||||||
"\t, @experiment_timeout_minutes = @experiment_timeout_minutes\r\n",
|
"\t, @experiment_timeout_hours = @experiment_timeout_hours\r\n",
|
||||||
"\t, @n_cross_validations = @n_cross_validations\r\n",
|
"\t, @n_cross_validations = @n_cross_validations\r\n",
|
||||||
"\t, @blacklist_models = @blacklist_models\r\n",
|
"\t, @blacklist_models = @blacklist_models\r\n",
|
||||||
"\t, @whitelist_models = @whitelist_models\r\n",
|
"\t, @whitelist_models = @whitelist_models\r\n",
|
||||||
|
|||||||
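The procedure body in this diff treats a value of 0 as "not set" and replaces it with `None` before building the AutoML configuration, for both `experiment_timeout_hours` and `experiment_exit_score`. A minimal sketch of that sentinel handling follows; the helper name `zero_to_none` is an assumption made for illustration and does not appear in the stored procedure.

```python
def zero_to_none(value):
    """Map the sentinel 0 to None and pass every other value through.

    Hypothetical helper; the stored procedure does this inline with
    an `if ... == 0` check per parameter.
    """
    return None if value == 0 else value


# A FLOAT parameter arriving as 0.0 still compares equal to 0.
experiment_timeout_hours = zero_to_none(0.0)
experiment_exit_score = zero_to_none(0.95)
print(experiment_timeout_hours, experiment_exit_score)
```

Because the comparison is by value, the change of the parameter type from `INT` to `FLOAT` does not affect the check.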
@@ -11,6 +11,13 @@
|
|||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Register Azure Databricks trained model and deploy it to ACI\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
@@ -161,9 +168,9 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) #showing how to add libs as an eg. - not needed for this model.\n",
|
"myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) # showing how to add libs as an eg. - not needed for this model.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"mydeployenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myacienv.serialize_to_string())"
|
" f.write(myacienv.serialize_to_string())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -175,18 +182,37 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"#deploy to ACI\n",
|
"#deploy to ACI\n",
|
||||||
"from azureml.core.webservice import AciWebservice, Webservice\n",
|
"from azureml.core.webservice import AciWebservice, Webservice\n",
|
||||||
|
"from azureml.exceptions import WebserviceException\n",
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"myaci_config = AciWebservice.deploy_configuration(cpu_cores = 2, \n",
|
"myaci_config = AciWebservice.deploy_configuration(cpu_cores = 2, \n",
|
||||||
" memory_gb = 2, \n",
|
" memory_gb = 2, \n",
|
||||||
" tags = {'name':'Databricks Azure ML ACI'}, \n",
|
" tags = {'name':'Databricks Azure ML ACI'}, \n",
|
||||||
" description = 'This is for ADB and AML example.')\n",
|
" description = 'This is for ADB and AML example.')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= 'spark-py', \n",
|
"service_name = 'aciws'\n",
|
||||||
" entry_script='score_sparkml.py',\n",
|
|
||||||
" conda_file='mydeployenv.yml')\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"myservice = Model.deploy(ws, 'aciws', [mymodel], inference_config, myaci_config)\n",
|
"# Remove any existing service under the same name.\n",
|
||||||
|
"try:\n",
|
||||||
|
" Webservice(ws, service_name).delete()\n",
|
||||||
|
"except WebserviceException:\n",
|
||||||
|
" pass\n",
|
||||||
|
"\n",
|
||||||
|
"myenv = Environment.get(ws, name='AzureML-PySpark-MmlSpark-0.15')\n",
|
||||||
|
"# we need to add extra packages to procured environment\n",
|
||||||
|
"# in order to deploy amended environment we need to rename it\n",
|
||||||
|
"myenv.name = 'myenv'\n",
|
||||||
|
"model_dependencies = CondaDependencies('myenv.yml')\n",
|
||||||
|
"for pip_dep in model_dependencies.pip_packages:\n",
|
||||||
|
" myenv.python.conda_dependencies.add_pip_package(pip_dep)\n",
|
||||||
|
"for conda_dep in model_dependencies.conda_packages:\n",
|
||||||
|
" myenv.python.conda_dependencies.add_conda_package(conda_dep)\n",
|
||||||
|
"inference_config = InferenceConfig(entry_script='score_sparkml.py', environment=myenv)\n",
|
||||||
|
"\n",
|
||||||
|
"myservice = Model.deploy(ws, service_name, [mymodel], inference_config, myaci_config)\n",
|
||||||
"myservice.wait_for_deployment(show_output=True)"
|
"myservice.wait_for_deployment(show_output=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -199,18 +225,6 @@
|
|||||||
"help(Webservice)"
|
"help(Webservice)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# List images by ws\n",
|
|
||||||
"\n",
|
|
||||||
"for i in ContainerImage.list(workspace = ws):\n",
|
|
||||||
" print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -258,6 +272,15 @@
|
|||||||
"myservice.delete()"
|
"myservice.delete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploying to other types of computes\n",
|
||||||
|
"\n",
|
||||||
|
"In order to learn how to deploy to other types of compute targets, such as AKS, please take a look at the set of notebooks in the [deployment](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/deployment) folder."
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,298 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Azure ML & Azure Databricks notebooks by Parashar Shah.\n",
|
|
||||||
"\n",
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"This notebook uses image from ACI notebook for deploying to AKS."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import azureml.core\n",
|
|
||||||
"\n",
|
|
||||||
"# Check core SDK version number\n",
|
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Set auth to be used by workspace related APIs.\n",
|
|
||||||
"# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
|
|
||||||
"# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
|
|
||||||
"auth = None"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core import Workspace\n",
|
|
||||||
"\n",
|
|
||||||
"ws = Workspace.from_config(auth = auth)\n",
|
|
||||||
"print('Workspace name: ' + ws.name, \n",
|
|
||||||
" 'Azure region: ' + ws.location, \n",
|
|
||||||
" 'Subscription id: ' + ws.subscription_id, \n",
|
|
||||||
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#Register the model\n",
|
|
||||||
"import os\n",
|
|
||||||
"from azureml.core.model import Model\n",
|
|
||||||
"\n",
|
|
||||||
"model_name = \"AdultCensus_runHistory_aks.mml\" # \n",
|
|
||||||
"model_name_dbfs = os.path.join(\"/dbfs\", model_name)\n",
|
|
||||||
"\n",
|
|
||||||
"print(\"copy model from dbfs to local\")\n",
|
|
||||||
"model_local = \"file:\" + os.getcwd() + \"/\" + model_name\n",
|
|
||||||
"dbutils.fs.cp(model_name, model_local, True)\n",
|
|
||||||
"\n",
|
|
||||||
"mymodel = Model.register(model_path = model_name, # this points to a local file\n",
|
|
||||||
" model_name = model_name, # this is the name the model is registered as, am using same name for both path and name. \n",
|
|
||||||
" description = \"ADB trained model by Parashar\",\n",
|
|
||||||
" workspace = ws)\n",
|
|
||||||
"\n",
|
|
||||||
"print(mymodel.name, mymodel.description, mymodel.version)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#%%writefile score_sparkml.py\n",
|
|
||||||
"score_sparkml = \"\"\"\n",
|
|
||||||
" \n",
|
|
||||||
"import json\n",
|
|
||||||
" \n",
|
|
||||||
"def init():\n",
|
|
||||||
" # One-time initialization of PySpark and predictive model\n",
|
|
||||||
" import pyspark\n",
|
|
||||||
" from azureml.core.model import Model\n",
|
|
||||||
" from pyspark.ml import PipelineModel\n",
|
|
||||||
" \n",
|
|
||||||
" global trainedModel\n",
|
|
||||||
" global spark\n",
|
|
||||||
" \n",
|
|
||||||
" spark = pyspark.sql.SparkSession.builder.appName(\"ADB and AML notebook by Parashar\").getOrCreate()\n",
|
|
||||||
" model_name = \"{model_name}\" #interpolated\n",
|
|
||||||
" model_path = Model.get_model_path(model_name)\n",
|
|
||||||
" trainedModel = PipelineModel.load(model_path)\n",
|
|
||||||
" \n",
|
|
||||||
"def run(input_json):\n",
|
|
||||||
" if isinstance(trainedModel, Exception):\n",
|
|
||||||
" return json.dumps({{\"trainedModel\":str(trainedModel)}})\n",
|
|
||||||
" \n",
|
|
||||||
" try:\n",
|
|
||||||
" sc = spark.sparkContext\n",
|
|
||||||
" input_list = json.loads(input_json)\n",
|
|
||||||
" input_rdd = sc.parallelize(input_list)\n",
|
|
||||||
" input_df = spark.read.json(input_rdd)\n",
|
|
||||||
" \n",
|
|
||||||
" # Compute prediction\n",
|
|
||||||
" prediction = trainedModel.transform(input_df)\n",
|
|
||||||
" #result = prediction.first().prediction\n",
|
|
||||||
" predictions = prediction.collect()\n",
|
|
||||||
" \n",
|
|
||||||
" #Get each scored result\n",
|
|
||||||
" preds = [str(x['prediction']) for x in predictions]\n",
|
|
||||||
" result = \",\".join(preds)\n",
|
|
||||||
" # you can return any data type as long as it is JSON-serializable\n",
|
|
||||||
" return result.tolist()\n",
|
|
||||||
" except Exception as e:\n",
|
|
||||||
" result = str(e)\n",
|
|
||||||
" return result\n",
|
|
||||||
" \n",
|
|
||||||
"\"\"\".format(model_name=model_name)\n",
|
|
||||||
" \n",
|
|
||||||
"exec(score_sparkml)\n",
|
|
||||||
" \n",
|
|
||||||
"with open(\"score_sparkml.py\", \"w\") as file:\n",
|
|
||||||
" file.write(score_sparkml)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
|
||||||
"\n",
|
|
||||||
"myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) #showing how to add libs as an eg. - not needed for this model.\n",
|
|
||||||
"\n",
|
|
||||||
"with open(\"mydeployenv.yml\",\"w\") as f:\n",
|
|
||||||
" f.write(myacienv.serialize_to_string())"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#create AKS compute\n",
|
|
||||||
"#it may take 20-25 minutes to create a new cluster\n",
|
|
||||||
"\n",
|
|
||||||
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
|
||||||
"\n",
|
|
||||||
"# Use the default configuration (can also provide parameters to customize)\n",
|
|
||||||
"prov_config = AksCompute.provisioning_configuration()\n",
|
|
||||||
"\n",
|
|
||||||
"aks_name = 'ps-aks-demo2' \n",
|
|
||||||
"\n",
|
|
||||||
"# Create the cluster\n",
|
|
||||||
"aks_target = ComputeTarget.create(workspace = ws, \n",
|
|
||||||
" name = aks_name, \n",
|
|
||||||
" provisioning_configuration = prov_config)\n",
|
|
||||||
"\n",
|
|
||||||
"aks_target.wait_for_completion(show_output = True)\n",
|
|
||||||
"\n",
|
|
||||||
"print(aks_target.provisioning_state)\n",
|
|
||||||
"print(aks_target.provisioning_errors)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#deploy to AKS\n",
|
|
||||||
"from azureml.core.webservice import AksWebservice, Webservice\n",
|
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
|
||||||
"\n",
|
|
||||||
"aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)\n",
|
|
||||||
"\n",
|
|
||||||
"inference_config = InferenceConfig(runtime = 'spark-py', \n",
|
|
||||||
" entry_script ='score_sparkml.py',\n",
|
|
||||||
" conda_file ='mydeployenv.yml')\n",
|
|
||||||
"\n",
|
|
||||||
"aks_service = Model.deploy(ws, 'ps-aks-service', [mymodel], inference_config, aks_config, aks_target)\n",
|
|
||||||
"aks_service.wait_for_deployment(show_output=True)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"aks_service.deployment_status"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#for using the Web HTTP API \n",
|
|
||||||
"print(aks_service.scoring_uri)\n",
|
|
||||||
"print(aks_service.get_keys())"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import json\n",
|
|
||||||
"\n",
|
|
||||||
"#get the some sample data\n",
|
|
||||||
"test_data_path = \"AdultCensusIncomeTest\"\n",
|
|
||||||
"test = spark.read.parquet(test_data_path).limit(5)\n",
|
|
||||||
"\n",
|
|
||||||
"test_json = json.dumps(test.toJSON().collect())\n",
|
|
||||||
"\n",
|
|
||||||
"print(test_json)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#using data defined above predict if income is >50K (1) or <=50K (0)\n",
|
|
||||||
"aks_service.run(input_data=test_json)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#comment to not delete the web service\n",
|
|
||||||
"aks_service.delete()\n",
|
|
||||||
"#model.delete()\n",
|
|
||||||
"aks_target.delete() "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "pasha"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
},
|
|
||||||
"name": "deploy-to-aks-existingimage-05",
|
|
||||||
"notebookId": 1030695628045968
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 1
|
|
||||||
}
|
|
||||||
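The scoring script in the removed notebook collects the scored rows and joins their `prediction` values into one comma-separated string. The sketch below isolates that step, assuming plain dictionaries as stand-ins for the Spark `Row` objects returned by `collect()`; `format_predictions` is a hypothetical name, not a function from the notebook.

```python
import json


def format_predictions(rows):
    """Join the 'prediction' field of each row into one comma-separated string."""
    preds = [str(row["prediction"]) for row in rows]
    return ",".join(preds)


# Plain dicts stand in for the Spark Row objects (assumption for illustration).
rows = [{"prediction": 1.0}, {"prediction": 0.0}, {"prediction": 1.0}]
result = format_predictions(rows)

# The joined value is a str, which is already JSON-serializable.
payload = json.dumps(result)
print(payload)
```

Returning the list `preds` instead of the joined string would also be JSON-serializable; which shape is preferable depends on what the caller of the web service expects.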
@@ -640,7 +640,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults', 'azureml-sdk[automl]'])\n",
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults', 'azureml-sdk[automl]'])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"conda_env_file_name = 'mydeployenv.yml'\n",
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
"myenv.save_to_file('.', conda_env_file_name)"
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -661,22 +661,40 @@
|
|||||||
"# this will take 10-15 minutes to finish\n",
|
"# this will take 10-15 minutes to finish\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from azureml.core.webservice import AciWebservice, Webservice\n",
|
"from azureml.core.webservice import AciWebservice, Webservice\n",
|
||||||
|
"from azureml.exceptions import WebserviceException\n",
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
"import uuid\n",
|
"import uuid\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"myaci_config = AciWebservice.deploy_configuration(\n",
|
"myaci_config = AciWebservice.deploy_configuration(\n",
|
||||||
" cpu_cores = 2, \n",
|
" cpu_cores = 2, \n",
|
||||||
" memory_gb = 2, \n",
|
" memory_gb = 2, \n",
|
||||||
" tags = {'name':'Databricks Azure ML ACI'}, \n",
|
" tags = {'name':'Databricks Azure ML ACI'}, \n",
|
||||||
" description = 'This is for ADB and AutoML example.')\n",
|
" description = 'This is for ADB and AutoML example.')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= 'spark-py', \n",
|
"myenv = Environment.get(ws, name='AzureML-PySpark-MmlSpark-0.15')\n",
|
||||||
" entry_script='score.py',\n",
|
"# we need to add extra packages to the procured environment\n",
|
||||||
" conda_file='mydeployenv.yml')\n",
|
"# to deploy the amended environment, we need to rename it\n",
|
||||||
|
"myenv.name = 'myenv'\n",
|
||||||
|
"model_dependencies = CondaDependencies('myenv.yml')\n",
|
||||||
|
"for pip_dep in model_dependencies.pip_packages:\n",
|
||||||
|
" myenv.python.conda_dependencies.add_pip_package(pip_dep)\n",
|
||||||
|
"for conda_dep in model_dependencies.conda_packages:\n",
|
||||||
|
" myenv.python.conda_dependencies.add_conda_package(conda_dep)\n",
|
||||||
|
"inference_config = InferenceConfig(entry_script='score_sparkml.py', environment=myenv)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"guid = str(uuid.uuid4()).split(\"-\")[0]\n",
|
"guid = str(uuid.uuid4()).split(\"-\")[0]\n",
|
||||||
"service_name = \"myservice-{}\".format(guid)\n",
|
"service_name = \"myservice-{}\".format(guid)\n",
|
||||||
|
"\n",
|
||||||
|
"# Remove any existing service under the same name.\n",
|
||||||
|
"try:\n",
|
||||||
|
" Webservice(ws, service_name).delete()\n",
|
||||||
|
"except WebserviceException:\n",
|
||||||
|
" pass\n",
|
||||||
|
"\n",
|
||||||
"print(\"Creating service with name: {}\".format(service_name))\n",
|
"print(\"Creating service with name: {}\".format(service_name))\n",
|
||||||
"\n",
|
"\n",
|
||||||
"myservice = Model.deploy(ws, service_name, [model], inference_config, myaci_config)\n",
|
"myservice = Model.deploy(ws, service_name, [model], inference_config, myaci_config)\n",
|
||||||
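The hunk above builds a unique service name from the first block of a random UUID. A minimal standalone sketch of that step follows; the helper name `make_service_name` and the naming rule in the regular expression are assumptions made for illustration (the rule should be checked against the Azure ML documentation), not part of the notebook.

```python
import re
import uuid

# Assumed naming rule (not taken from the notebook): lowercase letters,
# digits and dashes, starting with a letter, 3-32 characters in total.
_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{1,30}[a-z0-9]$")


def make_service_name(prefix="myservice"):
    """Return '<prefix>-<8 hex chars>' using the first block of a random UUID."""
    guid = str(uuid.uuid4()).split("-")[0]
    name = "{}-{}".format(prefix, guid)
    if not _NAME_PATTERN.match(name):
        raise ValueError("Service name {!r} violates the assumed naming rule".format(name))
    return name


service_name = make_service_name()
print("Creating service with name: {}".format(service_name))
```

Because the suffix is random, re-running the cell normally yields a fresh name, so the delete-then-deploy guard in the notebook mostly matters when a fixed name is reused.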
@@ -795,7 +813,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.5"
|
"version": "3.6.8"
|
||||||
},
|
},
|
||||||
"name": "auto-ml-classification-local-adb",
|
"name": "auto-ml-classification-local-adb",
|
||||||
"notebookId": 2733885892129020
|
"notebookId": 2733885892129020
|
||||||
|
|||||||
@@ -345,7 +345,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"sample-akscompute-provision"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
||||||
|
|||||||
@@ -682,7 +682,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"sample-akswebservice-deploy-from-image"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%time\n",
|
"%%time\n",
|
||||||
|
|||||||
@@ -195,7 +195,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment. Only Environments that were created using azureml-defaults version 1.0.48 or later will work with this new handling however.\n",
|
"You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment. Please note that your environment must include azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)."
|
"More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)."
|
||||||
]
|
]
|
||||||
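The note above requires azureml-defaults at version 1.0.45 or later. The following is a rough sketch of how such a minimum-version check could be done on plain dotted version strings; the helper names `parse_version` and `meets_minimum` are hypothetical, and suffixes such as `.post1` are deliberately not handled (the `packaging` library is the more robust choice for real version strings).

```python
def parse_version(text):
    """Turn '1.0.45' into (1, 0, 45). Non-numeric parts are not supported."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="1.0.45"):
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


for candidate in ("1.0.48", "1.0.45", "1.0.9"):
    print(candidate, meets_minimum(candidate))
```

Comparing tuples of integers avoids the pitfall of string comparison, where "1.0.9" would sort after "1.0.45".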
@@ -221,23 +221,30 @@
|
|||||||
"## Create Inference Configuration\n",
|
"## Create Inference Configuration\n",
|
||||||
"\n",
|
"\n",
|
||||||
"There is now support for a source directory, you can upload an entire folder from your local machine as dependencies for the Webservice.\n",
|
"There is now support for a source directory, you can upload an entire folder from your local machine as dependencies for the Webservice.\n",
|
||||||
"Note: in that case, your entry_script, conda_file, and extra_docker_file_steps paths are relative paths to the source_directory path.\n",
|
"Note: in that case, the environment's entry_script and file_path are relative paths to the source_directory path; myenv.docker.base_dockerfile is a string containing extra docker steps or contents of the docker file.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Sample code for using a source directory:\n",
|
"Sample code for using a source directory:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"```python\n",
|
"```python\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"\n",
|
||||||
|
"myenv = Environment.from_conda_specification(name='myenv', file_path='env/myenv.yml')\n",
|
||||||
|
"\n",
|
||||||
|
"# explicitly set base_image to None when setting base_dockerfile\n",
|
||||||
|
"myenv.docker.base_image = None\n",
|
||||||
|
"# add extra docker commands to execute\n",
|
||||||
|
"myenv.docker.base_dockerfile = \"FROM ubuntu\\n RUN echo \\\"hello\\\"\"\n",
|
||||||
|
"\n",
|
||||||
"inference_config = InferenceConfig(source_directory=\"C:/abc\",\n",
|
"inference_config = InferenceConfig(source_directory=\"C:/abc\",\n",
|
||||||
" runtime= \"python\", \n",
|
|
||||||
" entry_script=\"x/y/score.py\",\n",
|
" entry_script=\"x/y/score.py\",\n",
|
||||||
" conda_file=\"env/myenv.yml\", \n",
|
" environment=myenv)\n",
|
||||||
" extra_docker_file_steps=\"helloworld.txt\")\n",
|
|
||||||
"```\n",
|
"```\n",
|
||||||
"\n",
|
"\n",
|
||||||
" - source_directory = holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n",
|
" - file_path: input parameter to Environment.from_conda_specification. Manages conda and python package dependencies.\n",
|
||||||
" - runtime = Which runtime to use for the image. Current supported runtimes are 'spark-py' and 'python\n",
|
" - env.docker.base_dockerfile: any extra steps you want to inject into docker file\n",
|
||||||
" - entry_script = contains logic specific to initializing your model and running predictions\n",
|
" - source_directory: holds the source path as a string; this entire folder gets added to the image, so it's easy to access any files within this folder or its subfolders\n",
|
||||||
" - conda_file = manages conda and python package dependencies.\n",
|
" - entry_script: contains logic specific to initializing your model and running predictions"
|
||||||
" - extra_docker_file_steps = optional: any extra steps you want to inject into docker file"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
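The `file_path` argument above points at a conda specification file such as `env/myenv.yml`. As an illustration only, the sketch below renders the text of such a file from package lists; the helper `render_conda_spec` is hypothetical, and the package names are borrowed from other cells in this diff rather than from the actual `myenv.yml`.

```python
def render_conda_spec(name, conda_packages, pip_packages):
    """Render a minimal conda environment file as text."""
    lines = ["name: {}".format(name), "dependencies:"]
    for pkg in conda_packages:
        lines.append("  - {}".format(pkg))
    if pip_packages:
        # pip packages are nested under a 'pip:' entry of the dependency list
        lines.append("  - pip:")
        for pkg in pip_packages:
            lines.append("    - {}".format(pkg))
    return "\n".join(lines) + "\n"


spec_text = render_conda_spec(
    name="myenv",
    conda_packages=["numpy", "scikit-learn"],
    pip_packages=["azureml-defaults", "inference-schema[numpy-support]"],
)
print(spec_text)
```

Writing `spec_text` to a file under the source directory would give a file of the kind that `Environment.from_conda_specification` reads in the sample above.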
|
|||||||
@@ -20,7 +20,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Register model and deploy as webservice\n",
|
"# Register model and deploy as webservice in ACI\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Following this notebook, you will:\n",
|
"Following this notebook, you will:\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -45,6 +45,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"# Check core SDK version number.\n",
|
"# Check core SDK version number.\n",
|
||||||
"print('SDK version:', azureml.core.VERSION)"
|
"print('SDK version:', azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
@@ -70,6 +71,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
|
||||||
]
|
]
|
||||||
@@ -91,6 +93,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Dataset\n",
|
"from azureml.core import Dataset\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"datastore = ws.get_default_datastore()\n",
|
"datastore = ws.get_default_datastore()\n",
|
||||||
"datastore.upload_files(files=['./features.csv', './labels.csv'],\n",
|
"datastore.upload_files(files=['./features.csv', './labels.csv'],\n",
|
||||||
" target_path='sklearn_regression/',\n",
|
" target_path='sklearn_regression/',\n",
|
||||||
@@ -116,7 +119,8 @@
|
|||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"register model from file"
|
"register model from file",
|
||||||
|
"sample-model-register"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
@@ -124,6 +128,7 @@
|
|||||||
"from azureml.core import Model\n",
|
"from azureml.core import Model\n",
|
||||||
"from azureml.core.resource_configuration import ResourceConfiguration\n",
|
"from azureml.core.resource_configuration import ResourceConfiguration\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"model = Model.register(workspace=ws,\n",
|
"model = Model.register(workspace=ws,\n",
|
||||||
" model_name='my-sklearn-model', # Name of the registered model in your workspace.\n",
|
" model_name='my-sklearn-model', # Name of the registered model in your workspace.\n",
|
||||||
" model_path='./sklearn_regression_model.pkl', # Local file to upload and register as a model.\n",
|
" model_path='./sklearn_regression_model.pkl', # Local file to upload and register as a model.\n",
|
||||||
@@ -158,6 +163,8 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"The Azure Machine Learning service provides a default environment for supported model frameworks, including scikit-learn, based on the metadata you provided when registering your model. This is the easiest way to deploy your model.\n",
|
"The Azure Machine Learning service provides a default environment for supported model frameworks, including scikit-learn, based on the metadata you provided when registering your model. This is the easiest way to deploy your model.\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"Even when you deploy your model to ACI with a default environment you can still customize the deploy configuration (i.e. the number of cores and amount of memory made available for the deployment) using the [AciWebservice.deploy_configuration()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.webservice.aci.aciwebservice#deploy-configuration-cpu-cores-none--memory-gb-none--tags-none--properties-none--description-none--location-none--auth-enabled-none--ssl-enabled-none--enable-app-insights-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--ssl-cname-none--dns-name-label-none--). Look at the \"Use a custom environment\" section of this notebook for more information on deploy configuration.\n",
|
||||||
|
"\n",
|
||||||
"**Note**: This step can take several minutes."
|
"**Note**: This step can take several minutes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -170,6 +177,7 @@
|
|||||||
"from azureml.core import Webservice\n",
|
"from azureml.core import Webservice\n",
|
||||||
"from azureml.exceptions import WebserviceException\n",
|
"from azureml.exceptions import WebserviceException\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"service_name = 'my-sklearn-service'\n",
|
"service_name = 'my-sklearn-service'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Remove any existing service under the same name.\n",
|
"# Remove any existing service under the same name.\n",
|
||||||
@@ -197,6 +205,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"input_payload = json.dumps({\n",
|
"input_payload = json.dumps({\n",
|
||||||
" 'data': [\n",
|
" 'data': [\n",
|
||||||
" [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n",
|
" [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n",
|
||||||
@@ -230,9 +239,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Use a custom environment (for all models)\n",
|
"### Use a custom environment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you want more control over how your model is run, if it uses another framework, or if it has special runtime requirements, you can instead specify your own environment and scoring method.\n",
|
"If you want more control over how your model is run, if it uses another framework, or if it has special runtime requirements, you can instead specify your own environment and scoring method. Custom environments can be used for any model you want to deploy.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Specify the model's runtime environment by creating an [Environment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment%28class%29?view=azure-ml-py) object and providing the [CondaDependencies](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py) needed by your model."
|
"Specify the model's runtime environment by creating an [Environment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment%28class%29?view=azure-ml-py) object and providing the [CondaDependencies](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py) needed by your model."
|
||||||
]
|
]
|
||||||
@@ -246,6 +255,7 @@
|
|||||||
"from azureml.core import Environment\n",
|
"from azureml.core import Environment\n",
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"environment = Environment('my-sklearn-environment')\n",
|
"environment = Environment('my-sklearn-environment')\n",
|
||||||
"environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n",
|
"environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n",
|
||||||
" 'azureml-defaults',\n",
|
" 'azureml-defaults',\n",
|
||||||
@@ -277,7 +287,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Deploy your model in the custom environment by providing an [InferenceConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py) object to [Model.deploy()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#deploy-workspace--name--models--inference-config--deployment-config-none--deployment-target-none-).\n",
|
"Deploy your model in the custom environment by providing an [InferenceConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py) object to [Model.deploy()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#deploy-workspace--name--models--inference-config--deployment-config-none--deployment-target-none-). In this case we are also using the [AciWebservice.deploy_configuration()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.webservice.aci.aciwebservice#deploy-configuration-cpu-cores-none--memory-gb-none--tags-none--properties-none--description-none--location-none--auth-enabled-none--ssl-enabled-none--enable-app-insights-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--ssl-cname-none--dns-name-label-none--) method to generate a custom deploy configuration.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note**: This step can take several minutes."
|
"**Note**: This step can take several minutes."
|
||||||
]
|
]
|
||||||
@@ -287,15 +297,18 @@
|
|||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"azuremlexception-remarks-sample"
|
"azuremlexception-remarks-sample",
|
||||||
|
"sample-aciwebservice-deploy-config"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Webservice\n",
|
"from azureml.core import Webservice\n",
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"from azureml.exceptions import WebserviceException\n",
|
"from azureml.exceptions import WebserviceException\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"service_name = 'my-custom-env-service'\n",
|
"service_name = 'my-custom-env-service'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Remove any existing service under the same name.\n",
|
"# Remove any existing service under the same name.\n",
|
||||||
@@ -304,11 +317,14 @@
|
|||||||
"except WebserviceException:\n",
|
"except WebserviceException:\n",
|
||||||
" pass\n",
|
" pass\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(entry_script='score.py',\n",
|
"inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n",
|
||||||
" source_directory='.',\n",
|
"aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n",
|
||||||
" environment=environment)\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"service = Model.deploy(ws, service_name, [model], inference_config)\n",
|
"service = Model.deploy(workspace=ws,\n",
|
||||||
|
" name=service_name,\n",
|
||||||
|
" models=[model],\n",
|
||||||
|
" inference_config=inference_config,\n",
|
||||||
|
" deployment_config=aci_config)\n",
|
||||||
"service.wait_for_deployment(show_output=True)"
|
"service.wait_for_deployment(show_output=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -327,6 +343,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
"input_payload = json.dumps({\n",
|
"input_payload = json.dumps({\n",
|
||||||
" 'data': [\n",
|
" 'data': [\n",
|
||||||
" [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n",
|
" [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n",
|
||||||
@@ -404,7 +421,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
" - To run a production-ready web service, see the [notebook on deployment to Azure Kubernetes Service](../production-deploy-to-aks/production-deploy-to-aks.ipynb).\n",
|
" - To run a production-ready web service, see the [notebook on deployment to Azure Kubernetes Service](../production-deploy-to-aks/production-deploy-to-aks.ipynb).\n",
|
||||||
" - To run a local web service, see the [notebook on deployment to a local Docker container](../deploy-to-local/register-model-deploy-local.ipynb).\n",
|
" - To run a local web service, see the [notebook on deployment to a local Docker container](../deploy-to-local/register-model-deploy-local.ipynb).\n",
|
||||||
" - For more information on datasets, see the [notebook on training with datasets](../../work-with-data/datasets-tutorial/train-with-datasets.ipynb).\n",
|
" - For more information on datasets, see the [notebook on training with datasets](../../work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.ipynb).\n",
|
||||||
" - For more information on environments, see the [notebook on using environments](../../training/using-environments/using-environments.ipynb).\n",
|
" - For more information on environments, see the [notebook on using environments](../../training/using-environments/using-environments.ipynb).\n",
|
||||||
" - For information on all the available deployment targets, see [“How and where to deploy models”](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#choose-a-compute-target)."
|
" - For information on all the available deployment targets, see [“How and where to deploy models”](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#choose-a-compute-target)."
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -96,7 +96,8 @@
|
|||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"register model from file"
|
"register model from file",
|
||||||
|
"sample-model-register"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
@@ -188,6 +189,15 @@
|
|||||||
" return error"
|
" return error"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"Please note that you must indicate azureml-defaults with version >= 1.0.45 as a pip dependency for your environment. This package contains the functionality needed to host the model as a web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -205,16 +215,6 @@
|
|||||||
" - inference-schema[numpy-support]"
|
" - inference-schema[numpy-support]"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%%writefile C:/abc/dockerstep/customDockerStep.txt\n",
|
|
||||||
"RUN echo \"this is test\""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -239,11 +239,10 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Create Inference Configuration\n",
|
"## Create Inference Configuration\n",
|
||||||
"\n",
|
"\n",
|
||||||
" - source_directory = holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n",
|
" - file_path: input parameter to Environment.from_conda_specification. Manages conda and python package dependencies.\n",
|
||||||
" - runtime = Which runtime to use for the image. Current supported runtimes are 'spark-py' and 'python\n",
|
" - env.docker.base_dockerfile: any extra steps you want to inject into docker file\n",
|
||||||
" - entry_script = contains logic specific to initializing your model and running predictions\n",
|
" - source_directory: holds the source path as a string; this entire folder gets added to the image, so it's easy to access any files within this folder or its subfolders\n",
|
||||||
" - conda_file = manages conda and python package dependencies.\n",
|
" - entry_script: contains logic specific to initializing your model and running predictions"
|
||||||
" - extra_docker_file_steps = optional: any extra steps you want to inject into docker file"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -252,13 +251,19 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"myenv = Environment.from_conda_specification(name='myenv', file_path='env/myenv.yml')\n",
|
||||||
|
"\n",
|
||||||
|
"# explicitly set base_image to None when setting base_dockerfile\n",
|
||||||
|
"myenv.docker.base_image = None\n",
|
||||||
|
"myenv.docker.base_dockerfile = \"RUN echo \\\"this is test\\\"\"\n",
|
||||||
|
"\n",
|
||||||
"inference_config = InferenceConfig(source_directory=\"C:/abc\",\n",
|
"inference_config = InferenceConfig(source_directory=\"C:/abc\",\n",
|
||||||
" runtime=\"python\", \n",
|
|
||||||
" entry_script=\"x/y/score.py\",\n",
|
" entry_script=\"x/y/score.py\",\n",
|
||||||
" conda_file=\"env/myenv.yml\", \n",
|
" environment=myenv)\n"
|
||||||
" extra_docker_file_steps=\"dockerstep/customDockerStep.txt\")"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
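The updated cell above passes the extra Docker steps as a single string through `myenv.docker.base_dockerfile`. A small sketch of assembling such a string from a base image and a list of steps is shown here; the helper `build_dockerfile` is hypothetical, and the `ubuntu` base image is taken from the markdown sample earlier in this diff, not from this cell.

```python
def build_dockerfile(base_image, steps):
    """Join a FROM line and extra steps into one Dockerfile string."""
    lines = ["FROM {}".format(base_image)]
    lines.extend(steps)
    return "\n".join(lines) + "\n"


dockerfile_text = build_dockerfile("ubuntu", ['RUN echo "this is test"'])
print(dockerfile_text)
```

Keeping the steps in a list makes it easier to spot a missing `FROM` line, which a Dockerfile generally needs before any `RUN` instruction.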
|
|||||||
@@ -166,7 +166,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"sample-localwebservice-deploy"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import LocalWebservice\n",
|
"from azureml.core.webservice import LocalWebservice\n",
|
||||||
@@ -341,9 +345,11 @@
|
|||||||
],
|
],
|
||||||
"category": "tutorial",
|
"category": "tutorial",
|
||||||
"compute": [
|
"compute": [
|
||||||
"local"
|
"Local"
|
||||||
|
],
|
||||||
|
"datasets": [
|
||||||
|
"None"
|
||||||
],
|
],
|
||||||
"datasets": [],
|
|
||||||
"deployment": [
|
"deployment": [
|
||||||
"Local"
|
"Local"
|
||||||
],
|
],
|
||||||
|
|||||||
@@ -0,0 +1,369 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Deploy models to Azure Kubernetes Service (AKS) using controlled roll out\n",
|
||||||
|
"This notebook will show you how to deploy multiple AKS webservices with the same scoring endpoint and how to roll out your models in a controlled manner by configuring the percentage of scoring traffic going to each webservice. If you are using a Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to install the Azure Machine Learning Python SDK and create an Azure ML Workspace."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Check for latest version\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"print(azureml.core.VERSION)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Initialize workspace\n",
|
||||||
|
"Create a [Workspace](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace%28class%29?view=azure-ml-py) object from your persisted configuration."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register the model\n",
|
||||||
|
"Register a file or folder as a model by calling [Model.register()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#register-workspace--model-path--model-name--tags-none--properties-none--description-none--datasets-none--model-framework-none--model-framework-version-none--child-paths-none-).\n",
|
||||||
|
"In addition to the content of the model file itself, your registered model will also store model metadata -- model description, tags, and framework information -- that will be useful when managing and deploying models in your workspace. Using tags, for instance, you can categorize your models and apply filters when listing models in your workspace."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Model\n",
|
||||||
|
"\n",
|
||||||
|
"model = Model.register(workspace=ws,\n",
|
||||||
|
" model_name='sklearn_regression_model.pkl', # Name of the registered model in your workspace.\n",
|
||||||
|
" model_path='./sklearn_regression_model.pkl', # Local file to upload and register as a model.\n",
|
||||||
|
" model_framework=Model.Framework.SCIKITLEARN, # Framework used to create the model.\n",
|
||||||
|
" model_framework_version='0.19.1', # Version of scikit-learn used to create the model.\n",
|
||||||
|
" description='Ridge regression model to predict diabetes progression.',\n",
|
||||||
|
" tags={'area': 'diabetes', 'type': 'regression'})\n",
|
||||||
|
"\n",
|
||||||
|
"print('Name:', model.name)\n",
|
||||||
|
"print('Version:', model.version)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register an environment (for all models)\n",
|
||||||
|
"\n",
|
||||||
|
"If you want control over how your model is run, or if it has special runtime requirements, you can specify your own environment and scoring method.\n",
|
||||||
|
"\n",
|
||||||
|
"Specify the model's runtime environment by creating an [Environment](https://docs.microsoft.com/python/api/azureml-core/azureml.core.environment%28class%29?view=azure-ml-py) object and providing the [CondaDependencies](https://docs.microsoft.com/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py) needed by your model."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Environment\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"environment=Environment('my-sklearn-environment')\n",
|
||||||
|
"environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n",
|
||||||
|
" 'azureml-defaults',\n",
|
||||||
|
" 'inference-schema[numpy-support]',\n",
|
||||||
|
" 'joblib',\n",
|
||||||
|
" 'numpy',\n",
|
||||||
|
" 'scikit-learn'\n",
|
||||||
|
"])"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"When using a custom environment, you must also provide Python code for initializing and running your model. An example script is included with this notebook."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"with open('score.py') as f:\n",
|
||||||
|
" print(f.read())"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create the InferenceConfig\n",
|
||||||
|
"Create the inference configuration to reference your environment and entry script during deployment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"\n",
|
||||||
|
"inference_config = InferenceConfig(entry_script='score.py', \n",
|
||||||
|
" source_directory='.',\n",
|
||||||
|
" environment=environment)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Provision the AKS Cluster\n",
|
||||||
|
"If you already have an AKS cluster attached to this workspace, skip the step below and provide the name of the cluster."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import AksCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"# Use the default configuration (can also provide parameters to customize)\n",
|
||||||
|
"prov_config = AksCompute.provisioning_configuration()\n",
|
||||||
|
"\n",
|
||||||
|
"aks_name = 'my-aks' \n",
|
||||||
|
"# Create the cluster\n",
|
||||||
|
"aks_target = ComputeTarget.create(workspace = ws, \n",
|
||||||
|
" name = aks_name, \n",
|
||||||
|
" provisioning_configuration = prov_config) \n",
|
||||||
|
"aks_target.wait_for_completion(show_output=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create an Endpoint and add a version (AKS service)\n",
|
||||||
|
"This creates a new endpoint and adds a version behind it. By default the first version added is the default version. You can specify the traffic percentile a version takes behind an endpoint. \n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# deploying the model and create a new endpoint\n",
|
||||||
|
"from azureml.core.webservice import AksEndpoint\n",
|
||||||
|
"# from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"\n",
|
||||||
|
"#select a created compute\n",
|
||||||
|
"compute = ComputeTarget(ws, 'my-aks')\n",
|
||||||
|
"namespace_name=\"endpointnamespace\"\n",
|
||||||
|
"# define the endpoint name\n",
|
||||||
|
"endpoint_name = \"myendpoint1\"\n",
|
||||||
|
"# define the service name\n",
|
||||||
|
"version_name= \"versiona\"\n",
|
||||||
|
"\n",
|
||||||
|
"endpoint_deployment_config = AksEndpoint.deploy_configuration(tags = {'modelVersion':'firstversion', 'department':'finance'}, \n",
|
||||||
|
" description = \"my first version\", namespace = namespace_name, \n",
|
||||||
|
" version_name = version_name, traffic_percentile = 40)\n",
|
||||||
|
"\n",
|
||||||
|
"endpoint = Model.deploy(ws, endpoint_name, [model], inference_config, endpoint_deployment_config, compute)\n",
|
||||||
|
"endpoint.wait_for_deployment(True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"endpoint.get_logs()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Add another version of the service to an existing endpoint\n",
|
||||||
|
"This adds another version behind an existing endpoint. You can specify the traffic percentile the new version takes. If no traffic_percentile is specified then it defaults to 0. All the unspecified traffic percentile (in this example 50) across all versions goes to default version."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Adding a new version to an existing Endpoint.\n",
|
||||||
|
"version_name_add=\"versionb\" \n",
|
||||||
|
"\n",
|
||||||
|
"endpoint.create_version(version_name = version_name_add, inference_config=inference_config, models=[model], tags = {'modelVersion':'secondversion', 'department':'finance'}, \n",
|
||||||
|
" description = \"my second version\", traffic_percentile = 10)\n",
|
||||||
|
"endpoint.wait_for_deployment(True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Update an existing version in an endpoint\n",
|
||||||
|
"There are two types of versions: control and treatment. An endpoint contains one or more treatment versions but only one control version. This categorization helps compare the different versions against the defined control version."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"endpoint.update_version(version_name=endpoint.versions[version_name_add].name, description=\"my second version update\", traffic_percentile=40, is_default=True, is_control_version_type=True)\n",
|
||||||
|
"endpoint.wait_for_deployment(True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Test the web service using run method\n",
|
||||||
|
"Test the web sevice by passing in data. Run() method retrieves API keys behind the scenes to make sure that call is authenticated."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Scoring on endpoint\n",
|
||||||
|
"import json\n",
|
||||||
|
"test_sample = json.dumps({'data': [\n",
|
||||||
|
" [1,2,3,4,5,6,7,8,9,10], \n",
|
||||||
|
" [10,9,8,7,6,5,4,3,2,1]\n",
|
||||||
|
"]})\n",
|
||||||
|
"\n",
|
||||||
|
"test_sample_encoded = bytes(test_sample, encoding='utf8')\n",
|
||||||
|
"prediction = endpoint.run(input_data=test_sample_encoded)\n",
|
||||||
|
"print(prediction)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Delete Resources"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# deleting a version in an endpoint\n",
|
||||||
|
"endpoint.delete_version(version_name=version_name)\n",
|
||||||
|
"endpoint.wait_for_deployment(True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# deleting an endpoint, this will delete all versions in the endpoint and the endpoint itself\n",
|
||||||
|
"endpoint.delete()"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "shipatel"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"category": "deployment",
|
||||||
|
"compute": [
|
||||||
|
"None"
|
||||||
|
],
|
||||||
|
"datasets": [
|
||||||
|
"Diabetes"
|
||||||
|
],
|
||||||
|
"deployment": [
|
||||||
|
"Azure Kubernetes Service"
|
||||||
|
],
|
||||||
|
"exclude_from_index": false,
|
||||||
|
"framework": [
|
||||||
|
"Scikit-learn"
|
||||||
|
],
|
||||||
|
"friendly_name": "Deploy models to AKS using controlled roll out",
|
||||||
|
"index_order": 3,
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.0"
|
||||||
|
},
|
||||||
|
"star_tag": [
|
||||||
|
"featured"
|
||||||
|
],
|
||||||
|
"tags": [
|
||||||
|
"None"
|
||||||
|
],
|
||||||
|
"task": "Deploy a model with Azure Machine Learning"
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -0,0 +1,4 @@
|
|||||||
|
name: deploy-aks-with-controlled-rollout
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
@@ -0,0 +1,28 @@
|
|||||||
|
import pickle
|
||||||
|
import json
|
||||||
|
import numpy
|
||||||
|
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
|
||||||
|
from sklearn.linear_model import Ridge
|
||||||
|
from azureml.core.model import Model
|
||||||
|
|
||||||
|
|
||||||
|
def init():
|
||||||
|
global model
|
||||||
|
    # note here "sklearn_regression_model.pkl" is the name of the model registered under the workspace
|
||||||
|
# this is a different behavior than before when the code is run locally, even though the code is the same.
|
||||||
|
model_path = Model.get_model_path('sklearn_regression_model.pkl')
|
||||||
|
# deserialize the model file back into a sklearn model
|
||||||
|
model = joblib.load(model_path)
|
||||||
|
|
||||||
|
|
||||||
|
# note you can pass in multiple rows for scoring
|
||||||
|
def run(raw_data):
|
||||||
|
try:
|
||||||
|
data = json.loads(raw_data)['data']
|
||||||
|
data = numpy.array(data)
|
||||||
|
result = model.predict(data)
|
||||||
|
# you can return any data type as long as it is JSON-serializable
|
||||||
|
return result.tolist()
|
||||||
|
except Exception as e:
|
||||||
|
error = str(e)
|
||||||
|
return error
|
||||||
@@ -158,7 +158,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 5. *Create myenv.yml file*"
|
"## 5. *Create myenv.yml file*\n",
|
||||||
|
"Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -169,7 +170,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'],\n",
|
||||||
|
" pip_packages=['azureml-defaults'])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
@@ -189,10 +191,11 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
"\n",
|
||||||
" entry_script=\"score.py\",\n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
||||||
" conda_file=\"myenv.yml\")"
|
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -431,7 +434,8 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"aks_service.update(enable_app_insights=False)"
|
"aks_service.update(enable_app_insights=False)\n",
|
||||||
|
"aks_service.wait_for_deployment(show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -244,7 +244,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Setting up inference configuration\n",
|
"### Setting up inference configuration\n",
|
||||||
"First we create a YAML file that specifies which dependencies we would like to see in our container."
|
"First we create a YAML file that specifies which dependencies we would like to see in our container. Please note that you must include azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -255,7 +255,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime==0.4.0\",\"azureml-core\"])\n",
|
"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime==0.4.0\", \"azureml-core\", \"azureml-defaults\"])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
@@ -275,11 +275,11 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
"\n",
|
||||||
" entry_script=\"score.py\",\n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
||||||
" conda_file=\"myenv.yml\",\n",
|
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
|
||||||
" extra_docker_file_steps = \"Dockerfile\")"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -316,9 +316,6 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
|
||||||
"from random import randint\n",
|
|
||||||
"\n",
|
|
||||||
"aci_service_name = 'my-aci-service-15ad'\n",
|
"aci_service_name = 'my-aci-service-15ad'\n",
|
||||||
"print(\"Service\", aci_service_name)\n",
|
"print(\"Service\", aci_service_name)\n",
|
||||||
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
||||||
@@ -376,7 +373,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#aci_service.delete()"
|
"aci_service.delete()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
@@ -386,6 +383,22 @@
|
|||||||
"name": "viswamy"
|
"name": "viswamy"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
"category": "deployment",
|
||||||
|
"compute": [
|
||||||
|
"local"
|
||||||
|
],
|
||||||
|
"datasets": [
|
||||||
|
"PASCAL VOC"
|
||||||
|
],
|
||||||
|
"deployment": [
|
||||||
|
"Azure Container Instance"
|
||||||
|
],
|
||||||
|
"exclude_from_index": false,
|
||||||
|
"framework": [
|
||||||
|
"ONNX"
|
||||||
|
],
|
||||||
|
"friendly_name": "Convert and deploy TinyYolo with ONNX Runtime",
|
||||||
|
"index_order": 5,
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
@@ -402,7 +415,14 @@
|
|||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.5"
|
"version": "3.6.5"
|
||||||
}
|
},
|
||||||
|
"star_tag": [
|
||||||
|
"featured"
|
||||||
|
],
|
||||||
|
"tags": [
|
||||||
|
"ONNX Converter"
|
||||||
|
],
|
||||||
|
"task": "Object Detection"
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
|
|||||||
@@ -2,5 +2,6 @@ name: onnx-convert-aml-deploy-tinyyolo
|
|||||||
dependencies:
|
dependencies:
|
||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
|
- numpy
|
||||||
- git+https://github.com/apple/coremltools@v2.1
|
- git+https://github.com/apple/coremltools@v2.1
|
||||||
- onnxmltools==1.3.1
|
- onnxmltools==1.3.1
|
||||||
|
|||||||
@@ -319,7 +319,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Write Environment File"
|
"### Write Environment File\n",
|
||||||
|
"Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -330,7 +331,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n",
|
"\n",
|
||||||
|
"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
@@ -350,11 +352,11 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
"\n",
|
||||||
" entry_script=\"score.py\",\n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
||||||
" conda_file=\"myenv.yml\",\n",
|
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
|
||||||
" extra_docker_file_steps = \"Dockerfile\")"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -391,8 +393,6 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
|
||||||
"\n",
|
|
||||||
"aci_service_name = 'onnx-demo-emotion'\n",
|
"aci_service_name = 'onnx-demo-emotion'\n",
|
||||||
"print(\"Service\", aci_service_name)\n",
|
"print(\"Service\", aci_service_name)\n",
|
||||||
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
||||||
@@ -726,7 +726,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"# remember to delete your service after you are done using it!\n",
|
"# remember to delete your service after you are done using it!\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# aci_service.delete()"
|
"aci_service.delete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -755,6 +755,22 @@
|
|||||||
"name": "viswamy"
|
"name": "viswamy"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
"category": "deployment",
|
||||||
|
"compute": [
|
||||||
|
"Local"
|
||||||
|
],
|
||||||
|
"datasets": [
|
||||||
|
"Emotion FER"
|
||||||
|
],
|
||||||
|
"deployment": [
|
||||||
|
"Azure Container Instance"
|
||||||
|
],
|
||||||
|
"exclude_from_index": false,
|
||||||
|
"framework": [
|
||||||
|
"ONNX"
|
||||||
|
],
|
||||||
|
"friendly_name": "Deploy Facial Expression Recognition (FER+) with ONNX Runtime",
|
||||||
|
"index_order": 2,
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
@@ -772,7 +788,12 @@
|
|||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.5"
|
"version": "3.6.5"
|
||||||
},
|
},
|
||||||
"msauthor": "vinitra.swamy"
|
"msauthor": "vinitra.swamy",
|
||||||
|
"star_tag": [],
|
||||||
|
"tags": [
|
||||||
|
"ONNX Model Zoo"
|
||||||
|
],
|
||||||
|
"task": "Facial Expression Recognition"
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
|
|||||||
@@ -306,7 +306,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Write Environment File\n",
|
"### Write Environment File\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine."
|
"This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine. Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -317,7 +317,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n",
|
"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
@@ -337,11 +337,11 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
"\n",
|
||||||
" entry_script=\"score.py\",\n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
||||||
" extra_docker_file_steps = \"Dockerfile\",\n",
|
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
|
||||||
" conda_file=\"myenv.yml\")"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -378,8 +378,6 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
|
||||||
"\n",
|
|
||||||
"aci_service_name = 'onnx-demo-mnist'\n",
|
"aci_service_name = 'onnx-demo-mnist'\n",
|
||||||
"print(\"Service\", aci_service_name)\n",
|
"print(\"Service\", aci_service_name)\n",
|
||||||
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
||||||
@@ -735,7 +733,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"# remember to delete your service after you are done using it!\n",
|
"# remember to delete your service after you are done using it!\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# aci_service.delete()"
|
"aci_service.delete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -763,6 +761,22 @@
|
|||||||
"name": "viswamy"
|
"name": "viswamy"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
"category": "deployment",
|
||||||
|
"compute": [
|
||||||
|
"Local"
|
||||||
|
],
|
||||||
|
"datasets": [
|
||||||
|
"MNIST"
|
||||||
|
],
|
||||||
|
"deployment": [
|
||||||
|
"Azure Container Instance"
|
||||||
|
],
|
||||||
|
"exclude_from_index": false,
|
||||||
|
"framework": [
|
||||||
|
"ONNX"
|
||||||
|
],
|
||||||
|
"friendly_name": "Deploy MNIST digit recognition with ONNX Runtime",
|
||||||
|
"index_order": 1,
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
@@ -780,7 +794,12 @@
|
|||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.5"
|
"version": "3.6.5"
|
||||||
},
|
},
|
||||||
"msauthor": "vinitra.swamy"
|
"msauthor": "vinitra.swamy",
|
||||||
|
"star_tag": [],
|
||||||
|
"tags": [
|
||||||
|
"ONNX Model Zoo"
|
||||||
|
],
|
||||||
|
"task": "Image Classification"
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
|
|||||||
@@ -241,7 +241,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
|
"\n",
|
||||||
|
"myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
@@ -251,7 +252,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Create the inference configuration object"
|
"Create the inference configuration object. Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -261,11 +262,11 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
"\n",
|
||||||
" entry_script=\"score.py\",\n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
||||||
" conda_file=\"myenv.yml\",\n",
|
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
|
||||||
" extra_docker_file_steps = \"Dockerfile\")"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -302,7 +303,6 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
|
||||||
"from random import randint\n",
|
"from random import randint\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n",
|
"aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n",
|
||||||
@@ -362,7 +362,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#aci_service.delete()"
|
"aci_service.delete()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
@@ -372,6 +372,22 @@
|
|||||||
"name": "viswamy"
|
"name": "viswamy"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
"category": "deployment",
|
||||||
|
"compute": [
|
||||||
|
"Local"
|
||||||
|
],
|
||||||
|
"datasets": [
|
||||||
|
"ImageNet"
|
||||||
|
],
|
||||||
|
"deployment": [
|
||||||
|
"Azure Container Instance"
|
||||||
|
],
|
||||||
|
"exclude_from_index": false,
|
||||||
|
"framework": [
|
||||||
|
"ONNX"
|
||||||
|
],
|
||||||
|
"friendly_name": "Deploy ResNet50 with ONNX Runtime",
|
||||||
|
"index_order": 4,
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
@@ -388,7 +404,12 @@
|
|||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.5"
|
"version": "3.6.5"
|
||||||
}
|
},
|
||||||
|
"star_tag": [],
|
||||||
|
"tags": [
|
||||||
|
"ONNX Model Zoo"
|
||||||
|
],
|
||||||
|
"task": "Image Classification"
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
|
|||||||
@@ -405,7 +405,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create inference configuration\n",
|
"### Create inference configuration\n",
|
||||||
"First we create a YAML file that specifies which dependencies we would like to see in our container."
|
"First we create a YAML file that specifies which dependencies we would like to see in our container. Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -416,7 +416,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
|
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\", \"azureml-defaults\"])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
@@ -436,11 +436,11 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
"\n",
|
||||||
" entry_script=\"score.py\",\n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
||||||
" conda_file=\"myenv.yml\",\n",
|
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)"
|
||||||
" extra_docker_file_steps = \"Dockerfile\")"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -477,7 +477,6 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"from random import randint\n",
|
"from random import randint\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -538,7 +537,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#aci_service.delete()"
|
"aci_service.delete()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
@@ -548,6 +547,22 @@
|
|||||||
"name": "viswamy"
|
"name": "viswamy"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
"category": "deployment",
|
||||||
|
"compute": [
|
||||||
|
"AML Compute"
|
||||||
|
],
|
||||||
|
"datasets": [
|
||||||
|
"MNIST"
|
||||||
|
],
|
||||||
|
"deployment": [
|
||||||
|
"Azure Container Instance"
|
||||||
|
],
|
||||||
|
"exclude_from_index": false,
|
||||||
|
"framework": [
|
||||||
|
"ONNX"
|
||||||
|
],
|
||||||
|
"friendly_name": "Train MNIST in PyTorch, convert, and deploy with ONNX Runtime",
|
||||||
|
"index_order": 3,
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
@@ -565,6 +580,11 @@
|
|||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.6"
|
"version": "3.6.6"
|
||||||
},
|
},
|
||||||
|
"star_tag": [],
|
||||||
|
"tags": [
|
||||||
|
"ONNX Converter"
|
||||||
|
],
|
||||||
|
"task": "Image Classification",
|
||||||
"widgets": {
|
"widgets": {
|
||||||
"application/vnd.jupyter.widget-state+json": {
|
"application/vnd.jupyter.widget-state+json": {
|
||||||
"state": {
|
"state": {
|
||||||
|
|||||||
@@ -318,7 +318,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"sample-deploy-to-aks"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Set the web service configuration (using default here)\n",
|
"# Set the web service configuration (using default here)\n",
|
||||||
@@ -331,7 +335,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"sample-deploy-to-aks"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%time\n",
|
"%%time\n",
|
||||||
|
|||||||
@@ -1,454 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Register Model, Create Image and Deploy Service\n",
|
|
||||||
"\n",
|
|
||||||
"This example shows how to deploy a web service in step-by-step fashion:\n",
|
|
||||||
"\n",
|
|
||||||
" 1. Register model\n",
|
|
||||||
" 2. Query versions of models and select one to deploy\n",
|
|
||||||
" 3. Create Docker image\n",
|
|
||||||
" 4. Query versions of images\n",
|
|
||||||
" 5. Deploy the image as web service\n",
|
|
||||||
" \n",
|
|
||||||
"**IMPORTANT**:\n",
|
|
||||||
" * This notebook requires you to first complete [train-within-notebook](../../training/train-within-notebook/train-within-notebook.ipynb) example\n",
|
|
||||||
" \n",
|
|
||||||
"The train-within-notebook example taught you how to deploy a web service directly from model in one step. This Notebook shows a more advanced approach that gives you more control over model versions and Docker image versions. "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Prerequisites\n",
|
|
||||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Check core SDK version number\n",
|
|
||||||
"import azureml.core\n",
|
|
||||||
"\n",
|
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Initialize Workspace\n",
|
|
||||||
"\n",
|
|
||||||
"Initialize a workspace object from persisted configuration."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"create workspace"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core import Workspace\n",
|
|
||||||
"\n",
|
|
||||||
"ws = Workspace.from_config()\n",
|
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Register Model"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"You can add tags and descriptions to your models. Note you need to have a `sklearn_linreg_model.pkl` file in the current directory. This file is generated by the 01 notebook. The below call registers that file as a model with the same name `sklearn_linreg_model.pkl` in the workspace.\n",
|
|
||||||
"\n",
|
|
||||||
"Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"register model from file"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.model import Model\n",
|
|
||||||
"import sklearn\n",
|
|
||||||
"\n",
|
|
||||||
"library_version = \"sklearn\"+sklearn.__version__.replace(\".\",\"x\")\n",
|
|
||||||
"\n",
|
|
||||||
"model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n",
|
|
||||||
" model_name = \"sklearn_regression_model.pkl\",\n",
|
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\", 'version': library_version},\n",
|
|
||||||
" description = \"Ridge regression model to predict diabetes\",\n",
|
|
||||||
" workspace = ws)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"You can explore the registered models within your workspace and query by tag. Models are versioned. If you call the register_model command many times with same model name, you will get multiple versions of the model with increasing version numbers."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"register model from file"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"regression_models = Model.list(workspace=ws, tags=['area'])\n",
|
|
||||||
"for m in regression_models:\n",
|
|
||||||
" print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"You can pick a specific model to deploy"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"print(model.name, model.description, model.version, sep = '\\t')"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Create Docker Image"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Show `score.py`. Note that the `sklearn_regression_model.pkl` in the `get_model_path` call is referring to a model named `sklearn_linreg_model.pkl` registered under the workspace. It is NOT referenceing the local file."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%%writefile score.py\n",
|
|
||||||
"import os\n",
|
|
||||||
"import pickle\n",
|
|
||||||
"import json\n",
|
|
||||||
"import numpy\n",
|
|
||||||
"from sklearn.externals import joblib\n",
|
|
||||||
"from sklearn.linear_model import Ridge\n",
|
|
||||||
"\n",
|
|
||||||
"def init():\n",
|
|
||||||
" global model\n",
|
|
||||||
" # AZUREML_MODEL_DIR is an environment variable created during deployment.\n",
|
|
||||||
" # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n",
|
|
||||||
" # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n",
|
|
||||||
" model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_regression_model.pkl')\n",
|
|
||||||
" # deserialize the model file back into a sklearn model\n",
|
|
||||||
" model = joblib.load(model_path)\n",
|
|
||||||
"\n",
|
|
||||||
"# note you can pass in multiple rows for scoring\n",
|
|
||||||
"def run(raw_data):\n",
|
|
||||||
" try:\n",
|
|
||||||
" data = json.loads(raw_data)['data']\n",
|
|
||||||
" data = numpy.array(data)\n",
|
|
||||||
" result = model.predict(data)\n",
|
|
||||||
" # you can return any datatype as long as it is JSON-serializable\n",
|
|
||||||
" return result.tolist()\n",
|
|
||||||
" except Exception as e:\n",
|
|
||||||
" error = str(e)\n",
|
|
||||||
" return error"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
|
||||||
"\n",
|
|
||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
|
|
||||||
"\n",
|
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
|
||||||
" f.write(myenv.serialize_to_string())"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Note that following command can take few minutes. \n",
|
|
||||||
"\n",
|
|
||||||
"You can add tags and descriptions to images. Also, an image can contain multiple models."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"create image"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.image import Image, ContainerImage\n",
|
|
||||||
"\n",
|
|
||||||
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
|
||||||
" execution_script=\"score.py\",\n",
|
|
||||||
" conda_file=\"myenv.yml\",\n",
|
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
|
||||||
" description = \"Image with ridge regression model\")\n",
|
|
||||||
"\n",
|
|
||||||
"image = Image.create(name = \"myimage1\",\n",
|
|
||||||
" # this is the model object. note you can pass in 0-n models via this list-type parameter\n",
|
|
||||||
" # in case you need to reference multiple models, or none at all, in your scoring script.\n",
|
|
||||||
" models = [model],\n",
|
|
||||||
" image_config = image_config, \n",
|
|
||||||
" workspace = ws)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"create image"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"image.wait_for_creation(show_output = True)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"#### Use a custom Docker image\n",
|
|
||||||
"\n",
|
|
||||||
"You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\\* and Python(3.5.\\* or 3.6.\\*).\n",
|
|
||||||
"\n",
|
|
||||||
"Only Supported for `ContainerImage`(from azureml.core.image) with `python` runtime.\n",
|
|
||||||
"```python\n",
|
|
||||||
"# use an image available in public Container Registry without authentication\n",
|
|
||||||
"image_config.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n",
|
|
||||||
"\n",
|
|
||||||
"# or, use an image available in a private Container Registry\n",
|
|
||||||
"image_config.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n",
|
|
||||||
"image_config.base_image_registry.address = \"myregistry.azurecr.io\"\n",
|
|
||||||
"image_config.base_image_registry.username = \"username\"\n",
|
|
||||||
"image_config.base_image_registry.password = \"password\"\n",
|
|
||||||
"\n",
|
|
||||||
"# or, use an image built during training.\n",
|
|
||||||
"image_config.base_image = run.properties[\"AzureML.DerivedImageName\"]\n",
|
|
||||||
"```\n",
|
|
||||||
"You can get the address of training image from the properties of a Run object. Only new runs submitted with azureml-sdk>=1.0.22 to AMLCompute targets will have the 'AzureML.DerivedImageName' property. Instructions on how to get a Run can be found in [manage-runs](../../training/manage-runs/manage-runs.ipynb). \n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"List images by tag and find out the detailed build log for debugging."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"create image"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"for i in Image.list(workspace = ws,tags = [\"area\"]):\n",
|
|
||||||
" print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Deploy image as web service on Azure Container Instance\n",
|
|
||||||
"\n",
|
|
||||||
"Note that the service creation can take few minutes."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"deploy service",
|
|
||||||
"aci"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
|
||||||
"\n",
|
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
|
||||||
" memory_gb = 1, \n",
|
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\"}, \n",
|
|
||||||
" description = 'Predict diabetes using regression model')"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"deploy service",
|
|
||||||
"aci"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core.webservice import Webservice\n",
|
|
||||||
"\n",
|
|
||||||
"aci_service_name = 'my-aci-service-2'\n",
|
|
||||||
"print(aci_service_name)\n",
|
|
||||||
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
|
||||||
" image = image,\n",
|
|
||||||
" name = aci_service_name,\n",
|
|
||||||
" workspace = ws)\n",
|
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
|
||||||
"print(aci_service.state)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Test web service"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Call the web service with some dummy input data to get a prediction."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"deploy service",
|
|
||||||
"aci"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import json\n",
|
|
||||||
"\n",
|
|
||||||
"test_sample = json.dumps({'data': [\n",
|
|
||||||
" [1,2,3,4,5,6,7,8,9,10], \n",
|
|
||||||
" [10,9,8,7,6,5,4,3,2,1]\n",
|
|
||||||
"]})\n",
|
|
||||||
"test_sample = bytes(test_sample,encoding = 'utf8')\n",
|
|
||||||
"\n",
|
|
||||||
"prediction = aci_service.run(input_data=test_sample)\n",
|
|
||||||
"print(prediction)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Delete ACI to clean up"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"deploy service",
|
|
||||||
"aci"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"aci_service.delete()"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "aashishb"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,8 +0,0 @@
|
|||||||
name: register-model-create-image-deploy-service
|
|
||||||
dependencies:
|
|
||||||
- pip:
|
|
||||||
- azureml-sdk
|
|
||||||
- matplotlib
|
|
||||||
- tqdm
|
|
||||||
- scipy
|
|
||||||
- sklearn
|
|
||||||
Binary file not shown.
@@ -0,0 +1 @@
|
|||||||
|
{"class":"org.apache.spark.ml.classification.LogisticRegressionModel","timestamp":1570147252329,"sparkVersion":"2.4.0","uid":"LogisticRegression_5df3978caaf3","paramMap":{"regParam":0.01},"defaultParamMap":{"aggregationDepth":2,"threshold":0.5,"rawPredictionCol":"rawPrediction","featuresCol":"features","labelCol":"label","predictionCol":"prediction","family":"auto","regParam":0.0,"tol":1.0E-6,"probabilityCol":"probability","standardization":true,"elasticNetParam":0.0,"maxIter":100,"fitIntercept":true}}
|
||||||
@@ -0,0 +1,343 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Register Spark Model and deploy as Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"This example shows how to deploy a Webservice in step-by-step fashion:\n",
|
||||||
|
"\n",
|
||||||
|
" 1. Register Spark Model\n",
|
||||||
|
" 2. Deploy Spark Model as Webservice"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Prerequisites\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Check core SDK version number\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Initialize Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"Initialize a workspace object from persisted configuration."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"create workspace"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Register Model"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You can add tags and descriptions to your Models. Note you need to have a `iris.model` file in the current directory. This model file is generated using [train in spark](../training/train-in-spark/train-in-spark.ipynb) notebook. The below call registers that file as a Model with the same name `iris.model` in the workspace.\n",
|
||||||
|
"\n",
|
||||||
|
"Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"register model from file"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"model = Model.register(model_path=\"iris.model\",\n",
|
||||||
|
" model_name=\"iris.model\",\n",
|
||||||
|
" tags={'type': \"regression\"},\n",
|
||||||
|
" description=\"Logistic regression model to predict iris species\",\n",
|
||||||
|
" workspace=ws)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Fetch Environment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment.\n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook, we will be using 'AzureML-PySpark-MmlSpark-0.15', a curated environment.\n",
|
||||||
|
"\n",
|
||||||
|
"More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Environment\n",
|
||||||
|
"\n",
|
||||||
|
"env = Environment.get(ws, name='AzureML-PySpark-MmlSpark-0.15')\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create Inference Configuration\n",
|
||||||
|
"\n",
|
||||||
|
"There is now support for a source directory, you can upload an entire folder from your local machine as dependencies for the Webservice.\n",
|
||||||
|
"Note: in that case, your entry_script is relative path to the source_directory path.\n",
|
||||||
|
"\n",
|
||||||
|
"Sample code for using a source directory:\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"inference_config = InferenceConfig(source_directory=\"C:/abc\",\n",
|
||||||
|
" entry_script=\"x/y/score.py\",\n",
|
||||||
|
" environment=environment)\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
" - source_directory = holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n",
|
||||||
|
" - entry_script = contains logic specific to initializing your model and running predictions\n",
|
||||||
|
" - environment = An environment object to use for the deployment. Doesn't have to be registered"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"create image"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
|
"\n",
|
||||||
|
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy Model as Webservice on Azure Container Instance\n",
|
||||||
|
"\n",
|
||||||
|
"Note that the service creation can take few minutes."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"azuremlexception-remarks-sample"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice, Webservice\n",
|
||||||
|
"from azureml.exceptions import WebserviceException\n",
|
||||||
|
"\n",
|
||||||
|
"deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n",
|
||||||
|
"aci_service_name = 'aciservice1'\n",
|
||||||
|
"\n",
|
||||||
|
"try:\n",
|
||||||
|
" # if you want to get existing service below is the command\n",
|
||||||
|
" # since aci name needs to be unique in subscription deleting existing aci if any\n",
|
||||||
|
" # we use aci_service_name to create azure aci\n",
|
||||||
|
" service = Webservice(ws, name=aci_service_name)\n",
|
||||||
|
" if service:\n",
|
||||||
|
" service.delete()\n",
|
||||||
|
"except WebserviceException as e:\n",
|
||||||
|
" print()\n",
|
||||||
|
"\n",
|
||||||
|
"service = Model.deploy(ws, aci_service_name, [model], inference_config, deployment_config)\n",
|
||||||
|
"\n",
|
||||||
|
"service.wait_for_deployment(True)\n",
|
||||||
|
"print(service.state)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Test web service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"test_sample = json.dumps({'features':{'type':1,'values':[4.3,3.0,1.1,0.1]},'label':2.0})\n",
|
||||||
|
"\n",
|
||||||
|
"test_sample_encoded = bytes(test_sample, encoding='utf8')\n",
|
||||||
|
"prediction = service.run(input_data=test_sample_encoded)\n",
|
||||||
|
"print(prediction)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Delete ACI to clean up"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"deploy service",
|
||||||
|
"aci"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"service.delete()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Model Profiling\n",
|
||||||
|
"\n",
|
||||||
|
"You can also take advantage of the profiling feature to estimate CPU and memory requirements for models.\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"profile = Model.profile(ws, \"profilename\", [model], inference_config, test_sample)\n",
|
||||||
|
"profile.wait_for_profiling(True)\n",
|
||||||
|
"profiling_results = profile.get_results()\n",
|
||||||
|
"print(profiling_results)\n",
|
||||||
|
"```"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Model Packaging\n",
|
||||||
|
"\n",
|
||||||
|
"If you want to build a Docker image that encapsulates your model and its dependencies, you can use the model packaging option. The output image will be pushed to your workspace's ACR.\n",
|
||||||
|
"\n",
|
||||||
|
"You must include an Environment object in your inference configuration to use `Model.package()`.\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"package = Model.package(ws, [model], inference_config)\n",
|
||||||
|
"package.wait_for_creation(show_output=True) # Or show_output=False to hide the Docker build logs.\n",
|
||||||
|
"package.pull()\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"Instead of a fully-built image, you can also generate a Dockerfile and download all the assets needed to build an image on top of your Environment.\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"package = Model.package(ws, [model], inference_config, generate_dockerfile=True)\n",
|
||||||
|
"package.wait_for_creation(show_output=True)\n",
|
||||||
|
"package.save(\"./local_context_dir\")\n",
|
||||||
|
"```"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "aashishb"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"category": "deployment",
|
||||||
|
"compute": [
|
||||||
|
"None"
|
||||||
|
],
|
||||||
|
"datasets": [
|
||||||
|
"Iris"
|
||||||
|
],
|
||||||
|
"deployment": [
|
||||||
|
"Azure Container Instance"
|
||||||
|
],
|
||||||
|
"exclude_from_index": false,
|
||||||
|
"framework": [
|
||||||
|
"PySpark"
|
||||||
|
],
|
||||||
|
"friendly_name": "Register Spark model and deploy as webservice",
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.2"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -0,0 +1,4 @@
|
|||||||
|
name: model-register-and-deploy-spark
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
37
how-to-use-azureml/deployment/spark/score.py
Normal file
@@ -0,0 +1,37 @@
import traceback
from pyspark.ml.linalg import VectorUDT
from azureml.core.model import Model
from pyspark.ml.classification import LogisticRegressionModel
from pyspark.sql.types import StructType, StructField
from pyspark.sql.types import DoubleType
from pyspark.sql import SQLContext
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
spark = sqlContext.sparkSession

input_schema = StructType([StructField("features", VectorUDT()), StructField("label", DoubleType())])
reader = spark.read
reader.schema(input_schema)


def init():
    global model
    # note here "iris.model" is the name of the model registered under the workspace
    # this call should return the path to the model.pkl file on the local disk.
    model_path = Model.get_model_path('iris.model')
    # Load the model file back into a LogisticRegression model
    model = LogisticRegressionModel.load(model_path)


def run(data):
    try:
        input_df = reader.json(sc.parallelize([data]))
        result = model.transform(input_df)
        # you can return any datatype as long as it is JSON-serializable
        return result.collect()[0]['prediction']
    except Exception as e:
        traceback.print_exc()
        error = str(e)
        return error
|
||||||
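The `run` function in the scoring script above parses its input as JSON against `input_schema` (a `features` vector and a `label` double). A minimal sketch of building one such record in plain Python follows. The dense-vector encoding (`"type": 1` plus a `"values"` array) is an assumption about how Spark ML's `VectorUDT` is represented in JSON, the helper name `make_payload` is hypothetical, and the feature values are made up for illustration.

```python
import json


def make_payload(features, label=0.0):
    """Build one JSON record shaped like the scoring script's input schema."""
    record = {
        # Assumption: dense-vector encoding for VectorUDT
        # (type 1 = dense, with the numbers in a "values" array).
        "features": {"type": 1, "values": [float(v) for v in features]},
        "label": float(label),
    }
    return json.dumps(record)


# Made-up measurements for a single row.
payload = make_payload([5.1, 3.5, 1.4, 0.2])
print(payload)
```

A string like `payload` is what a client would send as the request body. Whether the deployed service accepts exactly this shape should be checked against the data the model was trained on.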
@@ -1,11 +1,14 @@
|
|||||||
## Using explain model APIs
|
## Using AzureML Interpret APIs
|
||||||
|
|
||||||
<a name="samples"></a>
|
<a name="samples"></a>
|
||||||
# Explain Model SDK Sample Notebooks
|
# AzureML Interpret SDK Sample Notebooks
|
||||||
|
|
||||||
Follow these sample notebooks to learn:
|
You can run the interpret-community SDK to explain models locally without Azure.
|
||||||
|
For notebooks on the local experience, please see:
|
||||||
|
https://github.com/interpretml/interpret-community/tree/master/notebooks
|
||||||
|
|
||||||
1. [Explain tabular data locally](tabular-data): Basic examples of explaining model trained on tabular data.
|
Follow these sample notebooks to learn about the model interpretability integration with Azure:
|
||||||
2. [Explain on remote AMLCompute](azure-integration/remote-explanation): Explain a model on a remote AMLCompute target.
|
|
||||||
3. [Explain tabular data with Run History](azure-integration/run-history): Explain a model with Run History.
|
1. [Explain on remote AMLCompute](azure-integration/remote-explanation): Explain a model on a remote AMLCompute target.
|
||||||
4. [Operationalize model explanation](azure-integration/scoring-time): Operationalize model explanation as a web service.
|
2. [Explain tabular data with Run History](azure-integration/run-history): Explain a model with Run History.
|
||||||
|
3. [Operationalize model explanation](azure-integration/scoring-time): Operationalize model explanation as a web service.
|
||||||
|
|||||||
@@ -669,7 +669,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.contrib.interpret.visualize import ExplanationDashboard"
|
"from interpret_community.widget import ExplanationDashboard"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -678,7 +678,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ExplanationDashboard(global_explanation, original_model, x_test)"
|
"ExplanationDashboard(global_explanation, original_model, datasetX=x_test)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
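The hunk above switches the dashboard call from a positional `x_test` to `datasetX=x_test`. One plausible reason is that the new widget accepts the dataset only as a keyword argument. The sketch below illustrates that behaviour with a hypothetical stand-in function, `dashboard_stub`; it is not the real `ExplanationDashboard`, whose signature is not shown in this diff.

```python
def dashboard_stub(explanation, model=None, *, datasetX=None):
    """Hypothetical stand-in, NOT the real ExplanationDashboard."""
    return {"explanation": explanation, "model": model, "datasetX": datasetX}


# Keyword form, as in the updated notebook cell.
ok = dashboard_stub("global_explanation", "model", datasetX=[[1.0, 2.0]])

# Positional form, as in the old cell: a keyword-only parameter rejects it.
try:
    dashboard_stub("global_explanation", "model", [[1.0, 2.0]])
    positional_accepted = True
except TypeError:
    positional_accepted = False

print(ok["datasetX"], positional_accepted)
```

Passing the dataset by keyword also keeps the call valid if further optional parameters are later added ahead of it.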
@@ -61,4 +61,4 @@ global_explanation = tabular_explainer.explain_global(X_test)
|
|||||||
# Uploading model explanation data for storage or visualization in webUX
|
# Uploading model explanation data for storage or visualization in webUX
|
||||||
# The explanation can then be downloaded on any compute
|
# The explanation can then be downloaded on any compute
|
||||||
comment = 'Global explanation on regression model trained on boston dataset'
|
comment = 'Global explanation on regression model trained on boston dataset'
|
||||||
client.upload_model_explanation(global_explanation, comment=comment)
|
client.upload_model_explanation(global_explanation, comment=comment, model_id=original_model.id)
|
||||||
|
|||||||
@@ -564,7 +564,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.contrib.interpret.visualize import ExplanationDashboard"
|
"from interpret_community.widget import ExplanationDashboard"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -573,7 +573,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ExplanationDashboard(downloaded_global_explanation, model, x_test)"
|
"ExplanationDashboard(downloaded_global_explanation, model, datasetX=x_test)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -290,7 +290,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.contrib.interpret.visualize import ExplanationDashboard"
|
"from interpret_community.widget import ExplanationDashboard"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -299,7 +299,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ExplanationDashboard(global_explanation, clf, x_test)"
|
"ExplanationDashboard(global_explanation, clf, datasetX=x_test)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -308,7 +308,9 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Deploy \n",
|
"## Deploy \n",
|
||||||
"\n",
|
"\n",
|
||||||
"Deploy Model and ScoringExplainer"
|
"Deploy Model and ScoringExplainer.\n",
|
||||||
|
"\n",
|
||||||
|
"Please note that you must indicate azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -319,7 +321,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"# WARNING: to install this, g++ needs to be available on the Docker image and is not by default (look at the next cell)\n",
|
"# azureml-defaults is required to host the model as a web service.\n",
|
||||||
"azureml_pip_packages = [\n",
|
"azureml_pip_packages = [\n",
|
||||||
" 'azureml-defaults', 'azureml-contrib-interpret', 'azureml-core', 'azureml-telemetry',\n",
|
" 'azureml-defaults', 'azureml-contrib-interpret', 'azureml-core', 'azureml-telemetry',\n",
|
||||||
" 'azureml-interpret'\n",
|
" 'azureml-interpret'\n",
|
||||||
@@ -338,16 +340,6 @@
|
|||||||
" print(f.read())"
|
" print(f.read())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%%writefile dockerfile\n",
|
|
||||||
"RUN apt-get update && apt-get install -y g++ "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -369,6 +361,8 @@
|
|||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
||||||
" memory_gb=1, \n",
|
" memory_gb=1, \n",
|
||||||
@@ -376,10 +370,8 @@
|
|||||||
" \"method\" : \"local_explanation\"}, \n",
|
" \"method\" : \"local_explanation\"}, \n",
|
||||||
" description='Get local explanations for IBM Employee Attrition data')\n",
|
" description='Get local explanations for IBM Employee Attrition data')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
||||||
" entry_script=\"score_local_explain.py\",\n",
|
"inference_config = InferenceConfig(entry_script=\"score_local_explain.py\", environment=myenv)\n",
|
||||||
" conda_file=\"myenv.yml\",\n",
|
|
||||||
" extra_docker_file_steps=\"dockerfile\")\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"# Use configs and models generated above\n",
|
"# Use configs and models generated above\n",
|
||||||
"service = Model.deploy(ws, 'model-scoring-deploy-local', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
|
"service = Model.deploy(ws, 'model-scoring-deploy-local', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
|
||||||
|
|||||||
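The hunk above builds the inference environment from `myenv.yml` instead of passing `conda_file` and `extra_docker_file_steps`. The notebook generates that file with `CondaDependencies`. The sketch below only mimics the shape of such a file with the standard library, so the `azureml-defaults` requirement is visible. The helper `write_conda_spec` is hypothetical, and the package list is a shortened assumption rather than the notebook's full list.

```python
import tempfile
from pathlib import Path


def write_conda_spec(path, pip_packages, name="myenv"):
    """Write a minimal conda environment file listing pip dependencies."""
    lines = [f"name: {name}", "dependencies:", "- pip", "- pip:"]
    lines += [f"  - {pkg}" for pkg in pip_packages]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


# Written to a temporary directory so no existing myenv.yml is overwritten.
spec_path = write_conda_spec(
    Path(tempfile.mkdtemp()) / "myenv.yml",
    ["azureml-defaults>=1.0.45", "azureml-interpret"],
)
spec_text = spec_path.read_text()
print(spec_text)
```

A file of this shape is what `Environment.from_conda_specification(name="myenv", file_path="myenv.yml")` in the hunk above reads.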
@@ -355,7 +355,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.contrib.interpret.visualize import ExplanationDashboard"
|
"from interpret_community.widget import ExplanationDashboard"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -364,7 +364,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ExplanationDashboard(global_explanation, original_svm_model, x_test)"
|
"ExplanationDashboard(global_explanation, original_svm_model, datasetX=x_test)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -409,16 +409,6 @@
|
|||||||
" print(f.read())"
|
" print(f.read())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%%writefile dockerfile\n",
|
|
||||||
"RUN apt-get update && apt-get install -y g++ "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -439,6 +429,8 @@
|
|||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
|
"from azureml.core.environment import Environment\n",
|
||||||
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
||||||
" memory_gb=1, \n",
|
" memory_gb=1, \n",
|
||||||
@@ -446,10 +438,8 @@
|
|||||||
" \"method\" : \"local_explanation\"}, \n",
|
" \"method\" : \"local_explanation\"}, \n",
|
||||||
" description='Get local explanations for IBM Employee Attrition data')\n",
|
" description='Get local explanations for IBM Employee Attrition data')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"inference_config = InferenceConfig(runtime= \"python\", \n",
|
"myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
|
||||||
" entry_script=\"score_remote_explain.py\",\n",
|
"inference_config = InferenceConfig(entry_script=\"score_remote_explain.py\", environment=myenv)\n",
|
||||||
" conda_file=\"myenv.yml\",\n",
|
|
||||||
" extra_docker_file_steps=\"dockerfile\")\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"# Use configs and models generated above\n",
|
"# Use configs and models generated above\n",
|
||||||
"service = Model.deploy(ws, 'model-scoring-service', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
|
"service = Model.deploy(ws, 'model-scoring-service', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
|
||||||
|
|||||||
@@ -116,7 +116,7 @@ global_explanation = tabular_explainer.explain_global(x_test)
|
|||||||
|
|
||||||
# uploading model explanation data for storage or visualization
|
# uploading model explanation data for storage or visualization
|
||||||
comment = 'Global explanation on classification model trained on IBM employee attrition dataset'
|
comment = 'Global explanation on classification model trained on IBM employee attrition dataset'
|
||||||
client.upload_model_explanation(global_explanation, comment=comment)
|
client.upload_model_explanation(global_explanation, comment=comment, model_id=original_model.id)
|
||||||
|
|
||||||
# also create a lightweight explainer for scoring time
|
# also create a lightweight explainer for scoring time
|
||||||
scoring_explainer = LinearScoringExplainer(tabular_explainer)
|
scoring_explainer = LinearScoringExplainer(tabular_explainer)
|
||||||
|
|||||||
@@ -1,509 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Explain binary classification model predictions with raw feature transformations\n",
|
|
||||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a binary classification model that uses advanced many to one or many to many feature transformations.**_\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"## Table of Contents\n",
|
|
||||||
"\n",
|
|
||||||
"1. [Introduction](#Introduction)\n",
|
|
||||||
"1. [Setup](#Setup)\n",
|
|
||||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
|
||||||
" 1. Apply feature transformations\n",
|
|
||||||
" 1. Train a binary classification model\n",
|
|
||||||
" 1. Explain the model on raw features\n",
|
|
||||||
" 1. Generate global explanations\n",
|
|
||||||
" 1. Generate local explanations\n",
|
|
||||||
"1. [Visualize results](#Visualize)\n",
|
|
||||||
"1. [Next steps](#Next)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Introduction\n",
|
|
||||||
"\n",
|
|
||||||
"This notebook illustrates creating explanations for a binary classification model (Titanic passenger survival) that uses many to one and many to many feature transformations from raw data to engineered features. For the many to one transformation, we sum two features, `age` and `fare`. For the many to many transformation, two features are computed: the product of `age` and `fare`, and the square of that product. Our tabular data explainer is then created with the flag `allow_all_transformations` set, and the resulting explanation object is used to get raw feature importances.\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"We will showcase raw feature transformations with three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
|
||||||
"\n",
|
|
||||||
"|  |\n",
|
|
||||||
"|:--:|\n",
|
|
||||||
"| *Interpretability Toolkit Architecture* |\n",
|
|
||||||
"\n",
|
|
||||||
"Problem: Titanic passenger data classification with scikit-learn (run model explainer locally)\n",
|
|
||||||
"\n",
|
|
||||||
"1. Transform raw features to engineered features\n",
|
|
||||||
"2. Train a Logistic Regression model using Scikit-learn\n",
|
|
||||||
"3. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
|
||||||
"4. Visualize the global and local explanations with the visualization dashboard.\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
|
|
||||||
"If you are using JupyterLab, run the following command:\n",
|
|
||||||
"```\n",
|
|
||||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
|
||||||
"```\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Explain\n",
|
|
||||||
"\n",
|
|
||||||
"### Run model explainer locally at training time"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from sklearn.pipeline import Pipeline\n",
|
|
||||||
"from sklearn.impute import SimpleImputer\n",
|
|
||||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
|
||||||
"from sklearn.linear_model import LogisticRegression\n",
|
|
||||||
"import pandas as pd\n",
|
|
||||||
"import numpy as np\n",
|
|
||||||
"\n",
|
|
||||||
"# Explainers:\n",
|
|
||||||
"# 1. SHAP Tabular Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import TabularExplainer\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 2. Mimic Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import MimicExplainer\n",
|
|
||||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
|
||||||
"from interpret.ext.glassbox import LGBMExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import LinearExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import SGDExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import DecisionTreeExplainableModel\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 3. PFI Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import PFIExplainer "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Load the Titanic passenger data"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"titanic_url = ('https://raw.githubusercontent.com/amueller/'\n",
|
|
||||||
" 'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')\n",
|
|
||||||
"data = pd.read_csv(titanic_url)\n",
|
|
||||||
"# fill missing values\n",
|
|
||||||
"data = data.fillna(method=\"ffill\")\n",
|
|
||||||
"data = data.fillna(method=\"bfill\")"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Similar to example [here](https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py), use a subset of columns"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
|
||||||
"\n",
|
|
||||||
"numeric_features = ['age', 'fare']\n",
|
|
||||||
"categorical_features = ['embarked', 'sex', 'pclass']\n",
|
|
||||||
"\n",
|
|
||||||
"y = data['survived'].values\n",
|
|
||||||
"X = data[categorical_features + numeric_features]\n",
|
|
||||||
"\n",
|
|
||||||
"# Split data into train and test\n",
|
|
||||||
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Transform raw features"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"We can explain raw features by either using a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`. In case you want to run the example with the list of fitted transformer tuples, comment the cell below and uncomment the cell that follows after. "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# We add many to one and many to many transformations for illustration purposes.\n",
|
|
||||||
"# Raw feature explanations with many to one and many to many transformations are only supported \n",
|
|
||||||
"# when allow_all_transformations is set to True on explainer creation\n",
|
|
||||||
"from sklearn.preprocessing import FunctionTransformer\n",
|
|
||||||
"many_to_one_transformer = FunctionTransformer(lambda x: x.sum(axis=1).reshape(-1, 1))\n",
|
|
||||||
"many_to_many_transformer = FunctionTransformer(lambda x: np.hstack(\n",
|
|
||||||
" (np.prod(x, axis=1).reshape(-1, 1), (np.prod(x, axis=1)**2).reshape(-1, 1))\n",
|
|
||||||
"))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from sklearn.compose import ColumnTransformer\n",
|
|
||||||
"\n",
|
|
||||||
"transformations = ColumnTransformer([\n",
|
|
||||||
" (\"age_fare_1\", Pipeline(steps=[\n",
|
|
||||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
|
||||||
" ('scaler', StandardScaler())\n",
|
|
||||||
" ]), [\"age\", \"fare\"]),\n",
|
|
||||||
" (\"age_fare_2\", many_to_one_transformer, [\"age\", \"fare\"]),\n",
|
|
||||||
" (\"age_fare_3\", many_to_many_transformer, [\"age\", \"fare\"]),\n",
|
|
||||||
" (\"embarked\", Pipeline(steps=[\n",
|
|
||||||
" (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n",
|
|
||||||
" (\"encoder\", OneHotEncoder(sparse=False))]), [\"embarked\"]),\n",
|
|
||||||
" (\"sex_pclass\", OneHotEncoder(sparse=False), [\"sex\", \"pclass\"]) \n",
|
|
||||||
"])\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"'''\n",
|
|
||||||
"# Uncomment below if sklearn-pandas is not installed\n",
|
|
||||||
"#!pip install sklearn-pandas\n",
|
|
||||||
"from sklearn_pandas import DataFrameMapper\n",
|
|
||||||
"\n",
|
|
||||||
"# Impute, standardize the numeric features and one-hot encode the categorical features. \n",
|
|
||||||
"\n",
|
|
||||||
"transformations = [\n",
|
|
||||||
" ([\"age\", \"fare\"], Pipeline(steps=[\n",
|
|
||||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
|
||||||
" ('scaler', StandardScaler())\n",
|
|
||||||
" ])),\n",
|
|
||||||
" ([\"age\", \"fare\"], many_to_one_transformer),\n",
|
|
||||||
" ([\"age\", \"fare\"], many_to_many_transformer),\n",
|
|
||||||
" ([\"embarked\"], Pipeline(steps=[\n",
|
|
||||||
" (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n",
|
|
||||||
" (\"encoder\", OneHotEncoder(sparse=False))])),\n",
|
|
||||||
" ([\"sex\", \"pclass\"], OneHotEncoder(sparse=False)) \n",
|
|
||||||
"]\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# Append classifier to preprocessing pipeline.\n",
|
|
||||||
"# Now we have a full prediction pipeline.\n",
|
|
||||||
"clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
|
|
||||||
" ('classifier', LogisticRegression(solver='lbfgs'))])\n",
|
|
||||||
"'''"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Train a Logistic Regression model, which you want to explain"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Append classifier to preprocessing pipeline.\n",
|
|
||||||
"# Now we have a full prediction pipeline.\n",
|
|
||||||
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
|
|
||||||
" ('classifier', LogisticRegression(solver='lbfgs'))])\n",
|
|
||||||
"model = clf.fit(x_train, y_train)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Explain predictions on your local machine"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# 1. Using SHAP TabularExplainer\n",
|
|
||||||
"# When the last parameter allow_all_transformations is passed, we handle many to one and many to many transformations to \n",
|
|
||||||
"# generate approximations to raw feature importances. When this flag is passed, for transformations not recognized as one to \n",
|
|
||||||
"# many, we distribute feature importances evenly to raw features generating them.\n",
|
|
||||||
"# clf.steps[-1][1] returns the trained classification model\n",
|
|
||||||
"explainer = TabularExplainer(clf.steps[-1][1], \n",
|
|
||||||
" initialization_examples=x_train, \n",
|
|
||||||
" features=x_train.columns, \n",
|
|
||||||
" transformations=transformations, \n",
|
|
||||||
" allow_all_transformations=True)\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# 2. Using MimicExplainer\n",
|
|
||||||
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
|
||||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
|
||||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
|
||||||
"# explainer = MimicExplainer(clf.steps[-1][1], \n",
|
|
||||||
"# x_train, \n",
|
|
||||||
"# LGBMExplainableModel, \n",
|
|
||||||
"# augment_data=True, \n",
|
|
||||||
"# max_num_of_augmentations=10, \n",
|
|
||||||
"# features=x_train.columns, \n",
|
|
||||||
"# transformations=transformations, \n",
|
|
||||||
"# allow_all_transformations=True)\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# 3. Using PFIExplainer\n",
|
|
||||||
"\n",
|
|
||||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
|
||||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
|
||||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
|
||||||
"# Default metrics: \n",
|
|
||||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
|
||||||
"# Mean absolute error for regression\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# explainer = PFIExplainer(clf.steps[-1][1], \n",
|
|
||||||
"# features=x_train.columns, \n",
|
|
||||||
"# transformations=transformations)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Generate global explanations\n",
|
|
||||||
"Explain overall model predictions (global explanation)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
|
||||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
|
||||||
"\n",
|
|
||||||
"global_explanation = explainer.explain_global(x_test)\n",
|
|
||||||
"\n",
|
|
||||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
|
||||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Sorted SHAP values\n",
|
|
||||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
|
||||||
"# Corresponding feature names\n",
|
|
||||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
|
||||||
"# Feature ranks (based on original order of features)\n",
|
|
||||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
|
||||||
"# Per class feature names\n",
|
|
||||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
|
||||||
"# Per class feature importance values\n",
|
|
||||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
|
||||||
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# feature shap values for all features and all data points in the evaluation data (x_test)\n",
|
|
||||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Generate local explanations\n",
|
|
||||||
"Explain local data points (individual instances)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Note: PFIExplainer does not support local explanations\n",
|
|
||||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
|
||||||
"\n",
|
|
||||||
"# E.g., Explain the first data point in the test set\n",
|
|
||||||
"instance_num = 1\n",
|
|
||||||
"local_explanation = explainer.explain_local(x_test[:instance_num])"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
|
|
||||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
|
||||||
"\n",
|
|
||||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
|
||||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
|
||||||
"\n",
|
|
||||||
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
|
|
||||||
"print('local importance names: {}'.format(sorted_local_importance_names))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Visualize\n",
|
|
||||||
"Load the visualization dashboard"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.contrib.interpret.visualize import ExplanationDashboard"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Next\n",
|
|
||||||
"Learn about other use cases of the explain package on a:\n",
|
|
||||||
" \n",
|
|
||||||
"1. [Training time: regression problem](./explain-regression-local.ipynb)\n",
|
|
||||||
"1. [Training time: binary classification problem](./explain-binary-classification-local.ipynb)\n",
|
|
||||||
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
|
|
||||||
"1. [Explain models with simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
|
|
||||||
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
|
||||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
|
||||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
|
||||||
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
|
||||||
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": []
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "mesameki"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.8"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,9 +0,0 @@
|
|||||||
name: advanced-feature-transformations-explain-local
|
|
||||||
dependencies:
|
|
||||||
- pip:
|
|
||||||
- azureml-sdk
|
|
||||||
- interpret
|
|
||||||
- azureml-interpret
|
|
||||||
- azureml-contrib-interpret
|
|
||||||
- sklearn-pandas
|
|
||||||
- ipywidgets
|
|
||||||
@@ -1,390 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Explain binary classification model predictions\n",
|
|
||||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a binary classification model predictions.**_\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"## Table of Contents\n",
|
|
||||||
"\n",
|
|
||||||
"1. [Introduction](#Introduction)\n",
|
|
||||||
"1. [Setup](#Setup)\n",
|
|
||||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
|
||||||
" 1. Train a binary classification model\n",
|
|
||||||
" 1. Explain the model\n",
|
|
||||||
" 1. Generate global explanations\n",
|
|
||||||
" 1. Generate local explanations\n",
|
|
||||||
"1. [Visualize results](#Visualize)\n",
|
|
||||||
"1. [Next steps](#Next)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Introduction\n",
|
|
||||||
"\n",
|
|
||||||
"This notebook illustrates how to explain a binary classification model predictions locally at training time without contacting any Azure services.\n",
|
|
||||||
"It demonstrates the API calls that you need to make to get the global and local explanations and a visualization dashboard that provides an interactive way of discovering patterns in data and explanations.\n",
|
|
||||||
"\n",
|
|
||||||
"We will showcase three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
|
||||||
"\n",
|
|
||||||
"|  |\n",
|
|
||||||
"|:--:|\n",
|
|
||||||
"| *Interpretability Toolkit Architecture* |\n",
|
|
||||||
"\n",
|
|
||||||
"Problem: Breast cancer diagnosis classification with scikit-learn (run model explainer locally)\n",
|
|
||||||
"\n",
|
|
||||||
"1. Train a SVM classification model using Scikit-learn\n",
|
|
||||||
"2. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
|
||||||
"3. Visualize the global and local explanations with the visualization dashboard.\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
|
|
||||||
"If you are using Jupyter Labs run the following command:\n",
|
|
||||||
"```\n",
|
|
||||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
|
||||||
"```\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Explain\n",
|
|
||||||
"\n",
|
|
||||||
"### Run model explainer locally at training time"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from sklearn.datasets import load_breast_cancer\n",
|
|
||||||
"from sklearn import svm\n",
|
|
||||||
"\n",
|
|
||||||
"# Explainers:\n",
|
|
||||||
"# 1. SHAP Tabular Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import TabularExplainer\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 2. Mimic Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import MimicExplainer\n",
|
|
||||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
|
||||||
"from interpret.ext.glassbox import LGBMExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import LinearExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import SGDExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import DecisionTreeExplainableModel\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 3. PFI Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import PFIExplainer "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Load the breast cancer diagnosis data"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"breast_cancer_data = load_breast_cancer()\n",
|
|
||||||
"classes = breast_cancer_data.target_names.tolist()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Split data into train and test\n",
|
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
|
||||||
"x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Train a SVM classification model, which you want to explain"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
|
|
||||||
"model = clf.fit(x_train, y_train)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Explain predictions on your local machine"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# 1. Using SHAP TabularExplainer\n",
|
|
||||||
"explainer = TabularExplainer(model, \n",
|
|
||||||
" x_train, \n",
|
|
||||||
" features=breast_cancer_data.feature_names, \n",
|
|
||||||
" classes=classes)\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# 2. Using MimicExplainer\n",
|
|
||||||
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
|
||||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
|
||||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
|
||||||
"# explainer = MimicExplainer(model, \n",
|
|
||||||
"# x_train, \n",
|
|
||||||
"# LGBMExplainableModel, \n",
|
|
||||||
"# augment_data=True, \n",
|
|
||||||
"# max_num_of_augmentations=10, \n",
|
|
||||||
"# features=breast_cancer_data.feature_names, \n",
|
|
||||||
"# classes=classes)\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# 3. Using PFIExplainer\n",
|
|
||||||
"\n",
|
|
||||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
|
||||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
|
||||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
|
||||||
"# Default metrics: \n",
|
|
||||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
|
||||||
"# Mean absolute error for regression\n",
|
|
||||||
"\n",
|
|
||||||
"# explainer = PFIExplainer(model, \n",
|
|
||||||
"# features=breast_cancer_data.feature_names, \n",
|
|
||||||
"# classes=classes)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Generate global explanations\n",
|
|
||||||
"Explain overall model predictions (global explanation)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
|
||||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
|
||||||
"global_explanation = explainer.explain_global(x_test)\n",
|
|
||||||
"\n",
|
|
||||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
|
||||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Sorted SHAP values\n",
|
|
||||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
|
||||||
"# Corresponding feature names\n",
|
|
||||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
|
||||||
"# Feature ranks (based on original order of features)\n",
|
|
||||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
|
||||||
"\n",
|
|
||||||
"# Note: PFIExplainer does not support per class explanations\n",
|
|
||||||
"# Per class feature names\n",
|
|
||||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
|
||||||
"# Per class feature importance values\n",
|
|
||||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
|
||||||
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# feature shap values for all features and all data points in the training data\n",
|
|
||||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Generate local explanations\n",
|
|
||||||
"Explain local data points (individual instances)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Note: PFIExplainer does not support local explanations\n",
|
|
||||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
|
||||||
"\n",
|
|
||||||
"# E.g., Explain the first data point in the test set\n",
|
|
||||||
"instance_num = 0\n",
|
|
||||||
"local_explanation = explainer.explain_local(x_test[instance_num,:])"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
|
|
||||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
|
||||||
"\n",
|
|
||||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
|
||||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
|
||||||
"\n",
|
|
||||||
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
|
|
||||||
"print('local importance names: {}'.format(sorted_local_importance_names))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Visualize\n",
|
|
||||||
"Load the visualization dashboard"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.contrib.interpret.visualize import ExplanationDashboard"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Next\n",
|
|
||||||
"Learn about other use cases of the explain package on a:\n",
|
|
||||||
" \n",
|
|
||||||
"1. [Training time: regression problem](./explain-regression-local.ipynb)\n",
|
|
||||||
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
|
|
||||||
"1. Explain models with engineered features:\n",
|
|
||||||
" 1. [Simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
|
|
||||||
" 1. [Advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)\n",
|
|
||||||
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
|
||||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
|
||||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
|
||||||
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
|
||||||
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": []
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "mesameki"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.8"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,8 +0,0 @@
|
|||||||
name: explain-binary-classification-local
|
|
||||||
dependencies:
|
|
||||||
- pip:
|
|
||||||
- azureml-sdk
|
|
||||||
- interpret
|
|
||||||
- azureml-interpret
|
|
||||||
- azureml-contrib-interpret
|
|
||||||
- ipywidgets
|
|
||||||
@@ -1,388 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Explain multiclass classification model's predictions\n",
|
|
||||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a multiclass classification model predictions.**_\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"## Table of Contents\n",
|
|
||||||
"\n",
|
|
||||||
"1. [Introduction](#Introduction)\n",
|
|
||||||
"1. [Setup](#Setup)\n",
|
|
||||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
|
||||||
" 1. Train a multiclass classification model\n",
|
|
||||||
" 1. Explain the model\n",
|
|
||||||
" 1. Generate global explanations\n",
|
|
||||||
" 1. Generate local explanations\n",
|
|
||||||
"1. [Visualize results](#Visualize)\n",
|
|
||||||
"1. [Next steps](#Next)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Introduction\n",
|
|
||||||
"\n",
|
|
||||||
"This notebook illustrates how to explain a multiclass classification model predictions locally at training time without contacting any Azure services.\n",
|
|
||||||
"It demonstrates the API calls that you need to make to get the global and local explanations and a visualization dashboard that provides an interactive way of discovering patterns in data and explanations.\n",
|
|
||||||
"\n",
|
|
||||||
"We will showcase three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
|
||||||
"\n",
|
|
||||||
"|  |\n",
|
|
||||||
"|:--:|\n",
|
|
||||||
"| *Interpretability Toolkit Architecture* |\n",
|
|
||||||
"\n",
|
|
||||||
"Problem: Iris flower classification with scikit-learn (run model explainer locally)\n",
|
|
||||||
"\n",
|
|
||||||
"1. Train a SVM classification model using Scikit-learn\n",
|
|
||||||
"2. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
|
||||||
"3. Visualize the global and local explanations with the visualization dashboard.\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
|
|
||||||
"If you are using Jupyter Labs run the following command:\n",
|
|
||||||
"```\n",
|
|
||||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
|
||||||
"```\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Explain\n",
|
|
||||||
"\n",
|
|
||||||
"### Run model explainer locally at training time"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from sklearn.datasets import load_iris\n",
|
|
||||||
"from sklearn import svm\n",
|
|
||||||
"\n",
|
|
||||||
"# Explainers:\n",
|
|
||||||
"# 1. SHAP Tabular Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import TabularExplainer\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 2. Mimic Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import MimicExplainer\n",
|
|
||||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
|
||||||
"from interpret.ext.glassbox import LGBMExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import LinearExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import SGDExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import DecisionTreeExplainableModel\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 3. PFI Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import PFIExplainer "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Load the Iris flower dataset"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"iris = load_iris()\n",
|
|
||||||
"X = iris['data']\n",
|
|
||||||
"y = iris['target']\n",
|
|
||||||
"classes = iris['target_names']\n",
|
|
||||||
"feature_names = iris['feature_names']"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Split data into train and test\n",
|
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
|
||||||
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Train a SVM classification model, which you want to explain"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
|
|
||||||
"model = clf.fit(x_train, y_train)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Explain predictions on your local machine"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# 1. Using SHAP TabularExplainer\n",
|
|
||||||
"explainer = TabularExplainer(model, \n",
|
|
||||||
" x_train, \n",
|
|
||||||
" features=feature_names, \n",
|
|
||||||
" classes=classes)\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# 2. Using MimicExplainer\n",
|
|
||||||
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
|
||||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
|
||||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
|
||||||
"# explainer = MimicExplainer(model, \n",
|
|
||||||
"# x_train, \n",
|
|
||||||
"# LGBMExplainableModel, \n",
|
|
||||||
"# augment_data=True, \n",
|
|
||||||
"# max_num_of_augmentations=10, \n",
|
|
||||||
"# features=feature_names, \n",
|
|
||||||
"# classes=classes)\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# 3. Using PFIExplainer\n",
|
|
||||||
"\n",
|
|
||||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
|
||||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
|
||||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
|
||||||
"# Default metrics: \n",
|
|
||||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
|
||||||
"# Mean absolute error for regression\n",
|
|
||||||
"\n",
|
|
||||||
"# explainer = PFIExplainer(model, \n",
|
|
||||||
"# features=feature_names, \n",
|
|
||||||
"# classes=classes)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Generate global explanations\n",
|
|
||||||
"Explain overall model predictions (global explanation)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
|
||||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
|
||||||
"global_explanation = explainer.explain_global(x_test)\n",
|
|
||||||
"\n",
|
|
||||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
|
||||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Sorted SHAP values\n",
|
|
||||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
|
||||||
"# Corresponding feature names\n",
|
|
||||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
|
||||||
"# Feature ranks (based on original order of features)\n",
|
|
||||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
|
|
||||||
"\n",
|
|
||||||
"# Note: PFIExplainer does not support per class explanations\n",
|
|
||||||
"# Per class feature names\n",
|
|
||||||
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
|
|
||||||
"# Per class feature importance values\n",
|
|
||||||
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
|
||||||
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# feature shap values for all features and all data points in the training data\n",
|
|
||||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Generate local explanations\n",
|
|
||||||
"Explain local data points (individual instances)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Note: PFIExplainer does not support local explanations\n",
|
|
||||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
|
||||||
"\n",
|
|
||||||
"# E.g., Explain the first data point in the test set\n",
|
|
||||||
"instance_num = 0\n",
|
|
||||||
"local_explanation = explainer.explain_local(x_test[instance_num,:])"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Get the prediction for the first member of the test set and explain why the model made that prediction\n",
|
|
||||||
"prediction_value = clf.predict(x_test)[instance_num]\n",
|
|
||||||
"\n",
|
|
||||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
|
|
||||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
|
|
||||||
"\n",
|
|
||||||
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
|
|
||||||
"print('local importance names: {}'.format(sorted_local_importance_names))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Visualize\n",
|
|
||||||
"Load the visualization dashboard"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.contrib.interpret.visualize import ExplanationDashboard"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Next\n",
|
|
||||||
"Learn about other use cases of the explain package on a:\n",
|
|
||||||
"\n",
|
|
||||||
"1. [Training time: regression problem](./explain-regression-local.ipynb) \n",
|
|
||||||
"1. [Training time: binary classification problem](./explain-binary-classification-local.ipynb)\n",
|
|
||||||
"1. Explain models with engineered features:\n",
|
|
||||||
" 1. [Simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
|
|
||||||
" 1. [Advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)\n",
|
|
||||||
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
|
||||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
|
||||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
|
||||||
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
|
||||||
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)\n",
|
|
||||||
"\n"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "mesameki"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.8"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,8 +0,0 @@
|
|||||||
name: explain-multiclass-classification-local
|
|
||||||
dependencies:
|
|
||||||
- pip:
|
|
||||||
- azureml-sdk
|
|
||||||
- interpret
|
|
||||||
- azureml-interpret
|
|
||||||
- azureml-contrib-interpret
|
|
||||||
- ipywidgets
|
|
||||||
@@ -1,383 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Explain regression model predictions\n",
|
|
||||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a regression model's predictions.**_\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"## Table of Contents\n",
|
|
||||||
"\n",
|
|
||||||
"1. [Introduction](#Introduction)\n",
|
|
||||||
"1. [Setup](#Setup)\n",
|
|
||||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
|
||||||
" 1. Train a regressor model\n",
|
|
||||||
" 1. Explain the model\n",
|
|
||||||
" 1. Generate global explanations\n",
|
|
||||||
" 1. Generate local explanations\n",
|
|
||||||
"1. [Visualize results](#Visualize)\n",
|
|
||||||
"1. [Next steps](#Next)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Introduction\n",
|
|
||||||
"\n",
|
|
||||||
"This notebook illustrates how to explain regression model predictions locally at training time without contacting any Azure services.\n",
|
|
||||||
"It demonstrates the API calls that you need to make to get the global and local explanations and a visualization dashboard that provides an interactive way of discovering patterns in data and explanations.\n",
|
|
||||||
"\n",
|
|
||||||
"We will showcase three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
|
||||||
"\n",
|
|
||||||
"|  |\n",
|
|
||||||
"|:--:|\n",
|
|
||||||
"| *Interpretability Toolkit Architecture* |\n",
|
|
||||||
"\n",
|
|
||||||
"Problem: Boston Housing Price Prediction with scikit-learn (run model explainer locally)\n",
|
|
||||||
"\n",
|
|
||||||
"1. Train a GradientBoosting regression model using Scikit-learn\n",
|
|
||||||
"2. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
|
||||||
"3. Visualize the global and local explanations with the visualization dashboard.\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
|
|
||||||
"If you are using JupyterLab, run the following command:\n",
|
|
||||||
"```\n",
|
|
||||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
|
||||||
"```\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Explain\n",
|
|
||||||
"\n",
|
|
||||||
"### Run model explainer locally at training time"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from sklearn import datasets\n",
|
|
||||||
"from sklearn.ensemble import GradientBoostingRegressor\n",
|
|
||||||
"\n",
|
|
||||||
"# Explainers:\n",
|
|
||||||
"# 1. SHAP Tabular Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import TabularExplainer\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 2. Mimic Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import MimicExplainer\n",
|
|
||||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
|
||||||
"from interpret.ext.glassbox import LGBMExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import LinearExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import SGDExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import DecisionTreeExplainableModel\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 3. PFI Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import PFIExplainer "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Load the Boston house price data"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"boston_data = datasets.load_boston()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Split data into train and test\n",
|
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
|
||||||
"x_train, x_test, y_train, y_test = train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Train a GradientBoosting regression model, which you want to explain"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"reg = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n",
|
|
||||||
" learning_rate=0.1, loss='huber',\n",
|
|
||||||
" random_state=1)\n",
|
|
||||||
"model = reg.fit(x_train, y_train)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Explain predictions on your local machine"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# 1. Using SHAP TabularExplainer\n",
|
|
||||||
"explainer = TabularExplainer(model, \n",
|
|
||||||
" x_train, \n",
|
|
||||||
" features = boston_data.feature_names)\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# 2. Using MimicExplainer\n",
|
|
||||||
"# augment_data is optional; if True, it oversamples the initialization examples so that the surrogate model fits the original model more accurately. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
|
|
||||||
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
|
|
||||||
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
|
|
||||||
"# explainer = MimicExplainer(model, \n",
|
|
||||||
"# x_train, \n",
|
|
||||||
"# LGBMExplainableModel, \n",
|
|
||||||
"# augment_data=True, \n",
|
|
||||||
"# max_num_of_augmentations=10, \n",
|
|
||||||
"# features=boston_data.feature_names)\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# 3. Using PFIExplainer\n",
|
|
||||||
"\n",
|
|
||||||
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
|
|
||||||
"# Note that if a metric function is provided a higher value must be better.\n",
|
|
||||||
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
|
|
||||||
"# Default metrics: \n",
|
|
||||||
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
|
|
||||||
"# Mean absolute error for regression\n",
|
|
||||||
"\n",
|
|
||||||
"# explainer = PFIExplainer(model, \n",
|
|
||||||
"# features=boston_data.feature_names)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Generate global explanations\n",
|
|
||||||
"Explain overall model predictions (global explanation)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
|
|
||||||
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
|
|
||||||
"global_explanation = explainer.explain_global(x_test)\n",
|
|
||||||
"\n",
|
|
||||||
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
|
|
||||||
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Sorted SHAP values \n",
|
|
||||||
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
|
|
||||||
"# Corresponding feature names\n",
|
|
||||||
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
|
|
||||||
"# Feature ranks (based on original order of features)\n",
|
|
||||||
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Print out a dictionary that holds the sorted feature importance names and values\n",
|
|
||||||
"print('global feature importance dict: {}'.format(global_explanation.get_feature_importance_dict()))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Explain overall model predictions as a collection of local (instance-level) explanations"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Note: PFIExplainer does not support local explanations\n",
|
|
||||||
"# SHAP feature importance values for all features and all data points in the evaluation data (x_test)\n",
|
|
||||||
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Generate local explanations\n",
|
|
||||||
"Explain local data points (individual instances)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Note: PFIExplainer does not support local explanations\n",
|
|
||||||
"# You can pass a specific data point or a group of data points to the explain_local function\n",
|
|
||||||
"\n",
|
|
||||||
"# E.g., Explain the first data point in the test set\n",
|
|
||||||
"local_explanation = explainer.explain_local(x_test[0,:])\n",
|
|
||||||
"\n",
|
|
||||||
"# E.g., Explain the first five data points in the test set\n",
|
|
||||||
"# local_explanation_group = explainer.explain_local(x_test[0:5,:])"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Local feature importance names and values, sorted from most to least important\n",
|
|
||||||
"sorted_local_importance_names = local_explanation.get_ranked_local_names()\n",
|
|
||||||
"sorted_local_importance_values = local_explanation.get_ranked_local_values()\n",
|
|
||||||
"\n",
|
|
||||||
"print('sorted local importance names: {}'.format(sorted_local_importance_names))\n",
|
|
||||||
"print('sorted local importance values: {}'.format(sorted_local_importance_values))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Visualize\n",
|
|
||||||
"Load the visualization dashboard"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.contrib.interpret.visualize import ExplanationDashboard"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"ExplanationDashboard(global_explanation, model, x_test)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Next\n",
|
|
||||||
"Learn about other use cases of the explain package on a:\n",
|
|
||||||
" \n",
|
|
||||||
"1. [Training time: binary classification problem](./explain-binary-classification-local.ipynb)\n",
|
|
||||||
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
|
|
||||||
"1. Explain models with engineered features:\n",
|
|
||||||
" 1. [Simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
|
|
||||||
" 1. [Advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)\n",
|
|
||||||
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
|
|
||||||
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
|
|
||||||
"1. Inferencing time: deploy a classification model and explainer:\n",
|
|
||||||
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
|
|
||||||
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": []
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "mesameki"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.8"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,8 +0,0 @@
|
|||||||
name: explain-regression-local
|
|
||||||
dependencies:
|
|
||||||
- pip:
|
|
||||||
- azureml-sdk
|
|
||||||
- interpret
|
|
||||||
- azureml-interpret
|
|
||||||
- azureml-contrib-interpret
|
|
||||||
- ipywidgets
|
|
||||||
Binary file not shown.
|
Before Width: | Height: | Size: 116 KiB |
@@ -1,517 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Explain binary classification model predictions with raw feature transformations\n",
|
|
||||||
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize a binary classification model that uses one-to-one and one-to-many feature transformations.**_\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"## Table of Contents\n",
|
|
||||||
"\n",
|
|
||||||
"1. [Introduction](#Introduction)\n",
|
|
||||||
"1. [Setup](#Setup)\n",
|
|
||||||
"1. [Run model explainer locally at training time](#Explain)\n",
|
|
||||||
" 1. Apply feature transformations\n",
|
|
||||||
" 1. Train a binary classification model\n",
|
|
||||||
" 1. Explain the model on raw features\n",
|
|
||||||
" 1. Generate global explanations\n",
|
|
||||||
" 1. Generate local explanations\n",
|
|
||||||
"1. [Visualize results](#Visualize)\n",
|
|
||||||
"1. [Next steps](#Next%20steps)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Introduction\n",
|
|
||||||
"\n",
|
|
||||||
"This notebook shows how to create explanations for a binary classification model (IBM employee attrition) that uses one-to-one and one-to-many feature transformations from raw data to engineered features. The one-to-many transformations apply one-hot encoding to categorical features. The one-to-one transformations apply standard scaling to numeric features. A tabular data explainer is then used to get raw feature importances.\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"We will showcase raw feature transformations with three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
|
|
||||||
"\n",
|
|
||||||
"|  |\n",
|
|
||||||
"|:--:|\n",
|
|
||||||
"| *Interpretability Toolkit Architecture* |\n",
|
|
||||||
"\n",
|
|
||||||
"Problem: IBM employee attrition classification with scikit-learn (run model explainer locally)\n",
|
|
||||||
"\n",
|
|
||||||
"1. Transform raw features to engineered features\n",
|
|
||||||
"2. Train an SVC classification model using Scikit-learn\n",
|
|
||||||
"3. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
|
|
||||||
"4. Visualize the global and local explanations with the visualization dashboard.\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"Setup: If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
|
|
||||||
"If you are using JupyterLab, run the following command:\n",
|
|
||||||
"```\n",
|
|
||||||
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
|
|
||||||
"```\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Explain\n",
|
|
||||||
"\n",
|
|
||||||
"### Run model explainer locally at training time"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from sklearn.pipeline import Pipeline\n",
|
|
||||||
"from sklearn.impute import SimpleImputer\n",
|
|
||||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
|
||||||
"from sklearn.svm import SVC\n",
|
|
||||||
"import pandas as pd\n",
|
|
||||||
"import numpy as np\n",
|
|
||||||
"\n",
|
|
||||||
"# Explainers:\n",
|
|
||||||
"# 1. SHAP Tabular Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import TabularExplainer\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 2. Mimic Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import MimicExplainer\n",
|
|
||||||
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
|
|
||||||
"from interpret.ext.glassbox import LGBMExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import LinearExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import SGDExplainableModel\n",
|
|
||||||
"from interpret.ext.glassbox import DecisionTreeExplainableModel\n",
|
|
||||||
"\n",
|
|
||||||
"# OR\n",
|
|
||||||
"\n",
|
|
||||||
"# 3. PFI Explainer\n",
|
|
||||||
"from interpret.ext.blackbox import PFIExplainer "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Load the IBM employee attrition data"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# get the IBM employee attrition dataset\n",
|
|
||||||
"outdirname = 'dataset.6.21.19'\n",
|
|
||||||
"try:\n",
|
|
||||||
" from urllib import urlretrieve\n",
|
|
||||||
"except ImportError:\n",
|
|
||||||
" from urllib.request import urlretrieve\n",
|
|
||||||
"import zipfile\n",
|
|
||||||
"zipfilename = outdirname + '.zip'\n",
|
|
||||||
"urlretrieve('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)\n",
|
|
||||||
"with zipfile.ZipFile(zipfilename, 'r') as unzip:\n",
|
|
||||||
" unzip.extractall('.')\n",
|
|
||||||
"attritionData = pd.read_csv('./WA_Fn-UseC_-HR-Employee-Attrition.csv')\n",
|
|
||||||
"\n",
|
|
||||||
"# Dropping Employee count as all values are 1 and hence attrition is independent of this feature\n",
|
|
||||||
"attritionData = attritionData.drop(['EmployeeCount'], axis=1)\n",
|
|
||||||
"# Dropping Employee Number since it is merely an identifier\n",
|
|
||||||
"attritionData = attritionData.drop(['EmployeeNumber'], axis=1)\n",
|
|
||||||
"\n",
|
|
||||||
"attritionData = attritionData.drop(['Over18'], axis=1)\n",
|
|
||||||
"\n",
|
|
||||||
"# Since all values are 80\n",
|
|
||||||
"attritionData = attritionData.drop(['StandardHours'], axis=1)\n",
|
|
||||||
"\n",
|
|
||||||
"# Converting target variables from string to numerical values\n",
|
|
||||||
"target_map = {'Yes': 1, 'No': 0}\n",
|
|
||||||
"attritionData[\"Attrition_numerical\"] = attritionData[\"Attrition\"].apply(lambda x: target_map[x])\n",
|
|
||||||
"target = attritionData[\"Attrition_numerical\"]\n",
|
|
||||||
"\n",
|
|
||||||
"attritionXData = attritionData.drop(['Attrition_numerical', 'Attrition'], axis=1)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Split data into train and test\n",
|
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
|
||||||
"x_train, x_test, y_train, y_test = train_test_split(attritionXData, \n",
|
|
||||||
" target, \n",
|
|
||||||
" test_size = 0.2,\n",
|
|
||||||
" random_state=0,\n",
|
|
||||||
" stratify=target)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Collect the names of the categorical (object-dtype) columns\n",
|
|
||||||
"categorical = []\n",
|
|
||||||
"for col, value in attritionXData.items():\n",
|
|
||||||
" if value.dtype == 'object':\n",
|
|
||||||
" categorical.append(col)\n",
|
|
||||||
" \n",
|
|
||||||
"# Store the names of the remaining (numerical) columns in numerical\n",
|
|
||||||
"numerical = attritionXData.columns.difference(categorical) "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Transform raw features"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"We can explain raw features by either using a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`. In case you want to run the example with the list of fitted transformer tuples, comment the cell below and uncomment the cell that follows after. "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from sklearn.compose import ColumnTransformer\n",
|
|
||||||
"\n",
|
|
||||||
"# We create the preprocessing pipelines for both numeric and categorical data.\n",
|
|
||||||
"numeric_transformer = Pipeline(steps=[\n",
|
|
||||||
" ('imputer', SimpleImputer(strategy='median')),\n",
|
|
||||||
" ('scaler', StandardScaler())])\n",
|
|
||||||
"\n",
|
|
||||||
"categorical_transformer = Pipeline(steps=[\n",
|
|
||||||
" ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),\n",
|
|
||||||
" ('onehot', OneHotEncoder(handle_unknown='ignore'))])\n",
|
|
||||||
"\n",
|
|
||||||
"transformations = ColumnTransformer(\n",
|
|
||||||
" transformers=[\n",
|
|
||||||
" ('num', numeric_transformer, numerical),\n",
|
|
||||||
" ('cat', categorical_transformer, categorical)])\n",
|
|
||||||
"\n",
|
|
||||||
"# Append classifier to preprocessing pipeline.\n",
|
|
||||||
"# Now we have a full prediction pipeline.\n",
|
|
||||||
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
|
|
||||||
" ('classifier', SVC(C = 1.0, probability=True))])"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"'''\n",
|
|
||||||
"# Uncomment below if sklearn-pandas is not installed\n",
|
|
||||||
"#!pip install sklearn-pandas\n",
|
|
||||||
"from sklearn_pandas import DataFrameMapper\n",
|
|
||||||
"\n",
|
|
||||||
"# Impute, standardize the numeric features and one-hot encode the categorical features. \n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"numeric_transformations = [([f], Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])) for f in numerical]\n",
|
|
||||||
"\n",
|
|
||||||
"categorical_transformations = [([f], OneHotEncoder(handle_unknown='ignore', sparse=False)) for f in categorical]\n",
|
|
||||||
"\n",
|
|
||||||
"transformations = numeric_transformations + categorical_transformations\n",
|
|
||||||
"\n",
|
|
||||||
"# Append classifier to preprocessing pipeline.\n",
|
|
||||||
"# Now we have a full prediction pipeline.\n",
|
|
||||||
"clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
|
|
||||||
" ('classifier', SVC(C = 1.0, probability=True))]) \n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"'''"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Train an SVM classification model, which you want to explain"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"model = clf.fit(x_train, y_train)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Explain predictions on your local machine"
|
|
||||||
]
|
|
||||||
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 1. Using SHAP TabularExplainer\n",
"# clf.steps[-1][1] returns the trained classification model\n",
"explainer = TabularExplainer(clf.steps[-1][1], \n",
" initialization_examples=x_train, \n",
" features=attritionXData.columns, \n",
" classes=[\"Not leaving\", \"leaving\"], \n",
" transformations=transformations)\n",
"\n",
"\n",
"\n",
"\n",
"# 2. Using MimicExplainer\n",
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
"# explainer = MimicExplainer(clf.steps[-1][1], \n",
"# x_train, \n",
"# LGBMExplainableModel, \n",
"# augment_data=True, \n",
"# max_num_of_augmentations=10, \n",
"# features=attritionXData.columns, \n",
"# classes=[\"Not leaving\", \"leaving\"], \n",
"# transformations=transformations)\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"# 3. Using PFIExplainer\n",
"\n",
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
"# Note that if a metric function is provided a higher value must be better.\n",
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
"# Default metrics: \n",
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
"# Mean absolute error for regression\n",
"\n",
"# explainer = PFIExplainer(clf.steps[-1][1], \n",
"# features=x_train.columns, \n",
"# transformations=transformations,\n",
"# classes=[\"Not leaving\", \"leaving\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate global explanations\n",
"Explain overall model predictions (global explanation)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
"global_explanation = explainer.explain_global(x_test)\n",
"\n",
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sorted SHAP values\n",
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
"# Corresponding feature names\n",
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
"# Feature ranks (based on original order of features)\n",
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
"\n",
"# Note: PFIExplainer does not support per class explanations\n",
"# Per class feature names\n",
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
"# Per class feature importance values\n",
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print out a dictionary that holds the sorted feature importance names and values\n",
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Explain overall model predictions as a collection of local (instance-level) explanations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# feature shap values for all features and all data points in the training data\n",
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate local explanations\n",
"Explain local data points (individual instances)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note: PFIExplainer does not support local explanations\n",
"# You can pass a specific data point or a group of data points to the explain_local function\n",
"\n",
"# E.g., Explain the first data point in the test set\n",
"instance_num = 1\n",
"local_explanation = explainer.explain_local(x_test[:instance_num])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
"prediction_value = clf.predict(x_test)[instance_num]\n",
"\n",
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
"\n",
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
"print('local importance names: {}'.format(sorted_local_importance_names))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize\n",
"Load the visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.interpret.visualize import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, model, x_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"Learn about other use cases of the explain package on a:\n",
" \n",
"1. [Training time: regression problem](./explain-regression-local.ipynb)\n",
"1. [Training time: binary classification problem](./explain-binary-classification-local.ipynb)\n",
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
"1. [Explain models with advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)\n",
"1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)\n",
"1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)\n",
"1. Inferencing time: deploy a classification model and explainer:\n",
" 1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
" 1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "mesameki"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -1,9 +0,0 @@
-name: simple-feature-transformations-explain-local
-dependencies:
-- pip:
-  - azureml-sdk
-  - interpret
-  - azureml-interpret
-  - azureml-contrib-interpret
-  - sklearn-pandas
-  - ipywidgets
@@ -42,6 +42,8 @@ In this directory, there are two types of notebooks:
 
 1. [pipeline-batch-scoring.ipynb](https://aka.ms/pl-batch-score): This notebook demonstrates how to run a batch scoring job using Azure Machine Learning pipelines.
 2. [pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans): This notebook demonstrates a multi-step pipeline that uses GPU compute. This sample also showcases how to use conda dependencies using runconfig when using Pipelines.
 3. [nyc-taxi-data-regression-model-building.ipynb](https://aka.ms/pl-nyctaxi-tutorial): This notebook is an AzureML Pipelines version of the previously published two part sample.
+4. [file-dataset-image-inference-mnist.ipynb](https://aka.ms/pl-pr-filedata): This notebook demonstrates how to use ParallelRunStep to process unstructured data (file dataset).
+5. [tabular-dataset-inference-iris.ipynb](https://aka.ms/pl-pr-tabulardata): This notebook demonstrates how to use ParallelRunStep to process structured data (tabular dataset).
 
 ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.png)
@@ -18,5 +18,6 @@ These notebooks below are designed to go in sequence.
 13. [aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb](https://aka.ms/pl-datapath): This notebook showcases how to use DataPath and PipelineParameter in AML Pipeline.
 14. [aml-pipelines-how-to-use-pipeline-drafts.ipynb](http://aka.ms/pl-pl-draft): This notebook shows how to use Pipeline Drafts. Pipeline Drafts are mutable pipelines which can be used to submit runs and create Published Pipelines.
 15. [aml-pipelines-hot-to-use-modulestep.ipynb](https://aka.ms/pl-modulestep): This notebook shows how to define Module, ModuleVersion and how to use them in an AML Pipeline using ModuleStep.
+16. [aml-pipelines-with-notebook-runner-step.ipynb](https://aka.ms/pl-nbrstep): This notebook shows how you can run another notebook as a step in Azure Machine Learning Pipeline.
 
 ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.png)
Some files were not shown because too many files have changed in this diff.