Compare commits


161 Commits

Author SHA1 Message Date
vizhur
a6088928ab update samples - test 2019-06-20 20:36:06 +00:00
Roope Astala
56e0ebc5ac Merge pull request #438 from rastala/master
add pipeline scripts
2019-06-19 18:56:42 -04:00
rastala
2aa39f2f4a add pipeline scripts 2019-06-19 18:55:32 -04:00
Roope Astala
4d247c1877 Merge pull request #437 from rastala/master
pytorch with mlflow
2019-06-19 17:23:06 -04:00
rastala
f6682f6f6d pytorch with mlflow 2019-06-19 17:21:52 -04:00
Roope Astala
26ecf25233 Merge pull request #436 from rastala/master
Update readme
2019-06-19 11:52:23 -04:00
Roope Astala
44c3a486c0 update readme 2019-06-19 11:49:49 -04:00
Roope Astala
c574f429b8 update readme 2019-06-19 11:48:52 -04:00
Roope Astala
77d557a5dc Merge pull request #435 from ganzhi/jamgan/drift
Add demo notebook for AML Data Drift
2019-06-17 16:39:46 -04:00
James Gan
13dedec4a4 Make it in same folder as internal repo 2019-06-17 13:38:27 -07:00
James Gan
6f5c52676f Add notebook to demo data drift 2019-06-17 13:33:30 -07:00
James Gan
90c105537c Add demo notebook for AML Data Drift 2019-06-17 13:31:08 -07:00
Roope Astala
ef264b1073 Merge pull request #434 from rastala/master
update pytorch
2019-06-17 11:57:29 -04:00
Roope Astala
824ac5e021 update pytorch 2019-06-17 11:56:42 -04:00
Roope Astala
e9a7b95716 Merge pull request #421 from csteegz/csteegz-add-warning
Add warning for using prediction client on azure notebooks
2019-06-13 20:27:34 -04:00
Roope Astala
789ee26357 Merge pull request #431 from jeff-shepherd/master
Fixed path for auto-ml-remote-amlcompute notebook
2019-06-13 16:56:25 -04:00
Jeff Shepherd
fc541706e7 Fixed path for auto-ml-remote-amlcompute 2019-06-13 13:12:32 -07:00
Roope Astala
64b8aa2a55 Merge pull request #429 from jeff-shepherd/master
Removed deprecated notebooks from readme
2019-06-13 14:40:57 -04:00
Jeff Shepherd
d3dc35dbb6 Removed deprecated notebooks from readme 2019-06-13 11:03:25 -07:00
Roope Astala
b55ac368e7 Merge pull request #428 from rastala/master
update cluster creation
2019-06-13 12:16:30 -04:00
Roope Astala
de162316d7 update cluster creation 2019-06-13 12:14:58 -04:00
Roope Astala
4ecc58dfe2 Merge pull request #427 from rastala/master
dockerfile
2019-06-12 10:24:34 -04:00
Roope Astala
daf27a76e4 dockerfile 2019-06-12 10:23:34 -04:00
Roope Astala
a05444845b Merge pull request #426 from rastala/master
version 1.0.43
2019-06-12 10:09:08 -04:00
Roope Astala
79c9f50c15 version 1.0.43 2019-06-12 10:08:35 -04:00
Roope Astala
67e10e0f6b Merge pull request #417 from lan-tang/patch-1
Create readme.md in data-drift
2019-06-11 13:47:55 -04:00
Roope Astala
1ef0331a0f Merge pull request #423 from rastala/master
add sklearn estimator
2019-06-11 11:30:37 -04:00
Roope Astala
5e91c836b9 add sklearn estimator 2019-06-11 11:29:56 -04:00
Colin Versteeg
661762854a add warning to training 2019-06-10 16:51:33 -07:00
Colin Versteeg
fbc90ba74f add to quickstart 2019-06-10 16:50:59 -07:00
Colin Versteeg
0d9c83d0a8 Update accelerated-models-object-detection.ipynb 2019-06-10 16:48:17 -07:00
Colin Versteeg
ca4cab1de9 Merge pull request #1 from Azure/master
pull from master
2019-06-10 16:45:12 -07:00
Roope Astala
ddbb3c45f6 Merge pull request #420 from rastala/master
mlflow integration preview
2019-06-10 15:12:36 -04:00
rastala
8eed4e39d0 mlflow integration preview 2019-06-10 15:10:57 -04:00
Lan Tang
b37c0297db Create readme.md 2019-06-07 12:32:32 -07:00
Roope Astala
968cc798d0 Update README.md 2019-06-05 12:15:33 -04:00
Roope Astala
5c9ca452fb Create README.md 2019-06-05 12:15:19 -04:00
Shané Winner
5e82680272 Update README.md 2019-05-31 10:58:39 -07:00
Roope Astala
41841fc8c0 Update README.md 2019-05-31 13:00:41 -04:00
Roope Astala
896bf63736 Merge pull request #397 from rastala/master
dockerfile
2019-05-29 11:05:18 -04:00
Roope Astala
d4751bf6ec dockerfile 2019-05-29 11:04:19 -04:00
Roope Astala
3531fe8a21 Merge pull request #396 from rastala/master
version 1.0.41
2019-05-29 11:01:15 -04:00
Roope Astala
db6ae67940 version 1.0.41 2019-05-29 10:59:59 -04:00
Shané Winner
2a479bb01e Merge pull request #395 from imatiach-msft/ilmat/fix-typo
fix typo
2019-05-28 14:02:33 -07:00
Ilya Matiach
d05eec92af fix typo 2019-05-28 16:59:59 -04:00
Josée Martens
70fdab0a28 Update auto-ml-classification-with-deployment.ipynb 2019-05-24 13:45:04 -05:00
Josée Martens
7ce5a43b58 Update auto-ml-classification-with-deployment.ipynb 2019-05-24 13:44:35 -05:00
Josée Martens
d2a9dbb582 Update auto-ml-classification-with-deployment.ipynb 2019-05-24 13:43:38 -05:00
Roope Astala
a5d774683d Merge pull request #390 from rastala/master
fix default cluster creation in config notebook
2019-05-23 12:30:09 -04:00
Roope Astala
0e850f0917 fix default cluster creation in config notebook 2019-05-23 12:27:53 -04:00
Shané Winner
59f34b7179 Delete configtest.ipynb 2019-05-22 10:47:50 -07:00
Shané Winner
2a3cb69004 Create configtest.ipynb 2019-05-22 10:41:16 -07:00
Shané Winner
42894ff81a Delete LICENSE.txt 2019-05-22 10:22:05 -07:00
Shané Winner
2163cab50b Delete LICENSE.txt 2019-05-22 10:21:42 -07:00
Shané Winner
255edb04c0 Rename LICENSE.txt to LICENSE 2019-05-22 10:13:08 -07:00
Shané Winner
cfce079278 Rename LICENSES to LICENSE.txt 2019-05-22 10:06:31 -07:00
Shané Winner
ae6f067c81 Deleted index.html
cleaning up root directory
2019-05-22 10:04:23 -07:00
Shané Winner
1b7ff724f3 Deleted pr.md
Contents of this file moved to the README in the root directory.
2019-05-22 10:03:40 -07:00
Shané Winner
8bba850db1 moved the content in the pr.md file
moved the content in the pr.md file to under 'Projects using Azure Machine Learning'
2019-05-21 07:51:28 -07:00
Shané Winner
b9e35ea0cb Create LICENSE 2019-05-21 07:44:10 -07:00
Shané Winner
ffa28aa89c Delete sdk 2019-05-21 07:43:06 -07:00
Shané Winner
6ab85a20e3 Create LICENSES 2019-05-21 07:42:07 -07:00
Shané Winner
486c44d157 Create sdk 2019-05-21 07:39:43 -07:00
Shané Winner
cd80040dd8 Delete Licenses 2019-05-21 07:39:03 -07:00
Shané Winner
465a5b13b1 Create Licenses 2019-05-21 07:38:52 -07:00
Shané Winner
dcd2d58880 Added notice on the data/telemetry 2019-05-20 14:44:43 -07:00
Roope Astala
93bf4393f2 Merge pull request #381 from jeff-shepherd/master
Revert change to default amlcompute cluster
2019-05-16 15:35:43 -04:00
Jeff Shepherd
d6ebb484a6 Revert change to default amlcomputecluster to support existing resource
groups
2019-05-16 12:27:23 -07:00
Roope Astala
35afd43193 Merge pull request #372 from rogerhe/master
adding macOS specific yml. Install nomkl to workaround openmp issue
2019-05-14 19:07:42 -04:00
Roope Astala
2d68535de2 Merge pull request #376 from rastala/master
version 1.0.39
2019-05-14 16:04:09 -04:00
Roope Astala
0d448892a3 version check 2019-05-14 16:03:39 -04:00
Roope Astala
2d41c00488 version 1.0.39 2019-05-14 16:01:14 -04:00
Roger He
22597ac684 adding macOS specific yml. Install nomkl to workaround openmp issue 2019-05-09 16:51:51 -07:00
Josée Martens
8b1bffc200 Update README.md 2019-05-08 12:36:49 -05:00
Josée Martens
a240ac319f Update README.md 2019-05-08 12:27:57 -05:00
Josée Martens
83cfe3b9b3 Update README.md 2019-05-08 12:25:41 -05:00
Paula Ledgerwood
dcce6f227f Merge pull request #360 from Azure/paledger/update-readme
Update readme/cluster location from PM's instructions
2019-05-06 10:08:22 -07:00
Paula Ledgerwood
5328186d68 Update python kernel version 2019-05-06 09:45:20 -07:00
Paula Ledgerwood
7ccaa2cf57 Update readme from PM's instructions 2019-05-06 09:41:54 -07:00
Shané Winner
56b0664b6b Update img-classification-part1-training.ipynb 2019-05-05 17:47:31 -07:00
Shané Winner
4c1167edc4 Update img-classification-part1-training.ipynb 2019-05-05 17:45:48 -07:00
Shané Winner
eb643fe213 Update README.md 2019-05-05 17:26:29 -07:00
Shané Winner
5faa9d293c Update README.md 2019-05-05 15:34:27 -07:00
Shané Winner
32e2b5f647 Update train-hyperparameter-tune-deploy-with-tensorflow.ipynb 2019-05-05 15:32:19 -07:00
Shané Winner
ae25654882 Update train-hyperparameter-tune-deploy-with-pytorch.ipynb 2019-05-05 15:29:42 -07:00
Shané Winner
0ca05093bd Update train-hyperparameter-tune-deploy-with-keras.ipynb 2019-05-05 15:28:16 -07:00
Shané Winner
5e39582de3 Update train-hyperparameter-tune-deploy-with-chainer.ipynb 2019-05-05 15:24:14 -07:00
Shané Winner
6b6a6da9dc Update tensorboard.ipynb 2019-05-05 15:22:28 -07:00
Shané Winner
cba2c6b9e2 Update how-to-use-estimator.ipynb 2019-05-05 15:20:50 -07:00
Shané Winner
58557abd20 Update export-run-history-to-tensorboard.ipynb 2019-05-05 15:18:48 -07:00
Shané Winner
59452a3141 Update distributed-tensorflow-with-parameter-server.ipynb 2019-05-05 15:17:15 -07:00
Shané Winner
463718e26b Update distributed-tensorflow-with-horovod.ipynb 2019-05-05 15:15:13 -07:00
Shané Winner
9ea0ba5131 Update distributed-pytorch-with-horovod.ipynb 2019-05-05 15:13:28 -07:00
Shané Winner
2804a8d859 Update distributed-cntk-with-custom-docker.ipynb 2019-05-05 15:11:51 -07:00
Shané Winner
4761b668ff Update distributed-chainer.ipynb 2019-05-05 15:09:28 -07:00
Shané Winner
c4163017c2 Update using-environments.ipynb 2019-05-05 00:11:40 -07:00
Shané Winner
71e8e9bd23 Update train-within-notebook.ipynb 2019-05-05 00:09:26 -07:00
Shané Winner
6ff06dd137 Update train-on-remote-vm.ipynb 2019-05-05 00:06:23 -07:00
Shané Winner
73db8ae04d Update train-on-local.ipynb 2019-05-04 23:52:01 -07:00
Shané Winner
3637dce58a Update train-on-amlcompute.ipynb 2019-05-04 23:48:16 -07:00
Shané Winner
23771fc599 added tracking pixel and edited config text 2019-05-04 21:08:10 -07:00
Shané Winner
5f04a467b7 added tracking pixel 2019-05-04 21:03:08 -07:00
Shané Winner
532f65c998 added tracking pixel and edited config text 2019-05-04 20:59:50 -07:00
Shané Winner
f36dda0c2d added tracking pixel and edited the config text 2019-05-04 20:54:32 -07:00
Shané Winner
c7b56929bc added tracking pixel and edited config text 2019-05-04 20:50:57 -07:00
Shané Winner
5f19d75a42 added tracking pixel and edited the config text 2019-05-04 20:48:04 -07:00
Shané Winner
a1968aafa2 updated config text and added tracking pixel 2019-05-04 20:43:54 -07:00
Shané Winner
6b82991017 edited config text and added tracking pixel 2019-05-04 20:40:23 -07:00
Shané Winner
725013511e added tracking pixel 2019-05-04 20:34:58 -07:00
Shané Winner
6a20160173 added tracking pixel 2019-05-04 20:02:01 -07:00
Shané Winner
137db8aec0 added tracking pixel 2019-05-04 19:49:50 -07:00
Shané Winner
b7b10c394b added tracking pixel 2019-05-04 19:47:28 -07:00
Shané Winner
46206716a4 added tracking pixel 2019-05-04 19:44:23 -07:00
Shané Winner
92bb98ac62 added tracking pixel 2019-05-04 19:41:33 -07:00
Shané Winner
b398c24262 added tracking pixel 2019-05-04 19:38:28 -07:00
Shané Winner
e0618302e3 added tracking pixel 2019-05-04 19:35:57 -07:00
Shané Winner
b6cddafa3e edited config text and added the pixel tracker 2019-05-04 19:31:59 -07:00
Shané Winner
4188bd2474 updated the config text and added the tracking pixel 2019-05-04 19:25:26 -07:00
Shané Winner
69126edfcb update config text and added tracking pixel 2019-05-04 19:20:46 -07:00
Shané Winner
4e14c35b9b added pixel tracker 2019-05-04 16:31:07 -07:00
Shané Winner
1608c19aa6 updated tracking pixel and and config text 2019-05-04 15:12:53 -07:00
Shané Winner
46b8611b74 tracking pixel and edited config text 2019-05-04 15:08:57 -07:00
Shané Winner
fbb01bde70 update the config text and added pixel tracker server 2019-05-04 15:01:35 -07:00
Shané Winner
cefe2f0811 updated the config text and added the tracking pixel 2019-05-04 14:58:45 -07:00
Shané Winner
42e0a31f88 updated the config text and the tracking pixel 2019-05-04 14:54:37 -07:00
Shané Winner
8b0998ac9f updated the config text and the tracking pixel 2019-05-04 14:49:29 -07:00
Shané Winner
046c6051fb updated config text and added tracking pixel 2019-05-04 14:38:39 -07:00
Shané Winner
bdb7db15ef updated tracking pixel and the config text 2019-05-04 14:35:28 -07:00
Shané Winner
b13139f103 update the config text and the tracking pixel 2019-05-04 14:31:25 -07:00
Shané Winner
8adb206ae3 updated config text and pixel tracker 2019-05-04 13:56:09 -07:00
Shané Winner
484b6bbb7a updated the config text and pixel server 2019-05-04 13:51:12 -07:00
Shané Winner
55ef0bda6a updated config text 2019-05-04 13:46:43 -07:00
Shané Winner
1401cdef33 updated config text 2019-05-04 13:41:34 -07:00
Shané Winner
5d02206cbd updated with tracking pixel 2019-05-04 13:34:11 -07:00
Shané Winner
c24b65d4ae updated with tracking pixel 2019-05-04 13:32:14 -07:00
Shané Winner
57c5ef318f updated with pixel tracker 2019-05-04 13:25:11 -07:00
Shané Winner
ba033d72f8 Update train-in-spark.ipynb 2019-05-04 09:33:07 -07:00
Shané Winner
aa657ac528 Update manage-runs.ipynb 2019-05-04 09:29:00 -07:00
Shané Winner
7d8289679d added the tracking pixel and the edited the config text 2019-05-04 08:40:18 -07:00
Shané Winner
a7c3db0560 Update model-register-and-deploy.ipynb 2019-05-03 23:21:58 -07:00
Shané Winner
e548847881 pixel text and config text update 2019-05-03 23:20:57 -07:00
Shané Winner
08c6b1f4ed tracking pixel test 2019-05-03 23:15:28 -07:00
Shané Winner
78abb65f5e updated configuration text 2019-05-03 23:08:55 -07:00
Shané Winner
3c6c090732 Update README.md 2019-05-03 22:54:31 -07:00
Shané Winner
513e36d9b2 updated the config verbiage and tracking pixel 2019-05-03 22:54:02 -07:00
Ilya Matiach
9db91a7fb8 Merge pull request #351 from imatiach-msft/ilmat/update-raw-features-notebook
Update raw features explanation notebook
2019-05-03 12:47:28 -04:00
Roope Astala
d9b26b655b Merge pull request #356 from rastala/master
how to use environments
2019-05-03 10:27:33 -04:00
Roope Astala
cb8dc41766 how to use environments 2019-05-03 10:25:39 -04:00
Ilya Matiach
9c9b4bb122 Update raw features explanation notebook 2019-05-02 14:29:53 -04:00
Roope Astala
f5c896c70f Merge pull request #345 from csteegz/add-gpu-deploy
Create production-deploy-to-aks-gpu.ipynb
2019-05-02 14:13:50 -04:00
Colleen Forbes
3b572eddb2 Merge pull request #350 from MayMSFT/master
add dataset tutorial
2019-05-02 09:33:25 -07:00
May Hu
51523db294 add dataset tutorial 2019-05-02 09:07:11 -07:00
Ilya Matiach
3b4998941c Merge pull request #348 from imatiach-msft/ilmat/update-explain-model-nb
updating model explanation notebooks
2019-04-30 17:27:44 -04:00
Ilya Matiach
6cdbfb8722 updating model explanation notebooks 2019-04-30 17:12:54 -04:00
Colin Versteeg
c086bd69c7 Create production-deploy-to-aks-gpu.ipynb
Add deploy to aks GPU notebook
2019-04-29 16:26:42 -07:00
Shané Winner
279c9b8dc4 Pixel Tracker 2019-04-29 11:27:03 -07:00
Shané Winner
98589fe335 Testing Pixel Tracker 2019-04-29 11:16:08 -07:00
Shané Winner
77f21058a2 Testing Pixel Tracker 2019-04-29 11:04:05 -07:00
Roope Astala
baa65d0886 Merge pull request #343 from Azure/paledger/add-accel-models
Initial commit to add AccelModels notebooks from AzureMlCli repo
2019-04-29 13:56:06 -04:00
Paula Ledgerwood
0fffa11b2a Update links and code formatting 2019-04-29 10:20:55 -07:00
Paula Ledgerwood
20ec225343 Initial commit to add notebooks from AzureMlCli repo 2019-04-26 11:16:33 -07:00
162 changed files with 18966 additions and 4440 deletions


@@ -0,0 +1,29 @@
FROM continuumio/miniconda:4.5.11
# install git
RUN apt-get update && apt-get upgrade -y && apt-get install -y git
# create a new conda environment named azureml
RUN conda create -n azureml -y -q Python=3.6
# install additional packages used by sample notebooks. this is optional
RUN ["/bin/bash", "-c", "source activate azureml && conda install -y tqdm cython matplotlib scikit-learn"]
# install azureml-sdk components
RUN ["/bin/bash", "-c", "source activate azureml && pip install azureml-sdk[notebooks]==1.0.41"]
# clone Azure ML GitHub sample notebooks
RUN cd /home && git clone -b "azureml-sdk-1.0.41" --single-branch https://github.com/Azure/MachineLearningNotebooks.git
# generate jupyter configuration file
RUN ["/bin/bash", "-c", "source activate azureml && mkdir ~/.jupyter && cd ~/.jupyter && jupyter notebook --generate-config"]
# set an empty token for Jupyter to disable authentication.
# this is NOT recommended for production environments
RUN echo "c.NotebookApp.token = ''" >> ~/.jupyter/jupyter_notebook_config.py
# open up port 8887 on the container
EXPOSE 8887
# start Jupyter notebook server on port 8887 when the container starts
CMD /bin/bash -c "cd /home/MachineLearningNotebooks && source activate azureml && jupyter notebook --port 8887 --no-browser --ip 0.0.0.0 --allow-root"


@@ -0,0 +1,29 @@
FROM continuumio/miniconda:4.5.11
# install git
RUN apt-get update && apt-get upgrade -y && apt-get install -y git
# create a new conda environment named azureml
RUN conda create -n azureml -y -q Python=3.6
# install additional packages used by sample notebooks. this is optional
RUN ["/bin/bash", "-c", "source activate azureml && conda install -y tqdm cython matplotlib scikit-learn"]
# install azureml-sdk components
RUN ["/bin/bash", "-c", "source activate azureml && pip install azureml-sdk[notebooks]==1.0.43"]
# clone Azure ML GitHub sample notebooks
RUN cd /home && git clone -b "azureml-sdk-1.0.43" --single-branch https://github.com/Azure/MachineLearningNotebooks.git
# generate jupyter configuration file
RUN ["/bin/bash", "-c", "source activate azureml && mkdir ~/.jupyter && cd ~/.jupyter && jupyter notebook --generate-config"]
# set an empty token for Jupyter to disable authentication.
# this is NOT recommended for production environments
RUN echo "c.NotebookApp.token = ''" >> ~/.jupyter/jupyter_notebook_config.py
# open up port 8887 on the container
EXPOSE 8887
# start Jupyter notebook server on port 8887 when the container starts
CMD /bin/bash -c "cd /home/MachineLearningNotebooks && source activate azureml && jupyter notebook --port 8887 --no-browser --ip 0.0.0.0 --allow-root"
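
The two Dockerfiles are identical apart from the SDK version. Each one states that version twice: once in the `pip install` pin and once in the ref passed to `git clone -b`. If a later copy bumps one of these and not the other, the container would serve notebooks from one release against a different SDK. The sketch below is a minimal consistency check. It assumes both lines keep the exact format shown above. `pinned_version` and its regexes are hypothetical helpers, not part of any Azure tooling.

```python
import re

# Hypothetical checker. The patterns assume the line formats used in the
# Dockerfiles above: a pip pin such as azureml-sdk[notebooks]==1.0.43 and a
# clone ref such as -b "azureml-sdk-1.0.43".
PIP_PIN = re.compile(r"azureml-sdk(?:\[[^\]]*\])?==([0-9][0-9A-Za-z.]*)")
CLONE_REF = re.compile(r'git clone -b "azureml-sdk-([0-9][0-9A-Za-z.]*)"')


def pinned_version(dockerfile_text: str):
    """Return the single SDK version the Dockerfile uses, or None if the
    pip pin and the cloned notebook ref are missing or disagree."""
    pins = set(PIP_PIN.findall(dockerfile_text))
    refs = set(CLONE_REF.findall(dockerfile_text))
    if len(pins) == 1 and pins == refs:
        return next(iter(pins))
    return None
```

To check a file on disk, pass its contents (for example `open(path).read()`). Run against the two Dockerfiles above, the check should return 1.0.41 for the first and 1.0.43 for the second.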


@@ -1,3 +1,4 @@
 This software is made available to you on the condition that you agree to
 [your agreement][1] governing your use of Azure.
 If you do not have an existing agreement governing your use of Azure, you agree that


@@ -24,8 +24,8 @@ pip install azureml-sdk
 git clone https://github.com/Azure/MachineLearningNotebooks.git
 # below steps are optional
-# install the base SDK and a Jupyter notebook server
-pip install azureml-sdk[notebooks]
+# install the base SDK, Jupyter notebook server and tensorboard
+pip install azureml-sdk[notebooks,tensorboard]
 # install model explainability component
 pip install azureml-sdk[explain]


@@ -4,6 +4,10 @@ This repository contains example notebooks demonstrating the [Azure Machine Lear
 ![Azure ML workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/service/media/overview-what-is-azure-ml/aml.png)
+## News
+* [Try Azure Machine Learning with MLflow](./how-to-use-azureml/using-mlflow)
 ## Quick installation
 ```sh
 pip install azureml-sdk
@@ -11,7 +15,7 @@ pip install azureml-sdk
 Read more detailed instructions on [how to set up your environment](./NBSETUP.md) using Azure Notebook service, your own Jupyter notebook server, or Docker.
 ## How to navigate and use the example notebooks?
-You should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
+If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
 If you want to...
@@ -20,7 +24,7 @@ If you want to...
 * ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
 * ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
 * ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
-* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](./how-to-use-azureml/machine-learning-pipelines/pipeline-mpi-batch-prediction.ipynb).
+* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
 * ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) and [model data collection](./how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb).
 ## Tutorials
@@ -52,5 +56,18 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
 Visit following repos to see projects contributed by Azure ML users:
+- [AMLSamples](https://github.com/Azure/AMLSamples) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis.
 - [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
 - [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
+## Data/Telemetry
+This repository collects usage data and sends it to Microsoft to help improve our products and services. Read Microsoft's [privacy statement to learn more](https://privacy.microsoft.com/en-US/privacystatement)
+To opt out of tracking, please go to the raw markdown or .ipynb files and remove the following line of code:
+```sh
+"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/README.png)"
+```
+This URL will be slightly different depending on the file.
+![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/README.png)
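
The opt-out step above is manual: you have to find the pixel line in each Markdown or `.ipynb` file and delete it. The following is a minimal sketch of automating that step. It assumes every pixel is a line of the form `![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/...)`, matching the lines added in this compare. All function names are hypothetical. Lines that only quote the URL, such as the quoted example inside the README's own opt-out instructions, are left alone because they do not start with `![Impressions](`.

```python
import json

# Assumption: every tracking pixel is a bare image line with this host and
# path prefix, as in the lines shown in this compare.
PIXEL_PREFIX = (
    "![Impressions](https://PixelServer20190423114238.azurewebsites.net"
    "/api/impressions/"
)


def _is_pixel_line(line: str) -> bool:
    # Only bare image lines count; quoted copies start with '"' and are kept.
    return line.strip().startswith(PIXEL_PREFIX)


def strip_pixel_from_markdown(text: str) -> str:
    """Remove every line of a Markdown document that is a pixel image."""
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if not _is_pixel_line(line))


def strip_pixel_from_notebook(nb: dict) -> dict:
    """Return a copy of a parsed notebook without pixel lines. A markdown
    cell that held nothing but the pixel is dropped entirely."""
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            lines = source if isinstance(source, list) else source.splitlines(keepends=True)
            kept = [line for line in lines if not _is_pixel_line(line)]
            if len(kept) != len(lines):
                if not "".join(kept).strip():
                    continue  # the cell existed only to carry the pixel
                cell = {**cell, "source": kept}
        cells.append(cell)
    return {**nb, "cells": cells}


def strip_pixel_from_file(path: str) -> None:
    """Rewrite a .md or .ipynb file in place without its pixel."""
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    if path.endswith(".ipynb"):
        # indent=1 is an assumption about how the original file was formatted.
        nb = strip_pixel_from_notebook(json.loads(raw))
        out = json.dumps(nb, indent=1, ensure_ascii=False) + "\n"
    else:
        out = strip_pixel_from_markdown(raw)
    with open(path, "w", encoding="utf-8") as f:
        f.write(out)
```

The notebook variant works on the parsed JSON rather than on raw text. This means it removes the whole pixel cell, as added to the configuration notebook, instead of leaving an empty markdown cell behind.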


@@ -9,6 +9,13 @@
 "Licensed under the MIT License."
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/configuration.png)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -96,7 +103,7 @@
 "source": [
 "import azureml.core\n",
 "\n",
-"print(\"This notebook was created using version 1.0.23 of the Azure ML SDK\")\n",
+"print(\"This notebook was created using version 1.0.43 of the Azure ML SDK\")\n",
 "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
 ]
 },
@@ -268,14 +275,14 @@
 "from azureml.core.compute_target import ComputeTargetException\n",
 "\n",
 "# Choose a name for your CPU cluster\n",
-"cpu_cluster_name = \"cpucluster\"\n",
+"cpu_cluster_name = \"cpu-cluster\"\n",
 "\n",
 "# Verify that cluster does not exist already\n",
 "try:\n",
 " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
-" print(\"Found existing cpucluster\")\n",
+" print(\"Found existing cpu-cluster\")\n",
 "except ComputeTargetException:\n",
-" print(\"Creating new cpucluster\")\n",
+" print(\"Creating new cpu-cluster\")\n",
 " \n",
 " # Specify the configuration for the new cluster\n",
 " compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
@@ -306,14 +313,14 @@
 "from azureml.core.compute_target import ComputeTargetException\n",
 "\n",
 "# Choose a name for your GPU cluster\n",
-"gpu_cluster_name = \"gpucluster\"\n",
+"gpu_cluster_name = \"gpu-cluster\"\n",
 "\n",
 "# Verify that cluster does not exist already\n",
 "try:\n",
 " gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
 " print(\"Found existing gpu cluster\")\n",
 "except ComputeTargetException:\n",
-" print(\"Creating new gpucluster\")\n",
+" print(\"Creating new gpu-cluster\")\n",
 " \n",
 " # Specify the configuration for the new cluster\n",
 " compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
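
The configuration notebook prints the SDK version it was written against (bumped here from 1.0.23 to 1.0.43) next to `azureml.core.VERSION`, and leaves the comparison to the reader. If you would rather have the notebook state the result, the following is a minimal sketch. `parse_version` and `compare_sdk_version` are hypothetical helpers, not SDK functions. In the notebook you would pass `azureml.core.VERSION` as `installed`. The sketch compares integer tuples because plain string comparison misorders versions such as 1.0.5 and 1.0.43.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '1.0.43' into (1, 0, 43) so versions compare numerically.

    Simplification: each dotted part contributes only its leading digits,
    and parsing stops at the first part with none, so '1.0.43.post1' and
    '1.0.43rc1' both become (1, 0, 43).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def compare_sdk_version(installed: str, notebook_version: str = "1.0.43") -> str:
    """Report whether the installed SDK is 'older', 'newer', or a 'match'
    for the version the notebook says it was created with."""
    have, want = parse_version(installed), parse_version(notebook_version)
    if have == want:
        return "match"
    return "older" if have < want else "newer"
```

Because pre-release suffixes are ignored, a release candidate counts as a match for its final version. This is acceptable for a reminder message, but not for strict dependency checks.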


@@ -15,6 +15,3 @@ As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) not
 * [enable-app-insights-in-production-service](./deployment/enable-app-insights-in-production-service) Learn how to use App Insights with production web service.
 Find quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
-![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/README.png)
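
These READMEs link to samples through relative paths such as `./deployment/enable-app-insights-in-production-service`, so renaming or moving a notebook folder silently breaks them. For example, the automated ML README further down in this compare pointed `auto-ml-remote-amlcompute.ipynb` at `remote-batchai/` until commit fc541706e7 ("Fixed path for auto-ml-remote-amlcompute"). The following is a minimal sketch of a check that could catch such breaks before merging. It assumes only inline `[text](target)` links matter. `broken_relative_links` is a hypothetical helper.

```python
import os
import re

# Inline Markdown links and images: [text](target) or ![alt](target).
# Simplification: links with a title, such as [a](b "title"), are not handled.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything with a URL scheme (http:, https:, mailto:) is not a local path.
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def broken_relative_links(markdown: str, base_dir: str, exists=os.path.exists) -> list:
    """Return the relative link targets in `markdown` that do not resolve
    to an existing path under `base_dir` (the folder holding the README)."""
    missing = []
    for target in LINK.findall(markdown):
        if SCHEME.match(target) or target.startswith("#"):
            continue  # external URL or in-page anchor
        path = target.split("#", 1)[0]  # drop any '#section' fragment
        if not exists(os.path.normpath(os.path.join(base_dir, path))):
            missing.append(target)
    return missing
```

The `exists` parameter defaults to the real filesystem. Taking it as an argument lets the check run against a list of files, for example from `git ls-files`, without a checkout.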


@@ -115,16 +115,7 @@ jupyter notebook
 - Simple example of using automated ML for regression
 - Uses local compute for training
-- [auto-ml-remote-execution.ipynb](remote-execution/auto-ml-remote-execution.ipynb)
-- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
-- Example of using automated ML for classification using a remote linux DSVM for training
-- Parallel execution of iterations
-- Async tracking of progress
-- Cancelling individual iterations or entire run
-- Retrieving models for any iteration or logged metric
-- Specify automated ML settings as kwargs
-- [auto-ml-remote-amlcompute.ipynb](remote-batchai/auto-ml-remote-amlcompute.ipynb)
+- [auto-ml-remote-amlcompute.ipynb](remote-amlcompute/auto-ml-remote-amlcompute.ipynb)
 - Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
 - Example of using automated ML for classification using remote AmlCompute for training
 - Parallel execution of iterations
@@ -133,12 +124,6 @@ jupyter notebook
- Retrieving models for any iteration or logged metric - Retrieving models for any iteration or logged metric
- Specify automated ML settings as kwargs - Specify automated ML settings as kwargs
- [auto-ml-remote-attach.ipynb](remote-attach/auto-ml-remote-attach.ipynb)
- Dataset: Scikit learn's [20newsgroup](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html)
- handling text data with preprocess flag
- Reading data from a blob store for remote executions
- using pandas dataframes for reading data
- [auto-ml-missing-data-blacklist-early-termination.ipynb](missing-data-blacklist-early-termination/auto-ml-missing-data-blacklist-early-termination.ipynb) - [auto-ml-missing-data-blacklist-early-termination.ipynb](missing-data-blacklist-early-termination/auto-ml-missing-data-blacklist-early-termination.ipynb)
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits) - Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
- Blacklist certain pipelines - Blacklist certain pipelines
@@ -156,10 +141,6 @@ jupyter notebook
- Get details for an automated ML run (automated ML settings, run widget & all metrics) - Get details for an automated ML run (automated ML settings, run widget & all metrics)
- Download fitted pipeline for any iteration - Download fitted pipeline for any iteration
- [auto-ml-remote-execution-with-datastore.ipynb](remote-execution-with-datastore/auto-ml-remote-execution-with-datastore.ipynb)
- Dataset: Scikit learn's [20newsgroup](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html)
- Download the data and store it in DataStore.
- [auto-ml-classification-with-deployment.ipynb](classification-with-deployment/auto-ml-classification-with-deployment.ipynb) - [auto-ml-classification-with-deployment.ipynb](classification-with-deployment/auto-ml-classification-with-deployment.ipynb)
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits) - Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
- Simple example of using automated ML for classification - Simple example of using automated ML for classification

@@ -10,7 +10,7 @@ dependencies:
- urllib3<1.24 - urllib3<1.24
- scipy>=1.0.0,<=1.1.0 - scipy>=1.0.0,<=1.1.0
- scikit-learn>=0.19.0,<=0.20.3 - scikit-learn>=0.19.0,<=0.20.3
- pandas>=0.22.0,<0.23.0 - pandas>=0.22.0,<=0.23.4
- py-xgboost<=0.80 - py-xgboost<=0.80
- pip: - pip:

@@ -0,0 +1,22 @@
name: azure_automl
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- nomkl
- python>=3.5.2,<3.6.8
- nb_conda
- matplotlib==2.1.0
- numpy>=1.11.0,<=1.16.2
- cython
- urllib3<1.24
- scipy>=1.0.0,<=1.1.0
- scikit-learn>=0.19.0,<=0.20.3
- pandas>=0.22.0,<0.23.0
- py-xgboost<=0.80
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-sdk[automl,explain]
- azureml-widgets
- pandas_ml

@@ -12,7 +12,7 @@ fi
if [ "$AUTOML_ENV_FILE" == "" ] if [ "$AUTOML_ENV_FILE" == "" ]
then then
AUTOML_ENV_FILE="automl_env.yml" AUTOML_ENV_FILE="automl_env_mac.yml"
fi fi
if [ ! -f $AUTOML_ENV_FILE ]; then if [ ! -f $AUTOML_ENV_FILE ]; then

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-with-onnx/auto-ml-classification-with-onnx.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -66,11 +73,12 @@
"import numpy as np\n", "import numpy as np\n",
"import pandas as pd\n", "import pandas as pd\n",
"from sklearn import datasets\n", "from sklearn import datasets\n",
"from sklearn.model_selection import train_test_split\n",
"\n", "\n",
"import azureml.core\n", "import azureml.core\n",
"from azureml.core.experiment import Experiment\n", "from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n", "from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig" "from azureml.train.automl import AutoMLConfig, constants"
] ]
}, },
{ {
@@ -106,7 +114,7 @@
"source": [ "source": [
"## Data\n", "## Data\n",
"\n", "\n",
"This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method." "This uses scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) method."
] ]
}, },
{ {
@@ -115,11 +123,17 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"digits = datasets.load_digits()\n", "iris = datasets.load_iris()\n",
"X_train, X_test, y_train, y_test = train_test_split(iris.data, \n",
" iris.target, \n",
" test_size=0.2, \n",
" random_state=0)\n",
"\n", "\n",
"# Exclude the first 100 rows from training so that they can be used for test.\n", "# Convert X_train and X_test to pandas DataFrames and set the column names.\n",
"X_train = digits.data[100:,:]\n", "# The names are used to initialize the input variable names of the ONNX model\n",
"y_train = digits.target[100:]" "# and to predict with the ONNX model through the inference helper.\n",
"X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n",
"X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])"
] ]
}, },
{ {
@@ -155,9 +169,10 @@
" primary_metric = 'AUC_weighted',\n", " primary_metric = 'AUC_weighted',\n",
" iteration_timeout_minutes = 60,\n", " iteration_timeout_minutes = 60,\n",
" iterations = 10,\n", " iterations = 10,\n",
" verbosity = logging.INFO,\n", " verbosity = logging.INFO, \n",
" X = X_train, \n", " X = X_train, \n",
" y = y_train,\n", " y = y_train,\n",
" preprocess=True,\n",
" enable_onnx_compatible_models=True,\n", " enable_onnx_compatible_models=True,\n",
" path = project_folder)" " path = project_folder)"
] ]
@@ -249,10 +264,69 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.train.automl._vendor.automl.client.core.common.onnx_convert import OnnxConverter\n", "from azureml.automl.core.onnx_convert import OnnxConverter\n",
"onnx_fl_path = \"./best_model.onnx\"\n", "onnx_fl_path = \"./best_model.onnx\"\n",
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)" "OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
] ]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict with the ONNX model, using onnxruntime package"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"import json\n",
"from azureml.automl.core.onnx_convert import OnnxConvertConstants\n",
"\n",
"if sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion:\n",
" python_version_compatible = True\n",
"else:\n",
" python_version_compatible = False\n",
"\n",
"try:\n",
" import onnxruntime\n",
" from azureml.automl.core.onnx_convert import OnnxInferenceHelper \n",
" onnxrt_present = True\n",
"except ImportError:\n",
" onnxrt_present = False\n",
"\n",
"def get_onnx_res(run):\n",
" res_path = '_debug_y_trans_converter.json'\n",
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
" with open(res_path) as f:\n",
" onnx_res = json.load(f)\n",
" return onnx_res\n",
"\n",
"if onnxrt_present and python_version_compatible: \n",
" mdl_bytes = onnx_mdl.SerializeToString()\n",
" onnx_res = get_onnx_res(best_run)\n",
"\n",
" onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)\n",
" pred_onnx, pred_prob_onnx = onnxrt_helper.predict(X_test)\n",
"\n",
" print(pred_onnx)\n",
" print(pred_prob_onnx)\n",
"else:\n",
" if not python_version_compatible:\n",
" print('Please use Python version 3.6 to run the inference helper.') \n",
" if not onnxrt_present:\n",
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
} }
], ],
"metadata": { "metadata": {

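The inference cell in the ONNX notebook only predicts when the Python version is below `OnnxConvertConstants.OnnxIncompatiblePythonVersion` and `onnxruntime` can be imported. A minimal sketch of the same gating, assuming a placeholder cutoff of `(3, 7)` (the real constant's value is not shown in this diff) and using `importlib.util.find_spec` to detect the runtime without importing it; `onnx_inference_available` is a hypothetical helper, not part of the AutoML SDK.

```python
import importlib.util
import sys

# Placeholder cutoff (assumption): the notebook compares against
# OnnxConvertConstants.OnnxIncompatiblePythonVersion, whose value is not shown here.
ASSUMED_INCOMPATIBLE_PYTHON = (3, 7)


def onnx_inference_available(incompatible_from=ASSUMED_INCOMPATIBLE_PYTHON,
                             runtime_module="onnxruntime"):
    """Return (python_version_compatible, runtime_present) without importing the runtime."""
    python_version_compatible = sys.version_info < incompatible_from
    # find_spec returns None when a top-level module cannot be found.
    runtime_present = importlib.util.find_spec(runtime_module) is not None
    return python_version_compatible, runtime_present


if __name__ == "__main__":
    compatible, present = onnx_inference_available()
    if not compatible:
        print("Please use Python 3.6 to run the inference helper.")
    if not present:
        print("Please install the onnxruntime package to predict with the ONNX model.")
```

`find_spec` avoids paying the import cost up front; the notebook's `try`/`except ImportError` is equally valid and additionally catches a runtime that is installed but fails to load.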
@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-with-whitelisting/auto-ml-classification-with-whitelisting.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -328,6 +335,12 @@
" print()\n", " print()\n",
" for estimator in step[1].estimators:\n", " for estimator in step[1].estimators:\n",
" print_model(estimator[1], estimator[0]+ ' - ')\n", " print_model(estimator[1], estimator[0]+ ' - ')\n",
" elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):\n",
" print(\"\\nMeta Learner\")\n",
" pprint(step[1]._meta_learner)\n",
" print()\n",
" for estimator in step[1]._base_learners:\n",
" print_model(estimator[1], estimator[0]+ ' - ')\n",
" else:\n", " else:\n",
" pprint(step[1].get_params())\n", " pprint(step[1].get_params())\n",
" print()\n", " print()\n",

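The new `elif` branch in `print_model` handles stacked ensembles by duck typing: an object exposing both `_base_learners` and `_meta_learner` is printed as its meta learner followed by each base learner. The sketch below reproduces that dispatch order with small stand-in classes (`Leaf`, `Voting`, `Stack` and `describe_model` are hypothetical, not AutoML types) and collects lines in a list instead of printing. It simplifies the notebook, where each base learner is itself a pipeline with `steps`.

```python
class Leaf:
    """Stand-in for a single estimator with sklearn-style get_params()."""
    def __init__(self, **params):
        self.params = params

    def get_params(self):
        return dict(self.params)


class Voting:
    """Stand-in for a voting ensemble: (name, estimator) pairs in `estimators`."""
    def __init__(self, estimators):
        self.estimators = estimators


class Stack:
    """Stand-in for a stacked ensemble exposing `_base_learners` and `_meta_learner`."""
    def __init__(self, base_learners, meta_learner):
        self._base_learners = base_learners
        self._meta_learner = meta_learner


def describe_model(model, prefix="", out=None):
    """Flatten an ensemble into lines, checking attributes in the same order as print_model."""
    out = [] if out is None else out
    if hasattr(model, "estimators"):
        # Voting ensemble: recurse into each named member.
        for name, estimator in model.estimators:
            describe_model(estimator, prefix + name + " - ", out)
    elif hasattr(model, "_base_learners") and hasattr(model, "_meta_learner"):
        # Stacked ensemble: meta learner first, then every base learner.
        out.append(prefix + "Meta Learner: " + repr(model._meta_learner.get_params()))
        for name, estimator in model._base_learners:
            describe_model(estimator, prefix + name + " - ", out)
    else:
        # Plain estimator: report its parameters.
        out.append(prefix + repr(model.get_params()))
    return out
```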
@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -117,21 +124,12 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n", "# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n", "# The data referenced here is a 1 MB simple random sample of the Chicago Crime data.\n",
"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
"X = dprep.auto_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n",
"\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n", "# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n", "# and convert column types manually.\n",
"# Here we read a comma delimited file and convert all columns to integers.\n", "example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))" "dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
] "dflow.get_profile()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
] ]
}, },
{ {
@@ -140,7 +138,30 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X.skip(1).head(5)" "# `Primary Type` is the label (y), so drop the rows where it is null.\n",
"dflow = dflow.drop_nulls('Primary Type')\n",
"dflow.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the Data Preparation Result\n",
"\n",
"You can peek at the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
] ]
}, },
{ {
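The Dataflow ideas in this hunk (an immutable list of steps, `head(j)` evaluating only `j` records, and branching one flow into `X` and `y`) can be illustrated with a pure-Python sketch. `Flow` and its methods are hypothetical stand-ins that only mimic the names used in the notebook; they are not the `azureml.dataprep` API.

```python
from itertools import islice


class Flow:
    """Hypothetical stand-in for a Dataflow: a row source plus an immutable tuple of steps."""

    def __init__(self, source, steps=()):
        self._source = source       # zero-argument callable returning an iterator of dict rows
        self._steps = tuple(steps)  # a tuple, so the step list cannot be changed in place

    def _with_step(self, step):
        # Every transformation returns a new Flow and leaves self untouched,
        # so any Flow can be branched into several independent pipelines.
        return Flow(self._source, self._steps + (step,))

    def drop_nulls(self, column):
        return self._with_step(lambda rows: (r for r in rows if r.get(column) is not None))

    def drop_columns(self, columns):
        dropped = set(columns)
        return self._with_step(
            lambda rows: ({k: v for k, v in r.items() if k not in dropped} for r in rows))

    def keep_columns(self, columns):
        kept = set(columns)
        return self._with_step(
            lambda rows: ({k: v for k, v in r.items() if k in kept} for r in rows))

    def head(self, j):
        # Steps are chained generators, so only as many source rows as are
        # needed to produce j output rows are ever read.
        rows = self._source()
        for step in self._steps:
            rows = step(rows)
        return list(islice(rows, j))
```

Branching mirrors the notebook: `base = Flow(read_rows).drop_nulls("Primary Type")`, then `X = base.drop_columns(["Primary Type", "FBI Code"])` and `y = base.keep_columns(["Primary Type"])` share `base` without modifying it (`read_rows` is a hypothetical row source).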
@@ -162,9 +183,8 @@
" \"iteration_timeout_minutes\" : 10,\n", " \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n", " \"iterations\" : 2,\n",
" \"primary_metric\" : 'AUC_weighted',\n", " \"primary_metric\" : 'AUC_weighted',\n",
" \"preprocess\" : False,\n", " \"preprocess\" : True,\n",
" \"verbosity\" : logging.INFO,\n", " \"verbosity\" : logging.INFO\n",
" \"n_cross_validations\": 3\n",
"}" "}"
] ]
}, },
@@ -172,7 +192,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create or Attach a Remote Linux DSVM" "### Create or Attach an AmlCompute cluster"
] ]
}, },
{ {
@@ -181,21 +201,36 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"dsvm_name = 'mydsvmc'\n", "from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"\n", "\n",
"try:\n", "# Choose a name for your cluster.\n",
" while ws.compute_targets[dsvm_name].provisioning_state == 'Creating':\n", "amlcompute_cluster_name = \"cpu-cluster\"\n",
" time.sleep(1)\n", "\n",
" \n", "found = False\n",
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n", "\n",
" print('Found existing DVSM.')\n", "# Check if this compute target already exists in the workspace.\n",
"except:\n", "\n",
" print('Creating a new DSVM.')\n", "cts = ws.compute_targets\n",
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n", " found = True\n",
" dsvm_compute.wait_for_completion(show_output = True)\n", " print('Found existing compute target.')\n",
" print(\"Waiting one minute for ssh to be accessible\")\n", " compute_target = cts[amlcompute_cluster_name]\n",
" time.sleep(90) # Wait for ssh to be accessible" "\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
" # For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
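The compute cell follows a get-or-create pattern: reuse `ws.compute_targets[name]` if it exists with type `AmlCompute`, otherwise provision a new cluster. A minimal sketch of that control flow, with a plain dict standing in for `ws.compute_targets` and a callable standing in for `ComputeTarget.create` (both assumptions; `get_or_create_target` is hypothetical).

```python
def get_or_create_target(registry, name, expected_type, create):
    """Reuse registry[name] if it has expected_type; otherwise create and register it.

    registry: dict standing in for ws.compute_targets (hypothetical)
    create:   callable(name) -> target dict, standing in for ComputeTarget.create
    Returns (target, created).
    """
    existing = registry.get(name)
    if existing is not None:
        if existing["type"] != expected_type:
            # The name is taken by a different kind of target; fail early
            # instead of attempting a conflicting create.
            raise ValueError(
                f"{name!r} exists with type {existing['type']!r}, expected {expected_type!r}")
        return existing, False
    target = create(name)
    registry[name] = target
    return target, True
```

Raising on a type mismatch is a design choice of this sketch; the notebook simply falls through to `ComputeTarget.create` in that case.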
@@ -207,9 +242,13 @@
"from azureml.core.runconfig import RunConfiguration\n", "from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n", "from azureml.core.conda_dependencies import CondaDependencies\n",
"\n", "\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n", "conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n", "\n",
"conda_run_config.target = dsvm_compute\n", "# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n", "\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n", "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd" "conda_run_config.environment.python.conda_dependencies = cd"
@@ -257,6 +296,44 @@
"remote_run" "remote_run"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pre-process cache cleanup\n",
"The preprocessed data is cached in the user's default file store. Once the run is complete, you can clean up the cache by running the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run.clean_preprocessor_cache()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cancelling Runs\n",
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
"# remote_run.cancel()\n",
"\n",
"# Cancel iteration 1 and move onto iteration 2.\n",
"# remote_run.cancel_iteration(1)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -376,7 +453,8 @@
"source": [ "source": [
"## Test\n", "## Test\n",
"\n", "\n",
"#### Load Test Data" "#### Load Test Data\n",
"The test data must go through the same preparation steps as the training data; otherwise, prediction may fail at the preprocessing step."
] ]
}, },
{ {
@@ -385,12 +463,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from sklearn import datasets\n", "dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
"\n", "dflow_test = dflow_test.drop_nulls('Primary Type')"
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
] ]
}, },
{ {
@@ -398,7 +472,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Testing Our Best Fitted Model\n", "#### Testing Our Best Fitted Model\n",
"We will try to predict 2 digits and see how our model works." "We will use a confusion matrix to see how our model performs."
] ]
}, },
{ {
@@ -407,65 +481,19 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#Randomly select digits and test\n", "from pandas_ml import ConfusionMatrix\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"\n", "\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n", "y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
" print(index)\n", "X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Appendix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Capture the `Dataflow` Objects for Later Use in AutoML\n",
"\n", "\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage." "\n",
] "ypred = fitted_model.predict(X_test)\n",
}, "\n",
{ "cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
"cell_type": "code", "\n",
"execution_count": null, "print(cm)\n",
"metadata": {}, "\n",
"outputs": [], "cm.plot()"
"source": [
"# sklearn.digits.data + target\n",
"digits_complete = dprep.auto_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`digits_complete` (sourced from `sklearn.datasets.load_digits()`) is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(digits_complete.to_pandas_dataframe().shape)\n",
"labels_column = 'Column64'\n",
"dflow_X = digits_complete.drop_columns(columns = [labels_column])\n",
"dflow_y = digits_complete.keep_columns(columns = [labels_column])"
] ]
} }
], ],

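The updated test cells evaluate the fitted model with `pandas_ml.ConfusionMatrix`. As a dependency-free sketch of what such a matrix contains, the helper below counts (actual, predicted) pairs, using the common convention of rows for actual labels and columns for predicted labels; `confusion_matrix` and `accuracy` here are illustrative helpers, not the `pandas_ml` implementation.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (labels, matrix) where matrix[i][j] counts rows with actual labels[i] predicted as labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(actual, predicted)] for predicted in labels] for actual in labels]
    return labels, matrix


def accuracy(matrix):
    # Correct predictions sit on the diagonal.
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal cells show which crime types the model confuses with each other, which a single accuracy number hides.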
@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/dataprep/auto-ml-dataprep.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -115,23 +122,12 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n", "# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n", "# The data referenced here is a 1 MB simple random sample of the Chicago Crime data.\n",
"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
"X = dprep.auto_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n",
"\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n", "# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n", "# and convert column types manually.\n",
"# Here we read a comma delimited file and convert all columns to integers.\n", "example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))" "dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
] "dflow.get_profile()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the Data Preparation Result\n",
"\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
] ]
}, },
{ {
@@ -140,7 +136,30 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X.skip(1).head(5)" "# `Primary Type` is the label (y), so drop the rows where it is null.\n",
"dflow = dflow.drop_nulls('Primary Type')\n",
"dflow.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the Data Preparation Result\n",
"\n",
"You can peek at the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
] ]
}, },
{ {
@@ -162,7 +181,7 @@
" \"iteration_timeout_minutes\" : 10,\n", " \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n", " \"iterations\" : 2,\n",
" \"primary_metric\" : 'AUC_weighted',\n", " \"primary_metric\" : 'AUC_weighted',\n",
" \"preprocess\" : False,\n", " \"preprocess\" : True,\n",
" \"verbosity\" : logging.INFO\n", " \"verbosity\" : logging.INFO\n",
"}" "}"
] ]
@@ -326,7 +345,8 @@
"source": [ "source": [
"## Test\n", "## Test\n",
"\n", "\n",
"#### Load Test Data" "#### Load Test Data\n",
"The test data must go through the same preparation steps as the training data; otherwise, prediction may fail at the preprocessing step."
] ]
}, },
{ {
@@ -335,12 +355,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from sklearn import datasets\n", "dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
"\n", "dflow_test = dflow_test.drop_nulls('Primary Type')"
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
] ]
}, },
{ {
@@ -348,7 +364,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Testing Our Best Fitted Model\n", "#### Testing Our Best Fitted Model\n",
"We will try to predict 2 digits and see how our model works." "We will use a confusion matrix to see how our model performs."
] ]
}, },
{ {
@@ -357,65 +373,18 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#Randomly select digits and test\n", "from pandas_ml import ConfusionMatrix\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"\n", "\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n", "y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
" print(index)\n", "X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Appendix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Capture the `Dataflow` Objects for Later Use in AutoML\n",
"\n", "\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage." "ypred = fitted_model.predict(X_test)\n",
] "\n",
}, "cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
{ "\n",
"cell_type": "code", "print(cm)\n",
"execution_count": null, "\n",
"metadata": {}, "cm.plot()"
"outputs": [],
"source": [
"# sklearn.digits.data + target\n",
"digits_complete = dprep.auto_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`digits_complete` (sourced from `sklearn.datasets.load_digits()`) is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(digits_complete.to_pandas_dataframe().shape)\n",
"labels_column = 'Column64'\n",
"dflow_X = digits_complete.drop_columns(columns = [labels_column])\n",
"dflow_y = digits_complete.keep_columns(columns = [labels_column])"
] ]
} }
], ],

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/exploring-previous-runs/auto-ml-exploring-previous-runs.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -220,7 +227,7 @@
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], target values.|\n", "|**y**|(sparse) array-like, shape = [n_samples, ], target values.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n",
"|**country**|The country used to generate holiday features. These should be ISO 3166 two-letter country codes (i.e. 'US', 'GB').|\n", "|**country_or_region**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (e.g. 'US', 'GB').|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. " "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
] ]
}, },
@@ -235,8 +242,8 @@
" \"time_column_name\": time_column_name,\n", " \"time_column_name\": time_column_name,\n",
" # these columns are a breakdown of the total and therefore a leak\n", " # these columns are a breakdown of the total and therefore a leak\n",
" \"drop_column_names\": ['casual', 'registered'],\n", " \"drop_column_names\": ['casual', 'registered'],\n",
" # knowing the country allows Automated ML to bring in holidays\n", " # knowing the country/region allows Automated ML to bring in holidays\n",
" \"country\" : 'US',\n", " \"country_or_region\" : 'US',\n",
" \"max_horizon\" : max_horizon,\n", " \"max_horizon\" : max_horizon,\n",
" \"target_lags\": 1 \n", " \"target_lags\": 1 \n",
"}\n", "}\n",

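The bike-share settings rename `country` to `country_or_region`, which still takes an ISO 3166 two-letter code. A minimal sketch of updating an older settings dict accordingly, assuming settings are plain dicts as in the notebook; `migrate_forecast_settings` is a hypothetical helper, and its two-letter check is only a format check, not a lookup against the ISO 3166 list.

```python
def migrate_forecast_settings(settings):
    """Return a copy of a time-series settings dict with 'country' renamed to 'country_or_region'."""
    migrated = dict(settings)  # never mutate the caller's dict
    if "country" in migrated:
        if "country_or_region" in migrated:
            raise ValueError("both 'country' and 'country_or_region' are set")
        migrated["country_or_region"] = migrated.pop("country")
    code = migrated.get("country_or_region")
    # Format check only: two uppercase ASCII letters such as 'US' or 'GB'.
    if code is not None and not (len(code) == 2 and code.isascii()
                                 and code.isalpha() and code.isupper()):
        raise ValueError(f"expected an ISO 3166 two-letter code, got {code!r}")
    return migrated
```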
@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},


@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},


@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/missing-data-blacklist-early-termination/auto-ml-missing-data-blacklist-early-termination.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},


@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},


@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},


@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -129,7 +136,7 @@
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute import ComputeTarget\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlcl\"\n", "amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n", "\n",
"found = False\n", "found = False\n",
"# Check if this compute target already exists in the workspace.\n", "# Check if this compute target already exists in the workspace.\n",
@@ -138,30 +145,23 @@
" found = True\n", " found = True\n",
" print('Found existing compute target.')\n", " print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n", " compute_target = cts[amlcompute_cluster_name]\n",
" \n", "\n",
"if not found:\n", "if not found:\n",
" print('Creating a new compute target...')\n", " print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n", " #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n", " max_nodes = 6)\n",
"\n", "\n",
" # Create the cluster.\n", " # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n", "\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n", " # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n", " # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", " compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n", "\n",
" # For a more detailed view of current AmlCompute status, use get_status()." " # For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
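The cell changed above follows a find-or-create pattern:
- Reuse a compute target with the given name if it exists and is of type `AmlCompute`.
- Otherwise, provision a new one.

The sketch below isolates that control flow in plain Python, so it can be run without a workspace. `ComputeStub` and `get_or_create` are hypothetical stand-ins for the `ws.compute_targets` entries and `ComputeTarget.create`. They are not Azure ML APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ComputeStub:
    """Hypothetical stand-in for a compute target: only a name and a type."""
    name: str
    type: str


def get_or_create(targets: Dict[str, ComputeStub], name: str, wanted_type: str,
                  create: Callable[[str], ComputeStub]) -> ComputeStub:
    """Return targets[name] if it has wanted_type, otherwise call create(name).

    As in the notebook cell, a target with the right name but the wrong type
    is not reused; this stub simply records the newly created target.
    """
    existing = targets.get(name)
    if existing is not None and existing.type == wanted_type:
        print('Found existing compute target.')
        return existing
    print('Creating a new compute target...')
    created = create(name)
    targets[name] = created
    return created


def make_cluster(name: str) -> ComputeStub:
    # Stand-in for provisioning an AmlCompute cluster.
    return ComputeStub(name, 'AmlCompute')


targets: Dict[str, ComputeStub] = {}
first = get_or_create(targets, 'cpu-cluster', 'AmlCompute', make_cluster)   # creates
second = get_or_create(targets, 'cpu-cluster', 'AmlCompute', make_cluster)  # reuses
print(first is second)  # True
```

Because the lookup is by name, renaming the cluster from `automlcl` to `cpu-cluster` (as this hunk does) has a side effect. An existing `automlcl` cluster will no longer be found, so the cell will provision a new `cpu-cluster`.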


@@ -1,558 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Remote Execution using attach**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use scikit-learn's [20newsgroup](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_20newsgroups.html) dataset to showcase how you can use AutoML to handle text data on an attached remote DSVM.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Attach an existing DSVM to a workspace.\n",
"3. Configure AutoML using `AutoMLConfig`.\n",
"4. Train the model using the DSVM.\n",
"5. Explore the results.\n",
"6. View the engineered names for featurized data and the featurization summary for all raw features.\n",
"7. Test the best fitted model.\n",
"\n",
"In addition this notebook showcases the following features\n",
"- **Parallel** executions for iterations\n",
"- **Asynchronous** tracking of progress\n",
"- **Cancellation** of individual iterations or the entire run\n",
"- Retrieving models for any iteration or logged metric\n",
"- Specifying AutoML settings as `**kwargs`\n",
"- Handling **text** data using the `preprocess` flag"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-attach'\n",
"project_folder = './sample_projects/automl-remote-attach'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Attach a Remote Linux DSVM\n",
"To use a remote Linux DSVM as the compute target:\n",
"1. Create a Linux DSVM in Azure, following these [instructions](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro). Make sure you use the Ubuntu flavor (not CentOS). Make sure that disk space is available under `/tmp`, because AutoML creates files under `/tmp/azureml_runs`. The DSVM should have more cores than the number of parallel runs that you plan to enable, and at least 4 GB of memory per core.\n",
"2. Enter the IP address, user name and password below.\n",
"\n",
"**Note:** By default, SSH runs on port 22 and you don't need to change the port number below. If you've configured SSH to use a different port, change `dsvm_ssh_port` accordingly. [Read more](https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/detailed-troubleshoot-ssh-connection) on changing SSH ports for security reasons."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, RemoteCompute\n",
"import time\n",
"\n",
"# Add your VM information below\n",
"# If a compute with the specified compute_name already exists, it will be used and the dsvm_ip_addr, dsvm_ssh_port, \n",
"# dsvm_username and dsvm_password will be ignored.\n",
"compute_name = 'mydsvmb'\n",
"dsvm_ip_addr = '<<ip_addr>>'\n",
"dsvm_ssh_port = 22\n",
"dsvm_username = '<<username>>'\n",
"dsvm_password = '<<password>>'\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" print('Using existing compute.')\n",
" dsvm_compute = ws.compute_targets[compute_name]\n",
"else:\n",
" attach_config = RemoteCompute.attach_configuration(address=dsvm_ip_addr, username=dsvm_username, password=dsvm_password, ssh_port=dsvm_ssh_port)\n",
" ComputeTarget.attach(workspace=ws, name=compute_name, attach_configuration=attach_config)\n",
"\n",
" while ws.compute_targets[compute_name].provisioning_state == 'Creating':\n",
" time.sleep(1)\n",
"\n",
" dsvm_compute = ws.compute_targets[compute_name]\n",
" \n",
" if dsvm_compute.provisioning_state == 'Failed':\n",
" print('Attach failed.')\n",
" print(dsvm_compute.provisioning_errors)\n",
" dsvm_compute.detach()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"import pkg_resources\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to the Linux DSVM\n",
"conda_run_config.target = dsvm_compute\n",
"\n",
"pandas_dependency = 'pandas==' + pkg_resources.get_distribution(\"pandas\").version\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80',pandas_dependency])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"In this example, the `get_data()` function returns a [dictionary](README.md#getdata)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile $project_folder/get_data.py\n",
"\n",
"import numpy as np\n",
"from sklearn.datasets import fetch_20newsgroups\n",
"\n",
"def get_data():\n",
" remove = ('headers', 'footers', 'quotes')\n",
" categories = [\n",
" 'alt.atheism',\n",
" 'talk.religion.misc',\n",
" 'comp.graphics',\n",
" 'sci.space',\n",
" ]\n",
" data_train = fetch_20newsgroups(subset = 'train', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
" \n",
" X_train = np.array(data_train.data).reshape((len(data_train.data),1))\n",
" y_train = np.array(data_train.target)\n",
" \n",
" return { \"X\" : X_train, \"y\" : y_train }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n",
"\n",
"**Note:** When using Remote DSVM, you can't pass Numpy arrays directly to the fit method.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|\n",
"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
"|**enable_cache**|Setting this to *True* runs preprocessing once and reuses the same preprocessed data for all iterations. Default value is *True*.|\n",
"|**max_cores_per_iteration**|Indicates how many cores on the compute target would be used to train a single pipeline.<br>Default is *1*; you can set it to *-1* to use all cores.|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\": 60,\n",
" \"iterations\": 4,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"preprocess\": True,\n",
" \"max_cores_per_iteration\": 2\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" path = project_folder,\n",
" run_configuration=conda_run_config,\n",
" data_script = project_folder + \"/get_data.py\",\n",
" **automl_settings\n",
" )\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait until the run finishes.\n",
"remote_run.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pre-process cache cleanup\n",
"The preprocessed data is cached in the user's default file store. When the run is complete, you can clean up the cache by running the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run.clean_preprocessor_cache()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(remote_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cancelling Runs\n",
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
"# remote_run.cancel()\n",
"\n",
"# Cancel iteration 1 and move onto iteration 2.\n",
"# remote_run.cancel_iteration(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = remote_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### View the engineered names for featurized data\n",
"Below we display the engineered feature names generated for the featurized data using the preprocessing featurization."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fitted_model.named_steps['datatransformer'].get_engineered_feature_names()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### View the featurization summary\n",
"Below we display the featurization that was performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:\n",
"- Raw feature name\n",
"- Number of engineered features formed out of this raw feature\n",
"- Type detected\n",
"- Whether the feature was dropped\n",
"- List of feature transformations for the raw feature"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fitted_model.named_steps['datatransformer'].get_featurization_summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the highest `accuracy` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# lookup_metric = \"accuracy\"\n",
"# best_run, fitted_model = remote_run.get_output(metric = lookup_metric)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 0\n",
"zero_run, zero_model = remote_run.get_output(iteration = iteration)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load test data.\n",
"from pandas_ml import ConfusionMatrix\n",
"from sklearn.datasets import fetch_20newsgroups\n",
"\n",
"remove = ('headers', 'footers', 'quotes')\n",
"categories = [\n",
" 'alt.atheism',\n",
" 'talk.religion.misc',\n",
" 'comp.graphics',\n",
" 'sci.space',\n",
" ]\n",
"\n",
"data_test = fetch_20newsgroups(subset = 'test', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
"\n",
"X_test = np.array(data_test.data).reshape((len(data_test.data),1))\n",
"y_test = data_test.target\n",
"\n",
"# Test our best pipeline.\n",
"\n",
"y_pred = fitted_model.predict(X_test)\n",
"y_pred_strings = [data_test.target_names[i] for i in y_pred]\n",
"y_test_strings = [data_test.target_names[i] for i in y_test]\n",
"\n",
"cm = ConfusionMatrix(y_test_strings, y_pred_strings)\n",
"print(cm)\n",
"cm.plot()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
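In the deleted remote-attach notebook, the attach cell polls `provisioning_state` in a `while` loop with no upper bound. A stuck attach would therefore block the notebook indefinitely.

Below is a minimal sketch of the same polling with a deadline. It assumes the state can be read through a zero-argument callable. In the notebook, that would be something like `lambda: ws.compute_targets[compute_name].provisioning_state`. `wait_while` is a hypothetical helper, not an Azure ML API.

```python
import time
from typing import Callable


def wait_while(get_state: Callable[[], str], busy_state: str = 'Creating',
               timeout_s: float = 600.0, poll_s: float = 1.0) -> str:
    """Poll get_state() until it leaves busy_state or timeout_s elapses.

    Returns the first state that differs from busy_state and raises
    TimeoutError if the deadline passes while still in busy_state.
    """
    deadline = time.monotonic() + timeout_s
    state = get_state()
    while state == busy_state:
        if time.monotonic() >= deadline:
            raise TimeoutError(f'still {busy_state!r} after {timeout_s} s')
        time.sleep(poll_s)
        state = get_state()
    return state


# Simulated provisioning states in place of a real compute target.
states = iter(['Creating', 'Creating', 'Succeeded'])
print(wait_while(lambda: next(states), poll_s=0.0))  # Succeeded
```

The caller still decides what to do with a terminal state such as `'Failed'`. For example, it can print `provisioning_errors` and `detach()`, as the original cell does.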


@@ -1,555 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Remote Execution using AmlCompute**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Create or attach an existing AmlCompute cluster to a workspace.\n",
"3. Configure AutoML using `AutoMLConfig`.\n",
"4. Train the model using AmlCompute.\n",
"5. Explore the results.\n",
"6. Test the best fitted model.\n",
"\n",
"In addition this notebook showcases the following features\n",
"- **Parallel** executions for iterations\n",
"- **Asynchronous** tracking of progress\n",
"- **Cancellation** of individual iterations or the entire run\n",
"- Retrieving models for any iteration or logged metric\n",
"- Specifying AutoML settings as `**kwargs`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import csv\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-amlcompute'\n",
"project_folder = './project'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlcl\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
" \n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions, you need to make the data accessible from the remote compute.\n",
"This can be done by uploading the data to a datastore.\n",
"In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_train = datasets.load_digits()\n",
"\n",
"if not os.path.isdir('data'):\n",
" os.mkdir('data')\n",
" \n",
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)\n",
" \n",
"pd.DataFrame(data_train.data).to_csv(\"data/X_train.tsv\", index=False, header=False, quoting=csv.QUOTE_ALL, sep=\"\\t\")\n",
"pd.DataFrame(data_train.target).to_csv(\"data/y_train.tsv\", index=False, header=False, sep=\"\\t\")\n",
"\n",
"ds = ws.get_default_datastore()\n",
"ds.upload(src_dir='./data', target_path='bai_data', overwrite=True, show_progress=True)\n",
"\n",
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
" path_on_datastore='bai_data', \n",
" path_on_compute='/tmp/azureml_runs',\n",
" mode='download', # download files from datastore to compute target\n",
" overwrite=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n",
"# set the data reference of the run configuration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile $project_folder/get_data.py\n",
"\n",
"import pandas as pd\n",
"\n",
"def get_data():\n",
" X_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
"\n",
" return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n",
"\n",
"**Note:** When using AmlCompute, you can't pass Numpy arrays directly to the fit method.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than or equal to the maximum number of nodes in the AmlCompute cluster.|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\": 2,\n",
" \"iterations\": 20,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"preprocess\": False,\n",
" \"max_concurrent_iterations\": 5,\n",
" \"verbosity\": logging.INFO\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" path = project_folder,\n",
" run_configuration=conda_run_config,\n",
" data_script = project_folder + \"/get_data.py\",\n",
" **automl_settings\n",
" )\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output = False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"\n",
"#### Loading executed runs\n",
"In case you need to load a previously executed run, convert the raw cell below to a code cell and replace the `run_id` value."
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"from azureml.train.automl.run import AutoMLRun\n",
"remote_run = AutoMLRun(experiment = experiment, run_id = 'AutoML_5db13491-c92a-4f1d-b622-8ab8d973a058')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait until the run finishes.\n",
"remote_run.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(remote_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cancelling Runs\n",
"\n",
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
"# remote_run.cancel()\n",
"\n",
"# Cancel iteration 1 and move on to iteration 2.\n",
"# remote_run.cancel_iteration(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The fitted model includes the pipeline and any preprocessing steps. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = remote_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model which has the smallest `log_loss` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from iteration 3:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 3\n",
"third_run, third_model = remote_run.get_output(iteration=iteration)\n",
"print(third_run)\n",
"print(third_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test\n",
"\n",
"#### Load Test Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing Our Best Fitted Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Randomly select digits and test.\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
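The `get_output(metric=...)` cells above pick a different run depending on the metric: a higher `AUC_weighted` is better, while a lower `log_loss` is better. Below is a minimal, standard-library-only sketch of that selection logic over a dictionary shaped like the `metricslist` built in the "Retrieve All Child Runs" cell. It is not the Azure ML SDK's implementation of `get_output`; the `best_iteration` helper, the `METRIC_GOALS` table, and all metric values are hypothetical and for illustration only.

```python
# Hypothetical sketch: choose the "best" iteration from per-iteration metrics.
# NOT the Azure ML SDK implementation of get_output(); it only illustrates
# that "best" depends on whether a metric is maximized or minimized.

# Assumed optimization direction for the two metrics used in this notebook.
METRIC_GOALS = {
    "AUC_weighted": "maximize",
    "log_loss": "minimize",
}


def best_iteration(metricslist, metric):
    """Return (iteration, value) holding the best value of `metric`.

    metricslist maps iteration number -> {metric name: float}, the same shape
    the notebook builds from remote_run.get_children(). Iterations that did
    not log `metric` are skipped.
    """
    goal = METRIC_GOALS.get(metric)
    if goal is None:
        raise ValueError("unknown optimization goal for metric: %s" % metric)
    candidates = [
        (iteration, metrics[metric])
        for iteration, metrics in metricslist.items()
        if metric in metrics
    ]
    if not candidates:
        raise ValueError("no iteration logged metric: %s" % metric)
    pick = max if goal == "maximize" else min
    return pick(candidates, key=lambda item: item[1])


# Made-up values for illustration only.
metricslist = {
    0: {"AUC_weighted": 0.91, "log_loss": 0.42},
    1: {"AUC_weighted": 0.97, "log_loss": 0.30},
    2: {"AUC_weighted": 0.95, "log_loss": 0.21},
}
print(best_iteration(metricslist, "AUC_weighted"))  # (1, 0.97)
print(best_iteration(metricslist, "log_loss"))      # (2, 0.21)
```

Keeping the direction in an explicit table makes the assumption visible; an unknown metric raises an error instead of silently defaulting to one direction.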


@@ -1,586 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Remote Execution with DataStore**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"This sample shows how a remote DSVM can access a data file through a DataStore. The advantages of using a DataStore are:\n",
"1. DataStore secures the access details.\n",
"2. DataStore supports reading from and writing to Blob and File storage.\n",
"3. AutoML natively supports copying data from a DataStore to the DSVM.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will see how to:\n",
"1. Store data in a DataStore.\n",
"2. Write a `get_data` function that returns data from the DataStore."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you will need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import time\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.compute import DsvmCompute\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"experiment_name = 'automl-remote-datastore-file'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-remote-datastore-file'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Remote Linux DSVM\n",
"Note: If creation fails with a message about Marketplace purchase eligibility, go to portal.azure.com, start creating a DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating the VM.\n",
"\n",
"**Note**: By default SSH runs on port 22 and you don't need to specify it. If, for security reasons, you switch to a different port (such as 5022), append the port number to the address. [Read more](https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/detailed-troubleshoot-ssh-connection) on this."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"compute_target_name = 'mydsvmc'\n",
"\n",
"try:\n",
"    # Wait while a DSVM with this name is still being provisioned.\n",
"    while ws.compute_targets[compute_target_name].provisioning_state == 'Creating':\n",
"        time.sleep(1)\n",
"\n",
"    dsvm_compute = DsvmCompute(workspace=ws, name=compute_target_name)\n",
"    print('found existing:', dsvm_compute.name)\n",
"except Exception:\n",
"    # The DSVM does not exist yet (or could not be loaded), so create one.\n",
"    dsvm_config = DsvmCompute.provisioning_configuration(vm_size=\"Standard_D2_v2\")\n",
"    dsvm_compute = DsvmCompute.create(ws, name=compute_target_name, provisioning_configuration=dsvm_config)\n",
"    dsvm_compute.wait_for_completion(show_output=True)\n",
"    print(\"Waiting 90 seconds for ssh to be accessible\")\n",
"    time.sleep(90)  # Wait for ssh to be accessible"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"\n",
"### Copy data file to local\n",
"\n",
"Download the 20 Newsgroups training data and save it locally as TSV files.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.isdir('data'):\n",
" os.mkdir('data') "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_20newsgroups\n",
"import csv\n",
"\n",
"remove = ('headers', 'footers', 'quotes')\n",
"categories = [\n",
" 'alt.atheism',\n",
" 'talk.religion.misc',\n",
" 'comp.graphics',\n",
" 'sci.space',\n",
" ]\n",
"data_train = fetch_20newsgroups(subset = 'train', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
" \n",
"pd.DataFrame(data_train.data).to_csv(\"data/X_train.tsv\", index=False, header=False, quoting=csv.QUOTE_ALL, sep=\"\\t\")\n",
"pd.DataFrame(data_train.target).to_csv(\"data/y_train.tsv\", index=False, header=False, sep=\"\\t\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upload data to the cloud"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now make the data accessible for remote training by uploading it from your local machine to Azure. The datastore is a convenient construct associated with your workspace that lets you upload and download data and interact with it from your remote compute targets. It is backed by an Azure Blob storage account.\n",
"\n",
"The `.tsv` files in the local `data` folder are uploaded into a directory named `data` at the root of the datastore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#blob_datastore = Datastore(ws, blob_datastore_name)\n",
"ds = ws.get_default_datastore()\n",
"print(ds.datastore_type, ds.account_name, ds.container_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ds.upload_files(\"data.tsv\")\n",
"ds.upload(src_dir='./data', target_path='data', overwrite=True, show_progress=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure & Run\n",
"\n",
"First, create a `DataReferenceConfiguration` object to tell the system which data folder to download to the compute target.\n",
"The `path_on_compute` should be an absolute path to ensure that the data files are downloaded only once. The `get_data` method should use this same path to access the data files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
" path_on_datastore='data', \n",
" path_on_compute='/tmp/azureml_runs',\n",
" mode='download', # download files from datastore to compute target\n",
" overwrite=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"import pkg_resources\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to the Linux DSVM\n",
"conda_run_config.target = dsvm_compute\n",
"# Set the data reference of the run configuration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"pandas_dependency = 'pandas==' + pkg_resources.get_distribution(\"pandas\").version\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80',pandas_dependency])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Get Data File\n",
"For remote executions, author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. In this file you can put code that reads data either from Blob storage or from local disk.\n",
"\n",
"The *get_data()* function returns a [dictionary](README.md#getdata).\n",
"\n",
"The `read_csv` calls use the `path_on_compute` value specified in the `DataReferenceConfiguration` call, followed by the `path_on_datastore` folder and then the actual file name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile $project_folder/get_data.py\n",
"\n",
"import pandas as pd\n",
"\n",
"def get_data():\n",
" X_train = pd.read_csv(\"/tmp/azureml_runs/data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" y_train = pd.read_csv(\"/tmp/azureml_runs/data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
"\n",
" return { \"X\" : X_train.values, \"y\" : y_train[0].values }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use the `get_data()` approach for local executions too.\n",
"\n",
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to AutoMLConfig.</i>\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**max_concurrent_iterations**|Maximum number of iterations to execute in parallel. This should be less than the number of cores on the DSVM.|\n",
"|**preprocess**| *True/False* <br>Setting this to *True* enables AutoML to perform preprocessing <br>on the input to handle *missing data*, and perform some common *feature extraction*.|\n",
"|**enable_cache**|Setting this to *True* performs preprocessing once and reuses the same preprocessed data for all iterations. Default value is *True*.|\n",
"|**max_cores_per_iteration**|Indicates how many cores on the compute target are used to train a single pipeline.<br>Default is *1*; you can set it to *-1* to use all cores.|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\": 60,\n",
" \"iterations\": 4,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"preprocess\": True,\n",
" \"max_cores_per_iteration\": 1,\n",
" \"verbosity\": logging.INFO\n",
"}\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" path=project_folder,\n",
" run_configuration=conda_run_config,\n",
" #compute_target = dsvm_compute,\n",
" data_script = project_folder + \"/get_data.py\",\n",
" **automl_settings\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even while the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"#### Widget for monitoring runs\n",
"\n",
"The widget will report a \"loading\" status until the first iteration completes; then an auto-updating graph and table will be shown. The widget refreshes once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`.\n",
"\n",
"NOTE: The widget displays a link at the bottom. This link opens a web UI for exploring the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait until the run finishes.\n",
"remote_run.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(remote_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Canceling Runs\n",
"You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Cancel the ongoing experiment and stop scheduling new iterations\n",
"# remote_run.cancel()\n",
"\n",
"# Cancel iteration 1 and move on to iteration 2\n",
"# remote_run.cancel_iteration(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pre-process cache cleanup\n",
"The preprocessed data is cached in the default file store. Once the run has completed, you can clean up the cache by running the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run.clean_preprocessor_cache()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method returns the best run and the fitted model. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = remote_run.get_output()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model based on any other metric"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# lookup_metric = \"accuracy\"\n",
"# best_run, fitted_model = remote_run.get_output(metric=lookup_metric)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a specific iteration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# iteration = 1\n",
"# best_run, fitted_model = remote_run.get_output(iteration=iteration)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load test data.\n",
"from pandas_ml import ConfusionMatrix\n",
"\n",
"data_test = fetch_20newsgroups(subset = 'test', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
"\n",
"X_test = np.array(data_test.data).reshape((len(data_test.data),1))\n",
"y_test = data_test.target\n",
"\n",
"# Test our best pipeline.\n",
"\n",
"y_pred = fitted_model.predict(X_test)\n",
"y_pred_strings = [data_test.target_names[i] for i in y_pred]\n",
"y_test_strings = [data_test.target_names[i] for i in y_test]\n",
"\n",
"cm = ConfusionMatrix(y_test_strings, y_pred_strings)\n",
"print(cm)\n",
"cm.plot()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
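This notebook's `get_data()` reads `/tmp/azureml_runs/data/X_train.tsv`, which is `path_on_compute`, then `path_on_datastore`, then the file name. Because newsgroup posts contain tabs, newlines, and quotes, the text is written with `quoting=csv.QUOTE_ALL`. The sketch below shows both points. It uses Python's standard `csv` module as a stand-in for the pandas `to_csv`/`read_csv` calls, and it checks that a fully quoted TSV round-trips such text intact. The helper names `compute_path`, `write_tsv`, and `read_tsv` are hypothetical. Only the two path components come from the notebook.

```python
import csv
import os
import posixpath
import tempfile


def compute_path(path_on_compute, path_on_datastore, file_name):
    """Build the path that get_data() reads on the compute target.

    Hypothetical helper mirroring this notebook's layout: files uploaded under
    `path_on_datastore` are downloaded beneath `path_on_compute`.
    """
    return posixpath.join(path_on_compute, path_on_datastore, file_name)


def write_tsv(path, rows):
    # QUOTE_ALL wraps every field in quotes, so tabs and newlines inside a
    # document are not mistaken for field or record separators.
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="\t", quoting=csv.QUOTE_ALL).writerows(rows)


def read_tsv(path):
    # newline="" lets the csv module handle newlines inside quoted fields.
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter="\t", quotechar='"'))


print(compute_path("/tmp/azureml_runs", "data", "X_train.tsv"))
# /tmp/azureml_runs/data/X_train.tsv

# Text with an embedded newline, a tab, and double quotes.
docs = [["line one\nline two"], ["has a\ttab"], ['says "hi"']]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X_train.tsv")
    write_tsv(path, docs)
    assert read_tsv(path) == docs  # the quoted TSV round-trips unchanged
```

If the same text were written without quoting, an embedded newline would split one post into two records. X and y would then have mismatched lengths when `get_data()` reads them back.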


@@ -1,527 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Remote Execution using DSVM (Ubuntu)**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Attach an existing DSVM to a workspace.\n",
"3. Configure AutoML using `AutoMLConfig`.\n",
"4. Train the model using the DSVM.\n",
"5. Explore the results.\n",
"6. Test the best fitted model.\n",
"\n",
"In addition, this notebook showcases the following features:\n",
"- **Parallel** executions for iterations\n",
"- **Asynchronous** tracking of progress\n",
"- **Cancellation** of individual iterations or the entire run\n",
"- Retrieving models for any iteration or logged metric\n",
"- Specifying AutoML settings as `**kwargs`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import time\n",
"import csv\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-dsvm'\n",
"project_folder = './project'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Remote Linux DSVM\n",
"**Note:** If creation fails with a message about Marketplace purchase eligibility, start creation of a DSVM through the [Azure portal](https://portal.azure.com), and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled this setting, you can exit the portal without actually creating the DSVM, and creation of the DSVM through the notebook should work.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import DsvmCompute\n",
"\n",
"dsvm_name = 'mydsvma'\n",
"try:\n",
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
" print('Found an existing DSVM.')\n",
"except Exception:\n",
" print('Creating a new DSVM.')\n",
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2s_v3\")\n",
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
" dsvm_compute.wait_for_completion(show_output = True)\n",
" print(\"Waiting 90 seconds for ssh to be accessible\")\n",
" time.sleep(90) # Wait for ssh to be accessible"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions, you need to make the data accessible from the remote compute.\n",
"This can be done by uploading the data to DataStore.\n",
"In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_train = datasets.load_digits()\n",
"\n",
"if not os.path.isdir('data'):\n",
" os.mkdir('data')\n",
" \n",
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)\n",
" \n",
"pd.DataFrame(data_train.data).to_csv(\"data/X_train.tsv\", index=False, header=False, quoting=csv.QUOTE_ALL, sep=\"\\t\")\n",
"pd.DataFrame(data_train.target).to_csv(\"data/y_train.tsv\", index=False, header=False, sep=\"\\t\")\n",
"\n",
"ds = ws.get_default_datastore()\n",
"ds.upload(src_dir='./data', target_path='re_data', overwrite=True, show_progress=True)\n",
"\n",
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
" path_on_datastore='re_data', \n",
" path_on_compute='/tmp/azureml_runs',\n",
" mode='download', # download files from datastore to compute target\n",
" overwrite=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to the Linux DSVM\n",
"conda_run_config.target = dsvm_compute\n",
"\n",
"# Set the data reference of the run configuration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile $project_folder/get_data.py\n",
"\n",
"import pandas as pd\n",
"\n",
"def get_data():\n",
" X_train = pd.read_csv(\"/tmp/azureml_runs/re_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" y_train = pd.read_csv(\"/tmp/azureml_runs/re_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
"\n",
" return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n",
"\n",
"**Note:** When using Remote DSVM, you can't pass Numpy arrays directly to the fit method.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**max_concurrent_iterations**|Maximum number of iterations to execute in parallel. This should be less than the number of cores on the DSVM.|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\": 10,\n",
" \"iterations\": 20,\n",
" \"n_cross_validations\": 5,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"preprocess\": False,\n",
" \"max_concurrent_iterations\": 2,\n",
" \"verbosity\": logging.INFO\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" path = project_folder, \n",
" run_configuration=conda_run_config,\n",
" data_script = project_folder + \"/get_data.py\",\n",
" **automl_settings\n",
" )\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** The first run on a new DSVM may take several minutes to prepare the environment."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the AutoML configuration (`automl_config`). For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even while the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
"\n",
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output = False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"\n",
"#### Loading Executed Runs\n",
"In case you need to load a previously executed run, enable the cell below and replace the `run_id` value."
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"remote_run = AutoMLRun(experiment=experiment, run_id = 'AutoML_480d3ed6-fc94-44aa-8f4e-0b945db9d3ef')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait until the run finishes.\n",
"remote_run.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(remote_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cancelling Runs\n",
"\n",
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
"# remote_run.cancel()\n",
"\n",
"# Cancel iteration 1 and move onto iteration 2.\n",
"# remote_run.cancel_iteration(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any preprocessing. Optional arguments to `get_output` let you retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = remote_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model which has the smallest `log_loss` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from iteration 3:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 3\n",
"third_run, third_model = remote_run.get_output(iteration = iteration)\n",
"print(third_run)\n",
"print(third_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test\n",
"\n",
"#### Load Test Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Test Our Best Fitted Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Randomly select digits and test.\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
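The *Retrieve All Child Runs* cell in the notebook above collects one dict of float metrics per iteration and pivots it into a DataFrame, and *Best Model Based on Any Other Metric* picks a run by a metric. The following is a minimal offline sketch of that reshaping and selection, using hypothetical metric values in place of `remote_run.get_children()`. The assumption that `AUC_weighted` is maximized and `log_loss` is minimized is made for illustration only; the SDK's actual selection logic lives inside `get_output`.

```python
import pandas as pd

# Hypothetical per-iteration metrics, shaped like the `metricslist` dict the
# notebook builds from remote_run.get_children(): {iteration: {metric: value}}.
metricslist = {
    2: {"AUC_weighted": 0.97, "log_loss": 0.18},
    0: {"AUC_weighted": 0.95, "log_loss": 0.30},
    1: {"AUC_weighted": 0.99, "log_loss": 0.25},
}

# Outer keys become columns (one per iteration); sort them so they read 0, 1, 2.
rundata = pd.DataFrame(metricslist).sort_index(axis=1)
print(rundata)

# Pick the best iteration per metric. Assumption for illustration:
# AUC_weighted is maximized and log_loss is minimized.
best_by_auc = rundata.loc["AUC_weighted"].idxmax()
best_by_log_loss = rundata.loc["log_loss"].idxmin()
print("best by AUC_weighted:", best_by_auc)
print("best by log_loss:", best_by_log_loss)
```

Because the outer dict keys become columns, each column is one iteration and each row one metric; `rundata.T` gives one row per iteration instead. Different metrics can favor different iterations, which is why the metric is passed explicitly.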

@@ -9,6 +9,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/sample-weight/auto-ml-sample-weight.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},

@@ -9,6 +9,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/sparse-data-train-test-split/auto-ml-sparse-data-train-test-split.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},

@@ -9,6 +9,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/subsampling/auto-ml-subsampling-local.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},

@@ -26,4 +26,8 @@ You can use Azure Databricks as a compute target from [Azure Machine Learning Pi
For more on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).
**Please let us know your feedback.**
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/README.png)

@@ -11,6 +11,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/build-model-run-history-03.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -333,6 +340,13 @@
"source": [
"dbutils.notebook.exit(\"success\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/amlsdk/build-model-run-history-03.png)"
]
}
],
"metadata": {

@@ -11,6 +11,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/deploy-to-aci-04.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -277,6 +284,13 @@
"#comment to not delete the web service\n",
"myservice.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.png)"
]
}
],
"metadata": {

@@ -11,6 +11,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -203,6 +210,13 @@
"#model.delete()\n",
"aks_target.delete() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.png)"
]
}
],
"metadata": {

@@ -11,6 +11,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/ingest-data-02.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -139,6 +146,13 @@
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/amlsdk/ingest-data-02.png)"
]
}
],
"metadata": {

@@ -11,6 +11,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/installation-and-configuration-01.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -143,6 +150,13 @@
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/amlsdk/installation-and-configuration-01.png)"
]
}
],
"metadata": {

@@ -660,6 +660,13 @@
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-01.png)"
]
}
],
"metadata": {

@@ -796,6 +796,13 @@
"source": [
"myservice.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.png)"
]
}
],
"metadata": {

@@ -677,6 +677,13 @@
"# Next: ADLA as a Compute Target\n",
"To use ADLA as a compute target from Azure Machine Learning Pipeline, an AdlaStep is used. This [notebook](./aml-pipelines-use-adla-as-compute-target.ipynb) demonstrates the use of AdlaStep in Azure Machine Learning Pipeline."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.png)"
]
}
],
"metadata": {

@@ -8,7 +8,14 @@
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-hdi/automl_hdi_local_classification.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},

@@ -0,0 +1,709 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Track Data Drift between Training and Inference Data in Production \n",
"\n",
"With this notebook, you will learn how to enable the DataDrift service to automatically track and determine whether your inference data is drifting from the data your model was initially trained on. The DataDrift service provides metrics and visualizations to help stakeholders identify which specific features are causing the drift.\n",
"\n",
"Please email driftfeedback@microsoft.com with any issues. A member from the DataDrift team will respond shortly. \n",
"\n",
"The DataDrift Public Preview API can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-contrib-datadrift/?view=azure-ml-py). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/datadrift/azureml-datadrift.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prerequisites and Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install the DataDrift package\n",
"\n",
"Install the azureml-contrib-datadrift, azureml-contrib-opendatasets and lightgbm packages before running this notebook.\n",
"```\n",
"pip install azureml-contrib-datadrift\n",
"pip install azureml-contrib-opendatasets\n",
"pip install lightgbm\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import os\n",
"import time\n",
"from datetime import datetime, timedelta\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import requests\n",
"from azureml.contrib.datadrift import DataDriftDetector, AlertConfiguration\n",
"from azureml.contrib.opendatasets import NoaaIsdWeather\n",
"from azureml.core import Dataset, Workspace, Run\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.image import ContainerImage\n",
"from azureml.core.model import Model\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.widgets import RunDetails\n",
"import joblib\n",
"from sklearn.model_selection import train_test_split\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up Configuration and Load the Azure ML Workspace\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, if you haven't already, go through the [configuration notebook](../../../configuration.ipynb) first to establish your connection to the Azure ML workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Please type in your initials/alias. The prefix is prepended to the names of resources created by this notebook. \n",
"prefix = \"dd\"\n",
"\n",
"# NOTE: Please do not change the model_name, as it's required by the score.py file\n",
"model_name = \"driftmodel\"\n",
"image_name = \"{}driftimage\".format(prefix)\n",
"service_name = \"{}driftservice\".format(prefix)\n",
"\n",
"# optionally, set email address to receive an email alert for DataDrift\n",
"email_address = \"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate Train/Testing Data\n",
"\n",
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You may replace this step with your own dataset. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"usaf_list = ['725724', '722149', '723090', '722159', '723910', '720279',\n",
" '725513', '725254', '726430', '720381', '723074', '726682',\n",
" '725486', '727883', '723177', '722075', '723086', '724053',\n",
" '725070', '722073', '726060', '725224', '725260', '724520',\n",
" '720305', '724020', '726510', '725126', '722523', '703333',\n",
" '722249', '722728', '725483', '722972', '724975', '742079',\n",
" '727468', '722193', '725624', '722030', '726380', '720309',\n",
" '722071', '720326', '725415', '724504', '725665', '725424',\n",
" '725066']\n",
"\n",
"columns = ['usaf', 'wban', 'datetime', 'latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n",
"\n",
"\n",
"def enrich_weather_noaa_data(noaa_df):\n",
"    # Periods of the cyclical (sine/cosine) time encodings\n",
"    hours_in_day = 24\n",
"    week_in_year = 52\n",
"    \n",
"    noaa_df[\"hour\"] = noaa_df[\"datetime\"].dt.hour\n",
"    noaa_df[\"weekofyear\"] = noaa_df[\"datetime\"].dt.week\n",
"    \n",
"    noaa_df[\"sine_weekofyear\"] = np.sin(2*np.pi*(noaa_df[\"weekofyear\"]-1)/week_in_year)\n",
"    noaa_df[\"cosine_weekofyear\"] = np.cos(2*np.pi*(noaa_df[\"weekofyear\"]-1)/week_in_year)\n",
"\n",
"    noaa_df[\"sine_hourofday\"] = np.sin(2*np.pi*noaa_df[\"hour\"]/hours_in_day)\n",
"    noaa_df[\"cosine_hourofday\"] = np.cos(2*np.pi*noaa_df[\"hour\"]/hours_in_day)\n",
"    \n",
"    return noaa_df\n",
"\n",
"def add_window_col(input_df):\n",
" shift_interval = pd.Timedelta('-7 days') # 7-day lag interval\n",
" df_shifted = input_df.copy()\n",
" df_shifted['datetime'] = df_shifted['datetime'] - shift_interval\n",
" df_shifted.drop(list(input_df.columns.difference(['datetime', 'usaf', 'wban', 'sine_hourofday', 'temperature'])), axis=1, inplace=True)\n",
"\n",
" # merge, keeping only observations where the 7-day lag is present\n",
" df2 = pd.merge(input_df,\n",
" df_shifted,\n",
" on=['datetime', 'usaf', 'wban', 'sine_hourofday'],\n",
" how='inner', # use 'left' to keep observations without lags\n",
" suffixes=['', '-7'])\n",
" return df2\n",
"\n",
"def get_noaa_data(start_time, end_time, cols, station_list):\n",
" isd = NoaaIsdWeather(start_time, end_time, cols=cols)\n",
" # Read into Pandas data frame.\n",
" noaa_df = isd.to_pandas_dataframe()\n",
" noaa_df = noaa_df.rename(columns={\"stationName\": \"station_name\"})\n",
" \n",
" df_filtered = noaa_df[noaa_df[\"usaf\"].isin(station_list)].reset_index(drop=True)\n",
" \n",
" # Enrich with time features\n",
" df_enriched = enrich_weather_noaa_data(df_filtered)\n",
" \n",
" return df_enriched\n",
"\n",
"def get_featurized_noaa_df(start_time, end_time, cols, station_list):\n",
" df_1 = get_noaa_data(start_time - timedelta(days=7), start_time - timedelta(seconds=1), cols, station_list)\n",
" df_2 = get_noaa_data(start_time, end_time, cols, station_list)\n",
" noaa_df = pd.concat([df_1, df_2])\n",
" \n",
" print(\"Adding window feature\")\n",
" df_window = add_window_col(noaa_df)\n",
" \n",
" cat_columns = df_window.dtypes == object\n",
" cat_columns = cat_columns[cat_columns == True]\n",
" \n",
" print(\"Encoding categorical columns\")\n",
" df_encoded = pd.get_dummies(df_window, columns=cat_columns.keys().tolist())\n",
" \n",
" print(\"Dropping unnecessary columns\")\n",
" df_featurized = df_encoded.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna().drop_duplicates()\n",
" \n",
" return df_featurized"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train model on Jan 1 - 14, 2009 data\n",
"df = get_featurized_noaa_df(datetime(2009, 1, 1), datetime(2009, 1, 14, 23, 59, 59), columns, usaf_list)\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"label = \"temperature\"\n",
"x_df = df.drop(label, axis=1)\n",
"y_df = df[[label]]\n",
"x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223)\n",
"print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)\n",
"\n",
"training_dir = 'outputs/training'\n",
"training_file = \"training.csv\"\n",
"\n",
"# Generate training dataframe to register as Training Dataset\n",
"os.makedirs(training_dir, exist_ok=True)\n",
"training_df = pd.merge(x_train, y_train, left_index=True, right_index=True)\n",
"training_df.to_csv(training_dir + \"/\" + training_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create/Register Training Dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset_name = \"dataset\"\n",
"name_suffix = datetime.utcnow().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
"snapshot_name = \"snapshot-{}\".format(name_suffix)\n",
"\n",
"dstore = ws.get_default_datastore()\n",
"dstore.upload(training_dir, \"data/training\", show_progress=True)\n",
"dpath = dstore.path(\"data/training/training.csv\")\n",
"trainingDataset = Dataset.auto_read_files(dpath, include_path=True)\n",
"trainingDataset = trainingDataset.register(workspace=ws, name=dataset_name, description=\"dset\", exist_ok=True)\n",
"\n",
"trainingDataSnapshot = trainingDataset.create_snapshot(snapshot_name=snapshot_name, compute_target=None, create_data_snapshot=True)\n",
"datasets = [(Dataset.Scenario.TRAINING, trainingDataSnapshot)]\n",
"print(\"dataset registration done.\\n\")\n",
"datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train and Save Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import lightgbm as lgb\n",
"\n",
"train = lgb.Dataset(data=x_train, \n",
" label=y_train)\n",
"\n",
"test = lgb.Dataset(data=x_test, \n",
" label=y_test,\n",
" reference=train)\n",
"\n",
"params = {'learning_rate' : 0.1,\n",
" 'boosting' : 'gbdt',\n",
" 'metric' : 'rmse',\n",
" 'feature_fraction' : 1,\n",
" 'bagging_fraction' : 1,\n",
" 'max_depth': 6,\n",
" 'num_leaves' : 31,\n",
" 'objective' : 'regression',\n",
" 'bagging_freq' : 1,\n",
" \"verbose\": -1,\n",
" 'min_data_per_leaf': 100}\n",
"\n",
"model = lgb.train(params, \n",
" num_boost_round=500,\n",
" train_set=train,\n",
" valid_sets=[train, test],\n",
" verbose_eval=50,\n",
" early_stopping_rounds=25)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_file = 'outputs/{}.pkl'.format(model_name)\n",
"\n",
"os.makedirs('outputs', exist_ok=True)\n",
"joblib.dump(model, model_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Register Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = Model.register(model_path=model_file,\n",
" model_name=model_name,\n",
" workspace=ws,\n",
" datasets=datasets)\n",
"\n",
"print(model_name, image_name, service_name, model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy Model To AKS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare Environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
" pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Image creation may take up to 15 minutes.\n",
"\n",
"image_name = image_name + str(model.version)\n",
"\n",
"if image_name not in ws.images:\n",
" # Use the score.py defined in this directory as the execution script\n",
" # NOTE: The Model Data Collector must be enabled in the execution script for DataDrift to run correctly\n",
" image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
" runtime=\"python\",\n",
" conda_file=\"myenv.yml\",\n",
" description=\"Image with weather dataset model\")\n",
" image = ContainerImage.create(name=image_name,\n",
" models=[model],\n",
" image_config=image_config,\n",
" workspace=ws)\n",
"\n",
" image.wait_for_creation(show_output=True)\n",
"else:\n",
" image = ws.images[image_name]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Compute Target"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_name = 'dd-demo-e2e'\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"if aks_name not in ws.compute_targets:\n",
" aks_target = ComputeTarget.create(workspace=ws,\n",
" name=aks_name,\n",
" provisioning_configuration=prov_config)\n",
"\n",
" aks_target.wait_for_completion(show_output=True)\n",
" print(aks_target.provisioning_state)\n",
" print(aks_target.provisioning_errors)\n",
"else:\n",
" aks_target=ws.compute_targets[aks_name]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy Service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service_name = service_name\n",
"\n",
"if aks_service_name not in ws.webservices:\n",
" aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)\n",
" aks_service = Webservice.deploy_from_image(workspace=ws,\n",
" name=aks_service_name,\n",
" image=image,\n",
" deployment_config=aks_config,\n",
" deployment_target=aks_target)\n",
" aks_service.wait_for_deployment(show_output=True)\n",
" print(aks_service.state)\n",
"else:\n",
" aks_service = ws.webservices[aks_service_name]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Run DataDrift Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Send Scoring Data to Service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download Scoring Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Score Model on March 15, 2016 data\n",
"scoring_df = get_noaa_data(datetime(2016, 3, 15) - timedelta(days=7), datetime(2016, 3, 16), columns, usaf_list)\n",
"# Add the window feature column\n",
"scoring_df = add_window_col(scoring_df)\n",
"\n",
"# Drop features not used by the model\n",
"print(\"Dropping unnecessary columns\")\n",
"scoring_df = scoring_df.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna()\n",
"scoring_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One Hot Encode the scoring dataset to match the training dataset schema\n",
"columns_dict = model.datasets[\"training\"][0].get_profile().columns\n",
"extra_cols = ('Path', 'Column1')\n",
"for k in extra_cols:\n",
" columns_dict.pop(k, None)\n",
"training_columns = list(columns_dict.keys())\n",
"\n",
"categorical_columns = scoring_df.dtypes == object\n",
"categorical_columns = categorical_columns[categorical_columns == True]\n",
"\n",
"test_df = pd.get_dummies(scoring_df[categorical_columns.keys().tolist()])\n",
"encoded_df = scoring_df.join(test_df)\n",
"\n",
"# Populate missing OHE columns with 0 values to match training dataset schema\n",
"difference = list(set(training_columns) - set(encoded_df.columns.tolist()))\n",
"for col in difference:\n",
" encoded_df[col] = 0\n",
"encoded_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Serialize dataframe to list of row dictionaries\n",
"encoded_dict = encoded_df.to_dict('records')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit Scoring Data to Service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"# Retrieve the API keys. AML generates two keys.\n",
"key1, key2 = aks_service.get_keys()\n",
"\n",
"total_count = len(scoring_df)\n",
"i = 0\n",
"load = []\n",
"for row in encoded_dict:\n",
"    load.append(row)\n",
"    i = i + 1\n",
"    # Send a batch every 100 rows, and also send the final partial batch\n",
"    if i % 100 == 0 or i == total_count:\n",
"        payload = json.dumps({\"data\": load})\n",
"        \n",
"        # construct raw HTTP request and send to the service\n",
"        payload_binary = bytes(payload, encoding='utf8')\n",
"        headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
"        resp = requests.post(aks_service.scoring_uri, payload_binary, headers=headers)\n",
"        \n",
"        print(\"prediction:\", resp.content, \"Progress: {}/{}\".format(i, total_count))\n",
"\n",
"        load = []\n",
"        time.sleep(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure DataDrift"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"services = [service_name]\n",
"start = datetime.now() - timedelta(days=2)\n",
"end = datetime(year=2020, month=1, day=22, hour=15, minute=16)\n",
"feature_list = ['usaf', 'wban', 'latitude', 'longitude', 'station_name', 'p_k', 'sine_hourofday', 'cosine_hourofday', 'temperature-7']\n",
"alert_config = AlertConfiguration([email_address]) if email_address else None\n",
"\n",
"# DataDriftDetector.create raises a KeyError if a DataDrift object already exists for this model; in that case, retrieve it with get()\n",
"try:\n",
" datadrift = DataDriftDetector.create(ws, model.name, model.version, services, frequency=\"Day\", alert_config=alert_config)\n",
"except KeyError:\n",
" datadrift = DataDriftDetector.get(ws, model.name, model.version)\n",
" \n",
"print(\"Details of DataDrift Object:\\n{}\".format(datadrift))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run an Ad Hoc DataDriftDetector Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"target_date = datetime.today()\n",
"run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, datadrift._id)\n",
"dd_run = Run(experiment=exp, run_id=run)\n",
"RunDetails(dd_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Drift Analysis Results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(dd_run.get_children())\n",
"for child in children:\n",
" child.wait_for_completion()\n",
"\n",
"drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
"drift_metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show all drift figures, one per service.\n",
"# If with_details is False (the default), only the drift is shown; if True, all details are shown.\n",
"\n",
"drift_figures = datadrift.show(with_details=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Enable DataDrift Schedule"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datadrift.enable_schedule()"
]
}
],
"metadata": {
"authors": [
{
"name": "rafarmah"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
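The scoring loop in this notebook groups rows into batches of 100 before posting them. With a plain `i % 100 == 0` check, any rows left over after the last full batch are never sent. A minimal sketch of a batching helper that always flushes the remainder, assuming each row is a JSON-serializable dict and the service expects the `{"data": [...]}` shape used above; `iter_payloads` is a hypothetical name, not an Azure ML API:

```python
import json
from typing import Iterable, Iterator, List


def iter_payloads(rows: Iterable[dict], batch_size: int = 100) -> Iterator[str]:
    """Yield JSON payloads of the form {"data": [...]} with at most batch_size rows each.

    Hypothetical helper: the final, possibly smaller, batch is yielded too,
    so no rows are silently dropped.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield json.dumps({"data": batch})
            batch = []
    if batch:  # flush the remainder that did not fill a whole batch
        yield json.dumps({"data": batch})


if __name__ == "__main__":
    rows = [{"usaf": i, "p_k": "x"} for i in range(250)]
    sizes = [len(json.loads(p)["data"]) for p in iter_payloads(rows)]
    print(sizes)  # [100, 100, 50]
```

Each payload string can then be encoded with `payload.encode("utf8")` and posted to `aks_service.scoring_uri` with the same `Content-Type` and `Authorization` headers the scoring cell builds.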

@@ -0,0 +1,3 @@
## Using data drift APIs
1. [Detect data drift for a model](azure-ml-datadrift.ipynb): Detect data drift for a deployed model.

@@ -0,0 +1,58 @@
import pickle
import json
import numpy
import azureml.train.automl
from sklearn.externals import joblib
from sklearn.linear_model import Ridge
from azureml.core.model import Model
from azureml.core.run import Run
from azureml.monitoring import ModelDataCollector
import time
import pandas as pd


def init():
    global model, inputs_dc, prediction_dc, feature_names, categorical_features
    print("Model is initialized" + time.strftime("%H:%M:%S"))
    model_path = Model.get_model_path(model_name="driftmodel")
    model = joblib.load(model_path)

    feature_names = ["usaf", "wban", "latitude", "longitude", "station_name", "p_k",
                     "sine_weekofyear", "cosine_weekofyear", "sine_hourofday", "cosine_hourofday",
                     "temperature-7"]

    categorical_features = ["usaf", "wban", "p_k", "station_name"]

    inputs_dc = ModelDataCollector(model_name="driftmodel",
                                   identifier="inputs",
                                   feature_names=feature_names)

    prediction_dc = ModelDataCollector("driftmodel",
                                       identifier="predictions",
                                       feature_names=["temperature"])


def run(raw_data):
    global inputs_dc, prediction_dc
    try:
        data = json.loads(raw_data)["data"]
        data = pd.DataFrame(data)

        # Remove the categorical features as the model expects OHE values
        input_data = data.drop(categorical_features, axis=1)

        result = model.predict(input_data)

        # Collect the non-OHE dataframe
        collected_df = data[feature_names]

        inputs_dc.collect(collected_df.values)
        prediction_dc.collect(result)
        return result.tolist()
    except Exception as e:
        error = str(e)
        print(error + time.strftime("%H:%M:%S"))
        return error

@@ -9,6 +9,20 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/deploy-to-cloud/model-register-and-deploy.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deploy-to-cloud/model-register-and-deploy.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -26,7 +40,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"Make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
]
},
{

@@ -9,6 +9,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deploy-to-local/register-model-deploy-local-advanced.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -28,7 +35,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"Make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
]
},
{

@@ -9,6 +9,13 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deploy-to-local/register-model-deploy-local.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -28,7 +35,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"Make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
]
},
{

@@ -0,0 +1,102 @@
# Notebooks for Microsoft Azure Machine Learning Hardware Accelerated Models SDK
Easily create and train a model using various deep neural networks (DNNs) as a featurizer for deployment to Azure or a Data Box Edge device for ultra-low latency inferencing using FPGAs. These models are currently available:
* ResNet 50
* ResNet 152
* DenseNet-121
* VGG-16
* SSD-VGG

To learn more about the azureml-accel-models classes, see the section [Model Classes](#model-classes) below or the [Azure ML Accel Models SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel?view=azure-ml-py).
### Step 1: Create an Azure ML workspace
Follow [these instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python) to install the Azure ML SDK on your local machine, create an Azure ML workspace, and set up your notebook environment, which is required for the next step.
### Step 2: Check your FPGA quota
Use the Azure CLI to check whether you have quota.
```shell
az vm list-usage --location "eastus" -o table
```
The other supported locations for the FPGA-enabled VMs (Standard_PB6s) are ``southeastasia``, ``westeurope``, and ``westus2``.
Under the "Name" column, look for "Standard PBS Family vCPUs" and ensure you have at least 6 vCPUs under "CurrentValue."
If you do not have quota, then submit a request form [here](https://aka.ms/accelerateAI).
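If you prefer to check the quota from a script rather than by reading the table, the same command can emit JSON. A minimal sketch, assuming each entry in the `az vm list-usage -o json` output carries `localName` (or `name.localizedValue`), `currentValue`, and `limit` fields; those field names and the helper names `fetch_usage_json` and `find_usage` are assumptions, not part of the Azure ML SDK. It only reports both numbers so you can compare them against the requirement above:

```python
import json
import subprocess
from typing import Optional, Tuple

FPGA_QUOTA_NAME = "Standard PBS Family vCPUs"


def fetch_usage_json(location: str = "eastus") -> str:
    """Run `az vm list-usage` for one region and return its JSON output (requires the Azure CLI)."""
    result = subprocess.run(
        ["az", "vm", "list-usage", "--location", location, "-o", "json"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def find_usage(usage_json: str, name: str = FPGA_QUOTA_NAME) -> Optional[Tuple[int, int]]:
    """Return (currentValue, limit) for the usage entry with the given display name, or None.

    Assumed entry shape: {"localName": ..., "currentValue": ..., "limit": ...};
    int() also accepts values that the CLI serializes as strings.
    """
    for entry in json.loads(usage_json):
        display_name = entry.get("localName") or entry.get("name", {}).get("localizedValue")
        if display_name == name:
            return int(entry["currentValue"]), int(entry["limit"])
    return None
```

For example, `find_usage(fetch_usage_json("westus2"))` returns a `(CurrentValue, Limit)` pair, or `None` when the region lists no PBS family entry.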
### Step 3: Install the Azure ML Accelerated Models SDK
Once you have set up your environment, install the Azure ML Accel Models SDK. This package requires tensorflow >= 1.6,<2.0 to be installed.
If you already have tensorflow >= 1.6,<2.0 installed in your development environment, you can install the SDK package using:
```
pip install azureml-accel-models
```
If you do not have tensorflow >= 1.6,<2.0 and are using a CPU-only development environment, our SDK with tensorflow can be installed using:
```
pip install azureml-accel-models[cpu]
```
If your machine supports GPU (for example, on an [Azure DSVM](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview)), then you can leverage the tensorflow-gpu functionality using:
```
pip install azureml-accel-models[gpu]
```
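Before choosing one of the install commands above, it can help to confirm that the TensorFlow already in your environment falls in the required range (>= 1.6, < 2.0). A minimal sketch using only the standard library; `parse_version` and `tf_version_supported` are hypothetical helpers, and the parser deliberately handles only simple `major.minor.patch` strings (the `packaging.version` module is a more complete alternative):

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.13.1' into (1, 13, 1); suffixes such as 'rc1' are ignored ('1.14.0rc1' -> (1, 14, 0))."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def tf_version_supported(version: str) -> bool:
    """True if the version satisfies tensorflow >= 1.6, < 2.0."""
    return (1, 6) <= parse_version(version) < (2, 0)
```

In a notebook you would call it as `tf_version_supported(tf.__version__)` after `import tensorflow as tf`; a `False` result suggests installing `azureml-accel-models[cpu]` or `[gpu]` instead.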
### Step 4: Follow our notebooks
The notebooks in this repo walk through the following scenarios:
* [Quickstart](accelerated-models-quickstart.ipynb), deploy and inference a ResNet50 model trained on ImageNet
* [Object Detection](accelerated-models-object-detection.ipynb), deploy and inference an SSD-VGG model that can do object detection
* [Training models](accelerated-models-training.ipynb), train one of our accelerated models on the Kaggle Cats and Dogs dataset to see how to improve accuracy on custom datasets
<a name="model-classes"></a>
## Model Classes
As stated above, we support 5 Accelerated Models. Here's more information on their input and output tensors.
**Available models and output tensors**
The available models and the corresponding default classifier output tensors are below. This is the value that you would use during inferencing if you used the default classifier.
* Resnet50, QuantizedResnet50
``
output_tensors = "classifier_1/resnet_v1_50/predictions/Softmax:0"
``
* Resnet152, QuantizedResnet152
``
output_tensors = "classifier/resnet_v1_152/predictions/Softmax:0"
``
* Densenet121, QuantizedDensenet121
``
output_tensors = "classifier/densenet121/predictions/Softmax:0"
``
* Vgg16, QuantizedVgg16
``
output_tensors = "classifier/vgg_16/fc8/squeezed:0"
``
* SsdVgg, QuantizedSsdVgg
``
output_tensors = ['ssd_300_vgg/block4_box/Reshape_1:0', 'ssd_300_vgg/block7_box/Reshape_1:0', 'ssd_300_vgg/block8_box/Reshape_1:0', 'ssd_300_vgg/block9_box/Reshape_1:0', 'ssd_300_vgg/block10_box/Reshape_1:0', 'ssd_300_vgg/block11_box/Reshape_1:0', 'ssd_300_vgg/block4_box/Reshape:0', 'ssd_300_vgg/block7_box/Reshape:0', 'ssd_300_vgg/block8_box/Reshape:0', 'ssd_300_vgg/block9_box/Reshape:0', 'ssd_300_vgg/block10_box/Reshape:0', 'ssd_300_vgg/block11_box/Reshape:0']
``

For more information, see the azureml.accel.models package in the [Azure ML Python SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.models?view=azure-ml-py).
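The default output tensor names listed above can be kept in one lookup table so they are not retyped by hand. A minimal sketch; `DEFAULT_OUTPUT_TENSORS` and `output_tensors_arg` are hypothetical names, not part of azureml-accel-models, and the values are copied from the list above. The comma-joined form follows the object detection notebook, which notes that `AccelOnnxConverter` accepts multiple output tensors as one comma-delimited string rather than a list:

```python
from typing import Dict, List

# SSD-VGG outputs in the order listed above: all Reshape_1 tensors (class confidences)
# first, then all Reshape tensors (box offsets), one of each per feature block.
_SSD_BLOCKS = (4, 7, 8, 9, 10, 11)
SSD_OUTPUTS = (
    ["ssd_300_vgg/block{}_box/Reshape_1:0".format(b) for b in _SSD_BLOCKS]
    + ["ssd_300_vgg/block{}_box/Reshape:0".format(b) for b in _SSD_BLOCKS]
)

_OUTPUTS_BY_MODEL_PAIR = {
    ("Resnet50", "QuantizedResnet50"): ["classifier_1/resnet_v1_50/predictions/Softmax:0"],
    ("Resnet152", "QuantizedResnet152"): ["classifier/resnet_v1_152/predictions/Softmax:0"],
    ("Densenet121", "QuantizedDensenet121"): ["classifier/densenet121/predictions/Softmax:0"],
    ("Vgg16", "QuantizedVgg16"): ["classifier/vgg_16/fc8/squeezed:0"],
    ("SsdVgg", "QuantizedSsdVgg"): SSD_OUTPUTS,
}

# Hypothetical lookup table: model class name -> default output tensor names.
DEFAULT_OUTPUT_TENSORS: Dict[str, List[str]] = {
    name: outputs
    for names, outputs in _OUTPUTS_BY_MODEL_PAIR.items()
    for name in names
}


def output_tensors_arg(model_name: str) -> str:
    """Join a model's default output tensor names into one comma-delimited string."""
    return ",".join(DEFAULT_OUTPUT_TENSORS[model_name])
```

Keep the list form for inferencing requests and pass the joined string only where a single comma-delimited value is expected.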
**Input tensors**
The input_tensors value defaults to "Placeholder:0" and is created in the [Image Preprocessing](#construct-model) step in the line:
``
in_images = tf.placeholder(tf.string)
``
You can change the input_tensors name by doing this:
``
in_images = tf.placeholder(tf.string, name="images")
``
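TensorFlow names a tensor `<op name>:<output index>`, which is why the unnamed placeholder above is referred to as "Placeholder:0" and one created with `name="images"` as "images:0" (TensorFlow may append a suffix such as "_1" if the op name is already taken in the graph). A minimal sketch of a hypothetical helper, `split_tensor_name`, for sanity-checking the strings you pass as input_tensors and output_tensors:

```python
from typing import Tuple


def split_tensor_name(tensor_name: str) -> Tuple[str, int]:
    """Split 'images:0' into ('images', 0); tensor names have the form '<op name>:<index>'."""
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not op_name or not index.isdigit():
        raise ValueError("expected '<op name>:<index>', got {!r}".format(tensor_name))
    return op_name, int(index)
```

Splitting on the last colon keeps op names that contain slashes, such as the SSD-VGG outputs, intact.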
## Resources
* [Read more about FPGAs](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-accelerate-with-fpgas)

@@ -0,0 +1,494 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure ML Hardware Accelerated Object Detection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial will show you how to deploy an object detection service based on the SSD-VGG model in just a few minutes using the Azure Machine Learning Accelerated AI service.\n",
"\n",
"We will use the SSD-VGG model accelerated on an FPGA. Our Accelerated Models Service handles translating deep neural networks (DNN) into an FPGA program.\n",
"\n",
"The steps in this notebook are: \n",
"1. [Set up Environment](#set-up-environment)\n",
"* [Construct Model](#construct-model)\n",
"    * Image Preprocessing\n",
"    * Featurizer\n",
"    * Save Model\n",
"    * Save input and output tensor names\n",
"* [Create Image](#create-image)\n",
"* [Deploy Image](#deploy-image)\n",
"* [Test the Service](#test-service)\n",
"    * Create Client\n",
"    * Serve the model\n",
"* [Cleanup](#cleanup)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"set-up-environment\"></a>\n",
"## 1. Set up Environment\n",
"### 1.a. Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.b. Retrieve Workspace\n",
"If you haven't created a Workspace, please follow [this notebook](../../../configuration.ipynb) to do so. If you have, run the code block below to retrieve it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"construct-model\"></a>\n",
"## 2. Construct model\n",
"### 2.a. Image preprocessing\n",
"We'd like our service to accept JPEG images as input. However, the input to SSD-VGG is a float tensor of shape \\[1, 300, 300, 3\\]. The first dimension is batch, then height, width, and channels (i.e. NHWC). To bridge this gap, we need code that decodes JPEG images and resizes them appropriately for input to SSD-VGG. The Accelerated AI service can execute TensorFlow graphs as part of the service and we'll use that ability to do the image preprocessing. This code defines a TensorFlow graph that preprocesses an array of JPEG images (as TensorFlow strings) and produces a tensor that is ready to be featurized by SSD-VGG.\n",
"\n",
"**Note:** Expect to see TF deprecation warnings until we port our SDK over to use TensorFlow 2.0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Input images as a two-dimensional tensor containing an arbitrary number of images represented as strings\n",
"import azureml.accel.models.utils as utils\n",
"tf.reset_default_graph()\n",
"\n",
"in_images = tf.placeholder(tf.string)\n",
"image_tensors = utils.preprocess_array(in_images, output_width=300, output_height=300, preserve_aspect_ratio=False)\n",
"print(image_tensors.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.b. Featurizer\n",
"The SSD-VGG model differs from our other models in that it generates 12 output tensors. These correspond to the anchor-box offsets and the detection confidences (for 21 classes). Because these outputs are not convenient to work with, we will later use a pre-defined post-processing utility to transform them into a simplified list of bounding boxes with their respective class and confidence.\n",
"\n",
"To understand the output tensors, consider this example: the output tensor 'ssd_300_vgg/block4_box/Reshape_1:0' has a shape of [None, 37, 37, 4, 21]. This gives the pre-softmax confidence for 4 anchor boxes situated at each site of a 37 x 37 grid imposed on the image, one confidence score for each of the 21 classes. The first dimension is the batch dimension. Likewise, 'ssd_300_vgg/block4_box/Reshape:0' has shape [None, 37, 37, 4, 4] and encodes the (cx, cy) center shift and rescaling (sw, sh) relative to each anchor box. Refer to the [SSD-VGG paper](https://arxiv.org/abs/1512.02325) to understand how these are computed. The other 10 tensors are defined similarly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.accel.models import SsdVgg\n",
"\n",
"saved_model_dir = os.path.join(os.path.expanduser('~'), 'models')\n",
"model_graph = SsdVgg(saved_model_dir, is_frozen = True)\n",
"\n",
"print('SSD-VGG Input Tensors:')\n",
"for idx, input_name in enumerate(model_graph.input_tensor_list):\n",
" print('{}, {}'.format(input_name, model_graph.get_input_dims(idx)))\n",
" \n",
"print('SSD-VGG Output Tensors:')\n",
"for idx, output_name in enumerate(model_graph.output_tensor_list):\n",
" print('{}, {}'.format(output_name, model_graph.get_output_dims(idx)))\n",
"\n",
"ssd_outputs = model_graph.import_graph_def(image_tensors, is_training=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.c. Save Model\n",
"Now that we have loaded both parts of the TensorFlow graph (preprocessor and SSD-VGG featurizer), we can save the graph and associated variables to a directory which we can register as an Azure ML Model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_name = \"ssdvgg\"\n",
"model_save_path = os.path.join(saved_model_dir, model_name, \"saved_model\")\n",
"print(\"Saving model in {}\".format(model_save_path))\n",
"\n",
"output_map = {}\n",
"for i, output in enumerate(ssd_outputs):\n",
" output_map['out_{}'.format(i)] = output\n",
"\n",
"with tf.Session() as sess:\n",
" model_graph.restore_weights(sess)\n",
" tf.saved_model.simple_save(sess, \n",
" model_save_path, \n",
" inputs={'images': in_images}, \n",
" outputs=output_map)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.d. Important! Save names of input and output tensors\n",
"\n",
"The input and output tensors created during the preprocessing and featurizer steps are also used when **converting the model** to an Accelerated Model that can run on FPGAs and when **making an inferencing request**. It is very important to save this information!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"input_tensors = in_images.name\n",
"# We will use the list of output tensors during inferencing\n",
"output_tensors = [output.name for output in ssd_outputs]\n",
"# However, for multiple output tensors, our AccelOnnxConverter\n",
"# accepts a comma-delimited string (passing a list causes an error)\n",
"output_tensors_str = \",\".join(output_tensors)\n",
"\n",
"print(input_tensors)\n",
"print(output_tensors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"create-image\"></a>\n",
"## 3. Create AccelContainerImage\n",
"Below we will execute all the same steps as in the [Quickstart](./accelerated-models-quickstart.ipynb#create-image) to package the model we have saved locally into an accelerated Docker image saved in our workspace. Completing all the steps may take a few minutes. For more details on each step, check out the [Quickstart section on model registration](./accelerated-models-quickstart.ipynb#register-model)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core.model import Model\n",
"from azureml.core.image import Image\n",
"from azureml.accel import AccelOnnxConverter\n",
"from azureml.accel import AccelContainerImage\n",
"\n",
"# Retrieve workspace\n",
"ws = Workspace.from_config()\n",
"print(\"Successfully retrieved workspace:\", ws.name, ws.resource_group, ws.location, ws.subscription_id, '\\n')\n",
"\n",
"# Register model\n",
"registered_model = Model.register(workspace = ws,\n",
" model_path = model_save_path,\n",
" model_name = model_name)\n",
"print(\"Successfully registered: \", registered_model.name, registered_model.description, registered_model.version, '\\n', sep = '\\t')\n",
"\n",
"# Convert model\n",
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors_str)\n",
"# If it fails, you can run wait_for_completion again with show_output=True.\n",
"convert_request.wait_for_completion(show_output=False)\n",
"converted_model = convert_request.result\n",
"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
" converted_model.id, converted_model.created_time, '\\n')\n",
"\n",
"# Package into AccelContainerImage\n",
"image_config = AccelContainerImage.image_configuration()\n",
"# Image name must be lowercase\n",
"image_name = \"{}-image\".format(model_name)\n",
"image = Image.create(name = image_name,\n",
" models = [converted_model],\n",
" image_config = image_config, \n",
" workspace = ws)\n",
"image.wait_for_creation()\n",
"print(\"Created AccelContainerImage: {} {} {}\\n\".format(image.name, image.creation_state, image.image_location))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"deploy-image\"></a>\n",
"## 4. Deploy image\n",
"Once you have an Azure ML Accelerated Image in your Workspace, you can deploy it to two destinations: a Data Box Edge machine or an AKS cluster.\n",
"\n",
"### 4.a. Deploy to Data Box Edge Machine using IoT Hub\n",
"See the sample [here](https://github.com/Azure-Samples/aml-real-time-ai/) for using the Azure IoT CLI extension to deploy your Docker image to your Data Box Edge Machine.\n",
"\n",
"### 4.b. Deploy to AKS Cluster\n",
"Same as in the [Quickstart section on image deployment](./accelerated-models-quickstart.ipynb#deploy-image), we are going to create an AKS cluster with FPGA-enabled machines, then deploy our service to it.\n",
"#### Create AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Uses the specific FPGA enabled VM (sku: Standard_PB6s)\n",
"# Standard_PB6s are available in: eastus, westus2, westeurope, southeastasia\n",
"prov_config = AksCompute.provisioning_configuration(vm_size = \"Standard_PB6s\",\n",
" agent_count = 1, \n",
" location = \"eastus\")\n",
"\n",
"aks_name = 'aks-pb6-obj'\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Provisioning an AKS cluster might take a while (about 15 minutes), and we need to wait until it is successfully provisioned before deploying a service to it. If you interrupt this cell, provisioning of the cluster will continue. You can re-run it or check the status in your Workspace under Compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy AccelContainerImage to AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice, AksWebservice\n",
"\n",
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
"# Authentication is enabled by default, but for testing we specify False\n",
"aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
" num_replicas=1,\n",
" auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-service\"></a>\n",
"## 5. Test the service\n",
"<a id=\"create-client\"></a>\n",
"### 5.a. Create Client\n",
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the docker image to get predictions. \n",
"\n",
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
"\n",
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using the grpc client in AzureML Accelerated Models SDK\n",
"from azureml.accel.client import PredictionClient\n",
"\n",
"address = aks_service.scoring_uri\n",
"ssl_enabled = address.startswith(\"https\")\n",
"address = address[address.find('/')+2:].strip('/')\n",
"port = 443 if ssl_enabled else 80\n",
"\n",
"# Initialize AzureML Accelerated Models client\n",
"client = PredictionClient(address=address,\n",
" port=port,\n",
" use_ssl=ssl_enabled,\n",
" service_name=aks_service.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can adapt the client [code](https://github.com/Azure/aml-real-time-ai/blob/master/pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](https://github.com/Azure/aml-real-time-ai/blob/master/sample-clients/csharp).\n",
"\n",
"The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"serve-model\"></a>\n",
"### 5.b. Serve the model\n",
"The SSD-VGG model returns the confidence and bounding boxes for all possible anchor boxes. As mentioned earlier, we will use a post-processing routine to transform this into a list of bounding boxes (y1, x1, y2, x2) where x, y are fractional coordinates measured from left and top respectively. A corresponding list of classes and scores is also returned to tag each bounding box. Below we use this information to draw the bounding boxes on top of the original image. Note that in the post-processing routine we select a confidence threshold of 0.5."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cv2\n",
"from matplotlib import pyplot as plt\n",
"\n",
"colors_tableau = [(255, 255, 255), (31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),\n",
" (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),\n",
" (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),\n",
" (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),\n",
" (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]\n",
"\n",
"\n",
"def draw_boxes_on_img(img, classes, scores, bboxes, thickness=2):\n",
"    shape = img.shape\n",
"    for i in range(bboxes.shape[0]):\n",
"        bbox = bboxes[i]\n",
"        color = colors_tableau[classes[i]]\n",
"        # Draw bounding box...\n",
"        p1 = (int(bbox[0] * shape[0]), int(bbox[1] * shape[1]))\n",
"        p2 = (int(bbox[2] * shape[0]), int(bbox[3] * shape[1]))\n",
"        cv2.rectangle(img, p1[::-1], p2[::-1], color, thickness)\n",
"        # Draw text...\n",
"        s = '%s/%.3f' % (classes[i], scores[i])\n",
"        p1 = (p1[0]-5, p1[1])\n",
"        cv2.putText(img, s, p1[::-1], cv2.FONT_HERSHEY_DUPLEX, 0.4, color, 1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.accel._external.ssdvgg_utils as ssdvgg_utils\n",
"\n",
"result = client.score_file(path=\"meeting.jpg\", input_name=input_tensors, outputs=output_tensors)\n",
"classes, scores, bboxes = ssdvgg_utils.postprocess(result, select_threshold=0.5)\n",
"\n",
"img = cv2.imread('meeting.jpg', 1)\n",
"img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n",
"draw_boxes_on_img(img, classes, scores, bboxes)\n",
"plt.imshow(img)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"cleanup\"></a>\n",
"## 6. Cleanup\n",
"It's important to clean up your resources so that you won't incur unnecessary costs. In the [next notebook](./accelerated-models-training.ipynb) you will learn how to train a classifier on a new dataset using transfer learning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.delete()\n",
"aks_target.delete()\n",
"image.delete()\n",
"registered_model.delete()\n",
"converted_model.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "coverste"
},
{
"name": "paledger"
},
{
"name": "sukha"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
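In the object detection notebook, `draw_boxes_on_img` scales each fractional (y1, x1, y2, x2) box by the image height and width, then reverses each pair because OpenCV drawing functions take points as (x, y). The same conversion, pulled out as a standalone function so it can be checked without OpenCV or a deployed service; `to_pixel_corners` is a hypothetical name:

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]


def to_pixel_corners(bbox: Sequence[float], height: int, width: int) -> Tuple[Point, Point]:
    """Convert a fractional (y1, x1, y2, x2) box into ((x1, y1), (x2, y2)) pixel corners.

    y values are scaled by the image height and x values by the width; the
    result is ordered (x, y) because OpenCV expects points that way.
    """
    y1, x1, y2, x2 = bbox
    top_left = (int(x1 * width), int(y1 * height))
    bottom_right = (int(x2 * width), int(y2 * height))
    return top_left, bottom_right
```

With an image loaded by `cv2.imread`, `height, width = img.shape[:2]` supplies the scale, and the returned corners can be passed straight to `cv2.rectangle(img, top_left, bottom_right, color, 2)`.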

@@ -0,0 +1,548 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure ML Hardware Accelerated Models Quickstart"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial will show you how to deploy an image recognition service based on the ResNet 50 classifier using the Azure Machine Learning Accelerated Models service. Get more information about our service from our [documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-accelerate-with-fpgas), [API reference](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel?view=azure-ml-py), or [forum](https://aka.ms/aml-forum).\n",
"\n",
"We will use an accelerated ResNet50 featurizer running on an FPGA. Our Accelerated Models Service handles translating deep neural networks (DNN) into an FPGA program.\n",
"\n",
"For more information about using other models besides Resnet50, see the [README](./README.md).\n",
"\n",
"The steps covered in this notebook are: \n",
"1. [Set up environment](#set-up-environment)\n",
"* [Construct model](#construct-model)\n",
"    * Image Preprocessing\n",
"    * Featurizer (Resnet50)\n",
"    * Classifier\n",
"    * Save Model\n",
"* [Register Model](#register-model)\n",
"* [Convert into Accelerated Model](#convert-model)\n",
"* [Create Image](#create-image)\n",
"* [Deploy](#deploy-image)\n",
"* [Test service](#test-service)\n",
"* [Clean-up](#clean-up)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"set-up-environment\"></a>\n",
"## 1. Set up environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve Workspace\n",
"If you haven't created a Workspace, please follow [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) to do so. If you have, run the code block below to retrieve it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"construct-model\"></a>\n",
"## 2. Construct model\n",
"\n",
"There are three parts to the model we are deploying: pre-processing, a ResNet50 featurizer, and a classifier trained on the ImageNet dataset. We then save this complete TensorFlow model graph locally before registering it to your Azure ML Workspace.\n",
"\n",
"### 2.a. Image preprocessing\n",
"We'd like our service to accept JPEG images as input. However, the input to ResNet50 is a tensor, so we need code that decodes JPEG images and does the preprocessing required by ResNet50. The Accelerated AI service can execute TensorFlow graphs as part of the service and we'll use that ability to do the image preprocessing. This code defines a TensorFlow graph that preprocesses an array of JPEG images (as strings) and produces a tensor that is ready to be featurized by ResNet50.\n",
"\n",
"**Note:** Expect to see TF deprecation warnings until we port our SDK over to use TensorFlow 2.0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.accel.models.utils as utils\n",
"tf.reset_default_graph()\n",
"\n",
"# Input images: an arbitrary number of JPEG images, each represented as a string\n",
"in_images = tf.placeholder(tf.string)\n",
"image_tensors = utils.preprocess_array(in_images)\n",
"print(image_tensors.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.b. Featurizer\n",
"We use ResNet50 as a featurizer. In this step we initialize the model. This downloads a TensorFlow checkpoint of the quantized ResNet50."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.accel.models import QuantizedResnet50\n",
"save_path = os.path.expanduser('~/models')\n",
"model_graph = QuantizedResnet50(save_path, is_frozen = True)\n",
"feature_tensor = model_graph.import_graph_def(image_tensors)\n",
"print(model_graph.version)\n",
"print(feature_tensor.name)\n",
"print(feature_tensor.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.c. Classifier\n",
"The model we downloaded includes a classifier that takes the output of ResNet50 and classifies the image. This classifier is trained on the ImageNet dataset, and we are going to use it for our service. The next [notebook](./accelerated-models-training.ipynb) shows how to train a classifier for a different data set. The input to the classifier is a tensor matching the output of our ResNet50 featurizer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"classifier_output = model_graph.get_default_classifier(feature_tensor)\n",
"print(classifier_output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.d. Save Model\n",
"Now that we have loaded all three parts of the TensorFlow graph (preprocessor, ResNet50 featurizer, and classifier), we can save the graph and its associated variables to a directory that we can register as an Azure ML Model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# model_name must be lowercase\n",
"model_name = \"resnet50\"\n",
"model_save_path = os.path.join(save_path, model_name)\n",
"print(\"Saving model in {}\".format(model_save_path))\n",
"\n",
"with tf.Session() as sess:\n",
" model_graph.restore_weights(sess)\n",
" tf.saved_model.simple_save(sess, model_save_path,\n",
" inputs={'images': in_images},\n",
" outputs={'output_alias': classifier_output})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.e. Important! Save names of input and output tensors\n",
"\n",
"The input and output tensors created during the preprocessing and classifier steps are also used when **converting the model** to an Accelerated Model that can run on FPGAs and when **making an inferencing request**. It is very important to save this information! You can see our defaults for all the models in the [README](./README.md).\n",
"\n",
"By default for ResNet50, these are the values you should see when running the cell below:\n",
"* input_tensors = \"Placeholder:0\"\n",
"* output_tensors = \"classifier/resnet_v1_50/predictions/Softmax:0\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"input_tensors = in_images.name\n",
"output_tensors = classifier_output.name\n",
"\n",
"print(input_tensors)\n",
"print(output_tensors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"register-model\"></a>\n",
"## 3. Register Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can add tags and descriptions to your models. Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"registered_model = Model.register(workspace = ws,\n",
" model_path = model_save_path,\n",
" model_name = model_name)\n",
"\n",
"print(\"Successfully registered: \", registered_model.name, registered_model.description, registered_model.version, sep = '\\t')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"convert-model\"></a>\n",
"## 4. Convert Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For conversion you need to provide the names of the input and output tensors. You saved these as `input_tensors` and `output_tensors` in step 2.e. above.\n",
"\n",
"**Note**: Conversion may take a while. For FPGA models it takes about 1-3 minutes on average, depending on the model type."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"from azureml.accel import AccelOnnxConverter\n",
"\n",
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors)\n",
"# If it fails, you can run wait_for_completion again with show_output=True.\n",
"convert_request.wait_for_completion(show_output = False)\n",
"# If the above call succeeded, get the converted model\n",
"converted_model = convert_request.result\n",
"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
" converted_model.id, converted_model.created_time, '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"create-image\"></a>\n",
"## 5. Package the model into an Image"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can add tags and descriptions to the image. Also, for FPGA models an image can contain only a **single** model.\n",
"\n",
"**Note**: The following command can take a few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import Image\n",
"from azureml.accel import AccelContainerImage\n",
"\n",
"image_config = AccelContainerImage.image_configuration()\n",
"# Image name must be lowercase\n",
"image_name = \"{}-image\".format(model_name)\n",
"\n",
"image = Image.create(name = image_name,\n",
" models = [converted_model],\n",
" image_config = image_config, \n",
" workspace = ws)\n",
"image.wait_for_creation(show_output = False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"deploy-image\"></a>\n",
"## 6. Deploy\n",
"Once you have an Azure ML Accelerated Image in your Workspace, you can deploy it to one of two destinations: a Databox Edge machine or an AKS cluster.\n",
"\n",
"### 6.a. Databox Edge Machine using IoT Hub\n",
"See the sample [here](https://github.com/Azure-Samples/aml-real-time-ai/) for using the Azure IoT CLI extension for deploying your Docker image to your Databox Edge Machine.\n",
"\n",
"### 6.b. Azure Kubernetes Service (AKS) using Azure ML Service\n",
"We are going to create an AKS cluster with FPGA-enabled machines, then deploy our service to it. For more information, see [AKS official docs](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#aks).\n",
"\n",
"#### Create AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Uses the specific FPGA enabled VM (sku: Standard_PB6s)\n",
"# Standard_PB6s are available in: eastus, westus2, westeurope, southeastasia\n",
"prov_config = AksCompute.provisioning_configuration(vm_size = \"Standard_PB6s\",\n",
" agent_count = 1, \n",
" location = \"eastus\")\n",
"\n",
"aks_name = 'my-aks-pb6'\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Provisioning an AKS cluster might take a while (about 15 minutes), and we need to wait until it is successfully provisioned before we can deploy a service to it. If you interrupt this cell, provisioning of the cluster will continue. You can also check the status in your Workspace under Compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy AccelContainerImage to AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice, AksWebservice\n",
"\n",
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
"# Authentication is enabled by default, but for testing we specify False\n",
"aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
" num_replicas=1,\n",
" auth_enabled = False)\n",
"\n",
"aks_service_name = 'my-aks-service'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-service\"></a>\n",
"## 7. Test the service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7.a. Create Client\n",
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the Docker image to get predictions.\n",
"\n",
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice, see the documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys, and pass either key to PredictionClient(..., access_token=key).\n",
"\n",
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Run the client on a different machine to consume the service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using the grpc client in AzureML Accelerated Models SDK\n",
"from azureml.accel.client import PredictionClient\n",
"\n",
"address = aks_service.scoring_uri\n",
"ssl_enabled = address.startswith(\"https\")\n",
"address = address[address.find('/')+2:].strip('/')\n",
"port = 443 if ssl_enabled else 80\n",
"\n",
"# Initialize AzureML Accelerated Models client\n",
"client = PredictionClient(address=address,\n",
" port=port,\n",
" use_ssl=ssl_enabled,\n",
" service_name=aks_service.name)"
]
},
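{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above derives the host, port, and SSL flag from `aks_service.scoring_uri` by slicing the string. As a hedged alternative, the standard library's `urllib.parse.urlparse` can split the URI for you. This is a minimal sketch assuming the scoring URI has the form `scheme://host[:port][/path]`; the helper name `split_scoring_uri` and the example URI are hypothetical.\n",
"\n",
"```python\n",
"from urllib.parse import urlparse\n",
"\n",
"def split_scoring_uri(uri):\n",
"    # Hypothetical helper: return (host, port, use_ssl) for a scoring URI\n",
"    parsed = urlparse(uri)\n",
"    use_ssl = parsed.scheme == 'https'\n",
"    # Use an explicit port if the URI has one, else the scheme's default\n",
"    port = parsed.port or (443 if use_ssl else 80)\n",
"    return parsed.hostname, port, use_ssl\n",
"\n",
"print(split_scoring_uri('https://example.com/'))  # ('example.com', 443, True)\n",
"```\n",
"\n",
"Unlike the slicing approach, this drops any path component and honors an explicit port in the URI, so check that this matches what `PredictionClient` expects for your service before relying on it."
]
},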
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can adapt the client [code](https://github.com/Azure/aml-real-time-ai/blob/master/pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](https://github.com/Azure/aml-real-time-ai/blob/master/sample-clients/csharp).\n",
"\n",
"The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7.b. Serve the model\n",
"To interpret the results, we need a mapping to the human-readable ImageNet classes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"classes_entries = requests.get(\"https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt\").text.splitlines()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Score image with input and output tensor names\n",
"results = client.score_file(path=\"./snowleopardgaze.jpg\", \n",
" input_name=input_tensors, \n",
" outputs=output_tensors)\n",
"\n",
"# map results [class_id] => [confidence]\n",
"results = enumerate(results)\n",
"# sort results by confidence\n",
"sorted_results = sorted(results, key=lambda x: x[1], reverse=True)\n",
"# print top 5 results\n",
"for top in sorted_results[:5]:\n",
" print(classes_entries[top[0]], 'confidence:', top[1])"
]
},
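{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above sorts every `(class_id, confidence)` pair in Python to find the top five predictions. A minimal alternative sketch uses NumPy's `argsort`, assuming the scores are a flat sequence of per-class confidences (as the `enumerate` call above implies); the helper name `top_k` is hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def top_k(scores, k=5):\n",
"    # Hypothetical helper: (index, score) pairs for the k largest scores, highest first\n",
"    scores = np.asarray(scores)\n",
"    order = np.argsort(scores)[::-1]\n",
"    return [(int(i), float(scores[i])) for i in order[:k]]\n",
"\n",
"print(top_k([0.1, 0.7, 0.2], k=2))  # [(1, 0.7), (2, 0.2)]\n",
"```\n",
"\n",
"To use it here, pass the raw output of `client.score_file` (before it is wrapped in `enumerate`) and look up each returned index in `classes_entries`."
]
},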
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"clean-up\"></a>\n",
"## 8. Clean-up\n",
"Run the cell below to delete your web service, AKS cluster, image, and models (must be done in that order). In the [next notebook](./accelerated-models-training.ipynb) you will learn how to train a classifier on a new dataset using transfer learning and fine-tune the weights."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.delete()\n",
"aks_target.delete()\n",
"image.delete()\n",
"registered_model.delete()\n",
"converted_model.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "coverste"
},
{
"name": "paledger"
},
{
"name": "aibhalla"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@@ -0,0 +1,862 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training with the Azure Machine Learning Accelerated Models Service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows how to apply common machine learning techniques, such as transfer learning, custom weights, and unquantized vs. quantized models, when working with our Azure Machine Learning Accelerated Models Service (Azure ML Accel Models).\n",
"\n",
"We will use TensorFlow for the preprocessing steps, ResNet50 for the featurizer, and the Keras API (with the TensorFlow backend) to build the classifier layers instead of the default ImageNet classifier used in the Quickstart. Then we will train the model, evaluate it, and deploy it to run on an FPGA.\n",
"\n",
"#### Transfer Learning and Custom weights\n",
"We will walk you through two ways to build and train a ResNet50 model on the Kaggle Cats and Dogs dataset: transfer learning only and then transfer learning with custom weights.\n",
"\n",
"In using transfer learning, our goal is to re-purpose the ResNet50 model already trained on the [ImageNet image dataset](http://www.image-net.org/) as the basis for training on the Kaggle Cats and Dogs dataset. The ResNet50 featurizer will be imported as frozen, so only the Keras classifier will be trained.\n",
"\n",
"With the addition of custom weights, we will build the model so that the ResNet50 featurizer weights are not frozen. This lets us retrain starting from the ResNet50 weights trained on ImageNet, and then use the Kaggle Cats and Dogs dataset to retrain and fine-tune the quantized version of the model.\n",
"\n",
"#### Unquantized vs. Quantized models\n",
"The unquantized versions of our models (i.e. Resnet50, Resnet152, Densenet121, Vgg16, SsdVgg) use native float precision (32-bit floats), which is faster for training. We will use these for our first run through, then fine-tune the weights with the quantized version. The quantized versions of our models (i.e. QuantizedResnet50, QuantizedResnet152, QuantizedDensenet121, QuantizedVgg16, QuantizedSsdVgg) have the same node names as the unquantized versions, but use quantized operations so that they match the performance of the model when running on an FPGA.\n",
"\n",
"#### Contents\n",
"1. [Setup Environment](#setup)\n",
"* [Prepare Data](#prepare-data)\n",
"* [Construct Model](#construct-model)\n",
" * Preprocessor\n",
" * Classifier\n",
" * Model construction\n",
"* [Train Model](#train-model)\n",
"* [Test Model](#test-model)\n",
"* [Execution](#execution)\n",
" * [Transfer Learning](#transfer-learning)\n",
" * [Transfer Learning with Custom Weights](#custom-weights)\n",
"* [Create Image](#create-image)\n",
"* [Deploy Image](#deploy-image)\n",
"* [Test the service](#test-service)\n",
"* [Clean-up](#cleanup)\n",
"* [Appendix](#appendix)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"setup\"></a>\n",
"## 1. Setup Environment\n",
"#### 1.a. Please set up your environment as described in the [Quickstart](./accelerated-models-quickstart.ipynb), meaning:\n",
"* Make sure your Workspace config.json exists and has the correct info\n",
"* Install TensorFlow\n",
"\n",
"#### 1.b. Download the dataset into ~/catsanddogs\n",
"The dataset we will be using for training can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=54765). Download the zip file and extract it to a directory named 'catsanddogs' under your user directory (\"~/catsanddogs\").\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.c. Import packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import sys\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"from keras import backend as K\n",
"import sklearn\n",
"import tqdm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.d. Create directories for later use\n",
"After you train your model in float32, you'll write the weights to disk. We also need a location to store the models that get downloaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_weights_dir = os.path.expanduser(\"~/custom-weights\")\n",
"saved_model_dir = os.path.expanduser(\"~/models\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"prepare-data\"></a>\n",
"## 2. Prepare Data\n",
"Load the files we are going to use for training and testing. By default, this notebook uses only a very small subset of the Cats and Dogs dataset so that it runs relatively quickly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glob\n",
"import imghdr\n",
"datadir = os.path.expanduser(\"~/catsanddogs\")\n",
"\n",
"cat_files = glob.glob(os.path.join(datadir, 'PetImages', 'Cat', '*.jpg'))\n",
"dog_files = glob.glob(os.path.join(datadir, 'PetImages', 'Dog', '*.jpg'))\n",
"\n",
"# Limit the data set to make the notebook execute quickly.\n",
"cat_files = cat_files[:64]\n",
"dog_files = dog_files[:64]\n",
"\n",
"# The data set has a few images that are not JPEG. Remove them.\n",
"cat_files = [f for f in cat_files if imghdr.what(f) == 'jpeg']\n",
"dog_files = [f for f in dog_files if imghdr.what(f) == 'jpeg']\n",
"\n",
"if not cat_files or not dog_files:\n",
" print(\"Please download the Kaggle Cats and Dogs dataset from https://www.microsoft.com/en-us/download/details.aspx?id=54765 and extract the zip to \" + datadir)\n",
" raise ValueError(\"Data not found\")\n",
"else:\n",
" print(cat_files[0])\n",
" print(dog_files[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Construct a numpy array as labels\n",
"image_paths = cat_files + dog_files\n",
"total_files = len(cat_files) + len(dog_files)\n",
"labels = np.zeros(total_files)\n",
"labels[len(cat_files):] = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Split the images into training and test sets\n",
"from sklearn.model_selection import train_test_split\n",
"onehot_labels = np.array([[0,1] if i else [1,0] for i in labels])\n",
"img_train, img_test, label_train, label_test = train_test_split(image_paths, onehot_labels, random_state=42, shuffle=True)\n",
"\n",
"print(len(img_train), len(img_test), label_train.shape, label_test.shape)"
]
},
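{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list comprehension above maps label 0 (cat) to `[1,0]` and label 1 (dog) to `[0,1]`. The same one-hot encoding can be sketched by indexing rows of an identity matrix. The helper name `to_one_hot` is hypothetical, and unlike the original it returns a float array, which still feeds the `tf.float32` label placeholder used later.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def to_one_hot(labels, num_classes=2):\n",
"    # Row i of the identity matrix is the one-hot vector for class i\n",
"    return np.eye(num_classes)[np.asarray(labels, dtype=int)]\n",
"\n",
"print(to_one_hot([0, 1, 1]))\n",
"```"
]
},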
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"construct-model\"></a>\n",
"## 3. Construct Model\n",
"We will first define the functions that create the preprocessor and the classifier, and then run them together in a separate cell to construct the model with the ResNet50 featurizer in a single TensorFlow session.\n",
"\n",
"We use ResNet50 for the featurizer and build our own classifier using Keras layers. We train the featurizer and the classifier as one model. We will provide parameters to determine whether we are using the quantized version and whether we are using custom weights in training or not."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.a. Define image preprocessing step\n",
"As in the Quickstart, before passing the images to the ResNet50 featurizer, we need to preprocess the input files to get them into the form expected by ResNet50. ResNet50 expects float tensors representing the images in BGR, channel-last order. We've provided a default implementation of the preprocessing that you can use.\n",
"\n",
"**Note:** Expect to see TF deprecation warnings until we port our SDK over to use TensorFlow 2.0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.accel.models.utils as utils\n",
"\n",
"def preprocess_images(scaling_factor=1.0):\n",
" # Convert images to 3D tensors [width,height,channel] - channels are in BGR order.\n",
" in_images = tf.placeholder(tf.string)\n",
" image_tensors = utils.preprocess_array(in_images, 'RGB', scaling_factor)\n",
" return in_images, image_tensors"
]
},
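{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate what BGR, channel-last input means, the NumPy sketch below reverses the channel axis of an RGB image and subtracts one mean per channel. The mean values are the commonly used Caffe-style ImageNet BGR means (as in Keras's `caffe` preprocessing mode); they are an assumption, and `utils.preprocess_array` may resize, crop, or scale differently. The name `rgb_to_bgr_centered` is hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Assumed Caffe-style ImageNet channel means, in BGR order (not confirmed for this SDK)\n",
"BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)\n",
"\n",
"def rgb_to_bgr_centered(image_rgb):\n",
"    # image_rgb: array of shape [height, width, 3] with channels in RGB order\n",
"    image = np.asarray(image_rgb, dtype=np.float32)\n",
"    image_bgr = image[..., ::-1]  # reverse the channel axis: RGB -> BGR\n",
"    return image_bgr - BGR_MEANS  # broadcasting subtracts one mean per channel\n",
"\n",
"print(rgb_to_bgr_centered(np.array([[[10, 20, 30]]])).shape)  # (1, 1, 3)\n",
"```"
]
},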
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.b. Define classifier\n",
"We use Keras layer APIs to construct the classifier. Because we're using the TensorFlow backend, we can train this classifier in one session with our ResNet50 model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def construct_classifier(in_tensor, seed=None):\n",
" from keras.layers import Dropout, Dense, Flatten\n",
" from keras.initializers import glorot_uniform\n",
" K.set_session(tf.get_default_session())\n",
"\n",
" FC_SIZE = 1024\n",
" NUM_CLASSES = 2\n",
"\n",
" x = Dropout(0.2, input_shape=(1, 1, int(in_tensor.shape[3]),), seed=seed)(in_tensor)\n",
" x = Dense(FC_SIZE, activation='relu', input_dim=(1, 1, int(in_tensor.shape[3]),),\n",
" kernel_initializer=glorot_uniform(seed=seed), bias_initializer='zeros')(x)\n",
" x = Flatten()(x)\n",
" preds = Dense(NUM_CLASSES, activation='softmax', input_dim=FC_SIZE, name='classifier_output',\n",
" kernel_initializer=glorot_uniform(seed=seed), bias_initializer='zeros')(x)\n",
" return preds"
]
},
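{
"cell_type": "markdown",
"metadata": {},
"source": [
"The classifier above applies `Dense` layers to the featurizer's 4-D output. A Keras `Dense` layer acts on the last axis, so a `[batch, 1, 1, C]` feature map becomes `[batch, 1, 1, 1024]`, `Flatten` turns it into `[batch, 1024]`, and the final softmax layer yields `[batch, 2]`. The NumPy forward pass below is a sketch of that shape flow with random weights. It assumes C = 2048 (the usual ResNet50 pooled feature width) and omits dropout, which Keras disables at inference (`K.learning_phase(): 0`). The function name `classifier_forward` is hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def classifier_forward(features, w1, b1, w2, b2):\n",
"    # Dense(1024, relu): matmul over the last axis, like Keras Dense on 4-D input\n",
"    hidden = np.maximum(features @ w1 + b1, 0.0)\n",
"    flat = hidden.reshape(hidden.shape[0], -1)  # Flatten -> [batch, 1024]\n",
"    logits = flat @ w2 + b2\n",
"    # Numerically stable softmax over the classes\n",
"    exp = np.exp(logits - logits.max(axis=1, keepdims=True))\n",
"    return exp / exp.sum(axis=1, keepdims=True)\n",
"\n",
"rng = np.random.default_rng(0)\n",
"C, FC_SIZE, NUM_CLASSES = 2048, 1024, 2\n",
"example_features = rng.standard_normal((4, 1, 1, C))\n",
"w1 = rng.standard_normal((C, FC_SIZE)) * 0.01\n",
"w2 = rng.standard_normal((FC_SIZE, NUM_CLASSES)) * 0.01\n",
"example_probs = classifier_forward(example_features, w1, np.zeros(FC_SIZE), w2, np.zeros(NUM_CLASSES))\n",
"print(example_probs.shape)  # (4, 2)\n",
"```"
]
},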
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.c. Define model construction\n",
"Now that the preprocessor and classifier for the model are defined, we can define how we want to construct the model.\n",
"\n",
"Constructing the model has these steps:\n",
"1. Get preprocessing steps\n",
"* Get featurizer using the Azure ML Accel Models SDK and import its graph definition\n",
"* Get classifier\n",
"* Initialize the variables, then restore the featurizer's weights into the TensorFlow session\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def construct_model(quantized, starting_weights_directory = None):\n",
" from azureml.accel.models import Resnet50, QuantizedResnet50\n",
" \n",
" # Convert images to 3D tensors [width,height,channel]\n",
" in_images, image_tensors = preprocess_images(1.0)\n",
"\n",
" # Construct featurizer using quantized or unquantized ResNet50 model\n",
" if not quantized:\n",
" featurizer = Resnet50(saved_model_dir)\n",
" else:\n",
" featurizer = QuantizedResnet50(saved_model_dir, custom_weights_directory = starting_weights_directory)\n",
"\n",
" features = featurizer.import_graph_def(input_tensor=image_tensors)\n",
" \n",
" # Construct classifier\n",
" preds = construct_classifier(features)\n",
" \n",
" # Initialize weights\n",
" sess = tf.get_default_session()\n",
" tf.global_variables_initializer().run()\n",
"\n",
" featurizer.restore_weights(sess)\n",
"\n",
" return in_images, image_tensors, features, preds, featurizer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"train-model\"></a>\n",
"## 4. Train Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def read_files(files):\n",
" \"\"\" Read files to array\"\"\"\n",
" contents = []\n",
" for path in files:\n",
" with open(path, 'rb') as f:\n",
" contents.append(f.read())\n",
" return contents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def train_model(preds, in_images, img_train, label_train, is_retrain = False, train_epoch = 10, learning_rate=None):\n",
" \"\"\"Train the model on the training images and labels.\"\"\"\n",
" from keras.objectives import binary_crossentropy\n",
" from tqdm import tqdm\n",
" \n",
" learning_rate = learning_rate if learning_rate else 0.001 if is_retrain else 0.01\n",
" \n",
" # Specify the loss function\n",
" in_labels = tf.placeholder(tf.float32, shape=(None, 2)) \n",
" cross_entropy = tf.reduce_mean(binary_crossentropy(in_labels, preds))\n",
" optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)\n",
"\n",
" def chunks(a, b, n):\n",
" \"\"\"Yield successive n-sized chunks from a and b.\"\"\"\n",
" if (len(a) != len(b)):\n",
" print(\"a and b are not equal in chunks(a,b,n)\")\n",
" raise ValueError(\"Parameter error\")\n",
"\n",
" for i in range(0, len(a), n):\n",
" yield a[i:i + n], b[i:i + n]\n",
"\n",
" chunk_size = 16\n",
" chunk_num = len(label_train) / chunk_size\n",
"\n",
" sess = tf.get_default_session()\n",
" for epoch in range(train_epoch):\n",
" avg_loss = 0\n",
" for img_chunk, label_chunk in tqdm(chunks(img_train, label_train, chunk_size)):\n",
" contents = read_files(img_chunk)\n",
" _, loss = sess.run([optimizer, cross_entropy],\n",
" feed_dict={in_images: contents,\n",
" in_labels: label_chunk,\n",
" K.learning_phase(): 1})\n",
" avg_loss += loss / chunk_num\n",
" print(\"Epoch:\", (epoch + 1), \"loss = \", \"{:.3f}\".format(avg_loss))\n",
" \n",
" # Stop early once the loss is low enough\n",
" if (avg_loss < 0.001):\n",
" break"
]
},
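{
"cell_type": "markdown",
"metadata": {},
"source": [
"`train_model` walks the training set in chunks of 16 and accumulates an average loss per epoch. The sketch below isolates that batching logic in plain Python so it can be checked without TensorFlow; `epoch_average_loss` and `loss_fn` are hypothetical names. As a deliberate variation, it weights each chunk's loss by the chunk's size, so the result is an exact per-example mean even when the final chunk is smaller. The notebook instead divides every chunk's loss by `len(label_train) / chunk_size`.\n",
"\n",
"```python\n",
"def chunks(a, b, n):\n",
"    # Yield successive n-sized chunks from two equal-length sequences\n",
"    if len(a) != len(b):\n",
"        raise ValueError('a and b must have the same length')\n",
"    for i in range(0, len(a), n):\n",
"        yield a[i:i + n], b[i:i + n]\n",
"\n",
"def epoch_average_loss(images, labels, chunk_size, loss_fn):\n",
"    # Size-weighted mean of per-chunk losses over one epoch\n",
"    total, count = 0.0, 0\n",
"    for img_chunk, label_chunk in chunks(images, labels, chunk_size):\n",
"        total += loss_fn(img_chunk, label_chunk) * len(img_chunk)\n",
"        count += len(img_chunk)\n",
"    return total / count\n",
"\n",
"sizes = [len(c) for c, _ in chunks(list(range(40)), list(range(40)), 16)]\n",
"print(sizes)  # [16, 16, 8]\n",
"```"
]
},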
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-model\"></a>\n",
"## 5. Test Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def test_model(preds, in_images, img_test, label_test):\n",
" \"\"\"Test the model\"\"\"\n",
" from keras.metrics import categorical_accuracy\n",
"\n",
" in_labels = tf.placeholder(tf.float32, shape=(None, 2))\n",
" accuracy = tf.reduce_mean(categorical_accuracy(in_labels, preds))\n",
" contents = read_files(img_test)\n",
"\n",
" accuracy = accuracy.eval(feed_dict={in_images: contents,\n",
" in_labels: label_test,\n",
" K.learning_phase(): 0})\n",
" return accuracy"
]
},
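{
"cell_type": "markdown",
"metadata": {},
"source": [
"`categorical_accuracy` counts a prediction as correct when the index of its largest probability matches the index of the 1 in the one-hot label, and `test_model` averages that over the test set. A NumPy sketch of the same computation follows; the helper name `categorical_accuracy_np` is hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def categorical_accuracy_np(onehot_labels, probs):\n",
"    # Fraction of rows whose argmax prediction matches the argmax label\n",
"    true_idx = np.argmax(onehot_labels, axis=1)\n",
"    pred_idx = np.argmax(probs, axis=1)\n",
"    return float(np.mean(true_idx == pred_idx))\n",
"\n",
"example_onehot = np.array([[1, 0], [0, 1], [0, 1]])\n",
"example_scores = np.array([[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]])\n",
"print(categorical_accuracy_np(example_onehot, example_scores))\n",
"```"
]
},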
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"execution\"></a>\n",
"## 6. Execute steps\n",
"You can run through the Transfer Learning section, then skip to Create AccelContainerImage. By default, because the custom weights section trains the model twice and takes much longer, its cells are saved as Markdown rather than as executable code. To run them, copy the code into code cells, or change the cell type to 'Code' and remove the ``` fences.\n",
"\n",
"<a id=\"transfer-learning\"></a>\n",
"### 6.a. Training using Transfer Learning"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Launch the training\n",
"tf.reset_default_graph()\n",
"sess = tf.Session(graph=tf.get_default_graph())\n",
"\n",
"with sess.as_default():\n",
" in_images, image_tensors, features, preds, featurizer = construct_model(quantized=True)\n",
" train_model(preds, in_images, img_train, label_train, is_retrain=False, train_epoch=10, learning_rate=0.01) \n",
" accuracy = test_model(preds, in_images, img_test, label_test) \n",
" print(\"Accuracy:\", accuracy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Save Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_name = 'resnet50-catsanddogs-tl'\n",
"model_save_path = os.path.join(saved_model_dir, model_name)\n",
"\n",
"tf.saved_model.simple_save(sess, model_save_path,\n",
" inputs={'images': in_images},\n",
" outputs={'output_alias': preds})\n",
"\n",
"input_tensors = in_images.name\n",
"output_tensors = preds.name\n",
"\n",
"print(input_tensors)\n",
"print(output_tensors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"custom-weights\"></a>\n",
"### 6.b. Training using Custom Weights\n",
"\n",
"Because the quantized graph definition and the float32 graph definition share the same node names, we can initially train the weights in float32 and then reload them with the quantized operations (which take longer) to fine-tune the model.\n",
"\n",
"First we train the model with custom weights but without quantization. Training is done with native float precision (32-bit floats). We load the training data set and train in batches for up to 10 epochs. When the performance reaches the desired level or starts to degrade, we stop the training iteration and save the weights as TensorFlow checkpoint files."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Launch the training\n",
"```\n",
"tf.reset_default_graph()\n",
"sess = tf.Session(graph=tf.get_default_graph())\n",
"\n",
"with sess.as_default():\n",
" in_images, image_tensors, features, preds, featurizer = construct_model(quantized=False)\n",
" train_model(preds, in_images, img_train, label_train, is_retrain=False, train_epoch=10) \n",
" accuracy = test_model(preds, in_images, img_test, label_test) \n",
" print(\"Accuracy:\", accuracy)\n",
" featurizer.save_weights(custom_weights_dir + \"/rn50\", tf.get_default_session())\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Test Model\n",
"After training, we evaluate the trained model's accuracy on the test dataset with quantization, so that we know the model's performance if it is deployed on the FPGA."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"tf.reset_default_graph()\n",
"sess = tf.Session(graph=tf.get_default_graph())\n",
"\n",
"with sess.as_default():\n",
" print(\"Testing trained model with quantization\")\n",
" in_images, image_tensors, features, preds, quantized_featurizer = construct_model(quantized=True, starting_weights_directory=custom_weights_dir)\n",
" accuracy = test_model(preds, in_images, img_test, label_test) \n",
" print(\"Accuracy:\", accuracy)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Fine-Tune Model\n",
"Sometimes, the model's accuracy can drop significantly after quantization. In those cases, we need to retrain the model enabled with quantization to get better model accuracy."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"if (accuracy < 0.93):\n",
" with sess.as_default():\n",
" print(\"Fine-tuning model with quantization\")\n",
" train_model(preds, in_images, img_train, label_train, is_retrain=True, train_epoch=10)\n",
" accuracy = test_model(preds, in_images, img_test, label_test) \n",
" print(\"Accuracy:\", accuracy)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Save Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"model_name = 'resnet50-catsanddogs-cw'\n",
"model_save_path = os.path.join(saved_model_dir, model_name)\n",
"\n",
"tf.saved_model.simple_save(sess, model_save_path,\n",
" inputs={'images': in_images},\n",
" outputs={'output_alias': preds})\n",
"\n",
"input_tensors = in_images.name\n",
"output_tensors = preds.name\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"create-image\"></a>\n",
"## 7. Create AccelContainerImage\n",
"\n",
"Below we execute the same steps as in the [Quickstart](./accelerated-models-quickstart.ipynb#create-image) to package the model we saved locally into an accelerated Docker image stored in our workspace. Completing all the steps may take a few minutes. For more details on each step, check out the [Quickstart section on model registration](./accelerated-models-quickstart.ipynb#register-model)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core.model import Model\n",
"from azureml.core.image import Image\n",
"from azureml.accel import AccelOnnxConverter\n",
"from azureml.accel import AccelContainerImage\n",
"\n",
"# Retrieve workspace\n",
"ws = Workspace.from_config()\n",
"print(\"Successfully retrieved workspace:\", ws.name, ws.resource_group, ws.location, ws.subscription_id, '\\n')\n",
"\n",
"# Register model\n",
"registered_model = Model.register(workspace = ws,\n",
" model_path = model_save_path,\n",
" model_name = model_name)\n",
"print(\"Successfully registered: \", registered_model.name, registered_model.description, registered_model.version, '\\n', sep = '\\t')\n",
"\n",
"# Convert model\n",
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors)\n",
"# If it fails, you can run wait_for_completion again with show_output=True.\n",
"convert_request.wait_for_completion(show_output=False)\n",
"converted_model = convert_request.result\n",
"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
" converted_model.id, converted_model.created_time, '\\n')\n",
"\n",
"# Package into AccelContainerImage\n",
"image_config = AccelContainerImage.image_configuration()\n",
"# Image name must be lowercase\n",
"image_name = \"{}-image\".format(model_name)\n",
"image = Image.create(name = image_name,\n",
" models = [converted_model],\n",
" image_config = image_config, \n",
" workspace = ws)\n",
"image.wait_for_creation()\n",
"print(\"Created AccelContainerImage: {} {} {}\\n\".format(image.name, image.creation_state, image.image_location))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"deploy-image\"></a>\n",
"## 8. Deploy image\n",
"Once you have an Azure ML Accelerated Image in your Workspace, you can deploy it to one of two destinations: a Databox Edge machine or an AKS cluster. \n",
"\n",
"### 8.a. Deploy to Databox Edge Machine using IoT Hub\n",
"See the sample [here](https://github.com/Azure-Samples/aml-real-time-ai/) for using the Azure IoT CLI extension for deploying your Docker image to your Databox Edge Machine.\n",
"\n",
"### 8.b. Deploy to AKS Cluster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Uses the specific FPGA-enabled VM (sku: Standard_PB6s)\n",
"# Standard_PB6s VMs are available in: eastus, westus2, westeurope, southeastasia\n",
"prov_config = AksCompute.provisioning_configuration(vm_size = \"Standard_PB6s\",\n",
" agent_count = 1,\n",
" location = \"eastus\")\n",
"\n",
"aks_name = 'aks-pb6-tl'\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Provisioning an AKS cluster might take a while (around 15 minutes), and we need to wait until it is successfully provisioned before we can deploy a service to it. If you interrupt this cell, provisioning of the cluster will continue. You can re-run it or check the status in your Workspace under Compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy AccelContainerImage to AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice, AksWebservice\n",
"\n",
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
"# Authentication is enabled by default, but for testing we specify False\n",
"aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
" num_replicas=1,\n",
" auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-service\"></a>\n",
"## 9. Test the service\n",
"\n",
"<a id=\"create-client\"></a>\n",
"### 9.a. Create Client\n",
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We provide a client that can call into the Docker image to get predictions. \n",
"\n",
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
"\n",
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using the grpc client in AzureML Accelerated Models SDK\n",
"from azureml.accel.client import PredictionClient\n",
"\n",
"address = aks_service.scoring_uri\n",
"ssl_enabled = address.startswith(\"https\")\n",
"address = address[address.find('/')+2:].strip('/')\n",
"port = 443 if ssl_enabled else 80\n",
"\n",
"# Initialize AzureML Accelerated Models client\n",
"client = PredictionClient(address=address,\n",
" port=port,\n",
" use_ssl=ssl_enabled,\n",
" service_name=aks_service.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"serve-model\"></a>\n",
"### 9.b. Serve the model\n",
"Let's see how our service does on a few images. It may get a few wrong."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Specify an image to classify\n",
"print('CATS')\n",
"for image_file in cat_files[:8]:\n",
" results = client.score_file(path=image_file, \n",
" input_name=input_tensors, \n",
" outputs=output_tensors)\n",
" result = 'CORRECT ' if results[0] > results[1] else 'WRONG '\n",
" print(result + str(results))\n",
"print('DOGS')\n",
"for image_file in dog_files[:8]:\n",
" results = client.score_file(path=image_file, \n",
" input_name=input_tensors, \n",
" outputs=output_tensors)\n",
" result = 'CORRECT ' if results[1] > results[0] else 'WRONG '\n",
" print(result + str(results))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"cleanup\"></a>\n",
"## 10. Cleanup\n",
"It's important to clean up your resources, so that you won't incur unnecessary costs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.delete()\n",
"aks_target.delete()\n",
"image.delete()\n",
"registered_model.delete()\n",
"converted_model.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"appendix\"></a>\n",
"## 11. Appendix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"License for plot_confusion_matrix:\n",
"\n",
"New BSD License\n",
"\n",
"Copyright (c) 2007-2018 The scikit-learn developers.\n",
"All rights reserved.\n",
"\n",
"\n",
"Redistribution and use in source and binary forms, with or without\n",
"modification, are permitted provided that the following conditions are met:\n",
"\n",
" a. Redistributions of source code must retain the above copyright notice,\n",
" this list of conditions and the following disclaimer.\n",
" b. Redistributions in binary form must reproduce the above copyright\n",
" notice, this list of conditions and the following disclaimer in the\n",
" documentation and/or other materials provided with the distribution.\n",
" c. Neither the name of the Scikit-learn Developers nor the names of\n",
" its contributors may be used to endorse or promote products\n",
" derived from this software without specific prior written\n",
" permission. \n",
"\n",
"\n",
"THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n",
"AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n",
"IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n",
"ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR\n",
"ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n",
"DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n",
"SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n",
"CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\n",
"LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY\n",
"OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH\n",
"DAMAGE.\n"
]
}
],
"metadata": {
"authors": [
{
"name": "coverste"
},
{
"name": "paledger"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
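The client cell in section 9.a derives the host, port and SSL flag from `aks_service.scoring_uri` by slicing the string. The sketch below does the same parsing with the standard library's `urllib.parse.urlsplit`, assuming the scoring URI is an ordinary `http://` or `https://` URL; `parse_scoring_uri` is a hypothetical helper name, not part of the Azure ML SDK, and the example URIs are placeholders.

```python
from urllib.parse import urlsplit


def parse_scoring_uri(scoring_uri):
    """Split a scoring URI into (address, port, use_ssl) for a gRPC client.

    Hypothetical helper that mirrors the notebook's string slicing:
    the address is everything after the scheme with surrounding slashes
    stripped, and the port is 443 for https and 80 otherwise.
    """
    parts = urlsplit(scoring_uri)
    use_ssl = parts.scheme == "https"
    # netloc is host[:port]; path is kept, as the notebook only strips slashes
    address = (parts.netloc + parts.path).strip("/")
    port = 443 if use_ssl else 80
    return address, port, use_ssl
```

Usage: `address, port, use_ssl = parse_scoring_uri(aks_service.scoring_uri)`, then pass the three values to `PredictionClient` as in the notebook. Parsing with `urlsplit` avoids relying on the position of the first `/` in the string.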

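Section 9.b treats a cat prediction as correct when `results[0] > results[1]` and a dog prediction as correct when `results[1] > results[0]`. A minimal sketch that generalizes this check and turns it into an accuracy figure, assuming each result is a sequence of class scores with index 0 for cats and index 1 for dogs; `is_correct` and `batch_accuracy` are hypothetical helpers. A tie counts as wrong, matching the strict `>` comparison in the notebook.

```python
def is_correct(scores, expected_index):
    """True if the expected class scores strictly higher than every other class."""
    target = scores[expected_index]
    return all(target > s for i, s in enumerate(scores) if i != expected_index)


def batch_accuracy(batch, expected_index):
    """Fraction of score vectors in batch that are correct for expected_index."""
    if not batch:
        return 0.0
    return sum(is_correct(scores, expected_index) for scores in batch) / len(batch)
```

Usage: collect the `results` returned by `client.score_file` for each file in `cat_files[:8]` into a list `cat_results`, then call `batch_accuracy(cat_results, 0)`. Do the same for dogs with index 1.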
@@ -25,6 +25,13 @@
"3. Build new image and deploy it. " "3. Build new image and deploy it. "
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

@@ -1,5 +1,12 @@
{ {
"cells": [ "cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

@@ -4,7 +4,7 @@ These tutorials show how to create and deploy Open Neural Network eXchange ([ONN
## Tutorials ## Tutorials
0. [Configure your Azure Machine Learning Workspace](../../../configuration.ipynb) 0. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, [Configure your Azure Machine Learning Workspace](../../../configuration.ipynb)
#### Obtain pretrained models from the [ONNX Model Zoo](https://github.com/onnx/models) and deploy with ONNX Runtime #### Obtain pretrained models from the [ONNX Model Zoo](https://github.com/onnx/models) and deploy with ONNX Runtime
1. [MNIST - Handwritten Digit Classification with ONNX Runtime](onnx-inference-mnist-deploy.ipynb) 1. [MNIST - Handwritten Digit Classification with ONNX Runtime](onnx-inference-mnist-deploy.ipynb)
@@ -34,3 +34,6 @@ Licensed under the MIT License.
## Acknowledgements ## Acknowledgements
These tutorials were developed by Vinitra Swamy and Prasanth Pulavarthi of the Microsoft AI Frameworks team and adapted for presentation at Microsoft Ignite 2018. These tutorials were developed by Vinitra Swamy and Prasanth Pulavarthi of the Microsoft AI Frameworks team and adapted for presentation at Microsoft Ignite 2018.
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/README.png)

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -33,7 +40,7 @@
"To make the best use of your time, make sure you have done the following:\n", "To make the best use of your time, make sure you have done the following:\n",
"\n", "\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [configuration](../../../configuration.ipynb) notebook to:\n", "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook to:\n",
" * install the AML SDK\n", " * install the AML SDK\n",
" * create a workspace and its configuration file (config.json)" " * create a workspace and its configuration file (config.json)"
] ]
@@ -248,7 +255,7 @@
"source": [ "source": [
"from azureml.core.conda_dependencies import CondaDependencies \n", "from azureml.core.conda_dependencies import CondaDependencies \n",
"\n", "\n",
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n", "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime==0.4.0\",\"azureml-core\"])\n",
"\n", "\n",
"with open(\"myenv.yml\",\"w\") as f:\n", "with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())" " f.write(myenv.serialize_to_string())"

@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -34,7 +41,7 @@
"## Prerequisites\n", "## Prerequisites\n",
"\n", "\n",
"### 1. Install Azure ML SDK and create a new workspace\n", "### 1. Install Azure ML SDK and create a new workspace\n",
"Please follow [Azure ML configuration notebook](../../../configuration.ipynb) to set up your environment.\n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, please follow [Azure ML configuration notebook](../../../configuration.ipynb) to set up your environment.\n",
"\n", "\n",
"### 2. Install additional packages needed for this Notebook\n", "### 2. Install additional packages needed for this Notebook\n",
"You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where Azure Machine Learning SDK is installed.\n", "You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where Azure Machine Learning SDK is installed.\n",

@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -34,7 +41,7 @@
"## Prerequisites\n", "## Prerequisites\n",
"\n", "\n",
"### 1. Install Azure ML SDK and create a new workspace\n", "### 1. Install Azure ML SDK and create a new workspace\n",
"Please follow [Azure ML configuration notebook](../../../configuration.ipynb) to set up your environment.\n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, please follow [Azure ML configuration notebook](../../../configuration.ipynb) to set up your environment.\n",
"\n", "\n",
"### 2. Install additional packages needed for this tutorial notebook\n", "### 2. Install additional packages needed for this tutorial notebook\n",
"You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where Azure Machine Learning SDK is installed. \n", "You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where Azure Machine Learning SDK is installed. \n",

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -33,7 +40,7 @@
"To make the best use of your time, make sure you have done the following:\n", "To make the best use of your time, make sure you have done the following:\n",
"\n", "\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [configuration notebook](../../../configuration.ipynb) to:\n", "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n",
" * install the AML SDK\n", " * install the AML SDK\n",
" * create a workspace and its configuration file (config.json)" " * create a workspace and its configuration file (config.json)"
] ]

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -30,7 +37,7 @@
"source": [ "source": [
"## Prerequisites\n", "## Prerequisites\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [configuration notebook](../../../configuration.ipynb) to:\n", "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n",
" * install the AML SDK\n", " * install the AML SDK\n",
" * create a workspace and its configuration file (`config.json`)" " * create a workspace and its configuration file (`config.json`)"
] ]
@@ -91,7 +98,7 @@
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# choose a name for your cluster\n", "# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n", "cluster_name = \"gpu-cluster\"\n",
"\n", "\n",
"try:\n", "try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",

@@ -0,0 +1,407 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploying a web service to Azure Kubernetes Service (AKS)\n",
"This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (a one-time action), and deploying a service to it. \n",
"We then test and delete the service, image and model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.core.image import Image\n",
"from azureml.core.model import Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"print(azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get workspace\n",
"Load an existing workspace from the config file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Register the model\n",
"Register an existing trained model, adding a description and tags. Prior to registering the model, you should have a TensorFlow [Saved Model](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md) in the `resnet50` directory. You can download a [pretrained resnet50](https://github.com/tensorflow/models/tree/master/official/resnet#pre-trained-model) and unpack it to that directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Register the model\n",
"from azureml.core.model import Model\n",
"model = Model.register(model_path = \"resnet50\", # this points to a local directory\n",
" model_name = \"resnet50\", # this is the name the model is registered as\n",
" tags = {'area': \"Image classification\", 'type': \"classification\"},\n",
" description = \"Image classification trained on Imagenet Dataset\",\n",
" workspace = ws)\n",
"\n",
"print(model.name, model.description, model.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create an image\n",
"Create an image using the registered model and the script that loads and runs the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"import ujson\n",
"from azureml.core.model import Model\n",
"from azureml.contrib.services.aml_request import AMLRequest, rawhttp\n",
"from azureml.contrib.services.aml_response import AMLResponse\n",
"\n",
"def init():\n",
" global session\n",
" global input_name\n",
" global output_name\n",
" \n",
" session = tf.Session()\n",
"\n",
" model_path = Model.get_model_path('resnet50')\n",
" model = tf.saved_model.loader.load(session, ['serve'], model_path)\n",
" if len(model.signature_def['serving_default'].inputs) > 1:\n",
" raise ValueError(\"This score.py only supports one input\")\n",
" if len(model.signature_def['serving_default'].outputs) > 1:\n",
" raise ValueError(\"This score.py only supports one output\")\n",
" input_name = [tensor.name for tensor in model.signature_def['serving_default'].inputs.values()][0]\n",
" output_name = [tensor.name for tensor in model.signature_def['serving_default'].outputs.values()][0]\n",
" \n",
"\n",
"@rawhttp\n",
"def run(request):\n",
" if request.method == 'POST':\n",
" reqBody = request.get_data(False)\n",
" resp = score(reqBody)\n",
" return AMLResponse(resp, 200)\n",
" if request.method == 'GET':\n",
" respBody = str.encode(\"GET is not supported\")\n",
" return AMLResponse(respBody, 405)\n",
" return AMLResponse(\"bad request\", 400)\n",
"\n",
"def score(data):\n",
" result = session.run(output_name, {input_name: [data]})\n",
" return ujson.dumps(result[0].tolist())\n",
"\n",
"if __name__ == \"__main__\":\n",
" init()\n",
" with open(\"test_image.jpg\", 'rb') as f:\n",
" content = f.read()\n",
" print(score(content))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['tensorflow-gpu==1.12.0','numpy','ujson'], pip_packages=['azureml-contrib-services'])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" gpu_enabled = True\n",
" )\n",
"\n",
"image = ContainerImage.create(name = \"gpuimage\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Provision the AKS Cluster\n",
"This is a one-time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, you will have to recreate it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Provision a GPU cluster (Standard_NC6 VMs); other settings use their defaults\n",
"prov_config = AksCompute.provisioning_configuration(vm_size=\"Standard_NC6\")\n",
"\n",
"aks_name = 'my-aks-9' \n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create AKS Cluster in an existing virtual network (optional)\n",
"See the code snippet below. Check the documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-enable-virtual-network#use-azure-kubernetes-service) for more details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"'''\n",
"from azureml.core.compute import ComputeTarget, AksCompute\n",
"\n",
"# Create the compute configuration and set virtual network information\n",
"config = AksCompute.provisioning_configuration(vm_size=\"Standard_NC6\", location=\"eastus2\")\n",
"config.vnet_resourcegroup_name = \"mygroup\"\n",
"config.vnet_name = \"mynetwork\"\n",
"config.subnet_name = \"default\"\n",
"config.service_cidr = \"10.0.0.0/16\"\n",
"config.dns_service_ip = \"10.0.0.10\"\n",
"config.docker_bridge_cidr = \"172.17.0.1/16\"\n",
"\n",
"# Create the compute target\n",
"aks_target = ComputeTarget.create(workspace = ws,\n",
" name = \"myaks\",\n",
" provisioning_configuration = config)\n",
"'''"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Enable SSL on the AKS Cluster (optional)\n",
"See the code snippet below. Check the documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-secure-web-service) for more details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# provisioning_config = AksCompute.provisioning_configuration(ssl_cert_pem_file=\"cert.pem\", ssl_key_pem_file=\"key.pem\", ssl_cname=\"www.contoso.com\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optional step: Attach existing AKS cluster\n",
"\n",
"If you have an existing AKS cluster in your Azure subscription, you can attach it to the Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"'''\n",
"# Resource ID of the existing AKS cluster to attach\n",
"resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n",
"\n",
"create_name='my-existing-aks' \n",
"# Attach the cluster\n",
"attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
"aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n",
"# Wait for the operation to complete\n",
"aks_target.wait_for_completion(True)\n",
"'''"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy web service to AKS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set the web service configuration (using default here)\n",
"aks_config = AksWebservice.deploy_configuration()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_service_name ='aks-service-1'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws, \n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)\n",
"print(aks_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test the web service\n",
"We test the web service by posting the test image's content."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"import requests\n",
"key1, key2 = aks_service.get_keys()\n",
"\n",
"headers = {'Content-Type':'application/octet-stream', 'Authorization': 'Bearer ' + key1}\n",
"with open('test_image.jpg', 'rb') as f:\n",
"    test_sample = f.read()\n",
"resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)\n",
"print(resp.status_code, resp.text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Clean up\n",
"Delete the service, image, model and compute target."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_service.delete()\n",
"image.delete()\n",
"model.delete()\n",
"aks_target.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "aashishb"
}
],
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
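The `run()` function in `score.py` dispatches on the HTTP method: POST bodies are scored and GET is rejected with status 405. The sketch below isolates that dispatch as a plain function so it can be exercised without the Azure ML `rawhttp` decorator or a deployed service. `handle_request` is a hypothetical name, and this sketch answers every non-POST method with 405 (Method Not Allowed), which is a design choice made here rather than the notebook's exact behavior.

```python
def handle_request(method, body, score):
    """Return (response_body, status_code) for a raw HTTP scoring request.

    Hypothetical, framework-free version of the dispatch in score.py:
    `score` is any callable that turns the raw request bytes into a response.
    """
    if method == "POST":
        # Score the raw request body, as run() does with request.get_data()
        return score(body), 200
    # GET and every other verb are rejected
    return b"Only POST is supported", 405
```

Keeping the dispatch separate from the scoring callable lets the routing be unit-tested locally with a stub, while the real `score` from `score.py` is used inside the container.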

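The test cell posts the raw bytes of `test_image.jpg` to `aks_service.scoring_uri` with a `Bearer` key in the `Authorization` header. A standard-library sketch of building the same request without sending it, assuming the service reads the raw request body (as `score.py` does via `request.get_data`) and that key authentication is enabled. `build_scoring_request` is hypothetical, and the `application/octet-stream` content type is a choice made here for raw image bytes.

```python
import urllib.request


def build_scoring_request(scoring_uri, image_bytes, key=None):
    """Build (but do not send) a POST request carrying raw image bytes.

    If `key` is given, it is sent as a Bearer token, mirroring the
    'Bearer ' + key1 header used in the notebook's test cell.
    """
    headers = {"Content-Type": "application/octet-stream"}
    if key:
        headers["Authorization"] = "Bearer " + key
    return urllib.request.Request(scoring_uri, data=image_bytes,
                                  headers=headers, method="POST")
```

Usage: `urllib.request.urlopen(build_scoring_request(aks_service.scoring_uri, data, key1))` sends the request and returns the response. If authentication is disabled on the service, omit the key.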
@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -34,7 +41,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Prerequisites\n", "## Prerequisites\n",
"Make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
] ]
}, },
{ {

@@ -2,10 +2,7 @@
Follow these sample notebooks to learn: Follow these sample notebooks to learn:
1. [Explain tabular data](explain-tabular-data): Basic example of explaining model trained on tabular data. 1. [Explain tabular data locally](explain-tabular-data-local): Basic example of explaining model trained on tabular data.
2. [Explain local classification](explain-local-sklearn-classification): Explain a scikit-learn classification model.
3. [Explain local regression](explain-local-sklearn-regression): Explain a scikit-learn regression model.
4. [Explain on remote AMLCompute](explain-on-amlcompute): Explain a model on a remote AMLCompute target. 4. [Explain on remote AMLCompute](explain-on-amlcompute): Explain a model on a remote AMLCompute target.
5. [Explain classification using Run History](explain-run-history-sklearn-classification): Explain a scikit-learn classification model with Run History. 5. [Explain tabular data with Run History](explain-tabular-data-run-history): Explain a model with Run History.
6. [Explain regression using Run History](explain-run-history-sklearn-regression): Explain a scikit-learn regression model with Run History. 7. [Explain raw features](explain-tabular-data-raw-features): Explain the raw features of a trained model.
7. [Explain scikit-learn raw features](explain-sklearn-raw-features): Explain the raw features of a trained scikit-learn model.

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-on-amlcompute/regression-sklearn-on-amlcompute.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -32,7 +39,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Prerequisites\n", "## Prerequisites\n",
"Make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't." "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't."
] ]
}, },
{ {
@@ -196,9 +203,6 @@
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", "# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n", "run_config.environment.python.user_managed_dependencies = False\n",
"\n", "\n",
"# auto-prepare the Docker image when used for execution (if it is not already prepared)\n",
"run_config.auto_prepare_environment = True\n",
"\n",
"azureml_pip_packages = [\n", "azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n", " 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-explain-model'\n", " 'azureml-explain-model'\n",
@@ -576,7 +580,7 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "wamartin" "name": "mesameki"
} }
], ],
"kernelspec": { "kernelspec": {

@@ -1,221 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summary\n",
"From raw data that is a mixture of categoricals and numeric, featurize the categoricals using one hot encoding. Use tabular explainer to get explain object and then get raw feature importances"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load titanic dataset. Impute missing values by filling both backward and forward since some data is at the first/last row. This is just for illustration and not a recommended way to impute missing data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"titanic_url = ('https://raw.githubusercontent.com/amueller/'\n",
" 'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')\n",
"data = pd.read_csv(titanic_url)\n",
"# fill missing values\n",
"data = data.fillna(method=\"ffill\")\n",
"data = data.fillna(method=\"bfill\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar to example [here](https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py), use a subset of columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"numeric_features = ['age', 'fare']\n",
"categorical_features = ['embarked', 'sex', 'pclass']\n",
"\n",
"y = data['survived'].values\n",
"X = data[categorical_features + numeric_features]\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One hot encode the categorical features"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import OneHotEncoder\n",
"one_enc = OneHotEncoder()\n",
"one_enc.fit(X_train[categorical_features])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Columnwise concatenate one hot encoded categoricals and numerical features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy import sparse\n",
"def get_feats(X):\n",
" a = one_enc.transform(X[categorical_features])\n",
" b = X[numeric_features]\n",
" return sparse.hstack((one_enc.transform(X[categorical_features]), X[numeric_features].values))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train a logistic regression model on featurized training data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"X_train_transformed = get_feats(X_train)\n",
"X_test_transformed = get_feats(X_test)\n",
"\n",
"clf = LogisticRegression(solver='lbfgs', max_iter=200)\n",
"clf.fit(X_train_transformed, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get feature mapping between raw and generated features. Using the order in which features are concatenated in `get_feats` and using `categories_` in `OneHotEncoder` we are able to compute this mapping."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"raw_feat_mapping = []\n",
"start_index = 0\n",
"for cat_list in one_enc.categories_:\n",
" raw_feat_mapping.append([start_index + i for i in range(len(cat_list))])\n",
" start_index += len(cat_list)\n",
"for i in range(len(numeric_features)):\n",
" raw_feat_mapping.append([start_index])\n",
" start_index += 1 "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
"\n",
"explainer = TabularExplainer(clf, X_train_transformed)\n",
"global_explanation = explainer.explain_global(X_test_transformed)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"raw_feat_imps = global_explanation.get_raw_feature_importances(raw_feat_mapping)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"feature_names = categorical_features + numeric_features\n",
"sorted_indices = np.argsort(raw_feat_imps)[::-1]\n",
"\n",
"for i in sorted_indices:\n",
" print(\"{}: {}\".format(feature_names[i], raw_feat_imps[i]))"
]
}
],
"metadata": {
"authors": [
{
"name": "hichando"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

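The raw-feature mapping in the notebook above depends only on the column order produced by `get_feats`: one contiguous block of one-hot columns per categorical feature, followed by one column per numeric feature. The stdlib sketch below reproduces that index arithmetic and, as an assumption about what `get_raw_feature_importances` does, sums the engineered-feature importances within each block (SHAP values are additive, so summing is a natural aggregation). The helper names `build_raw_feature_mapping` and `aggregate_raw_importances` are hypothetical and not part of the azureml API.

```python
from typing import List, Sequence


def build_raw_feature_mapping(category_lists: Sequence[Sequence[object]],
                              n_numeric: int) -> List[List[int]]:
    """Return, for each raw feature, the engineered column indices it produced.

    Assumes the engineered matrix is one one-hot block per categorical
    feature (in order), followed by one column per numeric feature.
    """
    mapping: List[List[int]] = []
    start = 0
    for cats in category_lists:
        mapping.append(list(range(start, start + len(cats))))
        start += len(cats)
    for _ in range(n_numeric):
        mapping.append([start])
        start += 1
    return mapping


def aggregate_raw_importances(engineered: Sequence[float],
                              mapping: Sequence[Sequence[int]]) -> List[float]:
    """Sum engineered-feature importances into one value per raw feature (assumed aggregation)."""
    return [sum(engineered[i] for i in cols) for cols in mapping]


if __name__ == "__main__":
    # Toy categories shaped like the Titanic columns (illustrative values only).
    categories = [["C", "Q", "S"], ["female", "male"], [1, 2, 3]]
    mapping = build_raw_feature_mapping(categories, n_numeric=2)
    print(mapping)
    engineered = [0.1, 0.0, 0.2, 0.5, 0.5, 0.3, 0.1, 0.1, 0.4, 0.05]
    print(aggregate_raw_importances(engineered, mapping))
```

Keeping the mapping as explicit index lists makes the dependency on column order visible: if `get_feats` ever changed the concatenation order, the mapping would have to change with it.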
@@ -7,6 +7,13 @@
"# Breast cancer diagnosis classification with scikit-learn (run model explainer locally)" "# Breast cancer diagnosis classification with scikit-learn (run model explainer locally)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-binary-classification.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -24,7 +31,8 @@
"\n", "\n",
"1. Train a SVM classification model using Scikit-learn\n", "1. Train a SVM classification model using Scikit-learn\n",
"2. Run 'explain_model' with full data in local mode, which doesn't contact any Azure services\n", "2. Run 'explain_model' with full data in local mode, which doesn't contact any Azure services\n",
"3. Run 'explain_model' with summarized data in local mode, which doesn't contact any Azure services" "3. Run 'explain_model' with summarized data in local mode, which doesn't contact any Azure services\n",
"4. Visualize the global and local explanations with the visualization dashboard."
] ]
}, },
{ {
@@ -181,7 +189,9 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"local_explanation = tabular_explainer.explain_local(x_test[0,:])" "# explain the first member of the test set\n",
"instance_num = 0\n",
"local_explanation = tabular_explainer.explain_local(x_test[instance_num,:])"
] ]
}, },
{ {
@@ -190,9 +200,21 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# local feature importance information\n", "# get the prediction for the first member of the test set and explain why model made that prediction\n",
"local_importance_values = local_explanation.local_importance_values\n", "prediction_value = clf.predict(x_test)[instance_num]\n",
"print('local importance for first instance: {}'.format(local_importance_values[y_test[0]]))" "\n",
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
"\n",
"\n",
"dict(zip(sorted_local_importance_names, sorted_local_importance_values))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Load visualization dashboard"
] ]
}, },
{ {
@@ -201,7 +223,12 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"print('local importance feature names: {}'.format(list(local_explanation.features)))" "# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"# Or, in Jupyter Labs, uncomment below\n",
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"# jupyter labextension install microsoft-mli-widget"
] ]
}, },
{ {
@@ -210,14 +237,23 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"dict(zip(local_explanation.features, local_explanation.local_importance_values[y_test[0]]))" "from azureml.contrib.explain.model.visualize import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, model, x_test)"
] ]
} }
], ],
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "wamartin" "name": "mesameki"
} }
], ],
"kernelspec": { "kernelspec": {

@@ -0,0 +1,280 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Iris flower classification with scikit-learn (run model explainer locally)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-multiclass-classification.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explain a model with the AML explain-model package\n",
"\n",
"1. Train a SVM classification model using Scikit-learn\n",
"2. Run 'explain_model' with full data in local mode, which doesn't contact any Azure services\n",
"3. Run 'explain_model' with summarized data in local mode, which doesn't contact any Azure services\n",
"4. Visualize the global and local explanations with the visualization dashboard."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_iris\n",
"from sklearn import svm\n",
"from azureml.explain.model.tabular_explainer import TabularExplainer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Run model explainer locally with full data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the breast cancer diagnosis data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iris = load_iris()\n",
"X = iris['data']\n",
"y = iris['target']\n",
"classes = iris['target_names']\n",
"feature_names = iris['feature_names']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Split data into train and test\n",
"from sklearn.model_selection import train_test_split\n",
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train a SVM classification model, which you want to explain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
"model = clf.fit(x_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explain predictions on your local machine"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tabular_explainer = TabularExplainer(model, x_train, features = feature_names, classes=classes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explain overall model predictions (global explanation)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"global_explanation = tabular_explainer.explain_global(x_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sorted SHAP values\n",
"print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
"# Corresponding feature names\n",
"print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
"# feature ranks (based on original order of features)\n",
"print('global importance rank: {}'.format(global_explanation.global_importance_rank))\n",
"# per class feature names\n",
"print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))\n",
"# per class feature importance values\n",
"print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explain overall model predictions as a collection of local (instance-level) explanations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# feature shap values for all features and all data points in the training data\n",
"print('local importance values: {}'.format(global_explanation.local_importance_values))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explain local data points (individual instances)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# explain the first member of the test set\n",
"instance_num = 0\n",
"local_explanation = tabular_explainer.explain_local(x_test[instance_num,:])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get the prediction for the first member of the test set and explain why model made that prediction\n",
"prediction_value = clf.predict(x_test)[instance_num]\n",
"\n",
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
"\n",
"\n",
"dict(zip(sorted_local_importance_names, sorted_local_importance_values))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"# Or, in Jupyter Labs, uncomment below\n",
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"# jupyter labextension install microsoft-mli-widget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, model, x_test)"
]
}
],
"metadata": {
"authors": [
{
"name": "mesameki"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

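The local-explanation cells above index `get_ranked_local_values()` and `get_ranked_local_names()` with the predicted class, which suggests one ranked list per class. A minimal, hypothetical sketch of that per-class ranking follows, assuming features are sorted by signed importance in descending order; the azureml implementation may rank differently (for example, by absolute value), and `rank_per_class` and `explain_prediction` are illustrative names, not library functions.

```python
from typing import Dict, List, Sequence, Tuple


def rank_per_class(importances: Sequence[Sequence[float]],
                   feature_names: Sequence[str]) -> Tuple[List[List[str]], List[List[float]]]:
    """Sort each class's feature importances in descending order.

    importances[c][f] is the importance of feature f for class c.
    Returns (ranked_names, ranked_values), each indexed by class.
    """
    ranked_names: List[List[str]] = []
    ranked_values: List[List[float]] = []
    for row in importances:
        order = sorted(range(len(row)), key=lambda f: row[f], reverse=True)
        ranked_names.append([feature_names[f] for f in order])
        ranked_values.append([row[f] for f in order])
    return ranked_names, ranked_values


def explain_prediction(importances: Sequence[Sequence[float]],
                       feature_names: Sequence[str],
                       predicted_class: int) -> Dict[str, float]:
    """Select the ranked list for the predicted class, mirroring the notebook's indexing."""
    names, values = rank_per_class(importances, feature_names)
    return dict(zip(names[predicted_class], values[predicted_class]))


if __name__ == "__main__":
    features = ["sepal length", "sepal width", "petal length", "petal width"]
    # Illustrative per-class importances for one instance (3 classes x 4 features).
    local_importances = [
        [0.02, 0.01, -0.30, -0.25],
        [0.01, -0.02, 0.05, 0.04],
        [-0.03, 0.01, 0.25, 0.21],
    ]
    print(explain_prediction(local_importances, features, predicted_class=2))
```

Indexing by the predicted class answers "why did the model choose this class?"; the other rows explain the evidence for and against the classes it did not choose.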
@@ -7,6 +7,13 @@
"# Boston Housing Price Prediction with scikit-learn (run model explainer locally)" "# Boston Housing Price Prediction with scikit-learn (run model explainer locally)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-regression.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -24,7 +31,8 @@
"\n", "\n",
"1. Train a GradientBoosting regression model using Scikit-learn\n", "1. Train a GradientBoosting regression model using Scikit-learn\n",
"2. Run 'explain_model' with full dataset in local mode, which doesn't contact any Azure services.\n", "2. Run 'explain_model' with full dataset in local mode, which doesn't contact any Azure services.\n",
"3. Run 'explain_model' with summarized dataset in local mode, which doesn't contact any Azure services." "3. Run 'explain_model' with summarized dataset in local mode, which doesn't contact any Azure services.\n",
"4. Visualize the global and local explanations with the visualization dashboard."
] ]
}, },
{ {
@@ -85,10 +93,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"clf = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n", "reg = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n",
" learning_rate=0.1, loss='huber',\n", " learning_rate=0.1, loss='huber',\n",
" random_state=1)\n", " random_state=1)\n",
"model = clf.fit(x_train, y_train)" "model = reg.fit(x_train, y_train)"
] ]
}, },
{ {
@@ -125,15 +133,6 @@
"global_explanation = tabular_explainer.explain_global(x_test)" "global_explanation = tabular_explainer.explain_global(x_test)"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"help(global_explanation)"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -196,16 +195,58 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# local feature importance information\n", "# sorted local feature importance information; reflects the original feature order\n",
"local_importance_values = local_explanation.local_importance_values\n", "sorted_local_importance_names = local_explanation.get_ranked_local_names()\n",
"print('local importance values: {}'.format(local_importance_values))" "sorted_local_importance_values = local_explanation.get_ranked_local_values()\n",
"\n",
"print('sorted local importance names: {}'.format(sorted_local_importance_names))\n",
"print('sorted local importance values: {}'.format(sorted_local_importance_values))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"# Or, in Jupyter Labs, uncomment below\n",
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"# jupyter labextension install microsoft-mli-widget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, model, x_test)"
] ]
} }
], ],
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "wamartin" "name": "mesameki"
} }
], ],
"kernelspec": { "kernelspec": {

@@ -0,0 +1,302 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summary\n",
"From raw data that is a mixture of categoricals and numeric, featurize the categoricals using one hot encoding. Use tabular explainer to get explain object and then get raw feature importances"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-raw-features/explain-sklearn-raw-features.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explain a model with the AML explain-model package on raw features\n",
"\n",
"1. Train a Logistic Regression model using Scikit-learn\n",
"2. Run 'explain_model' with full dataset in local mode, which doesn't contact any Azure services.\n",
"3. Run 'explain_model' with summarized dataset in local mode, which doesn't contact any Azure services.\n",
"4. Visualize the global and local explanations with the visualization dashboard."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example needs sklearn-pandas. If it is not installed, uncomment and run the following line."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install sklearn-pandas"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.pipeline import Pipeline\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn.linear_model import LogisticRegression\n",
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
"from sklearn_pandas import DataFrameMapper\n",
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"titanic_url = ('https://raw.githubusercontent.com/amueller/'\n",
" 'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')\n",
"data = pd.read_csv(titanic_url)\n",
"# fill missing values\n",
"data = data.fillna(method=\"ffill\")\n",
"data = data.fillna(method=\"bfill\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Run model explainer locally with full data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar to example [here](https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py), use a subset of columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"numeric_features = ['age', 'fare']\n",
"categorical_features = ['embarked', 'sex', 'pclass']\n",
"\n",
"y = data['survived'].values\n",
"X = data[categorical_features + numeric_features]\n",
"\n",
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.pipeline import Pipeline\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn_pandas import DataFrameMapper\n",
"\n",
"# Impute, standardize the numeric features and one-hot encode the categorical features. \n",
"\n",
"transformations = [\n",
" ([\"age\", \"fare\"], Pipeline(steps=[\n",
" ('imputer', SimpleImputer(strategy='median')),\n",
" ('scaler', StandardScaler())\n",
" ])),\n",
" ([\"embarked\"], Pipeline(steps=[\n",
" (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n",
" (\"encoder\", OneHotEncoder(sparse=False))])),\n",
" ([\"sex\", \"pclass\"], OneHotEncoder(sparse=False)) \n",
"]\n",
"\n",
"\n",
"# Append classifier to preprocessing pipeline.\n",
"# Now we have a full prediction pipeline.\n",
"clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
" ('classifier', LogisticRegression(solver='lbfgs'))])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train a Logistic Regression model, which you want to explain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = clf.fit(x_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explain predictions on your local machine"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tabular_explainer = TabularExplainer(clf.steps[-1][1], initialization_examples=x_train, features=x_train.columns, transformations=transformations)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
"global_explanation = tabular_explainer.explain_global(x_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sorted_global_importance_values = global_explanation.get_ranked_global_values()\n",
"sorted_global_importance_names = global_explanation.get_ranked_global_names()\n",
"dict(zip(sorted_global_importance_names, sorted_global_importance_values))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explain overall model predictions as a collection of local (instance-level) explanations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# explain the first member of the test set\n",
"local_explanation = tabular_explainer.explain_local(x_test[:1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get the prediction for the first member of the test set and explain why model made that prediction\n",
"prediction_value = clf.predict(x_test)[0]\n",
"\n",
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]\n",
"\n",
"# Sorted local SHAP values\n",
"print('ranked local importance values: {}'.format(sorted_local_importance_values))\n",
"# Corresponding feature names\n",
"print('ranked local importance names: {}'.format(sorted_local_importance_names))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Load visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"# Or, in Jupyter Labs, uncomment below\n",
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"# jupyter labextension install microsoft-mli-widget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, model, x_test)"
]
}
],
"metadata": {
"authors": [
{
"name": "mesameki"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
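The ranking cell in the first notebook above indexes `get_ranked_local_values()` and `get_ranked_local_names()` by the predicted class. The plain-Python sketch below pictures that step for a single explained example. It assumes the local importance values are laid out as `[class][feature]` and that "ranked" means sorted by signed value, largest first; `rank_local_importance`, the feature names, and the numbers are hypothetical and not part of the azureml SDK.

```python
from typing import List, Sequence, Tuple


def rank_local_importance(
    values_per_class: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    class_index: int,
) -> Tuple[List[float], List[str]]:
    """Return (ranked_values, ranked_names) for one class of one example.

    Hypothetical helper: assumes values_per_class[class][feature] and a
    descending sort on the signed importance value.
    """
    row = values_per_class[class_index]
    if len(row) != len(feature_names):
        raise ValueError("one importance value is required per feature")
    # Sort feature indices by their importance value, largest first.
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    return [row[i] for i in order], [feature_names[i] for i in order]


# Made-up values for a 2-class model with 3 features (illustration only).
local_values = [
    [0.10, -0.30, 0.05],   # class 0
    [-0.10, 0.30, -0.05],  # class 1
]
names = ["feature_0", "feature_1", "feature_2"]
predicted_class = 1

ranked_values, ranked_names = rank_local_importance(local_values, names, predicted_class)
print(ranked_values)  # [0.3, -0.05, -0.1]
print(ranked_names)   # ['feature_1', 'feature_2', 'feature_0']
```

Indexing by the predicted class keeps the explanation focused on why the model chose that class rather than any other.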

@@ -7,6 +7,13 @@
"# Breast cancer diagnosis classification with scikit-learn (save model explanations via AML Run History)" "# Breast cancer diagnosis classification with scikit-learn (save model explanations via AML Run History)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-classification.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -229,7 +236,7 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "wamartin" "name": "mesameki"
} }
], ],
"kernelspec": { "kernelspec": {

@@ -7,6 +7,13 @@
"# Boston Housing Price Prediction with scikit-learn (save model explanations via AML Run History)" "# Boston Housing Price Prediction with scikit-learn (save model explanations via AML Run History)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-run-history/explain-run-history-sklearn-regression.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -243,7 +250,7 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "wamartin" "name": "mesameki"
} }
], ],
"kernelspec": { "kernelspec": {

@@ -1,267 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment these if explanation packages are not already installed in your environment\n",
"#!pip install --upgrade azureml-sdk[explain]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explain a model with the AML explain-model package\n",
"\n",
"1. Train an SVM model using scikit-learn\n",
"2. Run 'explain_model' in local mode, which doesn't contact any Azure services\n",
"3. Run 'explain_model' with AML Run History, which uses the Run History service to store and manage the explanation data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Disclaimer: this notebook is a preview of model explainability, and the APIs shown below are subject to breaking changes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train an SVM model to explain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import Iris dataset\n",
"from sklearn import datasets\n",
"iris = datasets.load_iris()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Split data into train and test\n",
"from sklearn.model_selection import train_test_split\n",
"x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fit an SVM model with scikit-learn\n",
"def create_scikit_learn_model(X, y):\n",
" from sklearn import svm\n",
" clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
" model = clf.fit(X, y)\n",
" return model\n",
"model = create_scikit_learn_model(x_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run model explainer locally"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.tabular_explainer import TabularExplainer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"start = time.time()\n",
"\n",
"explainer = TabularExplainer(model, x_train, features=iris.feature_names)\n",
"global_explanation = explainer.explain_global(x_test)\n",
"\n",
"# importance values for each class, test example, and feature (local importance)\n",
"local_imp_values = global_explanation.local_importance_values\n",
"# expected (base) value of the model output, before any feature contributions are added\n",
"expected_values = global_explanation.expected_values\n",
"# global feature importance information\n",
"global_imp_values = global_explanation.global_importance_values\n",
"ranked_global_imp_names = global_explanation.get_ranked_global_names()\n",
"# global per-class feature importance information\n",
"per_class_imp_values = global_explanation.per_class_values\n",
"ranked_per_class_imp_names = global_explanation.get_ranked_per_class_names()\n",
"\n",
"end = time.time()\n",
"print(end - start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run model explainer with AML Run History"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"from azureml.core import Workspace, Experiment, Run\n",
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = 'explain_model'\n",
"experiment = Experiment(ws, experiment_name)\n",
"run = experiment.start_logging()\n",
"client = ExplanationClient.from_run(run)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"start = time.time()\n",
"explainer = TabularExplainer(model, x_train, features=iris.feature_names, classes=iris.target_names)\n",
"explanation = explainer.explain_global(x_test)\n",
"client.upload_model_explanation(explanation)\n",
"end = time.time()\n",
"print(end - start)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"explanation_from_run = client.download_model_explanation()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# global feature importance information\n",
"global_imp_values = explanation_from_run.global_importance_values\n",
"global_imp_names = explanation_from_run.get_ranked_global_names()\n",
"# global per-class feature importance information\n",
"per_class_imp_values = explanation_from_run.per_class_values\n",
"per_class_imp_names = explanation_from_run.get_ranked_per_class_names()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## This visualization is unsupported, and is not guaranteed to work in the future"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the SHAP values and explore them locally\n",
"import shap\n",
"import numpy as np\n",
"shap.initjs()\n",
"display(shap.force_plot(explanation_from_run.expected_values[1], np.asarray(explanation_from_run.local_importance_values[1]), x_test))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.complete()"
]
}
],
"metadata": {
"authors": [
{
"name": "wamartin"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
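The deleted notebook reads both `local_importance_values` and `global_importance_values` from the same explanation. For SHAP-style explainers, a common convention (assumed here, not confirmed by this notebook) is to summarize local values into one global score per feature by averaging their absolute values over the explained examples, then ranking features by that score. The sketch below shows that convention in plain Python for a single class; `global_importance` and the toy matrix are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def global_importance(
    local_values: Sequence[Sequence[float]],
    feature_names: Sequence[str],
) -> List[Tuple[str, float]]:
    """Rank features by mean absolute local importance.

    local_values[example][feature] holds one class's local importance
    values (an assumed layout). Returns (name, score) pairs, largest first.
    """
    n_features = len(feature_names)
    if any(len(row) != n_features for row in local_values):
        raise ValueError("every example needs one value per feature")
    # Mean of |local importance| per feature across all explained examples.
    scores = [fmean(abs(row[j]) for row in local_values) for j in range(n_features)]
    return sorted(zip(feature_names, scores), key=lambda pair: pair[1], reverse=True)


# Toy local importance values for 3 test examples and 2 features.
toy_local = [
    [0.5, -0.1],
    [-0.3, 0.2],
    [0.1, 0.0],
]
ranking = global_importance(toy_local, ["sepal length (cm)", "petal width (cm)"])
print(ranking)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks high.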

@@ -37,20 +37,12 @@ Azure Machine Learning Pipelines optimize for simplicity, speed, and efficiency.
In this directory, there are two types of notebooks: In this directory, there are two types of notebooks:
* The first type of notebooks will introduce you to core Azure Machine Learning Pipelines features. These notebooks below belong in this category, and are designed to go in sequence; they're all located in the "intro-to-pipelines" folder: * The first type of notebooks will introduce you to core Azure Machine Learning Pipelines features. These notebooks below belong in this category, and are designed to go in sequence; they're all located in the "intro-to-pipelines" folder:
Take a look at [intro-to-pipelines](./intro-to-pipelines/) for the list of notebooks that introduce you to Azure Machine Learning Pipelines concepts.
1. [aml-pipelines-getting-started.ipynb](https://aka.ms/pl-get-started): Start with this notebook to understand the concepts of using Azure Machine Learning Pipelines. This notebook shows you how to run steps in parallel and in sequence.
2. [aml-pipelines-with-data-dependency-steps.ipynb](https://aka.ms/pl-data-dep): This notebook shows how to connect steps in your pipeline using data. Data produced by one step is used by subsequent steps to force an explicit dependency between steps.
3. [aml-pipelines-publish-and-run-using-rest-endpoint.ipynb](https://aka.ms/pl-pub-rep): Once you are satisfied with your iterative runs, you can publish your pipeline to get a REST endpoint that can also be invoked from non-Python clients.
4. [aml-pipelines-data-transfer.ipynb](https://aka.ms/pl-data-trans): This notebook shows how to transfer data between supported datastores.
5. [aml-pipelines-use-databricks-as-compute-target.ipynb](https://aka.ms/pl-databricks): This notebook shows how you can use Pipelines to send your compute payload to Azure Databricks.
6. [aml-pipelines-use-adla-as-compute-target.ipynb](https://aka.ms/pl-adla): This notebook shows how you can use Azure Data Lake Analytics (ADLA) as a compute target.
7. [aml-pipelines-how-to-use-estimatorstep.ipynb](https://aka.ms/pl-estimator): This notebook shows how to use the EstimatorStep.
8. [aml-pipelines-parameter-tuning-with-hyperdrive.ipynb](https://aka.ms/pl-hyperdrive): HyperDriveStep in Pipelines shows how you can do hyperparameter tuning using Pipelines.
9. [aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb](https://aka.ms/pl-azbatch): AzureBatchStep can be used to run your custom code in an Azure Batch cluster.
10. [aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb](https://aka.ms/pl-schedule): Once you publish a Pipeline, you can schedule it to trigger based on an interval or on data change in a defined datastore.
11. [aml-pipelines-with-automated-machine-learning-step.ipynb](https://aka.ms/pl-automl): AutoMLStep in Pipelines shows how you can do automated machine learning using Pipelines.
* The second type of notebooks illustrate more sophisticated scenarios, and are independent of each other. These notebooks include: * The second type of notebooks illustrate more sophisticated scenarios, and are independent of each other. These notebooks include:
1. [pipeline-batch-scoring.ipynb](https://aka.ms/pl-batch-score): This notebook demonstrates how to run a batch scoring job using Azure Machine Learning pipelines. 1. [pipeline-batch-scoring.ipynb](https://aka.ms/pl-batch-score): This notebook demonstrates how to run a batch scoring job using Azure Machine Learning pipelines.
2. [pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans) 2. [pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans): This notebook demonstrates a multi-step pipeline that uses GPU compute.
3. [nyc-taxi-data-regression-model-building.ipynb](https://aka.ms/pl-nyctaxi-tutorial): This notebook is an AzureML Pipelines version of the previously published two-part sample.
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/README.png)
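The getting-started and data-dependency notebooks listed above describe steps running in parallel unless data produced by one step is consumed by another. The plain-Python sketch below illustrates that idea with the standard-library `graphlib` module (Python 3.9+): steps with no unmet dependencies form one "wave" that could run concurrently, and each data dependency pushes a step into a later wave. The step names and `execution_waves` are hypothetical, and this is not how the Azure Machine Learning scheduler is implemented.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves; steps in the same wave do not depend on each
    other's outputs and could run concurrently.

    dependencies maps each step to the set of steps whose outputs it consumes.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical pipeline: three independent steps feed one final step.
steps = {
    "train_step": set(),
    "compare_step": set(),
    "extract_step": set(),
    "report_step": {"train_step", "compare_step", "extract_step"},
}
for number, wave in enumerate(execution_waves(steps), start=1):
    print(number, wave)
```

Without the `report_step` entry, all three steps would land in a single wave, which matches the "run in parallel" case; chaining them through data produces one step per wave, the "run in sequence" case.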

@@ -13,3 +13,6 @@ These notebooks below are designed to go in sequence.
8. [aml-pipelines-parameter-tuning-with-hyperdrive.ipynb](https://aka.ms/pl-hyperdrive): HyperDriveStep in Pipelines shows how you can do hyper parameter tuning using Pipelines. 8. [aml-pipelines-parameter-tuning-with-hyperdrive.ipynb](https://aka.ms/pl-hyperdrive): HyperDriveStep in Pipelines shows how you can do hyper parameter tuning using Pipelines.
9. [aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb](https://aka.ms/pl-azbatch): AzureBatchStep can be used to run your custom code in AzureBatch cluster. 9. [aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb](https://aka.ms/pl-azbatch): AzureBatchStep can be used to run your custom code in AzureBatch cluster.
10. [aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb](https://aka.ms/pl-schedule): Once you publish a Pipeline, you can schedule it to trigger based on an interval or on data change in a defined datastore. 10. [aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb](https://aka.ms/pl-schedule): Once you publish a Pipeline, you can schedule it to trigger based on an interval or on data change in a defined datastore.
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.png)
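Item 10 in the list above mentions triggering a published pipeline on an interval. Conceptually, a fixed-interval recurrence fires at a start time plus whole multiples of the interval; the sketch below computes the next few trigger times with `datetime`. `next_trigger_times` is hypothetical and only illustrates the arithmetic; the real scheduling is handled by the Azure Machine Learning service, which this sketch does not call.

```python
from datetime import datetime, timedelta
from typing import List


def next_trigger_times(
    start: datetime, interval: timedelta, now: datetime, count: int
) -> List[datetime]:
    """Return the next `count` trigger times strictly after `now` for a
    schedule that fires at start, start + interval, start + 2 * interval, ...
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < start:
        first = start
    else:
        # Whole intervals elapsed since start, plus one to land after `now`.
        elapsed = (now - start) // interval + 1
        first = start + elapsed * interval
    return [first + i * interval for i in range(count)]


start = datetime(2019, 6, 20, 0, 0)
times = next_trigger_times(start, timedelta(minutes=15), datetime(2019, 6, 20, 0, 20), 3)
print([t.strftime("%H:%M") for t in times])  # ['00:30', '00:45', '01:00']
```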

@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -53,7 +60,7 @@
"source": [ "source": [
"## Initialize Workspace\n", "## Initialize Workspace\n",
"\n", "\n",
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n", "Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json\n",
"\n", "\n",
"If you don't have a config.json file, please go through the configuration Notebook located here:\n", "If you don't have a config.json file, please go through the configuration Notebook located here:\n",
"https://github.com/Azure/MachineLearningNotebooks. \n", "https://github.com/Azure/MachineLearningNotebooks. \n",
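`Workspace.from_config()` reads the persisted `config.json` mentioned above, which identifies the workspace by `subscription_id`, `resource_group`, and `workspace_name`. The sketch below reads and validates such a file using only the standard library, which can help diagnose a missing or incomplete config before calling the SDK. `load_workspace_config` is a hypothetical helper, not an azureml function: it only reads the single path it is given (the SDK may also look in other locations) and it does not authenticate to Azure.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path: str = "config.json") -> Dict[str, str]:
    """Read a workspace config file and check that the expected keys exist.

    Hypothetical helper for illustration; it does not contact Azure.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(
            f"{config_path} not found; run the configuration notebook first"
        )
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config file is missing: {', '.join(missing)}")
    return {key: config[key] for key in REQUIRED_KEYS}


# Usage with placeholder values written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "config.json"
    sample.write_text(
        json.dumps(
            {
                "subscription_id": "<subscription-id>",
                "resource_group": "<resource-group>",
                "workspace_name": "<workspace-name>",
            }
        ),
        encoding="utf-8",
    )
    cfg = load_workspace_config(str(sample))
print(cfg["workspace_name"])  # <workspace-name>
```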

@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -37,7 +44,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Prerequisites and Azure Machine Learning Basics\n", "## Prerequisites and Azure Machine Learning Basics\n",
"Make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n" "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n"
] ]
}, },
{ {
@@ -58,8 +65,6 @@
"import os\n", "import os\n",
"import azureml.core\n", "import azureml.core\n",
"from azureml.core import Workspace, Experiment, Datastore\n", "from azureml.core import Workspace, Experiment, Datastore\n",
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"from azureml.widgets import RunDetails\n", "from azureml.widgets import RunDetails\n",
"\n", "\n",
"# Check core SDK version number\n", "# Check core SDK version number\n",
@@ -109,36 +114,20 @@
"ws = Workspace.from_config()\n", "ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n", "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n",
"\n", "\n",
"# Default datastore (Azure file storage)\n", "# Default datastore\n",
"def_file_store = ws.get_default_datastore() \n", "def_blob_store = ws.get_default_datastore() \n",
"# The above call is equivalent to Datastore(ws, \"workspacefilestore\") or simply Datastore(ws)\n",
"print(\"Default datastore's name: {}\".format(def_file_store.name))\n",
"\n",
"# Blob storage associated with the workspace\n",
"# The following call GETS the Azure Blob Store associated with your workspace.\n", "# The following call GETS the Azure Blob Store associated with your workspace.\n",
"# Note that workspaceblobstore is **the name of this store and CANNOT BE CHANGED and must be used as is** \n", "# Note that workspaceblobstore is **the name of this store and CANNOT BE CHANGED and must be used as is** \n",
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n", "def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
"print(\"Blobstore's name: {}\".format(def_blob_store.name))" "print(\"Blobstore's name: {}\".format(def_blob_store.name))"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# project folder\n",
"project_folder = '.'\n",
" \n",
"print('Sample projects will be created in {}.'.format(os.path.realpath(project_folder)))"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Required data and script files for the tutorial\n", "### Required data and script files for the tutorial\n",
"Sample files required to finish this tutorial are already copied to the project folder specified above. Even though the .py provided in the samples don't have much \"ML work,\" as a data scientist, you will work on this extensively as part of your work. To complete this tutorial, the contents of these files are not very important. The one-line files are for demonstration purposes only." "Sample files required to finish this tutorial are already copied to the corresponding source_directory locations. Even though the .py provided in the samples don't have much \"ML work,\" as a data scientist, you will work on this extensively as part of your work. To complete this tutorial, the contents of these files are not very important. The one-line files are for demonstration purposes only."
] ]
}, },
{ {
@@ -146,7 +135,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Datastore concepts\n", "### Datastore concepts\n",
"A [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py) is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target. \n", "A [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target. \n",
"\n", "\n",
"A Datastore can either be backed by an Azure File Storage (default) or by an Azure Blob Storage.\n", "A Datastore can be backed by either an Azure Blob Storage (the workspace default) or an Azure File Storage.\n",
"\n", "\n",
@@ -169,19 +158,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# get_default_datastore() gets the default Azure File Store associated with your workspace.\n", "# get_default_datastore() gets the default Azure Blob Store associated with your workspace.\n",
"# Here we are reusing the def_file_store object we obtained earlier\n", "# Here we are reusing the def_blob_store object we obtained earlier\n",
"\n",
"# target_path is the directory at the destination\n",
"def_file_store.upload_files(['./20news.pkl'], \n",
" target_path = '20newsgroups', \n",
" overwrite = True, \n",
" show_progress = True)\n",
"\n",
"# Here we are reusing the def_blob_store we created earlier\n",
"def_blob_store.upload_files([\"./20news.pkl\"], target_path=\"20newsgroups\", overwrite=True)\n", "def_blob_store.upload_files([\"./20news.pkl\"], target_path=\"20newsgroups\", overwrite=True)\n",
"\n", "print(\"Upload call completed\")"
"print(\"Upload calls completed\")"
] ]
}, },
{ {
@@ -189,7 +169,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### (Optional) See your files using Azure Portal\n", "#### (Optional) See your files using Azure Portal\n",
"Once you successfully uploaded the files, you can browse to them (or upload more files) using [Azure Portal](https://portal.azure.com). At the portal, make sure you have selected **AzureML Nursery** as your subscription (click *Resource Groups* and then select the subscription). Then look for your **Machine Learning Workspace** (it has your *alias* as the name). It has a link to your storage. Click on the storage link. It will take you to a page where you can see [Blobs](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction), [Files](https://docs.microsoft.com/en-us/azure/storage/files/storage-files-introduction), [Tables](https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-overview), and [Queues](https://docs.microsoft.com/en-us/azure/storage/queues/storage-queues-introduction). We have just uploaded a file to the Blob storage and another one to the File storage. You should be able to see both of these files in their respective locations. " "Once you have successfully uploaded the file, you can browse to it (or upload more files) using [Azure Portal](https://portal.azure.com). At the portal, make sure you have selected your subscription (click *Resource Groups* and then select the subscription). Then look for your **Machine Learning Workspace** name. It has a link to your storage. Click on the storage link. It will take you to a page where you can see [Blobs](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction), [Files](https://docs.microsoft.com/en-us/azure/storage/files/storage-files-introduction), [Tables](https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-overview), and [Queues](https://docs.microsoft.com/en-us/azure/storage/queues/storage-queues-introduction). In the step above we uploaded a file to the Blob storage, so you should be able to see it under Blobs. "
] ]
}, },
{ {
@@ -243,9 +223,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"aml_compute_target = \"aml-compute\"\n", "aml_compute_target = \"cpu-cluster\"\n",
"try:\n", "try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n", " aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n", " print(\"found existing compute target.\")\n",
@@ -295,15 +276,20 @@
"## Creating a Step in a Pipeline\n", "## Creating a Step in a Pipeline\n",
"A Step is a unit of execution. Step typically needs a target of execution (compute target), a script to execute, and may require script arguments and inputs, and can produce outputs. The step also could take a number of other parameters. Azure Machine Learning Pipelines provides the following built-in Steps:\n", "A Step is a unit of execution. Step typically needs a target of execution (compute target), a script to execute, and may require script arguments and inputs, and can produce outputs. The step also could take a number of other parameters. Azure Machine Learning Pipelines provides the following built-in Steps:\n",
"\n", "\n",
"- [**PythonScriptStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py): Add a step to run a Python script in a Pipeline.\n", "- [**PythonScriptStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py): Adds a step to run a Python script in a Pipeline.\n",
"- [**AdlaStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.adla_step.adlastep?view=azure-ml-py): Adds a step to run U-SQL script using Azure Data Lake Analytics.\n", "- [**AdlaStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.adla_step.adlastep?view=azure-ml-py): Adds a step to run U-SQL script using Azure Data Lake Analytics.\n",
"- [**DataTransferStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.data_transfer_step.datatransferstep?view=azure-ml-py): Transfers data between Azure Blob and Data Lake accounts.\n", "- [**DataTransferStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.data_transfer_step.datatransferstep?view=azure-ml-py): Transfers data between Azure Blob and Data Lake accounts.\n",
"- [**DatabricksStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py): Adds a DataBricks notebook as a step in a Pipeline.\n", "- [**DatabricksStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py): Adds a DataBricks notebook as a step in a Pipeline.\n",
"- [**HyperDriveStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.hyper_drive_step.hyperdrivestep?view=azure-ml-py): Creates a Hyper Drive step for Hyper Parameter Tuning in a Pipeline.\n", "- [**HyperDriveStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.hyper_drive_step.hyperdrivestep?view=azure-ml-py): Creates a Hyper Drive step for Hyper Parameter Tuning in a Pipeline.\n",
"- [**AzureBatchStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.azurebatch_step.azurebatchstep?view=azure-ml-py): Creates a step for submitting jobs to Azure Batch.\n",
"- [**EstimatorStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.estimator_step.estimatorstep?view=azure-ml-py): Adds a step to run an Estimator in a Pipeline.\n",
"- [**MpiStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.mpi_step.mpistep?view=azure-ml-py): Adds a step to run an MPI job in a Pipeline.\n",
"- [**AutoMLStep**](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py): Creates an AutoML step in a Pipeline.\n",
"\n", "\n",
"The following code will create a PythonScriptStep to be executed in the Azure Machine Learning Compute we created above using train.py, one of the files already made available in the project folder.\n", "The following code will create a PythonScriptStep to be executed in the Azure Machine Learning Compute we created above using train.py, one of the files already made available in the `source_directory`.\n",
"\n", "\n",
"A **PythonScriptStep** is a basic, built-in step to run a Python Script on a compute target. It takes a script name and optionally other parameters like arguments for the script, compute target, inputs and outputs. If no compute target is specified, default compute target for the workspace is used. You can also use a [**RunConfiguration**](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.runconfiguration?view=azure-ml-py) to specify requirements for the PythonScriptStep, such as conda dependencies and docker image." "A **PythonScriptStep** is a basic, built-in step to run a Python Script on a compute target. It takes a script name and optionally other parameters like arguments for the script, compute target, inputs and outputs. If no compute target is specified, default compute target for the workspace is used. You can also use a [**RunConfiguration**](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.runconfiguration?view=azure-ml-py) to specify requirements for the PythonScriptStep, such as conda dependencies and docker image.\n",
"> The best practice is to give each step its own folder containing its script and dependent files, and to specify that folder as the step's `source_directory`. This reduces the size of the snapshot created for the step, because only that folder is snapshotted. Since a change to any file in the `source_directory` triggers a re-upload of the snapshot, keeping folders separate also lets a step be reused as long as nothing in its own `source_directory` has changed."
] ]
}, },
{ {
@@ -314,6 +300,9 @@
"source": [ "source": [
"# Uses default values for PythonScriptStep construct.\n", "# Uses default values for PythonScriptStep construct.\n",
"\n", "\n",
"source_directory = './train'\n",
"print('Source directory for the step is {}.'.format(os.path.realpath(source_directory)))\n",
"\n",
"# Syntax\n", "# Syntax\n",
"# PythonScriptStep(\n", "# PythonScriptStep(\n",
"# script_name, \n", "# script_name, \n",
@@ -332,7 +321,7 @@
"step1 = PythonScriptStep(name=\"train_step\",\n", "step1 = PythonScriptStep(name=\"train_step\",\n",
" script_name=\"train.py\", \n", " script_name=\"train.py\", \n",
" compute_target=aml_compute, \n", " compute_target=aml_compute, \n",
" source_directory=project_folder,\n", " source_directory=source_directory,\n",
" allow_reuse=True)\n", " allow_reuse=True)\n",
"print(\"Step1 created\")" "print(\"Step1 created\")"
] ]
@@ -362,12 +351,15 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# All steps use files already available in the project_folder\n", "# For this step, we use a different source_directory\n",
"source_directory = './compare'\n",
"print('Source directory for the step is {}.'.format(os.path.realpath(source_directory)))\n",
"\n",
"# All steps use the same Azure Machine Learning compute target as well\n", "# All steps use the same Azure Machine Learning compute target as well\n",
"step2 = PythonScriptStep(name=\"compare_step\",\n", "step2 = PythonScriptStep(name=\"compare_step\",\n",
" script_name=\"compare.py\", \n", " script_name=\"compare.py\", \n",
" compute_target=aml_compute, \n", " compute_target=aml_compute, \n",
" source_directory=project_folder)\n", " source_directory=source_directory)\n",
"\n", "\n",
"# Use a RunConfiguration to specify some additional requirements for this step.\n", "# Use a RunConfiguration to specify some additional requirements for this step.\n",
"from azureml.core.runconfig import RunConfiguration\n", "from azureml.core.runconfig import RunConfiguration\n",
@@ -386,16 +378,17 @@
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", "# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n", "run_config.environment.python.user_managed_dependencies = False\n",
"\n", "\n",
"# auto-prepare the Docker image when used for execution (if it is not already prepared)\n",
"run_config.auto_prepare_environment = True\n",
"\n",
"# specify CondaDependencies obj\n", "# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])\n", "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
"\n", "\n",
"# For this step, we use yet another source_directory\n",
"source_directory = './extract'\n",
"print('Source directory for the step is {}.'.format(os.path.realpath(source_directory)))\n",
"\n",
"step3 = PythonScriptStep(name=\"extract_step\",\n", "step3 = PythonScriptStep(name=\"extract_step\",\n",
" script_name=\"extract.py\", \n", " script_name=\"extract.py\", \n",
" compute_target=aml_compute, \n", " compute_target=aml_compute, \n",
" source_directory=project_folder,\n", " source_directory=source_directory,\n",
" runconfig=run_config)\n", " runconfig=run_config)\n",
"\n", "\n",
"# list of steps to run\n", "# list of steps to run\n",
@@ -467,8 +460,8 @@
"source": [ "source": [
"# Submit syntax\n", "# Submit syntax\n",
"# submit(experiment_name, \n", "# submit(experiment_name, \n",
"# pipeline_parameters=None, \n", "# pipeline_params=None, \n",
"# continue_on_node_failure=False, \n", "# continue_on_step_failure=False, \n",
"# regenerate_outputs=False)\n", "# regenerate_outputs=False)\n",
"\n", "\n",
"pipeline_run1 = Experiment(ws, 'Hello_World1').submit(pipeline1, regenerate_outputs=False)\n", "pipeline_run1 = Experiment(ws, 'Hello_World1').submit(pipeline1, regenerate_outputs=False)\n",
@@ -611,7 +604,7 @@
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "diray" "name": "sanpil"
} }
], ],
"kernelspec": { "kernelspec": {

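The hunks above move `compare_step` and `extract_step` off the shared `project_folder`. Each now gets its own `source_directory` (`./compare`, `./extract`), while `train_step` keeps `allow_reuse=True`. As we understand it, Azure ML snapshots each step's source directory, and a change to that snapshot invalidates step reuse. Per-step folders therefore stop an edit in one step from forcing the others to rerun. The sketch below only illustrates that idea. `dir_fingerprint` and `write_script` are hypothetical helpers, not SDK functions, and the SDK's real hashing scheme may differ.

```python
import hashlib
import os
import tempfile


def dir_fingerprint(path):
    """Hypothetical helper: SHA-256 over every file's relative path and bytes."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(path):
        dirs.sort()  # walk subfolders in a deterministic order
        for name in sorted(files):
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, path).replace(os.sep, "/")
            digest.update(rel_path.encode("utf-8") + b"\0")
            with open(full_path, "rb") as f:
                digest.update(f.read() + b"\0")
    return digest.hexdigest()


def write_script(folder, name, body):
    """Hypothetical helper: create `folder` if needed and write one step script into it."""
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, name), "w") as f:
        f.write(body)


# One folder per step in a scratch directory, mirroring ./compare and ./extract.
root = tempfile.mkdtemp()
train_dir = os.path.join(root, "train")
compare_dir = os.path.join(root, "compare")
write_script(train_dir, "train.py", "print('train')\n")
write_script(compare_dir, "compare.py", "print('compare')\n")

train_before = dir_fingerprint(train_dir)
compare_before = dir_fingerprint(compare_dir)

# Edit only the compare step's script.
write_script(compare_dir, "compare.py", "print('compare v2')\n")

train_after = dir_fingerprint(train_dir)
compare_after = dir_fingerprint(compare_dir)
print("train unchanged:", train_before == train_after)      # True
print("compare changed:", compare_before != compare_after)  # True
```

With a single shared folder, the same edit would change the fingerprint of every step that points at it.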

@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -67,7 +74,7 @@
"source": [ "source": [
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n", "Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n",
"\n", "\n",
"If you don't have a config.json file, please go through the configuration Notebook located [here](https://github.com/Azure/MachineLearningNotebooks). \n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, if you don't have a config.json file, please go through the configuration Notebook located [here](https://github.com/Azure/MachineLearningNotebooks). \n",
"\n", "\n",
"This sets you up with a working config file that has information on your workspace, subscription id, etc. " "This sets you up with a working config file that has information on your workspace, subscription id, etc. "
] ]
@@ -189,8 +196,8 @@
"\n", "\n",
"\n", "\n",
"def upload_file_to_datastore(datastore, file_name, content):\n", "def upload_file_to_datastore(datastore, file_name, content):\n",
" dir = create_local_file(content=content, file_name=file_name)\n", " src_dir = create_local_file(content=content, file_name=file_name)\n",
" datastore.upload(src_dir=dir, overwrite=True, show_progress=True)" " datastore.upload(src_dir=src_dir, overwrite=True, show_progress=True)"
] ]
}, },
{ {
@@ -245,7 +252,7 @@
"\n", "\n",
"file_name=\"azurebatch.cmd\"\n", "file_name=\"azurebatch.cmd\"\n",
"with open(path.join(binaries_folder, file_name), 'w') as f:\n", "with open(path.join(binaries_folder, file_name), 'w') as f:\n",
" f.write(\"copy \\\"%1\\\" \\\"%2\\\"\")" " f.write(\"copy \\\"%1\\\" \\\"%2\\\"\")"
] ]
}, },
{ {

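The `upload_file_to_datastore` fix renames the local variable `dir` to `src_dir`. The old name shadowed Python's built-in `dir()`; the new one matches the keyword it is passed to. The body of `create_local_file` is not part of this hunk. The version below is a hypothetical stand-in that assumes it writes `content` into a fresh local directory and returns that directory, which is what `datastore.upload(src_dir=...)` takes. The sketch also checks that the escaped Python literal written to `azurebatch.cmd` produces the batch line `copy "%1" "%2"`. It then mimics what that line does with `shutil.copyfile`: copy the file named by the first argument to the path named by the second.

```python
import os
import shutil
import tempfile


def create_local_file(content, file_name):
    """Hypothetical stand-in: write `content` to `file_name` in a new temp dir and return the dir."""
    src_dir = tempfile.mkdtemp()  # named src_dir so the built-in dir() stays usable
    with open(os.path.join(src_dir, file_name), "w") as f:
        f.write(content)
    return src_dir


# The same escaped literal the notebook writes to azurebatch.cmd.
cmd_line = "copy \"%1\" \"%2\""
print(cmd_line)  # copy "%1" "%2"

# Rough Python equivalent of running `azurebatch.cmd <source> <destination>`.
src_dir = create_local_file(content="hello from the datastore", file_name="input.txt")
source = os.path.join(src_dir, "input.txt")
destination = os.path.join(tempfile.mkdtemp(), "output.txt")
shutil.copyfile(source, destination)

with open(destination) as f:
    copied = f.read()
print(copied)  # hello from the datastore
```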

@@ -9,6 +9,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -20,7 +27,7 @@
"\n", "\n",
"## Prerequisite:\n", "## Prerequisite:\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [configuration notebook](../../../configuration.ipynb) to:\n", "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n",
" * install the AML SDK\n", " * install the AML SDK\n",
" * create a workspace and its configuration file (`config.json`)" " * create a workspace and its configuration file (`config.json`)"
] ]
@@ -93,7 +100,7 @@
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# choose a name for your cluster\n", "# choose a name for your cluster\n",
"cluster_name = \"cpucluster\"\n", "cluster_name = \"cpu-cluster\"\n",
"\n", "\n",
"try:\n", "try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n", " cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
@@ -117,7 +124,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpucluster' of type `AmlCompute`." "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpu-cluster' of type `AmlCompute`."
] ]
}, },
{ {

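The notebooks now look up `cpu-cluster` instead of `cpucluster`. The cell uses a get-or-create pattern: try `ComputeTarget(workspace=ws, name=cluster_name)`, and create the cluster on `ComputeTargetException`. A workspace that still holds an older `cpucluster` will therefore not reuse it; a second cluster named `cpu-cluster` gets provisioned instead. The pure-Python sketch below models that behavior, with a plain dict standing in for the workspace. `lookup`, `get_or_create` and `NotFound` are hypothetical names, not SDK APIs.

```python
class NotFound(Exception):
    """Hypothetical stand-in for azureml.exceptions.ComputeTargetException."""


def lookup(registry, name):
    """Mimic ComputeTarget(workspace=ws, name=name): raise if the name is unknown."""
    if name not in registry:
        raise NotFound(name)
    return registry[name]


def get_or_create(registry, name, create):
    """Return the existing target called `name`, or create and register a new one."""
    try:
        target = lookup(registry, name)
        print("found existing target:", name)
    except NotFound:
        target = create(name)
        registry[name] = target
        print("creating new target:", name)
    return target


# A workspace provisioned by an earlier version of the notebook.
workspace = {"cpucluster": "AmlCompute(cpucluster)"}

target = get_or_create(workspace, "cpu-cluster", lambda n: "AmlCompute({})".format(n))
print(sorted(workspace))  # ['cpu-cluster', 'cpucluster']
```

To keep using an existing cluster, set `cluster_name` to that cluster's current name.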

@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -15,9 +22,17 @@
"# Azure Machine Learning Pipeline with HyperDriveStep\n", "# Azure Machine Learning Pipeline with HyperDriveStep\n",
"\n", "\n",
"\n", "\n",
"This notebook is used to demonstrate the use of HyperDriveStep in AML Pipeline.\n", "This notebook is used to demonstrate the use of HyperDriveStep in AML Pipeline."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites and Azure Machine Learning Basics\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
"\n", "\n",
"## Azure Machine Learning and Pipeline SDK-specific imports\n" "## Azure Machine Learning and Pipeline SDK-specific imports"
] ]
}, },
{ {
@@ -26,19 +41,24 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import os\n",
"import shutil\n",
"import urllib\n",
"import azureml.core\n", "import azureml.core\n",
"from azureml.core import Workspace, Experiment\n", "from azureml.core import Workspace, Experiment\n",
"from azureml.core.datastore import Datastore\n", "from azureml.core.datastore import Datastore\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.exceptions import ComputeTargetException\n", "from azureml.exceptions import ComputeTargetException\n",
"from azureml.data.data_reference import DataReference\n", "from azureml.data.data_reference import DataReference\n",
"from azureml.pipeline.steps import HyperDriveStep\n", "from azureml.pipeline.steps import HyperDriveStep, HyperDriveStepRun\n",
"from azureml.pipeline.core import Pipeline, PipelineData\n", "from azureml.pipeline.core import Pipeline, PipelineData\n",
"from azureml.train.dnn import TensorFlow\n", "from azureml.train.dnn import TensorFlow\n",
"from azureml.train.hyperdrive import *\n", "# from azureml.train.hyperdrive import *\n",
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n",
"from azureml.train.hyperdrive import choice, loguniform\n",
"\n",
"import os\n",
"import shutil\n",
"import urllib\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n", "\n",
"# Check core SDK version number\n", "# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)" "print(\"SDK version:\", azureml.core.VERSION)"
@@ -50,7 +70,7 @@
"source": [ "source": [
"## Initialize workspace\n", "## Initialize workspace\n",
"\n", "\n",
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json" "Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json"
] ]
}, },
{ {
@@ -80,7 +100,7 @@
"script_folder = './tf-mnist'\n", "script_folder = './tf-mnist'\n",
"os.makedirs(script_folder, exist_ok=True)\n", "os.makedirs(script_folder, exist_ok=True)\n",
"\n", "\n",
"exp = Experiment(workspace=ws, name='tf-mnist')" "exp = Experiment(workspace=ws, name='Hyperdrive_sample')"
] ]
}, },
{ {
@@ -105,6 +125,42 @@
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Show some sample images\n",
"Let's load the downloaded compressed file into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from utils import load_data\n",
"\n",
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n",
"X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n",
"y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n",
"\n",
"X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n",
"y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n",
"\n",
"count = 0\n",
"sample_size = 30\n",
"plt.figure(figsize = (16, 6))\n",
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
" count = count + 1\n",
" plt.subplot(1, sample_size, count)\n",
" plt.axhline('')\n",
" plt.axvline('')\n",
" plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n",
" plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n",
"plt.show()"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -144,7 +200,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"cluster_name = \"gpucluster\"\n", "cluster_name = \"gpu-cluster\"\n",
"\n", "\n",
"try:\n", "try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
@@ -186,8 +242,12 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Create TensorFlow estimator\n", "## Create TensorFlow estimator\n",
"Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the Batch AI cluster as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n", "Next, we construct a [TensorFlow](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py) estimator object.\n",
"The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting docker." "The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting docker.\n",
"\n",
"The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release."
] ]
}, },
{ {
@@ -199,7 +259,8 @@
"est = TensorFlow(source_directory=script_folder, \n", "est = TensorFlow(source_directory=script_folder, \n",
" compute_target=compute_target,\n", " compute_target=compute_target,\n",
" entry_script='tf_mnist.py', \n", " entry_script='tf_mnist.py', \n",
" use_gpu=True)" " use_gpu=True,\n",
" framework_version='1.13')"
] ]
}, },
{ {
@@ -207,7 +268,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Intelligent hyperparameter tuning\n", "## Intelligent hyperparameter tuning\n",
"We have trained the model with one set of hyperparameters, now let's how we can do hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling.\n", "Now let's try hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling.\n",
"\n", "\n",
"In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`validation_acc`)." "In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`validation_acc`)."
] ]
@@ -259,13 +320,13 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"hd_config = HyperDriveRunConfig(estimator=est, \n", "hd_config = HyperDriveConfig(estimator=est, \n",
" hyperparameter_sampling=ps,\n", " hyperparameter_sampling=ps,\n",
" policy=early_termination_policy,\n", " policy=early_termination_policy,\n",
" primary_metric_name='validation_acc', \n", " primary_metric_name='validation_acc', \n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n",
" max_total_runs=1,\n", " max_total_runs=10,\n",
" max_concurrent_runs=1)" " max_concurrent_runs=4)"
] ]
}, },
{ {
@@ -274,6 +335,7 @@
"source": [ "source": [
"## Add HyperDrive as a step of pipeline\n", "## Add HyperDrive as a step of pipeline\n",
"\n", "\n",
"### Set up an input for the HyperDrive step\n",
"Let's setup a data reference for inputs of hyperdrive step." "Let's setup a data reference for inputs of hyperdrive step."
] ]
}, },
@@ -295,7 +357,7 @@
"### HyperDriveStep\n", "### HyperDriveStep\n",
"HyperDriveStep can be used to run HyperDrive job as a step in pipeline.\n", "HyperDriveStep can be used to run HyperDrive job as a step in pipeline.\n",
"- **name:** Name of the step\n", "- **name:** Name of the step\n",
"- **hyperdrive_run_config:** A HyperDriveRunConfig that defines the configuration for this HyperDrive run\n", "- **hyperdrive_config:** A HyperDriveConfig that defines the configuration for this HyperDrive run\n",
"- **estimator_entry_script_arguments:** List of command-line arguments for estimator entry script\n", "- **estimator_entry_script_arguments:** List of command-line arguments for estimator entry script\n",
"- **inputs:** List of input port bindings\n", "- **inputs:** List of input port bindings\n",
"- **outputs:** List of output port bindings\n", "- **outputs:** List of output port bindings\n",
@@ -315,9 +377,10 @@
" datastore=ds,\n", " datastore=ds,\n",
" pipeline_output_name=metrics_output_name)\n", " pipeline_output_name=metrics_output_name)\n",
"\n", "\n",
"hd_step_name='hd_step01'\n",
"hd_step = HyperDriveStep(\n", "hd_step = HyperDriveStep(\n",
" name=\"hyperdrive_module\",\n", " name=hd_step_name,\n",
" hyperdrive_run_config=hd_config,\n", " hyperdrive_config=hd_config,\n",
" estimator_entry_script_arguments=['--data-folder', data_folder],\n", " estimator_entry_script_arguments=['--data-folder', data_folder],\n",
" inputs=[data_folder],\n", " inputs=[data_folder],\n",
" metrics_output=metirics_data)" " metrics_output=metirics_data)"
@@ -337,7 +400,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"pipeline = Pipeline(workspace=ws, steps=[hd_step])\n", "pipeline = Pipeline(workspace=ws, steps=[hd_step])\n",
"pipeline_run = Experiment(ws, 'Hyperdrive_Test').submit(pipeline)" "pipeline_run = exp.submit(pipeline)"
] ]
}, },
{ {
@@ -370,7 +433,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"pipeline_run.wait_for_completion()" "# PUBLISHONLY\n",
"# pipeline_run.wait_for_completion()"
] ]
}, },
{ {
@@ -387,8 +451,9 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)\n", "# PUBLISHONLY\n",
"num_file_downloaded = metrics_output.download('.', show_progress=True)" "# metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)\n",
"# num_file_downloaded = metrics_output.download('.', show_progress=True)"
] ]
}, },
{ {
@@ -397,14 +462,374 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import pandas as pd\n", "# PUBLISHONLY\n",
"import json\n", "# import pandas as pd\n",
"with open(metrics_output._path_on_datastore) as f: \n", "# import json\n",
" metrics_output_result = f.read()\n", "# with open(metrics_output._path_on_datastore) as f: \n",
"# metrics_output_result = f.read()\n",
" \n", " \n",
"deserialized_metrics_output = json.loads(metrics_output_result)\n", "# deserialized_metrics_output = json.loads(metrics_output_result)\n",
"df = pd.DataFrame(deserialized_metrics_output)\n", "# df = pd.DataFrame(deserialized_metrics_output)\n",
"df" "# df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Find and register best model\n",
"When all the jobs finish, we can find out the one that has the highest accuracy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# hd_step_run = HyperDriveStepRun(step_run=pipeline_run.find_step_run(hd_step_name)[0])\n",
"# best_run = hd_step_run.get_best_run_by_primary_metric()\n",
"# best_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's list the model files uploaded during the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# print(best_run.get_file_names())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then register the folder (and all files in it) as a model named `tf-dnn-mnist` under the workspace for deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# model = best_run.register_model(model_name='tf-dnn-mnist', model_path='outputs/model')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy the model in ACI\n",
"Now we are ready to deploy the model as a web service running in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n",
"### Create score.py\n",
"First, we will create a scoring script that will be invoked by the web service call. \n",
"\n",
"* Note that the scoring script must have two required functions, `init()` and `run(input_data)`. \n",
" * In `init()` function, you typically load the model into a global object. This function is executed only once when the Docker container is started. \n",
" * In `run(input_data)` function, the model is used to predict a value based on the input data. The input and output to `run` typically use JSON as serialization and de-serialization format but you are not limited to that."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import json\n",
"import numpy as np\n",
"import os\n",
"import tensorflow as tf\n",
"\n",
"from azureml.core.model import Model\n",
"\n",
"def init():\n",
" global X, output, sess\n",
" tf.reset_default_graph()\n",
" model_root = Model.get_model_path('tf-dnn-mnist')\n",
" saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))\n",
" X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n",
" output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n",
" \n",
" sess = tf.Session()\n",
" saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))\n",
"\n",
"def run(raw_data):\n",
" data = np.array(json.loads(raw_data)['data'])\n",
" # make prediction\n",
" out = output.eval(session=sess, feed_dict={X: data})\n",
" y_hat = np.argmax(out, axis=1)\n",
" return y_hat.tolist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create myenv.yml\n",
"We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify packages `numpy`, `tensorflow`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# from azureml.core.runconfig import CondaDependencies\n",
"\n",
"# cd = CondaDependencies.create()\n",
"# cd.add_conda_package('numpy')\n",
"# cd.add_tensorflow_conda_package()\n",
"# cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n",
"\n",
"# print(cd.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy to ACI\n",
"We are almost ready to deploy. Create a deployment configuration and specify the number of CPU cores and gigabytes of RAM needed for your ACI container."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# from azureml.core.webservice import AciWebservice\n",
"\n",
"# aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
"# memory_gb=1, \n",
"# tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n",
"# description='Tensorflow DNN on MNIST')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deployment Process\n",
"Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scenes, it does the following:\n",
"1. **Use the registered model** \n",
"`Webservice.deploy_from_model` takes models that are already registered in the workspace; here that is the `tf-dnn-mnist` model registered from the best HyperDrive run, passed in through the `models` parameter.\n",
"2. **Build Docker image** \n",
"Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the registered TensorFlow model files. \n",
"3. **Register image** \n",
"Register that image under the workspace. \n",
"4. **Ship to ACI** \n",
"And finally ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# from azureml.core.image import ContainerImage\n",
"\n",
"# imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n",
"# runtime=\"python\", \n",
"# conda_file=\"myenv.yml\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# %%time\n",
"# from azureml.core.webservice import Webservice\n",
"\n",
"# service = Webservice.deploy_from_model(workspace=ws,\n",
"# name='tf-mnist-svc',\n",
"# deployment_config=aciconfig,\n",
"# models=[model],\n",
"# image_config=imgconfig)\n",
"\n",
"# service.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# print(service.get_logs())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the scoring web service endpoint:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# print(service.scoring_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test the deployed model\n",
"Let's test the deployed model. Pick 30 random samples from the test set and send them to the web service hosted in ACI. Note that here we use the `run` API in the SDK to invoke the service. You can also make raw HTTP calls using any HTTP tool such as curl.\n",
"\n",
"After the invocation, we plot the returned predictions along with the input images, using a red font color and an inverted image (white on black) to highlight misclassified samples. Because the model accuracy is high, you might have to run the cell below a few times before you see a misclassified sample."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# import json\n",
"\n",
"# # find 30 random samples from test set\n",
"# n = 30\n",
"# sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
"\n",
"# test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n",
"# test_samples = bytes(test_samples, encoding='utf8')\n",
"\n",
"# # predict using the deployed model\n",
"# result = service.run(input_data=test_samples)\n",
"\n",
"# # compare actual value vs. the predicted values:\n",
"# i = 0\n",
"# plt.figure(figsize = (20, 1))\n",
"\n",
"# for s in sample_indices:\n",
"# plt.subplot(1, n, i + 1)\n",
"# plt.axhline('')\n",
"# plt.axvline('')\n",
" \n",
"# # use different color for misclassified sample\n",
"# font_color = 'red' if y_test[s] != result[i] else 'black'\n",
"# clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
" \n",
"# plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)\n",
"# plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
" \n",
"# i = i + 1\n",
"# plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also send a raw HTTP request to the service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# import json\n",
"# import requests\n",
"\n",
"# # send a random row from the test set to score\n",
"# random_index = np.random.randint(0, len(X_test))\n",
"# input_data = json.dumps({\"data\": [X_test[random_index].tolist()]})\n",
"\n",
"# headers = {'Content-Type':'application/json'}\n",
"\n",
"# resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
"\n",
"# print(\"POST to url\", service.scoring_uri)\n",
"# print(\"input data:\", input_data)\n",
"# print(\"label:\", y_test[random_index])\n",
"# print(\"prediction:\", resp.text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the workspace after the web service was deployed. You should see \n",
"* a registered model named 'tf-dnn-mnist' with an id such as 'tf-dnn-mnist:1'\n",
"* an image called 'tf-mnist' and with a docker image location pointing to your workspace's Azure Container Registry (ACR) \n",
"* a webservice called 'tf-mnist-svc' with some scoring URL"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# models = ws.models\n",
"# for name, model in models.items():\n",
"# print(\"Model: {}, ID: {}\".format(name, model.id))\n",
" \n",
"# images = ws.images\n",
"# for name, image in images.items():\n",
"# print(\"Image: {}, location: {}\".format(name, image.image_location))\n",
" \n",
"# webservices = ws.webservices\n",
"# for name, webservice in webservices.items():\n",
"# print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"You can delete the ACI deployment with a simple delete API call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# PUBLISHONLY\n",
"# service.delete()"
] ]
} }
], ],

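The HyperDrive notebook now imports `RandomParameterSampling`, `choice` and `loguniform` explicitly. It also raises the budget to `max_total_runs=10` with `max_concurrent_runs=4`. The cell that defines the search space `ps` is not shown in this diff, so the parameter names and ranges below are hypothetical. The sketch imitates random sampling in plain Python; it is not the HyperDrive implementation. `choice` picks one listed value uniformly. `loguniform(min_value, max_value)` draws `exp(uniform(min_value, max_value))`, so the logarithm of each sample is uniform, which matches how the Azure ML documentation describes it.

```python
import math
import random


def choice(*values):
    """Sampler that picks one of the listed values uniformly at random."""
    return lambda rng: rng.choice(values)


def loguniform(min_value, max_value):
    """Sampler returning exp(uniform(min_value, max_value))."""
    return lambda rng: math.exp(rng.uniform(min_value, max_value))


def random_sampling(space, n_runs, seed=None):
    """Draw `n_runs` independent configurations from `space` (name -> sampler)."""
    rng = random.Random(seed)
    return [{name: sample(rng) for name, sample in space.items()} for _ in range(n_runs)]


# Hypothetical search space; the notebook's real `ps` is defined in a cell not shown here.
space = {
    "--batch-size": choice(25, 50, 100),
    "--learning-rate": loguniform(-6, -1),
}

MAX_TOTAL_RUNS = 10  # same budget as the HyperDriveConfig above

configs = random_sampling(space, MAX_TOTAL_RUNS, seed=42)
for i, cfg in enumerate(configs):
    print(i, cfg["--batch-size"], round(cfg["--learning-rate"], 5))
```

Sampling the learning rate on a log scale spreads the trials evenly across orders of magnitude (roughly 0.0025 to 0.37 here), rather than bunching them near the upper end of the range.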

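`score.py` expects a JSON body of the form `{"data": [[...], ...]}` and returns one predicted digit per row, the `np.argmax` over the network output. The sketch below shows that contract without TensorFlow. `build_payload` serializes rows with `json.dumps`. That is sturdier than concatenating `str(list(...))` of a NumPy row, because how NumPy prints individual elements depends on the NumPy version. `mock_run` is a hypothetical stand-in for `run()`: it treats each row as precomputed logits instead of evaluating the model.

```python
import json


def build_payload(rows):
    """Serialize a batch of feature rows into the {"data": [...]} body score.py reads."""
    return json.dumps({"data": [[float(x) for x in row] for row in rows]})


def mock_run(raw_data):
    """Hypothetical stand-in for score.run(): argmax of each row, treating rows as logits."""
    rows = json.loads(raw_data)["data"]
    # max() keeps the first index on ties, like np.argmax
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


payload = build_payload([[0.1, 0.9, 0.0], [0.7, 0.2, 0.1]])
print(payload)            # {"data": [[0.1, 0.9, 0.0], [0.7, 0.2, 0.1]]}
print(mock_run(payload))  # [1, 0]
```

Against a deployed service, you can pass the same string to `service.run(input_data=payload)`, or post it with `requests.post(service.scoring_uri, payload, headers={'Content-Type': 'application/json'})`, as the notebook cells do.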
@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -21,7 +28,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Prerequisites and Azure Machine Learning Basics\n", "## Prerequisites and Azure Machine Learning Basics\n",
"Make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
"\n", "\n",
"### Initialization Steps" "### Initialization Steps"
] ]
@@ -74,7 +81,7 @@
"source": [ "source": [
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"aml_compute_target = \"cpucluster\"\n", "aml_compute_target = \"cpu-cluster\"\n",
"try:\n", "try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n", " aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n", " print(\"found existing compute target.\")\n",


@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -21,7 +28,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Prerequisites and AML Basics\n", "## Prerequisites and AML Basics\n",
"Make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n",
"\n", "\n",
"### Initialization Steps" "### Initialization Steps"
] ]
@@ -69,7 +76,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n", "from azureml.core.compute import AmlCompute, ComputeTarget\n",
"aml_compute_target = \"aml-compute\"\n", "aml_compute_target = \"cpu-cluster\"\n",
"try:\n", "try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n", " aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"Found existing compute target: {}\".format(aml_compute_target))\n", " print(\"Found existing compute target: {}\".format(aml_compute_target))\n",
@@ -301,7 +308,10 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Disable the schedule" "### Disable the schedule\n",
"As a best practice, disable schedules that are not in use.\n",
"Each subscription is allowed 100,000 schedule triggers per month per region.\n",
"This count is calculated from the projected trigger counts of all active schedules."
] ]
}, },
{ {
@@ -376,7 +386,7 @@
"source": [ "source": [
"### Create a schedule for the pipeline using a Datastore\n", "### Create a schedule for the pipeline using a Datastore\n",
"This schedule will run when additions or modifications are made to Blobs in the Datastore.\n", "This schedule will run when additions or modifications are made to Blobs in the Datastore.\n",
"By default, the Datastore container is monitored for changes. Use the path_on_datastore parameter to instead specify a path on the Datastore to monitor for changes. Changes made to subfolders in the container/path will not trigger the schedule.\n", "By default, the Datastore container is monitored for changes. Use the path_on_datastore parameter to instead specify a path on the Datastore to monitor for changes. Note: the path_on_datastore will be under the container for the datastore, so the actual path monitored will be container/path_on_datastore. Changes made to subfolders in the container/path will not trigger the schedule.\n",
"Note: Only Blob Datastores are supported." "Note: Only Blob Datastores are supported."
] ]
}, },
@@ -396,6 +406,7 @@
" datastore=datastore,\n", " datastore=datastore,\n",
" wait_for_provisioning=True,\n", " wait_for_provisioning=True,\n",
" description=\"Schedule Run\")\n", " description=\"Schedule Run\")\n",
" #polling_interval=5, use polling_interval to specify how often to poll for blob additions/modifications. Default value is 5 minutes.\n",
" #path_on_datastore=\"file/path\") use path_on_datastore to specify a specific folder to monitor for changes.\n", " #path_on_datastore=\"file/path\") use path_on_datastore to specify a specific folder to monitor for changes.\n",
"\n", "\n",
"# You may want to make sure that the schedule is provisioned properly\n", "# You may want to make sure that the schedule is provisioned properly\n",
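The 100,000 monthly trigger limit described in the schedule notebook can be sanity-checked with simple arithmetic. The sketch below is only an illustration: it assumes a fixed-interval schedule, a 31-day month, and that every firing counts as one trigger. The helper names are hypothetical and are not part of the Azure ML SDK.

```python
# Hypothetical helpers for estimating schedule trigger usage.
# Assumption: each firing of a fixed-interval schedule counts as one trigger,
# and a 31-day month is used as the worst case.

MONTHLY_TRIGGER_LIMIT = 100_000   # per month, per region, per subscription
MINUTES_PER_MONTH = 31 * 24 * 60  # 44,640 minutes in a 31-day month


def monthly_triggers(interval_minutes: int) -> int:
    """Number of firings of a schedule that runs every `interval_minutes` in a 31-day month."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return MINUTES_PER_MONTH // interval_minutes


def max_schedules(interval_minutes: int) -> int:
    """How many schedules with this interval fit under the monthly limit."""
    return MONTHLY_TRIGGER_LIMIT // monthly_triggers(interval_minutes)


if __name__ == "__main__":
    for interval in (5, 15, 60):
        print(f"every {interval} min: {monthly_triggers(interval)} triggers/month, "
              f"up to {max_schedules(interval)} schedules")
```

Under these assumptions a single 5-minute schedule uses 8,928 triggers in a 31-day month, so about eleven of them approach the limit, which illustrates why disabling idle schedules is worthwhile.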


@@ -0,0 +1,567 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# How to Set Up a PipelineEndpoint and Submit a Pipeline Using the PipelineEndpoint\n",
"In this notebook, we show how to set up a PipelineEndpoint and run a specific pipeline version.\n",
"\n",
"A PipelineEndpoint can be used to update a published pipeline while keeping the same endpoint.\n",
"A PipelineEndpoint provides a way to keep track of [PublishedPipelines](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.publishedpipeline) using versions. It uses an endpoint with version information to trigger the underlying published pipeline. Pipeline endpoints are uniquely named within a workspace.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites and AML Basics\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://github.com/Azure/MachineLearningNotebooks) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Notebook Overview\n",
"In this notebook, we provide an introduction to Azure Machine Learning PipelineEndpoints. It covers:\n",
"* [Create PipelineEndpoint](#Create-PipelineEndpoint): how to create a PipelineEndpoint.\n",
"* [Retrieving PipelineEndpoint](#Retrieving-PipelineEndpoint): how to get a specific PipelineEndpoint from the workspace by name or Id, and how to get all [PipelineEndpoints](#Get-all-PipelineEndpoints-in-workspace) in the workspace.\n",
"* [PipelineEndpoint Properties](#PipelineEndpoint-properties): how to get and set PipelineEndpoint properties, such as the default version.\n",
"* [PipelineEndpoint Submission](#PipelineEndpoint-Submission): how to run a pipeline using a PipelineEndpoint."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create PipelineEndpoint\n",
"The following input parameters are required to create a PipelineEndpoint:\n",
"\n",
"* *workspace*: AML workspace.\n",
"* *name*: name of the PipelineEndpoint; it must be unique within the workspace.\n",
"* *description*: description of the PipelineEndpoint.\n",
"* *pipeline*: a [Pipeline](#Steps-to-create-simple-Pipeline) or [PublishedPipeline](#Publish-Pipeline) to use as the default version of the PipelineEndpoint."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialization, Steps to create a Pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Run, Experiment, Datastore\n",
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"from azureml.pipeline.steps import PythonScriptStep\n",
"from azureml.pipeline.core import Pipeline\n",
"\n",
"# Retrieve an already attached Azure Machine Learning Compute, or create it if it does not exist\n",
"aml_compute_target = \"cpu-cluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"Found existing compute target: {}\".format(aml_compute_target))\n",
"except ComputeTargetException:\n",
" print(\"Creating new compute target: {}\".format(aml_compute_target))\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
"# source_directory\n",
"source_directory = '.'\n",
"# define a single-step pipeline for demonstration purposes.\n",
"trainStep = PythonScriptStep(\n",
" name=\"Training_Step\",\n",
" script_name=\"train.py\", \n",
" compute_target=aml_compute_target, \n",
" source_directory=source_directory\n",
")\n",
"print(\"TrainStep created\")\n",
"# build and validate Pipeline\n",
"pipeline = Pipeline(workspace=ws, steps=[trainStep])\n",
"print(\"Pipeline is built\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Publish Pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
"timenow = datetime.now().strftime('%m-%d-%Y-%H-%M')\n",
"\n",
"pipeline_name = timenow + \"-Pipeline\"\n",
"print(pipeline_name)\n",
"\n",
"published_pipeline = pipeline.publish(\n",
" name=pipeline_name, \n",
" description=pipeline_name)\n",
"print(\"Newly published pipeline id: {}\".format(published_pipeline.id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Publishing PipelineEndpoint\n",
"Create a PipelineEndpoint with the required parameters: workspace, name, description, and pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import PipelineEndpoint\n",
"\n",
"pipeline_endpoint = PipelineEndpoint.publish(workspace=ws, name=\"PipelineEndpointTest\",\n",
" pipeline=pipeline, description=\"Test description Notebook\")\n",
"pipeline_endpoint"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieving PipelineEndpoint\n",
"\n",
"A PipelineEndpoint is uniquely identified within a workspace by both its name and its Id, so it can be retrieved by either one."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get PipelineEndpoint by Name\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name = PipelineEndpoint.get(workspace=ws, name=\"PipelineEndpointTest\")\n",
"pipeline_endpoint_by_name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get PipelineEndpoint by Id\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#get the PipelineEndpoint Id\n",
"pipeline_endpoint_by_name = PipelineEndpoint.get(workspace=ws, name=\"PipelineEndpointTest\")\n",
"endpoint_id = pipeline_endpoint_by_name.id\n",
"\n",
"pipeline_endpoint_by_id = PipelineEndpoint.get(workspace=ws, id=endpoint_id)\n",
"pipeline_endpoint_by_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get all PipelineEndpoints in workspace\n",
"Returns all active PipelineEndpoints in the workspace (`active_only=True`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"endpoint_list = PipelineEndpoint.get_all(workspace=ws, active_only=True)\n",
"endpoint_list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PipelineEndpoint properties"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Default Version of PipelineEndpoint\n",
"The default version of a PipelineEndpoint starts at \"0\"; version numbers increment as pipelines are added.\n",
"\n",
"##### Get the Default Version"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"default_version = pipeline_endpoint_by_name.get_default_version()\n",
"default_version"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Set default version \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name.set_default_version(\"0\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get the Published Pipeline corresponding to a specific version of the PipelineEndpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline = pipeline_endpoint_by_name.get_pipeline(\"0\")\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get the Published Pipeline for the default version"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline = pipeline_endpoint_by_name.get_pipeline()\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Set a Published Pipeline as the default version"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set the published pipeline as the default version, if it exists in the PipelineEndpoint\n",
"pipeline_endpoint_by_name.set_default(published_pipeline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get all Versions in PipelineEndpoint\n",
"Returns a list of published pipelines and their versions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"versions = pipeline_endpoint_by_name.get_all_versions()\n",
"\n",
"for ve in versions:\n",
" print(ve.version)\n",
" print(ve.pipeline.id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get all Published Pipelines in PipelineEndpoint\n",
"Returns all active pipelines in the PipelineEndpoint when the `active_only` flag is set to True."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipelines = pipeline_endpoint_by_name.get_all_pipelines(active_only=True)\n",
"pipelines"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Name property of PipelineEndpoint\n",
"PipelineEndpoint is uniquely identified by name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Set the Name of a PipelineEndpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name.set_name(name=\"NewName\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Add Published Pipeline to PipelineEndpoint\n",
"Adds a published pipeline to the PipelineEndpoint if it is not already present."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name.add(published_pipeline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Add Published Pipeline to PipelineEndpoint and set it as the default version\n",
"Adds a published pipeline to the PipelineEndpoint if it is not already present, and sets it as the default version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name.add_default(published_pipeline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PipelineEndpoint Submission\n",
"A PipelineEndpoint can trigger a specific pipeline version or the default version through:\n",
"* its REST endpoint\n",
"* the `submit` call."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Run Pipeline by endpoint property of PipelineEndpoint\n",
"Run a specific pipeline version by building a URL from the `endpoint` property of the PipelineEndpoint and sending an HTTP POST request to it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# the endpoint was renamed to \"NewName\" by set_name() above\n",
"pipeline_endpoint_by_name = PipelineEndpoint.get(workspace=ws, name=\"NewName\")\n",
"\n",
"# endpoint with id \n",
"rest_endpoint_id = pipeline_endpoint_by_name.endpoint\n",
"\n",
"# for default version pipeline\n",
"rest_endpoint_id_without_version_with_id = rest_endpoint_id\n",
"\n",
"# for specific version pipeline just append version info\n",
"version=\"0\"\n",
"rest_endpoint_id_with_version = rest_endpoint_id_without_version_with_id+\"/\"+ version\n",
"print(rest_endpoint_id_with_version)\n",
"pipeline_endpoint_by_name"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# endpoint with name\n",
"rest_endpoint_name = rest_endpoint_id.split(\"Id\", 1)[0] + \"Name?name=\" + pipeline_endpoint_by_name.name\n",
"\n",
"# for default version pipeline\n",
"rest_endpoint_name_without_version = rest_endpoint_name\n",
"\n",
"# for specific version pipeline just append version info\n",
"version=\"0\"\n",
"rest_endpoint_name_with_version = rest_endpoint_name_without_version+\"&pipelineVersion=\"+ version\n",
"print(rest_endpoint_name_with_version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[This notebook](https://aka.ms/pl-restep-auth) shows how to authenticate to an AML workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"import requests\n",
"\n",
"auth = InteractiveLoginAuthentication()\n",
"aad_token = auth.get_authentication_header()\n",
"\n",
"#endpoint = pipeline_endpoint_by_name.url\n",
"\n",
"print(\"You can perform HTTP POST on URL {} to trigger this pipeline\".format(rest_endpoint_name_with_version))\n",
"\n",
"# specify the param when running the pipeline\n",
"response = requests.post(rest_endpoint_name_with_version, \n",
" headers=aad_token, \n",
" json={\"ExperimentName\": \"default_pipeline\",\n",
" \"RunSource\": \"SDK\",\n",
" \"ParameterAssignments\": {\"1\": \"united\", \"2\":\"city\"}})\n",
"\n",
"run_id = response.json()[\"Id\"]\n",
"print(run_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Run Pipeline by Submit call of PipelineEndpoint \n",
"Run a specific pipeline version, or the default version, using the `submit` API of the PipelineEndpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# submit pipeline with specific version\n",
"pipeline_run = pipeline_endpoint_by_name.submit(\"TestPipelineEndpoint\", pipeline_version=\"0\")\n",
"print(pipeline_run)\n",
"\n",
"# submit pipeline with default version\n",
"pipeline_run = pipeline_endpoint_by_name.submit(\"TestPipelineEndpoint\")\n",
"print(pipeline_run)"
]
}
],
"metadata": {
"authors": [
{
"name": "mameghwa"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
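The REST-endpoint cells in the PipelineEndpoint notebook build URLs by string concatenation: they append `/<version>` to the Id-based endpoint, and they rewrite the Id-based URL into a name-based one with a `pipelineVersion` query parameter. Below is a minimal, SDK-free sketch of that same logic. The helper names and the example URL are hypothetical, and unlike the notebook, the sketch escapes the endpoint name with `urllib.parse.quote`.

```python
from urllib.parse import quote


def id_endpoint_url(endpoint, version=None):
    """Id-based URL; appending '/<version>' targets a specific pipeline version."""
    return endpoint if version is None else endpoint + "/" + version


def name_endpoint_url(endpoint, name, version=None):
    """Name-based URL derived from the Id-based one, mirroring the notebook:
    keep everything before the first 'Id' and append 'Name?name=<name>'."""
    base = endpoint.split("Id", 1)[0] + "Name?name=" + quote(name)
    return base if version is None else base + "&pipelineVersion=" + version


if __name__ == "__main__":
    # Hypothetical endpoint value; the real one comes from PipelineEndpoint.endpoint.
    endpoint = "https://example.net/pipelines/v1.0/PipelineEndpointSubmit/Id/1234"
    print(id_endpoint_url(endpoint, "0"))
    print(name_endpoint_url(endpoint, "PipelineEndpointTest", "0"))
```

Because `split("Id", 1)` cuts at the first occurrence of `Id`, this approach relies on that substring not appearing earlier in the URL.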


@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-adla-as-compute-target.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -52,7 +59,7 @@
"source": [ "source": [
"## Initialize Workspace\n", "## Initialize Workspace\n",
"\n", "\n",
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json" "Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json"
] ]
}, },
{ {


@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -105,7 +112,7 @@
"source": [ "source": [
"## Initialize Workspace\n", "## Initialize Workspace\n",
"\n", "\n",
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json" "Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json"
] ]
}, },
{ {
@@ -397,7 +404,9 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### 1. Running the demo notebook already added to the Databricks workspace\n", "### 1. Running the demo notebook already added to the Databricks workspace\n",
"Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable notebook_path when you run the code cell below:" "Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable notebook_path when you run the code cell below:\n",
"\n",
"You can find your notebook's path in the Azure Databricks UI by hovering over the notebook's title. A typical notebook path looks like `/Users/example@databricks.com/example`. See [Databricks Workspace](https://docs.azuredatabricks.net/user-guide/workspace.html) to learn about the folder structure."
] ]
}, },
{ {


@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -23,7 +30,7 @@
"## Introduction\n", "## Introduction\n",
"In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n", "In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
"\n", "\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n", "\n",
"In this notebook you would see\n", "In this notebook you would see\n",
"1. Create an `Experiment` in an existing `Workspace`.\n", "1. Create an `Experiment` in an existing `Workspace`.\n",
@@ -123,9 +130,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create or Attach existing AmlCompute\n", "### Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you get the default `AmlCompute` as your training compute resource."
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process."
] ]
}, },
{ {
@@ -135,7 +140,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpucluster\"\n", "amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n", "\n",
"found = False\n", "found = False\n",
"# Check if this compute target already exists in the workspace.\n", "# Check if this compute target already exists in the workspace.\n",
@@ -237,7 +242,7 @@
" X_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n", " X_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n", " y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
"\n", "\n",
" return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n" " return { \"X\" : X_train.values, \"y\" : y_train.values.flatten() }\n"
] ]
}, },
{ {
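The AutoML notebook's `get_data()` now returns `y_train.values.flatten()` instead of `y_train[0].values`. For a single-column label file read with `header=None`, both expressions should yield the same 1-D array. The sketch below checks this on a small in-memory TSV; the data is made up for illustration. One difference worth knowing: `flatten()` does not depend on the column being labeled `0`, but on a multi-column frame it concatenates all columns row by row.

```python
import io

import pandas as pd

# Made-up single-column label data in the same TSV layout that get_data() reads.
tsv = "3\n1\n4\n1\n5\n"
y_train = pd.read_csv(io.StringIO(tsv), delimiter="\t", header=None, quotechar='"')

old_style = y_train[0].values         # select column 0, then take its values -> shape (n,)
new_style = y_train.values.flatten()  # take the (n, 1) array, then flatten -> shape (n,)

print(old_style.shape, new_style.shape)
print((old_style == new_style).all())
```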


@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -21,7 +28,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Prerequisites and Azure Machine Learning Basics\n", "## Prerequisites and Azure Machine Learning Basics\n",
"Make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
"\n", "\n",
"### Azure Machine Learning and Pipeline SDK-specific Imports" "### Azure Machine Learning and Pipeline SDK-specific Imports"
] ]
@@ -120,8 +127,8 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Retrieve or create a Aml compute\n", "#### Retrieve or create an Aml compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Aml Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target." "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's get the default Aml Compute in the current workspace. We will then run the training script on this compute target."
] ]
}, },
{ {
@@ -132,7 +139,7 @@
"source": [ "source": [
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"aml_compute_target = \"aml-compute\"\n", "aml_compute_target = \"cpu-cluster\"\n",
"try:\n", "try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n", " aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n", " print(\"found existing compute target.\")\n",
@@ -290,9 +297,6 @@
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", "# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n", "run_config.environment.python.user_managed_dependencies = False\n",
"\n", "\n",
"# auto-prepare the Docker image when used for execution (if it is not already prepared)\n",
"run_config.auto_prepare_environment = True\n",
"\n",
"# specify CondaDependencies obj\n", "# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" "run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
] ]
@@ -429,6 +433,77 @@
"RunDetails(pipeline_run1).show()" "RunDetails(pipeline_run1).show()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Wait for pipeline run to complete"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run1.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### See Outputs\n",
"\n",
"See where outputs of each pipeline step are located on your datastore.\n",
"\n",
"***Wait for the pipeline run to complete to make sure all the outputs are ready.***"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get Steps\n",
"for step in pipeline_run1.get_steps():\n",
" print(\"Outputs of step \" + step.name)\n",
" \n",
" # Get a dictionary of StepRunOutputs with the output name as the key \n",
" output_dict = step.get_outputs()\n",
" \n",
" for name, output in output_dict.items():\n",
" \n",
" output_reference = output.get_port_data_reference() # Get output port data reference\n",
" print(\"\\tname: \" + name)\n",
" print(\"\\tdatastore: \" + output_reference.datastore_name)\n",
" print(\"\\tpath on datastore: \" + output_reference.path_on_datastore)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download Outputs\n",
"\n",
"We can download the output of any step to our local machine using the SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve the step runs by name 'train.py'\n",
"train_step = pipeline_run1.find_step_run('train.py')\n",
"\n",
"if train_step:\n",
" train_step_obj = train_step[0] # since only one step is named 'train.py'\n",
" train_step_obj.get_output_data('processed_data1').download(\"./outputs\") # download the output to the ./outputs directory"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},


@@ -0,0 +1,58 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
import argparse
import os
import pandas as pd
import azureml.dataprep as dprep
def get_dict(dict_str):
    # Parse a string like "{'a':'b'\;'c':'d'}" into {'a': 'b', 'c': 'd'}
    pairs = dict_str.strip("{}").split(r"\;")
    new_dict = {}
    for pair in pairs:
        key, value = pair.strip('\\').split(":")
        new_dict[key.strip().strip("'")] = value.strip().strip("'")
    return new_dict
print("Cleans the input data")
parser = argparse.ArgumentParser("cleanse")
parser.add_argument("--input_cleanse", type=str, help="raw taxi data")
parser.add_argument("--output_cleanse", type=str, help="cleaned taxi data directory")
parser.add_argument("--useful_columns", type=str, help="useful columns to keep")
parser.add_argument("--columns", type=str, help="rename column pattern")
args = parser.parse_args()
print("Argument 1(input taxi data path): %s" % args.input_cleanse)
print("Argument 2(columns to keep): %s" % str(args.useful_columns.strip("[]").split(r"\;")))
print("Argument 3(columns renaming mapping): %s" % str(args.columns.strip("{}").split(r"\;")))
print("Argument 4(output cleansed taxi data path): %s" % args.output_cleanse)
raw_df = dprep.read_csv(path=args.input_cleanse, header=dprep.PromoteHeadersMode.GROUPED)
# These functions ensure that null data is removed from the data set,
# which will help increase machine learning model accuracy.
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep
# for more details
useful_columns = [s.strip().strip("'") for s in args.useful_columns.strip("[]").split("\;")]
columns = get_dict(args.columns)
all_columns = dprep.ColumnSelector(term=".*", use_regex=True)
drop_if_all_null = [all_columns, dprep.ColumnRelationship(dprep.ColumnRelationship.ALL)]
new_df = (raw_df
.replace_na(columns=all_columns)
.drop_nulls(*drop_if_all_null)
.rename_columns(column_pairs=columns)
.keep_columns(columns=useful_columns))
if not (args.output_cleanse is None):
os.makedirs(args.output_cleanse, exist_ok=True)
print("%s created" % args.output_cleanse)
write_df = new_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_cleanse))
write_df.run_local()
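
The cleanse script receives its column list and rename mapping as single strings that use `\;` as the item separator. The sketch below isolates that parsing in plain Python so it can be checked without `azureml.dataprep`. The argument format (`['a'\;'b']` and `{'old': 'new'\;...}`) is inferred from how the script strips and splits its arguments; the notebook that builds these strings is not shown here, and the helper names `parse_column_list` and `parse_column_map` are hypothetical.

```python
from typing import Dict, List

# Separator used between items in the pipeline arguments (a literal backslash + semicolon).
SEP = r"\;"


def parse_column_list(arg: str) -> List[str]:
    """Parse a bracketed, SEP-separated list of quoted names into plain column names."""
    return [s.strip().strip("'") for s in arg.strip("[]").split(SEP)]


def parse_column_map(arg: str) -> Dict[str, str]:
    """Parse a braced, SEP-separated list of 'old': 'new' pairs into a rename mapping."""
    mapping = {}
    for pair in arg.strip("{}").split(SEP):
        # Split on the first colon only (a deviation from get_dict), so new names may contain ':'.
        key, value = pair.strip("\\").split(":", 1)
        mapping[key.strip().strip("'")] = value.strip().strip("'")
    return mapping


print(parse_column_list(r"['vendor'\;'distance'\;'cost']"))
print(parse_column_map(r"{'vendorID': 'vendor'\;'tripDistance': 'distance'}"))
```

Using a raw string (`r"\;"`) for the separator avoids Python's invalid-escape warning while matching the same two characters as `"\;"`.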

@@ -0,0 +1,55 @@
import argparse
import os
import azureml.dataprep as dprep

print("Filters out coordinates for locations that are outside the city border.",
      "Chain the column filter commands within the filter() function",
      "and define the minimum and maximum bounds for each field.")

parser = argparse.ArgumentParser("filter")
parser.add_argument("--input_filter", type=str, help="merged taxi data directory")
parser.add_argument("--output_filter", type=str, help="filtered taxi data directory (out-of-city locations removed)")

args = parser.parse_args()

print("Argument 1(input taxi data path): %s" % args.input_filter)
print("Argument 2(output filtered taxi data path): %s" % args.output_filter)

combined_df = dprep.read_csv(args.input_filter + '/part-*')

# These functions filter out coordinates for locations that are outside the city border.
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details

# Convert the lat/long fields to decimals so they can be compared
# numerically against the city bounds below.
decimal_type = dprep.TypeConverter(data_type=dprep.FieldType.DECIMAL)
combined_df = combined_df.set_column_types(type_conversions={
    "pickup_longitude": decimal_type,
    "pickup_latitude": decimal_type,
    "dropoff_longitude": decimal_type,
    "dropoff_latitude": decimal_type
})

# Drop rows with a null in any coordinate field, then chain the column filter
# commands within the filter() function, defining the minimum and maximum
# bounds for each field.
latlong_filtered_df = (combined_df
                       .drop_nulls(columns=["pickup_longitude",
                                            "pickup_latitude",
                                            "dropoff_longitude",
                                            "dropoff_latitude"],
                                   column_relationship=dprep.ColumnRelationship(dprep.ColumnRelationship.ANY))
                       .filter(dprep.f_and(dprep.col("pickup_longitude") <= -73.72,
                                           dprep.col("pickup_longitude") >= -74.09,
                                           dprep.col("pickup_latitude") <= 40.88,
                                           dprep.col("pickup_latitude") >= 40.53,
                                           dprep.col("dropoff_longitude") <= -73.72,
                                           dprep.col("dropoff_longitude") >= -74.09,
                                           dprep.col("dropoff_latitude") <= 40.88,
                                           dprep.col("dropoff_latitude") >= 40.53)))

if args.output_filter is not None:
    os.makedirs(args.output_filter, exist_ok=True)
    print("%s created" % args.output_filter)
    write_df = latlong_filtered_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_filter))
    write_df.run_local()
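
The filter step keeps a row only when all four coordinates are present and lie inside a fixed latitude/longitude box. The plain-Python sketch below applies the same bounds to one CSV row held as a dict, so the predicate can be tested in isolation. It is an illustration, not the dataprep implementation; in particular, treating unparseable values as missing is an assumption of this sketch, and the names `inside_city` and `to_decimal` are hypothetical.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional

# Same bounds as the filter() call in filter.py.
LON_MIN, LON_MAX = Decimal("-74.09"), Decimal("-73.72")
LAT_MIN, LAT_MAX = Decimal("40.53"), Decimal("40.88")
COORD_FIELDS = ("pickup_longitude", "pickup_latitude", "dropoff_longitude", "dropoff_latitude")


def to_decimal(value: Optional[str]) -> Optional[Decimal]:
    """Convert a raw CSV value to Decimal; empty or unparseable values become None (assumption)."""
    if value is None or value.strip() == "":
        return None
    try:
        return Decimal(value)
    except InvalidOperation:
        return None


def inside_city(row: Dict[str, str]) -> bool:
    """Return True only if all four coordinates are present and inside the bounds."""
    c = {f: to_decimal(row.get(f)) for f in COORD_FIELDS}
    # Like drop_nulls(..., ColumnRelationship.ANY): a missing value in any field drops the row.
    if any(v is None for v in c.values()):
        return False
    return (LON_MIN <= c["pickup_longitude"] <= LON_MAX
            and LAT_MIN <= c["pickup_latitude"] <= LAT_MAX
            and LON_MIN <= c["dropoff_longitude"] <= LON_MAX
            and LAT_MIN <= c["dropoff_latitude"] <= LAT_MAX)
```

`Decimal` is used instead of `float` so that boundary values such as `-73.72` compare exactly, mirroring the script's conversion to `FieldType.DECIMAL`.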

@@ -0,0 +1,29 @@
import argparse
import os
import azureml.dataprep as dprep

print("Merge Green and Yellow taxi data")

parser = argparse.ArgumentParser("merge")
parser.add_argument("--input_green_merge", type=str, help="cleaned green taxi data directory")
parser.add_argument("--input_yellow_merge", type=str, help="cleaned yellow taxi data directory")
parser.add_argument("--output_merge", type=str, help="green and yellow taxi data merged")

args = parser.parse_args()

print("Argument 1(input green taxi data path): %s" % args.input_green_merge)
print("Argument 2(input yellow taxi data path): %s" % args.input_yellow_merge)
print("Argument 3(output merge taxi data path): %s" % args.output_merge)

green_df = dprep.read_csv(args.input_green_merge + '/part-*')
yellow_df = dprep.read_csv(args.input_yellow_merge + '/part-*')

# Appending yellow data to green data
combined_df = green_df.append_rows([yellow_df])

if args.output_merge is not None:
    os.makedirs(args.output_merge, exist_ok=True)
    print("%s created" % args.output_merge)
    write_df = combined_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_merge))
    write_df.run_local()
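
Each step reads its input as a directory of `part-*` CSV files written by the previous step, and the merge step appends the yellow rows after the green rows. The stdlib sketch below reproduces that read-then-append pattern for local debugging. It assumes every part file carries its own header row and that both inputs share the same columns; the helper names `read_parts` and `merge_dirs` are hypothetical.

```python
import csv
import glob
import os
from typing import Dict, List


def read_parts(directory: str) -> List[Dict[str, str]]:
    """Read every part-* CSV file in `directory` (sorted by name) into row dicts.

    Assumes each part file starts with its own header row.
    """
    rows: List[Dict[str, str]] = []
    for path in sorted(glob.glob(os.path.join(directory, "part-*"))):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


def merge_dirs(green_dir: str, yellow_dir: str) -> List[Dict[str, str]]:
    """Return the green rows followed by the yellow rows (green first, as in append_rows)."""
    return read_parts(green_dir) + read_parts(yellow_dir)
```

Files that do not match `part-*` (for example marker files some writers leave behind) are ignored by the glob.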

@@ -0,0 +1,47 @@
import argparse
import os
import azureml.dataprep as dprep

print("Replace undefined values with relevant values and rename columns to meaningful names")

parser = argparse.ArgumentParser("normalize")
parser.add_argument("--input_normalize", type=str, help="combined and converted taxi data")
parser.add_argument("--output_normalize", type=str, help="replaced undefined values and renamed columns")

args = parser.parse_args()

print("Argument 1(input taxi data path): %s" % args.input_normalize)
print("Argument 2(output normalized taxi data path): %s" % args.output_normalize)

combined_converted_df = dprep.read_csv(args.input_normalize + '/part-*')

# These functions replace undefined values and rename columns to use meaningful names.
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details

replaced_stfor_vals_df = combined_converted_df.replace(columns="store_forward",
                                                       find="0",
                                                       replace_with="N").fill_nulls("store_forward", "N")

replaced_distance_vals_df = replaced_stfor_vals_df.replace(columns="distance",
                                                           find=".00",
                                                           replace_with=0).fill_nulls("distance", 0)

replaced_distance_vals_df = replaced_distance_vals_df.to_number(["distance"])

# Split the pickup and dropoff datetime values into the respective date and time columns
time_split_df = (replaced_distance_vals_df
                 .split_column_by_example(source_column="pickup_datetime")
                 .split_column_by_example(source_column="dropoff_datetime"))

# Rename the generated columns to meaningful date and time names
renamed_col_df = (time_split_df
                  .rename_columns(column_pairs={
                      "pickup_datetime_1": "pickup_date",
                      "pickup_datetime_2": "pickup_time",
                      "dropoff_datetime_1": "dropoff_date",
                      "dropoff_datetime_2": "dropoff_time"}))

if args.output_normalize is not None:
    os.makedirs(args.output_normalize, exist_ok=True)
    print("%s created" % args.output_normalize)
    write_df = renamed_col_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_normalize))
    write_df.run_local()
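
The normalize step makes three changes per row: `store_forward` values of `"0"` or missing become `"N"`, `distance` values of `".00"` or missing become `0`, and each datetime is split into a date and a time column. The per-row sketch below restates those rules in plain Python. Two assumptions are built in and should be checked against real data: that `replace()` matches whole cell values rather than substrings, and that timestamps look like `YYYY-MM-DD HH:MM:SS` (the script itself lets `split_column_by_example` infer the split). The function name `normalize_row` is hypothetical.

```python
from typing import Dict


def normalize_row(row: Dict[str, str]) -> Dict[str, object]:
    """Apply the normalize-step rules to one row (sketch; see assumptions above)."""
    out: Dict[str, object] = dict(row)
    # store_forward: "0" and missing values both become "N".
    sf = row.get("store_forward")
    out["store_forward"] = "N" if sf in (None, "", "0") else sf
    # distance: ".00" and missing values become 0; other values are parsed as numbers.
    dist = row.get("distance")
    out["distance"] = 0.0 if dist in (None, "", ".00") else float(dist)
    # Split "YYYY-MM-DD HH:MM:SS" into separate date and time columns (assumed format).
    for prefix in ("pickup", "dropoff"):
        date_part, time_part = row[prefix + "_datetime"].split(" ", 1)
        out[prefix + "_date"] = date_part
        out[prefix + "_time"] = time_part
    return out
```

Like the script, the sketch keeps the original `pickup_datetime` and `dropoff_datetime` values; the transform step drops them later.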

@@ -0,0 +1,88 @@
import argparse
import os
import azureml.dataprep as dprep

print("Transforms the renamed taxi data to the required format")

parser = argparse.ArgumentParser("transform")
parser.add_argument("--input_transform", type=str, help="renamed taxi data")
parser.add_argument("--output_transform", type=str, help="transformed taxi data")

args = parser.parse_args()

print("Argument 1(input taxi data path): %s" % args.input_transform)
print("Argument 2(output final transformed taxi data): %s" % args.output_transform)

renamed_df = dprep.read_csv(args.input_transform + '/part-*')

# These functions transform the renamed data into the final form used for training.
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details

# Split the pickup and dropoff date further into the day of the week, day of the month, and month values.
# To get the day of the week value, use the derive_column_by_example() function.
# The function takes a list of (input, output) example pairs that show the input data
# and the preferred output, and automatically infers the transformation from them.
# For the pickup and dropoff time columns, split the time into the hour, minute, and second by using
# the split_column_by_example() function with no example parameter. After you generate the new features,
# use the drop_columns() function to delete the original fields, since the newly generated features are preferred.
# Rename the rest of the fields to use meaningful descriptions.
transformed_features_df = (renamed_df
                           .derive_column_by_example(
                               source_columns="pickup_date",
                               new_column_name="pickup_weekday",
                               example_data=[("2009-01-04", "Sunday"), ("2013-08-22", "Thursday")])
                           .derive_column_by_example(
                               source_columns="dropoff_date",
                               new_column_name="dropoff_weekday",
                               example_data=[("2013-08-22", "Thursday"), ("2013-11-03", "Sunday")])
                           .split_column_by_example(source_column="pickup_time")
                           .split_column_by_example(source_column="dropoff_time")
                           .split_column_by_example(source_column="pickup_time_1")
                           .split_column_by_example(source_column="dropoff_time_1")
                           .drop_columns(columns=[
                               "pickup_date", "pickup_time", "dropoff_date", "dropoff_time",
                               "pickup_date_1", "dropoff_date_1", "pickup_time_1", "dropoff_time_1"])
                           .rename_columns(column_pairs={
                               "pickup_date_2": "pickup_month",
                               "pickup_date_3": "pickup_monthday",
                               "pickup_time_1_1": "pickup_hour",
                               "pickup_time_1_2": "pickup_minute",
                               "pickup_time_2": "pickup_second",
                               "dropoff_date_2": "dropoff_month",
                               "dropoff_date_3": "dropoff_monthday",
                               "dropoff_time_1_1": "dropoff_hour",
                               "dropoff_time_1_2": "dropoff_minute",
                               "dropoff_time_2": "dropoff_second"}))

# Drop the pickup_datetime and dropoff_datetime columns because they're
# no longer needed (granular time features like hour,
# minute and second are more useful for model training).
processed_df = transformed_features_df.drop_columns(columns=["pickup_datetime", "dropoff_datetime"])

# Use the type inference functionality to automatically infer the data type of each field.
type_infer = processed_df.builders.set_column_types()
type_infer.learn()

# Apply the inferred type conversions to the dataflow.
type_converted_df = type_infer.to_dataflow()

# Before you package the dataflow, run two final filters on the data set.
# To eliminate incorrectly captured data points,
# filter the dataflow on records where both the cost and distance variable values are greater than zero.
# This step should improve machine learning model accuracy,
# because data points with a zero cost or distance are major outliers that throw off prediction accuracy.
final_df = type_converted_df.filter(dprep.col("distance") > 0)
final_df = final_df.filter(dprep.col("cost") > 0)

# Write the final dataflow for use in the training steps that follow
if args.output_transform is not None:
    os.makedirs(args.output_transform, exist_ok=True)
    print("%s created" % args.output_transform)
    write_df = final_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_transform))
    write_df.run_local()
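
The transform step derives the weekday by example, splits times into hour, minute, and second, and finally keeps only rows with positive `distance` and `cost`. The sketch below computes the same features deterministically with the standard `datetime` module instead of by-example learning, which is a useful cross-check: the example pairs in the script (`2009-01-04` is a Sunday, `2013-08-22` a Thursday, `2013-11-03` a Sunday) agree with the calendar. ISO-formatted dates and `HH:MM:SS` times are assumed, and the names `derive_features` and `keep_row` are hypothetical.

```python
from datetime import date, time
from typing import Dict


def derive_features(prefix: str, date_str: str, time_str: str) -> Dict[str, object]:
    """Compute weekday/month/monthday and hour/minute/second features for one timestamp."""
    d = date.fromisoformat(date_str)  # assumes YYYY-MM-DD
    t = time.fromisoformat(time_str)  # assumes HH:MM:SS
    return {
        prefix + "_weekday": d.strftime("%A"),  # English day name in the default C locale
        prefix + "_month": d.month,
        prefix + "_monthday": d.day,
        prefix + "_hour": t.hour,
        prefix + "_minute": t.minute,
        prefix + "_second": t.second,
    }


def keep_row(distance: float, cost: float) -> bool:
    """Final filter: keep only trips with a positive distance and a positive cost."""
    return distance > 0 and cost > 0


print(derive_features("pickup", "2009-01-04", "13:05:09"))
```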

@@ -0,0 +1,31 @@
import argparse
import os
import azureml.dataprep as dprep
import azureml.core

print("Extracts important features from prepared data")

parser = argparse.ArgumentParser("featurization")
parser.add_argument("--input_featurization", type=str, help="input featurization")
parser.add_argument("--useful_columns", type=str, help="columns to use")
parser.add_argument("--output_featurization", type=str, help="output featurization")

args = parser.parse_args()

print("Argument 1(input training data path): %s" % args.input_featurization)
print("Argument 2(column features to use): %s" % str(args.useful_columns.strip("[]").split(r"\;")))
print("Argument 3(output featurized training data path): %s" % args.output_featurization)

dflow_prepared = dprep.read_csv(args.input_featurization + '/part-*')

# These functions keep only the columns that are useful for training.
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models for more details

useful_columns = [s.strip().strip("'") for s in args.useful_columns.strip("[]").split(r"\;")]
dflow = dflow_prepared.keep_columns(useful_columns)

if args.output_featurization is not None:
    os.makedirs(args.output_featurization, exist_ok=True)
    print("%s created" % args.output_featurization)
    write_df = dflow.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_featurization))
    write_df.run_local()

@@ -0,0 +1,12 @@
import os
import pandas as pd


def get_data():
    print("In get_data")
    print(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'])
    X_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'] + "/part-00000", header=0)
    y_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_y'] + "/part-00000", header=0)

    return {"X": X_train.values, "y": y_train.values.flatten()}

@@ -0,0 +1,48 @@
import argparse
import os
import azureml.dataprep as dprep
import azureml.core
from sklearn.model_selection import train_test_split


def write_output(df, path):
    os.makedirs(path, exist_ok=True)
    print("%s created" % path)
    df.to_csv(path + "/part-00000", index=False)


print("Split the data into train and test")

parser = argparse.ArgumentParser("split")
parser.add_argument("--input_split_features", type=str, help="input split features")
parser.add_argument("--input_split_labels", type=str, help="input split labels")
parser.add_argument("--output_split_train_x", type=str, help="output split train features")
parser.add_argument("--output_split_train_y", type=str, help="output split train labels")
parser.add_argument("--output_split_test_x", type=str, help="output split test features")
parser.add_argument("--output_split_test_y", type=str, help="output split test labels")

args = parser.parse_args()

print("Argument 1(input taxi data features path): %s" % args.input_split_features)
print("Argument 2(input taxi data labels path): %s" % args.input_split_labels)
print("Argument 3(output training features split path): %s" % args.output_split_train_x)
print("Argument 4(output training labels split path): %s" % args.output_split_train_y)
print("Argument 5(output test features split path): %s" % args.output_split_test_x)
print("Argument 6(output test labels split path): %s" % args.output_split_test_y)

x_df = dprep.read_csv(path=args.input_split_features, header=dprep.PromoteHeadersMode.GROUPED).to_pandas_dataframe()
y_df = dprep.read_csv(path=args.input_split_labels, header=dprep.PromoteHeadersMode.GROUPED).to_pandas_dataframe()

# These functions split the input features and labels into test and train data
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models for more details

x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223)

# Write each split only if its output path was provided
# (write_output would fail on a None path).
for split_df, split_path in [(x_train, args.output_split_train_x),
                             (y_train, args.output_split_train_y),
                             (x_test, args.output_split_test_x),
                             (y_test, args.output_split_test_y)]:
    if split_path is not None:
        write_output(split_df, split_path)
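
Passing features and labels to one `train_test_split` call matters because both are shuffled with the same permutation, so each feature row stays paired with its label. The stdlib sketch below shows that idea: one seeded shuffle of row indices, with the test count rounded up from `test_size * n` (which is how scikit-learn sizes a float `test_size`). It does not reproduce scikit-learn's exact row assignment for `random_state=223`, and the name `aligned_split` is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def _pick(seq: Sequence, idx: List[int]) -> list:
    """Select the elements of `seq` at positions `idx`, in that order."""
    return [seq[i] for i in idx]


def aligned_split(xs: Sequence[X], ys: Sequence[Y], test_size: float = 0.2,
                  seed: int = 223) -> Tuple[List[X], List[X], List[Y], List[Y]]:
    """Return x_train, x_test, y_train, y_test using one shared shuffle of row indices."""
    if len(xs) != len(ys):
        raise ValueError("features and labels must have the same number of rows")
    n_test = math.ceil(test_size * len(xs))
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # one permutation for both xs and ys
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return _pick(xs, train_idx), _pick(xs, test_idx), _pick(ys, train_idx), _pick(ys, test_idx)
```

The return order matches the `x_train, x_test, y_train, y_test` unpacking used in the script.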

@@ -8,6 +8,13 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -28,7 +35,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Prerequisites\n", "## Prerequisites\n",
"Make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. " "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. "
] ]
}, },
{ {
@@ -165,7 +172,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# choose a name for your cluster\n", "# choose a name for your cluster\n",
"aml_compute_name = os.environ.get(\"AML_COMPUTE_NAME\", \"gpucluster\")\n", "aml_compute_name = os.environ.get(\"AML_COMPUTE_NAME\", \"gpu-cluster\")\n",
"cluster_min_nodes = os.environ.get(\"AML_COMPUTE_MIN_NODES\", 0)\n", "cluster_min_nodes = os.environ.get(\"AML_COMPUTE_MIN_NODES\", 0)\n",
"cluster_max_nodes = os.environ.get(\"AML_COMPUTE_MAX_NODES\", 1)\n", "cluster_max_nodes = os.environ.get(\"AML_COMPUTE_MAX_NODES\", 1)\n",
"vm_size = os.environ.get(\"AML_COMPUTE_SKU\", \"STANDARD_NC6\")\n", "vm_size = os.environ.get(\"AML_COMPUTE_SKU\", \"STANDARD_NC6\")\n",
@@ -310,7 +317,7 @@
"source": [ "source": [
"from azureml.core.runconfig import DEFAULT_GPU_IMAGE\n", "from azureml.core.runconfig import DEFAULT_GPU_IMAGE\n",
"\n", "\n",
"cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.10.0\", \"azureml-defaults\"])\n", "cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.13.1\", \"azureml-defaults\"])\n",
"\n", "\n",
"# Runconfig\n", "# Runconfig\n",
"amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n", "amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n",

Some files were not shown because too many files have changed in this diff