mirror of https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-19 17:17:04 -05:00

version 1.0.8

This commit is contained in:

README.md | 50
@@ -1,36 +1,34 @@
-# Azure Machine Learning service sample notebooks
+# Azure Machine Learning service example notebooks
 
----
 
 This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK
 which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK
 allows you the choice of using local or cloud compute resources, while managing
 and maintaining the complete data science workflow from the cloud.
 
-* Read [instructions on setting up notebooks](./NBSETUP.md) to run these notebooks.
+![image]
 
-* Find quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
+## How to use and navigate the example notebooks?
 
-## Getting Started
+You can set up you own Python environment or use Azure Notebooks with Azure ML SDK pre-installed. Read [these instructions](./NBSETUP.md) to set up your environment and clone the example notebooks.
 
-These examples will provide you with an effective way to get started using AML. Once you're familiar with
-some of the capabilities, explore the repository for specific topics.
+You should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
 
-- [Configuration](./configuration.ipynb) configures your notebook library to easily connect to an
-Azure Machine Learning workspace, and sets up your workspace to be used by many of the other examples. You should
-always run this first when setting up a notebook library on a new machine or in a new environment
-- [Train in notebook](./how-to-use-azureml/training/train-within-notebook) shows how to create a model directly in a notebook while recording
-metrics and deploy that model to a test service
-- [Train on remote](./how-to-use-azureml/training/train-on-remote-vm) takes the previous example and shows how to create the model on a cloud compute target
-- [Production deploy to AKS](./how-to-use-azureml/deployment/production-deploy-to-aks) shows how to create a production grade inferencing webservice
+If you want to...
+* ...try out and explore Azure ML, start with image classification tutorials [part 1 training](./tutorials/img-classification-part1-training.ipynb) and [part 2 deployment](./tutorials/img-classification-part2-deploy.ipynb).
+* ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
+* ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
+* ...deploy model as realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
+* ...deploy models as batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](./how-to-use-azureml/machine-learning-pipelines/pipeline-mpi-batch-prediction.ipynb).
+* ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) and [model data collection](./how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb).
 
 ## Tutorials
 
 The [Tutorials](./tutorials) folder contains notebooks for the tutorials described in the [Azure Machine Learning documentation](https://aka.ms/aml-docs)
 
-## How to use AML
+## How to use Azure ML
 
-The [How to use AML](./how-to-use-azureml) folder contains specific examples demonstrating the features of the Azure Machine Learning SDK
+The [How to use Azure ML](./how-to-use-azureml) folder contains specific examples demonstrating the features of the Azure Machine Learning SDK
 
 - [Training](./how-to-use-azureml/training) - Examples of how to build models using Azure ML's logging and execution capabilities on local and remote compute targets.
 - [Training with Deep Learning](./how-to-use-azureml/training-with-deep-learning) - Examples demonstrating how to build deep learning models using estimators and parameter sweeps
@@ -38,3 +36,21 @@ The [How to use AML](./how-to-use-azureml) folder contains specific examples dem
 - [Machine Learning Pipelines](./how-to-use-azureml/machine-learning-pipelines) - Examples showing how to create and use reusable pipelines for training and batch scoring
 - [Deployment](./how-to-use-azureml/deployment) - Examples showing how to deploy and manage machine learning models and solutions
 - [Azure Databricks](./how-to-use-azureml/azure-databricks) - Examples showing how to use Azure ML with Azure Databricks
+
+---
+
+## Documentation
+
+* Quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
+
+* [Python SDK reference](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py)
+
+
+---
+
+## Projects using Azure Machine Learning
+
+Visit following repos to see projects contributed by Azure ML users:
+
+- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
+- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
+
@@ -23,6 +23,10 @@ if errorlevel 1 goto ErrorExit
 
 call python -m ipykernel install --user --name %conda_env_name% --display-name "Python (%conda_env_name%)"
 
+REM azureml.widgets is now installed as part of the pip install under the conda env.
+REM Removing the old user install so that the notebooks will use the latest widget.
+call jupyter nbextension uninstall --user --py azureml.widgets
+
 echo.
 echo.
 echo ***************************************
@@ -22,11 +22,13 @@ fi
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
    echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
-   pip install --upgrade azureml-sdk[automl,notebooks,explain]
+   pip install --upgrade azureml-sdk[automl,notebooks,explain] &&
+   jupyter nbextension uninstall --user --py azureml.widgets
 else
    conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
    source activate $CONDA_ENV_NAME &&
    python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
+   jupyter nbextension uninstall --user --py azureml.widgets &&
    echo "" &&
    echo "" &&
    echo "***************************************" &&
@@ -22,13 +22,15 @@ fi
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
    echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
-   pip install --upgrade azureml-sdk[automl,notebooks,explain]
+   pip install --upgrade azureml-sdk[automl,notebooks,explain] &&
+   jupyter nbextension uninstall --user --py azureml.widgets
 else
    conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
    source activate $CONDA_ENV_NAME &&
    conda install lightgbm -c conda-forge -y &&
    python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
-   pip install numpy==1.15.3
+   jupyter nbextension uninstall --user --py azureml.widgets &&
+   pip install numpy==1.15.3 &&
    echo "" &&
    echo "" &&
    echo "***************************************" &&
File diff suppressed because it is too large
@@ -1,403 +1,398 @@
 {
 "cells": [
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
 "_**Classification using whitelist models**_\n",
 "\n",
 "## Contents\n",
 "1. [Introduction](#Introduction)\n",
 "1. [Setup](#Setup)\n",
 "1. [Data](#Data)\n",
 "1. [Train](#Train)\n",
 "1. [Results](#Results)\n",
 "1. [Test](#Test)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
 "\n",
 "In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
 "\n",
 "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
 "This notebooks shows how can automl can be trained on a a selected list of models,see the readme.md for the models.\n",
 "This trains the model exclusively on tensorflow based models.\n",
 "\n",
 "In this notebook you will learn how to:\n",
 "1. Create an `Experiment` in an existing `Workspace`.\n",
 "2. Configure AutoML using `AutoMLConfig`.\n",
 "3. Train the model on a whilelisted models using local compute. \n",
 "4. Explore the results.\n",
 "5. Test the best fitted model."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "import logging\n",
-"import os\n",
-"import random\n",
 "\n",
 "from matplotlib import pyplot as plt\n",
-"from matplotlib.pyplot import imshow\n",
 "import numpy as np\n",
 "import pandas as pd\n",
 "from sklearn import datasets\n",
 "\n",
 "import azureml.core\n",
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
-"from azureml.train.automl import AutoMLConfig\n",
-"from azureml.train.automl.run import AutoMLRun"
+"from azureml.train.automl import AutoMLConfig"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
 "# Choose a name for the experiment and specify the project folder.\n",
 "experiment_name = 'automl-local-whitelist'\n",
 "project_folder = './sample_projects/automl-local-whitelist'\n",
 "\n",
 "experiment = Experiment(ws, experiment_name)\n",
 "\n",
 "output = {}\n",
 "output['SDK version'] = azureml.core.VERSION\n",
 "output['Subscription ID'] = ws.subscription_id\n",
 "output['Workspace Name'] = ws.name\n",
 "output['Resource Group'] = ws.resource_group\n",
 "output['Location'] = ws.location\n",
 "output['Project Directory'] = project_folder\n",
 "output['Experiment Name'] = experiment.name\n",
 "pd.set_option('display.max_colwidth', -1)\n",
-"pd.DataFrame(data = output, index = ['']).T"
+"outputDf = pd.DataFrame(data = output, index = [''])\n",
+"outputDf.T"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "Opt-in diagnostics for better experience, quality, and security of future releases."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "from azureml.telemetry import set_diagnostics_collection\n",
 "set_diagnostics_collection(send_diagnostics = True)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data\n",
 "\n",
 "This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
-"from sklearn import datasets\n",
-"\n",
 "digits = datasets.load_digits()\n",
 "\n",
 "# Exclude the first 100 rows from training so that they can be used for test.\n",
 "X_train = digits.data[100:,:]\n",
 "y_train = digits.target[100:]"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
 "\n",
 "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
 "\n",
 "|Property|Description|\n",
 "|-|-|\n",
 "|**task**|classification or regression|\n",
 "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
 "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
 "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
 "|**n_cross_validations**|Number of cross validation splits.|\n",
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
 "|**whitelist_models**|List of models that AutoML should use. The possible values are listed [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings).|"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
 "                             debug_log = 'automl_errors.log',\n",
 "                             primary_metric = 'AUC_weighted',\n",
 "                             iteration_timeout_minutes = 60,\n",
 "                             iterations = 10,\n",
 "                             n_cross_validations = 3,\n",
 "                             verbosity = logging.INFO,\n",
 "                             X = X_train, \n",
 "                             y = y_train,\n",
 "                             enable_tf=True,\n",
 "                             whitelist_models=[\"TensorFlowLinearClassifier\", \"TensorFlowDNN\"],\n",
 "                             path = project_folder)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
 "In this example, we specify `show_output = True` to print currently running iterations to the console."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "local_run = experiment.submit(automl_config, show_output = True)"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "local_run"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
 "\n",
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(local_run).show() "
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "#### Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "children = list(local_run.get_children())\n",
 "metricslist = {}\n",
 "for run in children:\n",
 "    properties = run.get_properties()\n",
 "    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
 "    metricslist[int(properties['iteration'])] = metrics\n",
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "best_run, fitted_model = local_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric\n",
 "Show the run and the model that has the smallest `log_loss` value:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "lookup_metric = \"log_loss\"\n",
 "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
 "print(best_run)\n",
 "print(fitted_model)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Model from a Specific Iteration\n",
 "Show the run and the model from the third iteration:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "iteration = 3\n",
 "third_run, third_model = local_run.get_output(iteration = iteration)\n",
 "print(third_run)\n",
 "print(third_model)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test\n",
 "\n",
 "#### Load Test Data"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "digits = datasets.load_digits()\n",
 "X_test = digits.data[:10, :]\n",
 "y_test = digits.target[:10]\n",
 "images = digits.images[:10]"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Testing Our Best Fitted Model\n",
 "We will try to predict 2 digits and see how our model works."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "# Randomly select digits and test.\n",
 "for index in np.random.choice(len(y_test), 2, replace = False):\n",
 "    print(index)\n",
 "    predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
 "    label = y_test[index]\n",
 "    title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
 "    fig = plt.figure(1, figsize = (3,3))\n",
 "    ax1 = fig.add_axes((0,0,.8,.8))\n",
 "    ax1.set_title(title)\n",
 "    plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
 "    plt.show()"
 ]
 }
 ],
 "metadata": {
 "authors": [
 {
 "name": "savitam"
 }
 ],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
 "name": "python36"
 },
 "language_info": {
 "codemirror_mode": {
 "name": "ipython",
 "version": 3
 },
 "file_extension": ".py",
 "mimetype": "text/x-python",
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
 "version": "3.6.6"
 }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,418 +1,413 @@
 {
 "cells": [
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
 "_**Classification with Local Compute**_\n",
 "\n",
 "## Contents\n",
 "1. [Introduction](#Introduction)\n",
 "1. [Setup](#Setup)\n",
 "1. [Data](#Data)\n",
 "1. [Train](#Train)\n",
 "1. [Results](#Results)\n",
 "1. [Test](#Test)\n",
 "\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
 "\n",
 "In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
 "\n",
 "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
 "\n",
 "In this notebook you will learn how to:\n",
 "1. Create an `Experiment` in an existing `Workspace`.\n",
 "2. Configure AutoML using `AutoMLConfig`.\n",
 "3. Train the model using local compute.\n",
 "4. Explore the results.\n",
 "5. Test the best fitted model."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "import logging\n",
-"import os\n",
-"import random\n",
 "\n",
 "from matplotlib import pyplot as plt\n",
-"from matplotlib.pyplot import imshow\n",
 "import numpy as np\n",
 "import pandas as pd\n",
 "from sklearn import datasets\n",
 "\n",
 "import azureml.core\n",
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
-"from azureml.train.automl import AutoMLConfig\n",
-"from azureml.train.automl.run import AutoMLRun"
+"from azureml.train.automl import AutoMLConfig"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
 "# Choose a name for the experiment and specify the project folder.\n",
 "experiment_name = 'automl-local-classification'\n",
 "project_folder = './sample_projects/automl-local-classification'\n",
 "\n",
 "experiment = Experiment(ws, experiment_name)\n",
 "\n",
 "output = {}\n",
 "output['SDK version'] = azureml.core.VERSION\n",
 "output['Subscription ID'] = ws.subscription_id\n",
 "output['Workspace Name'] = ws.name\n",
 "output['Resource Group'] = ws.resource_group\n",
 "output['Location'] = ws.location\n",
 "output['Project Directory'] = project_folder\n",
 "output['Experiment Name'] = experiment.name\n",
 "pd.set_option('display.max_colwidth', -1)\n",
-"pd.DataFrame(data = output, index = ['']).T"
+"outputDf = pd.DataFrame(data = output, index = [''])\n",
+"outputDf.T"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "Opt-in diagnostics for better experience, quality, and security of future releases."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "from azureml.telemetry import set_diagnostics_collection\n",
 "set_diagnostics_collection(send_diagnostics = True)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data\n",
 "\n",
 "This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
-"from sklearn import datasets\n",
-"\n",
 "digits = datasets.load_digits()\n",
 "\n",
 "# Exclude the first 100 rows from training so that they can be used for test.\n",
 "X_train = digits.data[100:,:]\n",
 "y_train = digits.target[100:]"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
 "\n",
 "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
 "\n",
 "|Property|Description|\n",
 "|-|-|\n",
 "|**task**|classification or regression|\n",
 "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
 "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
 "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
 "|**n_cross_validations**|Number of cross validation splits.|\n",
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
 "                             debug_log = 'automl_errors.log',\n",
 "                             primary_metric = 'AUC_weighted',\n",
 "                             iteration_timeout_minutes = 60,\n",
 "                             iterations = 25,\n",
 "                             n_cross_validations = 3,\n",
 "                             verbosity = logging.INFO,\n",
 "                             X = X_train, \n",
 "                             y = y_train,\n",
 "                             path = project_folder)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
 "In this example, we specify `show_output = True` to print currently running iterations to the console."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "local_run = experiment.submit(automl_config, show_output = True)"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "local_run"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "Optionally, you can continue an interrupted local run by calling `continue_experiment` without the `iterations` parameter, or run more iterations for a completed run by specifying the `iterations` parameter:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "local_run = local_run.continue_experiment(X = X_train, \n",
 "                                          y = y_train, \n",
 "                                          show_output = True,\n",
 "                                          iterations = 5)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
 "\n",
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(local_run).show() "
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "#### Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "children = list(local_run.get_children())\n",
 "metricslist = {}\n",
 "for run in children:\n",
 "    properties = run.get_properties()\n",
 "    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
 "    metricslist[int(properties['iteration'])] = metrics\n",
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "best_run, fitted_model = local_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric\n",
 "Show the run and the model that has the smallest `log_loss` value:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "lookup_metric = \"log_loss\"\n",
 "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
 "print(best_run)\n",
 "print(fitted_model)"
 ]
 },
 {
 "cell_type": "markdown",
"print(fitted_model)"
|
"metadata": {},
|
||||||
]
|
"source": [
|
||||||
},
|
"#### Model from a Specific Iteration\n",
|
||||||
{
|
"Show the run and the model from the third iteration:"
|
||||||
"cell_type": "markdown",
|
]
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"#### Model from a Specific Iteration\n",
|
"cell_type": "code",
|
||||||
"Show the run and the model from the third iteration:"
|
"execution_count": null,
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"outputs": [],
|
||||||
{
|
"source": [
|
||||||
"cell_type": "code",
|
"iteration = 3\n",
|
||||||
"execution_count": null,
|
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
||||||
"metadata": {},
|
"print(third_run)\n",
|
||||||
"outputs": [],
|
"print(third_model)"
|
||||||
"source": [
|
]
|
||||||
"iteration = 3\n",
|
},
|
||||||
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
{
|
||||||
"print(third_run)\n",
|
"cell_type": "markdown",
|
||||||
"print(third_model)"
|
"metadata": {},
|
||||||
]
|
"source": [
|
||||||
},
|
"## Test \n",
|
||||||
{
|
"\n",
|
||||||
"cell_type": "markdown",
|
"#### Load Test Data"
|
||||||
"metadata": {},
|
]
|
||||||
"source": [
|
},
|
||||||
"## Test \n",
|
{
|
||||||
"\n",
|
"cell_type": "code",
|
||||||
"#### Load Test Data"
|
"execution_count": null,
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"outputs": [],
|
||||||
{
|
"source": [
|
||||||
"cell_type": "code",
|
"digits = datasets.load_digits()\n",
|
||||||
"execution_count": null,
|
"X_test = digits.data[:10, :]\n",
|
||||||
"metadata": {},
|
"y_test = digits.target[:10]\n",
|
||||||
"outputs": [],
|
"images = digits.images[:10]"
|
||||||
"source": [
|
]
|
||||||
"digits = datasets.load_digits()\n",
|
},
|
||||||
"X_test = digits.data[:10, :]\n",
|
{
|
||||||
"y_test = digits.target[:10]\n",
|
"cell_type": "markdown",
|
||||||
"images = digits.images[:10]"
|
"metadata": {},
|
||||||
]
|
"source": [
|
||||||
},
|
"#### Testing Our Best Fitted Model\n",
|
||||||
{
|
"We will try to predict 2 digits and see how our model works."
|
||||||
"cell_type": "markdown",
|
]
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"#### Testing Our Best Fitted Model\n",
|
"cell_type": "code",
|
||||||
"We will try to predict 2 digits and see how our model works."
|
"execution_count": null,
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"outputs": [],
|
||||||
{
|
"source": [
|
||||||
"cell_type": "code",
|
"# Randomly select digits and test.\n",
|
||||||
"execution_count": null,
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
"metadata": {},
|
" print(index)\n",
|
||||||
"outputs": [],
|
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
||||||
"source": [
|
" label = y_test[index]\n",
|
||||||
"# Randomly select digits and test.\n",
|
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
||||||
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
" fig = plt.figure(1, figsize = (3,3))\n",
|
||||||
" print(index)\n",
|
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
||||||
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
" ax1.set_title(title)\n",
|
||||||
" label = y_test[index]\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
" plt.show()"
|
||||||
" fig = plt.figure(1, figsize = (3,3))\n",
|
]
|
||||||
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
}
|
||||||
" ax1.set_title(title)\n",
|
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
|
||||||
" plt.show()"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,154 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning Configuration\n",
"\n",
"In this example you will create an Azure Machine Learning `Workspace` object and initialize your notebook directory to easily reload this object from a configuration file. Typically you will only need to run this once per notebook directory, and all other notebooks in this directory or any sub-directories will automatically use the settings you indicate here.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Check the Azure ML Core SDK Version to Validate Your Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"\n",
"print(\"SDK Version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize an Azure ML Workspace\n",
"### What is an Azure ML Workspace and Why Do I Need One?\n",
"\n",
"An Azure ML workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n",
"\n",
"\n",
"### What do I Need?\n",
"\n",
"To create or access an Azure ML workspace, you will need to import the Azure ML library and specify following information:\n",
"* A name for your workspace. You can choose one.\n",
"* Your subscription id. Use the `id` value from the `az account show` command output above.\n",
"* The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group. The resource group will be created if it doesn't exist. Resource groups can be created and viewed in the [Azure portal](https://portal.azure.com)\n",
"* Supported regions include `eastus2`, `eastus`,`westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"subscription_id = \"<subscription_id>\"\n",
"resource_group = \"myrg\"\n",
"workspace_name = \"myws\"\n",
"workspace_region = \"eastus2\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating a Workspace\n",
"If you already have access to an Azure ML workspace you want to use, you can skip this cell. Otherwise, this cell will create an Azure ML workspace for you in the specified subscription, provided you have the correct permissions for the given `subscription_id`.\n",
"\n",
"This will fail when:\n",
"1. The workspace already exists.\n",
"2. You do not have permission to create a workspace in the resource group.\n",
"3. You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription.\n",
"\n",
"If workspace creation fails for any reason other than already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n",
"\n",
"**Note:** Creation of a new workspace can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import the Workspace class and check the Azure ML SDK version.\n",
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.create(name = workspace_name,\n",
" subscription_id = subscription_id,\n",
" resource_group = resource_group, \n",
" location = workspace_region)\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuring Your Local Environment\n",
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace(workspace_name = workspace_name,\n",
" subscription_id = subscription_id,\n",
" resource_group = resource_group)\n",
"\n",
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
"ws.write_config()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
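The deleted configuration notebook above persisted the workspace details with `ws.write_config()`; the surviving notebooks in this commit reload them with `Workspace.from_config()` rather than re-entering the subscription id, resource group, and workspace name each time. A minimal sketch of that round trip, assuming a workspace already exists and the placeholder values below are replaced with real ones:

```python
# Sketch: persist workspace details once, then reload them from any notebook.
# Assumes the azureml-sdk package and an existing Azure ML workspace;
# "myws", "myrg", and "<subscription_id>" are placeholders from the notebook above.
from azureml.core import Workspace

# One-time setup: validate access and write ./aml_config/config.json.
ws = Workspace(workspace_name = "myws",
               subscription_id = "<subscription_id>",
               resource_group = "myrg")
ws.write_config()

# Any later notebook in this directory (or a sub-directory) can then do:
ws = Workspace.from_config()  # reads ./aml_config/config.json
print(ws.name, ws.location)
```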
File diff suppressed because it is too large
@@ -1,469 +1,466 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Prepare Data using `azureml.dataprep` for Local Execution**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
"2. Pass the `Dataflow` to AutoML for a local run.\n",
"3. Pass the `Dataflow` to AutoML for a remote run."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
-"import os\n",
"\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"import azureml.dataprep as dprep\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
" \n",
"# choose a name for experiment\n",
"experiment_name = 'automl-dataprep-local'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-dataprep-local'\n",
" \n",
"experiment = Experiment(ws, experiment_name)\n",
" \n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
-"pd.DataFrame(data = output, index = ['']).T"
+"outputDf = pd.DataFrame(data = output, index = [''])\n",
+"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
"X = dprep.auto_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n",
"\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n",
"# Here we read a comma delimited file and convert all columns to integers.\n",
"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the Data Preparation Result\n",
"\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X.skip(1).head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"This creates a general AutoML settings object applicable for both local and remote runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n",
" \"primary_metric\" : 'AUC_weighted',\n",
" \"preprocess\" : False,\n",
" \"verbosity\" : logging.INFO,\n",
" \"n_cross_validations\": 3\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pass Data with `Dataflow` Objects\n",
"\n",
"The `Dataflow` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `Dataflow` for model training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" X = X,\n",
" y = y,\n",
" **automl_settings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
" \n",
-"import pandas as pd\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from the first iteration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 0\n",
"best_run, fitted_model = local_run.get_output(iteration = iteration)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test\n",
"\n",
"#### Load Test Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing Our Best Fitted Model\n",
"We will try to predict 2 digits and see how our model works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Randomly select digits and test\n",
"from matplotlib import pyplot as plt\n",
-"from matplotlib.pyplot import imshow\n",
-"import random\n",
"import numpy as np\n",
"\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Appendix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Capture the `Dataflow` Objects for Later Use in AutoML\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sklearn.digits.data + target\n",
"digits_complete = dprep.auto_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`digits_complete` (sourced from `sklearn.datasets.load_digits()`) is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
-"digits_complete.to_pandas_dataframe().shape\n",
+"print(digits_complete.to_pandas_dataframe().shape)\n",
"labels_column = 'Column64'\n",
"dflow_X = digits_complete.drop_columns(columns = [labels_column])\n",
"dflow_y = digits_complete.keep_columns(columns = [labels_column])"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
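The Appendix above stops after branching `digits_complete` into `dflow_X` and `dflow_y`. To close the loop, those two `Dataflow` objects can be passed to `AutoMLConfig` in the same way the `X` and `y` Dataflows were earlier in this notebook. A minimal sketch of that step, assuming `dflow_X`, `dflow_y`, `automl_settings`, and `experiment` from the cells above are in scope:

```python
# Sketch: feed the branched Dataflows from the Appendix into an AutoML run.
# Reuses dflow_X, dflow_y, automl_settings, and experiment defined earlier.
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task = 'classification',
                             debug_log = 'automl_errors.log',
                             X = dflow_X,   # feature columns captured by drop_columns
                             y = dflow_y,   # label column captured by keep_columns
                             **automl_settings)
local_run = experiment.submit(automl_config, show_output = True)
```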
@@ -1,370 +1,359 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
"_**Exploring Previous Runs**_\n",
|
"_**Exploring Previous Runs**_\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## Contents\n",
|
"## Contents\n",
|
||||||
"1. [Introduction](#Introduction)\n",
|
"1. [Introduction](#Introduction)\n",
|
||||||
"1. [Setup](#Setup)\n",
|
"1. [Setup](#Setup)\n",
|
||||||
"1. [Explore](#Explore)\n",
|
"1. [Explore](#Explore)\n",
|
||||||
"1. [Download](#Download)\n",
|
"1. [Download](#Download)\n",
|
||||||
"1. [Register](#Register)"
|
"1. [Register](#Register)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
"In this example we present some examples on navigating previously executed runs. We also show how you can download a fitted model for any previous run.\n",
|
"In this example we present some examples on navigating previously executed runs. We also show how you can download a fitted model for any previous run.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you will learn how to:\n",
|
"In this notebook you will learn how to:\n",
|
||||||
"1. List all experiments in a workspace.\n",
|
"1. List all experiments in a workspace.\n",
|
||||||
"2. List all AutoML runs in an experiment.\n",
|
"2. List all AutoML runs in an experiment.\n",
|
||||||
"3. Get details for an AutoML run, including settings, run widget, and all metrics.\n",
|
"3. Get details for an AutoML run, including settings, run widget, and all metrics.\n",
|
||||||
"4. Download a fitted pipeline for any iteration."
|
"4. Download a fitted pipeline for any iteration."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup"
|
"## Setup"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import pandas as pd\n",
|
||||||
"import os\n",
|
"import json\n",
|
||||||
"import random\n",
|
"\n",
|
||||||
"import re\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from matplotlib import pyplot as plt\n",
|
"from azureml.train.automl.run import AutoMLRun"
|
||||||
"from matplotlib.pyplot import imshow\n",
|
]
|
||||||
"import numpy as np\n",
|
},
|
||||||
"import pandas as pd\n",
|
{
|
||||||
"from sklearn import datasets\n",
|
"cell_type": "code",
|
||||||
"\n",
|
"execution_count": null,
|
||||||
"import azureml.core\n",
|
"metadata": {},
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"outputs": [],
|
||||||
"from azureml.core.run import Run\n",
|
"source": [
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"ws = Workspace.from_config()"
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
]
|
||||||
"from azureml.train.automl.run import AutoMLRun"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
||||||
"metadata": {},
|
]
|
||||||
"outputs": [],
|
},
|
||||||
"source": [
|
{
|
||||||
"ws = Workspace.from_config()"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"from azureml.telemetry import set_diagnostics_collection\n",
|
||||||
"source": [
|
"set_diagnostics_collection(send_diagnostics = True)"
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "markdown",
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"## Explore"
|
||||||
"outputs": [],
|
]
|
||||||
"source": [
|
},
|
||||||
"from azureml.telemetry import set_diagnostics_collection\n",
|
{
|
||||||
"set_diagnostics_collection(send_diagnostics = True)"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"### List Experiments"
|
||||||
"cell_type": "markdown",
|
]
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"## Explore"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"experiment_list = Experiment.list(workspace=ws)\n",
|
||||||
"source": [
|
"\n",
|
||||||
"### List Experiments"
|
"summary_df = pd.DataFrame(index = ['No of Runs'])\n",
|
||||||
]
|
"for experiment in experiment_list:\n",
|
||||||
},
|
" automl_runs = list(experiment.get_runs(type='automl'))\n",
|
||||||
{
|
" summary_df[experiment.name] = [len(automl_runs)]\n",
|
||||||
"cell_type": "code",
|
" \n",
|
||||||
"execution_count": null,
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"metadata": {},
|
"summary_df.T"
|
||||||
"outputs": [],
|
]
|
||||||
"source": [
|
},
|
||||||
"experiment_list = Experiment.list(workspace=ws)\n",
|
{
|
||||||
"\n",
|
"cell_type": "markdown",
|
||||||
"summary_df = pd.DataFrame(index = ['No of Runs'])\n",
|
"metadata": {},
|
||||||
"for experiment in experiment_list:\n",
|
"source": [
|
||||||
" automl_runs = list(experiment.get_runs(type='automl'))\n",
|
"### List runs for an experiment\n",
|
||||||
" summary_df[experiment.name] = [len(automl_runs)]\n",
|
"Set `experiment_name` to any experiment name from the result of the Experiment.list cell to load the AutoML runs."
|
||||||
" \n",
|
]
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
},
|
||||||
"summary_df.T"
|
{
|
||||||
]
|
"cell_type": "code",
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"experiment_name = 'automl-local-classification' # Replace this with any project name from previous cell.\n",
|
||||||
"### List runs for an experiment\n",
|
"\n",
|
||||||
"Set `experiment_name` to any experiment name from the result of the Experiment.list cell to load the AutoML runs."
|
"proj = ws.experiments[experiment_name]\n",
|
||||||
]
|
"summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])\n",
|
||||||
},
|
"automl_runs = list(proj.get_runs(type='automl'))\n",
|
||||||
{
|
"automl_runs_project = []\n",
|
||||||
"cell_type": "code",
|
"for run in automl_runs:\n",
|
||||||
"execution_count": null,
|
" properties = run.get_properties()\n",
|
||||||
"metadata": {},
|
" tags = run.get_tags()\n",
|
||||||
"outputs": [],
|
" amlsettings = json.loads(properties['AMLSettingsJsonString'])\n",
|
||||||
"source": [
|
" if 'iterations' in tags:\n",
|
||||||
"experiment_name = 'automl-local-classification' # Replace this with any project name from previous cell.\n",
|
" iterations = tags['iterations']\n",
|
||||||
"\n",
|
" else:\n",
|
||||||
"proj = ws.experiments[experiment_name]\n",
|
" iterations = properties['num_iterations']\n",
|
||||||
"summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])\n",
|
" summary_df[run.id] = [amlsettings['task_type'], run.get_details()['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name']]\n",
|
||||||
"automl_runs = list(proj.get_runs(type='automl'))\n",
|
" if run.get_details()['status'] == 'Completed':\n",
|
||||||
"automl_runs_project = []\n",
|
" automl_runs_project.append(run.id)\n",
|
||||||
"for run in automl_runs:\n",
|
" \n",
|
||||||
" properties = run.get_properties()\n",
|
"from IPython.display import HTML\n",
|
||||||
" tags = run.get_tags()\n",
|
"projname_html = HTML(\"<h3>{}</h3>\".format(proj.name))\n",
|
||||||
" amlsettings = eval(properties['RawAMLSettingsString'])\n",
|
"\n",
|
||||||
" if 'iterations' in tags:\n",
|
"from IPython.display import display\n",
|
||||||
" iterations = tags['iterations']\n",
|
"display(projname_html)\n",
|
||||||
" else:\n",
|
"display(summary_df.T)"
|
||||||
" iterations = properties['num_iterations']\n",
|
]
|
||||||
" summary_df[run.id] = [amlsettings['task_type'], run.get_details()['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name']]\n",
|
},
|
||||||
" if run.get_details()['status'] == 'Completed':\n",
|
{
|
||||||
" automl_runs_project.append(run.id)\n",
|
"cell_type": "markdown",
|
||||||
" \n",
|
"metadata": {},
|
||||||
"from IPython.display import HTML\n",
|
"source": [
|
||||||
"projname_html = HTML(\"<h3>{}</h3>\".format(proj.name))\n",
|
"### Get details for a run\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from IPython.display import display\n",
|
"Copy the project name and run id from the previous cell output to find more details on a particular run."
|
||||||
"display(projname_html)\n",
|
]
|
||||||
"display(summary_df.T)"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"### Get details for a run\n",
|
"run_id = automl_runs_project[0] # Replace with your own run_id from above run ids\n",
|
||||||
"\n",
|
"assert (run_id in summary_df.keys()), \"Run id not found! Please set run id to a value from above run ids\"\n",
|
||||||
"Copy the project name and run id from the previous cell output to find more details on a particular run."
|
"\n",
|
||||||
]
|
"from azureml.widgets import RunDetails\n",
|
||||||
},
|
"\n",
|
||||||
{
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"cell_type": "code",
|
"ml_run = AutoMLRun(experiment = experiment, run_id = run_id)\n",
|
||||||
"execution_count": null,
|
"\n",
|
||||||
"metadata": {},
|
"summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name', 'Start Time', 'End Time'])\n",
|
||||||
"outputs": [],
|
"properties = ml_run.get_properties()\n",
|
||||||
"source": [
|
"tags = ml_run.get_tags()\n",
|
||||||
"run_id = automl_runs_project[0] # Replace with your own run_id from above run ids\n",
|
"status = ml_run.get_details()\n",
|
||||||
"assert (run_id in summary_df.keys()), \"Run id not found! Please set run id to a value from above run ids\"\n",
|
"amlsettings = json.loads(properties['AMLSettingsJsonString'])\n",
|
||||||
"\n",
|
"if 'iterations' in tags:\n",
|
||||||
"from azureml.widgets import RunDetails\n",
|
" iterations = tags['iterations']\n",
|
||||||
"\n",
|
"else:\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
" iterations = properties['num_iterations']\n",
|
||||||
"ml_run = AutoMLRun(experiment = experiment, run_id = run_id)\n",
|
"start_time = None\n",
|
||||||
"\n",
|
"if 'startTimeUtc' in status:\n",
|
||||||
"summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name', 'Start Time', 'End Time'])\n",
|
" start_time = status['startTimeUtc']\n",
|
||||||
"properties = ml_run.get_properties()\n",
|
"end_time = None\n",
|
||||||
"tags = ml_run.get_tags()\n",
|
"if 'endTimeUtc' in status:\n",
|
||||||
"status = ml_run.get_details()\n",
|
" end_time = status['endTimeUtc']\n",
|
||||||
"amlsettings = eval(properties['RawAMLSettingsString'])\n",
|
"summary_df[ml_run.id] = [amlsettings['task_type'], status['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name'], start_time, end_time]\n",
|
||||||
"if 'iterations' in tags:\n",
|
"display(HTML('<h3>Runtime Details</h3>'))\n",
|
||||||
" iterations = tags['iterations']\n",
|
"display(summary_df)\n",
|
||||||
"else:\n",
|
"\n",
|
||||||
" iterations = properties['num_iterations']\n",
|
"#settings_df = pd.DataFrame(data = amlsettings, index = [''])\n",
|
||||||
"start_time = None\n",
|
"display(HTML('<h3>AutoML Settings</h3>'))\n",
|
||||||
"if 'startTimeUtc' in status:\n",
|
"display(amlsettings)\n",
|
||||||
" start_time = status['startTimeUtc']\n",
|
"\n",
|
||||||
"end_time = None\n",
|
"display(HTML('<h3>Iterations</h3>'))\n",
|
||||||
"if 'endTimeUtc' in status:\n",
|
"RunDetails(ml_run).show() \n",
|
||||||
" end_time = status['endTimeUtc']\n",
|
"\n",
|
||||||
"summary_df[ml_run.id] = [amlsettings['task_type'], status['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name'], start_time, end_time]\n",
|
"children = list(ml_run.get_children())\n",
|
||||||
"display(HTML('<h3>Runtime Details</h3>'))\n",
|
"metricslist = {}\n",
|
||||||
"display(summary_df)\n",
|
"for run in children:\n",
|
||||||
"\n",
|
" properties = run.get_properties()\n",
|
||||||
"#settings_df = pd.DataFrame(data = amlsettings, index = [''])\n",
|
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||||
"display(HTML('<h3>AutoML Settings</h3>'))\n",
|
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||||
"display(amlsettings)\n",
|
"\n",
|
||||||
"\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
"display(HTML('<h3>Iterations</h3>'))\n",
|
"display(HTML('<h3>Metrics</h3>'))\n",
|
||||||
"RunDetails(ml_run).show() \n",
|
"display(rundata)\n"
|
||||||
"\n",
|
]
|
||||||
"children = list(ml_run.get_children())\n",
|
},
|
||||||
"metricslist = {}\n",
|
{
|
||||||
"for run in children:\n",
|
"cell_type": "markdown",
|
||||||
" properties = run.get_properties()\n",
|
"metadata": {},
|
||||||
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
"source": [
|
||||||
" metricslist[int(properties['iteration'])] = metrics\n",
|
"## Download"
|
||||||
"\n",
|
]
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
},
|
||||||
"display(HTML('<h3>Metrics</h3>'))\n",
|
{
|
||||||
"display(rundata)\n"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"### Download the Best Model for Any Given Metric"
|
||||||
"cell_type": "markdown",
|
]
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"## Download"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
||||||
"source": [
|
"best_run, fitted_model = ml_run.get_output(metric = metric)\n",
|
||||||
"### Download the Best Model for Any Given Metric"
|
"fitted_model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
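{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above returns the fitted model in memory only. As a minimal, hedged sketch (the `model.pkl` file name is an illustrative choice, not part of the original notebook), you could persist it to local disk with joblib and reload it later without re-running AutoML:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
"\n",
"# Serialize the fitted pipeline to local disk; the file name is illustrative.\n",
"joblib.dump(value = fitted_model, filename = 'model.pkl')\n",
"\n",
"# Reload it in a later session without re-running the experiment.\n",
"restored_model = joblib.load('model.pkl')\n",
"restored_model"
]
},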
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "markdown",
|
||||||
"execution_count": null,
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"### Download the Model for Any Given Iteration"
|
||||||
"source": [
|
]
|
||||||
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
},
|
||||||
"best_run, fitted_model = ml_run.get_output(metric = metric)\n",
|
{
|
||||||
"fitted_model"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"iteration = 1 # Replace with an iteration number.\n",
|
||||||
"source": [
|
"best_run, fitted_model = ml_run.get_output(iteration = iteration)\n",
|
||||||
"### Download the Model for Any Given Iteration"
|
"fitted_model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "markdown",
|
||||||
"execution_count": null,
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"## Register"
|
||||||
"source": [
|
]
|
||||||
"iteration = 1 # Replace with an iteration number.\n",
|
},
|
||||||
"best_run, fitted_model = ml_run.get_output(iteration = iteration)\n",
|
{
|
||||||
"fitted_model"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"### Register fitted model for deployment\n",
|
||||||
"cell_type": "markdown",
|
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||||
"metadata": {},
|
]
|
||||||
"source": [
|
},
|
||||||
"## Register"
|
{
|
||||||
]
|
"cell_type": "code",
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"description = 'AutoML Model'\n",
|
||||||
"### Register fitted model for deployment\n",
|
"tags = None\n",
|
||||||
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
|
"ml_run.register_model(description = description, tags = tags)\n",
|
||||||
]
|
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
||||||
},
|
]
|
||||||
{
|
},
|
||||||
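{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged sketch of what you can do with the id printed above: the `Model` class in `azureml.core.model` lets you look the registered model up again in the workspace (this assumes the registration call above succeeded):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"# Find the model registered above by matching its id in the workspace registry.\n",
"for m in Model.list(ws):\n",
"    if m.id == ml_run.model_id:\n",
"        print(m.name, m.version)"
]
},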
"cell_type": "code",
|
{
|
||||||
"execution_count": null,
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"source": [
|
||||||
"source": [
|
"### Register the Best Model for Any Given Metric"
|
||||||
"description = 'AutoML Model'\n",
|
]
|
||||||
"tags = None\n",
|
},
|
||||||
"ml_run.register_model(description = description, tags = tags)\n",
|
{
|
||||||
"ml_run.model_id # Use this id to deploy the model as a web service in Azure."
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
||||||
"source": [
|
"description = 'AutoML Model'\n",
|
||||||
"### Register the Best Model for Any Given Metric"
|
"tags = None\n",
|
||||||
]
|
"ml_run.register_model(description = description, tags = tags, metric = metric)\n",
|
||||||
},
|
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
||||||
{
|
]
|
||||||
"cell_type": "code",
|
},
|
||||||
"execution_count": null,
|
{
|
||||||
"metadata": {},
|
"cell_type": "markdown",
|
||||||
"outputs": [],
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
"### Register the Model for Any Given Iteration"
|
||||||
"description = 'AutoML Model'\n",
|
]
|
||||||
"tags = None\n",
|
},
|
||||||
"ml_run.register_model(description = description, tags = tags, metric = metric)\n",
|
{
|
||||||
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"iteration = 1 # Replace with an iteration number.\n",
|
||||||
"source": [
|
"description = 'AutoML Model'\n",
|
||||||
"### Register the Model for Any Given Iteration"
|
"tags = None\n",
|
||||||
]
|
"ml_run.register_model(description = description, tags = tags, iteration = iteration)\n",
|
||||||
},
|
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
||||||
{
|
]
|
||||||
"cell_type": "code",
|
}
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"iteration = 1 # Replace with an iteration number.\n",
|
|
||||||
"description = 'AutoML Model'\n",
|
|
||||||
"tags = None\n",
|
|
||||||
"ml_run.register_model(description = description, tags = tags, iteration = iteration)\n",
|
|
||||||
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,418 +1,376 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
"_**Energy Demand Forecasting**_\n",
|
"_**Energy Demand Forecasting**_\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## Contents\n",
|
"## Contents\n",
|
||||||
"1. [Introduction](#Introduction)\n",
|
"1. [Introduction](#Introduction)\n",
|
||||||
"1. [Setup](#Setup)\n",
|
"1. [Setup](#Setup)\n",
|
||||||
"1. [Data](#Data)\n",
|
"1. [Data](#Data)\n",
|
||||||
"1. [Train](#Train)"
|
"1. [Train](#Train)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
"In this example, we show how AutoML can be used for energy demand forecasting.\n",
|
"In this example, we show how AutoML can be used for energy demand forecasting.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"In this notebook you would see\n",
|
||||||
"1. Creating an Experiment in an existing Workspace\n",
|
"1. Creating an Experiment in an existing Workspace\n",
|
||||||
"2. Instantiating AutoMLConfig with new task type \"forecasting\" for timeseries data training, and other timeseries related settings: for this dataset we use the basic one: \"time_column_name\" \n",
|
"2. Instantiating AutoMLConfig with new task type \"forecasting\" for timeseries data training, and other timeseries related settings: for this dataset we use the basic one: \"time_column_name\" \n",
|
||||||
"3. Training the Model using local compute\n",
|
"3. Training the Model using local compute\n",
|
||||||
"4. Exploring the results\n",
|
"4. Exploring the results\n",
|
||||||
"5. Testing the fitted model"
|
"5. Testing the fitted model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"import os\n",
|
"import logging\n",
|
||||||
"import logging\n",
|
"import warnings\n",
|
||||||
"import warnings\n",
|
"# Squash warning messages for cleaner output in the notebook\n",
|
||||||
"# Squash warning messages for cleaner output in the notebook\n",
|
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
||||||
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"from matplotlib import pyplot as plt\n",
|
||||||
"from azureml.train.automl.run import AutoMLRun\n",
|
"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score"
|
||||||
"from matplotlib import pyplot as plt\n",
|
]
|
||||||
"from matplotlib.pyplot import imshow\n",
|
},
|
||||||
"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score"
|
{
|
||||||
]
|
"cell_type": "code",
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"ws = Workspace.from_config()\n",
|
||||||
"outputs": [],
|
"\n",
|
||||||
"source": [
|
"# choose a name for the run history container in the workspace\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"experiment_name = 'automl-energydemandforecasting'\n",
|
||||||
"\n",
|
"# project folder\n",
|
||||||
"# choose a name for the run history container in the workspace\n",
|
"project_folder = './sample_projects/automl-local-energydemandforecasting'\n",
|
||||||
"experiment_name = 'automl-energydemandforecasting'\n",
|
"\n",
|
||||||
"# project folder\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"project_folder = './sample_projects/automl-local-energydemandforecasting'\n",
|
"\n",
|
||||||
"\n",
|
"output = {}\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"\n",
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"output = {}\n",
|
"output['Workspace'] = ws.name\n",
|
||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"output['Location'] = ws.location\n",
|
||||||
"output['Workspace'] = ws.name\n",
|
"output['Project Directory'] = project_folder\n",
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
"output['Run History Name'] = experiment_name\n",
|
||||||
"output['Location'] = ws.location\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"output['Project Directory'] = project_folder\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"output['Run History Name'] = experiment_name\n",
|
"outputDf.T"
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
]
|
||||||
"pd.DataFrame(data=output, index=['']).T"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"## Data\n",
|
||||||
"source": [
|
"Read energy demanding data from file, and preview data."
|
||||||
"## Data\n",
|
]
|
||||||
"Read energy demanding data from file, and preview data."
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"data = pd.read_csv(\"nyc_energy.csv\", parse_dates=['timeStamp'])\n",
|
||||||
"source": [
|
"data.head()"
|
||||||
"data = pd.read_csv(\"nyc_energy.csv\", parse_dates=['timeStamp'])\n",
|
]
|
||||||
"data.head()"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"### Split the data to train and test\n",
|
||||||
"source": [
|
"\n"
|
||||||
"### Split the data to train and test\n",
|
]
|
||||||
"\n"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"train = data[data['timeStamp'] < '2017-02-01']\n",
|
||||||
"source": [
|
"test = data[data['timeStamp'] >= '2017-02-01']\n"
|
||||||
"train = data[data['timeStamp'] < '2017-02-01']\n",
|
]
|
||||||
"test = data[data['timeStamp'] >= '2017-02-01']\n"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"### Prepare the test data, we will feed X_test to the fitted model and get prediction"
|
||||||
"source": [
|
]
|
||||||
"### Prepare the test data, we will feed X_test to the fitted model and get prediction"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"y_test = test.pop('demand').values\n",
|
||||||
"source": [
|
"X_test = test"
|
||||||
"y_test = test.pop('demand').values\n",
|
]
|
||||||
"X_test = test"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"### Split the train data to train and valid\n",
|
||||||
"source": [
|
"\n",
|
||||||
"### Split the train data to train and valid\n",
|
"Use one month's data as valid data\n"
|
||||||
"\n",
|
]
|
||||||
"Use one month's data as valid data\n"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"X_train = train[train['timeStamp'] < '2017-01-01']\n",
|
||||||
"source": [
|
"X_valid = train[train['timeStamp'] >= '2017-01-01']\n",
|
||||||
"X_train = train[train['timeStamp'] < '2017-01-01']\n",
|
"y_train = X_train.pop('demand').values\n",
|
||||||
"X_valid = train[train['timeStamp'] >= '2017-01-01']\n",
|
"y_valid = X_valid.pop('demand').values\n",
|
||||||
"y_train = X_train.pop('demand').values\n",
|
"print(X_train.shape)\n",
|
||||||
"y_valid = X_valid.pop('demand').values\n",
|
"print(y_train.shape)\n",
|
||||||
"print(X_train.shape)\n",
|
"print(X_valid.shape)\n",
|
||||||
"print(y_train.shape)\n",
|
"print(y_valid.shape)"
|
||||||
"print(X_valid.shape)\n",
|
]
|
||||||
"print(y_valid.shape)"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"## Train\n",
|
||||||
"source": [
|
"\n",
|
||||||
"## Train\n",
|
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
"|Property|Description|\n",
|
||||||
"\n",
|
"|-|-|\n",
|
||||||
"|Property|Description|\n",
|
"|**task**|forecasting|\n",
|
||||||
"|-|-|\n",
|
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
|
||||||
"|**task**|forecasting|\n",
|
"|**iterations**|Number of iterations. In each iteration, Auto ML trains a specific pipeline on the given data|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration, Auto ML trains a specific pipeline on the given data|\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n",
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**X_valid**|Data used to evaluate a model in a iteration. (sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n",
|
"|**y_valid**|Data used to evaluate a model in a iteration. (sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n",
|
||||||
"|**X_valid**|Data used to evaluate a model in a iteration. (sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
|
||||||
"|**y_valid**|Data used to evaluate a model in a iteration. (sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n",
|
]
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"time_column_name = 'timeStamp'\n",
|
||||||
"source": [
|
"automl_settings = {\n",
|
||||||
"time_column_name = 'timeStamp'\n",
|
" \"time_column_name\": time_column_name,\n",
|
||||||
"automl_settings = {\n",
|
"}\n",
|
||||||
" \"time_column_name\": time_column_name,\n",
|
"\n",
|
||||||
"}\n",
|
"\n",
|
||||||
"\n",
|
"automl_config = AutoMLConfig(task = 'forecasting',\n",
|
||||||
"\n",
|
" debug_log = 'automl_nyc_energy_errors.log',\n",
|
||||||
"automl_config = AutoMLConfig(task = 'forecasting',\n",
|
" primary_metric='normalized_root_mean_squared_error',\n",
|
||||||
" debug_log = 'automl_nyc_energy_errors.log',\n",
|
" iterations = 10,\n",
|
||||||
" primary_metric='normalized_root_mean_squared_error',\n",
|
" iteration_timeout_minutes = 5,\n",
|
||||||
" iterations = 10,\n",
|
" X = X_train,\n",
|
||||||
" iteration_timeout_minutes = 5,\n",
|
" y = y_train,\n",
|
||||||
" X = X_train,\n",
|
" X_valid = X_valid,\n",
|
||||||
" y = y_train,\n",
|
" y_valid = y_valid,\n",
|
||||||
" X_valid = X_valid,\n",
|
" path=project_folder,\n",
|
||||||
" y_valid = y_valid,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" path=project_folder,\n",
|
" **automl_settings)"
|
||||||
" verbosity = logging.INFO,\n",
|
]
|
||||||
" **automl_settings)"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
||||||
"source": [
|
"You will see the currently running iterations printing to the console."
|
||||||
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
]
|
||||||
"You will see the currently running iterations printing to the console."
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
"source": [
|
]
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"local_run"
|
||||||
"source": [
|
]
|
||||||
"local_run"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"### Retrieve the Best Model\n",
|
||||||
"source": [
|
"Below we select the best pipeline from our iterations. The get_output method on automl_classifier returns the best run and the fitted model for the last fit invocation. There are overloads on get_output that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
|
||||||
"### Retrieve the Best Model\n",
|
]
|
||||||
"Below we select the best pipeline from our iterations. The get_output method on automl_classifier returns the best run and the fitted model for the last fit invocation. There are overloads on get_output that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"source": [
|
"fitted_model.steps"
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
]
|
||||||
"fitted_model.steps"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
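{
"cell_type": "markdown",
"metadata": {},
"source": [
"The get_output overloads mentioned above look like the following minimal sketch; the metric name and the iteration number are illustrative choices, mirroring the get_output(metric = ...) and get_output(iteration = ...) calls used elsewhere in this repository:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve the best run and model for a specific logged metric...\n",
"best_run_rmse, model_rmse = local_run.get_output(metric = 'normalized_root_mean_squared_error')\n",
"\n",
"# ...or the run and model from one particular iteration (3 is illustrative).\n",
"run_iter3, model_iter3 = local_run.get_output(iteration = 3)"
]
},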
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"### Test the Best Fitted Model\n",
|
||||||
"source": [
|
"\n",
|
||||||
"### Test the Best Fitted Model\n",
|
"Predict on training and test set, and calculate residual values."
|
||||||
"\n",
|
]
|
||||||
"Predict on training and test set, and calculate residual values."
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"y_pred = fitted_model.predict(X_test)\n",
|
||||||
"source": [
|
"y_pred"
|
||||||
"y_pred = fitted_model.predict(X_test)\n",
|
]
|
||||||
"y_pred"
|
},
|
||||||
]
|
{
|
||||||
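{
"cell_type": "markdown",
"metadata": {},
"source": [
"The section above promises residual values; as a minimal, hedged addition (NaN-aware, since y_test is only cleaned of missing values in the cells below), you can compute them directly from the arrays already in memory:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Residuals on the test set; positive values mean the model under-predicts.\n",
"residuals = y_test - y_pred\n",
"print('Mean residual: %.2f' % np.nanmean(residuals))\n",
"print('Largest absolute residual: %.2f' % np.nanmax(np.abs(residuals)))"
]
},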
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"### Use the Check Data Function to remove the nan values from y_test to avoid error when calculate metrics "
|
||||||
"source": [
|
]
|
||||||
"### Define a Check Data Function\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"Remove the nan values from y_test to avoid error when calculate metrics "
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"if len(y_test) != len(y_pred):\n",
|
||||||
"metadata": {},
|
" raise ValueError(\n",
|
||||||
"outputs": [],
|
" 'the true values and prediction values do not have equal length.')\n",
|
||||||
"source": [
|
"elif len(y_test) == 0:\n",
|
||||||
"def _check_calc_input(y_true, y_pred, rm_na=True):\n",
|
" raise ValueError(\n",
|
||||||
" \"\"\"\n",
|
" 'y_true and y_pred are empty.')\n",
|
||||||
" Check that 'y_true' and 'y_pred' are non-empty and\n",
|
"\n",
|
||||||
" have equal length.\n",
|
"# if there is any non-numeric element in the y_true or y_pred,\n",
|
||||||
"\n",
|
"# the ValueError exception will be thrown.\n",
|
||||||
" :param y_true: Vector of actual values\n",
|
"y_test_f = np.array(y_test).astype(float)\n",
|
||||||
" :type y_true: array-like\n",
|
"y_pred_f = np.array(y_pred).astype(float)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" :param y_pred: Vector of predicted values\n",
|
"# remove entries both in y_true and y_pred where at least\n",
|
||||||
" :type y_pred: array-like\n",
|
"# one element in y_true or y_pred is missing\n",
|
||||||
"\n",
|
"y_test = y_test_f[~(np.isnan(y_test_f) | np.isnan(y_pred_f))]\n",
|
||||||
" :param rm_na:\n",
|
"y_pred = y_pred_f[~(np.isnan(y_test_f) | np.isnan(y_pred_f))]"
|
||||||
" If rm_na=True, remove entries where y_true=NA and y_pred=NA.\n",
|
]
|
||||||
" :type rm_na: boolean\n",
|
},
|
||||||
"\n",
|
{
|
||||||
" :return:\n",
|
"cell_type": "markdown",
|
||||||
" Tuple (y_true, y_pred). if rm_na=True,\n",
|
"metadata": {},
|
||||||
" the returned vectors may differ from their input values.\n",
|
"source": [
|
||||||
" :rtype: Tuple with 2 entries\n",
|
"### Calculate metrics for the prediction\n"
|
||||||
" \"\"\"\n",
|
]
|
||||||
" if len(y_true) != len(y_pred):\n",
|
},
|
||||||
" raise ValueError(\n",
|
{
|
||||||
" 'the true values and prediction values do not have equal length.')\n",
|
"cell_type": "code",
|
||||||
" elif len(y_true) == 0:\n",
|
"execution_count": null,
|
||||||
" raise ValueError(\n",
|
"metadata": {},
|
||||||
" 'y_true and y_pred are empty.')\n",
|
"outputs": [],
|
||||||
" # if there is any non-numeric element in the y_true or y_pred,\n",
|
"source": [
|
||||||
" # the ValueError exception will be thrown.\n",
|
"print(\"[Test Data] \\nRoot Mean squared error: %.2f\" % np.sqrt(mean_squared_error(y_test, y_pred)))\n",
|
||||||
" y_true = np.array(y_true).astype(float)\n",
|
"# Explained variance score: 1 is perfect prediction\n",
|
||||||
" y_pred = np.array(y_pred).astype(float)\n",
|
"print('mean_absolute_error score: %.2f' % mean_absolute_error(y_test, y_pred))\n",
|
||||||
" if rm_na:\n",
|
"print('R2 score: %.2f' % r2_score(y_test, y_pred))\n",
|
||||||
" # remove entries both in y_true and y_pred where at least\n",
|
"\n",
|
||||||
" # one element in y_true or y_pred is missing\n",
|
"\n",
|
||||||
" y_true_rm_na = y_true[~(np.isnan(y_true) | np.isnan(y_pred))]\n",
|
"\n",
|
||||||
" y_pred_rm_na = y_pred[~(np.isnan(y_true) | np.isnan(y_pred))]\n",
|
"# Plot outputs\n",
|
||||||
" return (y_true_rm_na, y_pred_rm_na)\n",
|
"%matplotlib notebook\n",
|
||||||
" else:\n",
|
"test_pred = plt.scatter(y_test, y_pred, color='b')\n",
|
||||||
" return y_true, y_pred"
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
]
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
},
|
"plt.show()"
|
||||||
{
|
]
|
||||||
"cell_type": "markdown",
|
}
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Use the Check Data Function to remove the nan values from y_test to avoid error when calculate metrics "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"y_test,y_pred = _check_calc_input(y_test,y_pred)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Calculate metrics for the prediction\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"print(\"[Test Data] \\nRoot Mean squared error: %.2f\" % np.sqrt(mean_squared_error(y_test, y_pred)))\n",
|
|
||||||
"# Explained variance score: 1 is perfect prediction\n",
|
|
||||||
"print('mean_absolute_error score: %.2f' % mean_absolute_error(y_test, y_pred))\n",
|
|
||||||
"print('R2 score: %.2f' % r2_score(y_test, y_pred))\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"# Plot outputs\n",
|
|
||||||
"%matplotlib notebook\n",
|
|
||||||
"test_pred = plt.scatter(y_test, y_pred, color='b')\n",
|
|
||||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
|
||||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
|
||||||
"plt.show()"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "xiaga"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "xiaga"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,413 +1,412 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
"_**Orange Juice Sales Forecasting**_\n",
|
"_**Orange Juice Sales Forecasting**_\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## Contents\n",
|
"## Contents\n",
|
||||||
"1. [Introduction](#Introduction)\n",
|
"1. [Introduction](#Introduction)\n",
|
||||||
"1. [Setup](#Setup)\n",
|
"1. [Setup](#Setup)\n",
|
||||||
"1. [Data](#Data)\n",
|
"1. [Data](#Data)\n",
|
||||||
"1. [Train](#Train)"
|
"1. [Train](#Train)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
"In this example, we use AutoML to find and tune a time-series forecasting model.\n",
|
"In this example, we use AutoML to find and tune a time-series forecasting model.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook, you will:\n",
|
"In this notebook, you will:\n",
|
||||||
"1. Create an Experiment in an existing Workspace\n",
|
"1. Create an Experiment in an existing Workspace\n",
|
||||||
"2. Instantiate an AutoMLConfig \n",
|
"2. Instantiate an AutoMLConfig \n",
|
||||||
"3. Find and train a forecasting model using local compute\n",
|
"3. Find and train a forecasting model using local compute\n",
|
||||||
"4. Evaluate the performance of the model\n",
|
"4. Evaluate the performance of the model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The examples in the follow code samples use the [University of Chicago's Dominick's Finer Foods dataset](https://research.chicagobooth.edu/kilts/marketing-databases/dominicks) to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
|
"The examples in the follow code samples use the [University of Chicago's Dominick's Finer Foods dataset](https://research.chicagobooth.edu/kilts/marketing-databases/dominicks) to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment is a named object in a Workspace which represents a predictive task, the output of which is a trained model and a set of evaluation metrics for the model. "
|
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment is a named object in a Workspace which represents a predictive task, the output of which is a trained model and a set of evaluation metrics for the model. "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"import os\n",
|
"import logging\n",
|
||||||
"import logging\n",
|
"import warnings\n",
|
||||||
"import warnings\n",
|
"# Squash warning messages for cleaner output in the notebook\n",
|
||||||
"# Squash warning messages for cleaner output in the notebook\n",
|
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
||||||
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
|
||||||
"from azureml.train.automl.run import AutoMLRun\n",
|
]
|
||||||
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"ws = Workspace.from_config()\n",
|
||||||
"source": [
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"# choose a name for the run history container in the workspace\n",
|
||||||
"\n",
|
"experiment_name = 'automl-ojsalesforecasting'\n",
|
||||||
"# choose a name for the run history container in the workspace\n",
|
"# project folder\n",
|
||||||
"experiment_name = 'automl-ojsalesforecasting'\n",
|
"project_folder = './sample_projects/automl-local-ojsalesforecasting'\n",
|
||||||
"# project folder\n",
|
"\n",
|
||||||
"project_folder = './sample_projects/automl-local-ojsalesforecasting'\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"output = {}\n",
|
||||||
"\n",
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"output = {}\n",
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"output['Workspace'] = ws.name\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"output['Workspace'] = ws.name\n",
|
"output['Location'] = ws.location\n",
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
"output['Project Directory'] = project_folder\n",
|
||||||
"output['Location'] = ws.location\n",
|
"output['Run History Name'] = experiment_name\n",
|
||||||
"output['Project Directory'] = project_folder\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"output['Run History Name'] = experiment_name\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"outputDf.T"
|
||||||
"pd.DataFrame(data=output, index=['']).T"
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "markdown",
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"## Data\n",
|
||||||
"## Data\n",
|
"You are now ready to load the historical orange juice sales data. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type."
|
||||||
"You are now ready to load the historical orange juice sales data. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type."
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "code",
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"outputs": [],
|
"source": [
|
||||||
"source": [
|
"time_column_name = 'WeekStarting'\n",
|
||||||
"time_column_name = 'WeekStarting'\n",
|
"data = pd.read_csv(\"dominicks_OJ.csv\", parse_dates=[time_column_name])\n",
|
||||||
"data = pd.read_csv(\"dominicks_OJ.csv\", parse_dates=[time_column_name])\n",
|
"data.head()"
|
||||||
"data.head()"
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "markdown",
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating if the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also include the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred. \n",
|
||||||
"Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating if the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also include the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred. \n",
|
"\n",
|
||||||
"\n",
|
"The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we thus define the **grain** - the columns whose values determine the boundaries between time-series: "
|
||||||
"The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we thus define the **grain** - the columns whose values determine the boundaries between time-series: "
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "code",
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"outputs": [],
|
"source": [
|
||||||
"source": [
|
"grain_column_names = ['Store', 'Brand']\n",
|
||||||
"grain_column_names = ['Store', 'Brand']\n",
|
"nseries = data.groupby(grain_column_names).ngroups\n",
|
||||||
"nseries = data.groupby(grain_column_names).ngroups\n",
|
"print('Data contains {0} individual time-series.'.format(nseries))"
|
||||||
"print('Data contains {0} individual time-series.'.format(nseries))"
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "markdown",
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"### Data Splitting\n",
|
||||||
"### Data Splitting\n",
|
"For the purposes of demonstration and later forecast evaluation, we now split the data into a training and a testing set. The test set will contain the final 20 weeks of observed sales for each time-series."
|
||||||
"For the purposes of demonstration and later forecast evaluation, we now split the data into a training and a testing set. The test set will contain the final 20 weeks of observed sales for each time-series."
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "code",
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"outputs": [],
|
"source": [
|
||||||
"source": [
|
"ntest_periods = 20\n",
|
||||||
"ntest_periods = 20\n",
|
"\n",
|
||||||
"\n",
|
"def split_last_n_by_grain(df, n):\n",
|
||||||
"def split_last_n_by_grain(df, n):\n",
|
" \"\"\"\n",
|
||||||
" \"\"\"\n",
|
" Group df by grain and split on last n rows for each group\n",
|
||||||
" Group df by grain and split on last n rows for each group\n",
|
" \"\"\"\n",
|
||||||
" \"\"\"\n",
|
" df_grouped = (df.sort_values(time_column_name) # Sort by ascending time\n",
|
||||||
" df_grouped = (df.sort_values(time_column_name) # Sort by ascending time\n",
|
" .groupby(grain_column_names, group_keys=False))\n",
|
||||||
" .groupby(grain_column_names, group_keys=False))\n",
|
" df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])\n",
|
||||||
" df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])\n",
|
" df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])\n",
|
||||||
" df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])\n",
|
" return df_head, df_tail\n",
|
||||||
" return df_head, df_tail\n",
|
"\n",
|
||||||
"\n",
|
"X_train, X_test = split_last_n_by_grain(data, ntest_periods)"
|
||||||
"X_train, X_test = split_last_n_by_grain(data, ntest_periods)"
|
]
|
||||||
]
|
},
|
||||||
},
|
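{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of a sanity check you might run on the split (not part of the original notebook): every series should contribute exactly `ntest_periods` rows to the test set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rows per (Store, Brand) series in the test split; each count should be 20.\n",
"print(X_test.groupby(grain_column_names).size().head())"
]
},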
{
|
||||||
{
|
"cell_type": "markdown",
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"## Modeling\n",
|
||||||
"## Modeling\n",
|
"\n",
|
||||||
"\n",
|
"For forecasting tasks, AutoML uses pre-processing and estimation steps that are specific to time-series. AutoML will undertake the following pre-processing steps:\n",
|
||||||
"For forecasting tasks, AutoML uses pre-processing and estimation steps that are specific to time-series. AutoML will undertake the following pre-processing steps:\n",
|
"* Detect time-series sample frequency (e.g. hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span \n",
|
||||||
"* Detect time-series sample frequency (e.g. hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span \n",
|
"* Impute missing values in the target (via forward-fill) and feature columns (using median column values) \n",
|
||||||
"* Impute missing values in the target (via forward-fill) and feature columns (using median column values) \n",
|
"* Create grain-based features to enable fixed effects across different series\n",
|
||||||
"* Create grain-based features to enable fixed effects across different series\n",
|
"* Create time-based features to assist in learning seasonal patterns\n",
|
||||||
"* Create time-based features to assist in learning seasonal patterns\n",
|
"* Encode categorical variables to numeric quantities\n",
|
||||||
"* Encode categorical variables to numeric quantities\n",
|
"\n",
|
||||||
"\n",
|
"AutoML will currently train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series.\n",
|
||||||
"AutoML will currently train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series.\n",
|
"\n",
|
||||||
"\n",
|
"You are almost ready to start an AutoML training job. We will first need to create a validation set from the existing training set (i.e. for hyper-parameter tuning): "
|
||||||
"You are almost ready to start an AutoML training job. We will first need to create a validation set from the existing training set (i.e. for hyper-parameter tuning): "
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
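{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the imputation steps listed above (this is plain pandas on a toy frame, not AutoML's actual implementation): forward-fill the target, median-fill a numeric feature. The next cell then creates the validation split described above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Toy frame with gaps: not AutoML's code, just the idea behind its imputation.\n",
"toy = pd.DataFrame({'Quantity': [10.0, np.nan, 12.0, np.nan],\n",
"                    'Price': [2.5, np.nan, 2.0, 3.0]})\n",
"toy['Quantity'] = toy['Quantity'].fillna(method = 'ffill')  # forward-fill target\n",
"toy['Price'] = toy['Price'].fillna(toy['Price'].median())   # median-fill feature\n",
"toy"
]
},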
{
|
"cell_type": "code",
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"outputs": [],
|
"source": [
|
||||||
"source": [
|
"nvalidation_periods = 20\n",
|
||||||
"nvalidation_periods = 20\n",
|
"X_train, X_validate = split_last_n_by_grain(X_train, nvalidation_periods)"
|
||||||
"X_train, X_validate = split_last_n_by_grain(X_train, nvalidation_periods)"
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "markdown",
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"We also need to separate the target column from the rest of the DataFrame: "
|
||||||
"We also need to separate the target column from the rest of the DataFrame: "
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "code",
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"outputs": [],
|
"source": [
|
||||||
"source": [
|
"target_column_name = 'Quantity'\n",
|
||||||
"target_column_name = 'Quantity'\n",
|
"y_train = X_train.pop(target_column_name).values\n",
|
||||||
"y_train = X_train.pop(target_column_name).values\n",
|
"y_validate = X_validate.pop(target_column_name).values "
|
||||||
"y_validate = X_validate.pop(target_column_name).values "
|
]
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"cell_type": "markdown",
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"## Train\n",
|
||||||
"## Train\n",
|
"\n",
|
||||||
"\n",
|
"The AutoMLConfig object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, and the training and validation data. \n",
|
||||||
"The AutoMLConfig object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, and the training and validation data. \n",
|
"\n",
|
||||||
"\n",
|
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time and the grain column names. A time column is required for forecasting, while the grain is optional. If a grain is not given, the forecaster assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak. \n",
|
||||||
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time and the grain column names. A time column is required for forecasting, while the grain is optional. If a grain is not given, the forecaster assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak. \n",
|
"\n",
|
||||||
"\n",
|
"|Property|Description|\n",
|
||||||
"|Property|Description|\n",
|
"|-|-|\n",
|
||||||
"|-|-|\n",
|
"|**task**|forecasting|\n",
|
||||||
"|**task**|forecasting|\n",
|
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
|
"|**iterations**|Number of iterations. In each iteration, Auto ML trains a specific pipeline on the given data|\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration, Auto ML trains a specific pipeline on the given data|\n",
|
"|**X**|Training matrix of features, shape = [n_training_samples, n_features]|\n",
|
||||||
"|**X**|Training matrix of features, shape = [n_training_samples, n_features]|\n",
|
"|**y**|Target values, shape = [n_training_samples, ]|\n",
|
||||||
"|**y**|Target values, shape = [n_training_samples, ]|\n",
|
"|**X_valid**|Validation matrix of features, shape = [n_validation_samples, n_features]|\n",
|
||||||
"|**X_valid**|Validation matrix of features, shape = [n_validation_samples, n_features]|\n",
|
"|**y_valid**|Target values for validation, shape = [n_validation_samples, ]\n",
|
||||||
"|**y_valid**|Target values for validation, shape = [n_validation_samples, ]\n",
|
"|**enable_ensembling**|Allow AutoML to create ensembles of the best performing models\n",
|
||||||
"|**enable_ensembling**|Allow AutoML to create ensembles of the best performing models\n",
|
"|**debug_log**|Log file path for writing debugging information\n",
|
||||||
"|**debug_log**|Log file path for writing debugging information\n",
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
|
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
"    'time_column_name': time_column_name,\n",
"    'grain_column_names': grain_column_names,\n",
"    'drop_column_names': ['logQuantity']\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting',\n",
"                             debug_log='automl_oj_sales_errors.log',\n",
"                             primary_metric='normalized_root_mean_squared_error',\n",
"                             iterations=10,\n",
"                             X=X_train,\n",
"                             y=y_train,\n",
"                             X_valid=X_validate,\n",
"                             y_valid=y_validate,\n",
"                             enable_ensembling=False,\n",
"                             path=project_folder,\n",
"                             verbosity=logging.INFO,\n",
"                             **automl_settings)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can now submit a new training run. For local runs, the execution is synchronous. Depending on the data and number of iterations this operation may take several minutes.\n",
"Information from each iteration will be printed to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"Each run within an Experiment stores serialized (i.e. pickled) pipelines from the AutoML iterations. We can now retrieve the pipeline with the best performance on the validation dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_pipeline = local_run.get_output()\n",
"fitted_pipeline.steps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make Predictions from the Best Fitted Model\n",
"Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. First, we remove the target values from the test set:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_test = X_test.pop(target_column_name).values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_test.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data.\n",
"\n",
"The target predictions can be retrieved by calling the `predict` method on the best model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred = fitted_pipeline.predict(X_test)"
]
},
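{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before computing summary metrics, it can help to eyeball a few forecasts next to the actuals. This is a minimal inspection sketch added for illustration; it assumes only the `y_test` and `y_pred` arrays defined above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Put actual and predicted quantities side by side for a quick visual check.\n",
"pd.DataFrame({'actual': y_test, 'predicted': y_pred}).head()"
]
},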
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculate evaluation metrics for the prediction\n",
"To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities using several metrics, including the mean absolute percentage error (MAPE)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def MAPE(actual, pred):\n",
"    \"\"\"\n",
"    Calculate mean absolute percentage error.\n",
"    Remove NA and values where actual is close to zero.\n",
"    \"\"\"\n",
"    not_na = ~(np.isnan(actual) | np.isnan(pred))\n",
"    not_zero = ~np.isclose(actual, 0.0)\n",
"    actual_safe = actual[not_na & not_zero]\n",
"    pred_safe = pred[not_na & not_zero]\n",
"    APE = 100 * np.abs((actual_safe - pred_safe) / actual_safe)\n",
"    return np.mean(APE)\n",
"\n",
"print(\"[Test Data] \\nRoot Mean squared error: %.2f\" % np.sqrt(mean_squared_error(y_test, y_pred)))\n",
"print('mean_absolute_error score: %.2f' % mean_absolute_error(y_test, y_pred))\n",
"print('MAPE: %.2f' % MAPE(y_test, y_pred))"
]
}
],
"metadata": {
"authors": [
{
"name": "erwright"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@@ -1,401 +1,396 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Blacklisting Models, Early Termination, and Handling Missing Data**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML to handle missing values in data. We also provide a stopping metric indicating a target for the primary metric so that AutoML can terminate the run without necessarily going through all the iterations. Finally, if you want to avoid a certain pipeline, we allow you to specify a blacklist of algorithms that AutoML will ignore for this run.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model.\n",
"4. Explore the results.\n",
"5. Test the best fitted model.\n",
"\n",
"In addition, this notebook showcases the following features:\n",
"- **Blacklisting** certain pipelines\n",
"- Specifying **target metrics** to indicate stopping criteria\n",
"- Handling **missing data** in the input"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"experiment_name = 'automl-local-missing-data'\n",
"project_folder = './sample_projects/automl-local-missing-data'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_train = digits.data[10:,:]\n",
"y_train = digits.target[10:]\n",
"\n",
"# Add missing values in 75% of the rows.\n",
"missing_rate = 0.75\n",
"n_missing_samples = int(np.floor(X_train.shape[0] * missing_rate))\n",
"missing_samples = np.hstack((np.zeros(X_train.shape[0] - n_missing_samples, dtype=bool), np.ones(n_missing_samples, dtype=bool)))\n",
"rng = np.random.RandomState(0)\n",
"rng.shuffle(missing_samples)\n",
"missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)\n",
"X_train[np.where(missing_samples)[0], missing_features] = np.nan"
]
},
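{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check (a minimal sketch added for illustration), you can confirm how many entries were actually replaced with NaN before handing the data to AutoML:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Count the injected missing entries and the share of rows affected.\n",
"print('Missing entries: %d' % np.isnan(X_train).sum())\n",
"print('Rows with at least one missing value: %.0f%%' % (100 * np.isnan(X_train).any(axis=1).mean()))"
]
},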
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame(data = X_train)\n",
"df['Label'] = pd.Series(y_train, index=df.index)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment. This includes setting `experiment_exit_score`, which should cause the run to complete before the `iterations` count is reached.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross-validation splits.|\n",
"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
"|**experiment_exit_score**|*double* value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
"|**blacklist_models**|*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i>|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
"                             debug_log = 'automl_errors.log',\n",
"                             primary_metric = 'AUC_weighted',\n",
"                             iteration_timeout_minutes = 60,\n",
"                             iterations = 20,\n",
"                             n_cross_validations = 5,\n",
"                             preprocess = True,\n",
"                             experiment_exit_score = 0.9984,\n",
"                             blacklist_models = ['KNN','LinearSVM'],\n",
"                             verbosity = logging.INFO,\n",
"                             X = X_train,\n",
"                             y = y_train,\n",
"                             path = project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
"    properties = run.get_properties()\n",
"    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
"    metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
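{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `rundata` holds one column per iteration, you can also track how the primary metric evolved across the run (a minimal sketch added for illustration; it assumes the `AUC_weighted` row logged above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plot the primary metric for each child run in iteration order.\n",
"rundata.loc['AUC_weighted'].plot(marker='o', title='AUC_weighted by iteration')\n",
"plt.show()"
]
},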
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model with the best `accuracy` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# lookup_metric = \"accuracy\"\n",
"# best_run, fitted_model = local_run.get_output(metric = lookup_metric)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# iteration = 3\n",
"# best_run, fitted_model = local_run.get_output(iteration = iteration)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]\n",
"\n",
"# Randomly select digits and test.\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
"    print(index)\n",
"    predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
"    label = y_test[index]\n",
"    title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
"    fig = plt.figure(1, figsize=(3,3))\n",
"    ax1 = fig.add_axes((0,0,.8,.8))\n",
"    ax1.set_title(title)\n",
"    plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
"    plt.show()\n"
]
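},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above spot-checks two digits visually. As a quick numeric check (a minimal sketch added for illustration, not part of the original workflow), you can also score the fitted model on all ten held-out digits at once:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import accuracy_score\n",
"\n",
"# Predict all ten held-out digits in one call and compare against the labels.\n",
"print('Accuracy on the held-out digits: %.2f' % accuracy_score(y_test, fitted_model.predict(X_test)))"
]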
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@@ -1,367 +1,365 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Explain classification model and visualize the explanation**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use scikit-learn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use the AutoML Classifier for a simple classification problem.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an Experiment in an existing Workspace.\n",
"2. Instantiate an AutoMLConfig object.\n",
"3. Train the model using local compute and explain the model.\n",
"4. Visualize the model's feature importance in the widget.\n",
"5. Explore the best model's explanation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you will need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"import pandas as pd\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-local-classification'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-classification-model-explanation'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"y = iris.target\n",
"X = iris.data\n",
"\n",
"features = iris.feature_names\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X,\n",
"                                                    y,\n",
"                                                    test_size=0.1,\n",
"                                                    random_state=100,\n",
"                                                    stratify=y)\n",
"\n",
"X_train = pd.DataFrame(X_train, columns=features)\n",
"X_test = pd.DataFrame(X_test, columns=features)"
]
},
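{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the split above is stratified on `y`, each of the three iris classes should appear in roughly equal proportion in both sets. A quick check (a minimal sketch added for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare class counts in the training and test targets.\n",
"print(pd.Series(y_train).value_counts())\n",
"print(pd.Series(y_test).value_counts())"
]
},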
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"## Train\n",
|
||||||
"source": [
|
"\n",
|
||||||
"## Train\n",
|
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
"|Property|Description|\n",
|
||||||
"\n",
|
"|-|-|\n",
|
||||||
"|Property|Description|\n",
|
"|**task**|classification or regression|\n",
|
||||||
"|-|-|\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||||
"|**task**|classification or regression|\n",
|
"|**max_time_sec**|Time limit in minutes for each iterations|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
"|**iterations**|Number of iterations. In each iteration Auto ML trains the data with a specific pipeline|\n",
|
||||||
"|**max_time_sec**|Time limit in minutes for each iterations|\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration Auto ML trains the data with a specific pipeline|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n",
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers. |\n",
|
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]|\n",
|
||||||
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**model_explainability**|Indicate to explain each trained pipeline or not |\n",
|
||||||
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]|\n",
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
|
||||||
"|**model_explainability**|Indicate to explain each trained pipeline or not |\n",
|
]
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
"source": [
|
" debug_log = 'automl_errors.log',\n",
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
" primary_metric = 'AUC_weighted',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" iteration_timeout_minutes = 200,\n",
|
||||||
" primary_metric = 'AUC_weighted',\n",
|
" iterations = 10,\n",
|
||||||
" iteration_timeout_minutes = 200,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" iterations = 10,\n",
|
" X = X_train, \n",
|
||||||
" verbosity = logging.INFO,\n",
|
" y = y_train,\n",
|
||||||
" X = X_train, \n",
|
" X_valid = X_test,\n",
|
||||||
" y = y_train,\n",
|
" y_valid = y_test,\n",
|
||||||
" X_valid = X_test,\n",
|
" model_explainability=True,\n",
|
||||||
" y_valid = y_test,\n",
|
" path=project_folder)"
|
||||||
" model_explainability=True,\n",
|
]
|
||||||
" path=project_folder)"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
||||||
"source": [
|
"You will see the currently running iterations printing to the console."
|
||||||
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
]
|
||||||
"You will see the currently running iterations printing to the console."
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
"source": [
|
]
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "code",
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"outputs": [],
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"local_run"
|
||||||
"source": [
|
]
|
||||||
"local_run"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"## Results"
|
||||||
"source": [
|
]
|
||||||
"## Results"
|
},
|
||||||
]
|
{
|
||||||
},
|
"cell_type": "markdown",
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"### Widget for monitoring runs\n",
|
||||||
"source": [
|
"\n",
|
||||||
"### Widget for monitoring runs\n",
|
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
|
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
||||||
"\n",
|
]
|
||||||
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
},
|
||||||
]
|
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on the run object returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Best Model's Explanation\n",
"\n",
"Retrieve the explanation from the best_run. The explanation information includes:\n",
"\n",
"1.\tshap_values: The explanation information generated by the SHAP library\n",
"2.\texpected_values: The expected value of the model applied to the X_train data\n",
"3.\toverall_summary: The model-level feature importance values sorted in descending order\n",
"4.\toverall_imp: The feature names sorted in the same order as in overall_summary\n",
"5.\tper_class_summary: The class-level feature importance values sorted in descending order. Only available for the classification case\n",
"6.\tper_class_imp: The feature names sorted in the same order as in per_class_summary. Only available for the classification case"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.automlexplainer import retrieve_model_explanation\n",
"\n",
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
"    retrieve_model_explanation(best_run)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(overall_summary)\n",
"print(overall_imp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(per_class_summary)\n",
"print(per_class_imp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides retrieving the existing model explanation information, you can also explain the model with different train/test data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.automlexplainer import explain_model\n",
"\n",
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
"    explain_model(fitted_model, X_train, X_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(overall_summary)\n",
"print(overall_imp)"
]
}
],
"metadata": {
"authors": [
{
"name": "xif"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
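The explanation cells above return parallel lists: the importance values in `overall_summary` are sorted in descending order, and `overall_imp` holds the matching feature names in the same order. A minimal sketch of how the pieces compose, assuming `local_run` is the completed AutoML run from the `experiment.submit` cell; the zipped report loop is illustrative and not part of the notebook:

```python
from azureml.train.automl.automlexplainer import retrieve_model_explanation

# best_run and fitted_model as retrieved in the notebook's get_output() cell.
best_run, fitted_model = local_run.get_output()

# retrieve_model_explanation returns six parallel structures; overall_summary
# holds importance values sorted descending, overall_imp the feature names in
# the same order.
shap_values, expected_values, overall_summary, overall_imp, \
    per_class_summary, per_class_imp = retrieve_model_explanation(best_run)

# Pair feature names with their importance values (illustrative only).
for name, importance in zip(overall_imp, overall_summary):
    print('{0}: {1:.4f}'.format(name, importance))
```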
@@ -1,424 +1,417 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Regression with Local Compute**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use scikit-learn's [diabetes dataset](http://scikit-learn.org/stable/datasets/index.html#diabetes-dataset) to showcase how you can use AutoML for a simple regression problem.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local compute.\n",
"4. Explore the results.\n",
"5. Test the best fitted model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
|
"import logging\n",
|
||||||
"import os\n",
|
"\n",
|
||||||
"import random\n",
|
"from matplotlib import pyplot as plt\n",
|
||||||
"\n",
|
"import numpy as np\n",
|
||||||
"from matplotlib import pyplot as plt\n",
|
"import pandas as pd\n",
|
||||||
"from matplotlib.pyplot import imshow\n",
|
"\n",
|
||||||
"import numpy as np\n",
|
"import azureml.core\n",
|
||||||
"import pandas as pd\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from sklearn import datasets\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"\n",
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
"import azureml.core\n",
|
]
|
||||||
"from azureml.core.experiment import Experiment\n",
|
},
|
||||||
"from azureml.core.workspace import Workspace\n",
|
{
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"cell_type": "code",
|
||||||
"from azureml.train.automl.run import AutoMLRun"
|
"execution_count": null,
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"outputs": [],
|
||||||
{
|
"source": [
|
||||||
"cell_type": "code",
|
"ws = Workspace.from_config()\n",
|
||||||
"execution_count": null,
|
"\n",
|
||||||
"metadata": {},
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
"outputs": [],
|
"experiment_name = 'automl-local-regression'\n",
|
||||||
"source": [
|
"project_folder = './sample_projects/automl-local-regression'\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"\n",
|
||||||
"\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"# Choose a name for the experiment and specify the project folder.\n",
|
"\n",
|
||||||
"experiment_name = 'automl-local-regression'\n",
|
"output = {}\n",
|
||||||
"project_folder = './sample_projects/automl-local-regression'\n",
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"\n",
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"output['Workspace Name'] = ws.name\n",
|
||||||
"\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"output = {}\n",
|
"output['Location'] = ws.location\n",
|
||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"output['Project Directory'] = project_folder\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
"output['Workspace Name'] = ws.name\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"output['Location'] = ws.location\n",
|
"outputDf.T"
|
||||||
"output['Project Directory'] = project_folder\n",
|
]
|
||||||
"output['Experiment Name'] = experiment.name\n",
|
},
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
{
|
||||||
"pd.DataFrame(data = output, index = ['']).T"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
||||||
"cell_type": "markdown",
|
]
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"from azureml.telemetry import set_diagnostics_collection\n",
|
||||||
"metadata": {},
|
"set_diagnostics_collection(send_diagnostics = True)"
|
||||||
"outputs": [],
|
]
|
||||||
"source": [
|
},
|
||||||
"from azureml.telemetry import set_diagnostics_collection\n",
|
{
|
||||||
"set_diagnostics_collection(send_diagnostics = True)"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"## Data\n",
|
||||||
"cell_type": "markdown",
|
"This uses scikit-learn's [load_diabetes](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) method."
|
||||||
"metadata": {},
|
]
|
||||||
"source": [
|
},
|
||||||
"## Data\n",
|
{
|
||||||
"This uses scikit-learn's [load_diabetes](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) method."
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
|
||||||
"metadata": {},
|
"from sklearn.datasets import load_diabetes\n",
|
||||||
"outputs": [],
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
"source": [
|
"\n",
|
||||||
"# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
|
"X, y = load_diabetes(return_X_y = True)\n",
|
||||||
"from sklearn.datasets import load_diabetes\n",
|
"\n",
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
|
||||||
"\n",
|
"\n",
|
||||||
"X, y = load_diabetes(return_X_y = True)\n",
|
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"
|
||||||
"\n",
|
]
|
||||||
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"## Train\n",
|
||||||
"cell_type": "markdown",
|
"\n",
|
||||||
"metadata": {},
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
||||||
"source": [
|
"\n",
|
||||||
"## Train\n",
|
"|Property|Description|\n",
|
||||||
"\n",
|
"|-|-|\n",
|
||||||
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
"|**task**|classification or regression|\n",
|
||||||
"\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
"|Property|Description|\n",
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
"|-|-|\n",
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
"|**task**|classification or regression|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
]
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
},
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
|
{
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"automl_config = AutoMLConfig(task = 'regression',\n",
|
||||||
"metadata": {},
|
" iteration_timeout_minutes = 10,\n",
|
||||||
"outputs": [],
|
" iterations = 10,\n",
|
||||||
"source": [
|
" primary_metric = 'spearman_correlation',\n",
|
||||||
"automl_config = AutoMLConfig(task = 'regression',\n",
|
" n_cross_validations = 5,\n",
|
||||||
" iteration_timeout_minutes = 10,\n",
|
" debug_log = 'automl.log',\n",
|
||||||
" iterations = 10,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" primary_metric = 'spearman_correlation',\n",
|
" X = X_train, \n",
|
||||||
" n_cross_validations = 5,\n",
|
" y = y_train,\n",
|
||||||
" debug_log = 'automl.log',\n",
|
" path = project_folder)"
|
||||||
" verbosity = logging.INFO,\n",
|
]
|
||||||
" X = X_train, \n",
|
},
|
||||||
" y = y_train,\n",
|
{
|
||||||
" path = project_folder)"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"cell_type": "markdown",
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
"metadata": {},
|
]
|
||||||
"source": [
|
},
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
{
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
"metadata": {},
|
]
|
||||||
"outputs": [],
|
},
|
||||||
"source": [
|
{
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"local_run"
|
||||||
"metadata": {},
|
]
|
||||||
"outputs": [],
|
},
|
||||||
"source": [
|
{
|
||||||
"local_run"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"## Results"
|
||||||
"cell_type": "markdown",
|
]
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"## Results"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"#### Widget for Monitoring Runs\n",
|
||||||
"cell_type": "markdown",
|
"\n",
|
||||||
"metadata": {},
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"source": [
|
"\n",
|
||||||
"#### Widget for Monitoring Runs\n",
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
"\n",
|
]
|
||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"from azureml.widgets import RunDetails\n",
|
||||||
"metadata": {},
|
"RunDetails(local_run).show() "
|
||||||
"outputs": [],
|
]
|
||||||
"source": [
|
},
|
||||||
"from azureml.widgets import RunDetails\n",
|
{
|
||||||
"RunDetails(local_run).show() "
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"\n",
|
||||||
"cell_type": "markdown",
|
"#### Retrieve All Child Runs\n",
|
||||||
"metadata": {},
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
"source": [
|
]
|
||||||
"\n",
|
},
|
||||||
"#### Retrieve All Child Runs\n",
|
{
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"children = list(local_run.get_children())\n",
|
||||||
"metadata": {},
|
"metricslist = {}\n",
|
||||||
"outputs": [],
|
"for run in children:\n",
|
||||||
"source": [
|
" properties = run.get_properties()\n",
|
||||||
"children = list(local_run.get_children())\n",
|
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||||
"metricslist = {}\n",
|
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||||
"for run in children:\n",
|
"\n",
|
||||||
" properties = run.get_properties()\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
"rundata"
|
||||||
" metricslist[int(properties['iteration'])] = metrics\n",
|
]
|
||||||
"\n",
|
},
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
{
|
||||||
"rundata"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"### Retrieve the Best Model\n",
|
||||||
"cell_type": "markdown",
|
"\n",
|
||||||
"metadata": {},
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
"source": [
|
]
|
||||||
"### Retrieve the Best Model\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"metadata": {},
|
"print(best_run)\n",
|
||||||
"outputs": [],
|
"print(fitted_model)"
|
||||||
"source": [
|
]
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
},
|
||||||
"print(best_run)\n",
|
{
|
||||||
"print(fitted_model)"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"cell_type": "markdown",
|
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
|
||||||
"metadata": {},
|
]
|
||||||
"source": [
|
},
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
{
|
||||||
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"lookup_metric = \"root_mean_squared_error\"\n",
|
||||||
"metadata": {},
|
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
||||||
"outputs": [],
|
"print(best_run)\n",
|
||||||
"source": [
|
"print(fitted_model)"
|
||||||
"lookup_metric = \"root_mean_squared_error\"\n",
|
]
|
||||||
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
},
|
||||||
"print(best_run)\n",
|
{
|
||||||
"print(fitted_model)"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"#### Model from a Specific Iteration\n",
|
||||||
"cell_type": "markdown",
|
"Show the run and the model from the third iteration:"
|
||||||
"metadata": {},
|
]
|
||||||
"source": [
|
},
|
||||||
"#### Model from a Specific Iteration\n",
|
{
|
||||||
"Show the run and the model from the third iteration:"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"iteration = 3\n",
|
||||||
"metadata": {},
|
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
||||||
"outputs": [],
|
"print(third_run)\n",
|
||||||
"source": [
|
"print(third_model)"
|
||||||
"iteration = 3\n",
|
]
|
||||||
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
},
|
||||||
"print(third_run)\n",
|
{
|
||||||
"print(third_model)"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"## Test"
|
||||||
"cell_type": "markdown",
|
]
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"## Test"
|
"cell_type": "markdown",
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"Predict on training and test set, and calculate residual values."
|
||||||
"cell_type": "markdown",
|
]
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"Predict on training and test set, and calculate residual values."
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"y_pred_train = fitted_model.predict(X_train)\n",
|
||||||
"metadata": {},
|
"y_residual_train = y_train - y_pred_train\n",
|
||||||
"outputs": [],
|
"\n",
|
||||||
"source": [
|
"y_pred_test = fitted_model.predict(X_test)\n",
|
||||||
"y_pred_train = fitted_model.predict(X_train)\n",
|
"y_residual_test = y_test - y_pred_test"
|
||||||
"y_residual_train = y_train - y_pred_train\n",
|
]
|
||||||
"\n",
|
},
|
||||||
"y_pred_test = fitted_model.predict(X_test)\n",
|
{
|
||||||
"y_residual_test = y_test - y_pred_test"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"%matplotlib inline\n",
|
||||||
"metadata": {},
|
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
||||||
"outputs": [],
|
"\n",
|
||||||
"source": [
|
"# Set up a multi-plot chart.\n",
|
||||||
"%matplotlib inline\n",
|
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
||||||
"import matplotlib.pyplot as plt\n",
|
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
||||||
"import numpy as np\n",
|
"f.set_figheight(6)\n",
|
||||||
"from sklearn import datasets\n",
|
"f.set_figwidth(16)\n",
|
||||||
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
"\n",
|
||||||
"\n",
|
"# Plot residual values of training set.\n",
|
||||||
"# Set up a multi-plot chart.\n",
|
"a0.axis([0, 360, -200, 200])\n",
|
||||||
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
||||||
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
"f.set_figheight(6)\n",
|
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
||||||
"f.set_figwidth(16)\n",
|
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
|
||||||
"\n",
|
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
||||||
"# Plot residual values of training set.\n",
|
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
||||||
"a0.axis([0, 360, -200, 200])\n",
|
"\n",
|
||||||
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
"# Plot a histogram.\n",
|
||||||
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
|
||||||
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
||||||
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
|
"\n",
|
||||||
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
"# Plot residual values of test set.\n",
|
||||||
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
"a1.axis([0, 90, -200, 200])\n",
|
||||||
"\n",
|
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
||||||
"# Plot a histogram.\n",
|
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n",
|
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
||||||
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n",
|
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
|
||||||
"\n",
|
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
||||||
"# Plot residual values of test set.\n",
|
"a1.set_yticklabels([])\n",
|
||||||
"a1.axis([0, 90, -200, 200])\n",
|
"\n",
|
||||||
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
"# Plot a histogram.\n",
|
||||||
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
|
||||||
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
||||||
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
|
"\n",
|
||||||
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
"plt.show()"
|
||||||
"a1.set_yticklabels([])\n",
|
]
|
||||||
"\n",
|
}
|
||||||
"# Plot a histogram.\n",
|
|
||||||
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
|
|
||||||
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
|
||||||
"\n",
|
|
||||||
"plt.show()"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
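The regression notebook above exercises three `get_output` overloads; a compact sketch, assuming `local_run` is the completed run from the `experiment.submit` cell and `X_test` comes from the train/test split cell:

```python
# Default: best run and fitted model by the primary metric
# (spearman_correlation in this notebook's AutoMLConfig).
best_run, fitted_model = local_run.get_output()

# Best run by any other logged metric.
best_run_rmse, fitted_model_rmse = local_run.get_output(metric = 'root_mean_squared_error')

# Run and model from a specific iteration.
third_run, third_model = local_run.get_output(iteration = 3)

# Any of the fitted models can predict directly on new data.
y_pred_test = fitted_model.predict(X_test)
```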
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -1,260 +1,257 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Sample Weight**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Train](#Train)\n",
"1. [Test](#Test)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use sample weight with AutoML. Sample weight is used where some samples are more important than others.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to configure AutoML to use `sample_weight` and you will see the difference sample weight makes to the test results."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
|
"import logging\n",
|
||||||
"import os\n",
|
"\n",
|
||||||
"import random\n",
|
"from matplotlib import pyplot as plt\n",
|
||||||
"\n",
|
"import numpy as np\n",
|
||||||
"from matplotlib import pyplot as plt\n",
|
"import pandas as pd\n",
|
||||||
"from matplotlib.pyplot import imshow\n",
|
"from sklearn import datasets\n",
|
||||||
"import numpy as np\n",
|
"\n",
|
||||||
"import pandas as pd\n",
|
"import azureml.core\n",
|
||||||
"from sklearn import datasets\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"import azureml.core\n",
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
"from azureml.core.experiment import Experiment\n",
|
]
|
||||||
"from azureml.core.workspace import Workspace\n",
|
},
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
{
|
||||||
"from azureml.train.automl.run import AutoMLRun"
|
"cell_type": "code",
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"ws = Workspace.from_config()\n",
|
||||||
"metadata": {},
|
"\n",
|
||||||
"outputs": [],
|
"# Choose names for the regular and the sample weight experiments.\n",
|
||||||
"source": [
|
"experiment_name = 'non_sample_weight_experiment'\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"sample_weight_experiment_name = 'sample_weight_experiment'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Choose names for the regular and the sample weight experiments.\n",
|
"project_folder = './sample_projects/automl-local-classification'\n",
|
||||||
"experiment_name = 'non_sample_weight_experiment'\n",
|
"\n",
|
||||||
"sample_weight_experiment_name = 'sample_weight_experiment'\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"\n",
|
"sample_weight_experiment=Experiment(ws, sample_weight_experiment_name)\n",
|
||||||
"project_folder = './sample_projects/automl-local-classification'\n",
|
"\n",
|
||||||
"\n",
|
"output = {}\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"sample_weight_experiment=Experiment(ws, sample_weight_experiment_name)\n",
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"\n",
|
"output['Workspace Name'] = ws.name\n",
|
||||||
"output = {}\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"output['Location'] = ws.location\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"output['Project Directory'] = project_folder\n",
|
||||||
"output['Workspace Name'] = ws.name\n",
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"output['Location'] = ws.location\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"output['Project Directory'] = project_folder\n",
|
"outputDf.T"
|
||||||
"output['Experiment Name'] = experiment.name\n",
|
]
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
},
|
||||||
"pd.DataFrame(data = output, index = ['']).T"
|
{
|
||||||
]
|
"cell_type": "markdown",
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"source": [
|
||||||
"cell_type": "markdown",
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
||||||
"metadata": {},
|
]
|
||||||
"source": [
|
},
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
{
|
||||||
]
|
"cell_type": "code",
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"from azureml.telemetry import set_diagnostics_collection\n",
|
||||||
"outputs": [],
|
"set_diagnostics_collection(send_diagnostics = True)"
|
||||||
"source": [
|
]
|
||||||
"from azureml.telemetry import set_diagnostics_collection\n",
|
},
|
||||||
"set_diagnostics_collection(send_diagnostics = True)"
|
{
|
||||||
]
|
"cell_type": "markdown",
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"source": [
|
||||||
"cell_type": "markdown",
|
"## Train\n",
|
||||||
"metadata": {},
|
"\n",
|
||||||
"source": [
|
"Instantiate two `AutoMLConfig` objects. One will be used with `sample_weight` and one without."
|
||||||
"## Train\n",
|
]
|
||||||
"\n",
|
},
|
||||||
"Instantiate two `AutoMLConfig` objects. One will be used with `sample_weight` and one without."
|
{
|
||||||
]
|
"cell_type": "code",
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"digits = datasets.load_digits()\n",
|
||||||
"outputs": [],
|
"X_train = digits.data[100:,:]\n",
|
||||||
"source": [
|
"y_train = digits.target[100:]\n",
|
||||||
"digits = datasets.load_digits()\n",
|
"\n",
|
||||||
"X_train = digits.data[100:,:]\n",
|
"# The example makes the sample weight 0.9 for the digit 4 and 0.1 for all other digits.\n",
|
||||||
"y_train = digits.target[100:]\n",
|
"# This makes the model more likely to classify as 4 if the image it not clear.\n",
|
||||||
"\n",
|
"sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_train])\n",
|
||||||
"# The example makes the sample weight 0.9 for the digit 4 and 0.1 for all other digits.\n",
|
"\n",
|
||||||
"# This makes the model more likely to classify as 4 if the image it not clear.\n",
|
"automl_classifier = AutoMLConfig(task = 'classification',\n",
|
||||||
"sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_train])\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
"\n",
|
" primary_metric = 'AUC_weighted',\n",
|
||||||
"automl_classifier = AutoMLConfig(task = 'classification',\n",
|
" iteration_timeout_minutes = 60,\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" iterations = 10,\n",
|
||||||
" primary_metric = 'AUC_weighted',\n",
|
" n_cross_validations = 2,\n",
|
||||||
" iteration_timeout_minutes = 60,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" iterations = 10,\n",
|
" X = X_train, \n",
|
||||||
" n_cross_validations = 2,\n",
|
" y = y_train,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" path = project_folder)\n",
|
||||||
" X = X_train, \n",
|
"\n",
|
||||||
" y = y_train,\n",
|
"automl_sample_weight = AutoMLConfig(task = 'classification',\n",
|
||||||
" path = project_folder)\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
"\n",
|
" primary_metric = 'AUC_weighted',\n",
|
||||||
"automl_sample_weight = AutoMLConfig(task = 'classification',\n",
|
" iteration_timeout_minutes = 60,\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" iterations = 10,\n",
|
||||||
" primary_metric = 'AUC_weighted',\n",
|
" n_cross_validations = 2,\n",
|
||||||
" iteration_timeout_minutes = 60,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" iterations = 10,\n",
|
" X = X_train, \n",
|
||||||
" n_cross_validations = 2,\n",
|
" y = y_train,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" sample_weight = sample_weight,\n",
|
||||||
" X = X_train, \n",
|
" path = project_folder)"
|
||||||
" y = y_train,\n",
|
]
|
||||||
" sample_weight = sample_weight,\n",
|
},
|
||||||
" path = project_folder)"
|
{
|
||||||
]
|
"cell_type": "markdown",
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"source": [
|
||||||
"cell_type": "markdown",
|
"Call the `submit` method on the experiment objects and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"metadata": {},
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
"source": [
|
]
|
||||||
"Call the `submit` method on the experiment objects and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
},
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
{
|
||||||
]
|
"cell_type": "code",
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"local_run = experiment.submit(automl_classifier, show_output = True)\n",
|
||||||
"outputs": [],
|
"sample_weight_run = sample_weight_experiment.submit(automl_sample_weight, show_output = True)\n",
|
||||||
"source": [
|
"\n",
|
||||||
"local_run = experiment.submit(automl_classifier, show_output = True)\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"sample_weight_run = sample_weight_experiment.submit(automl_sample_weight, show_output = True)\n",
|
"best_run_sample_weight, fitted_model_sample_weight = sample_weight_run.get_output()"
|
||||||
"\n",
|
]
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
},
|
||||||
"best_run_sample_weight, fitted_model_sample_weight = sample_weight_run.get_output()"
|
{
|
||||||
]
|
"cell_type": "markdown",
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"source": [
|
||||||
"cell_type": "markdown",
|
"## Test\n",
|
||||||
"metadata": {},
|
"\n",
|
||||||
"source": [
|
"#### Load Test Data"
|
||||||
"## Test\n",
|
]
|
||||||
"\n",
|
},
|
||||||
"#### Load Test Data"
|
{
|
||||||
]
|
"cell_type": "code",
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"digits = datasets.load_digits()\n",
|
||||||
"outputs": [],
|
"X_test = digits.data[:100, :]\n",
|
||||||
"source": [
|
"y_test = digits.target[:100]\n",
|
||||||
"digits = datasets.load_digits()\n",
|
"images = digits.images[:100]"
|
||||||
"X_test = digits.data[:100, :]\n",
|
]
|
||||||
"y_test = digits.target[:100]\n",
|
},
|
||||||
"images = digits.images[:100]"
|
{
|
||||||
]
|
"cell_type": "markdown",
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"source": [
|
||||||
"cell_type": "markdown",
|
"#### Compare the Models\n",
|
||||||
"metadata": {},
|
"The prediction from the sample weight model is more likely to correctly predict 4's. However, it is also more likely to predict 4 for some images that are not labelled as 4."
|
||||||
"source": [
|
]
|
||||||
"#### Compare the Models\n",
|
},
|
||||||
"The prediction from the sample weight model is more likely to correctly predict 4's. However, it is also more likely to predict 4 for some images that are not labelled as 4."
|
{
|
||||||
]
|
"cell_type": "code",
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"# Randomly select digits and test.\n",
|
||||||
"outputs": [],
|
"for index in range(0,len(y_test)):\n",
|
||||||
"source": [
|
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
||||||
"# Randomly select digits and test.\n",
|
" predicted_sample_weight = fitted_model_sample_weight.predict(X_test[index:index + 1])[0]\n",
|
||||||
"for index in range(0,len(y_test)):\n",
|
" label = y_test[index]\n",
|
||||||
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
" if predicted == 4 or predicted_sample_weight == 4 or label == 4:\n",
|
||||||
" predicted_sample_weight = fitted_model_sample_weight.predict(X_test[index:index + 1])[0]\n",
|
" title = \"Label value = %d Predicted value = %d Prediced with sample weight = %d\" % (label, predicted, predicted_sample_weight)\n",
|
||||||
" label = y_test[index]\n",
|
" fig = plt.figure(1, figsize=(3,3))\n",
|
||||||
" if predicted == 4 or predicted_sample_weight == 4 or label == 4:\n",
|
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
||||||
" title = \"Label value = %d Predicted value = %d Prediced with sample weight = %d\" % (label, predicted, predicted_sample_weight)\n",
|
" ax1.set_title(title)\n",
|
||||||
" fig = plt.figure(1, figsize=(3,3))\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
" plt.show()"
|
||||||
" ax1.set_title(title)\n",
|
]
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
}
|
||||||
" plt.show()"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.5"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.5"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
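The sample-weight notebook's two configurations differ only in the `sample_weight` array. A minimal sketch of the weighting and of how its effect could be checked, assuming `y_train`, `X_test`, `fitted_model`, and `fitted_model_sample_weight` from the cells above; the counting check is illustrative and not part of the notebook:

```python
import numpy as np

# As in the notebook: weight digit 4 at 0.9 and all other digits at 0.01,
# which biases the weighted model toward predicting 4 on ambiguous images.
sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_train])

# After both runs complete, count how often each fitted model predicts 4 on
# the held-out digits (illustrative check only).
plain_fours = (fitted_model.predict(X_test) == 4).sum()
weighted_fours = (fitted_model_sample_weight.predict(X_test) == 4).sum()
print('plain model predicted 4 on {0} images, weighted model on {1}'.format(
    plain_fours, weighted_fours))
```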
@@ -1,403 +1,397 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Train Test Split and Handling Sparse Data**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use scikit-learn's [20newsgroup](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_20newsgroups.html) dataset to showcase how you can use AutoML to handle sparse data and how to specify custom cross-validation splits.\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model.\n",
"4. Explore the results.\n",
"5. Test the best fitted model.\n",
"\n",
"In addition, this notebook showcases the following features:\n",
"- Explicit train test splits \n",
"- Handling **sparse data** in the input"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
|
"import logging\n",
|
||||||
"import os\n",
|
"\n",
|
||||||
"import random\n",
|
"import pandas as pd\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from matplotlib import pyplot as plt\n",
|
"import azureml.core\n",
|
||||||
"from matplotlib.pyplot import imshow\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"import numpy as np\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"import pandas as pd\n",
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
"from sklearn import datasets\n",
|
]
|
||||||
"\n",
|
},
|
||||||
"import azureml.core\n",
|
{
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"cell_type": "code",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"execution_count": null,
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"metadata": {},
|
||||||
"from azureml.train.automl.run import AutoMLRun"
|
"outputs": [],
|
||||||
]
|
"source": [
|
||||||
},
|
"ws = Workspace.from_config()\n",
|
||||||
{
|
"\n",
|
||||||
"cell_type": "code",
|
"# choose a name for the experiment\n",
|
||||||
"execution_count": null,
|
"experiment_name = 'automl-local-missing-data'\n",
|
||||||
"metadata": {},
|
"# project folder\n",
|
||||||
"outputs": [],
|
"project_folder = './sample_projects/automl-local-missing-data'\n",
|
||||||
"source": [
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for the experiment\n",
|
"output = {}\n",
|
||||||
"experiment_name = 'automl-local-missing-data'\n",
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"# project folder\n",
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"project_folder = './sample_projects/automl-local-missing-data'\n",
|
"output['Workspace'] = ws.name\n",
|
||||||
"\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"output['Location'] = ws.location\n",
|
||||||
"\n",
|
"output['Project Directory'] = project_folder\n",
|
||||||
"output = {}\n",
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"output['Workspace'] = ws.name\n",
|
"outputDf.T"
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
]
|
||||||
"output['Location'] = ws.location\n",
|
},
|
||||||
"output['Project Directory'] = project_folder\n",
|
{
|
||||||
"output['Experiment Name'] = experiment.name\n",
|
"cell_type": "markdown",
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"metadata": {},
|
||||||
"pd.DataFrame(data=output, index=['']).T"
|
"source": [
|
"Opt in to diagnostics for a better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_20newsgroups\n",
"from sklearn.feature_extraction.text import HashingVectorizer\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"remove = ('headers', 'footers', 'quotes')\n",
"categories = [\n",
" 'alt.atheism',\n",
" 'talk.religion.misc',\n",
" 'comp.graphics',\n",
" 'sci.space',\n",
"]\n",
"data_train = fetch_20newsgroups(subset = 'train', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
"\n",
"X_train, X_valid, y_train, y_valid = train_test_split(data_train.data, data_train.target, test_size = 0.33, random_state = 42)\n",
"\n",
"\n",
"vectorizer = HashingVectorizer(stop_words = 'english', alternate_sign = False,\n",
" n_features = 2**16)\n",
"X_train = vectorizer.transform(X_train)\n",
"X_valid = vectorizer.transform(X_valid)\n",
"\n",
"summary_df = pd.DataFrame(index = ['No of Samples', 'No of Features'])\n",
"summary_df['Train Set'] = [X_train.shape[0], X_train.shape[1]]\n",
"summary_df['Validation Set'] = [X_valid.shape[0], X_valid.shape[1]]\n",
"summary_df"
]
},
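{
"cell_type": "markdown",
"metadata": {},
"source": [
"`HashingVectorizer` returns a SciPy CSR sparse matrix, which is exactly the kind of input this notebook exercises. As a quick illustrative check (added here for clarity; it is not part of the original notebook), you can report how sparse the hashed features actually are:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: measure the density of the hashed feature matrix.\n",
"# With n_features = 2**16, typically far less than 1% of entries are non-zero.\n",
"density = X_train.nnz / (X_train.shape[0] * X_train.shape[1])\n",
"print('non-zero entries: {}, density: {:.6f}'.format(X_train.nnz, density))"
]
},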
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.<br>**Note:** If input data is sparse, you cannot use *True*.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features] for the custom validation set.|\n",
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification for the custom validation set.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" iteration_timeout_minutes = 60,\n",
" iterations = 5,\n",
" preprocess = False,\n",
" verbosity = logging.INFO,\n",
" X = X_train, \n",
" y = y_train,\n",
" X_valid = X_valid, \n",
" y_valid = y_valid, \n",
" path = project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
" \n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()"
]
},
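{
"cell_type": "markdown",
"metadata": {},
"source": [
"The returned `fitted_model` behaves like a scikit-learn pipeline. As a small illustrative addition (not part of the original notebook, and assuming the usual sklearn `Pipeline` interface), you can list the preprocessing and estimator steps the winning pipeline contains:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: inspect the steps of the best fitted pipeline.\n",
"# Assumes fitted_model exposes the standard sklearn Pipeline `steps` attribute.\n",
"for name, step in fitted_model.steps:\n",
"    print(name, type(step).__name__)"
]
},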
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model with the best `accuracy` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# lookup_metric = \"accuracy\"\n",
"# best_run, fitted_model = local_run.get_output(metric = lookup_metric)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# iteration = 3\n",
"# best_run, fitted_model = local_run.get_output(iteration = iteration)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pandas_ml import ConfusionMatrix\n",
"\n",
"# Load test data.\n",
"data_test = fetch_20newsgroups(subset = 'test', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
"\n",
"X_test = vectorizer.transform(data_test.data)\n",
"y_test = data_test.target\n",
"\n",
"# Test our best pipeline.\n",
"\n",
"y_pred = fitted_model.predict(X_test)\n",
"y_pred_strings = [data_test.target_names[i] for i in y_pred]\n",
"y_test_strings = [data_test.target_names[i] for i in y_test]\n",
"\n",
"cm = ConfusionMatrix(y_test_strings, y_pred_strings)\n",
"print(cm)\n",
"cm.plot()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -1,495 +1,495 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Enabling App Insights for Services in Production\n",
"With this notebook, you can learn how to enable App Insights for standard service monitoring; in addition, we provide examples of custom logging within the scoring file of a model. \n",
"\n",
"\n",
"## What does Application Insights monitor?\n",
"It monitors request rates, response times, failure rates, etc. For more information visit [App Insights docs.](https://docs.microsoft.com/en-us/azure/application-insights/app-insights-overview)\n",
"\n",
"\n",
"## What is different compared to standard production deployment process?\n",
"If you want to enable generic App Insights for a service, run:\n",
"```python\n",
"aks_service = Webservice(ws, \"aks-w-dc2\")\n",
"aks_service.update(enable_app_insights=True)\n",
"```\n",
"Where \"aks-w-dc2\" is your service name. You can also do this from the Azure Portal under your Workspace--> deployments--> Select deployment--> Edit--> Advanced Settings--> Select \"Enable AppInsights diagnostics\"\n",
"\n",
"If you want to log custom traces, you will follow the standard deployment process for AKS and you will:\n",
"1. Update the scoring file.\n",
"2. Update the AKS configuration.\n",
"3. Build a new image and deploy it. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Import your dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Run\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.core.image import Image\n",
"from azureml.core.model import Model\n",
"\n",
"import azureml.core\n",
"print(azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Set up your configuration and create a workspace\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Register Model\n",
"Register an existing trained model, add description and tags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Register the model\n",
"from azureml.core.model import Model\n",
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
" model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
" description = \"Ridge regression model to predict diabetes\",\n",
" workspace = ws)\n",
"\n",
"print(model.name, model.description, model.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. *Update your scoring file with custom print statements*\n",
"Here is an example:\n",
"### a. In your init function add:\n",
"```python\n",
"print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
"```\n",
"\n",
"### b. In your run function add:\n",
"```python\n",
"print (\"Prediction created\" + time.strftime(\"%H:%M:%S\"))\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import pickle\n",
"import json\n",
"import numpy \n",
"from sklearn.externals import joblib\n",
"from sklearn.linear_model import Ridge\n",
"from azureml.core.model import Model\n",
"import time\n",
"\n",
"def init():\n",
" global model\n",
" #Print statement for appinsights custom traces:\n",
" print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
" \n",
" # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n",
" # this call should return the path to the model.pkl file on the local disk.\n",
" model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n",
" \n",
" # deserialize the model file back into a sklearn model\n",
" model = joblib.load(model_path)\n",
" \n",
"\n",
"# note you can pass in multiple rows for scoring\n",
"def run(raw_data):\n",
" try:\n",
" data = json.loads(raw_data)['data']\n",
" data = numpy.array(data)\n",
" result = model.predict(data)\n",
" print (\"Prediction created\" + time.strftime(\"%H:%M:%S\"))\n",
" # you can return any datatype as long as it is JSON-serializable\n",
" return result.tolist()\n",
" except Exception as e:\n",
" error = str(e)\n",
" print (error + time.strftime(\"%H:%M:%S\"))\n",
" return error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. *Create myenv.yml file*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Create your new Image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" description = \"Image with ridge regression model\",\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
" )\n",
"\n",
"image = ContainerImage.create(name = \"myimage1\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy to ACI (Optional)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}, \n",
" description = 'Predict diabetes using regression model',\n",
" enable_app_insights = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"\n",
"aci_service_name = 'my-aci-service-4'\n",
"print(aci_service_name)\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"import json\n",
"\n",
"test_sample = json.dumps({'data': [\n",
" [1,28,13,45,54,6,57,8,8,10], \n",
" [101,9,8,37,6,45,4,3,2,41]\n",
"]})\n",
"test_sample = bytes(test_sample,encoding='utf8')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if aci_service.state == \"Healthy\":\n",
" prediction = aci_service.run(input_data=test_sample)\n",
" print(prediction)\n",
"else:\n",
" raise ValueError(\"Service deployment isn't healthy, can't call the service\")"
]
},
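{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same scoring call can also be made over plain HTTP, which is how most clients (and therefore the requests App Insights records) reach the service. The cell below is an illustrative sketch added to this walkthrough: it assumes the `requests` package is available and reuses `test_sample` together with the service's `scoring_uri` attribute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: call the deployed service over HTTP instead of\n",
"# aci_service.run(). Each POST shows up in App Insights as a request.\n",
"import requests\n",
"\n",
"headers = {'Content-Type': 'application/json'}\n",
"resp = requests.post(aci_service.scoring_uri, data = test_sample, headers = headers)\n",
"print(resp.status_code, resp.text)"
]
},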
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Deploy to AKS service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create AKS compute if you haven't done so."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the default configuration (can also provide parameters to customize)\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'my-aks-test3' \n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
]
},
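{
"cell_type": "markdown",
"metadata": {},
"source": [
"As the comment above notes, `provisioning_configuration()` also accepts sizing parameters. The cell below is an illustrative sketch (not part of the original flow) showing how you might request an explicit node count and VM size instead of the defaults:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: customize the AKS cluster instead of using defaults.\n",
"# agent_count and vm_size are parameters of provisioning_configuration().\n",
"prov_config = AksCompute.provisioning_configuration(agent_count = 3,\n",
" vm_size = 'Standard_D3_v2')"
]
},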
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_target.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you already have a cluster you can attach the service to it:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python \n",
"%%time\n",
"resource_id = '/subscriptions/<subscriptionid>/resourcegroups/<resourcegroupname>/providers/Microsoft.ContainerService/managedClusters/<aksservername>'\n",
"create_name = 'myaks4'\n",
"attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
"aks_target = ComputeTarget.attach(workspace = ws, \n",
" name = create_name, \n",
" attach_configuration=attach_config)\n",
"## Wait for the operation to complete\n",
"aks_target.wait_for_provisioning(True)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a. *Activate App Insights through updating AKS Webservice configuration*\n",
"In order to enable App Insights in your service you will need to update your AKS configuration file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Set the web service configuration\n",
"aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b. Deploy your service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if aks_target.provisioning_state == \"Succeeded\": \n",
" aks_service_name = 'aks-w-dc5'\n",
" aks_service = Webservice.deploy_from_image(workspace = ws, \n",
" name = aks_service_name,\n",
" image = image,\n",
" deployment_config = aks_config,\n",
" deployment_target = aks_target\n",
" )\n",
" aks_service.wait_for_deployment(show_output = True)\n",
" print(aks_service.state)\n",
"else:\n",
" raise ValueError(\"AKS provisioning failed.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Test your service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"import json\n",
"\n",
"test_sample = json.dumps({'data': [\n",
" [1,28,13,45,54,6,57,8,8,10], \n",
" [101,9,8,37,6,45,4,3,2,41]\n",
"]})\n",
"test_sample = bytes(test_sample,encoding='utf8')\n",
"\n",
"if aks_service.state == \"Healthy\":\n",
" prediction = aks_service.run(input_data=test_sample)\n",
" print(prediction)\n",
"else:\n",
" raise ValueError(\"Service deployment isn't healthy, can't call the service\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. See your service telemetry in App Insights\n",
"1. Go to the [Azure Portal](https://portal.azure.com/)\n",
"2. All resources--> Select the subscription/resource group where you created your Workspace--> Select the App Insights type\n",
"3. Click on the AppInsights resource. You'll see a high-level dashboard with information on Requests, Server response time and availability.\n",
"4. Click on the top banner \"Analytics\"\n",
"5. In the \"Schema\" section select \"traces\" and run your query.\n",
"6. Voila! All your custom traces should be there."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Disable App Insights"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.update(enable_app_insights=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_service.delete()\n",
"aci_service.delete()\n",
"image.delete()\n",
"model.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "marthalc"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -1,477 +1,477 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Enabling Data Collection for Models in Production\n",
"With this notebook, you can learn how to collect input model data from your Azure Machine Learning service into Azure Blob storage. Once enabled, the collected data gives you the opportunity to:\n",
"\n",
"* Monitor data drift as production data enters your model\n",
"* Make better decisions on when to retrain or optimize your model\n",
"* Retrain your model with the data collected\n",
"\n",
"## What data is collected?\n",
"* Model input data (voice, images, and video are not supported) from services deployed in an Azure Kubernetes Service (AKS) cluster\n",
"* Model predictions using production input data.\n",
"\n",
"**Note:** pre-aggregation or pre-calculations on this data are done by the user and are not included in this version of the product.\n",
"\n",
"## What is different compared to standard production deployment process?\n",
"1. Update the scoring file (see the sketch just below).\n",
"2. Update the yml file with the new dependency.\n",
"3. Update the AKS configuration.\n",
"4. Build a new image and deploy it. "
]
},
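{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before walking through those steps, here is an illustrative sketch of what the scoring-file change typically looks like. It is an assumption-laden outline, not the exact code this notebook builds later: it assumes the `azureml-monitoring` package and its `ModelDataCollector` class, whose parameter names can differ between SDK versions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only (assumes the azureml-monitoring package).\n",
"# In init(), create one collector for inputs and one for predictions;\n",
"# in run(), call collect() on each. Parameter names may vary by version.\n",
"from azureml.monitoring import ModelDataCollector\n",
"\n",
"inputs_dc = ModelDataCollector(\"sklearn_regression_model.pkl\", identifier=\"inputs\")\n",
"prediction_dc = ModelDataCollector(\"sklearn_regression_model.pkl\", identifier=\"predictions\")\n",
"\n",
"# inside run(raw_data):\n",
"#     inputs_dc.collect(data)\n",
"#     prediction_dc.collect(result)"
]
},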
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 1. Import your dependencies"
|
"## 1. Import your dependencies"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace, Run\n",
|
"from azureml.core import Workspace, Run\n",
|
||||||
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
||||||
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
||||||
"from azureml.core.image import Image\n",
|
"from azureml.core.image import Image\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"print(azureml.core.VERSION)"
|
"print(azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 2. Set up your configuration and create a workspace\n",
|
"## 2. Set up your configuration and create a workspace\n",
|
||||||
"Follow Notebook 00 instructions to do this.\n"
|
"Follow Notebook 00 instructions to do this.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 3. Register Model\n",
|
"## 3. Register Model\n",
|
||||||
"Register an existing trained model, add descirption and tags."
|
"Register an existing trained model, add descirption and tags."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#Register the model\n",
|
"#Register the model\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
|
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
|
||||||
" model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n",
|
" model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n",
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||||
" description = \"Ridge regression model to predict diabetes\",\n",
|
" description = \"Ridge regression model to predict diabetes\",\n",
|
||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(model.name, model.description, model.version)"
|
"print(model.name, model.description, model.version)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 4. *Update your scoring file with Data Collection*\n",
|
"## 4. *Update your scoring file with Data Collection*\n",
|
||||||
"The file below, compared to the file used in notebook 11, has the following changes:\n",
|
"The file below, compared to the file used in notebook 11, has the following changes:\n",
|
||||||
"### a. Import the module\n",
|
"### a. Import the module\n",
|
||||||
"```python \n",
|
"```python \n",
|
||||||
"from azureml.monitoring import ModelDataCollector```\n",
|
"from azureml.monitoring import ModelDataCollector```\n",
|
||||||
"### b. In your init function add:\n",
|
"### b. In your init function add:\n",
|
||||||
"```python \n",
|
"```python \n",
|
||||||
"global inputs_dc, prediction_d\n",
|
"global inputs_dc, prediction_d\n",
|
||||||
"inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\", \"feat4\", \"feat5\", \"Feat6\"])\n",
|
"inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\", \"feat4\", \"feat5\", \"Feat6\"])\n",
|
||||||
"prediction_dc = ModelDataCollector(\"best_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"])```\n",
|
"prediction_dc = ModelDataCollector(\"best_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"])```\n",
|
||||||
" \n",
|
" \n",
|
||||||
"* Identifier: Identifier is later used for building the folder structure in your Blob, it can be used to divide \"raw\" data versus \"processed\".\n",
|
"* Identifier: Identifier is later used for building the folder structure in your Blob, it can be used to divide \"raw\" data versus \"processed\".\n",
|
||||||
"* CorrelationId: is an optional parameter, you do not need to set it up if your model doesn't require it. Having a correlationId in place does help you for easier mapping with other data. (Examples include: LoanNumber, CustomerId, etc.)\n",
|
"* CorrelationId: is an optional parameter, you do not need to set it up if your model doesn't require it. Having a correlationId in place does help you for easier mapping with other data. (Examples include: LoanNumber, CustomerId, etc.)\n",
|
||||||
"* Feature Names: These need to be set up in the order of your features in order for them to have column names when the .csv is created.\n",
|
"* Feature Names: These need to be set up in the order of your features in order for them to have column names when the .csv is created.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### c. In your run function add:\n",
|
"### c. In your run function add:\n",
|
||||||
"```python\n",
|
"```python\n",
|
||||||
"inputs_dc.collect(data)\n",
|
"inputs_dc.collect(data)\n",
|
||||||
"prediction_dc.collect(result)```"
|
"prediction_dc.collect(result)```"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile score.py\n",
|
"%%writefile score.py\n",
|
||||||
"import pickle\n",
|
"import pickle\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"import numpy \n",
|
"import numpy \n",
|
||||||
"from sklearn.externals import joblib\n",
|
"from sklearn.externals import joblib\n",
|
||||||
"from sklearn.linear_model import Ridge\n",
|
"from sklearn.linear_model import Ridge\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"from azureml.monitoring import ModelDataCollector\n",
|
"from azureml.monitoring import ModelDataCollector\n",
|
||||||
"import time\n",
|
"import time\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def init():\n",
|
"def init():\n",
|
||||||
" global model\n",
|
" global model\n",
|
||||||
" print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
|
" print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
|
||||||
" # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n",
|
" # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n",
|
||||||
" # this call should return the path to the model.pkl file on the local disk.\n",
|
" # this call should return the path to the model.pkl file on the local disk.\n",
|
||||||
" model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n",
|
" model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n",
|
||||||
" # deserialize the model file back into a sklearn model\n",
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
" model = joblib.load(model_path)\n",
|
" model = joblib.load(model_path)\n",
|
||||||
" global inputs_dc, prediction_dc\n",
|
" global inputs_dc, prediction_dc\n",
|
||||||
" # this setup will help us save our inputs under the \"inputs\" path in our Azure Blob\n",
|
" # this setup will help us save our inputs under the \"inputs\" path in our Azure Blob\n",
|
||||||
" inputs_dc = ModelDataCollector(model_name=\"sklearn_regression_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\"]) \n",
|
" inputs_dc = ModelDataCollector(model_name=\"sklearn_regression_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\"]) \n",
|
||||||
" # this setup will help us save our ipredictions under the \"predictions\" path in our Azure Blob\n",
|
" # this setup will help us save our ipredictions under the \"predictions\" path in our Azure Blob\n",
|
||||||
" prediction_dc = ModelDataCollector(\"sklearn_regression_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"]) \n",
|
" prediction_dc = ModelDataCollector(\"sklearn_regression_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"]) \n",
|
||||||
" \n",
|
" \n",
|
||||||
"# note you can pass in multiple rows for scoring\n",
|
"# note you can pass in multiple rows for scoring\n",
|
||||||
"def run(raw_data):\n",
|
"def run(raw_data):\n",
|
||||||
" global inputs_dc, prediction_dc\n",
|
" global inputs_dc, prediction_dc\n",
|
||||||
" try:\n",
|
" try:\n",
|
||||||
" data = json.loads(raw_data)['data']\n",
|
" data = json.loads(raw_data)['data']\n",
|
||||||
" data = numpy.array(data)\n",
|
" data = numpy.array(data)\n",
|
||||||
" result = model.predict(data)\n",
|
" result = model.predict(data)\n",
|
||||||
" print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n",
|
" print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n",
|
||||||
" inputs_dc.collect(data) #this call is saving our input data into our blob\n",
|
" inputs_dc.collect(data) #this call is saving our input data into our blob\n",
|
||||||
" prediction_dc.collect(result)#this call is saving our prediction data into our blob\n",
|
" prediction_dc.collect(result)#this call is saving our prediction data into our blob\n",
|
||||||
" print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n",
|
" print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n",
|
||||||
" # you can return any data type as long as it is JSON-serializable\n",
|
" # you can return any data type as long as it is JSON-serializable\n",
|
||||||
" return result.tolist()\n",
|
" return result.tolist()\n",
|
||||||
" except Exception as e:\n",
|
" except Exception as e:\n",
|
||||||
" error = str(e)\n",
|
" error = str(e)\n",
|
||||||
" print (error + time.strftime(\"%H:%M:%S\"))\n",
|
" print (error + time.strftime(\"%H:%M:%S\"))\n",
|
||||||
" return error"
|
" return error"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 5. *Update your myenv.yml file with the required module*"
|
"## 5. *Update your myenv.yml file with the required module*"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
|
||||||
"myenv.add_pip_package(\"azureml-monitoring\")\n",
|
"myenv.add_pip_package(\"azureml-monitoring\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
]
|
]
|
||||||
},
|
},
|
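As a quick check, you can print the serialized environment to confirm that azureml-monitoring landed in the pip section; a minimal sketch reusing the `myenv` object from the cell above:

```python
# Print the conda environment that will be baked into the image;
# the pip section should list azureml-monitoring.
print(myenv.serialize_to_string())
```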
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 6. Create your new Image"
|
"## 6. Create your new Image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.image import ContainerImage\n",
|
"from azureml.core.image import ContainerImage\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
||||||
" runtime = \"python\",\n",
|
" runtime = \"python\",\n",
|
||||||
" conda_file = \"myenv.yml\",\n",
|
" conda_file = \"myenv.yml\",\n",
|
||||||
" description = \"Image with ridge regression model\",\n",
|
" description = \"Image with ridge regression model\",\n",
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
|
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
|
||||||
" )\n",
|
" )\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image = ContainerImage.create(name = \"myimage1\",\n",
|
"image = ContainerImage.create(name = \"myimage1\",\n",
|
||||||
" # this is the model object\n",
|
" # this is the model object\n",
|
||||||
" models = [model],\n",
|
" models = [model],\n",
|
||||||
" image_config = image_config,\n",
|
" image_config = image_config,\n",
|
||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image.wait_for_creation(show_output = True)"
|
"image.wait_for_creation(show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
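If image creation fails, the Docker build log is the first place to look; a one-line sketch using the `image` object created above:

```python
# URI of the Docker build log, useful for debugging failed image builds.
print(image.image_build_log_uri)
```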
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"print(model.name, model.description, model.version)"
|
"print(model.name, model.description, model.version)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 7. Deploy to AKS service"
|
"## 7. Deploy to AKS service"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create AKS compute if you haven't done so."
|
"### Create AKS compute if you haven't done so."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Use the default configuration (can also provide parameters to customize)\n",
|
"# Use the default configuration (can also provide parameters to customize)\n",
|
||||||
"prov_config = AksCompute.provisioning_configuration()\n",
|
"prov_config = AksCompute.provisioning_configuration()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aks_name = 'my-aks-test1' \n",
|
"aks_name = 'my-aks-test1' \n",
|
||||||
"# Create the cluster\n",
|
"# Create the cluster\n",
|
||||||
"aks_target = ComputeTarget.create(workspace = ws, \n",
|
"aks_target = ComputeTarget.create(workspace = ws, \n",
|
||||||
" name = aks_name, \n",
|
" name = aks_name, \n",
|
||||||
" provisioning_configuration = prov_config)"
|
" provisioning_configuration = prov_config)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%time\n",
|
"%%time\n",
|
||||||
"aks_target.wait_for_completion(show_output = True)\n",
|
"aks_target.wait_for_completion(show_output = True)\n",
|
||||||
"print(aks_target.provisioning_state)\n",
|
"print(aks_target.provisioning_state)\n",
|
||||||
"print(aks_target.provisioning_errors)"
|
"print(aks_target.provisioning_errors)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"If you already have a cluster you can attach the service to it:"
|
"If you already have a cluster you can attach the service to it:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"scrolled": true
|
"scrolled": true
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"```python \n",
|
"```python \n",
|
||||||
" %%time\n",
|
" %%time\n",
|
||||||
" resource_id = '/subscriptions/<subscriptionid>/resourcegroups/<resourcegroupname>/providers/Microsoft.ContainerService/managedClusters/<aksservername>'\n",
|
" resource_id = '/subscriptions/<subscriptionid>/resourcegroups/<resourcegroupname>/providers/Microsoft.ContainerService/managedClusters/<aksservername>'\n",
|
||||||
" create_name= 'myaks4'\n",
|
" create_name= 'myaks4'\n",
|
||||||
" attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
|
" attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
|
||||||
" aks_target = ComputeTarget.attach(workspace = ws, \n",
|
" aks_target = ComputeTarget.attach(workspace = ws, \n",
|
||||||
" name = create_name, \n",
|
" name = create_name, \n",
|
||||||
" attach_configuration=attach_config)\n",
|
" attach_configuration=attach_config)\n",
|
||||||
" ## Wait for the operation to complete\n",
|
" ## Wait for the operation to complete\n",
|
||||||
" aks_target.wait_for_provisioning(True)```"
|
" aks_target.wait_for_provisioning(True)```"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### a. *Activate Data Collection and App Insights through updating AKS Webservice configuration*\n",
|
"### a. *Activate Data Collection and App Insights through updating AKS Webservice configuration*\n",
|
||||||
"In order to enable Data Collection and App Insights in your service you will need to update your AKS configuration file:"
|
"In order to enable Data Collection and App Insights in your service you will need to update your AKS configuration file:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#Set the web service configuration\n",
|
"#Set the web service configuration\n",
|
||||||
"aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)"
|
"aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### b. Deploy your service"
|
"### b. Deploy your service"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"if aks_target.provisioning_state== \"Succeeded\": \n",
|
"if aks_target.provisioning_state== \"Succeeded\": \n",
|
||||||
" aks_service_name ='aks-w-dc0'\n",
|
" aks_service_name ='aks-w-dc0'\n",
|
||||||
" aks_service = Webservice.deploy_from_image(workspace = ws, \n",
|
" aks_service = Webservice.deploy_from_image(workspace = ws, \n",
|
||||||
" name = aks_service_name,\n",
|
" name = aks_service_name,\n",
|
||||||
" image = image,\n",
|
" image = image,\n",
|
||||||
" deployment_config = aks_config,\n",
|
" deployment_config = aks_config,\n",
|
||||||
" deployment_target = aks_target\n",
|
" deployment_target = aks_target\n",
|
||||||
" )\n",
|
" )\n",
|
||||||
" aks_service.wait_for_deployment(show_output = True)\n",
|
" aks_service.wait_for_deployment(show_output = True)\n",
|
||||||
" print(aks_service.state)\n",
|
" print(aks_service.state)\n",
|
||||||
"else: \n",
|
"else: \n",
|
||||||
" raise ValueError(\"aks provisioning failed, can't deploy service\")"
|
" raise ValueError(\"aks provisioning failed, can't deploy service\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 8. Test your service and send some data\n",
|
"## 8. Test your service and send some data\n",
|
||||||
"**Note**: It will take around 15 mins for your data to appear in your blob.\n",
|
"**Note**: It will take around 15 mins for your data to appear in your blob.\n",
|
||||||
"The data will appear in your Azure Blob following this format:\n",
|
"The data will appear in your Azure Blob following this format:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"/modeldata/subscriptionid/resourcegroupname/workspacename/webservicename/modelname/modelversion/identifier/year/month/day/data.csv "
|
"/modeldata/subscriptionid/resourcegroupname/workspacename/webservicename/modelname/modelversion/identifier/year/month/day/data.csv "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%time\n",
|
"%%time\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"\n",
|
"\n",
|
||||||
"test_sample = json.dumps({'data': [\n",
|
"test_sample = json.dumps({'data': [\n",
|
||||||
" [1,2,3,4,54,6,7,8,88,10], \n",
|
" [1,2,3,4,54,6,7,8,88,10], \n",
|
||||||
" [10,9,8,37,36,45,4,33,2,1]\n",
|
" [10,9,8,37,36,45,4,33,2,1]\n",
|
||||||
"]})\n",
|
"]})\n",
|
||||||
"test_sample = bytes(test_sample,encoding = 'utf8')\n",
|
"test_sample = bytes(test_sample,encoding = 'utf8')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"if aks_service.state == \"Healthy\":\n",
|
"if aks_service.state == \"Healthy\":\n",
|
||||||
" prediction = aks_service.run(input_data=test_sample)\n",
|
" prediction = aks_service.run(input_data=test_sample)\n",
|
||||||
" print(prediction)\n",
|
" print(prediction)\n",
|
||||||
"else:\n",
|
"else:\n",
|
||||||
" raise ValueError(\"Service deployment isn't healthy, can't call the service\")"
|
" raise ValueError(\"Service deployment isn't healthy, can't call the service\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
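Besides `aks_service.run()`, you can exercise the service over raw HTTP. A hedged sketch, assuming key-based auth (the AKS default); `get_keys()` and `scoring_uri` come from the deployed service object, and `test_sample` is the payload built above:

```python
import requests

# AKS web services use key auth by default; get_keys() returns the primary and secondary keys.
primary_key, secondary_key = aks_service.get_keys()
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + primary_key,
}
response = requests.post(aks_service.scoring_uri, data=test_sample, headers=headers)
print(response.status_code, response.json())
```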
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 9. Validate you data and analyze it\n",
|
"## 9. Validate you data and analyze it\n",
|
||||||
"You can look into your data following this path format in your Azure Blob (it takes up to 15 minutes for the data to appear):\n",
|
"You can look into your data following this path format in your Azure Blob (it takes up to 15 minutes for the data to appear):\n",
|
||||||
"\n",
|
"\n",
|
||||||
"/modeldata/**subscriptionid>**/**resourcegroupname>**/**workspacename>**/**webservicename>**/**modelname>**/**modelversion>>**/**identifier>**/*year/month/day*/data.csv \n",
|
"/modeldata/**subscriptionid>**/**resourcegroupname>**/**workspacename>**/**webservicename>**/**modelname>**/**modelversion>>**/**identifier>**/*year/month/day*/data.csv \n",
|
||||||
"\n",
|
"\n",
|
||||||
"For doing further analysis you have multiple options:"
|
"For doing further analysis you have multiple options:"
|
||||||
]
|
]
|
||||||
},
|
},
|
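Before wiring up one of the options below, you can confirm that CSVs are landing in the `modeldata` container with the storage SDK. A minimal sketch, assuming the classic azure-storage 2.x `BlockBlobService` client; the account name and key are placeholders for your own storage account:

```python
from azure.storage.blob import BlockBlobService  # pip install azure-storage-blob==2.1.0

# Placeholders: substitute your own storage account name and key.
blob_service = BlockBlobService(account_name="<storageaccountname>",
                                account_key="<storagekey>")

# Collected data is written under the "modeldata" container following the path format above.
for blob in blob_service.list_blobs("modeldata"):
    print(blob.name)
```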
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### a. Create DataBricks cluter and connect it to your blob\n",
|
"### a. Create DataBricks cluter and connect it to your blob\n",
|
||||||
"https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal or in your databricks workspace you can look for the template \"Azure Blob Storage Import Example Notebook\".\n",
|
"https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal or in your databricks workspace you can look for the template \"Azure Blob Storage Import Example Notebook\".\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Here is an example for setting up the file location to extract the relevant data:\n",
|
"Here is an example for setting up the file location to extract the relevant data:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<code> file_location = \"wasbs://mycontainer@storageaccountname.blob.core.windows.net/unknown/unknown/unknown-bigdataset-unknown/my_iterate_parking_inputs/2018/°/°/data.csv\" \n",
|
"<code> file_location = \"wasbs://mycontainer@storageaccountname.blob.core.windows.net/unknown/unknown/unknown-bigdataset-unknown/my_iterate_parking_inputs/2018/°/°/data.csv\" \n",
|
||||||
"file_type = \"csv\"</code>\n"
|
"file_type = \"csv\"</code>\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
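Once the cluster can reach the storage account, reading the collected CSVs is a few lines in a Databricks notebook. A hedged sketch, assuming `spark` is the SparkSession that Databricks preconfigures and that the angle-bracketed path segments are placeholders for your own workspace values:

```python
# Grant the cluster access to the storage account (placeholder names and key).
spark.conf.set(
    "fs.azure.account.key.storageaccountname.blob.core.windows.net",
    "<storagekey>")

# Path follows the /modeldata/... format above; * wildcards pick up every month/day.
file_location = ("wasbs://modeldata@storageaccountname.blob.core.windows.net/"
                 "<subscriptionid>/<resourcegroupname>/<workspacename>/<webservicename>/"
                 "<modelname>/<modelversion>/inputs/2018/*/*/data.csv")

# Feature names become CSV column names, so read with a header row.
df = (spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load(file_location))
display(df)  # Databricks notebook helper for rendering a DataFrame
```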
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### b. Connect Blob to Power Bi (Small Data only)\n",
|
"### b. Connect Blob to Power Bi (Small Data only)\n",
|
||||||
"1. Download and Open PowerBi Desktop\n",
|
"1. Download and Open PowerBi Desktop\n",
|
||||||
"2. Select “Get Data” and click on “Azure Blob Storage” >> Connect\n",
|
"2. Select \u201cGet Data\u201d and click on \u201cAzure Blob Storage\u201d >> Connect\n",
|
||||||
"3. Add your storage account and enter your storage key.\n",
|
"3. Add your storage account and enter your storage key.\n",
|
||||||
"4. Select the container where your Data Collection is stored and click on Edit. \n",
|
"4. Select the container where your Data Collection is stored and click on Edit. \n",
|
||||||
"5. In the query editor, click under “Name” column and add your Storage account Model path into the filter. Note: if you want to only look into files from a specific year or month, just expand the filter path. For example, just look into March data: /modeldata/subscriptionid>/resourcegroupname>/workspacename>/webservicename>/modelname>/modelversion>/identifier>/year>/3\n",
|
"5. In the query editor, click under \u201cName\u201d column and add your Storage account Model path into the filter. Note: if you want to only look into files from a specific year or month, just expand the filter path. For example, just look into March data: /modeldata/subscriptionid>/resourcegroupname>/workspacename>/webservicename>/modelname>/modelversion>/identifier>/year>/3\n",
|
||||||
"6. Click on the double arrow aside the “Content” column to combine the files. \n",
|
"6. Click on the double arrow aside the \u201cContent\u201d column to combine the files. \n",
|
||||||
"7. Click OK and the data will preload.\n",
|
"7. Click OK and the data will preload.\n",
|
||||||
"8. You can now click Close and Apply and start building your custom reports on your Model Input data."
|
"8. You can now click Close and Apply and start building your custom reports on your Model Input data."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Disable Data Collection"
|
"# Disable Data Collection"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"aks_service.update(collect_model_data=False)"
|
"aks_service.update(collect_model_data=False)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Clean up"
|
"## Clean up"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%time\n",
|
"%%time\n",
|
||||||
"aks_service.delete()\n",
|
"aks_service.delete()\n",
|
||||||
"image.delete()\n",
|
"image.delete()\n",
|
||||||
"model.delete()"
|
"model.delete()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "marthalc"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python [default]",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python3"
|
"name": "marthalc"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.5"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.5"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -4,7 +4,7 @@ These tutorials show how to create and deploy Open Neural Network eXchange ([ONN
|
|||||||
|
|
||||||
## Tutorials
|
## Tutorials
|
||||||
|
|
||||||
0. [Configure your Azure Machine Learning Workspace](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb)
|
0. [Configure your Azure Machine Learning Workspace](../../../configuration.ipynb)
|
||||||
|
|
||||||
#### Obtain models from the [ONNX Model Zoo](https://github.com/onnx/models) and deploy with ONNX Runtime Inference
|
#### Obtain models from the [ONNX Model Zoo](https://github.com/onnx/models) and deploy with ONNX Runtime Inference
|
||||||
1. [Handwritten Digit Classification (MNIST)](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb)
|
1. [Handwritten Digit Classification (MNIST)](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb)
|
||||||
|
|||||||
@@ -1,435 +1,435 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# YOLO Real-time Object Detection using ONNX on AzureML\n",
|
"# YOLO Real-time Object Detection using ONNX on AzureML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This example shows how to convert the TinyYOLO model from CoreML to ONNX and operationalize it as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
|
"This example shows how to convert the TinyYOLO model from CoreML to ONNX and operationalize it as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## What is ONNX\n",
|
"## What is ONNX\n",
|
||||||
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
|
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## YOLO Details\n",
|
"## YOLO Details\n",
|
||||||
"You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. For more information about YOLO, please visit the [YOLO website](https://pjreddie.com/darknet/yolo/)."
|
"You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. For more information about YOLO, please visit the [YOLO website](https://pjreddie.com/darknet/yolo/)."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites\n",
|
"## Prerequisites\n",
|
||||||
"\n",
|
"\n",
|
||||||
"To make the best use of your time, make sure you have done the following:\n",
|
"To make the best use of your time, make sure you have done the following:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
|
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
|
||||||
"* Go through the [00.configuration.ipynb](../00.configuration.ipynb) notebook to:\n",
|
"* Go through the [configuration](../../../configuration.ipynb) notebook to:\n",
|
||||||
" * install the AML SDK\n",
|
" * install the AML SDK\n",
|
||||||
" * create a workspace and its configuration file (config.json)"
|
" * create a workspace and its configuration file (config.json)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Install necessary packages\n",
|
"#### Install necessary packages\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You'll need to run the following commands to use this tutorial:\n",
|
"You'll need to run the following commands to use this tutorial:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"```sh\n",
|
"```sh\n",
|
||||||
"pip install onnxmltools\n",
|
"pip install onnxmltools\n",
|
||||||
"pip install coremltools # use this on Linux and Mac\n",
|
"pip install coremltools # use this on Linux and Mac\n",
|
||||||
"pip install git+https://github.com/apple/coremltools # use this on Windows\n",
|
"pip install git+https://github.com/apple/coremltools # use this on Windows\n",
|
||||||
"```"
|
"```"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Convert model to ONNX\n",
|
"## Convert model to ONNX\n",
|
||||||
"\n",
|
"\n",
|
||||||
"First we download the CoreML model. We use the CoreML model listed at https://coreml.store/tinyyolo. This may take a few minutes."
|
"First we download the CoreML model. We use the CoreML model from [Matthijs Hollemans's tutorial](https://github.com/hollance/YOLO-CoreML-MPSNNGraph). This may take a few minutes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import urllib.request\n",
|
"import urllib.request\n",
|
||||||
"\n",
|
"\n",
|
||||||
"onnx_model_url = \"https://s3-us-west-2.amazonaws.com/coreml-models/TinyYOLO.mlmodel\"\n",
|
"coreml_model_url = \"https://github.com/hollance/YOLO-CoreML-MPSNNGraph/raw/master/TinyYOLO-CoreML/TinyYOLO-CoreML/TinyYOLO.mlmodel\"\n",
|
||||||
"urllib.request.urlretrieve(onnx_model_url, filename=\"TinyYOLO.mlmodel\")\n"
|
"urllib.request.urlretrieve(coreml_model_url, filename=\"TinyYOLO.mlmodel\")\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Then we use ONNXMLTools to convert the model."
|
"Then we use ONNXMLTools to convert the model."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import onnxmltools\n",
|
"import onnxmltools\n",
|
||||||
"import coremltools\n",
|
"import coremltools\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Load a CoreML model\n",
|
"# Load a CoreML model\n",
|
||||||
"coreml_model = coremltools.utils.load_spec('TinyYOLO.mlmodel')\n",
|
"coreml_model = coremltools.utils.load_spec('TinyYOLO.mlmodel')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Convert from CoreML into ONNX\n",
|
"# Convert from CoreML into ONNX\n",
|
||||||
"onnx_model = onnxmltools.convert_coreml(coreml_model, 'TinyYOLOv2')\n",
|
"onnx_model = onnxmltools.convert_coreml(coreml_model, 'TinyYOLOv2')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Save ONNX model\n",
|
"# Save ONNX model\n",
|
||||||
"onnxmltools.utils.save_model(onnx_model, 'tinyyolov2.onnx')\n",
|
"onnxmltools.utils.save_model(onnx_model, 'tinyyolov2.onnx')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"print(os.path.getsize('tinyyolov2.onnx'))"
|
"print(os.path.getsize('tinyyolov2.onnx'))"
|
||||||
]
|
]
|
||||||
},
|
},
|
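Before registering the converted model, it can be worth a structural sanity check with the `onnx` package (pulled in as a dependency of onnxmltools); a small sketch:

```python
import onnx

# Load the converted model and validate its structure.
onnx_model_check = onnx.load("tinyyolov2.onnx")
onnx.checker.check_model(onnx_model_check)  # raises if the model is malformed

# Inspect the declared graph input (name, type, and shape).
print(onnx_model_check.graph.input)
```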
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Deploying as a web service with Azure ML\n",
|
"## Deploying as a web service with Azure ML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### Load Azure ML workspace\n",
|
"### Load Azure ML workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
|
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.location, ws.resource_group, sep = '\\n')"
|
"print(ws.name, ws.location, ws.resource_group, sep = '\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Registering your model with Azure ML\n",
|
"### Registering your model with Azure ML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Now we upload the model and register it in the workspace."
|
"Now we upload the model and register it in the workspace."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"model = Model.register(model_path = \"tinyyolov2.onnx\",\n",
|
"model = Model.register(model_path = \"tinyyolov2.onnx\",\n",
|
||||||
" model_name = \"tinyyolov2\",\n",
|
" model_name = \"tinyyolov2\",\n",
|
||||||
" tags = {\"onnx\": \"demo\"},\n",
|
" tags = {\"onnx\": \"demo\"},\n",
|
||||||
" description = \"TinyYOLO\",\n",
|
" description = \"TinyYOLO\",\n",
|
||||||
" workspace = ws)"
|
" workspace = ws)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Displaying your registered models\n",
|
"#### Displaying your registered models\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can optionally list out all the models that you have registered in this workspace."
|
"You can optionally list out all the models that you have registered in this workspace."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"models = ws.models\n",
|
"models = ws.models\n",
|
||||||
"for name, m in models.items():\n",
|
"for name, m in models.items():\n",
|
||||||
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
|
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Write scoring file\n",
|
"### Write scoring file\n",
|
||||||
"\n",
|
"\n",
|
||||||
"We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object."
|
"We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile score.py\n",
|
"%%writefile score.py\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"import time\n",
|
"import time\n",
|
||||||
"import sys\n",
|
"import sys\n",
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"import numpy as np # we're going to use numpy to process input and output data\n",
|
"import numpy as np # we're going to use numpy to process input and output data\n",
|
||||||
"import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n",
|
"import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def init():\n",
|
"def init():\n",
|
||||||
" global session\n",
|
" global session\n",
|
||||||
" model = Model.get_model_path(model_name = 'tinyyolov2')\n",
|
" model = Model.get_model_path(model_name = 'tinyyolov2')\n",
|
||||||
" session = onnxruntime.InferenceSession(model)\n",
|
" session = onnxruntime.InferenceSession(model)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def preprocess(input_data_json):\n",
|
"def preprocess(input_data_json):\n",
|
||||||
" # convert the JSON data into the tensor input\n",
|
" # convert the JSON data into the tensor input\n",
|
||||||
" return np.array(json.loads(input_data_json)['data']).astype('float32')\n",
|
" return np.array(json.loads(input_data_json)['data']).astype('float32')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def postprocess(result):\n",
|
"def postprocess(result):\n",
|
||||||
" return np.array(result).tolist()\n",
|
" return np.array(result).tolist()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def run(input_data_json):\n",
|
"def run(input_data_json):\n",
|
||||||
" try:\n",
|
" try:\n",
|
||||||
" start = time.time() # start timer\n",
|
" start = time.time() # start timer\n",
|
||||||
" input_data = preprocess(input_data_json)\n",
|
" input_data = preprocess(input_data_json)\n",
|
||||||
" input_name = session.get_inputs()[0].name # get the id of the first input of the model \n",
|
" input_name = session.get_inputs()[0].name # get the id of the first input of the model \n",
|
||||||
" result = session.run([], {input_name: input_data})\n",
|
" result = session.run([], {input_name: input_data})\n",
|
||||||
" end = time.time() # stop timer\n",
|
" end = time.time() # stop timer\n",
|
||||||
" return {\"result\": postprocess(result),\n",
|
" return {\"result\": postprocess(result),\n",
|
||||||
" \"time\": end - start}\n",
|
" \"time\": end - start}\n",
|
||||||
" except Exception as e:\n",
|
" except Exception as e:\n",
|
||||||
" result = str(e)\n",
|
" result = str(e)\n",
|
||||||
" return {\"error\": result}"
|
" return {\"error\": result}"
|
||||||
]
|
]
|
||||||
},
|
},
|
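Before building the container, you can smoke-test the same preprocess/run logic locally. A hedged sketch, assuming tinyyolov2.onnx is in the working directory (outside the service there is no AML model store, so the session is created straight from the local file) and that TinyYOLOv2 takes a 1x3x416x416 float32 input:

```python
import json

import numpy as np
import onnxruntime

# Create the session from the local file instead of Model.get_model_path().
session = onnxruntime.InferenceSession("tinyyolov2.onnx")

# Fake a request payload with the assumed 1x3x416x416 input shape.
payload = json.dumps({"data": np.random.rand(1, 3, 416, 416).astype("float32").tolist()})

# Mirror the service's preprocess() and run() steps.
input_data = np.array(json.loads(payload)["data"]).astype("float32")
input_name = session.get_inputs()[0].name
result = session.run([], {input_name: input_data})
print([r.shape for r in result])
```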
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create container image\n",
|
"### Create container image\n",
|
||||||
"First we create a YAML file that specifies which dependencies we would like to see in our container."
|
"First we create a YAML file that specifies which dependencies we would like to see in our container."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
|
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Then we have Azure ML create the container. This step will likely take a few minutes."
|
"Then we have Azure ML create the container. This step will likely take a few minutes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.image import ContainerImage\n",
|
"from azureml.core.image import ContainerImage\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
||||||
" runtime = \"python\",\n",
|
" runtime = \"python\",\n",
|
||||||
" conda_file = \"myenv.yml\",\n",
|
" conda_file = \"myenv.yml\",\n",
|
||||||
" description = \"TinyYOLO ONNX Demo\",\n",
|
" description = \"TinyYOLO ONNX Demo\",\n",
|
||||||
" tags = {\"demo\": \"onnx\"}\n",
|
" tags = {\"demo\": \"onnx\"}\n",
|
||||||
" )\n",
|
" )\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image = ContainerImage.create(name = \"onnxyolo\",\n",
|
"image = ContainerImage.create(name = \"onnxyolo\",\n",
|
||||||
" models = [model],\n",
|
" models = [model],\n",
|
||||||
" image_config = image_config,\n",
|
" image_config = image_config,\n",
|
||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image.wait_for_creation(show_output = True)"
|
"image.wait_for_creation(show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"In case you need to debug your code, the next line of code accesses the log file."
|
"In case you need to debug your code, the next line of code accesses the log file."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"print(image.image_build_log_uri)"
|
"print(image.image_build_log_uri)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We're all set! Let's get our model chugging.\n",
|
"We're all set! Let's get our model chugging.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### Deploy the container image"
|
"### Deploy the container image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
" memory_gb = 1, \n",
|
" memory_gb = 1, \n",
|
||||||
" tags = {'demo': 'onnx'}, \n",
|
" tags = {'demo': 'onnx'}, \n",
|
||||||
" description = 'web service for TinyYOLO ONNX model')"
|
" description = 'web service for TinyYOLO ONNX model')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The following cell will likely take a few minutes to run as well."
|
"The following cell will likely take a few minutes to run as well."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"from random import randint\n",
|
"from random import randint\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service_name = 'onnx-tinyyolo'+str(randint(0,100))\n",
|
"aci_service_name = 'onnx-tinyyolo'+str(randint(0,100))\n",
|
||||||
"print(\"Service\", aci_service_name)\n",
|
"print(\"Service\", aci_service_name)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
" image = image,\n",
|
" image = image,\n",
|
||||||
" name = aci_service_name,\n",
|
" name = aci_service_name,\n",
|
||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
"print(aci_service.state)"
|
"print(aci_service.state)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again."
|
"In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"if aci_service.state != 'Healthy':\n",
|
"if aci_service.state != 'Healthy':\n",
|
||||||
" # run this command for debugging.\n",
|
" # run this command for debugging.\n",
|
||||||
" print(aci_service.get_logs())\n",
|
" print(aci_service.get_logs())\n",
|
||||||
" aci_service.delete()"
|
" aci_service.delete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Success!\n",
|
"## Success!\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you've made it this far, you've deployed a working web service that does object detection using an ONNX model. You can get the URL for the webservice with the code below."
|
"If you've made it this far, you've deployed a working web service that does object detection using an ONNX model. You can get the URL for the webservice with the code below."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"print(aci_service.scoring_uri)"
|
"print(aci_service.scoring_uri)"
|
||||||
]
|
]
|
||||||
},
|
},
|
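With the scoring URI in hand, you can call the service over HTTP. A hedged sketch, assuming no key auth (the ACI default) and the 1x3x416x416 input shape assumed earlier:

```python
import json

import numpy as np
import requests

# Random tensor standing in for a preprocessed 416x416 RGB image (an assumption
# about TinyYOLOv2's expected input shape).
payload = json.dumps({"data": np.random.rand(1, 3, 416, 416).astype("float32").tolist()})

response = requests.post(aci_service.scoring_uri,
                         data=payload,
                         headers={"Content-Type": "application/json"})
print(response.json())
```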
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"When you are eventually done using the web service, remember to delete it."
|
"When you are eventually done using the web service, remember to delete it."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#aci_service.delete()"
|
"#aci_service.delete()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "onnx"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "onnx"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.5.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.5.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,419 +1,419 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# ResNet50 Image Classification using ONNX and AzureML\n",
|
"# ResNet50 Image Classification using ONNX and AzureML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This example shows how to deploy the ResNet50 ONNX model as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
|
"This example shows how to deploy the ResNet50 ONNX model as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## What is ONNX\n",
|
"## What is ONNX\n",
|
||||||
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
|
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## ResNet50 Details\n",
|
"## ResNet50 Details\n",
|
||||||
"ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. For more information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/models/image_classification/resnet). "
|
"ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. For more information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/models/image_classification/resnet). "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites\n",
|
"## Prerequisites\n",
|
||||||
"\n",
|
"\n",
|
||||||
"To make the best use of your time, make sure you have done the following:\n",
|
"To make the best use of your time, make sure you have done the following:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
|
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
|
||||||
"* Go through the [00.configuration.ipynb](../00.configuration.ipynb) notebook to:\n",
|
"* Go through the [configuration notebook](../../../configuration.ipynb) to:\n",
|
||||||
" * install the AML SDK\n",
|
" * install the AML SDK\n",
|
||||||
" * create a workspace and its configuration file (config.json)"
|
" * create a workspace and its configuration file (config.json)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Download pre-trained ONNX model from ONNX Model Zoo.\n",
|
"#### Download pre-trained ONNX model from ONNX Model Zoo.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Download the [ResNet50v2 model and test data](https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz) and extract it in the same folder as this tutorial notebook.\n"
|
"Download the [ResNet50v2 model and test data](https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz) and extract it in the same folder as this tutorial notebook.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import urllib.request\n",
|
"import urllib.request\n",
|
||||||
"\n",
|
"\n",
|
||||||
"onnx_model_url = \"https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz\"\n",
|
"onnx_model_url = \"https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz\"\n",
|
||||||
"urllib.request.urlretrieve(onnx_model_url, filename=\"resnet50v2.tar.gz\")\n",
|
"urllib.request.urlretrieve(onnx_model_url, filename=\"resnet50v2.tar.gz\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"!tar xvzf resnet50v2.tar.gz"
|
"!tar xvzf resnet50v2.tar.gz"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Deploying as a web service with Azure ML"
|
"## Deploying as a web service with Azure ML"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Load your Azure ML workspace\n",
|
"### Load your Azure ML workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
|
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.location, ws.resource_group, sep = '\\n')"
|
"print(ws.name, ws.location, ws.resource_group, sep = '\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Register your model with Azure ML\n",
|
"### Register your model with Azure ML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Now we upload the model and register it in the workspace."
|
"Now we upload the model and register it in the workspace."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"model = Model.register(model_path = \"resnet50v2/resnet50v2.onnx\",\n",
|
"model = Model.register(model_path = \"resnet50v2/resnet50v2.onnx\",\n",
|
||||||
" model_name = \"resnet50v2\",\n",
|
" model_name = \"resnet50v2\",\n",
|
||||||
" tags = {\"onnx\": \"demo\"},\n",
|
" tags = {\"onnx\": \"demo\"},\n",
|
||||||
" description = \"ResNet50v2 from ONNX Model Zoo\",\n",
|
" description = \"ResNet50v2 from ONNX Model Zoo\",\n",
|
||||||
" workspace = ws)"
|
" workspace = ws)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Displaying your registered models\n",
|
"#### Displaying your registered models\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can optionally list out all the models that you have registered in this workspace."
|
"You can optionally list out all the models that you have registered in this workspace."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"models = ws.models\n",
|
"models = ws.models\n",
|
||||||
"for name, m in models.items():\n",
|
"for name, m in models.items():\n",
|
||||||
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
|
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Write scoring file\n",
|
"### Write scoring file\n",
|
||||||
"\n",
|
"\n",
|
||||||
"We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object."
|
"We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started so we load the model using the ONNX Runtime into a global session object."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile score.py\n",
|
"%%writefile score.py\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"import time\n",
|
"import time\n",
|
||||||
"import sys\n",
|
"import sys\n",
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"import numpy as np # we're going to use numpy to process input and output data\n",
|
"import numpy as np # we're going to use numpy to process input and output data\n",
|
||||||
"import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n",
|
"import onnxruntime # to inference ONNX models, we use the ONNX Runtime\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def softmax(x):\n",
|
"def softmax(x):\n",
|
||||||
" x = x.reshape(-1)\n",
|
" x = x.reshape(-1)\n",
|
||||||
" e_x = np.exp(x - np.max(x))\n",
|
" e_x = np.exp(x - np.max(x))\n",
|
||||||
" return e_x / e_x.sum(axis=0)\n",
|
" return e_x / e_x.sum(axis=0)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def init():\n",
|
"def init():\n",
|
||||||
" global session\n",
|
" global session\n",
|
||||||
" model = Model.get_model_path(model_name = 'resnet50v2')\n",
|
" model = Model.get_model_path(model_name = 'resnet50v2')\n",
|
||||||
" session = onnxruntime.InferenceSession(model, None)\n",
|
" session = onnxruntime.InferenceSession(model, None)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def preprocess(input_data_json):\n",
|
"def preprocess(input_data_json):\n",
|
||||||
" # convert the JSON data into the tensor input\n",
|
" # convert the JSON data into the tensor input\n",
|
||||||
" img_data = np.array(json.loads(input_data_json)['data']).astype('float32')\n",
|
" img_data = np.array(json.loads(input_data_json)['data']).astype('float32')\n",
|
||||||
" \n",
|
" \n",
|
||||||
" #normalize\n",
|
" #normalize\n",
|
||||||
" mean_vec = np.array([0.485, 0.456, 0.406])\n",
|
" mean_vec = np.array([0.485, 0.456, 0.406])\n",
|
||||||
" stddev_vec = np.array([0.229, 0.224, 0.225])\n",
|
" stddev_vec = np.array([0.229, 0.224, 0.225])\n",
|
||||||
" norm_img_data = np.zeros(img_data.shape).astype('float32')\n",
|
" norm_img_data = np.zeros(img_data.shape).astype('float32')\n",
|
||||||
" for i in range(img_data.shape[0]):\n",
|
" for i in range(img_data.shape[0]):\n",
|
||||||
" norm_img_data[i,:,:] = (img_data[i,:,:]/255 - mean_vec[i]) / stddev_vec[i]\n",
|
" norm_img_data[i,:,:] = (img_data[i,:,:]/255 - mean_vec[i]) / stddev_vec[i]\n",
|
||||||
"\n",
|
"\n",
|
||||||
" return norm_img_data\n",
|
" return norm_img_data\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def postprocess(result):\n",
|
"def postprocess(result):\n",
|
||||||
" return softmax(np.array(result)).tolist()\n",
|
" return softmax(np.array(result)).tolist()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def run(input_data_json):\n",
|
"def run(input_data_json):\n",
|
||||||
" try:\n",
|
" try:\n",
|
||||||
" start = time.time()\n",
|
" start = time.time()\n",
|
||||||
" # load in our data which is expected as NCHW 224x224 image\n",
|
" # load in our data which is expected as NCHW 224x224 image\n",
|
||||||
" input_data = preprocess(input_data_json)\n",
|
" input_data = preprocess(input_data_json)\n",
|
||||||
" input_name = session.get_inputs()[0].name # get the id of the first input of the model \n",
|
" input_name = session.get_inputs()[0].name # get the id of the first input of the model \n",
|
||||||
" result = session.run([], {input_name: input_data})\n",
|
" result = session.run([], {input_name: input_data})\n",
|
||||||
" end = time.time() # stop timer\n",
|
" end = time.time() # stop timer\n",
|
||||||
" return {\"result\": postprocess(result),\n",
|
" return {\"result\": postprocess(result),\n",
|
||||||
" \"time\": end - start}\n",
|
" \"time\": end - start}\n",
|
||||||
" except Exception as e:\n",
|
" except Exception as e:\n",
|
||||||
" result = str(e)\n",
|
" result = str(e)\n",
|
||||||
" return {\"error\": result}"
|
" return {\"error\": result}"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
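{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before building the image, you can sanity-check the post-processing logic locally with plain numpy. The quick sketch below re-implements the same softmax as score.py and confirms the output sums to 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Local sanity check of the softmax used in score.py.\n",
"import numpy as np\n",
"\n",
"def softmax(x):\n",
"    x = x.reshape(-1)\n",
"    e_x = np.exp(x - np.max(x))\n",
"    return e_x / e_x.sum(axis=0)\n",
"\n",
"scores = np.random.rand(1, 1000).astype('float32')\n",
"probs = softmax(scores)\n",
"print(probs.shape, probs.sum())  # expect (1000,) and ~1.0"
]
},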
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create container image"
|
"### Create container image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"First we create a YAML file that specifies which dependencies we would like to see in our container."
|
"First we create a YAML file that specifies which dependencies we would like to see in our container."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
|
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Then we have Azure ML create the container. This step will likely take a few minutes."
|
"Then we have Azure ML create the container. This step will likely take a few minutes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.image import ContainerImage\n",
|
"from azureml.core.image import ContainerImage\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
||||||
" runtime = \"python\",\n",
|
" runtime = \"python\",\n",
|
||||||
" conda_file = \"myenv.yml\",\n",
|
" conda_file = \"myenv.yml\",\n",
|
||||||
" description = \"ONNX ResNet50 Demo\",\n",
|
" description = \"ONNX ResNet50 Demo\",\n",
|
||||||
" tags = {\"demo\": \"onnx\"}\n",
|
" tags = {\"demo\": \"onnx\"}\n",
|
||||||
" )\n",
|
" )\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image = ContainerImage.create(name = \"onnxresnet50v2\",\n",
|
"image = ContainerImage.create(name = \"onnxresnet50v2\",\n",
|
||||||
" models = [model],\n",
|
" models = [model],\n",
|
||||||
" image_config = image_config,\n",
|
" image_config = image_config,\n",
|
||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image.wait_for_creation(show_output = True)"
|
"image.wait_for_creation(show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"In case you need to debug your code, the next line of code accesses the log file."
|
"In case you need to debug your code, the next line of code accesses the log file."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"print(image.image_build_log_uri)"
|
"print(image.image_build_log_uri)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We're all set! Let's get our model chugging.\n",
|
"We're all set! Let's get our model chugging.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### Deploy the container image"
|
"### Deploy the container image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
" memory_gb = 1, \n",
|
" memory_gb = 1, \n",
|
||||||
" tags = {'demo': 'onnx'}, \n",
|
" tags = {'demo': 'onnx'}, \n",
|
||||||
" description = 'web service for ResNet50 ONNX model')"
|
" description = 'web service for ResNet50 ONNX model')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The following cell will likely take a few minutes to run as well."
|
"The following cell will likely take a few minutes to run as well."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"from random import randint\n",
|
"from random import randint\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n",
|
"aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n",
|
||||||
"print(\"Service\", aci_service_name)\n",
|
"print(\"Service\", aci_service_name)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
" image = image,\n",
|
" image = image,\n",
|
||||||
" name = aci_service_name,\n",
|
" name = aci_service_name,\n",
|
||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
"print(aci_service.state)"
|
"print(aci_service.state)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again."
|
"In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"if aci_service.state != 'Healthy':\n",
|
"if aci_service.state != 'Healthy':\n",
|
||||||
" # run this command for debugging.\n",
|
" # run this command for debugging.\n",
|
||||||
" print(aci_service.get_logs())\n",
|
" print(aci_service.get_logs())\n",
|
||||||
" aci_service.delete()"
|
" aci_service.delete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Success!\n",
|
"## Success!\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you've made it this far, you've deployed a working web service that does image classification using an ONNX model. You can get the URL for the webservice with the code below."
|
"If you've made it this far, you've deployed a working web service that does image classification using an ONNX model. You can get the URL for the webservice with the code below."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"print(aci_service.scoring_uri)"
|
"print(aci_service.scoring_uri)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
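{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick smoke test, you can send a random tensor shaped like the preprocessed input score.py expects (`3 x 224 x 224` under the `data` key). A random input only exercises the plumbing; use a real image for meaningful predictions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: send a random 3x224x224 'image' through the deployed service.\n",
"import json\n",
"import numpy as np\n",
"\n",
"sample = np.random.rand(3, 224, 224).astype('float32')\n",
"payload = json.dumps({'data': sample.tolist()})\n",
"result = aci_service.run(input_data=payload)\n",
"print(result)"
]
},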
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"When you are eventually done using the web service, remember to delete it."
|
"When you are eventually done using the web service, remember to delete it."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#aci_service.delete()"
|
"#aci_service.delete()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "onnx"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "onnx"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.5.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.5.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,343 +1,343 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Deploying a web service to Azure Kubernetes Service (AKS)\n",
|
"# Deploying a web service to Azure Kubernetes Service (AKS)\n",
|
||||||
"This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (one time action), and deploying a service to it. \n",
|
"This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (one time action), and deploying a service to it. \n",
|
||||||
"We then test and delete the service, image and model."
|
"We then test and delete the service, image and model."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
||||||
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
||||||
"from azureml.core.image import Image\n",
|
"from azureml.core.image import Image\n",
|
||||||
"from azureml.core.model import Model"
|
"from azureml.core.model import Model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"print(azureml.core.VERSION)"
|
"print(azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Get workspace\n",
|
"# Get workspace\n",
|
||||||
"Load existing workspace from the config file info."
|
"Load existing workspace from the config file info."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Register the model\n",
|
"# Register the model\n",
|
||||||
"Register an existing trained model, add descirption and tags."
|
"Register an existing trained model, add descirption and tags."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#Register the model\n",
|
"#Register the model\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
|
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
|
||||||
" model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n",
|
" model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n",
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||||
" description = \"Ridge regression model to predict diabetes\",\n",
|
" description = \"Ridge regression model to predict diabetes\",\n",
|
||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(model.name, model.description, model.version)"
|
"print(model.name, model.description, model.version)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Create an image\n",
|
"# Create an image\n",
|
||||||
"Create an image using the registered model the script that will load and run the model."
|
"Create an image using the registered model the script that will load and run the model."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile score.py\n",
|
"%%writefile score.py\n",
|
||||||
"import pickle\n",
|
"import pickle\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"import numpy\n",
|
"import numpy\n",
|
||||||
"from sklearn.externals import joblib\n",
|
"from sklearn.externals import joblib\n",
|
||||||
"from sklearn.linear_model import Ridge\n",
|
"from sklearn.linear_model import Ridge\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def init():\n",
|
"def init():\n",
|
||||||
" global model\n",
|
" global model\n",
|
||||||
" # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n",
|
" # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n",
|
||||||
" # this is a different behavior than before when the code is run locally, even though the code is the same.\n",
|
" # this is a different behavior than before when the code is run locally, even though the code is the same.\n",
|
||||||
" model_path = Model.get_model_path('sklearn_regression_model.pkl')\n",
|
" model_path = Model.get_model_path('sklearn_regression_model.pkl')\n",
|
||||||
" # deserialize the model file back into a sklearn model\n",
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
" model = joblib.load(model_path)\n",
|
" model = joblib.load(model_path)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# note you can pass in multiple rows for scoring\n",
|
"# note you can pass in multiple rows for scoring\n",
|
||||||
"def run(raw_data):\n",
|
"def run(raw_data):\n",
|
||||||
" try:\n",
|
" try:\n",
|
||||||
" data = json.loads(raw_data)['data']\n",
|
" data = json.loads(raw_data)['data']\n",
|
||||||
" data = numpy.array(data)\n",
|
" data = numpy.array(data)\n",
|
||||||
" result = model.predict(data)\n",
|
" result = model.predict(data)\n",
|
||||||
" # you can return any data type as long as it is JSON-serializable\n",
|
" # you can return any data type as long as it is JSON-serializable\n",
|
||||||
" return result.tolist()\n",
|
" return result.tolist()\n",
|
||||||
" except Exception as e:\n",
|
" except Exception as e:\n",
|
||||||
" error = str(e)\n",
|
" error = str(e)\n",
|
||||||
" return error"
|
" return error"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.image import ContainerImage\n",
|
"from azureml.core.image import ContainerImage\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
||||||
" runtime = \"python\",\n",
|
" runtime = \"python\",\n",
|
||||||
" conda_file = \"myenv.yml\",\n",
|
" conda_file = \"myenv.yml\",\n",
|
||||||
" description = \"Image with ridge regression model\",\n",
|
" description = \"Image with ridge regression model\",\n",
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
|
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
|
||||||
" )\n",
|
" )\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image = ContainerImage.create(name = \"myimage1\",\n",
|
"image = ContainerImage.create(name = \"myimage1\",\n",
|
||||||
" # this is the model object\n",
|
" # this is the model object\n",
|
||||||
" models = [model],\n",
|
" models = [model],\n",
|
||||||
" image_config = image_config,\n",
|
" image_config = image_config,\n",
|
||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image.wait_for_creation(show_output = True)"
|
"image.wait_for_creation(show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Provision the AKS Cluster\n",
|
"# Provision the AKS Cluster\n",
|
||||||
"This is a one time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, then you would have to recreate it."
|
"This is a one time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, then you would have to recreate it."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Use the default configuration (can also provide parameters to customize)\n",
|
"# Use the default configuration (can also provide parameters to customize)\n",
|
||||||
"prov_config = AksCompute.provisioning_configuration()\n",
|
"prov_config = AksCompute.provisioning_configuration()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aks_name = 'my-aks-9' \n",
|
"aks_name = 'my-aks-9' \n",
|
||||||
"# Create the cluster\n",
|
"# Create the cluster\n",
|
||||||
"aks_target = ComputeTarget.create(workspace = ws, \n",
|
"aks_target = ComputeTarget.create(workspace = ws, \n",
|
||||||
" name = aks_name, \n",
|
" name = aks_name, \n",
|
||||||
" provisioning_configuration = prov_config)"
|
" provisioning_configuration = prov_config)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
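{
"cell_type": "markdown",
"metadata": {},
"source": [
"As the comment above notes, provisioning can be customized. The sketch below shows sizing parameters such as `agent_count` and `vm_size`; the values here are illustrative, not a recommendation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a customized provisioning configuration (illustrative values;\n",
"# parameter names per the AksCompute API at the time of writing).\n",
"custom_prov_config = AksCompute.provisioning_configuration(agent_count=3,\n",
"                                                           vm_size='Standard_D3_v2')"
]
},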
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%time\n",
|
"%%time\n",
|
||||||
"aks_target.wait_for_completion(show_output = True)\n",
|
"aks_target.wait_for_completion(show_output = True)\n",
|
||||||
"print(aks_target.provisioning_state)\n",
|
"print(aks_target.provisioning_state)\n",
|
||||||
"print(aks_target.provisioning_errors)"
|
"print(aks_target.provisioning_errors)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Optional step: Attach existing AKS cluster\n",
|
"## Optional step: Attach existing AKS cluster\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you have existing AKS cluster in your Azure subscription, you can attach it to the Workspace."
|
"If you have existing AKS cluster in your Azure subscription, you can attach it to the Workspace."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"'''\n",
|
"'''\n",
|
||||||
"# Use the default configuration (can also provide parameters to customize)\n",
|
"# Use the default configuration (can also provide parameters to customize)\n",
|
||||||
"resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n",
|
"resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"create_name='my-existing-aks' \n",
|
"create_name='my-existing-aks' \n",
|
||||||
"# Create the cluster\n",
|
"# Create the cluster\n",
|
||||||
"attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
|
"attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
|
||||||
"aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n",
|
"aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n",
|
||||||
"# Wait for the operation to complete\n",
|
"# Wait for the operation to complete\n",
|
||||||
"aks_target.wait_for_completion(True)\n",
|
"aks_target.wait_for_completion(True)\n",
|
||||||
"'''"
|
"'''"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Deploy web service to AKS"
|
"# Deploy web service to AKS"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#Set the web service configuration (using default here)\n",
|
"#Set the web service configuration (using default here)\n",
|
||||||
"aks_config = AksWebservice.deploy_configuration()"
|
"aks_config = AksWebservice.deploy_configuration()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
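{
"cell_type": "markdown",
"metadata": {},
"source": [
"The default configuration is enough for this example. The sketch below shows the kind of settings you can tune instead, such as autoscaling bounds and resource limits; it uses a separate variable so the default configuration above is not overridden."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a customized deployment configuration with autoscaling\n",
"# (illustrative values; parameter names per the AksWebservice API).\n",
"custom_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True,\n",
"                                                       autoscale_min_replicas=1,\n",
"                                                       autoscale_max_replicas=4,\n",
"                                                       cpu_cores=1,\n",
"                                                       memory_gb=2)"
]
},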
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%time\n",
|
"%%time\n",
|
||||||
"aks_service_name ='aks-service-1'\n",
|
"aks_service_name ='aks-service-1'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aks_service = Webservice.deploy_from_image(workspace = ws, \n",
|
"aks_service = Webservice.deploy_from_image(workspace = ws, \n",
|
||||||
" name = aks_service_name,\n",
|
" name = aks_service_name,\n",
|
||||||
" image = image,\n",
|
" image = image,\n",
|
||||||
" deployment_config = aks_config,\n",
|
" deployment_config = aks_config,\n",
|
||||||
" deployment_target = aks_target)\n",
|
" deployment_target = aks_target)\n",
|
||||||
"aks_service.wait_for_deployment(show_output = True)\n",
|
"aks_service.wait_for_deployment(show_output = True)\n",
|
||||||
"print(aks_service.state)"
|
"print(aks_service.state)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Test the web service\n",
|
"# Test the web service\n",
|
||||||
"We test the web sevice by passing data."
|
"We test the web sevice by passing data."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%time\n",
|
"%%time\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"\n",
|
"\n",
|
||||||
"test_sample = json.dumps({'data': [\n",
|
"test_sample = json.dumps({'data': [\n",
|
||||||
" [1,2,3,4,5,6,7,8,9,10], \n",
|
" [1,2,3,4,5,6,7,8,9,10], \n",
|
||||||
" [10,9,8,7,6,5,4,3,2,1]\n",
|
" [10,9,8,7,6,5,4,3,2,1]\n",
|
||||||
"]})\n",
|
"]})\n",
|
||||||
"test_sample = bytes(test_sample,encoding = 'utf8')\n",
|
"test_sample = bytes(test_sample,encoding = 'utf8')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"prediction = aks_service.run(input_data = test_sample)\n",
|
"prediction = aks_service.run(input_data = test_sample)\n",
|
||||||
"print(prediction)"
|
"print(prediction)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
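{
"cell_type": "markdown",
"metadata": {},
"source": [
"Behind the scenes, `run` posts to the scoring endpoint. The sketch below calls the AKS service over raw HTTP with key authentication instead; it assumes the `requests` package is available and that key auth is enabled, which is the default for AKS web services."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: call the AKS endpoint directly over HTTP with key auth.\n",
"import requests\n",
"\n",
"primary_key, secondary_key = aks_service.get_keys()\n",
"headers = {'Content-Type': 'application/json',\n",
"           'Authorization': 'Bearer ' + primary_key}\n",
"resp = requests.post(aks_service.scoring_uri, data=test_sample, headers=headers)\n",
"print(resp.status_code, resp.text)"
]
},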
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Clean up\n",
|
"# Clean up\n",
|
||||||
"Delete the service, image and model."
|
"Delete the service, image and model."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%time\n",
|
"%%time\n",
|
||||||
"aks_service.delete()\n",
|
"aks_service.delete()\n",
|
||||||
"image.delete()\n",
|
"image.delete()\n",
|
||||||
"model.delete()"
|
"model.delete()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "raymondl"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "raymondl"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,420 +1,420 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## 10. Register Model, Create Image and Deploy Service\n",
|
"## Register Model, Create Image and Deploy Service\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This example shows how to deploy a web service in step-by-step fashion:\n",
|
"This example shows how to deploy a web service in step-by-step fashion:\n",
|
||||||
"\n",
|
"\n",
|
||||||
" 1. Register model\n",
|
" 1. Register model\n",
|
||||||
" 2. Query versions of models and select one to deploy\n",
|
" 2. Query versions of models and select one to deploy\n",
|
||||||
" 3. Create Docker image\n",
|
" 3. Create Docker image\n",
|
||||||
" 4. Query versions of images\n",
|
" 4. Query versions of images\n",
|
||||||
" 5. Deploy the image as web service\n",
|
" 5. Deploy the image as web service\n",
|
||||||
" \n",
|
" \n",
|
||||||
"**IMPORTANT**:\n",
|
"**IMPORTANT**:\n",
|
||||||
" * This notebook requires you to first complete \"01.SDK-101-Train-and-Deploy-to-ACI.ipynb\" Notebook\n",
|
" * This notebook requires you to first complete [train-within-notebook](../../training/train-within-notebook/train-within-notebook.ipynb) example\n",
|
||||||
" \n",
|
" \n",
|
||||||
"The 101 Notebook taught you how to deploy a web service directly from model in one step. This Notebook shows a more advanced approach that gives you more control over model versions and Docker image versions. "
|
"The train-within-notebook example taught you how to deploy a web service directly from model in one step. This Notebook shows a more advanced approach that gives you more control over model versions and Docker image versions. "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites\n",
|
"## Prerequisites\n",
|
||||||
"Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't."
|
"Make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Initialize Workspace\n",
|
"## Initialize Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Initialize a workspace object from persisted configuration."
|
"Initialize a workspace object from persisted configuration."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create workspace"
|
"create workspace"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Register Model"
|
"### Register Model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can add tags and descriptions to your models. Note you need to have a `sklearn_linreg_model.pkl` file in the current directory. This file is generated by the 01 notebook. The below call registers that file as a model with the same name `sklearn_linreg_model.pkl` in the workspace.\n",
|
"You can add tags and descriptions to your models. Note you need to have a `sklearn_linreg_model.pkl` file in the current directory. This file is generated by the 01 notebook. The below call registers that file as a model with the same name `sklearn_linreg_model.pkl` in the workspace.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric."
|
"Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"register model from file"
|
"register model from file"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"import sklearn\n",
|
"import sklearn\n",
|
||||||
"\n",
|
"\n",
|
||||||
"library_version = \"sklearn\"+sklearn.__version__.replace(\".\",\"x\")\n",
|
"library_version = \"sklearn\"+sklearn.__version__.replace(\".\",\"x\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n",
|
"model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n",
|
||||||
" model_name = \"sklearn_regression_model.pkl\",\n",
|
" model_name = \"sklearn_regression_model.pkl\",\n",
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\", 'version': library_version},\n",
|
" tags = {'area': \"diabetes\", 'type': \"regression\", 'version': library_version},\n",
|
||||||
" description = \"Ridge regression model to predict diabetes\",\n",
|
" description = \"Ridge regression model to predict diabetes\",\n",
|
||||||
" workspace = ws)"
|
" workspace = ws)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can explore the registered models within your workspace and query by tag. Models are versioned. If you call the register_model command many times with same model name, you will get multiple versions of the model with increasing version numbers."
|
"You can explore the registered models within your workspace and query by tag. Models are versioned. If you call the register_model command many times with same model name, you will get multiple versions of the model with increasing version numbers."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"register model from file"
|
"register model from file"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
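"# list only the models tagged with 'area'\n",
|
"# list only the models tagged with 'area'\n",
|
||||||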
"regression_models = Model.list(workspace=ws, tags=['area'])\n",
|
"regression_models = Model.list(workspace=ws, tags=['area'])\n",
|
||||||
"for m in regression_models:\n",
|
"for m in regression_models:\n",
|
||||||
" print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
|
" print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can pick a specific model to deploy"
|
"You can pick a specific model to deploy"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"print(model.name, model.description, model.version, sep = '\\t')"
|
"print(model.name, model.description, model.version, sep = '\\t')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create Docker Image"
|
"### Create Docker Image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Show `score.py`. Note that the `sklearn_regression_model.pkl` in the `get_model_path` call is referring to a model named `sklearn_linreg_model.pkl` registered under the workspace. It is NOT referenceing the local file."
|
"Show `score.py`. Note that the `sklearn_regression_model.pkl` in the `get_model_path` call is referring to a model named `sklearn_linreg_model.pkl` registered under the workspace. It is NOT referenceing the local file."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile score.py\n",
|
"%%writefile score.py\n",
|
||||||
"import pickle\n",
|
"import pickle\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"import numpy\n",
|
"import numpy\n",
|
||||||
"from sklearn.externals import joblib\n",
|
"from sklearn.externals import joblib\n",
|
||||||
"from sklearn.linear_model import Ridge\n",
|
"from sklearn.linear_model import Ridge\n",
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def init():\n",
|
"def init():\n",
|
||||||
" global model\n",
|
" global model\n",
|
||||||
" # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n",
|
" # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n",
|
||||||
" # this is a different behavior than before when the code is run locally, even though the code is the same.\n",
|
" # this is a different behavior than before when the code is run locally, even though the code is the same.\n",
|
||||||
" model_path = Model.get_model_path('sklearn_regression_model.pkl')\n",
|
" model_path = Model.get_model_path('sklearn_regression_model.pkl')\n",
|
||||||
" # deserialize the model file back into a sklearn model\n",
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
" model = joblib.load(model_path)\n",
|
" model = joblib.load(model_path)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# note you can pass in multiple rows for scoring\n",
|
"# note you can pass in multiple rows for scoring\n",
|
||||||
"def run(raw_data):\n",
|
"def run(raw_data):\n",
|
||||||
" try:\n",
|
" try:\n",
|
||||||
" data = json.loads(raw_data)['data']\n",
|
" data = json.loads(raw_data)['data']\n",
|
||||||
" data = numpy.array(data)\n",
|
" data = numpy.array(data)\n",
|
||||||
" result = model.predict(data)\n",
|
" result = model.predict(data)\n",
|
||||||
" # you can return any datatype as long as it is JSON-serializable\n",
|
" # you can return any datatype as long as it is JSON-serializable\n",
|
||||||
" return result.tolist()\n",
|
" return result.tolist()\n",
|
||||||
" except Exception as e:\n",
|
" except Exception as e:\n",
|
||||||
" error = str(e)\n",
|
" error = str(e)\n",
|
||||||
" return error"
|
" return error"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
"from azureml.core.conda_dependencies import CondaDependencies \n",
|
||||||
"\n",
|
"\n",
|
||||||
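"# the image needs numpy and scikit-learn at runtime, since score.py imports them\n",
|
"# the image needs numpy and scikit-learn at runtime, since score.py imports them\n",
|
||||||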
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
" f.write(myenv.serialize_to_string())"
|
" f.write(myenv.serialize_to_string())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Note that following command can take few minutes. \n",
|
"Note that following command can take few minutes. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can add tags and descriptions to images. Also, an image can contain multiple models."
|
"You can add tags and descriptions to images. Also, an image can contain multiple models."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create image"
|
"create image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.image import Image, ContainerImage\n",
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
"\n",
|
"\n",
|
||||||
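"# the image bundles the scoring script and the conda environment definition\n",
|
"# the image bundles the scoring script and the conda environment definition\n",
|
||||||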
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||||
" execution_script=\"score.py\",\n",
|
" execution_script=\"score.py\",\n",
|
||||||
" conda_file=\"myenv.yml\",\n",
|
" conda_file=\"myenv.yml\",\n",
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||||
" description = \"Image with ridge regression model\")\n",
|
" description = \"Image with ridge regression model\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image = Image.create(name = \"myimage1\",\n",
|
"image = Image.create(name = \"myimage1\",\n",
|
||||||
" # this is the model object \n",
|
" # this is the model object \n",
|
||||||
" models = [model],\n",
|
" models = [model],\n",
|
||||||
" image_config = image_config, \n",
|
" image_config = image_config, \n",
|
||||||
" workspace = ws)"
|
" workspace = ws)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create image"
|
"create image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"image.wait_for_creation(show_output = True)"
|
"image.wait_for_creation(show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"List images by tag and find out the detailed build log for debugging."
|
"List images by tag and find out the detailed build log for debugging."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create image"
|
"create image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"for i in Image.list(workspace = ws,tags = [\"area\"]):\n",
|
"for i in Image.list(workspace = ws,tags = [\"area\"]):\n",
|
||||||
" print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))"
|
" print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Deploy image as web service on Azure Container Instance\n",
|
"### Deploy image as web service on Azure Container Instance\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Note that the service creation can take few minutes."
|
"Note that the service creation can take few minutes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"deploy service",
|
"deploy service",
|
||||||
"aci"
|
"aci"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
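"# 1 CPU core and 1 GB of memory should be sufficient for this small model\n",
|
"# 1 CPU core and 1 GB of memory should be sufficient for this small model\n",
|
||||||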
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
" memory_gb = 1, \n",
|
" memory_gb = 1, \n",
|
||||||
" tags = {'area': \"diabetes\", 'type': \"regression\"}, \n",
|
" tags = {'area': \"diabetes\", 'type': \"regression\"}, \n",
|
||||||
" description = 'Predict diabetes using regression model')"
|
" description = 'Predict diabetes using regression model')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"deploy service",
|
"deploy service",
|
||||||
"aci"
|
"aci"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service_name = 'my-aci-service-2'\n",
|
"aci_service_name = 'my-aci-service-2'\n",
|
||||||
"print(aci_service_name)\n",
|
"print(aci_service_name)\n",
|
||||||
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
" image = image,\n",
|
" image = image,\n",
|
||||||
" name = aci_service_name,\n",
|
" name = aci_service_name,\n",
|
||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
"print(aci_service.state)"
|
"print(aci_service.state)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Test web service"
|
"### Test web service"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the web service with some dummy input data to get a prediction."
|
"Call the web service with some dummy input data to get a prediction."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"deploy service",
|
"deploy service",
|
||||||
"aci"
|
"aci"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"\n",
|
"\n",
|
||||||
"test_sample = json.dumps({'data': [\n",
|
"test_sample = json.dumps({'data': [\n",
|
||||||
" [1,2,3,4,5,6,7,8,9,10], \n",
|
" [1,2,3,4,5,6,7,8,9,10], \n",
|
||||||
" [10,9,8,7,6,5,4,3,2,1]\n",
|
" [10,9,8,7,6,5,4,3,2,1]\n",
|
||||||
"]})\n",
|
"]})\n",
|
||||||
"test_sample = bytes(test_sample,encoding = 'utf8')\n",
|
"test_sample = bytes(test_sample,encoding = 'utf8')\n",
|
||||||
"\n",
|
"\n",
|
||||||
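"# the service returns one prediction per input row\n",
|
"# the service returns one prediction per input row\n",
|
||||||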
"prediction = aci_service.run(input_data=test_sample)\n",
|
"prediction = aci_service.run(input_data=test_sample)\n",
|
||||||
"print(prediction)"
|
"print(prediction)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Delete ACI to clean up"
|
"### Delete ACI to clean up"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"deploy service",
|
"deploy service",
|
||||||
"aci"
|
"aci"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"aci_service.delete()"
|
"aci_service.delete()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "raymondl"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "raymondl"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
Binary file not shown.
@@ -1,332 +1,332 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Azure Machine Learning Pipeline with DataTranferStep\n",
|
"# Azure Machine Learning Pipeline with DataTranferStep\n",
|
||||||
"This notebook is used to demonstrate the use of DataTranferStep in Azure Machine Learning Pipeline.\n",
|
"This notebook is used to demonstrate the use of DataTranferStep in Azure Machine Learning Pipeline.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In certain cases, you will need to transfer data from one data location to another. For example, your data may be in Files storage and you may want to move it to Blob storage. Or, if your data is in an ADLS account and you want to make it available in the Blob storage. The built-in **DataTransferStep** class helps you transfer data in these situations.\n",
|
"In certain cases, you will need to transfer data from one data location to another. For example, your data may be in Files storage and you may want to move it to Blob storage. Or, if your data is in an ADLS account and you want to make it available in the Blob storage. The built-in **DataTransferStep** class helps you transfer data in these situations.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The below example shows how to move data in an ADLS account to Blob storage."
|
"The below example shows how to move data in an ADLS account to Blob storage."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Azure Machine Learning and Pipeline SDK-specific imports"
|
"## Azure Machine Learning and Pipeline SDK-specific imports"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"from azureml.core.compute import ComputeTarget, DatabricksCompute, DataFactoryCompute\n",
|
"from azureml.core.compute import ComputeTarget, DatabricksCompute, DataFactoryCompute\n",
|
||||||
"from azureml.exceptions import ComputeTargetException\n",
|
"from azureml.exceptions import ComputeTargetException\n",
|
||||||
"from azureml.core import Workspace, Run, Experiment\n",
|
"from azureml.core import Workspace, Run, Experiment\n",
|
||||||
"from azureml.pipeline.core import Pipeline, PipelineData\n",
|
"from azureml.pipeline.core import Pipeline, PipelineData\n",
|
||||||
"from azureml.pipeline.steps import AdlaStep\n",
|
"from azureml.pipeline.steps import AdlaStep\n",
|
||||||
"from azureml.core.datastore import Datastore\n",
|
"from azureml.core.datastore import Datastore\n",
|
||||||
"from azureml.data.data_reference import DataReference\n",
|
"from azureml.data.data_reference import DataReference\n",
|
||||||
"from azureml.data.sql_data_reference import SqlDataReference\n",
|
"from azureml.data.sql_data_reference import SqlDataReference\n",
|
||||||
"from azureml.core import attach_legacy_compute_target\n",
|
"from azureml.core import attach_legacy_compute_target\n",
|
||||||
"from azureml.data.stored_procedure_parameter import StoredProcedureParameter, StoredProcedureParameterType\n",
|
"from azureml.data.stored_procedure_parameter import StoredProcedureParameter, StoredProcedureParameterType\n",
|
||||||
"from azureml.pipeline.steps import DataTransferStep\n",
|
"from azureml.pipeline.steps import DataTransferStep\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Initialize Workspace\n",
|
"## Initialize Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n",
|
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you don't have a config.json file, please go through the configuration Notebook located here:\n",
|
"If you don't have a config.json file, please go through the configuration Notebook located here:\n",
|
||||||
"https://github.com/Azure/MachineLearningNotebooks. \n",
|
"https://github.com/Azure/MachineLearningNotebooks. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"This sets you up with a working config file that has information on your workspace, subscription id, etc. "
|
"This sets you up with a working config file that has information on your workspace, subscription id, etc. "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create workspace"
|
"create workspace"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Register Datastores\n",
|
"## Register Datastores\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In the code cell below, you will need to fill in the appropriate values for the workspace name, datastore name, subscription id, resource group, store name, tenant id, client id, and client secret that are associated with your ADLS datastore. \n",
|
"In the code cell below, you will need to fill in the appropriate values for the workspace name, datastore name, subscription id, resource group, store name, tenant id, client id, and client secret that are associated with your ADLS datastore. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"For background on registering your data store, consult this article:\n",
|
"For background on registering your data store, consult this article:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory"
|
"https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"workspace = ws.name\n",
|
"workspace = ws.name\n",
|
||||||
"datastore_name='MyAdlsDatastore'\n",
|
"datastore_name='MyAdlsDatastore'\n",
|
||||||
"subscription_id=os.getenv(\"ADL_SUBSCRIPTION_62\", \"<my-subscription-id>\") # subscription id of ADLS account\n",
|
"subscription_id=os.getenv(\"ADL_SUBSCRIPTION_62\", \"<my-subscription-id>\") # subscription id of ADLS account\n",
|
||||||
"resource_group=os.getenv(\"ADL_RESOURCE_GROUP_62\", \"<my-resource-group>\") # resource group of ADLS account\n",
|
"resource_group=os.getenv(\"ADL_RESOURCE_GROUP_62\", \"<my-resource-group>\") # resource group of ADLS account\n",
|
||||||
"store_name=os.getenv(\"ADL_STORENAME_62\", \"<my-datastore-name>\") # ADLS account name\n",
|
"store_name=os.getenv(\"ADL_STORENAME_62\", \"<my-datastore-name>\") # ADLS account name\n",
|
||||||
"tenant_id=os.getenv(\"ADL_TENANT_62\", \"<my-tenant-id>\") # tenant id of service principal\n",
|
"tenant_id=os.getenv(\"ADL_TENANT_62\", \"<my-tenant-id>\") # tenant id of service principal\n",
|
||||||
"client_id=os.getenv(\"ADL_CLIENTID_62\", \"<my-client-id>\") # client id of service principal\n",
|
"client_id=os.getenv(\"ADL_CLIENTID_62\", \"<my-client-id>\") # client id of service principal\n",
|
||||||
"client_secret=os.getenv(\"ADL_CLIENT_SECRET_62\", \"<my-client-secret>\") # the secret of service principal\n",
|
"client_secret=os.getenv(\"ADL_CLIENT_SECRET_62\", \"<my-client-secret>\") # the secret of service principal\n",
|
||||||
"\n",
|
"\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" adls_datastore = Datastore.get(ws, datastore_name)\n",
|
" adls_datastore = Datastore.get(ws, datastore_name)\n",
|
||||||
" print(\"found datastore with name: %s\" % datastore_name)\n",
|
" print(\"found datastore with name: %s\" % datastore_name)\n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
" adls_datastore = Datastore.register_azure_data_lake(\n",
|
" adls_datastore = Datastore.register_azure_data_lake(\n",
|
||||||
" workspace=ws,\n",
|
" workspace=ws,\n",
|
||||||
" datastore_name=datastore_name,\n",
|
" datastore_name=datastore_name,\n",
|
||||||
" subscription_id=subscription_id, # subscription id of ADLS account\n",
|
" subscription_id=subscription_id, # subscription id of ADLS account\n",
|
||||||
" resource_group=resource_group, # resource group of ADLS account\n",
|
" resource_group=resource_group, # resource group of ADLS account\n",
|
||||||
" store_name=store_name, # ADLS account name\n",
|
" store_name=store_name, # ADLS account name\n",
|
||||||
" tenant_id=tenant_id, # tenant id of service principal\n",
|
" tenant_id=tenant_id, # tenant id of service principal\n",
|
||||||
" client_id=client_id, # client id of service principal\n",
|
" client_id=client_id, # client id of service principal\n",
|
||||||
" client_secret=client_secret) # the secret of service principal\n",
|
" client_secret=client_secret) # the secret of service principal\n",
|
||||||
" print(\"registered datastore with name: %s\" % datastore_name)\n",
|
" print(\"registered datastore with name: %s\" % datastore_name)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"blob_datastore_name='MyBlobDatastore'\n",
|
"blob_datastore_name='MyBlobDatastore'\n",
|
||||||
"account_name=os.getenv(\"BLOB_ACCOUNTNAME_62\", \"<my-account-name>\") # Storage account name\n",
|
"account_name=os.getenv(\"BLOB_ACCOUNTNAME_62\", \"<my-account-name>\") # Storage account name\n",
|
||||||
"container_name=os.getenv(\"BLOB_CONTAINER_62\", \"<my-container-name>\") # Name of Azure blob container\n",
|
"container_name=os.getenv(\"BLOB_CONTAINER_62\", \"<my-container-name>\") # Name of Azure blob container\n",
|
||||||
"account_key=os.getenv(\"BLOB_ACCOUNT_KEY_62\", \"<my-account-key>\") # Storage account key\n",
|
"account_key=os.getenv(\"BLOB_ACCOUNT_KEY_62\", \"<my-account-key>\") # Storage account key\n",
|
||||||
"\n",
|
"\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" blob_datastore = Datastore.get(ws, blob_datastore_name)\n",
|
" blob_datastore = Datastore.get(ws, blob_datastore_name)\n",
|
||||||
" print(\"found blob datastore with name: %s\" % blob_datastore_name)\n",
|
" print(\"found blob datastore with name: %s\" % blob_datastore_name)\n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
" blob_datastore = Datastore.register_azure_blob_container(\n",
|
" blob_datastore = Datastore.register_azure_blob_container(\n",
|
||||||
" workspace=ws,\n",
|
" workspace=ws,\n",
|
||||||
" datastore_name=blob_datastore_name,\n",
|
" datastore_name=blob_datastore_name,\n",
|
||||||
" account_name=account_name, # Storage account name\n",
|
" account_name=account_name, # Storage account name\n",
|
||||||
" container_name=container_name, # Name of Azure blob container\n",
|
" container_name=container_name, # Name of Azure blob container\n",
|
||||||
" account_key=account_key) # Storage account key\"\n",
|
" account_key=account_key) # Storage account key\"\n",
|
||||||
" print(\"registered blob datastore with name: %s\" % blob_datastore_name)\n",
|
" print(\"registered blob datastore with name: %s\" % blob_datastore_name)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# CLI:\n",
|
"# CLI:\n",
|
||||||
"# az ml datastore register-blob -n <datastore-name> -a <account-name> -c <container-name> -k <account-key> [-t <sas-token>]"
|
"# az ml datastore register-blob -n <datastore-name> -a <account-name> -c <container-name> -k <account-key> [-t <sas-token>]"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create DataReferences"
|
"## Create DataReferences"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"adls_datastore = Datastore(workspace=ws, name=\"MyAdlsDatastore\")\n",
|
"adls_datastore = Datastore(workspace=ws, name=\"MyAdlsDatastore\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# adls\n",
|
"# adls\n",
|
||||||
"adls_data_ref = DataReference(\n",
|
"adls_data_ref = DataReference(\n",
|
||||||
" datastore=adls_datastore,\n",
|
" datastore=adls_datastore,\n",
|
||||||
" data_reference_name=\"adls_test_data\",\n",
|
" data_reference_name=\"adls_test_data\",\n",
|
||||||
" path_on_datastore=\"testdata\")\n",
|
" path_on_datastore=\"testdata\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"blob_datastore = Datastore(workspace=ws, name=\"MyBlobDatastore\")\n",
|
"blob_datastore = Datastore(workspace=ws, name=\"MyBlobDatastore\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# blob data\n",
|
"# blob data\n",
|
||||||
"blob_data_ref = DataReference(\n",
|
"blob_data_ref = DataReference(\n",
|
||||||
" datastore=blob_datastore,\n",
|
" datastore=blob_datastore,\n",
|
||||||
" data_reference_name=\"blob_test_data\",\n",
|
" data_reference_name=\"blob_test_data\",\n",
|
||||||
" path_on_datastore=\"testdata\")\n",
|
" path_on_datastore=\"testdata\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"obtained adls, blob data references\")"
|
"print(\"obtained adls, blob data references\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup Data Factory Account"
|
"## Setup Data Factory Account"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"data_factory_name = 'adftest'\n",
|
"data_factory_name = 'adftest'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def get_or_create_data_factory(workspace, factory_name):\n",
|
"def get_or_create_data_factory(workspace, factory_name):\n",
|
||||||
" try:\n",
|
" try:\n",
|
||||||
" return DataFactoryCompute(workspace, factory_name)\n",
|
" return DataFactoryCompute(workspace, factory_name)\n",
|
||||||
" except ComputeTargetException as e:\n",
|
" except ComputeTargetException as e:\n",
|
||||||
" if 'ComputeTargetNotFound' in e.message:\n",
|
" if 'ComputeTargetNotFound' in e.message:\n",
|
||||||
" print('Data factory not found, creating...')\n",
|
" print('Data factory not found, creating...')\n",
|
||||||
" provisioning_config = DataFactoryCompute.provisioning_configuration()\n",
|
" provisioning_config = DataFactoryCompute.provisioning_configuration()\n",
|
||||||
" data_factory = ComputeTarget.create(workspace, factory_name, provisioning_config)\n",
|
" data_factory = ComputeTarget.create(workspace, factory_name, provisioning_config)\n",
|
||||||
" data_factory.wait_for_completion()\n",
|
" data_factory.wait_for_completion()\n",
|
||||||
" return data_factory\n",
|
" return data_factory\n",
|
||||||
" else:\n",
|
" else:\n",
|
||||||
" raise e\n",
|
" raise e\n",
|
||||||
" \n",
|
" \n",
|
||||||
"data_factory_compute = get_or_create_data_factory(ws, data_factory_name)\n",
|
"data_factory_compute = get_or_create_data_factory(ws, data_factory_name)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"setup data factory account complete\")\n",
|
"print(\"setup data factory account complete\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# CLI:\n",
|
"# CLI:\n",
|
||||||
"# Create: az ml computetarget setup datafactory -n <name>\n",
|
"# Create: az ml computetarget setup datafactory -n <name>\n",
|
||||||
"# BYOC: az ml computetarget attach datafactory -n <name> -i <resource-id>"
|
"# BYOC: az ml computetarget attach datafactory -n <name> -i <resource-id>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create a DataTransferStep"
|
"## Create a DataTransferStep"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"**DataTransferStep** is used to transfer data between Azure Blob, Azure Data Lake Store, and Azure SQL database.\n",
|
"**DataTransferStep** is used to transfer data between Azure Blob, Azure Data Lake Store, and Azure SQL database.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- **name:** Name of module\n",
|
"- **name:** Name of module\n",
|
||||||
"- **source_data_reference:** Input connection that serves as source of data transfer operation.\n",
|
"- **source_data_reference:** Input connection that serves as source of data transfer operation.\n",
|
||||||
"- **destination_data_reference:** Input connection that serves as destination of data transfer operation.\n",
|
"- **destination_data_reference:** Input connection that serves as destination of data transfer operation.\n",
|
||||||
"- **compute_target:** Azure Data Factory to use for transferring data.\n",
|
"- **compute_target:** Azure Data Factory to use for transferring data.\n",
|
||||||
"- **allow_reuse:** Whether the step should reuse results of previous DataTransferStep when run with same inputs. Set as False to force data to be transferred again.\n",
|
"- **allow_reuse:** Whether the step should reuse results of previous DataTransferStep when run with same inputs. Set as False to force data to be transferred again.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Optional arguments to explicitly specify whether a path corresponds to a file or a directory. These are useful when storage contains both file and directory with the same name or when creating a new destination path.\n",
|
"Optional arguments to explicitly specify whether a path corresponds to a file or a directory. These are useful when storage contains both file and directory with the same name or when creating a new destination path.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- **source_reference_type:** An optional string specifying the type of source_data_reference. Possible values include: 'file', 'directory'. When not specified, we use the type of existing path or directory if it's a new path.\n",
|
"- **source_reference_type:** An optional string specifying the type of source_data_reference. Possible values include: 'file', 'directory'. When not specified, we use the type of existing path or directory if it's a new path.\n",
|
||||||
"- **destination_reference_type:** An optional string specifying the type of destination_data_reference. Possible values include: 'file', 'directory'. When not specified, we use the type of existing path or directory if it's a new path."
|
"- **destination_reference_type:** An optional string specifying the type of destination_data_reference. Possible values include: 'file', 'directory'. When not specified, we use the type of existing path or directory if it's a new path."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"transfer_adls_to_blob = DataTransferStep(\n",
|
"transfer_adls_to_blob = DataTransferStep(\n",
|
||||||
" name=\"transfer_adls_to_blob\",\n",
|
" name=\"transfer_adls_to_blob\",\n",
|
||||||
" source_data_reference=adls_data_ref,\n",
|
" source_data_reference=adls_data_ref,\n",
|
||||||
" destination_data_reference=blob_data_ref,\n",
|
" destination_data_reference=blob_data_ref,\n",
|
||||||
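"    # source_reference_type='directory',       # optional: 'file' or 'directory'\n",
|
"    # source_reference_type='directory',       # optional: 'file' or 'directory'\n",
|
||||||
"    # destination_reference_type='directory',  # optional: 'file' or 'directory'\n",
|
"    # destination_reference_type='directory',  # optional: 'file' or 'directory'\n",
|
||||||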
" compute_target=data_factory_compute)\n",
|
" compute_target=data_factory_compute)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"data transfer step created\")"
|
"print(\"data transfer step created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Build and Submit the Experiment"
|
"## Build and Submit the Experiment"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"pipeline = Pipeline(\n",
|
"pipeline = Pipeline(\n",
|
||||||
" description=\"data_transfer_101\",\n",
|
" description=\"data_transfer_101\",\n",
|
||||||
" workspace=ws,\n",
|
" workspace=ws,\n",
|
||||||
" steps=[transfer_adls_to_blob])\n",
|
" steps=[transfer_adls_to_blob])\n",
|
||||||
"\n",
|
"\n",
|
||||||
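"# submit() returns immediately; wait_for_completion() blocks until the run finishes\n",
|
"# submit() returns immediately; wait_for_completion() blocks until the run finishes\n",
|
||||||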
"pipeline_run = Experiment(ws, \"Data_Transfer_example\").submit(pipeline)\n",
|
"pipeline_run = Experiment(ws, \"Data_Transfer_example\").submit(pipeline)\n",
|
||||||
"pipeline_run.wait_for_completion()"
|
"pipeline_run.wait_for_completion()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### View Run Details"
|
"### View Run Details"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(pipeline_run).show()"
|
"RunDetails(pipeline_run).show()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Next: Databricks as a Compute Target\n",
|
"# Next: Databricks as a Compute Target\n",
|
||||||
"To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. This [notebook](./aml-pipelines-use-databricks-as-compute-target.ipynb) demonstrates the use of a DatabricksStep in an Azure Machine Learning Pipeline."
|
"To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. This [notebook](./aml-pipelines-use-databricks-as-compute-target.ipynb) demonstrates the use of a DatabricksStep in an Azure Machine Learning Pipeline."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "diray"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python3"
|
"name": "diray"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.7"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.7"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
File diff suppressed because it is too large
@@ -1,368 +1,368 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# How to Publish a Pipeline and Invoke the REST endpoint\n",
|
"# How to Publish a Pipeline and Invoke the REST endpoint\n",
|
||||||
"In this notebook, we will see how we can publish a pipeline and then invoke the REST endpoint."
|
"In this notebook, we will see how we can publish a pipeline and then invoke the REST endpoint."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites and Azure Machine Learning Basics\n",
|
"## Prerequisites and Azure Machine Learning Basics\n",
|
||||||
"Make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
"Make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"### Initialization Steps"
|
"### Initialization Steps"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"from azureml.core import Workspace, Run, Experiment, Datastore\n",
|
"from azureml.core import Workspace, Run, Experiment, Datastore\n",
|
||||||
"from azureml.core.compute import AmlCompute\n",
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
"from azureml.core.compute import ComputeTarget\n",
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
"from azureml.core.compute import DataFactoryCompute\n",
|
"from azureml.core.compute import DataFactoryCompute\n",
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)\n",
|
"print(\"SDK version:\", azureml.core.VERSION)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from azureml.data.data_reference import DataReference\n",
|
"from azureml.data.data_reference import DataReference\n",
|
||||||
"from azureml.pipeline.core import Pipeline, PipelineData, StepSequence\n",
|
"from azureml.pipeline.core import Pipeline, PipelineData, StepSequence\n",
|
||||||
"from azureml.pipeline.steps import PythonScriptStep\n",
|
"from azureml.pipeline.steps import PythonScriptStep\n",
|
||||||
"from azureml.pipeline.steps import DataTransferStep\n",
|
"from azureml.pipeline.steps import DataTransferStep\n",
|
||||||
"from azureml.pipeline.core import PublishedPipeline\n",
|
"from azureml.pipeline.core import PublishedPipeline\n",
|
||||||
"from azureml.pipeline.core.graph import PipelineParameter\n",
|
"from azureml.pipeline.core.graph import PipelineParameter\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"Pipeline SDK-specific imports completed\")\n",
|
"print(\"Pipeline SDK-specific imports completed\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n",
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Default datastore (Azure file storage)\n",
|
"# Default datastore (Azure file storage)\n",
|
||||||
"def_file_store = ws.get_default_datastore() \n",
|
"def_file_store = ws.get_default_datastore() \n",
|
||||||
"print(\"Default datastore's name: {}\".format(def_file_store.name))\n",
|
"print(\"Default datastore's name: {}\".format(def_file_store.name))\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
|
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
|
||||||
"print(\"Blobstore's name: {}\".format(def_blob_store.name))\n",
|
"print(\"Blobstore's name: {}\".format(def_blob_store.name))\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# project folder\n",
|
"# project folder\n",
|
||||||
"project_folder = '.'"
|
"project_folder = '.'"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Compute Targets\n",
|
"### Compute Targets\n",
|
||||||
"#### Retrieve an already attached Azure Machine Learning Compute"
|
"#### Retrieve an already attached Azure Machine Learning Compute"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"aml_compute_target = \"aml-compute\"\n",
|
"aml_compute_target = \"aml-compute\"\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
|
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
|
||||||
" print(\"found existing compute target.\")\n",
|
" print(\"found existing compute target.\")\n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
" print(\"creating new compute target\")\n",
|
" print(\"creating new compute target\")\n",
|
||||||
" \n",
|
" \n",
|
||||||
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
|
||||||
" min_nodes = 1, \n",
|
" min_nodes = 1, \n",
|
||||||
" max_nodes = 4) \n",
|
" max_nodes = 4) \n",
|
||||||
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
|
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
|
||||||
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n"
|
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# For a more detailed view of current Azure Machine Learning Compute status, use the 'status' property\n",
|
"# For a more detailed view of current Azure Machine Learning Compute status, use get_status()\n",
|
||||||
"# example: un-comment the following line.\n",
|
"# example: un-comment the following line.\n",
|
||||||
"# print(aml_compute.status.serialize())"
|
"# print(aml_compute.get_status().serialize())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Building Pipeline Steps with Inputs and Outputs\n",
|
"## Building Pipeline Steps with Inputs and Outputs\n",
|
||||||
"As mentioned earlier, a step in the pipeline can take data as input. This data can be a data source that lives in one of the accessible data locations, or intermediate data produced by a previous step in the pipeline."
|
"As mentioned earlier, a step in the pipeline can take data as input. This data can be a data source that lives in one of the accessible data locations, or intermediate data produced by a previous step in the pipeline."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Reference the data uploaded to blob storage using DataReference\n",
|
"# Reference the data uploaded to blob storage using DataReference\n",
|
||||||
"# Assign the datasource to blob_input_data variable\n",
|
"# Assign the datasource to blob_input_data variable\n",
|
||||||
"blob_input_data = DataReference(\n",
|
"blob_input_data = DataReference(\n",
|
||||||
" datastore=def_blob_store,\n",
|
" datastore=def_blob_store,\n",
|
||||||
" data_reference_name=\"test_data\",\n",
|
" data_reference_name=\"test_data\",\n",
|
||||||
" path_on_datastore=\"20newsgroups/20news.pkl\")\n",
|
" path_on_datastore=\"20newsgroups/20news.pkl\")\n",
|
||||||
"print(\"DataReference object created\")"
|
"print(\"DataReference object created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Define intermediate data using PipelineData\n",
|
"# Define intermediate data using PipelineData\n",
|
||||||
"processed_data1 = PipelineData(\"processed_data1\",datastore=def_blob_store)\n",
|
"processed_data1 = PipelineData(\"processed_data1\",datastore=def_blob_store)\n",
|
||||||
"print(\"PipelineData object created\")"
|
"print(\"PipelineData object created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Define a Step that consumes a datasource and produces intermediate data.\n",
|
"#### Define a Step that consumes a datasource and produces intermediate data.\n",
|
||||||
"In this step, we define a step that consumes a datasource and produces intermediate data.\n",
|
"In this step, we define a step that consumes a datasource and produces intermediate data.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Open `train.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
|
"**Open `train.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# trainStep consumes the datasource (Datareference) in the previous step\n",
|
"# trainStep consumes the datasource (Datareference) in the previous step\n",
|
||||||
"# and produces processed_data1\n",
|
"# and produces processed_data1\n",
|
||||||
"trainStep = PythonScriptStep(\n",
|
"trainStep = PythonScriptStep(\n",
|
||||||
" script_name=\"train.py\", \n",
|
" script_name=\"train.py\", \n",
|
||||||
" arguments=[\"--input_data\", blob_input_data, \"--output_train\", processed_data1],\n",
|
" arguments=[\"--input_data\", blob_input_data, \"--output_train\", processed_data1],\n",
|
||||||
" inputs=[blob_input_data],\n",
|
" inputs=[blob_input_data],\n",
|
||||||
" outputs=[processed_data1],\n",
|
" outputs=[processed_data1],\n",
|
||||||
" compute_target=aml_compute, \n",
|
" compute_target=aml_compute, \n",
|
||||||
" source_directory=project_folder\n",
|
" source_directory=project_folder\n",
|
||||||
")\n",
|
")\n",
|
||||||
"print(\"trainStep created\")"
|
"print(\"trainStep created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Define a Step that consumes intermediate data and produces intermediate data\n",
|
"#### Define a Step that consumes intermediate data and produces intermediate data\n",
|
||||||
"In this step, we define a step that consumes an intermediate data and produces intermediate data.\n",
|
"In this step, we define a step that consumes an intermediate data and produces intermediate data.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Open `extract.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
|
"**Open `extract.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# extractStep to use the intermediate data produced by step4\n",
|
"# extractStep to use the intermediate data produced by step4\n",
|
||||||
"# This step also produces an output processed_data2\n",
|
"# This step also produces an output processed_data2\n",
|
||||||
"processed_data2 = PipelineData(\"processed_data2\", datastore=def_blob_store)\n",
|
"processed_data2 = PipelineData(\"processed_data2\", datastore=def_blob_store)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"extractStep = PythonScriptStep(\n",
|
"extractStep = PythonScriptStep(\n",
|
||||||
" script_name=\"extract.py\",\n",
|
" script_name=\"extract.py\",\n",
|
||||||
" arguments=[\"--input_extract\", processed_data1, \"--output_extract\", processed_data2],\n",
|
" arguments=[\"--input_extract\", processed_data1, \"--output_extract\", processed_data2],\n",
|
||||||
" inputs=[processed_data1],\n",
|
" inputs=[processed_data1],\n",
|
||||||
" outputs=[processed_data2],\n",
|
" outputs=[processed_data2],\n",
|
||||||
" compute_target=aml_compute, \n",
|
" compute_target=aml_compute, \n",
|
||||||
" source_directory=project_folder)\n",
|
" source_directory=project_folder)\n",
|
||||||
"print(\"extractStep created\")"
|
"print(\"extractStep created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Define a Step that consumes multiple intermediate data and produces intermediate data\n",
|
"#### Define a Step that consumes multiple intermediate data and produces intermediate data\n",
|
||||||
"In this step, we define a step that consumes multiple intermediate data and produces intermediate data."
|
"In this step, we define a step that consumes multiple intermediate data and produces intermediate data."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### PipelineParameter"
|
"### PipelineParameter"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"This step also has a [PipelineParameter](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.graph.pipelineparameter?view=azure-ml-py) argument that help with calling the REST endpoint of the published pipeline."
|
"This step also has a [PipelineParameter](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.graph.pipelineparameter?view=azure-ml-py) argument that help with calling the REST endpoint of the published pipeline."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# We will use this later in publishing pipeline\n",
|
"# We will use this later in publishing pipeline\n",
|
||||||
"pipeline_param = PipelineParameter(name=\"pipeline_arg\", default_value=10)\n",
|
"pipeline_param = PipelineParameter(name=\"pipeline_arg\", default_value=10)\n",
|
||||||
"print(\"pipeline parameter created\")"
|
"print(\"pipeline parameter created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"**Open `compare.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**"
|
"**Open `compare.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
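`compare.py` itself is not shown in this view. As a rough sketch (assumed, not the actual file contents), a script wired to the argument names used in the step below might parse them like this:

```
import argparse

# Argument names must match the `arguments` list passed to the PythonScriptStep below.
parser = argparse.ArgumentParser()
parser.add_argument("--compare_data1", type=str, help="first intermediate input")
parser.add_argument("--compare_data2", type=str, help="second intermediate input")
parser.add_argument("--output_compare", type=str, help="directory to write the comparison output to")
parser.add_argument("--pipeline_param", type=int, default=10, help="value supplied via the PipelineParameter")
args = parser.parse_args()

print("comparing", args.compare_data1, "with", args.compare_data2)
print("pipeline_param =", args.pipeline_param)
```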
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Now define step6 that takes two inputs (both intermediate data), and produce an output\n",
|
"# Now define step6 that takes two inputs (both intermediate data), and produce an output\n",
|
||||||
"processed_data3 = PipelineData(\"processed_data3\", datastore=def_blob_store)\n",
|
"processed_data3 = PipelineData(\"processed_data3\", datastore=def_blob_store)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"compareStep = PythonScriptStep(\n",
|
"compareStep = PythonScriptStep(\n",
|
||||||
" script_name=\"compare.py\",\n",
|
" script_name=\"compare.py\",\n",
|
||||||
" arguments=[\"--compare_data1\", processed_data1, \"--compare_data2\", processed_data2, \"--output_compare\", processed_data3, \"--pipeline_param\", pipeline_param],\n",
|
" arguments=[\"--compare_data1\", processed_data1, \"--compare_data2\", processed_data2, \"--output_compare\", processed_data3, \"--pipeline_param\", pipeline_param],\n",
|
||||||
" inputs=[processed_data1, processed_data2],\n",
|
" inputs=[processed_data1, processed_data2],\n",
|
||||||
" outputs=[processed_data3], \n",
|
" outputs=[processed_data3], \n",
|
||||||
" compute_target=aml_compute, \n",
|
" compute_target=aml_compute, \n",
|
||||||
" source_directory=project_folder)\n",
|
" source_directory=project_folder)\n",
|
||||||
"print(\"compareStep created\")"
|
"print(\"compareStep created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Build the pipeline"
|
"#### Build the pipeline"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"pipeline1 = Pipeline(workspace=ws, steps=[compareStep])\n",
|
"pipeline1 = Pipeline(workspace=ws, steps=[compareStep])\n",
|
||||||
"print (\"Pipeline is built\")\n",
|
"print (\"Pipeline is built\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"pipeline1.validate()\n",
|
"pipeline1.validate()\n",
|
||||||
"print(\"Simple validation complete\") "
|
"print(\"Simple validation complete\") "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
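The validated pipeline can also be submitted directly as an experiment run before publishing it. A minimal sketch, assuming an illustrative experiment name:

```
from azureml.core import Experiment

# "Data_dependency_run" is an illustrative experiment name, not one used elsewhere in this notebook.
pipeline_run1 = Experiment(ws, "Data_dependency_run").submit(pipeline1)
pipeline_run1.wait_for_completion(show_output=True)
```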
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Publish the pipeline"
|
"## Publish the pipeline"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"published_pipeline1 = pipeline1.publish(name=\"My_New_Pipeline\", description=\"My Published Pipeline Description\")\n",
|
"published_pipeline1 = pipeline1.publish(name=\"My_New_Pipeline\", description=\"My Published Pipeline Description\")\n",
|
||||||
"print(published_pipeline1.id)"
|
"print(published_pipeline1.id)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Run published pipeline using its REST endpoint"
|
"### Run published pipeline using its REST endpoint"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.authentication import AzureCliAuthentication\n",
|
"from azureml.core.authentication import AzureCliAuthentication\n",
|
||||||
"import requests\n",
|
"import requests\n",
|
||||||
"\n",
|
"\n",
|
||||||
"cli_auth = AzureCliAuthentication()\n",
|
"cli_auth = AzureCliAuthentication()\n",
|
||||||
"aad_token = cli_auth.get_authentication_header()\n",
|
"aad_token = cli_auth.get_authentication_header()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"rest_endpoint1 = published_pipeline1.endpoint\n",
|
"rest_endpoint1 = published_pipeline1.endpoint\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(rest_endpoint1)\n",
|
"print(rest_endpoint1)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# specify the param when running the pipeline\n",
|
"# specify the param when running the pipeline\n",
|
||||||
"response = requests.post(rest_endpoint1, \n",
|
"response = requests.post(rest_endpoint1, \n",
|
||||||
" headers=aad_token, \n",
|
" headers=aad_token, \n",
|
||||||
" json={\"ExperimentName\": \"My_Pipeline1\",\n",
|
" json={\"ExperimentName\": \"My_Pipeline1\",\n",
|
||||||
" \"RunSource\": \"SDK\",\n",
|
" \"RunSource\": \"SDK\",\n",
|
||||||
" \"ParameterAssignments\": {\"pipeline_arg\": 45}})\n",
|
" \"ParameterAssignments\": {\"pipeline_arg\": 45}})\n",
|
||||||
"run_id = response.json()[\"Id\"]\n",
|
"run_id = response.json()[\"Id\"]\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(run_id)"
|
"print(run_id)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
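The REST response only returns the run id. To monitor that run from the SDK afterwards, one option is to rehydrate a `PipelineRun` from the experiment name used in the request above (a sketch, assuming the `azureml-pipeline-core` `PipelineRun` class):

```
from azureml.core import Experiment
from azureml.pipeline.core.run import PipelineRun

# Rehydrate the run that was submitted through the REST endpoint and wait for it to finish.
experiment = Experiment(ws, "My_Pipeline1")
rest_run = PipelineRun(experiment, run_id)
rest_run.wait_for_completion(show_output=True)
```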
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Next: Data Transfer\n",
|
"# Next: Data Transfer\n",
|
||||||
"The next [notebook](./aml-pipelines-data-transfer.ipynb) will showcase data transfer steps between different types of data stores."
|
"The next [notebook](./aml-pipelines-data-transfer.ipynb) will showcase data transfer steps between different types of data stores."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
"metadata": {
"authors": [
{
"name": "diray"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -1,368 +1,368 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# AML Pipeline with AdlaStep\n",
|
"# AML Pipeline with AdlaStep\n",
|
||||||
"This notebook is used to demonstrate the use of AdlaStep in AML Pipeline."
|
"This notebook is used to demonstrate the use of AdlaStep in AML Pipeline."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## AML and Pipeline SDK-specific imports"
|
"## AML and Pipeline SDK-specific imports"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"from azureml.core.compute import ComputeTarget, DatabricksCompute\n",
|
"from azureml.core.compute import ComputeTarget, DatabricksCompute\n",
|
||||||
"from azureml.exceptions import ComputeTargetException\n",
|
"from azureml.exceptions import ComputeTargetException\n",
|
||||||
"from azureml.core import Workspace, Run, Experiment\n",
|
"from azureml.core import Workspace, Run, Experiment\n",
|
||||||
"from azureml.pipeline.core import Pipeline, PipelineData\n",
|
"from azureml.pipeline.core import Pipeline, PipelineData\n",
|
||||||
"from azureml.pipeline.steps import AdlaStep\n",
|
"from azureml.pipeline.steps import AdlaStep\n",
|
||||||
"from azureml.core.datastore import Datastore\n",
|
"from azureml.core.datastore import Datastore\n",
|
||||||
"from azureml.data.data_reference import DataReference\n",
|
"from azureml.data.data_reference import DataReference\n",
|
||||||
"from azureml.core import attach_legacy_compute_target\n",
|
"from azureml.core import attach_legacy_compute_target\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Initialize Workspace\n",
|
"## Initialize Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json"
|
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create workspace"
|
"create workspace"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"script_folder = '.'\n",
|
"script_folder = '.'\n",
|
||||||
"experiment_name = \"adla_101_experiment\"\n",
|
"experiment_name = \"adla_101_experiment\"\n",
|
||||||
"ws._initialize_folder(experiment_name=experiment_name, directory=script_folder)"
|
"ws._initialize_folder(experiment_name=experiment_name, directory=script_folder)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Register Datastore"
|
"## Register Datastore"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"workspace = ws.name\n",
|
"workspace = ws.name\n",
|
||||||
"datastore_name='MyAdlsDatastore'\n",
|
"datastore_name='MyAdlsDatastore'\n",
|
||||||
"subscription_id=os.getenv(\"ADL_SUBSCRIPTION_62\", \"<my-subscription-id>\") # subscription id of ADLS account\n",
|
"subscription_id=os.getenv(\"ADL_SUBSCRIPTION_62\", \"<my-subscription-id>\") # subscription id of ADLS account\n",
|
||||||
"resource_group=os.getenv(\"ADL_RESOURCE_GROUP_62\", \"<my-resource-group>\") # resource group of ADLS account\n",
|
"resource_group=os.getenv(\"ADL_RESOURCE_GROUP_62\", \"<my-resource-group>\") # resource group of ADLS account\n",
|
||||||
"store_name=os.getenv(\"ADL_STORENAME_62\", \"<my-datastore-name>\") # ADLS account name\n",
|
"store_name=os.getenv(\"ADL_STORENAME_62\", \"<my-datastore-name>\") # ADLS account name\n",
|
||||||
"tenant_id=os.getenv(\"ADL_TENANT_62\", \"<my-tenant-id>\") # tenant id of service principal\n",
|
"tenant_id=os.getenv(\"ADL_TENANT_62\", \"<my-tenant-id>\") # tenant id of service principal\n",
|
||||||
"client_id=os.getenv(\"ADL_CLIENTID_62\", \"<my-client-id>\") # client id of service principal\n",
|
"client_id=os.getenv(\"ADL_CLIENTID_62\", \"<my-client-id>\") # client id of service principal\n",
|
||||||
"client_secret=os.getenv(\"ADL_CLIENT_62_SECRET\", \"<my-client-secret>\") # the secret of service principal\n",
|
"client_secret=os.getenv(\"ADL_CLIENT_62_SECRET\", \"<my-client-secret>\") # the secret of service principal\n",
|
||||||
"\n",
|
"\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" adls_datastore = Datastore.get(ws, datastore_name)\n",
|
" adls_datastore = Datastore.get(ws, datastore_name)\n",
|
||||||
" print(\"found datastore with name: %s\" % datastore_name)\n",
|
" print(\"found datastore with name: %s\" % datastore_name)\n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
" adls_datastore = Datastore.register_azure_data_lake(\n",
|
" adls_datastore = Datastore.register_azure_data_lake(\n",
|
||||||
" workspace=ws,\n",
|
" workspace=ws,\n",
|
||||||
" datastore_name=datastore_name,\n",
|
" datastore_name=datastore_name,\n",
|
||||||
" subscription_id=subscription_id, # subscription id of ADLS account\n",
|
" subscription_id=subscription_id, # subscription id of ADLS account\n",
|
||||||
" resource_group=resource_group, # resource group of ADLS account\n",
|
" resource_group=resource_group, # resource group of ADLS account\n",
|
||||||
" store_name=store_name, # ADLS account name\n",
|
" store_name=store_name, # ADLS account name\n",
|
||||||
" tenant_id=tenant_id, # tenant id of service principal\n",
|
" tenant_id=tenant_id, # tenant id of service principal\n",
|
||||||
" client_id=client_id, # client id of service principal\n",
|
" client_id=client_id, # client id of service principal\n",
|
||||||
" client_secret=client_secret) # the secret of service principal\n",
|
" client_secret=client_secret) # the secret of service principal\n",
|
||||||
" print(\"registered datastore with name: %s\" % datastore_name)\n"
|
" print(\"registered datastore with name: %s\" % datastore_name)\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create DataReferences and PipelineData\n",
|
"## Create DataReferences and PipelineData\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In the code cell below, replace datastorename with your default datastore name. Copy the file `testdata.txt` (located in the pipeline folder that this notebook is in) to the path on the datastore."
|
"In the code cell below, replace datastorename with your default datastore name. Copy the file `testdata.txt` (located in the pipeline folder that this notebook is in) to the path on the datastore."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"datastorename = \"MyAdlsDatastore\"\n",
|
"datastorename = \"MyAdlsDatastore\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"adls_datastore = Datastore(workspace=ws, name=datastorename)\n",
|
"adls_datastore = Datastore(workspace=ws, name=datastorename)\n",
|
||||||
"script_input = DataReference(\n",
|
"script_input = DataReference(\n",
|
||||||
" datastore=adls_datastore,\n",
|
" datastore=adls_datastore,\n",
|
||||||
" data_reference_name=\"script_input\",\n",
|
" data_reference_name=\"script_input\",\n",
|
||||||
" path_on_datastore=\"testdata/testdata.txt\")\n",
|
" path_on_datastore=\"testdata/testdata.txt\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"script_output = PipelineData(\"script_output\", datastore=adls_datastore)\n",
|
"script_output = PipelineData(\"script_output\", datastore=adls_datastore)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"Created Pipeline Data\")"
|
"print(\"Created Pipeline Data\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup Data Lake Account\n",
|
"## Setup Data Lake Account\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ADLA can only use data that is located in the default data store associated with that ADLA account. Through Azure portal, check the name of the default data store corresponding to the ADLA account you are using below. Replace the value associated with `adla_compute_name` in the code cell below accordingly."
|
"ADLA can only use data that is located in the default data store associated with that ADLA account. Through Azure portal, check the name of the default data store corresponding to the ADLA account you are using below. Replace the value associated with `adla_compute_name` in the code cell below accordingly."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"adla_compute_name = 'testadl' # Replace this with your default compute\n",
|
"adla_compute_name = 'testadl' # Replace this with your default compute\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from azureml.core.compute import ComputeTarget, AdlaCompute\n",
|
"from azureml.core.compute import ComputeTarget, AdlaCompute\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def get_or_create_adla_compute(workspace, compute_name):\n",
|
"def get_or_create_adla_compute(workspace, compute_name):\n",
|
||||||
" try:\n",
|
" try:\n",
|
||||||
" return AdlaCompute(workspace, compute_name)\n",
|
" return AdlaCompute(workspace, compute_name)\n",
|
||||||
" except ComputeTargetException as e:\n",
|
" except ComputeTargetException as e:\n",
|
||||||
" if 'ComputeTargetNotFound' in e.message:\n",
|
" if 'ComputeTargetNotFound' in e.message:\n",
|
||||||
" print('adla compute not found, creating...')\n",
|
" print('adla compute not found, creating...')\n",
|
||||||
" provisioning_config = AdlaCompute.provisioning_configuration()\n",
|
" provisioning_config = AdlaCompute.provisioning_configuration()\n",
|
||||||
" adla_compute = ComputeTarget.create(workspace, compute_name, provisioning_config)\n",
|
" adla_compute = ComputeTarget.create(workspace, compute_name, provisioning_config)\n",
|
||||||
" adla_compute.wait_for_completion()\n",
|
" adla_compute.wait_for_completion()\n",
|
||||||
" return adla_compute\n",
|
" return adla_compute\n",
|
||||||
" else:\n",
|
" else:\n",
|
||||||
" raise e\n",
|
" raise e\n",
|
||||||
" \n",
|
" \n",
|
||||||
"adla_compute = get_or_create_adla_compute(ws, adla_compute_name)\n",
|
"adla_compute = get_or_create_adla_compute(ws, adla_compute_name)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# CLI:\n",
|
"# CLI:\n",
|
||||||
"# Create: az ml computetarget setup adla -n <name>\n",
|
"# Create: az ml computetarget setup adla -n <name>\n",
|
||||||
"# BYOC: az ml computetarget attach adla -n <name> -i <resource-id>"
|
"# BYOC: az ml computetarget attach adla -n <name> -i <resource-id>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Once the above code cell completes, run the below to check your ADLA compute status:"
|
"Once the above code cell completes, run the below to check your ADLA compute status:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"print(\"ADLA compute state:{}\".format(adla_compute.provisioning_state))\n",
|
"print(\"ADLA compute state:{}\".format(adla_compute.provisioning_state))\n",
|
||||||
"print(\"ADLA compute state:{}\".format(adla_compute.provisioning_errors))\n",
|
"print(\"ADLA compute state:{}\".format(adla_compute.provisioning_errors))\n",
|
||||||
"print(\"Using ADLA compute:{}\".format(adla_compute.cluster_resource_id))"
|
"print(\"Using ADLA compute:{}\".format(adla_compute.cluster_resource_id))"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create an AdlaStep"
|
"## Create an AdlaStep"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"**AdlaStep** is used to run U-SQL script using Azure Data Lake Analytics.\n",
|
"**AdlaStep** is used to run U-SQL script using Azure Data Lake Analytics.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- **name:** Name of module\n",
|
"- **name:** Name of module\n",
|
||||||
"- **script_name:** name of U-SQL script\n",
|
"- **script_name:** name of U-SQL script\n",
|
||||||
"- **inputs:** List of input port bindings\n",
|
"- **inputs:** List of input port bindings\n",
|
||||||
"- **outputs:** List of output port bindings\n",
|
"- **outputs:** List of output port bindings\n",
|
||||||
"- **adla_compute:** the ADLA compute to use for this job\n",
|
"- **adla_compute:** the ADLA compute to use for this job\n",
|
||||||
"- **params:** Dictionary of name-value pairs to pass to U-SQL job *(optional)*\n",
|
"- **params:** Dictionary of name-value pairs to pass to U-SQL job *(optional)*\n",
|
||||||
"- **degree_of_parallelism:** the degree of parallelism to use for this job *(optional)*\n",
|
"- **degree_of_parallelism:** the degree of parallelism to use for this job *(optional)*\n",
|
||||||
"- **priority:** the priority value to use for the current job *(optional)*\n",
|
"- **priority:** the priority value to use for the current job *(optional)*\n",
|
||||||
"- **runtime_version:** the runtime version of the Data Lake Analytics engine *(optional)*\n",
|
"- **runtime_version:** the runtime version of the Data Lake Analytics engine *(optional)*\n",
|
||||||
"- **root_folder:** folder that contains the script, assemblies etc. *(optional)*\n",
|
"- **root_folder:** folder that contains the script, assemblies etc. *(optional)*\n",
|
||||||
"- **hash_paths:** list of paths to hash to detect a change (script file is always hashed) *(optional)*\n",
|
"- **hash_paths:** list of paths to hash to detect a change (script file is always hashed) *(optional)*\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### Remarks\n",
|
"### Remarks\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can use `@@name@@` syntax in your script to refer to inputs, outputs, and params.\n",
|
"You can use `@@name@@` syntax in your script to refer to inputs, outputs, and params.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"* if `name` is the name of an input or output port binding, any occurences of `@@name@@` in the script\n",
|
"* if `name` is the name of an input or output port binding, any occurences of `@@name@@` in the script\n",
|
||||||
"are replaced with actual data path of corresponding port binding.\n",
|
"are replaced with actual data path of corresponding port binding.\n",
|
||||||
"* if `name` matches any key in `params` dict, any occurences of `@@name@@` will be replaced with\n",
|
"* if `name` matches any key in `params` dict, any occurences of `@@name@@` will be replaced with\n",
|
||||||
"corresponding value in dict.\n",
|
"corresponding value in dict.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Sample script\n",
|
"#### Sample script\n",
|
||||||
"\n",
|
"\n",
|
||||||
"```\n",
|
"```\n",
|
||||||
"@resourcereader =\n",
|
"@resourcereader =\n",
|
||||||
" EXTRACT query string\n",
|
" EXTRACT query string\n",
|
||||||
" FROM \"@@script_input@@\"\n",
|
" FROM \"@@script_input@@\"\n",
|
||||||
" USING Extractors.Csv();\n",
|
" USING Extractors.Csv();\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"OUTPUT @resourcereader\n",
|
"OUTPUT @resourcereader\n",
|
||||||
"TO \"@@script_output@@\"\n",
|
"TO \"@@script_output@@\"\n",
|
||||||
"USING Outputters.Csv();\n",
|
"USING Outputters.Csv();\n",
|
||||||
"```"
|
"```"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"adla_step = AdlaStep(\n",
|
"adla_step = AdlaStep(\n",
|
||||||
" name='adla_script_step',\n",
|
" name='adla_script_step',\n",
|
||||||
" script_name='test_adla_script.usql',\n",
|
" script_name='test_adla_script.usql',\n",
|
||||||
" inputs=[script_input],\n",
|
" inputs=[script_input],\n",
|
||||||
" outputs=[script_output],\n",
|
" outputs=[script_output],\n",
|
||||||
" compute_target=adla_compute)"
|
" compute_target=adla_compute)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
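The optional arguments listed above plug in the same way. A hypothetical variant (values are illustrative only, and `@@threshold@@` would have to appear in the U-SQL script for `params` to take effect):

```
# Sketch only: the same step with the optional arguments documented above.
adla_step_tuned = AdlaStep(
    name='adla_script_step_tuned',
    script_name='test_adla_script.usql',
    inputs=[script_input],
    outputs=[script_output],
    params={'threshold': '10'},    # replaces @@threshold@@ in the script
    degree_of_parallelism=2,
    priority=100,
    compute_target=adla_compute)
```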
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Build and Submit the Experiment"
|
"## Build and Submit the Experiment"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"pipeline = Pipeline(\n",
|
"pipeline = Pipeline(\n",
|
||||||
" description=\"adla_102\",\n",
|
" description=\"adla_102\",\n",
|
||||||
" workspace=ws, \n",
|
" workspace=ws, \n",
|
||||||
" steps=[adla_step],\n",
|
" steps=[adla_step],\n",
|
||||||
" default_source_directory=script_folder)\n",
|
" default_source_directory=script_folder)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"pipeline_run = Experiment(workspace, experiment_name).submit(pipeline)\n",
|
"pipeline_run = Experiment(workspace, experiment_name).submit(pipeline)\n",
|
||||||
"pipeline_run.wait_for_completion()"
|
"pipeline_run.wait_for_completion()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### View Run Details"
|
"### View Run Details"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(pipeline_run).show()"
|
"RunDetails(pipeline_run).show()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Examine the run\n",
|
"### Examine the run\n",
|
||||||
"You can cycle through the node_run objects and examine job logs, stdout, and stderr of each of the steps."
|
"You can cycle through the node_run objects and examine job logs, stdout, and stderr of each of the steps."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"step_runs = pipeline_run.get_children()\n",
|
"step_runs = pipeline_run.get_children()\n",
|
||||||
"for step_run in step_runs:\n",
|
"for step_run in step_runs:\n",
|
||||||
" status = step_run.get_status()\n",
|
" status = step_run.get_status()\n",
|
||||||
" print('node', step_run.name, 'status:', status)\n",
|
" print('node', step_run.name, 'status:', status)\n",
|
||||||
" if status == \"Failed\":\n",
|
" if status == \"Failed\":\n",
|
||||||
" joblog = step_run.get_job_log()\n",
|
" joblog = step_run.get_job_log()\n",
|
||||||
" print('job log:', joblog)\n",
|
" print('job log:', joblog)\n",
|
||||||
" stdout_log = step_run.get_stdout_log()\n",
|
" stdout_log = step_run.get_stdout_log()\n",
|
||||||
" print('stdout log:', stdout_log)\n",
|
" print('stdout log:', stdout_log)\n",
|
||||||
" stderr_log = step_run.get_stderr_log()\n",
|
" stderr_log = step_run.get_stderr_log()\n",
|
||||||
" print('stderr log:', stderr_log)\n",
|
" print('stderr log:', stderr_log)\n",
|
||||||
" with open(\"logs-\" + step_run.name + \".txt\", \"w\") as f:\n",
|
" with open(\"logs-\" + step_run.name + \".txt\", \"w\") as f:\n",
|
||||||
" f.write(joblog)\n",
|
" f.write(joblog)\n",
|
||||||
" print(\"Job log written to logs-\"+ step_run.name + \".txt\")\n",
|
" print(\"Job log written to logs-\"+ step_run.name + \".txt\")\n",
|
||||||
" if status == \"Finished\":\n",
|
" if status == \"Finished\":\n",
|
||||||
" stdout_log = step_run.get_stdout_log()\n",
|
" stdout_log = step_run.get_stdout_log()\n",
|
||||||
" print('stdout log:', stdout_log)"
|
" print('stdout log:', stdout_log)"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
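Once the run finishes, the `script_output` PipelineData can also be downloaded locally. A sketch using the step-run helpers (method names as in `azureml-pipeline-core`; verify against your SDK version):

```
# Locate the ADLA step's run by name, then download its named output locally.
step_run = pipeline_run.find_step_run('adla_script_step')[0]
port_data = step_run.get_output_data('script_output')
port_data.download(local_path='.')
```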
],
"metadata": {
"authors": [
{
"name": "diray"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -1,418 +1,418 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Azure Machine Learning Pipelines with Data Dependency\n",
|
"# Azure Machine Learning Pipelines with Data Dependency\n",
|
||||||
"In this notebook, we will see how we can build a pipeline with implicit data dependancy."
|
"In this notebook, we will see how we can build a pipeline with implicit data dependancy."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites and Azure Machine Learning Basics\n",
|
"## Prerequisites and Azure Machine Learning Basics\n",
|
||||||
"Make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
"Make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"### Azure Machine Learning and Pipeline SDK-specific Imports"
|
"### Azure Machine Learning and Pipeline SDK-specific Imports"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"from azureml.core import Workspace, Run, Experiment, Datastore\n",
|
"from azureml.core import Workspace, Run, Experiment, Datastore\n",
|
||||||
"from azureml.core.compute import AmlCompute\n",
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
"from azureml.core.compute import ComputeTarget\n",
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
"from azureml.core.compute import DataFactoryCompute\n",
|
"from azureml.core.compute import DataFactoryCompute\n",
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)\n",
|
"print(\"SDK version:\", azureml.core.VERSION)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from azureml.data.data_reference import DataReference\n",
|
"from azureml.data.data_reference import DataReference\n",
|
||||||
"from azureml.pipeline.core import Pipeline, PipelineData, StepSequence\n",
|
"from azureml.pipeline.core import Pipeline, PipelineData, StepSequence\n",
|
||||||
"from azureml.pipeline.steps import PythonScriptStep\n",
|
"from azureml.pipeline.steps import PythonScriptStep\n",
|
||||||
"from azureml.pipeline.steps import DataTransferStep\n",
|
"from azureml.pipeline.steps import DataTransferStep\n",
|
||||||
"from azureml.pipeline.core import PublishedPipeline\n",
|
"from azureml.pipeline.core import PublishedPipeline\n",
|
||||||
"from azureml.pipeline.core.graph import PipelineParameter\n",
|
"from azureml.pipeline.core.graph import PipelineParameter\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"Pipeline SDK-specific imports completed\")"
|
"print(\"Pipeline SDK-specific imports completed\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Initialize Workspace\n",
|
"### Initialize Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Initialize a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace(class%29) object from persisted configuration."
|
"Initialize a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace(class%29) object from persisted configuration."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create workspace"
|
"create workspace"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n",
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Default datastore (Azure file storage)\n",
|
"# Default datastore (Azure file storage)\n",
|
||||||
"def_file_store = ws.get_default_datastore() \n",
|
"def_file_store = ws.get_default_datastore() \n",
|
||||||
"print(\"Default datastore's name: {}\".format(def_file_store.name))\n",
|
"print(\"Default datastore's name: {}\".format(def_file_store.name))\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
|
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
|
||||||
"print(\"Blobstore's name: {}\".format(def_blob_store.name))"
|
"print(\"Blobstore's name: {}\".format(def_blob_store.name))"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# project folder\n",
|
"# project folder\n",
|
||||||
"project_folder = '.'\n",
|
"project_folder = '.'\n",
|
||||||
" \n",
|
" \n",
|
||||||
"print('Sample projects will be created in {}.'.format(project_folder))"
|
"print('Sample projects will be created in {}.'.format(project_folder))"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Required data and script files for the the tutorial\n",
|
"### Required data and script files for the the tutorial\n",
|
||||||
"Sample files required to finish this tutorial are already copied to the project folder specified above. Even though the .py provided in the samples don't have much \"ML work,\" as a data scientist, you will work on this extensively as part of your work. To complete this tutorial, the contents of these files are not very important. The one-line files are for demostration purpose only."
|
"Sample files required to finish this tutorial are already copied to the project folder specified above. Even though the .py provided in the samples don't have much \"ML work,\" as a data scientist, you will work on this extensively as part of your work. To complete this tutorial, the contents of these files are not very important. The one-line files are for demostration purpose only."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Compute Targets\n",
|
"### Compute Targets\n",
|
||||||
"See the list of Compute Targets on the workspace."
|
"See the list of Compute Targets on the workspace."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"cts = ws.compute_targets\n",
|
"cts = ws.compute_targets\n",
|
||||||
"for ct in cts:\n",
|
"for ct in cts:\n",
|
||||||
" print(ct)"
|
" print(ct)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Retrieve or create a Aml compute\n",
|
"#### Retrieve or create a Aml compute\n",
|
||||||
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Aml Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target."
|
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Aml Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"aml_compute_target = \"aml-compute\"\n",
|
"aml_compute_target = \"aml-compute\"\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
|
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
|
||||||
" print(\"found existing compute target.\")\n",
|
" print(\"found existing compute target.\")\n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
" print(\"creating new compute target\")\n",
|
" print(\"creating new compute target\")\n",
|
||||||
" \n",
|
" \n",
|
||||||
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
|
||||||
" min_nodes = 1, \n",
|
" min_nodes = 1, \n",
|
||||||
" max_nodes = 4) \n",
|
" max_nodes = 4) \n",
|
||||||
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
|
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
|
||||||
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
||||||
" \n",
|
" \n",
|
||||||
"print(\"Aml Compute attached\")\n"
|
"print(\"Aml Compute attached\")\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# For a more detailed view of current Azure Machine Learning Compute status, use the 'status' property\n",
|
"# For a more detailed view of current Azure Machine Learning Compute status, use get_status()\n",
|
||||||
"# example: un-comment the following line.\n",
|
"# example: un-comment the following line.\n",
|
||||||
"# print(aml_compute.status.serialize())"
|
"# print(aml_compute.get_status().serialize())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"**Wait for this call to finish before proceeding (you will see the asterisk turning to a number).**\n",
|
"**Wait for this call to finish before proceeding (you will see the asterisk turning to a number).**\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Now that you have created the compute target, let's see what the workspace's compute_targets() function returns. You should now see one entry named 'amlcompute' of type AmlCompute."
|
"Now that you have created the compute target, let's see what the workspace's compute_targets() function returns. You should now see one entry named 'amlcompute' of type AmlCompute."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Building Pipeline Steps with Inputs and Outputs\n",
|
"## Building Pipeline Steps with Inputs and Outputs\n",
|
||||||
"As mentioned earlier, a step in the pipeline can take data as input. This data can be a data source that lives in one of the accessible data locations, or intermediate data produced by a previous step in the pipeline.\n",
|
"As mentioned earlier, a step in the pipeline can take data as input. This data can be a data source that lives in one of the accessible data locations, or intermediate data produced by a previous step in the pipeline.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### Datasources\n",
|
"### Datasources\n",
|
||||||
"Datasource is represented by **[DataReference](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.data_reference.datareference?view=azure-ml-py)** object and points to data that lives in or is accessible from Datastore. DataReference could be a pointer to a file or a directory."
|
"Datasource is represented by **[DataReference](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.data_reference.datareference?view=azure-ml-py)** object and points to data that lives in or is accessible from Datastore. DataReference could be a pointer to a file or a directory."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Reference the data uploaded to blob storage using DataReference\n",
|
"# Reference the data uploaded to blob storage using DataReference\n",
|
||||||
"# Assign the datasource to blob_input_data variable\n",
|
"# Assign the datasource to blob_input_data variable\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# DataReference(datastore, \n",
|
"# DataReference(datastore, \n",
|
||||||
"# data_reference_name=None, \n",
|
"# data_reference_name=None, \n",
|
||||||
"# path_on_datastore=None, \n",
|
"# path_on_datastore=None, \n",
|
||||||
"# mode='mount', \n",
|
"# mode='mount', \n",
|
||||||
"# path_on_compute=None, \n",
|
"# path_on_compute=None, \n",
|
||||||
"# overwrite=False)\n",
|
"# overwrite=False)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"blob_input_data = DataReference(\n",
|
"blob_input_data = DataReference(\n",
|
||||||
" datastore=def_blob_store,\n",
|
" datastore=def_blob_store,\n",
|
||||||
" data_reference_name=\"test_data\",\n",
|
" data_reference_name=\"test_data\",\n",
|
||||||
" path_on_datastore=\"20newsgroups/20news.pkl\")\n",
|
" path_on_datastore=\"20newsgroups/20news.pkl\")\n",
|
||||||
"print(\"DataReference object created\")"
|
"print(\"DataReference object created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Intermediate/Output Data\n",
|
"### Intermediate/Output Data\n",
|
||||||
"Intermediate data (or output of a Step) is represented by **[PipelineData](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?view=azure-ml-py)** object. PipelineData can be produced by one step and consumed in another step by providing the PipelineData object as an output of one step and the input of one or more steps.\n",
|
"Intermediate data (or output of a Step) is represented by **[PipelineData](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?view=azure-ml-py)** object. PipelineData can be produced by one step and consumed in another step by providing the PipelineData object as an output of one step and the input of one or more steps.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Constructing PipelineData\n",
|
"#### Constructing PipelineData\n",
|
||||||
"- **name:** [*Required*] Name of the data item within the pipeline graph\n",
|
"- **name:** [*Required*] Name of the data item within the pipeline graph\n",
|
||||||
"- **datastore_name:** Name of the Datastore to write this output to\n",
|
"- **datastore_name:** Name of the Datastore to write this output to\n",
|
||||||
"- **output_name:** Name of the output\n",
|
"- **output_name:** Name of the output\n",
|
||||||
"- **output_mode:** Specifies \"upload\" or \"mount\" modes for producing output (default: mount)\n",
|
"- **output_mode:** Specifies \"upload\" or \"mount\" modes for producing output (default: mount)\n",
|
||||||
"- **output_path_on_compute:** For \"upload\" mode, the path to which the module writes this output during execution\n",
|
"- **output_path_on_compute:** For \"upload\" mode, the path to which the module writes this output during execution\n",
|
||||||
"- **output_overwrite:** Flag to overwrite pre-existing data"
|
"- **output_overwrite:** Flag to overwrite pre-existing data"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Define intermediate data using PipelineData\n",
|
"# Define intermediate data using PipelineData\n",
|
||||||
"# Syntax\n",
|
"# Syntax\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# PipelineData(name, \n",
|
"# PipelineData(name, \n",
|
||||||
"# datastore=None, \n",
|
"# datastore=None, \n",
|
||||||
"# output_name=None, \n",
|
"# output_name=None, \n",
|
||||||
"# output_mode='mount', \n",
|
"# output_mode='mount', \n",
|
||||||
"# output_path_on_compute=None, \n",
|
"# output_path_on_compute=None, \n",
|
||||||
"# output_overwrite=None, \n",
|
"# output_overwrite=None, \n",
|
||||||
"# data_type=None, \n",
|
"# data_type=None, \n",
|
||||||
"# is_directory=None)\n",
|
"# is_directory=None)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Naming the intermediate data as processed_data1 and assigning it to the variable processed_data1.\n",
|
"# Naming the intermediate data as processed_data1 and assigning it to the variable processed_data1.\n",
|
||||||
"processed_data1 = PipelineData(\"processed_data1\",datastore=def_blob_store)\n",
|
"processed_data1 = PipelineData(\"processed_data1\",datastore=def_blob_store)\n",
|
||||||
"print(\"PipelineData object created\")"
|
"print(\"PipelineData object created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Pipelines steps using datasources and intermediate data\n",
|
"### Pipelines steps using datasources and intermediate data\n",
|
||||||
"Machine learning pipelines can have many steps and these steps could use or reuse datasources and intermediate data. Here's how we construct such a pipeline:"
|
"Machine learning pipelines can have many steps and these steps could use or reuse datasources and intermediate data. Here's how we construct such a pipeline:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Define a Step that consumes a datasource and produces intermediate data.\n",
|
"#### Define a Step that consumes a datasource and produces intermediate data.\n",
|
||||||
"In this step, we define a step that consumes a datasource and produces intermediate data.\n",
|
"In this step, we define a step that consumes a datasource and produces intermediate data.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Open `train.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
|
"**Open `train.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
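As with the other scripts, `train.py` is not reproduced here. A minimal sketch (assumed, not the actual file) of the argument handling it would need, with names taken from the `arguments` list in the step below:

```
import argparse
import os

# Argument names must match the step's `arguments` list below.
parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str, help="path of the mounted input datasource")
parser.add_argument("--output_train", type=str, help="directory to write the intermediate output to")
args = parser.parse_args()

os.makedirs(args.output_train, exist_ok=True)
print("reading from", args.input_data, "writing to", args.output_train)
```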
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# step4 consumes the datasource (Datareference) in the previous step\n",
|
"# step4 consumes the datasource (Datareference) in the previous step\n",
|
||||||
"# and produces processed_data1\n",
|
"# and produces processed_data1\n",
|
||||||
"trainStep = PythonScriptStep(\n",
|
"trainStep = PythonScriptStep(\n",
|
||||||
" script_name=\"train.py\", \n",
|
" script_name=\"train.py\", \n",
|
||||||
" arguments=[\"--input_data\", blob_input_data, \"--output_train\", processed_data1],\n",
|
" arguments=[\"--input_data\", blob_input_data, \"--output_train\", processed_data1],\n",
|
||||||
" inputs=[blob_input_data],\n",
|
" inputs=[blob_input_data],\n",
|
||||||
" outputs=[processed_data1],\n",
|
" outputs=[processed_data1],\n",
|
||||||
" compute_target=aml_compute, \n",
|
" compute_target=aml_compute, \n",
|
||||||
" source_directory=project_folder\n",
|
" source_directory=project_folder\n",
|
||||||
")\n",
|
")\n",
|
||||||
"print(\"trainStep created\")"
|
"print(\"trainStep created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Define a Step that consumes intermediate data and produces intermediate data\n",
|
"#### Define a Step that consumes intermediate data and produces intermediate data\n",
|
||||||
"In this step, we define a step that consumes an intermediate data and produces intermediate data.\n",
|
"In this step, we define a step that consumes an intermediate data and produces intermediate data.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Open `extract.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
|
"**Open `extract.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# step5 to use the intermediate data produced by step4\n",
|
"# step5 to use the intermediate data produced by step4\n",
|
||||||
"# This step also produces an output processed_data2\n",
|
"# This step also produces an output processed_data2\n",
|
||||||
"processed_data2 = PipelineData(\"processed_data2\", datastore=def_blob_store)\n",
|
"processed_data2 = PipelineData(\"processed_data2\", datastore=def_blob_store)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"extractStep = PythonScriptStep(\n",
|
"extractStep = PythonScriptStep(\n",
|
||||||
" script_name=\"extract.py\",\n",
|
" script_name=\"extract.py\",\n",
|
||||||
" arguments=[\"--input_extract\", processed_data1, \"--output_extract\", processed_data2],\n",
|
" arguments=[\"--input_extract\", processed_data1, \"--output_extract\", processed_data2],\n",
|
||||||
" inputs=[processed_data1],\n",
|
" inputs=[processed_data1],\n",
|
||||||
" outputs=[processed_data2],\n",
|
" outputs=[processed_data2],\n",
|
||||||
" compute_target=aml_compute, \n",
|
" compute_target=aml_compute, \n",
|
||||||
" source_directory=project_folder)\n",
|
" source_directory=project_folder)\n",
|
||||||
"print(\"extractStep created\")"
|
"print(\"extractStep created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Define a Step that consumes multiple intermediate data and produces intermediate data\n",
|
"#### Define a Step that consumes multiple intermediate data and produces intermediate data\n",
|
||||||
"In this step, we define a step that consumes multiple intermediate data and produces intermediate data.\n",
|
"In this step, we define a step that consumes multiple intermediate data and produces intermediate data.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Open `compare.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**"
|
"**Open `compare.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Now define step6 that takes two inputs (both intermediate data), and produce an output\n",
|
"# Now define step6 that takes two inputs (both intermediate data), and produce an output\n",
|
||||||
"processed_data3 = PipelineData(\"processed_data3\", datastore=def_blob_store)\n",
|
"processed_data3 = PipelineData(\"processed_data3\", datastore=def_blob_store)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"compareStep = PythonScriptStep(\n",
|
"compareStep = PythonScriptStep(\n",
|
||||||
" script_name=\"compare.py\",\n",
|
" script_name=\"compare.py\",\n",
|
||||||
" arguments=[\"--compare_data1\", processed_data1, \"--compare_data2\", processed_data2, \"--output_compare\", processed_data3],\n",
|
" arguments=[\"--compare_data1\", processed_data1, \"--compare_data2\", processed_data2, \"--output_compare\", processed_data3],\n",
|
||||||
" inputs=[processed_data1, processed_data2],\n",
|
" inputs=[processed_data1, processed_data2],\n",
|
||||||
" outputs=[processed_data3], \n",
|
" outputs=[processed_data3], \n",
|
||||||
" compute_target=aml_compute, \n",
|
" compute_target=aml_compute, \n",
|
||||||
" source_directory=project_folder)\n",
|
" source_directory=project_folder)\n",
|
||||||
"print(\"compareStep created\")"
|
"print(\"compareStep created\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Build the pipeline"
|
"#### Build the pipeline"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"pipeline1 = Pipeline(workspace=ws, steps=[compareStep])\n",
|
"pipeline1 = Pipeline(workspace=ws, steps=[compareStep])\n",
|
||||||
"print (\"Pipeline is built\")\n",
|
"print (\"Pipeline is built\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"pipeline1.validate()\n",
|
"pipeline1.validate()\n",
|
||||||
"print(\"Simple validation complete\") "
|
"print(\"Simple validation complete\") "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"pipeline_run1 = Experiment(ws, 'Data_dependency').submit(pipeline1)\n",
|
"pipeline_run1 = Experiment(ws, 'Data_dependency').submit(pipeline1)\n",
|
||||||
"print(\"Pipeline is submitted for execution\")"
|
"print(\"Pipeline is submitted for execution\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"RunDetails(pipeline_run1).show()"
|
"RunDetails(pipeline_run1).show()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Next: Publishing the Pipeline and calling it from the REST endpoint\n",
|
"# Next: Publishing the Pipeline and calling it from the REST endpoint\n",
|
||||||
"See this [notebook](./aml-pipelines-publish-and-run-using-rest-endpoint.ipynb) to understand how the pipeline is published and you can call the REST endpoint to run the pipeline."
|
"See this [notebook](./aml-pipelines-publish-and-run-using-rest-endpoint.ipynb) to understand how the pipeline is published and you can call the REST endpoint to run the pipeline."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
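{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a preview, here is a minimal sketch of what publishing and invoking this pipeline could look like. It assumes the `pipeline1` and `ws` objects from the cells above; the pipeline name, description, and experiment name are illustrative placeholders, and the linked notebook remains the full walkthrough:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch, not a full walkthrough: publish pipeline1, then call its REST endpoint.\n",
"# The name, description, and experiment name below are illustrative placeholders.\n",
"import requests\n",
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"\n",
"published_pipeline = pipeline1.publish(name=\"Data_dependency_pipeline\",\n",
"                                       description=\"Illustrative published pipeline\")\n",
"\n",
"# Get an AAD token and POST to the published endpoint to trigger a run\n",
"auth_header = InteractiveLoginAuthentication().get_authentication_header()\n",
"response = requests.post(published_pipeline.endpoint,\n",
"                         headers=auth_header,\n",
"                         json={\"ExperimentName\": \"Data_dependency\"})\n",
"print(response.status_code)"
]
}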
],
"metadata": {
"authors": [
{
"name": "diray"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -1,394 +1,394 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Distributed CNTK using custom Docker images\n",
"In this tutorial, you will train a CNTK model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using a custom Docker image and distributed training."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [configuration notebook](../../../configuration.ipynb) to:\n",
"    * install the AML SDK\n",
"    * create a workspace and its configuration file (`config.json`)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt in to diagnostics for a better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"\n",
"Initialize a [Workspace](https://review.docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture?branch=release-ignite-aml#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name,\n",
"      'Azure region: ' + ws.location,\n",
"      'Subscription id: ' + ws.subscription_id,\n",
"      'Resource group: ' + ws.resource_group, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
"    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
"    print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
"    print('Creating a new compute target...')\n",
"    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
"                                                           max_nodes=4)\n",
"\n",
"    # create the cluster\n",
"    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
"    compute_target.wait_for_completion(show_output=True)\n",
"\n",
"# use get_status() to get a detailed status for the current AmlCompute\n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload training data\n",
"For this tutorial, we will be using the MNIST dataset.\n",
"\n",
"First, let's download the dataset. We've included the `install_mnist.py` script to download the data and convert it to a CNTK-supported format. Our data files will get written to a directory named `'mnist'`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import install_mnist\n",
"\n",
"install_mnist.main('mnist')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the data accessible for remote training, you will need to upload the data from your local machine to the cloud. AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets.\n",
"\n",
"Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore, which we will then mount on the remote compute for training in the next section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = ws.get_default_datastore()\n",
"print(ds.datastore_type, ds.account_name, ds.container_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code will upload the training data to the path `./mnist` on the default datastore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.upload(src_dir='./mnist', target_path='./mnist')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's get a reference to the path on the datastore with the training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--data_dir` argument."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"path_on_datastore = 'mnist'\n",
"ds_data = ds.path(path_on_datastore)\n",
"print(ds_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model on the remote compute\n",
"Now that we have the cluster ready to go, let's run our distributed training job."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a project directory\n",
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"project_folder = './cntk-distr'\n",
"os.makedirs(project_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copy the training script `cntk_distr_mnist.py` into this project directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"shutil.copy('cntk_distr_mnist.py', project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an experiment\n",
"Create an [experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed CNTK tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment_name = 'cntk-distr'\n",
"experiment = Experiment(ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an Estimator\n",
"The AML SDK's base Estimator enables you to easily submit custom scripts for both single-node and distributed runs. You should use this generic estimator for training code that uses frameworks such as scikit-learn or CNTK, which don't have corresponding custom estimators. For more information on using the generic estimator, refer to [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.estimator import Estimator\n",
"\n",
"script_params = {\n",
"    '--num_epochs': 20,\n",
"    '--data_dir': ds_data.as_mount(),\n",
"    '--output_dir': './outputs'\n",
"}\n",
"\n",
"estimator = Estimator(source_directory=project_folder,\n",
"                      compute_target=compute_target,\n",
"                      entry_script='cntk_distr_mnist.py',\n",
"                      script_params=script_params,\n",
"                      node_count=2,\n",
"                      process_count_per_node=1,\n",
"                      distributed_backend='mpi',\n",
"                      pip_packages=['cntk-gpu==2.6'],\n",
"                      custom_docker_base_image='microsoft/mmlspark:gpu-0.12',\n",
"                      use_gpu=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We would like to train our model using a [pre-built Docker container](https://hub.docker.com/r/microsoft/mmlspark/). To do so, specify the name of the Docker image in the `custom_docker_base_image` argument. You can only provide images available in public Docker repositories such as Docker Hub using this argument. To use an image from a private Docker repository, use the constructor's `environment_definition` parameter instead; a sketch follows below. Finally, we provide the `cntk-gpu` package to `pip_packages` to install CNTK 2.6 on our custom image.\n",
"\n",
"The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to run distributed CNTK, which uses MPI, you must provide the argument `distributed_backend='mpi'`."
]
},
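{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here is a minimal sketch of what pointing the estimator at a private registry via `environment_definition` could look like. The registry address, image name, and credentials are illustrative placeholders, not values used in this tutorial:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch, not run in this tutorial: use a base image from a private Docker registry.\n",
"# The address, image name, and credentials below are illustrative placeholders.\n",
"from azureml.core.runconfig import EnvironmentDefinition\n",
"\n",
"env = EnvironmentDefinition()\n",
"env.docker.enabled = True\n",
"env.docker.base_image = 'my-training-image:latest'\n",
"env.docker.base_image_registry.address = 'myregistry.azurecr.io'\n",
"env.docker.base_image_registry.username = '<registry-username>'\n",
"env.docker.base_image_registry.password = '<registry-password>'\n",
"\n",
"# Then pass environment_definition=env to the Estimator constructor\n",
"# instead of custom_docker_base_image."
]
},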
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit job\n",
"Run your experiment by submitting your estimator object. Note that this call is asynchronous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = experiment.submit(estimator)\n",
"print(run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor your run\n",
"You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can block until the script has completed training before running more code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -1,335 +1,335 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Distributed PyTorch with Horovod\n",
"In this tutorial, you will train a PyTorch model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using distributed training via [Horovod](https://github.com/uber/horovod) across a GPU cluster."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"* Go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`\n",
"* Review the [tutorial](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) on single-node PyTorch training using Azure Machine Learning"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt in to diagnostics for a better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"\n",
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name,\n",
"      'Azure region: ' + ws.location,\n",
"      'Subscription id: ' + ws.subscription_id,\n",
"      'Resource group: ' + ws.resource_group, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates a `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
"    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
"    print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
"    print('Creating a new compute target...')\n",
"    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
"                                                           max_nodes=4)\n",
"\n",
"    # create the cluster\n",
"    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
"    compute_target.wait_for_completion(show_output=True)\n",
"\n",
"# use get_status() to get a detailed status for the current AmlCompute\n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model on the remote compute\n",
"Now that we have the AmlCompute ready to go, let's run our distributed training job."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a project directory\n",
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"project_folder = './pytorch-distr-hvd'\n",
"os.makedirs(project_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare training script\n",
"Now you will need to create your training script. In this tutorial, the script for distributed training of MNIST is already provided for you at `pytorch_horovod_mnist.py`. In practice, you should be able to take any custom PyTorch training script as is and run it with Azure ML without having to modify your code.\n",
"\n",
"However, if you would like to use Azure ML's [metric logging](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#logging) capabilities, you will have to add a small amount of Azure ML logic inside your training script. In this example, at each logging interval, we will log the loss for that minibatch to our Azure ML run.\n",
"\n",
"To do so, in `pytorch_horovod_mnist.py`, we will first access the Azure ML `Run` object within the script:\n",
"```Python\n",
"from azureml.core.run import Run\n",
"run = Run.get_context()\n",
"```\n",
"Later within the script, we log the loss metric to our run:\n",
"```Python\n",
"run.log('loss', loss.item())\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once your script is ready, copy the training script `pytorch_horovod_mnist.py` into the project directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"shutil.copy('pytorch_horovod_mnist.py', project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an experiment\n",
"Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed PyTorch tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment_name = 'pytorch-distr-hvd'\n",
"experiment = Experiment(ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a PyTorch estimator\n",
"The Azure ML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer to [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.dnn import PyTorch\n",
"\n",
"estimator = PyTorch(source_directory=project_folder,\n",
"                    compute_target=compute_target,\n",
"                    entry_script='pytorch_horovod_mnist.py',\n",
"                    node_count=2,\n",
"                    process_count_per_node=1,\n",
"                    distributed_backend='mpi',\n",
"                    use_gpu=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, PyTorch, Horovod, and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters, as sketched below."
]
},
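{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch of adding extra dependencies might look like the following; the package names here are illustrative placeholders, not requirements of this tutorial:\n",
"```Python\n",
"estimator = PyTorch(source_directory=project_folder,\n",
"                    compute_target=compute_target,\n",
"                    entry_script='pytorch_horovod_mnist.py',\n",
"                    node_count=2,\n",
"                    process_count_per_node=1,\n",
"                    distributed_backend='mpi',\n",
"                    use_gpu=True,\n",
"                    pip_packages=['pillow', 'scipy'])\n",
"```"
]
},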
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submit job\n",
    "Run your experiment by submitting your estimator object. Note that this call is asynchronous."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run = experiment.submit(estimator)\n",
    "print(run)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Monitor your run\n",
    "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. The widget automatically plots and visualizes the loss metric that we logged to the Azure ML run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.widgets import RunDetails\n",
    "\n",
    "RunDetails(run).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you can block until the script has completed training before running more code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run.wait_for_completion(show_output=True) # this provides a verbose log"
   ]
  }
 ],
 "metadata": {
  "authors": [
   {
    "name": "minxia"
   }
  ],
  "kernelspec": {
   "display_name": "Python 3.6",
   "language": "python",
   "name": "python36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "msauthor": "minxia"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@@ -1,404 +1,404 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "\n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Distributed TensorFlow with Horovod\n",
    "In this tutorial, you will train a word2vec model in TensorFlow using distributed training via [Horovod](https://github.com/uber/horovod)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n",
    "* Go through the [configuration notebook](../../../configuration.ipynb) to:\n",
    "    * install the AML SDK\n",
    "    * create a workspace and its configuration file (`config.json`)\n",
    "* Review the [tutorial](https://aka.ms/aml-notebook-hyperdrive) on single-node TensorFlow training using the SDK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check core SDK version number\n",
    "import azureml.core\n",
    "\n",
    "print(\"SDK version:\", azureml.core.VERSION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Diagnostics\n",
    "Opt in to diagnostics for better experience, quality, and security of future releases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "Diagnostics"
    ]
   },
   "outputs": [],
   "source": [
    "from azureml.telemetry import set_diagnostics_collection\n",
    "\n",
    "set_diagnostics_collection(send_diagnostics=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize workspace\n",
    "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.workspace import Workspace\n",
    "\n",
    "ws = Workspace.from_config()\n",
    "print('Workspace name: ' + ws.name, \n",
    "      'Azure region: ' + ws.location, \n",
    "      'Subscription id: ' + ws.subscription_id, \n",
    "      'Resource group: ' + ws.resource_group, sep = '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create or Attach existing AmlCompute\n",
    "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
    "\n",
    "**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
    "\n",
    "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.core.compute_target import ComputeTargetException\n",
    "\n",
    "# choose a name for your cluster\n",
    "cluster_name = \"gpucluster\"\n",
    "\n",
    "try:\n",
    "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
    "    print('Found existing compute target')\n",
    "except ComputeTargetException:\n",
    "    print('Creating a new compute target...')\n",
    "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
    "                                                           max_nodes=4)\n",
    "\n",
    "    # create the cluster\n",
    "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
    "\n",
    "    compute_target.wait_for_completion(show_output=True)\n",
    "\n",
    "# use get_status() to get a detailed status for the current cluster. \n",
    "print(compute_target.get_status().serialize())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`, as in the sketch below."
   ]
  },
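  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the CPU variant, assuming the same workspace `ws`; the cluster name `cpucluster` is just an illustrative choice:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch only: provision a CPU cluster instead of a GPU one.\n",
    "# 'cpucluster' is an illustrative name, not required by this tutorial.\n",
    "cpu_cluster_name = \"cpucluster\"\n",
    "\n",
    "try:\n",
    "    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
    "    print('Found existing compute target')\n",
    "except ComputeTargetException:\n",
    "    cpu_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
    "                                                       max_nodes=4)\n",
    "    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, cpu_config)\n",
    "    cpu_cluster.wait_for_completion(show_output=True)"
   ]
  },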
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Upload data to datastore\n",
    "To make data accessible for remote training, AML provides a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data to Azure Storage, and interact with it from your remote compute targets. \n",
    "\n",
    "If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step. For this tutorial, although you can download the data in your training script, we will demonstrate how to upload the training data to a datastore and access it during training to illustrate the datastore functionality."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, download the training data from [here](http://mattmahoney.net/dc/text8.zip) to your local machine:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import urllib\n",
    "\n",
    "os.makedirs('./data', exist_ok=True)\n",
    "download_url = 'http://mattmahoney.net/dc/text8.zip'\n",
    "urllib.request.urlretrieve(download_url, filename='./data/text8.zip')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = ws.get_default_datastore()\n",
    "print(ds.datastore_type, ds.account_name, ds.container_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Upload the contents of the data directory to the path `./data` on the default datastore."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds.upload(src_dir='data', target_path='data', overwrite=True, show_progress=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For convenience, let's get a reference to the path on the datastore with the zip file of training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--input_data` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "path_on_datastore = 'data/text8.zip'\n",
    "ds_data = ds.path(path_on_datastore)\n",
    "print(ds_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train model on the remote compute"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a project directory\n",
    "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "project_folder = './tf-distr-hvd'\n",
    "os.makedirs(project_folder, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copy the training script `tf_horovod_word2vec.py` into this project directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import shutil\n",
    "\n",
    "shutil.copy('tf_horovod_word2vec.py', project_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an experiment\n",
    "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core import Experiment\n",
    "\n",
    "experiment_name = 'tf-distr-hvd'\n",
    "experiment = Experiment(ws, name=experiment_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a TensorFlow estimator\n",
    "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer to [the how-to guide](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.train.dnn import TensorFlow\n",
    "\n",
    "script_params = {\n",
    "    '--input_data': ds_data\n",
    "}\n",
    "\n",
    "estimator = TensorFlow(source_directory=project_folder,\n",
    "                       compute_target=compute_target,\n",
    "                       script_params=script_params,\n",
    "                       entry_script='tf_horovod_word2vec.py',\n",
    "                       node_count=2,\n",
    "                       process_count_per_node=1,\n",
    "                       distributed_backend='mpi',\n",
    "                       use_gpu=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above code specifies that we will run our training script on `2` nodes, with one worker per node. To execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. With these settings, TensorFlow, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n",
    "\n",
    "Note that we passed our training data reference `ds_data` to our script's `--input_data` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore. A sketch of how the training script might consume this argument follows."
   ]
  },
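  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the receiving end, assuming the training script uses `argparse`; the actual `tf_horovod_word2vec.py` may parse its arguments differently:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch only: how a training script could read the mounted data path.\n",
    "# The real tf_horovod_word2vec.py may differ; this just illustrates the contract.\n",
    "import argparse\n",
    "\n",
    "parser = argparse.ArgumentParser()\n",
    "parser.add_argument('--input_data', type=str,\n",
    "                    help='path to the training data on the mounted datastore')\n",
    "args, _ = parser.parse_known_args()\n",
    "print('Training data located at:', args.input_data)"
   ]
  },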
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submit job\n",
    "Run your experiment by submitting your estimator object. Note that this call is asynchronous."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run = experiment.submit(estimator)\n",
    "print(run)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Monitor your run\n",
    "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.widgets import RunDetails\n",
    "RunDetails(run).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you can block until the script has completed training before running more code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run.wait_for_completion(show_output=True)"
   ]
  }
 ],
 "metadata": {
  "authors": [
   {
    "name": "roastala"
   }
  ],
  "kernelspec": {
   "display_name": "Python 3.6",
   "language": "python",
   "name": "python36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "msauthor": "minxia"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@@ -1,317 +1,317 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "\n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Distributed TensorFlow with parameter server\n",
    "In this tutorial, you will train a TensorFlow model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using native [distributed TensorFlow](https://www.tensorflow.org/deploy/distributed)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n",
    "* Go through the [configuration notebook](../../../configuration.ipynb) to:\n",
    "    * install the AML SDK\n",
    "    * create a workspace and its configuration file (`config.json`)\n",
    "* Review the [tutorial](https://aka.ms/aml-notebook-hyperdrive) on single-node TensorFlow training using the SDK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check core SDK version number\n",
    "import azureml.core\n",
    "\n",
    "print(\"SDK version:\", azureml.core.VERSION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Diagnostics\n",
    "Opt in to diagnostics for better experience, quality, and security of future releases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "Diagnostics"
    ]
   },
   "outputs": [],
   "source": [
    "from azureml.telemetry import set_diagnostics_collection\n",
    "\n",
    "set_diagnostics_collection(send_diagnostics=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize workspace\n",
    "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.workspace import Workspace\n",
    "\n",
    "ws = Workspace.from_config()\n",
    "print('Workspace name: ' + ws.name, \n",
    "      'Azure region: ' + ws.location, \n",
    "      'Subscription id: ' + ws.subscription_id, \n",
    "      'Resource group: ' + ws.resource_group, sep = '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create or Attach existing AmlCompute\n",
    "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
    "\n",
    "**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
    "\n",
    "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.core.compute_target import ComputeTargetException\n",
    "\n",
    "# choose a name for your cluster\n",
    "cluster_name = \"gpucluster\"\n",
    "\n",
    "try:\n",
    "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
    "    print('Found existing compute target.')\n",
    "except ComputeTargetException:\n",
    "    print('Creating a new compute target...')\n",
    "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
    "                                                           max_nodes=4)\n",
    "\n",
    "    # create the cluster\n",
    "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
    "\n",
    "    compute_target.wait_for_completion(show_output=True)\n",
    "\n",
    "# use get_status() to get a detailed status for the current cluster. \n",
    "print(compute_target.get_status().serialize())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train model on the remote compute\n",
    "Now that we have the cluster ready to go, let's run our distributed training job."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a project directory\n",
    "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "project_folder = './tf-distr-ps'\n",
    "os.makedirs(project_folder, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copy the training script `tf_mnist_replica.py` into this project directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import shutil\n",
    "\n",
    "shutil.copy('tf_mnist_replica.py', project_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an experiment\n",
    "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core import Experiment\n",
    "\n",
    "experiment_name = 'tf-distr-ps'\n",
    "experiment = Experiment(ws, name=experiment_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a TensorFlow estimator\n",
    "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer to [the how-to guide](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.train.dnn import TensorFlow\n",
    "\n",
    "script_params = {\n",
    "    '--num_gpus': 1,\n",
    "    '--train_steps': 500\n",
    "}\n",
    "\n",
    "estimator = TensorFlow(source_directory=project_folder,\n",
    "                       compute_target=compute_target,\n",
    "                       script_params=script_params,\n",
    "                       entry_script='tf_mnist_replica.py',\n",
    "                       node_count=2,\n",
    "                       worker_count=2,\n",
    "                       parameter_server_count=1, \n",
    "                       distributed_backend='ps',\n",
    "                       use_gpu=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. To execute a native distributed TensorFlow run, you must provide the argument `distributed_backend='ps'`. With these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters. A sketch of how a script can discover its role in the cluster follows."
   ]
  },
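  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the worker side, assuming the cluster layout is exposed to each process through the standard `TF_CONFIG` environment variable (check the distributed training documentation linked above for the exact contract); the key names below follow TensorFlow's conventional `TF_CONFIG` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch only: how a training script could read its cluster role.\n",
    "# Assumes a standard TF_CONFIG layout; the real tf_mnist_replica.py may differ.\n",
    "import json\n",
    "import os\n",
    "\n",
    "tf_config = json.loads(os.environ.get('TF_CONFIG', '{}'))\n",
    "cluster_spec = tf_config.get('cluster', {})  # addresses of worker and ps nodes\n",
    "task = tf_config.get('task', {})             # this process's role and index\n",
    "print('job name:', task.get('type'), '- task index:', task.get('index'))\n",
    "print('cluster:', cluster_spec)"
   ]
  },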
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submit job\n",
    "Run your experiment by submitting your estimator object. Note that this call is asynchronous."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run = experiment.submit(estimator)\n",
    "print(run)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Monitor your run\n",
    "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.widgets import RunDetails\n",
    "\n",
    "RunDetails(run).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you can block until the script has completed training before running more code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run.wait_for_completion(show_output=True) # this provides a verbose log"
   ]
  }
 ],
 "metadata": {
  "authors": [
   {
    "name": "minxia"
   }
  ],
  "kernelspec": {
   "display_name": "Python 3.6",
   "language": "python",
   "name": "python36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "msauthor": "minxia"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@@ -1,267 +1,267 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "\n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Export Run History as TensorBoard logs\n",
    "\n",
    "1. Run some training and log some metrics into Run History\n",
    "2. Export the run history to some directory as TensorBoard logs\n",
    "3. Launch a local TensorBoard to view the run history\n",
    "\n",
    "Steps 2 and 3 are sketched below; the rest of this notebook walks through the full flow."
   ]
  },
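  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of steps 2 and 3, assuming the `azureml-contrib-tensorboard` package installed below exposes an `export_to_tensorboard` function and a `Tensorboard` helper (consult the package documentation for the exact API), and using the `root_run` handle that the following cells create; `./logs` is just an illustrative directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch only: assumes azureml-contrib-tensorboard (installed below) exposes\n",
    "# export_to_tensorboard and a Tensorboard helper, and that root_run is the run\n",
    "# handle created later in this notebook. './logs' is an illustrative directory.\n",
    "from azureml.contrib.tensorboard.export import export_to_tensorboard\n",
    "from azureml.contrib.tensorboard import Tensorboard\n",
    "\n",
    "logdir = './logs'\n",
    "export_to_tensorboard(root_run, logdir)  # step 2: run history -> TensorBoard event files\n",
    "\n",
    "tb = Tensorboard([], local_root=logdir)  # step 3: serve the exported logs locally\n",
    "tb.start()\n",
    "# ...browse the printed URL, then shut the server down:\n",
    "tb.stop()"
   ]
  },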
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites\n",
|
"## Prerequisites\n",
|
||||||
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
|
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
|
||||||
"* Go through the [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook to:\n",
|
"* Go through the [configuration notebook](../../../configuration.ipynb) notebook to:\n",
|
||||||
" * install the AML SDK\n",
|
" * install the AML SDK\n",
|
||||||
" * create a workspace and its configuration file (`config.json`)"
|
" * create a workspace and its configuration file (`config.json`)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Install the Azure ML TensorBoard integration package if you haven't already."
|
"Install the Azure ML TensorBoard integration package if you haven't already."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"!pip install azureml-contrib-tensorboard"
|
"!pip install azureml-contrib-tensorboard"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Initialize Workspace\n",
|
"## Initialize Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Initialize a workspace object from persisted configuration."
|
"Initialize a workspace object from persisted configuration."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace, Run, Experiment\n",
|
"from azureml.core import Workspace, Run, Experiment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print('Workspace name: ' + ws.name, \n",
|
"print('Workspace name: ' + ws.name, \n",
|
||||||
" 'Azure region: ' + ws.location, \n",
|
" 'Azure region: ' + ws.location, \n",
|
||||||
" 'Subscription id: ' + ws.subscription_id, \n",
|
" 'Subscription id: ' + ws.subscription_id, \n",
|
||||||
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Set experiment name and start the run"
|
"## Set experiment name and start the run"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"experiment_name = 'export-to-tensorboard'\n",
|
"experiment_name = 'export-to-tensorboard'\n",
|
||||||
"exp = Experiment(ws, experiment_name)\n",
|
"exp = Experiment(ws, experiment_name)\n",
|
||||||
"root_run = exp.start_logging()"
|
"root_run = exp.start_logging()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# load diabetes dataset, a well-known built-in small dataset that comes with scikit-learn\n",
|
"# load diabetes dataset, a well-known built-in small dataset that comes with scikit-learn\n",
|
||||||
"from sklearn.datasets import load_diabetes\n",
|
"from sklearn.datasets import load_diabetes\n",
|
||||||
"from sklearn.linear_model import Ridge\n",
|
"from sklearn.linear_model import Ridge\n",
|
||||||
"from sklearn.metrics import mean_squared_error\n",
|
"from sklearn.metrics import mean_squared_error\n",
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
"\n",
|
"\n",
|
||||||
"X, y = load_diabetes(return_X_y=True)\n",
|
"X, y = load_diabetes(return_X_y=True)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
|
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
|
||||||
"\n",
|
"\n",
|
||||||
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
|
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
|
||||||
"data = {\n",
|
"data = {\n",
|
||||||
" \"train\":{\"x\":x_train, \"y\":y_train}, \n",
|
" \"train\":{\"x\":x_train, \"y\":y_train}, \n",
|
||||||
" \"test\":{\"x\":x_test, \"y\":y_test}\n",
|
" \"test\":{\"x\":x_test, \"y\":y_test}\n",
|
||||||
"}"
|
"}"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Example experiment\n",
|
"# Example experiment\n",
|
||||||
"from tqdm import tqdm\n",
|
"from tqdm import tqdm\n",
|
||||||
"\n",
|
"\n",
|
||||||
"alphas = [.1, .2, .3, .4, .5, .6 , .7]\n",
|
"alphas = [.1, .2, .3, .4, .5, .6 , .7]\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# try a bunch of alpha values in a Linear Regression (Ridge) model\n",
|
"# try a bunch of alpha values in a Linear Regression (Ridge) model\n",
|
||||||
"for alpha in tqdm(alphas):\n",
|
"for alpha in tqdm(alphas):\n",
|
||||||
" # create a bunch of child runs\n",
|
" # create a bunch of child runs\n",
|
||||||
" with root_run.child_run(\"alpha\" + str(alpha)) as run:\n",
|
" with root_run.child_run(\"alpha\" + str(alpha)) as run:\n",
|
||||||
" # More data science stuff\n",
|
" # More data science stuff\n",
|
||||||
" reg = Ridge(alpha=alpha)\n",
|
" reg = Ridge(alpha=alpha)\n",
|
||||||
" reg.fit(data[\"train\"][\"x\"], data[\"train\"][\"y\"])\n",
|
" reg.fit(data[\"train\"][\"x\"], data[\"train\"][\"y\"])\n",
|
||||||
" # TODO save model\n",
|
" # TODO save model\n",
|
||||||
" preds = reg.predict(data[\"test\"][\"x\"])\n",
|
" preds = reg.predict(data[\"test\"][\"x\"])\n",
|
||||||
" mse = mean_squared_error(preds, data[\"test\"][\"y\"])\n",
|
" mse = mean_squared_error(preds, data[\"test\"][\"y\"])\n",
|
||||||
" # End train and eval\n",
|
" # End train and eval\n",
|
||||||
"\n",
|
"\n",
|
||||||
" # log alpha, mean_squared_error and feature names in run history\n",
|
" # log alpha, mean_squared_error and feature names in run history\n",
|
||||||
" root_run.log(\"alpha\", alpha)\n",
|
" root_run.log(\"alpha\", alpha)\n",
|
||||||
" root_run.log(\"mse\", mse)"
|
" root_run.log(\"mse\", mse)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Export Run History to Tensorboard logs"
|
"## Export Run History to Tensorboard logs"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Export Run History to Tensorboard logs\n",
|
"# Export Run History to Tensorboard logs\n",
|
||||||
"from azureml.contrib.tensorboard.export import export_to_tensorboard\n",
|
"from azureml.contrib.tensorboard.export import export_to_tensorboard\n",
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"import tensorflow as tf\n",
|
"import tensorflow as tf\n",
|
||||||
"\n",
|
"\n",
|
||||||
"logdir = 'exportedTBlogs'\n",
|
"logdir = 'exportedTBlogs'\n",
|
||||||
"log_path = os.path.join(os.getcwd(), logdir)\n",
|
"log_path = os.path.join(os.getcwd(), logdir)\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" os.stat(log_path)\n",
|
" os.stat(log_path)\n",
|
||||||
"except os.error:\n",
|
"except os.error:\n",
|
||||||
" os.mkdir(log_path)\n",
|
" os.mkdir(log_path)\n",
|
||||||
"print(logdir)\n",
|
"print(logdir)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# export run history for the project\n",
|
"# export run history for the project\n",
|
||||||
"export_to_tensorboard(root_run, logdir)\n",
|
"export_to_tensorboard(root_run, logdir)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# or export a particular run\n",
|
"# or export a particular run\n",
|
||||||
"# export_to_tensorboard(run, logdir)"
|
"# export_to_tensorboard(run, logdir)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"root_run.complete()"
|
"root_run.complete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Start Tensorboard\n",
|
"## Start Tensorboard\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Or you can start the Tensorboard outside this notebook to view the result"
|
"Or you can start the Tensorboard outside this notebook to view the result"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.contrib.tensorboard import Tensorboard\n",
|
"from azureml.contrib.tensorboard import Tensorboard\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n",
|
"# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n",
|
||||||
"tb = Tensorboard([], local_root=logdir, port=6006)\n",
|
"tb = Tensorboard([], local_root=logdir, port=6006)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# If successful, start() returns a string with the URI of the instance.\n",
|
"# If successful, start() returns a string with the URI of the instance.\n",
|
||||||
"tb.start()"
|
"tb.start()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
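A minimal sketch of the out-of-notebook alternative mentioned above, assuming the `tensorboard` CLI is available on PATH in your environment (the port and the use of `subprocess` here are illustrative):

```python
import subprocess

# Launch TensorBoard against the exported logs; equivalent to running
#   tensorboard --logdir exportedTBlogs --port 6006
# from a terminal.
proc = subprocess.Popen(["tensorboard", "--logdir", "exportedTBlogs", "--port", "6006"])

# Browse http://localhost:6006, then shut the process down when finished:
# proc.terminate()
```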
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Stop Tensorboard\n",
|
"## Stop Tensorboard\n",
|
||||||
"\n",
|
"\n",
|
||||||
"When you're done, make sure to call the `stop()` method of the Tensorboard object."
|
"When you're done, make sure to call the `stop()` method of the Tensorboard object."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"tb.stop()"
|
"tb.stop()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "roastala"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "roastala"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.5"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.5"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,328 +1,328 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# 06. Logging APIs\n",
|
"# 06. Logging APIs\n",
|
||||||
"This notebook showcase various ways to use the Azure Machine Learning service run logging APIs, and view the results in the Azure portal."
|
"This notebook showcase various ways to use the Azure Machine Learning service run logging APIs, and view the results in the Azure portal."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites\n",
|
"## Prerequisites\n",
|
||||||
"Make sure you go through the [00. Installation and Configuration](../../00.configuration.ipynb) Notebook first if you haven't. Also make sure you have tqdm and matplotlib installed in the current kernel.\n",
|
"Make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't. Also make sure you have tqdm and matplotlib installed in the current kernel.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"```\n",
|
"```\n",
|
||||||
"(myenv) $ conda install -y tqdm matplotlib\n",
|
"(myenv) $ conda install -y tqdm matplotlib\n",
|
||||||
"```"
|
"```"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Validate Azure ML SDK installation and get version number for debugging purposes"
|
"## Validate Azure ML SDK installation and get version number for debugging purposes"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"install"
|
"install"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Experiment, Run, Workspace\n",
|
"from azureml.core import Experiment, Run, Workspace\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Initialize Workspace\n",
|
"## Initialize Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Initialize a workspace object from persisted configuration."
|
"Initialize a workspace object from persisted configuration."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create workspace"
|
"create workspace"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print('Workspace name: ' + ws.name, \n",
|
"print('Workspace name: ' + ws.name, \n",
|
||||||
" 'Azure region: ' + ws.location, \n",
|
" 'Azure region: ' + ws.location, \n",
|
||||||
" 'Subscription id: ' + ws.subscription_id, \n",
|
" 'Subscription id: ' + ws.subscription_id, \n",
|
||||||
" 'Resource group: ' + ws.resource_group, sep='\\n')"
|
" 'Resource group: ' + ws.resource_group, sep='\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Set experiment\n",
|
"## Set experiment\n",
|
||||||
"Create a new experiment (or get the one with such name)."
|
"Create a new experiment (or get the one with such name)."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"exp = Experiment(workspace=ws, name='logging-api-test')"
|
"exp = Experiment(workspace=ws, name='logging-api-test')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Log metrics\n",
|
"## Log metrics\n",
|
||||||
"We will start a run, and use the various logging APIs to record different types of metrics during the run."
|
"We will start a run, and use the various logging APIs to record different types of metrics during the run."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from tqdm import tqdm\n",
|
"from tqdm import tqdm\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# start logging for the run\n",
|
"# start logging for the run\n",
|
||||||
"run = exp.start_logging()\n",
|
"run = exp.start_logging()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# log a string value\n",
|
"# log a string value\n",
|
||||||
"run.log(name='Name', value='Logging API run')\n",
|
"run.log(name='Name', value='Logging API run')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# log a numerical value\n",
|
"# log a numerical value\n",
|
||||||
"run.log(name='Magic Number', value=42)\n",
|
"run.log(name='Magic Number', value=42)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Log a list of values. Note this will generate a single-variable line chart.\n",
|
"# Log a list of values. Note this will generate a single-variable line chart.\n",
|
||||||
"run.log_list(name='Fibonacci', value=[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89])\n",
|
"run.log_list(name='Fibonacci', value=[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# create a dictionary to hold a table of values\n",
|
"# create a dictionary to hold a table of values\n",
|
||||||
"sines = {}\n",
|
"sines = {}\n",
|
||||||
"sines['angle'] = []\n",
|
"sines['angle'] = []\n",
|
||||||
"sines['sine'] = []\n",
|
"sines['sine'] = []\n",
|
||||||
"\n",
|
"\n",
|
||||||
"for i in tqdm(range(-10, 10)):\n",
|
"for i in tqdm(range(-10, 10)):\n",
|
||||||
" # log a metric value repeatedly, this will generate a single-variable line chart.\n",
|
" # log a metric value repeatedly, this will generate a single-variable line chart.\n",
|
||||||
" run.log(name='Sigmoid', value=1 / (1 + np.exp(-i)))\n",
|
" run.log(name='Sigmoid', value=1 / (1 + np.exp(-i)))\n",
|
||||||
" angle = i / 2.0\n",
|
" angle = i / 2.0\n",
|
||||||
" \n",
|
" \n",
|
||||||
" # log a 2 (or more) values as a metric repeatedly. This will generate a 2-variable line chart if you have 2 numerical columns.\n",
|
" # log a 2 (or more) values as a metric repeatedly. This will generate a 2-variable line chart if you have 2 numerical columns.\n",
|
||||||
" run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle))\n",
|
" run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle))\n",
|
||||||
" \n",
|
" \n",
|
||||||
" sines['angle'].append(angle)\n",
|
" sines['angle'].append(angle)\n",
|
||||||
" sines['sine'].append(np.sin(angle))\n",
|
" sines['sine'].append(np.sin(angle))\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# log a dictionary as a table, this will generate a 2-variable chart if you have 2 numerical columns\n",
|
"# log a dictionary as a table, this will generate a 2-variable chart if you have 2 numerical columns\n",
|
||||||
"run.log_table(name='Sine Wave', value=sines)\n",
|
"run.log_table(name='Sine Wave', value=sines)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"run.complete()"
|
"run.complete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Even after the run is marked completed, you can still log things."
|
"Even after the run is marked completed, you can still log things."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
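A minimal sketch of that claim, continuing with the `run` object from the cell above (the metric name is illustrative):

```python
# The run is already marked complete, but log() still records
# a new value into its run history.
run.log(name='Post-completion value', value=24)
```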
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Log an image\n",
|
"## Log an image\n",
|
||||||
"This is how to log a _matplotlib_ pyplot object."
|
"This is how to log a _matplotlib_ pyplot object."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%matplotlib inline\n",
|
"%matplotlib inline\n",
|
||||||
"import matplotlib.pyplot as plt\n",
|
"import matplotlib.pyplot as plt\n",
|
||||||
"angle = np.linspace(-3, 3, 50)\n",
|
"angle = np.linspace(-3, 3, 50)\n",
|
||||||
"plt.plot(angle, np.tanh(angle), label='tanh')\n",
|
"plt.plot(angle, np.tanh(angle), label='tanh')\n",
|
||||||
"plt.legend(fontsize=12)\n",
|
"plt.legend(fontsize=12)\n",
|
||||||
"plt.title('Hyperbolic Tangent', fontsize=16)\n",
|
"plt.title('Hyperbolic Tangent', fontsize=16)\n",
|
||||||
"plt.grid(True)\n",
|
"plt.grid(True)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"run.log_image(name='Hyperbolic Tangent', plot=plt)"
|
"run.log_image(name='Hyperbolic Tangent', plot=plt)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Upload a file"
|
"## Upload a file"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can also upload an abitrary file. First, let's create a dummy file locally."
|
"You can also upload an abitrary file. First, let's create a dummy file locally."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile myfile.txt\n",
|
"%%writefile myfile.txt\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This is a dummy file."
|
"This is a dummy file."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Now let's upload this file into the run record as a run artifact, and display the properties after the upload."
|
"Now let's upload this file into the run record as a run artifact, and display the properties after the upload."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"props = run.upload_file(name='myfile_in_the_cloud.txt', path_or_stream='./myfile.txt')\n",
|
"props = run.upload_file(name='myfile_in_the_cloud.txt', path_or_stream='./myfile.txt')\n",
|
||||||
"props.serialize()"
|
"props.serialize()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Examine the run"
|
"## Examine the run"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Now let's take a look at the run detail page in Azure portal. Make sure you checkout the various charts and plots generated/uploaded."
|
"Now let's take a look at the run detail page in Azure portal. Make sure you checkout the various charts and plots generated/uploaded."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run"
|
"run"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can get all the metrics in that run back."
|
"You can get all the metrics in that run back."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run.get_metrics()"
|
"run.get_metrics()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
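A minimal sketch of what comes back, based on the values logged earlier in this notebook: `get_metrics()` returns a dictionary, and metrics logged repeatedly come back as lists.

```python
metrics = run.get_metrics()

# Logged once above, so it comes back as a scalar.
print(metrics['Magic Number'])      # 42

# Logged once per loop iteration over range(-10, 10), so a list of 20 values.
print(len(metrics['Sigmoid']))      # 20
```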
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can also see the files uploaded for this run."
|
"You can also see the files uploaded for this run."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run.get_file_names()"
|
"run.get_file_names()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can also download all the files locally."
|
"You can also download all the files locally."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"os.makedirs('files', exist_ok=True)\n",
|
"os.makedirs('files', exist_ok=True)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"for f in run.get_file_names():\n",
|
"for f in run.get_file_names():\n",
|
||||||
" dest = os.path.join('files', f.split('/')[-1])\n",
|
" dest = os.path.join('files', f.split('/')[-1])\n",
|
||||||
" print('Downloading file {} to {}...'.format(f, dest))\n",
|
" print('Downloading file {} to {}...'.format(f, dest))\n",
|
||||||
" run.download_file(f, dest) "
|
" run.download_file(f, dest) "
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "haining"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "haining"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,478 +1,478 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# 02. Train locally\n",
|
"# 02. Train locally\n",
|
||||||
"* Create or load workspace.\n",
|
"* Create or load workspace.\n",
|
||||||
"* Create scripts locally.\n",
|
"* Create scripts locally.\n",
|
||||||
"* Create `train.py` in a folder, along with a `my.lib` file.\n",
|
"* Create `train.py` in a folder, along with a `my.lib` file.\n",
|
||||||
"* Configure & execute a local run in a user-managed Python environment.\n",
|
"* Configure & execute a local run in a user-managed Python environment.\n",
|
||||||
"* Configure & execute a local run in a system-managed Python environment.\n",
|
"* Configure & execute a local run in a system-managed Python environment.\n",
|
||||||
"* Configure & execute a local run in a Docker environment.\n",
|
"* Configure & execute a local run in a Docker environment.\n",
|
||||||
"* Query run metrics to find the best model\n",
|
"* Query run metrics to find the best model\n",
|
||||||
"* Register model for operationalization."
|
"* Register model for operationalization."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites\n",
|
"## Prerequisites\n",
|
||||||
"Make sure you go through the [00. Installation and Configuration](00.configuration.ipynb) Notebook first if you haven't."
|
"Make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Initialize Workspace\n",
|
"## Initialize Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Initialize a workspace object from persisted configuration."
|
"Initialize a workspace object from persisted configuration."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create An Experiment\n",
|
"## Create An Experiment\n",
|
||||||
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
|
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Experiment\n",
|
"from azureml.core import Experiment\n",
|
||||||
"experiment_name = 'train-on-local'\n",
|
"experiment_name = 'train-on-local'\n",
|
||||||
"exp = Experiment(workspace=ws, name=experiment_name)"
|
"exp = Experiment(workspace=ws, name=experiment_name)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## View `train.py`\n",
|
"## View `train.py`\n",
|
||||||
"\n",
|
"\n",
|
||||||
"`train.py` is already created for you."
|
"`train.py` is already created for you."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"with open('./train.py', 'r') as f:\n",
|
"with open('./train.py', 'r') as f:\n",
|
||||||
" print(f.read())"
|
" print(f.read())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Note `train.py` also references a `mylib.py` file."
|
"Note `train.py` also references a `mylib.py` file."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"with open('./mylib.py', 'r') as f:\n",
|
"with open('./mylib.py', 'r') as f:\n",
|
||||||
" print(f.read())"
|
" print(f.read())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Configure & Run\n",
|
"## Configure & Run\n",
|
||||||
"### User-managed environment\n",
|
"### User-managed environment\n",
|
||||||
"Below, we use a user-managed run, which means you are responsible to ensure all the necessary packages are available in the Python environment you choose to run the script."
|
"Below, we use a user-managed run, which means you are responsible to ensure all the necessary packages are available in the Python environment you choose to run the script."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.runconfig import RunConfiguration\n",
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Editing a run configuration property on-fly.\n",
|
"# Editing a run configuration property on-fly.\n",
|
||||||
"run_config_user_managed = RunConfiguration()\n",
|
"run_config_user_managed = RunConfiguration()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"run_config_user_managed.environment.python.user_managed_dependencies = True\n",
|
"run_config_user_managed.environment.python.user_managed_dependencies = True\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# You can choose a specific Python environment by pointing to a Python path \n",
|
"# You can choose a specific Python environment by pointing to a Python path \n",
|
||||||
"#run_config.environment.python.interpreter_path = '/home/johndoe/miniconda3/envs/sdk2/bin/python'"
|
"#run_config.environment.python.interpreter_path = '/home/johndoe/miniconda3/envs/sdk2/bin/python'"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Submit script to run in the user-managed environment\n",
|
"#### Submit script to run in the user-managed environment\n",
|
||||||
"Note whole script folder is submitted for execution, including the `mylib.py` file."
|
"Note whole script folder is submitted for execution, including the `mylib.py` file."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import ScriptRunConfig\n",
|
"from azureml.core import ScriptRunConfig\n",
|
||||||
"\n",
|
"\n",
|
||||||
"src = ScriptRunConfig(source_directory='./', script='train.py', run_config=run_config_user_managed)\n",
|
"src = ScriptRunConfig(source_directory='./', script='train.py', run_config=run_config_user_managed)\n",
|
||||||
"run = exp.submit(src)"
|
"run = exp.submit(src)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Get run history details"
|
"#### Get run history details"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run"
|
"run"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Block to wait till run finishes."
|
"Block to wait till run finishes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run.wait_for_completion(show_output=True)"
|
"run.wait_for_completion(show_output=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### System-managed environment\n",
|
"### System-managed environment\n",
|
||||||
"You can also ask the system to build a new conda environment and execute your scripts in it. The environment is built once and will be reused in subsequent executions as long as the conda dependencies remain unchanged. "
|
"You can also ask the system to build a new conda environment and execute your scripts in it. The environment is built once and will be reused in subsequent executions as long as the conda dependencies remain unchanged. "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.runconfig import RunConfiguration\n",
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
"\n",
|
"\n",
|
||||||
"run_config_system_managed = RunConfiguration()\n",
|
"run_config_system_managed = RunConfiguration()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"run_config_system_managed.environment.python.user_managed_dependencies = False\n",
|
"run_config_system_managed.environment.python.user_managed_dependencies = False\n",
|
||||||
"run_config_system_managed.auto_prepare_environment = True\n",
|
"run_config_system_managed.auto_prepare_environment = True\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Specify conda dependencies with scikit-learn\n",
|
"# Specify conda dependencies with scikit-learn\n",
|
||||||
"cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
|
"cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
|
||||||
"run_config_system_managed.environment.python.conda_dependencies = cd"
|
"run_config_system_managed.environment.python.conda_dependencies = cd"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Submit script to run in the system-managed environment\n",
|
"#### Submit script to run in the system-managed environment\n",
|
||||||
"A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 mninutes. But this conda environment is reused so long as you don't change the conda dependencies."
|
"A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 mninutes. But this conda environment is reused so long as you don't change the conda dependencies."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_system_managed)\n",
|
"src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_system_managed)\n",
|
||||||
"run = exp.submit(src)"
|
"run = exp.submit(src)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Get run history details"
|
"#### Get run history details"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run"
|
"run"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Block and wait till run finishes."
|
"Block and wait till run finishes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run.wait_for_completion(show_output = True)"
|
"run.wait_for_completion(show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Docker-based execution\n",
|
"### Docker-based execution\n",
|
||||||
"**IMPORTANT**: You must have Docker engine installed locally in order to use this execution mode. If your kernel is already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.\n",
|
"**IMPORTANT**: You must have Docker engine installed locally in order to use this execution mode. If your kernel is already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.\n",
|
||||||
"NOTE: The GPU base image must be used on Microsoft Azure Services only such as ACI, AML Compute, Azure VMs, and AKS.\n",
|
"NOTE: The GPU base image must be used on Microsoft Azure Services only such as ACI, AML Compute, Azure VMs, and AKS.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can also ask the system to pull down a Docker image and execute your scripts in it."
|
"You can also ask the system to pull down a Docker image and execute your scripts in it."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run_config_docker = RunConfiguration()\n",
|
"run_config_docker = RunConfiguration()\n",
|
||||||
"run_config_docker.environment.python.user_managed_dependencies = False\n",
|
"run_config_docker.environment.python.user_managed_dependencies = False\n",
|
||||||
"run_config_docker.auto_prepare_environment = True\n",
|
"run_config_docker.auto_prepare_environment = True\n",
|
||||||
"run_config_docker.environment.docker.enabled = True\n",
|
"run_config_docker.environment.docker.enabled = True\n",
|
||||||
"run_config_docker.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
"run_config_docker.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Specify conda dependencies with scikit-learn\n",
|
"# Specify conda dependencies with scikit-learn\n",
|
||||||
"cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
|
"cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
|
||||||
"run_config_docker.environment.python.conda_dependencies = cd\n",
|
"run_config_docker.environment.python.conda_dependencies = cd\n",
|
||||||
"\n",
|
"\n",
|
||||||
"src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_docker)"
|
"src = ScriptRunConfig(source_directory=\"./\", script='train.py', run_config=run_config_docker)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Submit script to run in the system-managed environment\n",
|
"Submit script to run in the system-managed environment\n",
|
||||||
"A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 mninutes. But this conda environment is reused so long as you don't change the conda dependencies.\n",
|
"A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 mninutes. But this conda environment is reused so long as you don't change the conda dependencies.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n"
|
"\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import subprocess\n",
|
"import subprocess\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Check if Docker is installed and Linux containers are enables\n",
|
"# Check if Docker is installed and Linux containers are enables\n",
|
||||||
"if subprocess.run(\"docker -v\", shell=True) == 0:\n",
|
"if subprocess.run(\"docker -v\", shell=True) == 0:\n",
|
||||||
" out = subprocess.check_output(\"docker system info\", shell=True, encoding=\"ascii\").split(\"\\n\")\n",
|
" out = subprocess.check_output(\"docker system info\", shell=True, encoding=\"ascii\").split(\"\\n\")\n",
|
||||||
" if not \"OSType: linux\" in out:\n",
|
" if not \"OSType: linux\" in out:\n",
|
||||||
" print(\"Switch Docker engine to use Linux containers.\")\n",
|
" print(\"Switch Docker engine to use Linux containers.\")\n",
|
||||||
" else:\n",
|
" else:\n",
|
||||||
" run = exp.submit(src)\n",
|
" run = exp.submit(src)\n",
|
||||||
"else:\n",
|
"else:\n",
|
||||||
" print(\"Docker engine not installed.\")"
|
" print(\"Docker engine not installed.\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#Get run history details\n",
|
"#Get run history details\n",
|
||||||
"run"
|
"run"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run.wait_for_completion(show_output=True)"
|
"run.wait_for_completion(show_output=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Query run metrics"
|
"## Query run metrics"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"query history",
|
"query history",
|
||||||
"get metrics"
|
"get metrics"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# get all metris logged in the run\n",
|
"# get all metris logged in the run\n",
|
||||||
"run.get_metrics()\n",
|
"run.get_metrics()\n",
|
||||||
"metrics = run.get_metrics()"
|
"metrics = run.get_metrics()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Let's find the model that has the lowest MSE value logged."
|
"Let's find the model that has the lowest MSE value logged."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"\n",
|
"\n",
|
||||||
"best_alpha = metrics['alpha'][np.argmin(metrics['mse'])]\n",
|
"best_alpha = metrics['alpha'][np.argmin(metrics['mse'])]\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n",
|
"print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n",
|
||||||
" min(metrics['mse']), \n",
|
" min(metrics['mse']), \n",
|
||||||
" best_alpha\n",
|
" best_alpha\n",
|
||||||
"))"
|
"))"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can also list all the files that are associated with this run record"
|
"You can also list all the files that are associated with this run record"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run.get_file_names()"
|
"run.get_file_names()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We know the model `ridge_0.40.pkl` is the best performing model from the eariler queries. So let's register it with the workspace."
|
"We know the model `ridge_0.40.pkl` is the best performing model from the eariler queries. So let's register it with the workspace."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# supply a model name, and the full path to the serialized model file.\n",
|
"# supply a model name, and the full path to the serialized model file.\n",
|
||||||
"model = run.register_model(model_name='best_ridge_model', model_path='./outputs/ridge_0.40.pkl')"
|
"model = run.register_model(model_name='best_ridge_model', model_path='./outputs/ridge_0.40.pkl')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"print(model.name, model.version, model.url)"
|
"print(model.name, model.version, model.url)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
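A minimal sketch of retrieving the registered model later, assuming the same workspace object `ws` and the model name used above (the `target_dir` value is illustrative):

```python
from azureml.core.model import Model

# Look up the latest registered version by name and download a local copy.
fetched = Model(workspace=ws, name='best_ridge_model')
local_path = fetched.download(target_dir='.', exist_ok=True)
print(fetched.name, fetched.version, local_path)
```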
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Now you can deploy this model following the example in the 01 notebook."
|
"Now you can deploy this model following the example in the 01 notebook."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "roastala"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "roastala"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
}
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
6
how-to-use-azureml/training/train-on-remote-vm/train2.py
Normal file
6
how-to-use-azureml/training/train-on-remote-vm/train2.py
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
# Copyright (c) Microsoft. All rights reserved.
|
||||||
|
# Licensed under the MIT license.
|
||||||
|
|
||||||
|
print('####################################')
|
||||||
|
print('Hello World (without Azure ML SDK)!')
|
||||||
|
print('####################################')
|
||||||