Compare commits
1 commit: update-spa ... vizhur/aut (e792ba8278)

NBSETUP.md
# Set up your notebook environment for Azure Machine Learning

To run the notebooks in this repository, use one of the following options.
## **Option 1: Use Azure Notebooks**

Azure Notebooks is a hosted Jupyter-based notebook service in the Azure cloud. The Azure Machine Learning Python SDK is already pre-installed in the Azure Notebooks `Python 3.6` kernel.

1. [Import sample notebooks](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks.
1. Follow the instructions in the [Configuration](configuration.ipynb) notebook to create and connect to a workspace.
1. Open one of the sample notebooks.

**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook, by choosing Kernel > Change Kernel > Python 3.6 from the menus.
## **Option 2: Use your own notebook server**

### Quick installation

We recommend you create a Python virtual environment ([Miniconda](https://conda.io/miniconda.html) preferred, but [virtualenv](https://virtualenv.pypa.io/en/latest/) works too) and install the SDK in it.
```sh
# install just the base SDK
pip install azureml-sdk

# clone the sample repository
git clone https://github.com/Azure/MachineLearningNotebooks.git

# the steps below are optional

# install the base SDK, Jupyter notebook server and tensorboard
pip install azureml-sdk[notebooks,tensorboard]

# install the model explainability component
pip install azureml-sdk[explain]

# install automated ml components
pip install azureml-sdk[automl]

# install experimental features (not ready for production use)
pip install azureml-sdk[contrib]
```

Note that the _extras_ (the keywords inside the square brackets) can be combined. For example:

```sh
# install base SDK, Jupyter notebook and automated ml components
pip install azureml-sdk[notebooks,automl]
```
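On shells such as zsh, unquoted square brackets are treated as glob patterns, so it is safest to quote the requirement. As a quick illustration of how the extras compose (a hypothetical helper for demonstration, not part of the SDK):

```python
def pip_install_command(extras=None):
    """Build a pip install command for azureml-sdk with optional extras."""
    package = "azureml-sdk"
    if extras:
        # extras are just a comma-separated list inside square brackets
        package += "[" + ",".join(extras) + "]"
    # quote the requirement so shells like zsh don't glob-expand the brackets
    return f'pip install "{package}"'

print(pip_install_command(["notebooks", "automl"]))
# pip install "azureml-sdk[notebooks,automl]"
```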
### Full instructions

[Install the Azure Machine Learning SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)

Please make sure you start with the [Configuration](configuration.ipynb) notebook to create and connect to a workspace.
### Video walkthrough

[!VIDEO https://youtu.be/VIsXeTuW3FU]
## **Option 3: Use Docker**

You need to have the Docker engine installed locally and running. Open a command-line window and type the following commands.

__Note:__ We use version `1.0.10` below as an example, but you can replace it with any available version number you like.
```sh
# clone the sample repository
git clone https://github.com/Azure/MachineLearningNotebooks.git

# change the current directory to the folder
# where the Dockerfile of the specific SDK version is located
cd MachineLearningNotebooks/Dockerfiles/1.0.10

# build a Docker image with a name (azuremlsdk for example)
# and a version number tag (1.0.10 for example);
# this can take several minutes depending on your computer speed and network bandwidth
docker build . -t azuremlsdk:1.0.10

# launch the built Docker container, which also automatically starts
# a Jupyter server instance listening on port 8887 of the host machine
docker run -it -p 8887:8887 azuremlsdk:1.0.10
```

Now you can point your browser to http://localhost:8887. We recommend that you start from the `configuration.ipynb` notebook at the root directory.
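Before opening the browser, you can confirm that the Jupyter server inside the container is actually listening; a minimal standard-library sketch (the port number matches the `-p 8887:8887` mapping above):

```python
import socket

def port_open(host: str = "localhost", port: int = 8887, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, unreachable, or timed out
        return False

if port_open():
    print("Jupyter is reachable at http://localhost:8887")
else:
    print("Nothing is listening on port 8887; is the container running?")
```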
If you need additional Azure ML SDK components, you can either modify the Docker files to add extra steps before you build the Docker images, or install the components from the command line in the live container after you build the Docker image. For example:
```sh
# install the core SDK and automated ml components
pip install azureml-sdk[automl]

# install the core SDK and the model explainability component
pip install azureml-sdk[explain]

# install the core SDK and experimental components
pip install azureml-sdk[contrib]
```

README.md
# Azure Machine Learning service example notebooks

This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK, which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK gives you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.


## Quick installation

```sh
pip install azureml-sdk
```

Read more detailed instructions on [how to set up your environment](./NBSETUP.md) using the Azure Notebook service, your own Jupyter notebook server, or Docker.
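To confirm the install worked without launching a notebook, you can probe for the SDK from plain Python; a small standard-library sketch (assuming only that `azureml.core` is the SDK's top-level import, as the configuration notebook uses):

```python
import importlib.util

def sdk_installed(module: str = "azureml.core") -> bool:
    """Return True if the module can be found in the current environment."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # raised when a parent package (e.g. azureml) is not installed at all
        return False

if sdk_installed():
    print("azureml-sdk is installed")
else:
    print("azureml-sdk not found; run: pip install azureml-sdk")
```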
## How to navigate and use the example notebooks?

If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.

If you want to...

* ...try out and explore Azure ML, start with the image classification tutorials: [Part 1 (Training)](./tutorials/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/img-classification-part2-deploy.ipynb).
* ...prepare your data and do automated machine learning, start with the regression tutorials: [Part 1 (Data Prep)](./tutorials/regression-part1-data-prep.ipynb) and [Part 2 (Automated ML)](./tutorials/regression-part2-automated-ml.ipynb).
* ...learn about experimentation and tracking run history, first [train within a notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on a remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
* ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
* ...deploy models as a realtime scoring service, first learn the basics by [training within a notebook and deploying to Azure Container Instances](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [deploy models to production on an Azure Kubernetes Service cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
* ...deploy models as a batch scoring service, first [train a model within a notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
* ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) and [model data collection](./how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb).
## Tutorials

The [Tutorials](./tutorials) folder contains notebooks for the tutorials described in the [Azure Machine Learning documentation](https://aka.ms/aml-docs).
## How to use Azure ML

The [How to use Azure ML](./how-to-use-azureml) folder contains specific examples demonstrating the features of the Azure Machine Learning SDK.

- [Training](./how-to-use-azureml/training) - Examples of how to build models using Azure ML's logging and execution capabilities on local and remote compute targets
- [Training with Deep Learning](./how-to-use-azureml/training-with-deep-learning) - Examples demonstrating how to build deep learning models using estimators and parameter sweeps
- [Manage Azure ML Service](./how-to-use-azureml/manage-azureml-service) - Examples of how to perform tasks such as authenticating against the Azure ML service in different ways
- [Automated Machine Learning](./how-to-use-azureml/automated-machine-learning) - Examples using Automated Machine Learning to automatically generate optimal machine learning pipelines and models
- [Machine Learning Pipelines](./how-to-use-azureml/machine-learning-pipelines) - Examples showing how to create and use reusable pipelines for training and batch scoring
- [Deployment](./how-to-use-azureml/deployment) - Examples showing how to deploy and manage machine learning models and solutions
- [Azure Databricks](./how-to-use-azureml/azure-databricks) - Examples showing how to use Azure ML with Azure Databricks
---

## Documentation

* Quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
* [Python SDK reference](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py)
* Azure ML Data Prep SDK [overview](https://aka.ms/data-prep-sdk), [Python SDK reference](https://aka.ms/aml-data-prep-apiref), and [tutorials and how-tos](https://aka.ms/aml-data-prep-notebooks).

---
## Projects using Azure Machine Learning

Visit the following repos to see projects contributed by Azure ML users:

- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
## Data/Telemetry

This repository collects usage data and sends it to Microsoft to help improve our products and services. Read Microsoft's [privacy statement](https://privacy.microsoft.com/en-US/privacystatement) to learn more.
To opt out of tracking, go to the raw markdown or .ipynb files and remove the following line of code:

```sh
""
```

This URL will be slightly different depending on the file.
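Doing that opt-out edit programmatically just means dropping every line that embeds the telemetry image; a hypothetical helper sketch (the `/Impressions/` marker comes from the tracking-image URL used in this repo):

```python
def strip_telemetry(text: str, marker: str = "/Impressions/") -> str:
    """Return the document text with any telemetry-image lines removed."""
    return "\n".join(line for line in text.splitlines() if marker not in line)

doc = "# Title\n\n![tracker](https://azure-github.s3.amazonaws.com/Impressions/README.png)\nSome docs.\n"
print(strip_telemetry(doc))
```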

![Impressions](https://PixelServer20190423114238743.azurewebsites.net/api/impressions/MachineLearningNotebooks/README.png)

configuration.ipynb

This part of the diff reorders the notebook JSON so that the `metadata` block comes before the `cells` array, pins the kernelspec to `Python 3.6` (`python36`), records `roastala` as the author with `language_info` for Python 3.6.5, and updates the SDK version string in the version-check cell from `1.0.48` to `1.0.48.post1`. The notebook's cells are otherwise unchanged; in outline they cover:

* Copyright and MIT license notice.
* **Configuration** - setting up your Azure Machine Learning services workspace and configuring your notebook library, with a table of contents: Introduction, Setup, Configure your Azure ML workspace, and Next steps.
* **Introduction** - this notebook configures your library of notebooks (all notebooks in the current folder and any nested folders) to connect to an Azure Machine Learning workspace, either existing or new. Typically you run it only once per notebook library, and re-run it to redirect the library to a different workspace. An Azure ML workspace is an Azure resource that organizes and coordinates storage, databases, and compute resources for machine learning experimentation, deployment, inference, and monitoring of deployed models.
* **Setup** - 1. Azure subscription: you need access to an Azure subscription ([create a new one](https://azure.microsoft.com/en-us/free/) or find existing subscription information in the [Azure portal](https://portal.azure.com)); you will later need details such as the subscription ID. 2. Azure ML SDK and other library installation: in your own environment, follow the [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment) (in Azure Notebooks and other Microsoft-managed environments the SDK is pre-installed), and install the companion libraries many example notebooks depend on: `conda install -y matplotlib tqdm scikit-learn`. 3. Azure Container Instance registration: Azure Machine Learning uses [Azure Container Instances (ACI)](https://azure.microsoft.com/services/container-instances) to deploy dev/test web services, and the subscription must be registered for ACI; check with `az provider show -n Microsoft.ContainerInstance -o table` in the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and, as the subscription owner, register with `az provider register -n Microsoft.ContainerInstance` if needed.
* A code cell that imports `azureml.core` and prints the SDK version the notebook was created with (`1.0.48.post1`) alongside the currently installed `azureml.core.VERSION`, followed by a note that you should upgrade your SDK if your installed version is older than the one the notebook was created with.
|
||||||
{
|
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
|
||||||
"cell_type": "markdown",
|
],
|
||||||
"metadata": {},
|
"cell_type": "code"
|
||||||
"source": [
|
},
|
||||||
"## Configure your Azure ML workspace\n",
|
{
|
||||||
"\n",
|
"metadata": {},
|
||||||
"### Workspace parameters\n",
|
"source": [
|
||||||
"\n",
|
"If you are using an older version of the SDK then this notebook was created using, you should upgrade your SDK.\n",
|
||||||
"To use an AML Workspace, you will need to import the Azure ML SDK and supply the following information:\n",
|
"\n",
|
||||||
"* Your subscription id\n",
|
"### 3. Azure Container Instance registration\n",
|
||||||
"* A resource group name\n",
|
"Azure Machine Learning uses of [Azure Container Instance (ACI)](https://azure.microsoft.com/services/container-instances) to deploy dev/test web services. An Azure subscription needs to be registered to use ACI. If you or the subscription owner have not yet registered ACI on your subscription, you will need to use the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and execute the following commands. Note that if you ran through the AML [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) you have already registered ACI. \n",
|
||||||
"* (optional) The region that will host your workspace\n",
|
"\n",
|
||||||
"* A name for your workspace\n",
|
"```shell\n",
|
||||||
"\n",
|
"# check to see if ACI is already registered\n",
|
||||||
"You can get your subscription ID from the [Azure portal](https://portal.azure.com).\n",
|
"(myenv) $ az provider show -n Microsoft.ContainerInstance -o table\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You will also need access to a [_resource group_](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups), which organizes Azure resources and provides a default region for the resources in a group. You can see what resource groups to which you have access, or create a new one in the [Azure portal](https://portal.azure.com). If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n",
|
"# if ACI is not registered, run this command.\n",
|
||||||
"\n",
|
"# note you need to be the subscription owner in order to execute this command successfully.\n",
|
||||||
"The region to host your workspace will be used if you are creating a new workspace. You do not need to specify this if you are using an existing workspace. You can find the list of supported regions [here](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=machine-learning-service). You should pick a region that is close to your location or that contains your data.\n",
|
"(myenv) $ az provider register -n Microsoft.ContainerInstance\n",
|
||||||
"\n",
|
"```\n",
|
||||||
"The name for your workspace is unique within the subscription and should be descriptive enough to discern among other AML Workspaces. The subscription may be used only by you, or it may be used by your department or your entire enterprise, so choose a name that makes sense for your situation.\n",
|
"\n",
|
||||||
"\n",
|
"---"
|
||||||
"The following cell allows you to specify your workspace parameters. This cell uses the python method `os.getenv` to read values from environment variables which is useful for automation. If no environment variable exists, the parameters will be set to the specified default values. \n",
|
],
|
||||||
"\n",
|
"cell_type": "markdown"
|
||||||
"If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view *config.json* file, and copy-paste the values for subscription ID, resource group and workspace name below.\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"Replace the default values in the cell below with your workspace parameters"
|
"metadata": {},
|
||||||
]
|
"source": [
|
||||||
},
|
"## Configure your Azure ML workspace\n",
|
||||||
{
|
"\n",
|
||||||
"cell_type": "code",
|
"### Workspace parameters\n",
|
||||||
"execution_count": null,
|
"\n",
|
||||||
"metadata": {},
|
"To use an AML Workspace, you will need to import the Azure ML SDK and supply the following information:\n",
|
||||||
"outputs": [],
|
"* Your subscription id\n",
|
||||||
"source": [
|
"* A resource group name\n",
|
||||||
"import os\n",
|
"* (optional) The region that will host your workspace\n",
|
||||||
"\n",
|
"* A name for your workspace\n",
|
||||||
"subscription_id = os.getenv(\"SUBSCRIPTION_ID\", default=\"<my-subscription-id>\")\n",
|
"\n",
|
||||||
"resource_group = os.getenv(\"RESOURCE_GROUP\", default=\"<my-resource-group>\")\n",
|
"You can get your subscription ID from the [Azure portal](https://portal.azure.com).\n",
|
||||||
"workspace_name = os.getenv(\"WORKSPACE_NAME\", default=\"<my-workspace-name>\")\n",
|
"\n",
|
||||||
"workspace_region = os.getenv(\"WORKSPACE_REGION\", default=\"eastus2\")"
|
"You will also need access to a [_resource group_](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups), which organizes Azure resources and provides a default region for the resources in a group. You can see what resource groups to which you have access, or create a new one in the [Azure portal](https://portal.azure.com). If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n",
|
||||||
]
|
"\n",
|
||||||
},
|
"The region to host your workspace will be used if you are creating a new workspace. You do not need to specify this if you are using an existing workspace. You can find the list of supported regions [here](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=machine-learning-service). You should pick a region that is close to your location or that contains your data.\n",
|
||||||
{
|
"\n",
|
||||||
"cell_type": "markdown",
|
"The name for your workspace is unique within the subscription and should be descriptive enough to discern among other AML Workspaces. The subscription may be used only by you, or it may be used by your department or your entire enterprise, so choose a name that makes sense for your situation.\n",
|
||||||
"metadata": {},
|
"\n",
|
||||||
"source": [
|
"The following cell allows you to specify your workspace parameters. This cell uses the python method `os.getenv` to read values from environment variables which is useful for automation. If no environment variable exists, the parameters will be set to the specified default values. \n",
|
||||||
"### Access your workspace\n",
|
"\n",
|
||||||
"\n",
|
"If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view *config.json* file, and copy-paste the values for subscription ID, resource group and workspace name below.\n",
|
||||||
"The following cell uses the Azure ML SDK to attempt to load the workspace specified by your parameters. If this cell succeeds, your notebook library will be configured to access the workspace from all notebooks using the `Workspace.from_config()` method. The cell can fail if the specified workspace doesn't exist or you don't have permissions to access it. "
|
"\n",
|
||||||
]
|
"Replace the default values in the cell below with your workspace parameters"
|
||||||
},
|
],
|
||||||
{
|
"cell_type": "markdown"
|
||||||
"cell_type": "code",
|
},
|
||||||
"execution_count": null,
|
{
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"execution_count": null,
|
||||||
"from azureml.core import Workspace\n",
|
"source": [
|
||||||
"\n",
|
"import os\n",
|
||||||
"try:\n",
|
"\n",
|
||||||
" ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)\n",
|
"subscription_id = os.getenv(\"SUBSCRIPTION_ID\", default=\"<my-subscription-id>\")\n",
|
||||||
" # write the details of the workspace to a configuration file to the notebook library\n",
|
"resource_group = os.getenv(\"RESOURCE_GROUP\", default=\"<my-resource-group>\")\n",
|
||||||
" ws.write_config()\n",
|
"workspace_name = os.getenv(\"WORKSPACE_NAME\", default=\"<my-workspace-name>\")\n",
|
||||||
" print(\"Workspace configuration succeeded. Skip the workspace creation steps below\")\n",
|
"workspace_region = os.getenv(\"WORKSPACE_REGION\", default=\"eastus2\")"
|
||||||
"except:\n",
|
],
|
||||||
" print(\"Workspace not accessible. Change your parameters or create a new workspace below\")"
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"### Access your workspace\n",
|
||||||
"source": [
|
"\n",
|
||||||
"### Create a new workspace\n",
|
"The following cell uses the Azure ML SDK to attempt to load the workspace specified by your parameters. If this cell succeeds, your notebook library will be configured to access the workspace from all notebooks using the `Workspace.from_config()` method. The cell can fail if the specified workspace doesn't exist or you don't have permissions to access it. "
|
||||||
"\n",
|
],
|
||||||
"If you don't have an existing workspace and are the owner of the subscription or resource group, you can create a new workspace. If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n",
|
"cell_type": "markdown"
|
||||||
"\n",
|
},
|
||||||
"**Note**: As with other Azure services, there are limits on certain resources (for example AmlCompute quota) associated with the Azure ML service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n",
|
{
|
||||||
"\n",
|
"metadata": {},
|
||||||
"This cell will create an Azure ML workspace for you in a subscription provided you have the correct permissions.\n",
|
"outputs": [],
|
||||||
"\n",
|
"execution_count": null,
|
||||||
"This will fail if:\n",
|
"source": [
|
||||||
"* You do not have permission to create a workspace in the resource group\n",
|
"from azureml.core import Workspace\n",
|
||||||
"* You do not have permission to create a resource group if it's non-existing.\n",
|
"\n",
|
||||||
"* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
|
"try:\n",
|
||||||
"\n",
|
" ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)\n",
|
||||||
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
|
" # write the details of the workspace to a configuration file to the notebook library\n",
|
||||||
]
|
" ws.write_config()\n",
|
||||||
},
|
" print(\"Workspace configuration succeeded. Skip the workspace creation steps below\")\n",
|
||||||
{
|
"except:\n",
|
||||||
"cell_type": "code",
|
" print(\"Workspace not accessible. Change your parameters or create a new workspace below\")"
|
||||||
"execution_count": null,
|
],
|
||||||
"metadata": {
|
"cell_type": "code"
|
||||||
"tags": [
|
},
|
||||||
"create workspace"
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
"outputs": [],
|
"### Create a new workspace\n",
|
||||||
"source": [
|
"\n",
|
||||||
"from azureml.core import Workspace\n",
|
"If you don't have an existing workspace and are the owner of the subscription or resource group, you can create a new workspace. If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Create the workspace using the specified parameters\n",
|
"**Note**: As with other Azure services, there are limits on certain resources (for example AmlCompute quota) associated with the Azure ML service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n",
|
||||||
"ws = Workspace.create(name = workspace_name,\n",
|
"\n",
|
||||||
" subscription_id = subscription_id,\n",
|
"This cell will create an Azure ML workspace for you in a subscription provided you have the correct permissions.\n",
|
||||||
" resource_group = resource_group, \n",
|
"\n",
|
||||||
" location = workspace_region,\n",
|
"This will fail if:\n",
|
||||||
" create_resource_group = True,\n",
|
"* You do not have permission to create a workspace in the resource group\n",
|
||||||
" exist_ok = True)\n",
|
"* You do not have permission to create a resource group if it's non-existing.\n",
|
||||||
"ws.get_details()\n",
|
"* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# write the details of the workspace to a configuration file to the notebook library\n",
|
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
|
||||||
"ws.write_config()"
|
],
|
||||||
]
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"metadata": {
|
||||||
"metadata": {},
|
"tags": [
|
||||||
"source": [
|
"create workspace"
|
||||||
"### Create compute resources for your training experiments\n",
|
]
|
||||||
"\n",
|
},
|
||||||
"Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n",
|
"outputs": [],
|
||||||
"\n",
|
"execution_count": null,
|
||||||
"To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n",
|
"source": [
|
||||||
"\n",
|
"from azureml.core import Workspace\n",
|
||||||
"The cluster parameters are:\n",
|
"\n",
|
||||||
"* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n",
|
"# Create the workspace using the specified parameters\n",
|
||||||
"\n",
|
"ws = Workspace.create(name = workspace_name,\n",
|
||||||
"```shell\n",
|
" subscription_id = subscription_id,\n",
|
||||||
"az vm list-skus -o tsv\n",
|
" resource_group = resource_group, \n",
|
||||||
"```\n",
|
" location = workspace_region,\n",
|
||||||
"* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while not in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
|
" create_resource_group = True,\n",
|
||||||
"* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
|
" exist_ok = True)\n",
|
||||||
"\n",
|
"ws.get_details()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
|
"# write the details of the workspace to a configuration file to the notebook library\n",
|
||||||
]
|
"ws.write_config()"
|
||||||
},
|
],
|
||||||
{
|
"cell_type": "code"
|
||||||
"cell_type": "code",
|
},
|
||||||
"execution_count": null,
|
{
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"source": [
|
||||||
"source": [
|
"### Create compute resources for your training experiments\n",
|
||||||
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
"\n",
|
||||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
"Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Choose a name for your CPU cluster\n",
|
"To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n",
|
||||||
"cpu_cluster_name = \"cpu-cluster\"\n",
|
"\n",
|
||||||
"\n",
|
"The cluster parameters are:\n",
|
||||||
"# Verify that cluster does not exist already\n",
|
"* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n",
|
||||||
"try:\n",
|
"\n",
|
||||||
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
|
"```shell\n",
|
||||||
" print(\"Found existing cpu-cluster\")\n",
|
"az vm list-skus -o tsv\n",
|
||||||
"except ComputeTargetException:\n",
|
"```\n",
|
||||||
" print(\"Creating new cpu-cluster\")\n",
|
"* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while not in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
|
||||||
" \n",
|
"* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
|
||||||
" # Specify the configuration for the new cluster\n",
|
"\n",
|
||||||
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
|
"\n",
|
||||||
" min_nodes=0,\n",
|
"To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
|
||||||
" max_nodes=4)\n",
|
],
|
||||||
"\n",
|
"cell_type": "markdown"
|
||||||
" # Create the cluster with the specified name and configuration\n",
|
},
|
||||||
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
|
{
|
||||||
" \n",
|
"metadata": {},
|
||||||
" # Wait for the cluster to complete, show the output log\n",
|
"outputs": [],
|
||||||
" cpu_cluster.wait_for_completion(show_output=True)"
|
"execution_count": null,
|
||||||
]
|
"source": [
|
||||||
},
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||||
{
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
"cell_type": "markdown",
|
"\n",
|
||||||
"metadata": {},
|
"# Choose a name for your CPU cluster\n",
|
||||||
"source": [
|
"cpu_cluster_name = \"cpu-cluster\"\n",
|
||||||
"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
|
"\n",
|
||||||
]
|
"# Verify that cluster does not exist already\n",
|
||||||
},
|
"try:\n",
|
||||||
{
|
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
|
||||||
"cell_type": "code",
|
" print(\"Found existing cpu-cluster\")\n",
|
||||||
"execution_count": null,
|
"except ComputeTargetException:\n",
|
||||||
"metadata": {},
|
" print(\"Creating new cpu-cluster\")\n",
|
||||||
"outputs": [],
|
" \n",
|
||||||
"source": [
|
" # Specify the configuration for the new cluster\n",
|
||||||
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
|
||||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
" min_nodes=0,\n",
|
||||||
"\n",
|
" max_nodes=4)\n",
|
||||||
"# Choose a name for your GPU cluster\n",
|
"\n",
|
||||||
"gpu_cluster_name = \"gpu-cluster\"\n",
|
" # Create the cluster with the specified name and configuration\n",
|
||||||
"\n",
|
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
|
||||||
"# Verify that cluster does not exist already\n",
|
" \n",
|
||||||
"try:\n",
|
" # Wait for the cluster to complete, show the output log\n",
|
||||||
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
|
" cpu_cluster.wait_for_completion(show_output=True)"
|
||||||
" print(\"Found existing gpu cluster\")\n",
|
],
|
||||||
"except ComputeTargetException:\n",
|
"cell_type": "code"
|
||||||
" print(\"Creating new gpu-cluster\")\n",
|
},
|
||||||
" \n",
|
{
|
||||||
" # Specify the configuration for the new cluster\n",
|
"metadata": {},
|
||||||
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
|
"source": [
|
||||||
" min_nodes=0,\n",
|
"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
|
||||||
" max_nodes=4)\n",
|
],
|
||||||
" # Create the cluster with the specified name and configuration\n",
|
"cell_type": "markdown"
|
||||||
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
|
},
|
||||||
"\n",
|
{
|
||||||
" # Wait for the cluster to complete, show the output log\n",
|
"metadata": {},
|
||||||
" gpu_cluster.wait_for_completion(show_output=True)"
|
"outputs": [],
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"source": [
|
||||||
{
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||||
"cell_type": "markdown",
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
"metadata": {},
|
"\n",
|
||||||
"source": [
|
"# Choose a name for your GPU cluster\n",
|
||||||
"---\n",
|
"gpu_cluster_name = \"gpu-cluster\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## Next steps\n",
|
"# Verify that cluster does not exist already\n",
|
||||||
"\n",
|
"try:\n",
|
||||||
"In this notebook you configured this notebook library to connect easily to an Azure ML workspace. You can copy this notebook to your own libraries to connect them to you workspace, or use it to bootstrap new workspaces completely.\n",
|
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
|
||||||
"\n",
|
" print(\"Found existing gpu cluster\")\n",
|
||||||
"If you came here from another notebook, you can return there and complete that exercise, or you can try out the [Tutorials](./tutorials) or jump into \"how-to\" notebooks and start creating and deploying models. A good place to start is the [train within notebook](./how-to-use-azureml/training/train-within-notebook) example that walks through a simplified but complete end to end machine learning process."
|
"except ComputeTargetException:\n",
|
||||||
]
|
" print(\"Creating new gpu-cluster\")\n",
|
||||||
},
|
" \n",
|
||||||
{
|
" # Specify the configuration for the new cluster\n",
|
||||||
"cell_type": "code",
|
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
|
||||||
"execution_count": null,
|
" min_nodes=0,\n",
|
||||||
"metadata": {},
|
" max_nodes=4)\n",
|
||||||
"outputs": [],
|
" # Create the cluster with the specified name and configuration\n",
|
||||||
"source": []
|
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
|
||||||
}
|
"\n",
|
||||||
],
|
" # Wait for the cluster to complete, show the output log\n",
|
||||||
"metadata": {
|
" gpu_cluster.wait_for_completion(show_output=True)"
|
||||||
"authors": [
|
],
|
||||||
{
|
"cell_type": "code"
|
||||||
"name": "roastala"
|
},
|
||||||
}
|
{
|
||||||
],
|
"metadata": {},
|
||||||
"kernelspec": {
|
"source": [
|
||||||
"display_name": "Python 3.6",
|
"---\n",
|
||||||
"language": "python",
|
"\n",
|
||||||
"name": "python36"
|
"## Next steps\n",
|
||||||
},
|
"\n",
|
||||||
"language_info": {
|
"In this notebook you configured this notebook library to connect easily to an Azure ML workspace. You can copy this notebook to your own libraries to connect them to you workspace, or use it to bootstrap new workspaces completely.\n",
|
||||||
"codemirror_mode": {
|
"\n",
|
||||||
"name": "ipython",
|
"If you came here from another notebook, you can return there and complete that exercise, or you can try out the [Tutorials](./tutorials) or jump into \"how-to\" notebooks and start creating and deploying models. A good place to start is the [train within notebook](./how-to-use-azureml/training/train-within-notebook) example that walks through a simplified but complete end to end machine learning process."
|
||||||
"version": 3
|
],
|
||||||
},
|
"cell_type": "markdown"
|
||||||
"file_extension": ".py",
|
},
|
||||||
"mimetype": "text/x-python",
|
{
|
||||||
"name": "python",
|
"metadata": {},
|
||||||
"nbconvert_exporter": "python",
|
"outputs": [],
|
||||||
"pygments_lexer": "ipython3",
|
"execution_count": null,
|
||||||
"version": "3.6.5"
|
"source": [],
|
||||||
}
|
"cell_type": "code"
|
||||||
},
|
}
|
||||||
"nbformat": 4,
|
],
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
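The version-check cell in the notebook above prints the installed SDK version alongside the version the notebook was created with, and the surrounding text advises upgrading when the installed version is older. Deciding "older" amounts to comparing dotted version numbers. A minimal stdlib sketch (the helper `is_older` is hypothetical, not an SDK API; production code would typically use `packaging.version.parse` instead):

```python
def is_older(installed: str, reference: str) -> bool:
    """Return True if `installed` predates `reference`, comparing numeric dotted prefixes."""
    def key(version: str):
        parts = []
        for piece in version.split("."):
            if not piece.isdigit():
                break  # ignore suffixes such as "post1" in "1.0.48.post1"
            parts.append(int(piece))
        return parts
    return key(installed) < key(reference)

print(is_older("1.0.45", "1.0.48"))        # True: an upgrade is advised
print(is_older("1.0.48.post1", "1.0.48"))  # False: numeric prefix matches
```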
@@ -1,4 +1,4 @@
name: configuration
dependencies:
  - pip:
    - azureml-sdk

@@ -1,307 +0,0 @@
## How to use the RAPIDS on AzureML materials

### Setting up requirements

The material requires the Azure ML SDK and a Jupyter Notebook server to run the interactive execution. Please refer to the instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up"). Follow the instructions under **Local Computer**, and make sure to run the last step, `pip install <new package>`, with `new package = progressbar2` (i.e. `pip install progressbar2`).
After following the directions, you should end up with a conda environment (`myenv`) that can be activated in an Anaconda prompt.
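The same environment can also be captured declaratively in a conda environment file; a sketch under the assumptions stated in the text (environment name `myenv`, the SDK installed via pip, plus `progressbar2`; pinning Python 3.6 is an assumption based on the kernel used elsewhere in this repository):

```yaml
name: myenv
dependencies:
  - python=3.6
  - pip
  - pip:
    - azureml-sdk
    - progressbar2
```

Create and activate it with `conda env create -f environment.yml` followed by `conda activate myenv`.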
You will also need an Azure subscription with a Machine Learning Services quota, in the desired region, of 24 nodes or more (enough to select a VM size with 4 GPUs, as the notebook does) in the desired VM family ([NC_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)). The specific VM size used within the chosen family also needs to be whitelisted for Machine Learning Services usage.

### Getting and running the material

Clone the AzureML Notebooks repository from GitHub by running the following command in a local directory:

* `C:\local_directory>git clone https://github.com/Azure/MachineLearningNotebooks.git`

In a conda prompt, navigate to the local directory, activate the conda environment (`myenv`) in which the Azure ML SDK was installed, and launch Jupyter Notebook:

* `(myenv) C:\local_directory>jupyter notebook`

From the resulting browser page at http://localhost:8888/tree, navigate to the master notebook:

* http://localhost:8888/tree/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
The following notebook will appear:


### Master Jupyter Notebook

The notebook can be executed interactively, step by step, by pressing the Run button (circled in red in the image above).

The first couple of functional steps import the necessary Azure ML libraries. If you experience any errors, please refer back to the [environment setup](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up") instructions.
#### Setting up a Workspace

The following step gathers the information necessary to set up a workspace in which to execute the RAPIDS script. This needs to be done only once, or not at all if you already have a usable workspace set up in the Azure portal:



Be sure to set the correct values for `subscription_id`, `resource_group`, `workspace_name`, and `workspace_region` before executing the step. For example:
    subscription_id = os.environ.get("SUBSCRIPTION_ID", "1358e503-xxxx-4043-xxxx-65b83xxxx32d")
    resource_group = os.environ.get("RESOURCE_GROUP", "AML-Rapids-Testing")
    workspace_name = os.environ.get("WORKSPACE_NAME", "AML_Rapids_Tester")
    workspace_region = os.environ.get("WORKSPACE_REGION", "West US 2")
`resource_group` and `workspace_name` can take any value; the region should match a region for which the subscription has the required Machine Learning Services node quota.

The first time the code is executed, it redirects to the Azure portal to validate the subscription credentials. After the workspace is created, its information is stored in a local file so that this step can be skipped on subsequent runs; the step that follows simply loads the saved workspace.

Once a workspace has been created, you can skip its creation and jump straight to this step. The configuration file resides at:

* C:\local_directory\MachineLearningNotebooks\contrib\RAPIDS\aml_config\config.json
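For reference, the saved workspace configuration is a small JSON file; its layout is typically along these lines (the values shown are placeholders taken from the example above, not real identifiers):

```json
{
    "subscription_id": "1358e503-xxxx-4043-xxxx-65b83xxxx32d",
    "resource_group": "AML-Rapids-Testing",
    "workspace_name": "AML_Rapids_Tester"
}
```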
#### Creating an AML Compute Target
The following step creates an AML compute target.
The `vm_size` parameter of the `AmlCompute.provisioning_configuration()` call has to be a member of one of the VM families ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)), which provide the P40 or V100 GPUs supported by RAPIDS. In this particular case a Standard\_NC24s\_V2 was used.
If the step fails with a quota error, it indicates that even though the subscription has a node quota for VMs of that family, it does not have a node quota for Machine Learning Services for that family. You will need to request a node-quota increase for that family in that region for **Machine Learning Services**.
Another possible error indicates that the specified vmSize has not been whitelisted for use with Machine Learning Services; in that case a whitelisting request should be filed.
Successful creation of the compute target is confirmed in the step's output.
#### RAPIDS script uploading and viewing
The next step copies the RAPIDS script `process_data.py`, a slightly modified implementation of the [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb), into a script-processing folder and presents its contents to the user. (The script is discussed in detail in a later section.)

If you want to use a different RAPIDS script, the references to `process_data.py` have to be changed.
#### Data Uploading
The RAPIDS script loads and extracts features from Fannie Mae's mortgage dataset to train an XGBoost prediction model. The script uses two years of data.

The next few steps download and decompress the data and make it available to the script as an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data).
The notebook defines helper functions to download and decompress the input data.
The next step uses those functions to download the file http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/mortgage_2000-2001.tgz and decompress it into the local folder `.\mortgage_2000-2001`.
The step takes several minutes; the intermediate outputs provide progress indicators.
The decompressed data should have the following structure:
* `.\mortgage_2000-2001\acq\Acquisition_<year>Q<num>.txt`
* `.\mortgage_2000-2001\perf\Performance_<year>Q<num>.txt`
* `.\mortgage_2000-2001\names.csv`
The data is divided into partitions that roughly correspond to calendar quarters. RAPIDS includes support for multi-node, multi-GPU deployments, enabling scaling up and out to much larger datasets. You will be able to verify that the number of partitions the script can process increases with the number of GPUs used. The RAPIDS script here is implemented for single-machine scenarios; an example supporting multiple nodes will be published later.
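To make the quarterly layout concrete, a small illustrative helper (an assumption for illustration, not part of the notebook) can enumerate the performance-file names expected for a year range:

```python
def expected_partitions(start_year, end_year):
    """List the Performance_<year>Q<num>.txt file names for an inclusive year range."""
    return [
        f"Performance_{year}Q{quarter}.txt"
        for year in range(start_year, end_year + 1)
        for quarter in range(1, 5)
    ]
```

For 2000-2001 this yields eight quarterly file names; note that the notebook's `part_count` can reach 11 for the same date range, since the partitioning only roughly follows calendar quarters.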
The next step uploads the data to the [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) under the reference `fileroot = mortgage_2000-2001`.

The step takes several minutes to upload the data; the output provides a progress indicator.
Once the data has been loaded into the Azure Machine Learning datastore, on subsequent runs you can comment out the `ds.upload` line and simply reference the `mortgage_2000-2001` datastore reference.
#### Setting up required libraries and environment to run RAPIDS code
There are two options for setting up the environment in which the RAPIDS code runs. The following steps show how to use a prebuilt conda environment. A recommended alternative is to specify a base Docker image and package dependencies; you can find sample code for that in the notebook.
#### Wrapper function to submit the RAPIDS script as an Azure Machine Learning experiment
The next step defines a wrapper function used to run the RAPIDS script with different arguments. It takes as arguments: `cpu_training`, a flag indicating whether the run should be processed with CPU only; `gpu_count`, the number of GPUs to use; and `part_count`, the number of data partitions to use.
The core of the function configures the run by instantiating a `ScriptRunConfig` object, which defines the `source_directory` containing the script, the name of the script, and the arguments to pass to it.

In addition to the wrapper-function arguments, two other arguments are passed: `data_dir`, the directory where the data is stored, and `end_year`, the largest year from which to use partitions.
As mentioned earlier, the amount of data that can be processed increases with the number of GPUs. In the function, the dictionary `max_gpu_count_data_partition_mapping` maps each GPU count to the maximum number of partitions that we empirically found the system can handle. The function emits a warning when the number of partitions requested for a given number of GPUs exceeds that maximum; the script is still executed, but the user should expect an out-of-memory error.
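A minimal sketch of that guard logic (the mapping values below are placeholders for illustration; the notebook's empirically determined values may differ):

```python
import warnings

# Hypothetical maximum partition counts per GPU count (illustrative values only).
max_gpu_count_data_partition_mapping = {1: 3, 2: 6, 3: 9, 4: 11}


def check_part_count(gpu_count, part_count):
    """Warn when part_count exceeds the empirical maximum for gpu_count."""
    maximum = max_gpu_count_data_partition_mapping.get(gpu_count)
    if maximum is not None and part_count > maximum:
        warnings.warn(
            f"{part_count} partitions exceeds the maximum of {maximum} "
            f"for {gpu_count} GPU(s); expect an out-of-memory failure."
        )
        return False
    return True
```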
If you want to use a different RAPIDS script, the reference to `process_data.py` has to be changed.
#### Submitting Experiments
We are ready to submit experiments: launching the RAPIDS script with different sets of parameters.
The following couple of steps submit experiments under different conditions.
You can change the variable `num_gpu` between one and the number of GPUs supported by the chosen vmSize. The variable `part_count` can take any value between 1 and 11, but if it exceeds the maximum for `num_gpu`, the run will end in an error.
If the experiment is successfully submitted, it is placed on a queue for processing and its status appears as Queued.

When the experiment starts running, its status changes to Running.
#### Reproducing the performance-gains plot from the blog post
When the run has finished successfully, its status appears as Completed.
For an experiment run with three partitions and one GPU, the reported processing time is 49.16 seconds, just as depicted in the performance-gains plot in the blog post.
For a run with three partitions and two GPUs, the reported processing time is 37.50 seconds, again matching the plot in the blog post.
For an experiment run with three partitions and three GPUs, the reported processing time is 24.40 seconds, as depicted in the plot.
For an experiment run with three partitions and four GPUs, the reported processing time is 23.33 seconds, as depicted in the plot.
For an experiment run with three partitions using only the CPU, the reported processing time is 9 minutes 1.21 seconds, or 541.21 seconds, as depicted in the plot.
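Putting the reported timings together, the speedup of each GPU configuration over the CPU-only baseline for the three-partition runs can be computed directly (a quick sanity check on the numbers above, not output from the notebook):

```python
cpu_seconds = 541.21  # three partitions, CPU only
gpu_seconds = {1: 49.16, 2: 37.50, 3: 24.40, 4: 23.33}  # keyed by GPU count

# Speedup factor of each GPU configuration relative to the CPU-only run.
speedups = {gpus: round(cpu_seconds / secs, 1) for gpus, secs in gpu_seconds.items()}
print(speedups)  # roughly 11x with one GPU, up to about 23x with four
```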
For an experiment run with nine partitions and four GPUs, the notebook throws a warning signaling that the number of partitions exceeds the maximum the system can handle with that many GPUs, and the run fails, ending with a status of Failed.
##### Freeing Resources
In the last step, the notebook deletes the compute target. (This step is optional, especially if `min_nodes` for the cluster is set to 0, in which case the cluster scales down to zero nodes when there is no usage.)
### RAPIDS Script
The master notebook runs experiments by launching a RAPIDS script with different sets of parameters. This section analyzes that script, `process_data.py`.
The script first imports all the necessary libraries and parses the arguments passed by the master notebook.
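An illustrative sketch of such argument parsing (the flag names mirror the arguments discussed in this walkthrough but are assumptions, not the script's exact interface):

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments the master notebook passes to the RAPIDS script."""
    parser = argparse.ArgumentParser(description="RAPIDS mortgage ETL and XGBoost training")
    parser.add_argument("--data_dir", required=True,
                        help="directory containing the mortgage data")
    parser.add_argument("--part_count", type=int, default=1,
                        help="number of data partitions to process")
    parser.add_argument("--end_year", type=int, default=2001,
                        help="largest year from which to use partitions")
    parser.add_argument("--gpu_count", type=int, default=1,
                        help="number of GPUs (and Dask workers) to use")
    parser.add_argument("--cpu_training", action="store_true",
                        help="run in CPU-only mode")
    return parser.parse_args(argv)
```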
Then all the internal functions used by the script are defined.
#### Wrapper Auxiliary Functions:
The functions below are wrappers for a configuration module for librmm, the RAPIDS Memory Manager Python interface.
A couple of other functions are wrappers for the submission of jobs to the Dask client.
#### Data Loading Functions:
The data is loaded through three functions, all of which use the library function `cudf.read_csv()`, the cuDF version of the well-known pandas counterpart.
#### Data Transformation and Feature Extraction Functions:
The raw data is transformed and processed to extract features by joining, slicing, grouping, aggregating, and factorizing the original dataframes, just as is done with pandas. Several functions in the script serve that purpose.
#### Main() Function
The preceding functions are used in the `main()` function to accomplish several steps: set up the Dask client, perform all ETL operations, and set up and train an XGBoost model. The function also assigns the data to be processed by each Dask worker.
##### Setting up the Dask client:

The main function first initializes a Dask client with a number of workers corresponding to the number of GPUs used for the run; a successful setup is reported in the script's output.
##### All ETL functions are invoked through calls to process\_quarter\_gpu, one per data partition
##### Concatenating the data assigned to each Dask worker
The partitions assigned to each worker are concatenated and set up for training.
##### Setting Training Parameters
The parameters used for training a gradient-boosted decision tree model are set in a dedicated code block of the script.
Notice how the parameters are modified when using the CPU-only mode.
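A hedged sketch of such a parameter block (the specific values are illustrative assumptions; only the GPU/CPU toggle mirrors the behavior described above, using XGBoost's `tree_method` switch):

```python
def make_xgb_params(cpu_training):
    """Build an XGBoost parameter dict, falling back to CPU settings when requested."""
    params = {
        "max_depth": 8,
        "objective": "reg:squarederror",
        "tree_method": "gpu_hist",  # train on the GPU by default
    }
    if cpu_training:
        # CPU-only mode: use the histogram method on the CPU instead.
        params["tree_method"] = "hist"
        params["nthread"] = -1
    return params
```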
##### Launching the training of a gradient boosted decision tree model using XGBoost.
The outputs of the script can be observed in the master notebook as the script executes.
@@ -1,35 +0,0 @@
```yaml
name: rapids
channels:
  - nvidia
  - numba
  - conda-forge
  - rapidsai
  - defaults
  - pytorch

dependencies:
  - arrow-cpp=0.12.0
  - bokeh
  - cffi=1.11.5
  - cmake=3.12
  - cuda92
  - cython==0.29
  - dask=1.1.1
  - distributed=1.25.3
  - faiss-gpu=1.5.0
  - numba=0.42
  - numpy=1.15.4
  - nvstrings
  - pandas=0.23.4
  - pyarrow=0.12.0
  - scikit-learn
  - scipy
  - cudf
  - cuml
  - python=3.6.2
  - jupyterlab
  - pip:
    - file:/rapids/xgboost/python-package/dist/xgboost-0.81-py3-none-any.whl
    - git+https://github.com/rapidsai/dask-xgboost@dask-cudf
    - git+https://github.com/rapidsai/dask-cudf@master
    - git+https://github.com/rapidsai/dask-cuda@master
```
@@ -1,8 +1,8 @@
```yaml
name: azure-ml-datadrift
dependencies:
  - pip:
    - azureml-sdk
    - azureml-contrib-datadrift
    - azureml-opendatasets
    - lightgbm
    - azureml-widgets
```
@@ -1,58 +1,58 @@
```python
import pickle
import json
import numpy
import azureml.train.automl
from sklearn.externals import joblib
from sklearn.linear_model import Ridge
from azureml.core.model import Model
from azureml.core.run import Run
from azureml.monitoring import ModelDataCollector
import time
import pandas as pd


def init():
    global model, inputs_dc, prediction_dc, feature_names, categorical_features

    print("Model is initialized" + time.strftime("%H:%M:%S"))
    model_path = Model.get_model_path(model_name="driftmodel")
    model = joblib.load(model_path)

    feature_names = ["usaf", "wban", "latitude", "longitude", "station_name", "p_k",
                     "sine_weekofyear", "cosine_weekofyear", "sine_hourofday", "cosine_hourofday",
                     "temperature-7"]

    categorical_features = ["usaf", "wban", "p_k", "station_name"]

    inputs_dc = ModelDataCollector(model_name="driftmodel",
                                   identifier="inputs",
                                   feature_names=feature_names)

    prediction_dc = ModelDataCollector("driftmodel",
                                       identifier="predictions",
                                       feature_names=["temperature"])


def run(raw_data):
    global inputs_dc, prediction_dc

    try:
        data = json.loads(raw_data)["data"]
        data = pd.DataFrame(data)

        # Remove the categorical features as the model expects OHE values
        input_data = data.drop(categorical_features, axis=1)

        result = model.predict(input_data)

        # Collect the non-OHE dataframe
        collected_df = data[feature_names]

        inputs_dc.collect(collected_df.values)
        prediction_dc.collect(result)
        return result.tolist()
    except Exception as e:
        error = str(e)

        print(error + time.strftime("%H:%M:%S"))
        return error
```
@@ -1,290 +1,290 @@
|
|||||||
# Table of Contents
|
# Table of Contents
|
||||||
1. [Automated ML Introduction](#introduction)
|
1. [Automated ML Introduction](#introduction)
|
||||||
1. [Setup using Azure Notebooks](#jupyter)
|
1. [Setup using Azure Notebooks](#jupyter)
|
||||||
1. [Setup using Azure Databricks](#databricks)
|
1. [Setup using Azure Databricks](#databricks)
|
||||||
1. [Setup using a Local Conda environment](#localconda)
|
1. [Setup using a Local Conda environment](#localconda)
|
||||||
1. [Automated ML SDK Sample Notebooks](#samples)
|
1. [Automated ML SDK Sample Notebooks](#samples)
|
||||||
1. [Documentation](#documentation)
|
1. [Documentation](#documentation)
|
||||||
1. [Running using python command](#pythoncommand)
|
1. [Running using python command](#pythoncommand)
|
||||||
1. [Troubleshooting](#troubleshooting)
|
1. [Troubleshooting](#troubleshooting)
|
||||||
|
|
||||||
<a name="introduction"></a>
|
<a name="introduction"></a>
|
||||||
# Automated ML introduction
|
# Automated ML introduction
|
||||||
Automated machine learning (automated ML) builds high quality machine learning models for you by automating model and hyperparameter selection. Bring a labelled dataset that you want to build a model for, automated ML will give you a high quality machine learning model that you can use for predictions.
|
Automated machine learning (automated ML) builds high quality machine learning models for you by automating model and hyperparameter selection. Bring a labelled dataset that you want to build a model for, automated ML will give you a high quality machine learning model that you can use for predictions.
|
||||||
|
|
||||||
|
|
||||||
If you are new to Data Science, automated ML will help you get jumpstarted by simplifying machine learning model building. It abstracts you from needing to perform model selection, hyperparameter selection and in one step creates a high quality trained model for you to use.
|
If you are new to Data Science, automated ML will help you get jumpstarted by simplifying machine learning model building. It abstracts you from needing to perform model selection, hyperparameter selection and in one step creates a high quality trained model for you to use.
|
||||||
|
|
||||||
If you are an experienced data scientist, automated ML will help increase your productivity by intelligently performing the model and hyperparameter selection for your training and generates high quality models much quicker than manually specifying several combinations of the parameters and running training jobs. Automated ML provides visibility and access to all the training jobs and the performance characteristics of the models to help you further tune the pipeline if you desire.
|
If you are an experienced data scientist, automated ML will help increase your productivity by intelligently performing the model and hyperparameter selection for your training and generates high quality models much quicker than manually specifying several combinations of the parameters and running training jobs. Automated ML provides visibility and access to all the training jobs and the performance characteristics of the models to help you further tune the pipeline if you desire.
|
||||||
|
|
||||||
Below are the three execution environments supported by automated ML.
|
Below are the three execution environments supported by automated ML.
|
||||||
|
|
||||||
|
|
||||||
<a name="jupyter"></a>
|
<a name="jupyter"></a>
|
||||||
## Setup using Azure Notebooks - Jupyter based notebooks in the Azure cloud
|
## Setup using Azure Notebooks - Jupyter based notebooks in the Azure cloud
|
||||||
|
|
||||||
1. [](https://aka.ms/aml-clone-azure-notebooks)
|
1. [](https://aka.ms/aml-clone-azure-notebooks)
|
||||||
[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks.
|
[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks.
|
||||||
1. Follow the instructions in the [configuration](../../configuration.ipynb) notebook to create and connect to a workspace.
|
1. Follow the instructions in the [configuration](../../configuration.ipynb) notebook to create and connect to a workspace.
|
||||||
1. Open one of the sample notebooks.
|
1. Open one of the sample notebooks.
|
||||||
|
|
||||||
<a name="databricks"></a>
|
<a name="databricks"></a>
|
||||||
## Setup using Azure Databricks
|
## Setup using Azure Databricks
|
||||||
|
|
||||||
**NOTE**: Please create your Azure Databricks cluster as v4.x (high concurrency preferred) with **Python 3** (dropdown).
|
**NOTE**: Please create your Azure Databricks cluster as v4.x (high concurrency preferred) with **Python 3** (dropdown).
|
||||||
**NOTE**: You should at least have contributor access to your Azure subcription to run the notebook.
|
**NOTE**: You should at least have contributor access to your Azure subcription to run the notebook.
|
||||||
- Please remove the previous SDK version if there is any and install the latest SDK by installing **azureml-sdk[automl_databricks]** as a PyPi library in Azure Databricks workspace.
|
- Please remove the previous SDK version if there is any and install the latest SDK by installing **azureml-sdk[automl_databricks]** as a PyPi library in Azure Databricks workspace.
|
||||||
- You can find the detail Readme instructions at [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks).
|
- You can find the detail Readme instructions at [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks).
|
||||||
- Download the sample notebook automl-databricks-local-01.ipynb from [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks) and import into the Azure databricks workspace.
|
- Download the sample notebook automl-databricks-local-01.ipynb from [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks) and import into the Azure databricks workspace.
|
||||||
- Attach the notebook to the cluster.
|
- Attach the notebook to the cluster.
|
||||||
|
|
||||||
<a name="localconda"></a>
|
<a name="localconda"></a>
|
||||||
## Setup using a Local Conda environment
|
## Setup using a Local Conda environment
|
||||||
|
|
||||||
To run these notebook on your own notebook server, use these installation instructions.
|
To run these notebook on your own notebook server, use these installation instructions.
|
||||||
The instructions below will install everything you need and then start a Jupyter notebook.
|
The instructions below will install everything you need and then start a Jupyter notebook.
|
||||||
|
|
||||||
### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose 64-bit Python 3.7 or higher.
|
### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose 64-bit Python 3.7 or higher.
|
||||||
- **Note**: if you already have conda installed, you can keep using it but it should be version 4.4.10 or later (as shown by: conda -V). If you have a previous version installed, you can update it using the command: conda update conda.
|
- **Note**: if you already have conda installed, you can keep using it but it should be version 4.4.10 or later (as shown by: conda -V). If you have a previous version installed, you can update it using the command: conda update conda.
|
||||||
There's no need to install mini-conda specifically.
|
There's no need to install mini-conda specifically.
|
||||||
|
|
||||||
### 2. Downloading the sample notebooks
|
### 2. Downloading the sample notebooks
|
||||||
- Download the sample notebooks from [GitHub](https://github.com/Azure/MachineLearningNotebooks) as zip and extract the contents to a local directory. The automated ML sample notebooks are in the "automated-machine-learning" folder.
|
- Download the sample notebooks from [GitHub](https://github.com/Azure/MachineLearningNotebooks) as zip and extract the contents to a local directory. The automated ML sample notebooks are in the "automated-machine-learning" folder.
|
||||||
|
|
||||||
### 3. Setup a new conda environment
|
### 3. Setup a new conda environment
|
||||||
The **automl_setup** script creates a new conda environment, installs the necessary packages, configures the widget and starts a jupyter notebook. It takes the conda environment name as an optional parameter. The default conda environment name is azure_automl. The exact command depends on the operating system. See the specific sections below for Windows, Mac and Linux. It can take about 10 minutes to execute.
|
The **automl_setup** script creates a new conda environment, installs the necessary packages, configures the widget and starts a jupyter notebook. It takes the conda environment name as an optional parameter. The default conda environment name is azure_automl. The exact command depends on the operating system. See the specific sections below for Windows, Mac and Linux. It can take about 10 minutes to execute.
|
||||||
|
|
||||||
Packages installed by the **automl_setup** script:
|
Packages installed by the **automl_setup** script:
|
||||||
<ul><li>python</li><li>nb_conda</li><li>matplotlib</li><li>numpy</li><li>cython</li><li>urllib3</li><li>scipy</li><li>scikit-learn</li><li>pandas</li><li>tensorflow</li><li>py-xgboost</li><li>azureml-sdk</li><li>azureml-widgets</li><li>pandas-ml</li></ul>
|
<ul><li>python</li><li>nb_conda</li><li>matplotlib</li><li>numpy</li><li>cython</li><li>urllib3</li><li>scipy</li><li>scikit-learn</li><li>pandas</li><li>tensorflow</li><li>py-xgboost</li><li>azureml-sdk</li><li>azureml-widgets</li><li>pandas-ml</li></ul>
|
||||||
|
|
||||||
For more details refer to the [automl_env.yml](./automl_env.yml)
|
For more details refer to the [automl_env.yml](./automl_env.yml)
|
||||||
## Windows

Start an **Anaconda Prompt** window, cd to the **how-to-use-azureml/automated-machine-learning** folder where the sample notebooks were extracted, and then run:

```
automl_setup
```

## Mac

Install "Command line developer tools" if it is not already installed (you can use the command: `xcode-select --install`).

Start a Terminal window, cd to the **how-to-use-azureml/automated-machine-learning** folder where the sample notebooks were extracted, and then run:

```
bash automl_setup_mac.sh
```

## Linux

cd to the **how-to-use-azureml/automated-machine-learning** folder where the sample notebooks were extracted and then run:

```
bash automl_setup_linux.sh
```

### 4. Running configuration.ipynb

- Before running any samples, you need to run the configuration notebook. Click on the [configuration](../../configuration.ipynb) notebook.

- Execute the cells in the notebook to register the Machine Learning Services Resource Provider and create a workspace. (*instructions in notebook*)

### 5. Running Samples

- Please make sure you use the Python [conda env:azure_automl] kernel when trying the sample notebooks.

- Follow the instructions in the individual notebooks to explore various features in automated ML.

### 6. Starting jupyter notebook manually

To start your Jupyter notebook manually, use:

```
conda activate azure_automl
jupyter notebook
```

or on Mac or Linux:

```
source activate azure_automl
jupyter notebook
```

<a name="samples"></a>
# Automated ML SDK Sample Notebooks

- [auto-ml-classification.ipynb](classification/auto-ml-classification.ipynb)
  - Dataset: scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
  - Simple example of using automated ML for classification
  - Uses local compute for training

- [auto-ml-regression.ipynb](regression/auto-ml-regression.ipynb)
  - Dataset: scikit-learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html)
  - Simple example of using automated ML for regression
  - Uses local compute for training

- [auto-ml-remote-amlcompute.ipynb](remote-amlcompute/auto-ml-remote-amlcompute.ipynb)
  - Dataset: scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
  - Example of using automated ML for classification using remote AmlCompute for training
  - Parallel execution of iterations
  - Async tracking of progress
  - Cancelling individual iterations or the entire run
  - Retrieving models for any iteration or logged metric
  - Specify automated ML settings as kwargs

- [auto-ml-missing-data-blacklist-early-termination.ipynb](missing-data-blacklist-early-termination/auto-ml-missing-data-blacklist-early-termination.ipynb)
  - Dataset: scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
  - Blacklist certain pipelines
  - Specify a target metric to indicate stopping criteria
  - Handling missing data in the input

- [auto-ml-sparse-data-train-test-split.ipynb](sparse-data-train-test-split/auto-ml-sparse-data-train-test-split.ipynb)
  - Dataset: scikit-learn's [20newsgroup](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html)
  - Handle sparse datasets
  - Specify custom train and validation sets

- [auto-ml-exploring-previous-runs.ipynb](exploring-previous-runs/auto-ml-exploring-previous-runs.ipynb)
  - List all projects for the workspace
  - List all automated ML runs for a given project
  - Get details for an automated ML run (automated ML settings, run widget & all metrics)
  - Download the fitted pipeline for any iteration

- [auto-ml-classification-with-deployment.ipynb](classification-with-deployment/auto-ml-classification-with-deployment.ipynb)
  - Dataset: scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
  - Simple example of using automated ML for classification
  - Registering the model
  - Creating an image and creating an ACI service
  - Testing the ACI service

- [auto-ml-sample-weight.ipynb](sample-weight/auto-ml-sample-weight.ipynb)
  - How to specify sample_weight
  - The difference that it makes to test results

- [auto-ml-subsampling-local.ipynb](subsampling/auto-ml-subsampling-local.ipynb)
  - How to enable subsampling

- [auto-ml-dataprep.ipynb](dataprep/auto-ml-dataprep.ipynb)
  - Using DataPrep for reading data

- [auto-ml-dataprep-remote-execution.ipynb](dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb)
  - Using DataPrep for reading data with remote execution

- [auto-ml-classification-with-whitelisting.ipynb](classification-with-whitelisting/auto-ml-classification-with-whitelisting.ipynb)
  - Dataset: scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
  - Simple example of using automated ML for classification with whitelisted tensorflow models
  - Uses local compute for training

- [auto-ml-forecasting-energy-demand.ipynb](forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb)
  - Dataset: [NYC energy demand data](forecasting-a/nyc_energy.csv)
  - Example of using automated ML for training a forecasting model

- [auto-ml-forecasting-orange-juice-sales.ipynb](forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb)
  - Dataset: [Dominick's grocery sales of orange juice](forecasting-b/dominicks_OJ.csv)
  - Example of training an automated ML forecasting model on multiple time-series

- [auto-ml-classification-with-onnx.ipynb](classification-with-onnx/auto-ml-classification-with-onnx.ipynb)
  - Dataset: scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
  - Simple example of using automated ML for classification with ONNX models
  - Uses local compute for training

- [auto-ml-bank-marketing-subscribers-with-deployment.ipynb](bank-marketing-subscribers-with-deployment/auto-ml-bank-marketing-with-deployment.ipynb)
  - Dataset: UCI's [bank marketing dataset](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)
  - Simple example of using automated ML for classification to predict term deposit subscriptions for a bank
  - Uses Azure compute for training

- [auto-ml-creditcard-with-deployment.ipynb](credit-card-fraud-detection-with-deployment/auto-ml-creditcard-with-deployment.ipynb)
  - Dataset: Kaggle's [credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
  - Simple example of using automated ML for classification to detect fraudulent credit card transactions
  - Uses Azure compute for training

- [auto-ml-hardware-performance-with-deployment.ipynb](hardware-performance-prediction-with-deployment/auto-ml-hardware-performance-with-deployment.ipynb)
  - Dataset: UCI's [computer hardware dataset](https://archive.ics.uci.edu/ml/datasets/Computer+Hardware)
  - Simple example of using automated ML for regression to predict the performance of certain combinations of hardware components
  - Uses Azure compute for training

- [auto-ml-concrete-strength-with-deployment.ipynb](predicting-concrete-strength-with-deployment/auto-ml-concrete-strength-with-deployment.ipynb)
  - Dataset: UCI's [concrete compressive strength dataset](https://www.kaggle.com/pavanraj159/concrete-compressive-strength-data-set)
  - Simple example of using automated ML for regression to predict the compressive strength of concrete based on different ingredient combinations and quantities
  - Uses Azure compute for training

<a name="documentation"></a>
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.

<a name="pythoncommand"></a>
# Running using python command

Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file.
You can then run this file using the python command.
However, on Windows the file needs to be modified before it can be run.
The following condition must be added to the main code in the file:

    if __name__ == "__main__":

The main code of the file must be indented so that it is under this condition.

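As an illustrative sketch (the file and function names here are hypothetical, not from any specific sample notebook), the exported file should take this shape on Windows:

```python
# Hypothetical train.py exported from a notebook, modified for Windows:
# the notebook's code is indented under the __main__ guard so that child
# processes spawned during training do not re-execute it on import.

def best_score(scores):
    # Toy stand-in for work defined at module level in the notebook.
    return max(scores)

if __name__ == "__main__":
    # ...the notebook's cells go here, indented under this condition...
    result = best_score([0.91, 0.87, 0.95])
    print("best score:", result)
```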
<a name="troubleshooting"></a>
# Troubleshooting

## automl_setup fails

1. On Windows, make sure that you are running automl_setup from an Anaconda Prompt window rather than a regular cmd window. You can launch the "Anaconda Prompt" window by hitting the Start button and typing "Anaconda Prompt". If you don't see the application "Anaconda Prompt", you might not have conda or Miniconda installed. In that case, you can install it [here](https://conda.io/miniconda.html).
2. Check that you have 64-bit conda installed rather than 32-bit. You can check this with the command `conda info`. The `platform` should be `win-64` for Windows or `osx-64` for Mac.
3. Check that you have conda 4.4.10 or later. You can check the version with the command `conda -V`. If you have a previous version installed, you can update it using the command: `conda update conda`.
4. On Linux, if the error is `gcc: error trying to exec 'cc1plus': execvp: No such file or directory`, install build essentials using the command `sudo apt-get install build-essential`.
5. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.

## automl_setup_linux.sh fails

If automl_setup_linux.sh fails on Ubuntu Linux with the error `unable to execute 'gcc': No such file or directory`:

1. Make sure that outbound ports 53 and 80 are enabled. On an Azure VM, you can do this from the Azure Portal by selecting the VM and clicking on Networking.
2. Run the command: `sudo apt-get update`
3. Run the command: `sudo apt-get install build-essential --fix-missing`
4. Run `automl_setup_linux.sh` again.

## configuration.ipynb fails

1) For local conda, make sure that you have successfully run automl_setup first.
2) Check that the subscription_id is correct. You can find the subscription_id in the Azure Portal by selecting All Services and then Subscriptions. The characters "<" and ">" should not be included in the subscription_id value. For example, `subscription_id = "12345678-90ab-1234-5678-1234567890ab"` has the valid format.
3) Check that you have Contributor or Owner access to the Subscription.
4) Check that the region is one of the supported regions: `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`
5) Check that you have access to the region using the Azure Portal.

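A quick way to sanity-check the subscription_id format is a GUID pattern match. This helper is purely illustrative (it is not part of the SDK); it only checks the 8-4-4-4-12 hex-digit shape and rejects stray "<" or ">" characters:

```python
import re

# Illustrative format check (not an SDK API): a subscription_id is a bare
# GUID with no surrounding "<" or ">" characters.
GUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")

def looks_like_subscription_id(value: str) -> bool:
    return GUID_RE.fullmatch(value) is not None

print(looks_like_subscription_id("12345678-90ab-1234-5678-1234567890ab"))    # True
print(looks_like_subscription_id("<12345678-90ab-1234-5678-1234567890ab>"))  # False
```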
## workspace.from_config fails

If the call `ws = Workspace.from_config()` fails:

1) Make sure that you have run the `configuration.ipynb` notebook successfully.
2) If you are running a notebook from a folder that is not under the folder where you ran `configuration.ipynb`, copy the folder aml_config and the file config.json that it contains to the new folder. `Workspace.from_config` reads the config.json from the notebook folder or its parent folders.
3) If you are switching to a new subscription, resource group, workspace or region, make sure that you run the `configuration.ipynb` notebook again. Changing config.json directly will only work if the workspace already exists in the specified resource group under the specified subscription.
4) If you want to change the region, please change the workspace, resource group or subscription. `Workspace.create` will not create or update a workspace if it already exists, even if the region specified is different.

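The lookup behavior in point 2 can be sketched as a walk up the directory tree. This is a rough illustration of the search order described above, not the SDK's actual implementation:

```python
from pathlib import Path
from typing import Optional

def find_config(start: str) -> Optional[Path]:
    """Illustrative sketch: walk up from `start` looking for
    aml_config/config.json, checking the folder itself first."""
    here = Path(start).resolve()
    for folder in [here, *here.parents]:
        candidate = folder / "aml_config" / "config.json"
        if candidate.is_file():
            return candidate
    return None
```

So a notebook two levels below the folder holding aml_config would still find the file, which is why copying aml_config to a sibling folder (outside this chain of parents) is necessary.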
## Sample notebook fails

If a sample notebook fails with an error that a property, method or library does not exist:

1) Check that you have selected the correct kernel in Jupyter notebook. The kernel is displayed in the top right of the notebook page. It can be changed using the `Kernel | Change Kernel` menu option. For Azure Notebooks, it should be `Python 3.6`. For local conda environments, it should be the conda environment name that you specified in automl_setup. The default is azure_automl. Note that the kernel is saved as part of the notebook, so if you switch to a new conda environment, you will have to select the new kernel in the notebook.
2) Check that the notebook is for the SDK version that you are using. You can check the SDK version by executing `azureml.core.VERSION` in a Jupyter notebook cell. You can download previous versions of the sample notebooks from GitHub by clicking the `Branch` button, selecting the `Tags` tab and then selecting the version.

## Numpy import fails on Windows

Some Windows environments see an error loading numpy with the latest Python version 3.6.8. If you see this issue, try with Python version 3.6.7.

## Numpy import fails

Check the tensorflow version in the automated ML conda environment. Supported versions are < 1.13. Uninstall tensorflow from the environment if the version is >= 1.13.

You can check the version of tensorflow and uninstall it as follows:

1) Start a command shell and activate the conda environment where the automated ML packages are installed.
2) Enter `pip freeze` and look for `tensorflow`; if found, the version listed should be < 1.13.
3) If the listed version is not a supported version, run `pip uninstall tensorflow` in the command shell and enter y for confirmation.

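When checking the `pip freeze` output, compare versions numerically rather than as strings. A small illustrative helper (not part of any SDK) for the < 1.13 rule above:

```python
# Illustrative helper: compare dotted version strings numerically, since a
# plain string comparison would wrongly rank "1.9.0" above "1.13.1".
def version_tuple(v: str):
    return tuple(int(part) for part in v.split("."))

def tensorflow_supported(installed: str) -> bool:
    # Supported versions are strictly below 1.13.
    return version_tuple(installed) < version_tuple("1.13")

print(tensorflow_supported("1.12.0"))  # True: supported
print(tensorflow_supported("1.13.1"))  # False: should be uninstalled
```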
## Remote run: DsvmCompute.create fails

There are several reasons why DsvmCompute.create can fail. The reason is usually in the error message, but you have to look at the end of the error message for the detailed reason. Some common reasons are:

1) `Compute name is invalid, it should start with a letter, be between 2 and 16 character, and only include letters (a-zA-Z), numbers (0-9) and \'-\'.` Note that underscore is not allowed in the name.
2) `The requested VM size xxxxx is not available in the current region.` You can select a different region or vm_size.

## Remote run: Unable to establish SSH connection

Automated ML uses the SSH protocol to communicate with remote DSVMs. This defaults to port 22. Possible causes for this error are:

1) The DSVM is not ready for SSH connections. When DSVM creation completes, the DSVM might still not be ready to accept SSH connections. The sample notebooks have a one minute delay to allow for this.
2) Your Azure Subscription may restrict the IP address ranges that can access the DSVM on port 22. You can check this in the Azure Portal by selecting the Virtual Machine and then clicking Networking. The Virtual Machine name is the name that you provided in the notebook plus 10 alphanumeric characters to make the name unique. The Inbound Port Rules define what can access the VM on specific ports. Note that there is a priority order, so a Deny entry with a low priority number will override an Allow entry with a higher priority number.

## Remote run: setup iteration fails

This is often an issue with the `get_data` method.

1) Check that the `get_data` method is valid by running it locally.

2) Make sure that `get_data` isn't referring to any local files. `get_data` is executed on the remote DSVM, so it doesn't have direct access to local data files. Instead, you can store the data files with DataStore. See [auto-ml-remote-execution-with-datastore.ipynb](remote-execution-with-datastore/auto-ml-remote-execution-with-datastore.ipynb).

3) You can get to the error log for the setup iteration by clicking the `Click here to see the run in Azure portal` link, clicking `Back to Experiment`, clicking the highest run number, and then clicking Logs.

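For illustration, here is a minimal `get_data` sketch that follows the rules above: it runs standalone and reads only an absolute path that exists on the DSVM. The `/tmp/azureml_data` path and the CSV layout are made up for the example; a real run would have a datastore download place the file there.

```python
import csv
import os

data_dir = "/tmp/azureml_data"  # hypothetical absolute path on the DSVM
os.makedirs(data_dir, exist_ok=True)

# Simulate the file that a DataStore download would place on the DSVM;
# in a real run the datastore puts it there, not this script.
with open(os.path.join(data_dir, "train.csv"), "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["f1", "f2", "label"])
    writer.writerows([[1, 0.1, 0], [2, 0.2, 1], [3, 0.3, 0], [4, 0.4, 1]])

def get_data():
    # Refer only to paths that exist on the remote DSVM, never to files on
    # your local machine: get_data executes remotely.
    with open(os.path.join(data_dir, "train.csv")) as f:
        rows = list(csv.DictReader(f))
    X = [[float(r["f1"]), float(r["f2"])] for r in rows]
    y = [int(r["label"]) for r in rows]
    return {"X": X, "y": y}

result = get_data()  # validate locally before submitting the remote run
```

Running `get_data()` locally like this catches path and parsing errors before the remote setup iteration does.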
## Remote run: disk full

Automated ML creates files under /tmp/azureml_runs for each iteration that it runs. It creates a folder named with the iteration id, for example: AutoML_9a038a18-77cc-48f1-80fb-65abdbc33abe_93. Under this is an azureml-logs folder, which contains logs. If you run too many iterations on the same DSVM, these files can fill the disk.

You can delete the files under /tmp/azureml_runs or just delete the VM and create a new one.

If your `get_data` method downloads files, make sure to delete them, or they can use disk space as well.

When using DataStore, it is best to specify an absolute path for the files so that they are downloaded just once. If you specify a relative path, the files are downloaded once for each iteration.

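As a sketch of the cleanup, the helper below removes the per-iteration `AutoML_*` folders. The helper name is ours, and it is demonstrated on a throwaway directory rather than the real `/tmp/azureml_runs`:

```python
import os
import shutil
import tempfile

def clean_automl_runs(runs_dir):
    # Each AutoML iteration leaves an AutoML_<run-id>_<n> folder holding an
    # azureml-logs directory; remove them all and report how many.
    removed = 0
    for name in os.listdir(runs_dir):
        path = os.path.join(runs_dir, name)
        if name.startswith("AutoML_") and os.path.isdir(path):
            shutil.rmtree(path)
            removed += 1
    return removed

# Demonstrate on a temporary directory standing in for /tmp/azureml_runs.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "AutoML_9a038a18_0", "azureml-logs"))
os.makedirs(os.path.join(demo_dir, "AutoML_9a038a18_1", "azureml-logs"))
count = clean_automl_runs(demo_dir)
```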
## Remote run: Iterations fail and the log contains "MemoryError"

This can be caused by insufficient memory on the DSVM. Automated ML loads all training data into memory, so the available memory should be more than the training data size.

If you are using a remote DSVM, memory is needed for each concurrent iteration. The max_concurrent_iterations setting specifies the maximum number of concurrent iterations. For example, if the training data size is 8 GB and max_concurrent_iterations is set to 10, the minimum memory required is at least 80 GB.

To resolve this issue, allocate a DSVM with more memory or reduce the value specified for max_concurrent_iterations.

## Remote run: Iterations show as "Not Responding" in the RunDetails widget

This can be caused by too many concurrent iterations for a remote DSVM. Each concurrent iteration usually takes 100% of a core while it is running, and some iterations can use multiple cores. So, the max_concurrent_iterations setting should always be less than the number of cores of the DSVM.

To resolve this issue, try reducing the value specified for the max_concurrent_iterations setting.

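The two rules of thumb above (enough memory per concurrent iteration, and concurrency below the core count) can be sketched as a quick pre-flight check. The helper and its parameter names are illustrative, not part of the SDK:

```python
def check_automl_settings(max_concurrent_iterations, training_data_gb,
                          vm_cores, vm_memory_gb):
    # Return a list of problems with the proposed settings (empty if OK).
    problems = []
    # Each concurrent iteration can saturate a core, so stay below core count.
    if max_concurrent_iterations >= vm_cores:
        problems.append("max_concurrent_iterations should be less than vm_cores")
    # Each concurrent iteration loads the full training data into memory.
    if max_concurrent_iterations * training_data_gb > vm_memory_gb:
        problems.append("insufficient memory for this many concurrent iterations")
    return problems

# A DSVM with 8 cores and 56 GB of RAM, and 8 GB of training data:
issues = check_automl_settings(10, 8, vm_cores=8, vm_memory_gb=56)
```

With these numbers both checks fail; dropping max_concurrent_iterations to 4 would satisfy both.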
@@ -1,22 +1,22 @@

name: azure_automl
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
  - pip
  - python>=3.5.2,<3.6.8
  - nb_conda
  - matplotlib==2.1.0
  - numpy>=1.11.0,<=1.16.2
  - cython
  - urllib3<1.24
  - scipy>=1.0.0,<=1.1.0
  - scikit-learn>=0.19.0,<=0.20.3
  - pandas>=0.22.0,<=0.23.4
  - py-xgboost<=0.80

  - pip:
    # Required packages for AzureML execution, history, and data preparation.
    - azureml-sdk[automl,explain]
    - azureml-widgets
    - pandas_ml

@@ -1,22 +0,0 @@

name: azure_automl
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
  - nomkl
  - python>=3.5.2,<3.6.8
  - nb_conda
  - matplotlib==2.1.0
  - numpy>=1.11.0,<=1.16.2
  - cython
  - urllib3<1.24
  - scipy>=1.0.0,<=1.1.0
  - scikit-learn>=0.19.0,<=0.20.3
  - pandas>=0.22.0,<0.23.0
  - py-xgboost<=0.80

  - pip:
    # Required packages for AzureML execution, history, and data preparation.
    - azureml-sdk[automl,explain]
    - azureml-widgets
    - pandas_ml

@@ -1,62 +1,62 @@

@echo off
set conda_env_name=%1
set automl_env_file=%2
set options=%3
set PIP_NO_WARN_SCRIPT_LOCATION=0

IF "%conda_env_name%"=="" SET conda_env_name="azure_automl"
IF "%automl_env_file%"=="" SET automl_env_file="automl_env.yml"

IF NOT EXIST %automl_env_file% GOTO YmlMissing

IF "%CONDA_EXE%"=="" GOTO CondaMissing

call conda activate %conda_env_name% 2>nul:

if not errorlevel 1 (
  echo Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment %conda_env_name%
  call pip install --upgrade azureml-sdk[automl,notebooks,explain]
  if errorlevel 1 goto ErrorExit
) else (
  call conda env create -f %automl_env_file% -n %conda_env_name%
)

call conda activate %conda_env_name% 2>nul:
if errorlevel 1 goto ErrorExit

call python -m ipykernel install --user --name %conda_env_name% --display-name "Python (%conda_env_name%)"

REM azureml.widgets is now installed as part of the pip install under the conda env.
REM Removing the old user install so that the notebooks will use the latest widget.
call jupyter nbextension uninstall --user --py azureml.widgets

echo.
echo.
echo ***************************************
echo * AutoML setup completed successfully *
echo ***************************************
IF NOT "%options%"=="nolaunch" (
  echo.
  echo Starting jupyter notebook - please run the configuration notebook
  echo.
  jupyter notebook --log-level=50 --notebook-dir='..\..'
)

goto End

:CondaMissing
echo Please run this script from an Anaconda Prompt window.
echo You can start an Anaconda Prompt window by
echo typing Anaconda Prompt on the Start menu.
echo If you don't see the Anaconda Prompt app, install Miniconda.
echo If you are running an older version of Miniconda or Anaconda,
echo you can upgrade using the command: conda update conda
goto End

:YmlMissing
echo File %automl_env_file% not found.

:ErrorExit
echo Install failed

:End

@@ -1,52 +1,52 @@

#!/bin/bash

CONDA_ENV_NAME=$1
AUTOML_ENV_FILE=$2
OPTIONS=$3
PIP_NO_WARN_SCRIPT_LOCATION=0

if [ "$CONDA_ENV_NAME" == "" ]
then
    CONDA_ENV_NAME="azure_automl"
fi

if [ "$AUTOML_ENV_FILE" == "" ]
then
    AUTOML_ENV_FILE="automl_env.yml"
fi

if [ ! -f $AUTOML_ENV_FILE ]; then
    echo "File $AUTOML_ENV_FILE not found"
    exit 1
fi

if source activate $CONDA_ENV_NAME 2> /dev/null
then
    echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
    pip install --upgrade azureml-sdk[automl,notebooks,explain] &&
    jupyter nbextension uninstall --user --py azureml.widgets
else
    conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
    source activate $CONDA_ENV_NAME &&
    python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
    jupyter nbextension uninstall --user --py azureml.widgets &&
    echo "" &&
    echo "" &&
    echo "***************************************" &&
    echo "* AutoML setup completed successfully *" &&
    echo "***************************************" &&
    if [ "$OPTIONS" != "nolaunch" ]
    then
        echo "" &&
        echo "Starting jupyter notebook - please run the configuration notebook" &&
        echo "" &&
        jupyter notebook --log-level=50 --notebook-dir '../..'
    fi
fi

if [ $? -gt 0 ]
then
    echo "Installation failed"
fi

@@ -1,54 +1,54 @@

#!/bin/bash

CONDA_ENV_NAME=$1
AUTOML_ENV_FILE=$2
OPTIONS=$3
PIP_NO_WARN_SCRIPT_LOCATION=0

if [ "$CONDA_ENV_NAME" == "" ]
then
    CONDA_ENV_NAME="azure_automl"
fi

if [ "$AUTOML_ENV_FILE" == "" ]
then
    AUTOML_ENV_FILE="automl_env_mac.yml"
fi

if [ ! -f $AUTOML_ENV_FILE ]; then
    echo "File $AUTOML_ENV_FILE not found"
    exit 1
fi

if source activate $CONDA_ENV_NAME 2> /dev/null
then
    echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
    pip install --upgrade azureml-sdk[automl,notebooks,explain] &&
    jupyter nbextension uninstall --user --py azureml.widgets
else
    conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
    source activate $CONDA_ENV_NAME &&
    conda install lightgbm -c conda-forge -y &&
    python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
    jupyter nbextension uninstall --user --py azureml.widgets &&
    echo "" &&
    echo "" &&
    echo "***************************************" &&
    echo "* AutoML setup completed successfully *" &&
    echo "***************************************" &&
    if [ "$OPTIONS" != "nolaunch" ]
    then
        echo "" &&
        echo "Starting jupyter notebook - please run the configuration notebook" &&
        echo "" &&
        jupyter notebook --log-level=50 --notebook-dir '../..'
    fi
fi

if [ $? -gt 0 ]
then
    echo "Installation failed"
fi

@@ -1,8 +1,8 @@

name: auto-ml-classification-bank-marketing
dependencies:
  - pip:
    - azureml-sdk
    - azureml-train-automl
    - azureml-widgets
    - matplotlib
    - pandas_ml

@@ -1,8 +1,8 @@

name: auto-ml-classification-credit-card-fraud
dependencies:
  - pip:
    - azureml-sdk
    - azureml-train-automl
    - azureml-widgets
    - matplotlib
    - pandas_ml

@@ -1,8 +1,8 @@

name: auto-ml-classification-with-deployment
dependencies:
  - pip:
    - azureml-sdk
    - azureml-train-automl
    - azureml-widgets
    - matplotlib
    - pandas_ml

@@ -1,381 +1,381 @@

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        ""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Classification with Local Compute**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Data](#Data)\n",
        "1. [Train](#Train)\n",
        "1. [Results](#Results)\n",
        "1. [Test](#Test)\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "\n",
        "In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
        "\n",
        "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
        "\n",
        "Please find the ONNX related documentations [here](https://github.com/onnx/onnx).\n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. Create an `Experiment` in an existing `Workspace`.\n",
        "2. Configure AutoML using `AutoMLConfig`.\n",
        "3. Train the model using local compute with ONNX compatible config on.\n",
        "4. Explore the results and save the ONNX model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import logging\n",
        "\n",
        "from matplotlib import pyplot as plt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "from sklearn import datasets\n",
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.train.automl import AutoMLConfig, constants"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# Choose a name for the experiment and specify the project folder.\n",
        "experiment_name = 'automl-classification-onnx'\n",
        "project_folder = './sample_projects/automl-classification-onnx'\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output['SDK version'] = azureml.core.VERSION\n",
        "output['Subscription ID'] = ws.subscription_id\n",
        "output['Workspace Name'] = ws.name\n",
        "output['Resource Group'] = ws.resource_group\n",
        "output['Location'] = ws.location\n",
        "output['Project Directory'] = project_folder\n",
        "output['Experiment Name'] = experiment.name\n",
        "pd.set_option('display.max_colwidth', -1)\n",
        "outputDf = pd.DataFrame(data = output, index = [''])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Data\n",
        "\n",
        "This uses scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) method."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "iris = datasets.load_iris()\n",
        "X_train, X_test, y_train, y_test = train_test_split(iris.data, \n",
        "                                                    iris.target, \n",
        "                                                    test_size=0.2, \n",
        "                                                    random_state=0)\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Ensure the x_train and x_test are pandas DataFrame."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Convert the X_train and X_test to pandas DataFrame and set column names,\n",
        "# This is needed for initializing the input variable names of ONNX model, \n",
        "# and the prediction with the ONNX model using the inference helper.\n",
        "X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n",
        "X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Train with enable ONNX compatible models config on\n",
        "\n",
        "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
        "\n",
        "Set the parameter enable_onnx_compatible_models=True, if you also want to generate the ONNX compatible models. Please note, the forecasting task and TensorFlow models are not ONNX compatible yet.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|classification or regression|\n",
        "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
        "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
        "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
        "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
        "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
        "|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|\n",
        "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Set the preprocess=True, currently the InferenceHelper only supports this mode."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "automl_config = AutoMLConfig(task = 'classification',\n",
        "                             debug_log = 'automl_errors.log',\n",
        "                             primary_metric = 'AUC_weighted',\n",
        "                             iteration_timeout_minutes = 60,\n",
        "                             iterations = 10,\n",
        "                             verbosity = logging.INFO, \n",
        "                             X = X_train, \n",
        "                             y = y_train,\n",
        "                             preprocess=True,\n",
        "                             enable_onnx_compatible_models=True,\n",
        "                             path = project_folder)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
"outputs": [],
|
" debug_log = 'automl_errors.log',\n",
|
||||||
"source": [
|
" primary_metric = 'AUC_weighted',\n",
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
" iteration_timeout_minutes = 60,\n",
|
||||||
]
|
" iterations = 10,\n",
|
||||||
},
|
" verbosity = logging.INFO, \n",
|
||||||
{
|
" X = X_train, \n",
|
||||||
"cell_type": "code",
|
" y = y_train,\n",
|
||||||
"execution_count": null,
|
" preprocess=True,\n",
|
||||||
"metadata": {},
|
" enable_onnx_compatible_models=True,\n",
|
||||||
"outputs": [],
|
" path = project_folder)"
|
||||||
"source": [
|
],
|
||||||
"local_run"
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"source": [
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
"## Results"
|
],
|
||||||
]
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"source": [
|
"execution_count": null,
|
||||||
"#### Widget for Monitoring Runs\n",
|
"source": [
|
||||||
"\n",
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
],
|
||||||
"\n",
|
"cell_type": "code"
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"local_run"
|
||||||
"outputs": [],
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"from azureml.widgets import RunDetails\n",
|
},
|
||||||
"RunDetails(local_run).show() "
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"## Results"
|
||||||
"cell_type": "markdown",
|
],
|
||||||
"metadata": {},
|
"cell_type": "markdown"
|
||||||
"source": [
|
},
|
||||||
"### Retrieve the Best ONNX Model\n",
|
{
|
||||||
"\n",
|
"metadata": {},
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n",
|
"source": [
|
||||||
"\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
"Set the parameter return_onnx_model=True to retrieve the best ONNX model, instead of the Python model."
|
"\n",
|
||||||
]
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
},
|
"\n",
|
||||||
{
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
"cell_type": "code",
|
],
|
||||||
"execution_count": null,
|
"cell_type": "markdown"
|
||||||
"metadata": {},
|
},
|
||||||
"outputs": [],
|
{
|
||||||
"source": [
|
"metadata": {},
|
||||||
"best_run, onnx_mdl = local_run.get_output(return_onnx_model=True)"
|
"outputs": [],
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"source": [
|
||||||
{
|
"from azureml.widgets import RunDetails\n",
|
||||||
"cell_type": "markdown",
|
"RunDetails(local_run).show() "
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"### Save the best ONNX model"
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"source": [
|
||||||
"cell_type": "code",
|
"### Retrieve the Best ONNX Model\n",
|
||||||
"execution_count": null,
|
"\n",
|
||||||
"metadata": {},
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n",
|
||||||
"outputs": [],
|
"\n",
|
||||||
"source": [
|
"Set the parameter return_onnx_model=True to retrieve the best ONNX model, instead of the Python model."
|
||||||
"from azureml.automl.core.onnx_convert import OnnxConverter\n",
|
],
|
||||||
"onnx_fl_path = \"./best_model.onnx\"\n",
|
"cell_type": "markdown"
|
||||||
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"execution_count": null,
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"best_run, onnx_mdl = local_run.get_output(return_onnx_model=True)"
|
||||||
"### Predict with the ONNX model, using onnxruntime package"
|
],
|
||||||
]
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"metadata": {},
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"### Save the best ONNX model"
|
||||||
"outputs": [],
|
],
|
||||||
"source": [
|
"cell_type": "markdown"
|
||||||
"import sys\n",
|
},
|
||||||
"import json\n",
|
{
|
||||||
"from azureml.automl.core.onnx_convert import OnnxConvertConstants\n",
|
"metadata": {},
|
||||||
"\n",
|
"outputs": [],
|
||||||
"if sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion:\n",
|
"execution_count": null,
|
||||||
" python_version_compatible = True\n",
|
"source": [
|
||||||
"else:\n",
|
"from azureml.automl.core.onnx_convert import OnnxConverter\n",
|
||||||
" python_version_compatible = False\n",
|
"onnx_fl_path = \"./best_model.onnx\"\n",
|
||||||
"\n",
|
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
|
||||||
"try:\n",
|
],
|
||||||
" import onnxruntime\n",
|
"cell_type": "code"
|
||||||
" from azureml.automl.core.onnx_convert import OnnxInferenceHelper \n",
|
},
|
||||||
" onnxrt_present = True\n",
|
{
|
||||||
"except ImportError:\n",
|
"metadata": {},
|
||||||
" onnxrt_present = False\n",
|
"source": [
|
||||||
"\n",
|
"### Predict with the ONNX model, using onnxruntime package"
|
||||||
"def get_onnx_res(run):\n",
|
],
|
||||||
" res_path = 'onnx_resource.json'\n",
|
"cell_type": "markdown"
|
||||||
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
|
},
|
||||||
" with open(res_path) as f:\n",
|
{
|
||||||
" onnx_res = json.load(f)\n",
|
"metadata": {},
|
||||||
" return onnx_res\n",
|
"outputs": [],
|
||||||
"\n",
|
"execution_count": null,
|
||||||
"if onnxrt_present and python_version_compatible: \n",
|
"source": [
|
||||||
" mdl_bytes = onnx_mdl.SerializeToString()\n",
|
"import sys\n",
|
||||||
" onnx_res = get_onnx_res(best_run)\n",
|
"import json\n",
|
||||||
"\n",
|
"from azureml.automl.core.onnx_convert import OnnxConvertConstants\n",
|
||||||
" onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)\n",
|
"\n",
|
||||||
" pred_onnx, pred_prob_onnx = onnxrt_helper.predict(X_test)\n",
|
"if sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion:\n",
|
||||||
"\n",
|
" python_version_compatible = True\n",
|
||||||
" print(pred_onnx)\n",
|
"else:\n",
|
||||||
" print(pred_prob_onnx)\n",
|
" python_version_compatible = False\n",
|
||||||
"else:\n",
|
"\n",
|
||||||
" if not python_version_compatible:\n",
|
"try:\n",
|
||||||
" print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n",
|
" import onnxruntime\n",
|
||||||
" if not onnxrt_present:\n",
|
" from azureml.automl.core.onnx_convert import OnnxInferenceHelper \n",
|
||||||
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
|
" onnxrt_present = True\n",
|
||||||
]
|
"except ImportError:\n",
|
||||||
},
|
" onnxrt_present = False\n",
|
||||||
{
|
"\n",
|
||||||
"cell_type": "code",
|
"def get_onnx_res(run):\n",
|
||||||
"execution_count": null,
|
" res_path = 'onnx_resource.json'\n",
|
||||||
"metadata": {},
|
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
|
||||||
"outputs": [],
|
" with open(res_path) as f:\n",
|
||||||
"source": []
|
" onnx_res = json.load(f)\n",
|
||||||
}
|
" return onnx_res\n",
|
||||||
],
|
"\n",
|
||||||
"metadata": {
|
"if onnxrt_present and python_version_compatible: \n",
|
||||||
"authors": [
|
" mdl_bytes = onnx_mdl.SerializeToString()\n",
|
||||||
{
|
" onnx_res = get_onnx_res(best_run)\n",
|
||||||
"name": "savitam"
|
"\n",
|
||||||
}
|
" onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)\n",
|
||||||
],
|
" pred_onnx, pred_prob_onnx = onnxrt_helper.predict(X_test)\n",
|
||||||
"kernelspec": {
|
"\n",
|
||||||
"display_name": "Python 3.6",
|
" print(pred_onnx)\n",
|
||||||
"language": "python",
|
" print(pred_prob_onnx)\n",
|
||||||
"name": "python36"
|
"else:\n",
|
||||||
},
|
" if not python_version_compatible:\n",
|
||||||
"language_info": {
|
" print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n",
|
||||||
"codemirror_mode": {
|
" if not onnxrt_present:\n",
|
||||||
"name": "ipython",
|
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
|
||||||
"version": 3
|
],
|
||||||
},
|
"cell_type": "code"
|
||||||
"file_extension": ".py",
|
},
|
||||||
"mimetype": "text/x-python",
|
{
|
||||||
"name": "python",
|
"metadata": {},
|
||||||
"nbconvert_exporter": "python",
|
"outputs": [],
|
||||||
"pygments_lexer": "ipython3",
|
"execution_count": null,
|
||||||
"version": "3.6.6"
|
"source": [],
|
||||||
}
|
"cell_type": "code"
|
||||||
},
|
}
|
||||||
"nbformat": 4,
|
],
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
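The guard-and-load pattern in the prediction cell above can be sketched with the standard library alone. In this sketch, `ONNX_INCOMPATIBLE_PYTHON_VERSION` is a hypothetical stand-in for `OnnxConvertConstants.OnnxIncompatiblePythonVersion`, and the locally written `onnx_resource.json` stands in for the file the notebook downloads with `run.download_file`:

```python
import importlib.util
import json
import sys
from pathlib import Path

# Hypothetical stand-in for OnnxConvertConstants.OnnxIncompatiblePythonVersion.
ONNX_INCOMPATIBLE_PYTHON_VERSION = (3, 9)

# Same guards as the notebook cell: check the interpreter version and the
# presence of onnxruntime without letting a missing package raise at import time.
python_version_compatible = sys.version_info[:2] < ONNX_INCOMPATIBLE_PYTHON_VERSION
onnxrt_present = importlib.util.find_spec("onnxruntime") is not None

def get_onnx_res(res_path):
    # The notebook downloads this JSON from the run; here we just read it from disk.
    with open(res_path) as f:
        return json.load(f)

# Simulate the downloaded resource file that pairs with the saved ONNX model.
res_path = "onnx_resource.json"
Path(res_path).write_text(json.dumps({"input_names": ["c1", "c2", "c3", "c4"]}))

onnx_res = get_onnx_res(res_path)
print(onnx_res["input_names"])  # → ['c1', 'c2', 'c3', 'c4']
```

Keeping the availability checks as booleans, rather than failing outright, lets the notebook print a actionable message instead of an import traceback on unsupported setups.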
@@ -1,9 +1,9 @@
name: auto-ml-classification-with-onnx
dependencies:
- pip:
  - azureml-sdk
  - azureml-train-automl
  - azureml-widgets
  - matplotlib
  - pandas_ml
  - onnxruntime
@@ -1,399 +1,399 @@
{
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3.6",
      "name": "python36",
      "language": "python"
    },
    "authors": [
      {
        "name": "savitam"
      }
    ],
    "language_info": {
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "name": "python",
      "file_extension": ".py",
      "nbconvert_exporter": "python",
      "version": "3.6.6"
    }
  },
  "nbformat": 4,
  "cells": [
    {
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        ""
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Classification using whitelist models**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Data](#Data)\n",
        "1. [Train](#Train)\n",
        "1. [Results](#Results)\n",
        "1. [Test](#Test)"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Introduction\n",
        "\n",
        "In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
        "\n",
        "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
        "This notebook shows how AutoML can be trained on a selected list of models; see the README.md for the list of models.\n",
        "It trains exclusively on TensorFlow-based models.\n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. Create an `Experiment` in an existing `Workspace`.\n",
        "2. Configure AutoML using `AutoMLConfig`.\n",
        "3. Train the model on the whitelisted models using local compute.\n",
        "4. Explore the results.\n",
        "5. Test the best fitted model."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "# Note: This notebook will install TensorFlow if it is not already installed in the environment.\n",
        "import logging\n",
        "\n",
        "from matplotlib import pyplot as plt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "from sklearn import datasets\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "import sys\n",
        "whitelist_models=[\"LightGBM\"]\n",
        "if \"3.7\" != sys.version[0:3]:\n",
        "    try:\n",
        "        import tensorflow as tf1\n",
        "    except ImportError:\n",
        "        from pip._internal import main\n",
        "        main(['install', 'tensorflow>=1.10.0,<=1.12.0'])\n",
        "    logging.getLogger().setLevel(logging.ERROR)\n",
        "    whitelist_models=[\"TensorFlowLinearClassifier\", \"TensorFlowDNN\"]\n",
        "\n",
        "from azureml.train.automl import AutoMLConfig"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# Choose a name for the experiment and specify the project folder.\n",
        "experiment_name = 'automl-local-whitelist'\n",
        "project_folder = './sample_projects/automl-local-whitelist'\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output['SDK version'] = azureml.core.VERSION\n",
        "output['Subscription ID'] = ws.subscription_id\n",
        "output['Workspace Name'] = ws.name\n",
        "output['Resource Group'] = ws.resource_group\n",
        "output['Location'] = ws.location\n",
        "output['Project Directory'] = project_folder\n",
        "output['Experiment Name'] = experiment.name\n",
        "pd.set_option('display.max_colwidth', -1)\n",
        "outputDf = pd.DataFrame(data = output, index = [''])\n",
        "outputDf.T"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Data\n",
        "\n",
        "This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "digits = datasets.load_digits()\n",
        "\n",
        "# Exclude the first 100 rows from training so that they can be used for test.\n",
        "X_train = digits.data[100:,:]\n",
        "y_train = digits.target[100:]"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Train\n",
        "\n",
        "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|classification or regression|\n",
        "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
        "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
        "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
        "|**n_cross_validations**|Number of cross validation splits.|\n",
        "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
        "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
        "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
        "|**whitelist_models**|List of models that AutoML should use. The possible values are listed [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings).|"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "automl_config = AutoMLConfig(task = 'classification',\n",
        "                             debug_log = 'automl_errors.log',\n",
        "                             primary_metric = 'AUC_weighted',\n",
        "                             iteration_timeout_minutes = 60,\n",
        "                             iterations = 10,\n",
        "                             verbosity = logging.INFO,\n",
        "                             X = X_train,\n",
        "                             y = y_train,\n",
        "                             enable_tf=True,\n",
        "                             whitelist_models=whitelist_models,\n",
        "                             path = project_folder)"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
        "In this example, we specify `show_output = True` to print currently running iterations to the console."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "local_run = experiment.submit(automl_config, show_output = True)"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "local_run"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Results"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "#### Widget for Monitoring Runs\n",
        "\n",
        "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
        "\n",
        "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "from azureml.widgets import RunDetails\n",
        "RunDetails(local_run).show()"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "#### Retrieve All Child Runs\n",
        "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "children = list(local_run.get_children())\n",
        "metricslist = {}\n",
        "for run in children:\n",
        "    properties = run.get_properties()\n",
        "    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
        "    metricslist[int(properties['iteration'])] = metrics\n",
        "\n",
        "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
        "rundata"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "### Retrieve the Best Model\n",
        "\n",
        "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "best_run, fitted_model = local_run.get_output()\n",
        "print(best_run)\n",
        "print(fitted_model)"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "#### Best Model Based on Any Other Metric\n",
        "Show the run and the model that has the smallest `log_loss` value:"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "lookup_metric = \"log_loss\"\n",
        "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
        "print(best_run)\n",
        "print(fitted_model)"
      ],
      "cell_type": "code"
    },
{
|
"print(best_run)\n",
|
||||||
"cell_type": "markdown",
|
"print(fitted_model)"
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"#### Model from a Specific Iteration\n",
|
},
|
||||||
"Show the run and the model from the third iteration:"
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"cell_type": "code",
|
"Show the run and the model that has the smallest `log_loss` value:"
|
||||||
"execution_count": null,
|
],
|
||||||
"metadata": {},
|
"cell_type": "markdown"
|
||||||
"outputs": [],
|
},
|
||||||
"source": [
|
{
|
||||||
"iteration = 3\n",
|
"metadata": {},
|
||||||
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
"outputs": [],
|
||||||
"print(third_run)\n",
|
"execution_count": null,
|
||||||
"print(third_model)"
|
"source": [
|
||||||
]
|
"lookup_metric = \"log_loss\"\n",
|
||||||
},
|
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
||||||
{
|
"print(best_run)\n",
|
||||||
"cell_type": "markdown",
|
"print(fitted_model)"
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"## Test\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"#### Load Test Data"
|
"metadata": {},
|
||||||
]
|
"source": [
|
||||||
},
|
"#### Model from a Specific Iteration\n",
|
||||||
{
|
"Show the run and the model from the third iteration:"
|
||||||
"cell_type": "code",
|
],
|
||||||
"execution_count": null,
|
"cell_type": "markdown"
|
||||||
"metadata": {},
|
},
|
||||||
"outputs": [],
|
{
|
||||||
"source": [
|
"metadata": {},
|
||||||
"digits = datasets.load_digits()\n",
|
"outputs": [],
|
||||||
"X_test = digits.data[:10, :]\n",
|
"execution_count": null,
|
||||||
"y_test = digits.target[:10]\n",
|
"source": [
|
||||||
"images = digits.images[:10]"
|
"iteration = 3\n",
|
||||||
]
|
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
||||||
},
|
"print(third_run)\n",
|
||||||
{
|
"print(third_model)"
|
||||||
"cell_type": "markdown",
|
],
|
||||||
"metadata": {},
|
"cell_type": "code"
|
||||||
"source": [
|
},
|
||||||
"#### Testing Our Best Fitted Model\n",
|
{
|
||||||
"We will try to predict 2 digits and see how our model works."
|
"metadata": {},
|
||||||
]
|
"source": [
|
||||||
},
|
"## Test\n",
|
||||||
{
|
"\n",
|
||||||
"cell_type": "code",
|
"#### Load Test Data"
|
||||||
"execution_count": null,
|
],
|
||||||
"metadata": {},
|
"cell_type": "markdown"
|
||||||
"outputs": [],
|
},
|
||||||
"source": [
|
{
|
||||||
"# Randomly select digits and test.\n",
|
"metadata": {},
|
||||||
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
"outputs": [],
|
||||||
" print(index)\n",
|
"execution_count": null,
|
||||||
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
"source": [
|
||||||
" label = y_test[index]\n",
|
"digits = datasets.load_digits()\n",
|
||||||
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
" fig = plt.figure(1, figsize = (3,3))\n",
|
"y_test = digits.target[:10]\n",
|
||||||
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
"images = digits.images[:10]"
|
||||||
" ax1.set_title(title)\n",
|
],
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
"cell_type": "code"
|
||||||
" plt.show()"
|
},
|
||||||
]
|
{
|
||||||
}
|
"metadata": {},
|
||||||
],
|
"source": [
|
||||||
"metadata": {
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"authors": [
|
"We will try to predict 2 digits and see how our model works."
|
||||||
{
|
],
|
||||||
"name": "savitam"
|
"cell_type": "markdown"
|
||||||
}
|
},
|
||||||
],
|
{
|
||||||
"kernelspec": {
|
"metadata": {},
|
||||||
"display_name": "Python 3.6",
|
"outputs": [],
|
||||||
"language": "python",
|
"execution_count": null,
|
||||||
"name": "python36"
|
"source": [
|
||||||
},
|
"# Randomly select digits and test.\n",
|
||||||
"language_info": {
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
"codemirror_mode": {
|
" print(index)\n",
|
||||||
"name": "ipython",
|
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
||||||
"version": 3
|
" label = y_test[index]\n",
|
||||||
},
|
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
||||||
"file_extension": ".py",
|
" fig = plt.figure(1, figsize = (3,3))\n",
|
||||||
"mimetype": "text/x-python",
|
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
||||||
"name": "python",
|
" ax1.set_title(title)\n",
|
||||||
"nbconvert_exporter": "python",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
"pygments_lexer": "ipython3",
|
" plt.show()"
|
||||||
"version": "3.6.6"
|
],
|
||||||
}
|
"cell_type": "code"
|
||||||
},
|
}
|
||||||
"nbformat": 4,
|
],
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
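The child-run metric collection pattern in the notebook above (`get_children()` / `get_metrics()` into a `DataFrame`) can be sketched without the Azure ML SDK. The `children` dictionaries below are hypothetical stand-ins for azureml `Run` objects; only the dict-building and filtering logic mirrors the notebook:

```python
import pandas as pd

# Hypothetical stand-ins for AutoML child Run objects: each carries an
# 'iteration' property and a metrics dict, mirroring run.get_properties()
# and run.get_metrics() in the notebook.
children = [
    {"properties": {"iteration": "0"}, "metrics": {"AUC_weighted": 0.91, "run_algorithm": "SGD"}},
    {"properties": {"iteration": "1"}, "metrics": {"AUC_weighted": 0.95, "run_algorithm": "RF"}},
]

metricslist = {}
for run in children:
    # Keep only float-valued metrics, keyed by iteration number.
    metrics = {k: v for k, v in run["metrics"].items() if isinstance(v, float)}
    metricslist[int(run["properties"]["iteration"])] = metrics

# One column per iteration, one row per metric, like the notebook's rundata.
rundata = pd.DataFrame(metricslist).sort_index(axis=1)
print(rundata)
```

The `isinstance(v, float)` filter drops string-valued entries (such as the algorithm name), so the resulting frame holds only numeric metrics.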
@@ -1,8 +1,8 @@
name: auto-ml-classification-with-whitelisting
dependencies:
- pip:
  - azureml-sdk
  - azureml-train-automl
  - azureml-widgets
  - matplotlib
  - pandas_ml
@@ -1,482 +1,482 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Classification with Local Compute**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local compute.\n",
"4. Explore the results.\n",
"5. Test the best fitted model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Accessing the Azure ML workspace requires authentication with Azure.\n",
"\n",
"The default authentication is interactive authentication using the default tenant. Executing the `ws = Workspace.from_config()` line in the cell below will prompt for authentication the first time that it is run.\n",
"\n",
"If you have multiple Azure tenants, you can specify the tenant by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
"\n",
"```\n",
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"auth = InteractiveLoginAuthentication(tenant_id = 'mytenantid')\n",
"ws = Workspace.from_config(auth = auth)\n",
"```\n",
"\n",
"If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
"\n",
"```\n",
"from azureml.core.authentication import ServicePrincipalAuthentication\n",
"auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')\n",
"ws = Workspace.from_config(auth = auth)\n",
"```\n",
"For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-classification'\n",
"project_folder = './sample_projects/automl-classification'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"\n",
"This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"\n",
"# Exclude the first 100 rows from training so that they can be used for test.\n",
"X_train = digits.data[100:,:]\n",
"y_train = digits.target[100:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"\n",
"Automated machine learning trains multiple machine learning pipelines. Each pipeline's training is known as an iteration.\n",
"* You can specify a maximum number of iterations using the `iterations` parameter.\n",
"* You can specify a maximum time for the run using the `experiment_timeout_minutes` parameter.\n",
"* If you specify neither the `iterations` nor the `experiment_timeout_minutes`, automated ML keeps running iterations while it continues to see improvements in the scores.\n",
"\n",
"The following example doesn't specify `iterations` or `experiment_timeout_minutes` and so runs until the scores stop improving.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
"                             primary_metric = 'AUC_weighted',\n",
"                             X = X_train, \n",
"                             y = y_train,\n",
"                             n_cross_validations = 3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can continue an interrupted local run by calling `continue_experiment` without the `iterations` parameter, or run more iterations for a completed run by specifying the `iterations` parameter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = local_run.continue_experiment(X = X_train, \n",
"                                          y = y_train, \n",
"                                          show_output = True,\n",
"                                          iterations = 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
"    properties = run.get_properties()\n",
"    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
"    metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Print the properties of the model\n",
"The fitted_model is a Python object and you can read the different properties of the object.\n",
"The following shows printing hyperparameters for each step in the pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pprint import pprint\n",
"\n",
"def print_model(model, prefix=\"\"):\n",
"    for step in model.steps:\n",
"        print(prefix + step[0])\n",
"        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):\n",
"            pprint({'estimators': list(e[0] for e in step[1].estimators), 'weights': step[1].weights})\n",
"            print()\n",
"            for estimator in step[1].estimators:\n",
"                print_model(estimator[1], estimator[0]+ ' - ')\n",
"        elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):\n",
"            print(\"\\nMeta Learner\")\n",
"            pprint(step[1]._meta_learner)\n",
"            print()\n",
"            for estimator in step[1]._base_learners:\n",
"                print_model(estimator[1], estimator[0]+ ' - ')\n",
"        else:\n",
"            pprint(step[1].get_params())\n",
"            print()\n",
"            \n",
"print_model(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
"print(best_run)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print_model(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from the third iteration:"
]
},
"outputs": [],
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"lookup_metric = \"log_loss\"\n",
|
||||||
"metadata": {},
|
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
||||||
"outputs": [],
|
"print(best_run)"
|
||||||
"source": [
|
],
|
||||||
"iteration = 3\n",
|
"cell_type": "code"
|
||||||
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
},
|
||||||
"print(third_run)"
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
```python
print_model(third_model)
```

## Test

#### Load Test Data

```python
digits = datasets.load_digits()
X_test = digits.data[:10, :]
y_test = digits.target[:10]
images = digits.images[:10]
```

#### Testing Our Best Fitted Model

We will try to predict 2 digits and see how our model works.

```python
# Randomly select digits and test.
for index in np.random.choice(len(y_test), 2, replace = False):
    print(index)
    predicted = fitted_model.predict(X_test[index:index + 1])[0]
    label = y_test[index]
    title = "Label value = %d Predicted value = %d " % (label, predicted)
    fig = plt.figure(1, figsize = (3,3))
    ax1 = fig.add_axes((0,0,.8,.8))
    ax1.set_title(title)
    plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')
    plt.show()
```
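The test loop depends on `fitted_model` from the AutoML run. As a self-contained stand-in, the same slice-and-predict pattern can be sketched with a plain scikit-learn classifier (the choice of `LogisticRegression` is ours, not the notebook's):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

digits = datasets.load_digits()
# Hold out the first 10 samples for testing, mirroring the notebook's slices.
X_test, y_test = digits.data[:10, :], digits.target[:10]

# Stand-in for the AutoML fitted_model: train on the remaining samples.
model = LogisticRegression(max_iter=1000)
model.fit(digits.data[10:], digits.target[10:])

for index in np.random.choice(len(y_test), 2, replace=False):
    predicted = model.predict(X_test[index:index + 1])[0]
    print("Label value = %d Predicted value = %d" % (y_test[index], predicted))
```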
*(Notebook metadata: author `savitam`; kernel `Python 3.6`; Python 3.6.6; nbformat 4.)*
Conda environment `auto-ml-classification`:

```yaml
name: auto-ml-classification
dependencies:
  - pip:
    - azureml-sdk
    - azureml-train-automl
    - azureml-widgets
    - matplotlib
    - pandas_ml
```
Conda environment `auto-ml-dataprep-remote-execution`:

```yaml
name: auto-ml-dataprep-remote-execution
dependencies:
  - pip:
    - azureml-sdk
    - azureml-train-automl
    - azureml-widgets
    - matplotlib
    - pandas_ml
```
---

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Automated Machine Learning
_**Prepare Data using `azureml.dataprep` for Local Execution**_

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Data](#Data)
1. [Train](#Train)
1. [Results](#Results)
1. [Test](#Test)
## Introduction

In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).

Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.

In this notebook you will learn how to:
1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.
2. Pass the `Dataflow` to AutoML for a local run.
3. Pass the `Dataflow` to AutoML for a remote run.

## Setup

Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more Linux distributions.
As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments.

```python
import logging

import pandas as pd

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
import azureml.dataprep as dprep
from azureml.train.automl import AutoMLConfig
```

```python
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'automl-dataprep-local'
# project folder
project_folder = './sample_projects/automl-dataprep-local'

experiment = Experiment(ws, experiment_name)

output = {}
output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace Name'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Project Directory'] = project_folder
output['Experiment Name'] = experiment.name
pd.set_option('display.max_colwidth', -1)
outputDf = pd.DataFrame(data = output, index = [''])
outputDf.T
```
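The `output` dictionary is turned into a one-row `DataFrame` and transposed so each property appears on its own row. The pattern can be sketched with placeholder values (the values below are illustrative, not from a real workspace):

```python
import pandas as pd

output = {
    'SDK version': '1.0.0',            # placeholder for azureml.core.VERSION
    'Workspace Name': 'my-workspace',  # placeholder for ws.name
    'Experiment Name': 'automl-dataprep-local',
}
outputDf = pd.DataFrame(data=output, index=[''])
print(outputDf.T)  # one property per row after transposing
```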
## Data

```python
# You can use `auto_read_file`, which intelligently figures out the delimiters and datatypes of a file.
# The data referenced here is a 1 MB simple random sample of the Chicago Crime data.
# You can also use `read_csv` and `to_*` transformations to read (with an overridable delimiter)
# and convert column types manually.
example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'
dflow = dprep.auto_read_file(example_data).skip(1)  # Remove the header row.
dflow.get_profile()
```

```python
# As `Primary Type` is our y data, we need to drop the rows that are null in this column.
dflow = dflow.drop_nulls('Primary Type')
dflow.head(5)
```
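The `drop_nulls` / `head` chain above is the lazy Dataflow API; roughly the same cleaning step can be sketched eagerly in pandas (the toy frame below is illustrative, not the real Chicago Crime schema):

```python
import io
import pandas as pd

csv = io.StringIO(
    "ID,Primary Type,FBI Code\n"
    "1,THEFT,06\n"
    "2,,08\n"  # null label: dropped below
    "3,BATTERY,08B\n"
)
df = pd.read_csv(csv)
# Equivalent of dflow.drop_nulls('Primary Type'): keep only rows with a label.
df = df.dropna(subset=['Primary Type'])
print(df.head(5))
```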
### Review the Data Preparation Result

You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.

`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage.

```python
X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])
y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)
```
## Train

This creates a general AutoML settings object applicable for both local and remote runs.

```python
automl_settings = {
    "iteration_timeout_minutes" : 10,
    "iterations" : 2,
    "primary_metric" : 'AUC_weighted',
    "preprocess" : True,
    "verbosity" : logging.INFO
}
```
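The settings dictionary is later splatted into `AutoMLConfig` via `**automl_settings`. The keyword-unpacking mechanics can be sketched with a stand-in config class (the class here is ours, not the SDK's):

```python
import logging

automl_settings = {
    "iteration_timeout_minutes": 10,
    "iterations": 2,
    "primary_metric": 'AUC_weighted',
    "preprocess": True,
    "verbosity": logging.INFO,
}

class ToyConfig:
    """Stand-in for AutoMLConfig: collects the task plus any keyword settings."""
    def __init__(self, task, **kwargs):
        self.task = task
        self.settings = kwargs

config = ToyConfig(task='classification', **automl_settings)
print(config.task, config.settings['iterations'])
```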
### Pass Data with `Dataflow` Objects

The `Dataflow` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `Dataflow` for model training.

```python
automl_config = AutoMLConfig(task = 'classification',
                             debug_log = 'automl_errors.log',
                             X = X,
                             y = y,
                             **automl_settings)
```

```python
local_run = experiment.submit(automl_config, show_output = True)
```

```python
local_run
```
## Results

#### Widget for Monitoring Runs

The widget will first report a "loading" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.

**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details.

```python
from azureml.widgets import RunDetails
RunDetails(local_run).show()
```
#### Retrieve All Child Runs

You can also use SDK methods to fetch all the child runs and see the individual metrics that we log.

```python
children = list(local_run.get_children())
metricslist = {}
for run in children:
    properties = run.get_properties()
    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}
    metricslist[int(properties['iteration'])] = metrics

rundata = pd.DataFrame(metricslist).sort_index(1)
rundata
```
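The child-run loop builds a `{iteration: {metric: value}}` mapping and pivots it into a table. With stand-in metric values (illustrative, not real run output) the pivot looks like:

```python
import pandas as pd

# Stand-in for the metrics fetched from child runs: one dict per iteration.
metricslist = {
    0: {'AUC_weighted': 0.91, 'log_loss': 0.40},
    1: {'AUC_weighted': 0.94, 'log_loss': 0.35},
}

# One column per iteration, one row per metric, columns sorted by iteration.
rundata = pd.DataFrame(metricslist).sort_index(axis=1)
print(rundata)
```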
### Retrieve the Best Model

Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.

```python
best_run, fitted_model = local_run.get_output()
print(best_run)
print(fitted_model)
```

#### Best Model Based on Any Other Metric

Show the run and the model that has the smallest `log_loss` value:

```python
lookup_metric = "log_loss"
best_run, fitted_model = local_run.get_output(metric = lookup_metric)
print(best_run)
print(fitted_model)
```
#### Model from a Specific Iteration

Show the run and the model from the first iteration:

```python
iteration = 0
best_run, fitted_model = local_run.get_output(iteration = iteration)
print(best_run)
print(fitted_model)
```
## Test

#### Load Test Data

The test data must go through the same preparation steps as the training data; otherwise the preprocessing step may fail.

```python
dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)
dflow_test = dflow_test.drop_nulls('Primary Type')
```
#### Testing Our Best Fitted Model

We will use a confusion matrix to see how our model works.

```python
from pandas_ml import ConfusionMatrix

y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()
X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()

ypred = fitted_model.predict(X_test)

cm = ConfusionMatrix(y_test['Primary Type'], ypred)

print(cm)

cm.plot()
```
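`pandas_ml`'s `ConfusionMatrix` wraps a label-versus-prediction cross-tabulation. The same table can be sketched with `pandas.crosstab` on illustrative labels (the values below are not from the crime dataset):

```python
import pandas as pd

y_true = pd.Series(['THEFT', 'BATTERY', 'THEFT', 'BATTERY', 'THEFT'], name='Actual')
y_pred = pd.Series(['THEFT', 'BATTERY', 'BATTERY', 'BATTERY', 'THEFT'], name='Predicted')

# Rows: actual class; columns: predicted class; cells: counts.
cm = pd.crosstab(y_true, y_pred)
print(cm)
```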
"metadata": {
|
"We will use confusion matrix to see how our model works."
|
||||||
"authors": [
|
],
|
||||||
{
|
"cell_type": "markdown"
|
||||||
"name": "savitam"
|
},
|
||||||
}
|
{
|
||||||
],
|
"metadata": {},
|
||||||
"kernelspec": {
|
"outputs": [],
|
||||||
"display_name": "Python 3.6",
|
"execution_count": null,
|
||||||
"language": "python",
|
"source": [
|
||||||
"name": "python36"
|
"from pandas_ml import ConfusionMatrix\n",
|
||||||
},
|
"\n",
|
||||||
"language_info": {
|
"y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
|
||||||
"codemirror_mode": {
|
"X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
|
||||||
"name": "ipython",
|
"\n",
|
||||||
"version": 3
|
"ypred = fitted_model.predict(X_test)\n",
|
||||||
},
|
"\n",
|
||||||
"file_extension": ".py",
|
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
|
||||||
"mimetype": "text/x-python",
|
"\n",
|
||||||
"name": "python",
|
"print(cm)\n",
|
||||||
"nbconvert_exporter": "python",
|
"\n",
|
||||||
"pygments_lexer": "ipython3",
|
"cm.plot()"
|
||||||
"version": "3.6.5"
|
],
|
||||||
}
|
"cell_type": "code"
|
||||||
},
|
}
|
||||||
"nbformat": 4,
|
],
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
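The evaluation cell above uses `pandas_ml.ConfusionMatrix`; the same check can be sketched with scikit-learn's `confusion_matrix` instead. This is a minimal sketch with made-up labels standing in for `y_test['Primary Type']` and `ypred` from the notebook, not the notebook's actual data:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted 'Primary Type' labels.
y_true = ['THEFT', 'BATTERY', 'THEFT', 'ASSAULT']
y_pred = ['THEFT', 'THEFT', 'THEFT', 'ASSAULT']

# Fix the row/column order so the matrix is readable.
labels = ['ASSAULT', 'BATTERY', 'THEFT']
cm = confusion_matrix(y_true, y_pred, labels=labels)
# Rows are true labels, columns are predictions:
# cm[i][j] counts samples of labels[i] predicted as labels[j].
```

Unlike `pandas_ml`, this returns a plain NumPy array, so label names must be tracked separately via the `labels` argument.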
@@ -1,8 +1,8 @@
name: auto-ml-dataprep
dependencies:
- pip:
  - azureml-sdk
  - azureml-train-automl
  - azureml-widgets
  - matplotlib
  - pandas_ml
@@ -1,349 +1,349 @@
{
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3.6",
      "name": "python36",
      "language": "python"
    },
    "authors": [
      {
        "name": "savitam"
      }
    ],
    "language_info": {
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "name": "python",
      "file_extension": ".py",
      "nbconvert_exporter": "python",
      "version": "3.6.6"
    }
  },
  "nbformat": 4,
  "cells": [
    {
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        ""
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Exploring Previous Runs**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Explore](#Explore)\n",
        "1. [Download](#Download)\n",
        "1. [Register](#Register)"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Introduction\n",
        "In this example we present some examples on navigating previously executed runs. We also show how you can download a fitted model for any previous run.\n",
        "\n",
        "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. List all experiments in a workspace.\n",
        "2. List all AutoML runs in an experiment.\n",
        "3. Get details for an AutoML run, including settings, run widget, and all metrics.\n",
        "4. Download a fitted pipeline for any iteration."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Setup"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "import pandas as pd\n",
        "import json\n",
        "\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.train.automl.run import AutoMLRun"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "ws = Workspace.from_config()"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Explore"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "### List Experiments"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "experiment_list = Experiment.list(workspace=ws)\n",
        "\n",
        "summary_df = pd.DataFrame(index = ['No of Runs'])\n",
        "for experiment in experiment_list:\n",
        "    automl_runs = list(experiment.get_runs(type='automl'))\n",
        "    summary_df[experiment.name] = [len(automl_runs)]\n",
        "    \n",
        "pd.set_option('display.max_colwidth', -1)\n",
        "summary_df.T"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "### List runs for an experiment\n",
        "Set `experiment_name` to any experiment name from the result of the Experiment.list cell to load the AutoML runs."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "experiment_name = 'automl-local-classification' # Replace this with any project name from previous cell.\n",
        "\n",
        "proj = ws.experiments[experiment_name]\n",
        "summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])\n",
        "automl_runs = list(proj.get_runs(type='automl'))\n",
        "automl_runs_project = []\n",
        "for run in automl_runs:\n",
        "    properties = run.get_properties()\n",
        "    tags = run.get_tags()\n",
        "    amlsettings = json.loads(properties['AMLSettingsJsonString'])\n",
        "    if 'iterations' in tags:\n",
        "        iterations = tags['iterations']\n",
        "    else:\n",
        "        iterations = properties['num_iterations']\n",
        "    summary_df[run.id] = [amlsettings['task_type'], run.get_details()['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name']]\n",
        "    if run.get_details()['status'] == 'Completed':\n",
        "        automl_runs_project.append(run.id)\n",
        "    \n",
        "from IPython.display import HTML\n",
        "projname_html = HTML(\"<h3>{}</h3>\".format(proj.name))\n",
        "\n",
        "from IPython.display import display\n",
        "display(projname_html)\n",
        "display(summary_df.T)"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "### Get details for a run\n",
        "\n",
        "Copy the project name and run id from the previous cell output to find more details on a particular run."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "run_id = automl_runs_project[0] # Replace with your own run_id from above run ids\n",
        "assert (run_id in summary_df.keys()), \"Run id not found! Please set run id to a value from above run ids\"\n",
        "\n",
        "from azureml.widgets import RunDetails\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "ml_run = AutoMLRun(experiment = experiment, run_id = run_id)\n",
        "\n",
        "summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name', 'Start Time', 'End Time'])\n",
        "properties = ml_run.get_properties()\n",
        "tags = ml_run.get_tags()\n",
        "status = ml_run.get_details()\n",
        "amlsettings = json.loads(properties['AMLSettingsJsonString'])\n",
        "if 'iterations' in tags:\n",
        "    iterations = tags['iterations']\n",
        "else:\n",
        "    iterations = properties['num_iterations']\n",
        "start_time = None\n",
        "if 'startTimeUtc' in status:\n",
        "    start_time = status['startTimeUtc']\n",
        "end_time = None\n",
        "if 'endTimeUtc' in status:\n",
        "    end_time = status['endTimeUtc']\n",
        "summary_df[ml_run.id] = [amlsettings['task_type'], status['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name'], start_time, end_time]\n",
        "display(HTML('<h3>Runtime Details</h3>'))\n",
        "display(summary_df)\n",
        "\n",
        "#settings_df = pd.DataFrame(data = amlsettings, index = [''])\n",
        "display(HTML('<h3>AutoML Settings</h3>'))\n",
        "display(amlsettings)\n",
        "\n",
        "display(HTML('<h3>Iterations</h3>'))\n",
        "RunDetails(ml_run).show()\n",
        "\n",
        "children = list(ml_run.get_children())\n",
        "metricslist = {}\n",
        "for run in children:\n",
        "    properties = run.get_properties()\n",
        "    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
        "    metricslist[int(properties['iteration'])] = metrics\n",
        "\n",
        "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
        "display(HTML('<h3>Metrics</h3>'))\n",
        "display(rundata)\n"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Download"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "### Download the Best Model for Any Given Metric"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "metric = 'AUC_weighted' # Replace with a metric name.\n",
        "best_run, fitted_model = ml_run.get_output(metric = metric)\n",
        "fitted_model"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "### Download the Model for Any Given Iteration"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "iteration = 1 # Replace with an iteration number.\n",
        "best_run, fitted_model = ml_run.get_output(iteration = iteration)\n",
        "fitted_model"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Register"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "### Register fitted model for deployment\n",
        "If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "description = 'AutoML Model'\n",
        "tags = None\n",
        "ml_run.register_model(description = description, tags = tags)\n",
        "print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "### Register the Best Model for Any Given Metric"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "metric = 'AUC_weighted' # Replace with a metric name.\n",
        "description = 'AutoML Model'\n",
        "tags = None\n",
        "ml_run.register_model(description = description, tags = tags, metric = metric)\n",
        "print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "### Register the Model for Any Given Iteration"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "iteration = 1 # Replace with an iteration number.\n",
        "description = 'AutoML Model'\n",
        "tags = None\n",
        "ml_run.register_model(description = description, tags = tags, iteration = iteration)\n",
        "print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
      ],
      "cell_type": "code"
    }
  ],
  "nbformat_minor": 2
}
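The metrics cell in the notebook above collects per-iteration metrics into a `metricslist` dict keyed by iteration number and pivots it with `pd.DataFrame(...).sort_index(1)`. The same pattern can be sketched offline with hand-made metric values (hypothetical numbers, not from a real run), using the keyword form `axis=1` that newer pandas expects:

```python
import pandas as pd

# Hypothetical per-iteration metrics, standing in for the values
# pulled from run.get_metrics() in the notebook.
metricslist = {
    1: {'AUC_weighted': 0.91, 'accuracy': 0.88},
    0: {'AUC_weighted': 0.87, 'accuracy': 0.85},
}

# One column per iteration, one row per metric name; sorting on
# axis=1 orders the columns by iteration number.
rundata = pd.DataFrame(metricslist).sort_index(axis=1)
```

`pd.DataFrame` maps each outer dict key to a column, so iterations become columns and metric names become the row index.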
@@ -1,8 +1,8 @@
name: auto-ml-exploring-previous-runs
dependencies:
- pip:
  - azureml-sdk
  - azureml-train-automl
  - azureml-widgets
  - matplotlib
  - pandas_ml
@@ -1,9 +1,9 @@
name: auto-ml-forecasting-bike-share
dependencies:
- pip:
  - azureml-sdk
  - azureml-train-automl
  - azureml-widgets
  - matplotlib
  - pandas_ml
  - statsmodels
@@ -1,10 +1,10 @@
name: auto-ml-forecasting-energy-demand
dependencies:
- pip:
  - azureml-sdk
  - azureml-train-automl
  - azureml-widgets
  - matplotlib
  - pandas_ml
  - statsmodels
  - azureml-explain-model
@@ -1,9 +1,9 @@
name: auto-ml-forecasting-orange-juice-sales
dependencies:
- pip:
  - azureml-sdk
  - azureml-train-automl
  - azureml-widgets
  - matplotlib
  - pandas_ml
  - statsmodels
@@ -1,424 +1,424 @@
{
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3.6",
      "name": "python36",
      "language": "python"
    },
    "authors": [
      {
        "name": "savitam"
      }
    ],
    "language_info": {
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "name": "python",
      "file_extension": ".py",
      "nbconvert_exporter": "python",
      "version": "3.6.6"
    }
  },
  "nbformat": 4,
  "cells": [
    {
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        ""
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Blacklisting Models, Early Termination, and Handling Missing Data**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Data](#Data)\n",
        "1. [Train](#Train)\n",
        "1. [Results](#Results)\n",
        "1. [Test](#Test)\n"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Introduction\n",
        "In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for handling missing values in data. We also provide a stopping metric indicating a target for the primary metrics so that AutoML can terminate the run without necessarily going through all the iterations. Finally, if you want to avoid a certain pipeline, we allow you to specify a blacklist of algorithms that AutoML will ignore for this run.\n",
        "\n",
        "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. Create an `Experiment` in an existing `Workspace`.\n",
        "2. Configure AutoML using `AutoMLConfig`.\n",
        "3. Train the model.\n",
        "4. Explore the results.\n",
        "5. View the engineered names for featurized data and the featurization summary for all raw features.\n",
        "6. Test the best fitted model.\n",
        "\n",
        "In addition this notebook showcases the following features:\n",
        "- **Blacklisting** certain pipelines\n",
        "- Specifying **target metrics** to indicate stopping criteria\n",
        "- Handling **missing data** in the input"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "import logging\n",
        "\n",
        "from matplotlib import pyplot as plt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "from sklearn import datasets\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.train.automl import AutoMLConfig"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# Choose a name for the experiment.\n",
        "experiment_name = 'automl-local-missing-data'\n",
        "project_folder = './sample_projects/automl-local-missing-data'\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output['SDK version'] = azureml.core.VERSION\n",
        "output['Subscription ID'] = ws.subscription_id\n",
        "output['Workspace'] = ws.name\n",
        "output['Resource Group'] = ws.resource_group\n",
        "output['Location'] = ws.location\n",
        "output['Project Directory'] = project_folder\n",
        "output['Experiment Name'] = experiment.name\n",
        "pd.set_option('display.max_colwidth', -1)\n",
        "outputDf = pd.DataFrame(data = output, index = [''])\n",
        "outputDf.T"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Data"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "digits = datasets.load_digits()\n",
        "X_train = digits.data[10:,:]\n",
        "y_train = digits.target[10:]\n",
        "\n",
        "# Add missing values in 75% of the lines.\n",
        "missing_rate = 0.75\n",
        "n_missing_samples = int(np.floor(X_train.shape[0] * missing_rate))\n",
        "missing_samples = np.hstack((np.zeros(X_train.shape[0] - n_missing_samples, dtype=np.bool), np.ones(n_missing_samples, dtype=np.bool)))\n",
        "rng = np.random.RandomState(0)\n",
        "rng.shuffle(missing_samples)\n",
        "missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)\n",
        "X_train[np.where(missing_samples)[0], missing_features] = np.nan"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "df = pd.DataFrame(data = X_train)\n",
        "df['Label'] = pd.Series(y_train, index=df.index)\n",
        "df.head()"
|
"execution_count": null,
|
||||||
]
|
"source": [
|
||||||
},
|
"digits = datasets.load_digits()\n",
|
||||||
{
|
"X_train = digits.data[10:,:]\n",
|
||||||
"cell_type": "markdown",
|
"y_train = digits.target[10:]\n",
|
||||||
"metadata": {},
|
"\n",
|
||||||
"source": [
|
"# Add missing values in 75% of the lines.\n",
|
||||||
"## Train\n",
|
"missing_rate = 0.75\n",
|
||||||
"\n",
|
"n_missing_samples = int(np.floor(X_train.shape[0] * missing_rate))\n",
|
||||||
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment. This includes setting `experiment_exit_score`, which should cause the run to complete before the `iterations` count is reached.\n",
|
"missing_samples = np.hstack((np.zeros(X_train.shape[0] - n_missing_samples, dtype=np.bool), np.ones(n_missing_samples, dtype=np.bool)))\n",
|
||||||
"\n",
|
"rng = np.random.RandomState(0)\n",
|
||||||
"|Property|Description|\n",
|
"rng.shuffle(missing_samples)\n",
|
||||||
"|-|-|\n",
|
"missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)\n",
|
||||||
"|**task**|classification or regression|\n",
|
"X_train[np.where(missing_samples)[0], missing_features] = np.nan"
|
||||||
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
],
|
||||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
"cell_type": "code"
|
||||||
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
},
|
||||||
"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
|
{
|
||||||
"|**experiment_exit_score**|*double* value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
|
"metadata": {},
|
||||||
"|**blacklist_models**|*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i>|\n",
|
"outputs": [],
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"execution_count": null,
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
"source": [
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"df = pd.DataFrame(data = X_train)\n",
|
||||||
]
|
"df['Label'] = pd.Series(y_train, index=df.index)\n",
|
||||||
},
|
"df.head()"
|
||||||
{
|
],
|
||||||
"cell_type": "code",
|
"cell_type": "code"
|
||||||
"execution_count": null,
|
},
|
||||||
"metadata": {},
|
{
|
||||||
"outputs": [],
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"## Train\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
"\n",
|
||||||
" primary_metric = 'AUC_weighted',\n",
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment. This includes setting `experiment_exit_score`, which should cause the run to complete before the `iterations` count is reached.\n",
|
||||||
" iteration_timeout_minutes = 60,\n",
|
"\n",
|
||||||
" iterations = 20,\n",
|
"|Property|Description|\n",
|
||||||
" preprocess = True,\n",
|
"|-|-|\n",
|
||||||
" experiment_exit_score = 0.9984,\n",
|
"|**task**|classification or regression|\n",
|
||||||
" blacklist_models = ['KNN','LinearSVM'],\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||||
" verbosity = logging.INFO,\n",
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
" X = X_train, \n",
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
" y = y_train,\n",
|
"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
|
||||||
" path = project_folder)"
|
"|**experiment_exit_score**|*double* value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
|
||||||
]
|
"|**blacklist_models**|*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i>|\n",
|
||||||
},
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
{
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
"cell_type": "markdown",
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "markdown"
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
},
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"outputs": [],
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
"metadata": {},
|
" debug_log = 'automl_errors.log',\n",
|
||||||
"outputs": [],
|
" primary_metric = 'AUC_weighted',\n",
|
||||||
"source": [
|
" iteration_timeout_minutes = 60,\n",
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
" iterations = 20,\n",
|
||||||
]
|
" preprocess = True,\n",
|
||||||
},
|
" experiment_exit_score = 0.9984,\n",
|
||||||
{
|
" blacklist_models = ['KNN','LinearSVM'],\n",
|
||||||
"cell_type": "code",
|
" verbosity = logging.INFO,\n",
|
||||||
"execution_count": null,
|
" X = X_train, \n",
|
||||||
"metadata": {},
|
" y = y_train,\n",
|
||||||
"outputs": [],
|
" path = project_folder)"
|
||||||
"source": [
|
],
|
||||||
"local_run"
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"source": [
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
"## Results"
|
],
|
||||||
]
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"source": [
|
"execution_count": null,
|
||||||
"#### Widget for Monitoring Runs\n",
|
"source": [
|
||||||
"\n",
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
],
|
||||||
"\n",
|
"cell_type": "code"
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"local_run"
|
||||||
"outputs": [],
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"from azureml.widgets import RunDetails\n",
|
},
|
||||||
"RunDetails(local_run).show() "
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"## Results"
|
||||||
"cell_type": "markdown",
|
],
|
||||||
"metadata": {},
|
"cell_type": "markdown"
|
||||||
"source": [
|
},
|
||||||
"\n",
|
{
|
||||||
"#### Retrieve All Child Runs\n",
|
"metadata": {},
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"source": [
|
||||||
]
|
"#### Widget for Monitoring Runs\n",
|
||||||
},
|
"\n",
|
||||||
{
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"cell_type": "code",
|
"\n",
|
||||||
"execution_count": null,
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
"metadata": {},
|
],
|
||||||
"outputs": [],
|
"cell_type": "markdown"
|
||||||
"source": [
|
},
|
||||||
"children = list(local_run.get_children())\n",
|
{
|
||||||
"metricslist = {}\n",
|
"metadata": {},
|
||||||
"for run in children:\n",
|
"outputs": [],
|
||||||
" properties = run.get_properties()\n",
|
"execution_count": null,
|
||||||
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
"source": [
|
||||||
" metricslist[int(properties['iteration'])] = metrics\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"\n",
|
"RunDetails(local_run).show() "
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
],
|
||||||
"rundata"
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"\n",
|
||||||
"source": [
|
"#### Retrieve All Child Runs\n",
|
||||||
"### Retrieve the Best Model\n",
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
"\n",
|
],
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"cell_type": "markdown"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"children = list(local_run.get_children())\n",
|
||||||
"source": [
|
"metricslist = {}\n",
|
||||||
"best_run, fitted_model = local_run.get_output()"
|
"for run in children:\n",
|
||||||
]
|
" properties = run.get_properties()\n",
|
||||||
},
|
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||||
{
|
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||||
"cell_type": "markdown",
|
"\n",
|
||||||
"metadata": {},
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
"source": [
|
"rundata"
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
],
|
||||||
"Show the run and the model which has the smallest `accuracy` value:"
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"### Retrieve the Best Model\n",
|
||||||
"metadata": {},
|
"\n",
|
||||||
"outputs": [],
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
"source": [
|
],
|
||||||
"# lookup_metric = \"accuracy\"\n",
|
"cell_type": "markdown"
|
||||||
"# best_run, fitted_model = local_run.get_output(metric = lookup_metric)"
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"execution_count": null,
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"best_run, fitted_model = local_run.get_output()"
|
||||||
"#### Model from a Specific Iteration\n",
|
],
|
||||||
"Show the run and the model from the third iteration:"
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"metadata": {},
|
"Show the run and the model which has the smallest `accuracy` value:"
|
||||||
"outputs": [],
|
],
|
||||||
"source": [
|
"cell_type": "markdown"
|
||||||
"# iteration = 3\n",
|
},
|
||||||
"# best_run, fitted_model = local_run.get_output(iteration = iteration)"
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"outputs": [],
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"# lookup_metric = \"accuracy\"\n",
|
||||||
"source": [
|
"# best_run, fitted_model = local_run.get_output(metric = lookup_metric)"
|
||||||
"#### View the engineered names for featurized data\n",
|
],
|
||||||
"Below we display the engineered feature names generated for the featurized data using the preprocessing featurization."
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"#### Model from a Specific Iteration\n",
|
||||||
"metadata": {},
|
"Show the run and the model from the third iteration:"
|
||||||
"outputs": [],
|
],
|
||||||
"source": [
|
"cell_type": "markdown"
|
||||||
"fitted_model.named_steps['datatransformer'].get_engineered_feature_names()"
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"execution_count": null,
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"# iteration = 3\n",
|
||||||
"#### View the featurization summary\n",
|
"# best_run, fitted_model = local_run.get_output(iteration = iteration)"
|
||||||
"Below we display the featurization that was performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:-\n",
|
],
|
||||||
"- Raw feature name\n",
|
"cell_type": "code"
|
||||||
"- Number of engineered features formed out of this raw feature\n",
|
},
|
||||||
"- Type detected\n",
|
{
|
||||||
"- If feature was dropped\n",
|
"metadata": {},
|
||||||
"- List of feature transformations for the raw feature"
|
"source": [
|
||||||
]
|
"#### View the engineered names for featurized data\n",
|
||||||
},
|
"Below we display the engineered feature names generated for the featurized data using the preprocessing featurization."
|
||||||
{
|
],
|
||||||
"cell_type": "code",
|
"cell_type": "markdown"
|
||||||
"execution_count": null,
|
},
|
||||||
"metadata": {},
|
{
|
||||||
"outputs": [],
|
"metadata": {},
|
||||||
"source": [
|
"outputs": [],
|
||||||
"fitted_model.named_steps['datatransformer'].get_featurization_summary()"
|
"execution_count": null,
|
||||||
]
|
"source": [
|
||||||
},
|
"fitted_model.named_steps['datatransformer'].get_engineered_feature_names()"
|
||||||
{
|
],
|
||||||
"cell_type": "markdown",
|
"cell_type": "code"
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"## Test"
|
"metadata": {},
|
||||||
]
|
"source": [
|
||||||
},
|
"#### View the featurization summary\n",
|
||||||
{
|
"Below we display the featurization that was performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:-\n",
|
||||||
"cell_type": "code",
|
"- Raw feature name\n",
|
||||||
"execution_count": null,
|
"- Number of engineered features formed out of this raw feature\n",
|
||||||
"metadata": {},
|
"- Type detected\n",
|
||||||
"outputs": [],
|
"- If feature was dropped\n",
|
||||||
"source": [
|
"- List of feature transformations for the raw feature"
|
||||||
"digits = datasets.load_digits()\n",
|
],
|
||||||
"X_test = digits.data[:10, :]\n",
|
"cell_type": "markdown"
|
||||||
"y_test = digits.target[:10]\n",
|
},
|
||||||
"images = digits.images[:10]\n",
|
{
|
||||||
"\n",
|
"metadata": {},
|
||||||
"# Randomly select digits and test.\n",
|
"outputs": [],
|
||||||
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
"execution_count": null,
|
||||||
" print(index)\n",
|
"source": [
|
||||||
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
"fitted_model.named_steps['datatransformer'].get_featurization_summary()"
|
||||||
" label = y_test[index]\n",
|
],
|
||||||
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
"cell_type": "code"
|
||||||
" fig = plt.figure(1, figsize=(3,3))\n",
|
},
|
||||||
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
{
|
||||||
" ax1.set_title(title)\n",
|
"metadata": {},
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
"source": [
|
||||||
" plt.show()\n"
|
"## Test"
|
||||||
]
|
],
|
||||||
}
|
"cell_type": "markdown"
|
||||||
],
|
},
|
||||||
"metadata": {
|
{
|
||||||
"authors": [
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"name": "savitam"
|
"execution_count": null,
|
||||||
}
|
"source": [
|
||||||
],
|
"digits = datasets.load_digits()\n",
|
||||||
"kernelspec": {
|
"X_test = digits.data[:10, :]\n",
|
||||||
"display_name": "Python 3.6",
|
"y_test = digits.target[:10]\n",
|
||||||
"language": "python",
|
"images = digits.images[:10]\n",
|
||||||
"name": "python36"
|
"\n",
|
||||||
},
|
"# Randomly select digits and test.\n",
|
||||||
"language_info": {
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
"codemirror_mode": {
|
" print(index)\n",
|
||||||
"name": "ipython",
|
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
||||||
"version": 3
|
" label = y_test[index]\n",
|
||||||
},
|
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
||||||
"file_extension": ".py",
|
" fig = plt.figure(1, figsize=(3,3))\n",
|
||||||
"mimetype": "text/x-python",
|
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
||||||
"name": "python",
|
" ax1.set_title(title)\n",
|
||||||
"nbconvert_exporter": "python",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
"pygments_lexer": "ipython3",
|
" plt.show()\n"
|
||||||
"version": "3.6.6"
|
],
|
||||||
}
|
"cell_type": "code"
|
||||||
},
|
}
|
||||||
"nbformat": 4,
|
],
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
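The missing-value injection in this notebook's Data cell can be run standalone without an Azure ML workspace. A minimal sketch, assuming a recent NumPy where the deprecated `np.bool` alias from the original cell is replaced by the builtin `bool`; the logic is otherwise unchanged:

```python
import numpy as np
from sklearn import datasets

# Load the digits data, skipping the first 10 samples (held out for testing).
digits = datasets.load_digits()
X_train = digits.data[10:, :]
y_train = digits.target[10:]

# Mark 75% of the rows to receive a NaN in one randomly chosen feature.
missing_rate = 0.75
n_missing_samples = int(np.floor(X_train.shape[0] * missing_rate))
missing_samples = np.hstack((np.zeros(X_train.shape[0] - n_missing_samples, dtype=bool),
                             np.ones(n_missing_samples, dtype=bool)))
rng = np.random.RandomState(0)
rng.shuffle(missing_samples)

# For each marked row, pick one feature column at random and blank it out.
missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)
X_train[np.where(missing_samples)[0], missing_features] = np.nan
```

Because each marked row is assigned exactly one feature index, the array ends up with exactly `n_missing_samples` NaNs, spread over 75% of the rows; this is the "missing data" that `preprocess = True` asks AutoML to impute.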
@@ -1,8 +1,8 @@
name: auto-ml-missing-data-blacklist-early-termination
dependencies:
- pip:
  - azureml-sdk
  - azureml-train-automl
  - azureml-widgets
  - matplotlib
  - pandas_ml
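The child-run metrics table built in the first notebook's Results section can be sketched offline with plain pandas. The nested dict below stands in for what `run.get_metrics()` returns per iteration (the values are made up for illustration); note that the positional `sort_index(1)` in the notebook must be spelled with the `axis=` keyword in newer pandas:

```python
import pandas as pd

# Hypothetical per-iteration metrics, mimicking run.get_metrics() output.
metricslist = {
    0: {'AUC_weighted': 0.97, 'accuracy': 0.91},
    1: {'AUC_weighted': 0.99, 'accuracy': 0.95},
}

# Columns are iteration numbers, rows are metric names; sort columns by iteration.
rundata = pd.DataFrame(metricslist).sort_index(axis=1)
```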
@@ -1,357 +1,357 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"metadata": {
|
||||||
{
|
"kernelspec": {
|
||||||
"cell_type": "markdown",
|
"display_name": "Python 3.6",
|
||||||
"metadata": {},
|
"name": "python36",
|
||||||
"source": [
|
"language": "python"
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
},
|
||||||
"\n",
|
"authors": [
|
||||||
"Licensed under the MIT License."
|
{
|
||||||
]
|
"name": "xif"
|
||||||
},
|
}
|
||||||
{
|
],
|
||||||
"cell_type": "markdown",
|
"language_info": {
|
||||||
"metadata": {},
|
"mimetype": "text/x-python",
|
||||||
"source": [
|
"codemirror_mode": {
|
||||||
""
|
"name": "ipython",
|
||||||
]
|
"version": 3
|
||||||
},
|
},
|
||||||
{
|
"pygments_lexer": "ipython3",
|
||||||
"cell_type": "markdown",
|
"name": "python",
|
||||||
"metadata": {},
|
"file_extension": ".py",
|
||||||
"source": [
|
"nbconvert_exporter": "python",
|
||||||
"# Automated Machine Learning\n",
|
"version": "3.6.6"
|
||||||
"_**Explain classification model and visualize the explanation**_\n",
|
}
|
||||||
"\n",
|
},
|
||||||
"## Contents\n",
|
"nbformat": 4,
|
||||||
"1. [Introduction](#Introduction)\n",
|
"cells": [
|
||||||
"1. [Setup](#Setup)\n",
|
{
|
||||||
"1. [Data](#Data)\n",
|
"metadata": {},
|
||||||
"1. [Train](#Train)\n",
|
"source": [
|
||||||
"1. [Results](#Results)"
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
]
|
"\n",
|
||||||
},
|
"Licensed under the MIT License."
|
||||||
{
|
],
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown"
|
||||||
"metadata": {},
|
},
|
||||||
"source": [
|
{
|
||||||
"## Introduction\n",
|
"metadata": {},
|
||||||
"In this example we use the sklearn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use the AutoML Classifier for a simple classification problem.\n",
|
"source": [
|
||||||
"\n",
|
""
|
||||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
],
|
||||||
"\n",
|
"cell_type": "markdown"
|
||||||
"In this notebook you would see\n",
|
},
|
||||||
"1. Creating an Experiment in an existing Workspace\n",
|
{
|
||||||
"2. Instantiating AutoMLConfig\n",
|
"metadata": {},
|
||||||
"3. Training the Model using local compute and explain the model\n",
|
"source": [
|
||||||
"4. Visualization model's feature importance in widget\n",
|
"# Automated Machine Learning\n",
|
||||||
"5. Explore best model's explanation"
|
"_**Explain classification model and visualize the explanation**_\n",
|
||||||
]
|
"\n",
|
||||||
},
|
"## Contents\n",
|
||||||
{
|
"1. [Introduction](#Introduction)\n",
|
||||||
"cell_type": "markdown",
|
"1. [Setup](#Setup)\n",
|
||||||
"metadata": {},
|
"1. [Data](#Data)\n",
|
||||||
"source": [
|
"1. [Train](#Train)\n",
|
||||||
"## Setup\n",
|
"1. [Results](#Results)"
|
||||||
"\n",
|
],
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"cell_type": "markdown"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"## Introduction\n",
|
||||||
"metadata": {},
|
"In this example we use the sklearn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use the AutoML Classifier for a simple classification problem.\n",
|
||||||
"outputs": [],
|
"\n",
|
||||||
"source": [
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"import logging\n",
|
"\n",
|
||||||
"\n",
|
"In this notebook you would see\n",
|
||||||
"import pandas as pd\n",
|
"1. Creating an Experiment in an existing Workspace\n",
|
||||||
"import azureml.core\n",
|
"2. Instantiating AutoMLConfig\n",
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"3. Training the Model using local compute and explain the model\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"4. Visualization model's feature importance in widget\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"5. Explore best model's explanation"
|
||||||
]
|
],
|
||||||
},
|
"cell_type": "markdown"
|
||||||
{
|
},
|
||||||
"cell_type": "code",
|
{
|
||||||
"execution_count": null,
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"## Setup\n",
|
||||||
"source": [
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
||||||
"\n",
|
],
|
||||||
"# choose a name for experiment\n",
|
"cell_type": "markdown"
|
||||||
"experiment_name = 'automl-model-explanation'\n",
|
},
|
||||||
"# project folder\n",
|
{
|
||||||
"project_folder = './sample_projects/automl-model-explanation'\n",
|
"metadata": {},
|
||||||
"\n",
|
"outputs": [],
|
||||||
"experiment=Experiment(ws, experiment_name)\n",
|
"execution_count": null,
|
||||||
"\n",
|
"source": [
|
||||||
"output = {}\n",
|
"import logging\n",
|
||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"import pandas as pd\n",
|
||||||
"output['Workspace Name'] = ws.name\n",
|
"import azureml.core\n",
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"output['Location'] = ws.location\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"output['Project Directory'] = project_folder\n",
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
"output['Experiment Name'] = experiment.name\n",
|
],
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"cell_type": "code"
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
},
|
||||||
"outputDf.T"
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"outputs": [],
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"ws = Workspace.from_config()\n",
|
||||||
"source": [
|
"\n",
|
||||||
"## Data"
|
"# choose a name for experiment\n",
|
||||||
]
|
"experiment_name = 'automl-model-explanation'\n",
|
||||||
},
|
"# project folder\n",
|
||||||
{
|
"project_folder = './sample_projects/automl-model-explanation'\n",
|
||||||
"cell_type": "code",
|
"\n",
|
||||||
"execution_count": null,
|
"experiment=Experiment(ws, experiment_name)\n",
|
||||||
"metadata": {},
|
"\n",
|
||||||
"outputs": [],
|
"output = {}\n",
|
||||||
"source": [
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"from sklearn import datasets\n",
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"\n",
|
"output['Workspace Name'] = ws.name\n",
|
||||||
"iris = datasets.load_iris()\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"y = iris.target\n",
|
"output['Location'] = ws.location\n",
|
||||||
"X = iris.data\n",
|
"output['Project Directory'] = project_folder\n",
|
||||||
"\n",
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
"features = iris.feature_names\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
"outputDf.T"
|
||||||
"X_train, X_test, y_train, y_test = train_test_split(X,\n",
|
],
|
||||||
" y,\n",
|
"cell_type": "code"
|
||||||
" test_size=0.1,\n",
|
},
|
||||||
" random_state=100,\n",
|
{
|
||||||
" stratify=y)\n",
|
"metadata": {},
|
||||||
"\n",
|
"source": [
|
||||||
"X_train = pd.DataFrame(X_train, columns=features)\n",
|
"## Data"
|
||||||
"X_test = pd.DataFrame(X_test, columns=features)"
|
],
|
||||||
]
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"source": [
|
"execution_count": null,
|
||||||
"## Train\n",
|
"source": [
|
||||||
"\n",
|
"from sklearn import datasets\n",
|
||||||
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
"\n",
|
||||||
"\n",
|
"iris = datasets.load_iris()\n",
|
||||||
"|Property|Description|\n",
|
"y = iris.target\n",
|
||||||
"|-|-|\n",
|
"X = iris.data\n",
|
||||||
"|**task**|classification or regression|\n",
|
"\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
"features = iris.feature_names\n",
|
||||||
"|**max_time_sec**|Time limit in minutes for each iterations|\n",
|
"\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration Auto ML trains the data with a specific pipeline|\n",
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"X_train, X_test, y_train, y_test = train_test_split(X,\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
" y,\n",
|
||||||
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
" test_size=0.1,\n",
|
||||||
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
" random_state=100,\n",
|
||||||
"|**model_explainability**|Whether to generate an explanation for each trained pipeline.|\n",
|
" stratify=y)\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
|
"\n",
|
||||||
]
|
"X_train = pd.DataFrame(X_train, columns=features)\n",
|
||||||
},
|
"X_test = pd.DataFrame(X_test, columns=features)"
|
||||||
{
|
],
|
||||||
"cell_type": "code",
|
"cell_type": "code"
|
||||||
"execution_count": null,
|
},
|
||||||
"metadata": {},
|
{
|
||||||
"outputs": [],
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"## Train\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
"\n",
|
||||||
" primary_metric = 'AUC_weighted',\n",
|
"Instantiate an `AutoMLConfig` object. This defines the settings and data used to run the experiment.\n",
|
||||||
" iteration_timeout_minutes = 200,\n",
|
"\n",
|
||||||
" iterations = 10,\n",
|
"|Property|Description|\n",
|
||||||
" verbosity = logging.INFO,\n",
|
"|-|-|\n",
|
||||||
" X = X_train, \n",
|
"|**task**|classification or regression|\n",
|
||||||
" y = y_train,\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||||
" X_valid = X_test,\n",
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
" y_valid = y_test,\n",
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
" model_explainability=True,\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
" path=project_folder)"
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
]
|
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
},
|
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
{
|
"|**model_explainability**|Whether to generate an explanation for each trained pipeline.|\n",
|
||||||
"cell_type": "markdown",
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "markdown"
|
||||||
"You can call the submit method on the experiment object and pass the run configuration. For local runs the execution is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
},
|
||||||
"You will see the currently running iterations printed to the console."
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"outputs": [],
|
||||||
{
|
"execution_count": null,
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
"metadata": {},
|
" debug_log = 'automl_errors.log',\n",
|
||||||
"outputs": [],
|
" primary_metric = 'AUC_weighted',\n",
|
||||||
"source": [
|
" iteration_timeout_minutes = 200,\n",
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
" iterations = 10,\n",
|
||||||
]
|
" verbosity = logging.INFO,\n",
|
||||||
},
|
" X = X_train, \n",
|
||||||
{
|
" y = y_train,\n",
|
||||||
"cell_type": "code",
|
" X_valid = X_test,\n",
|
||||||
"execution_count": null,
|
" y_valid = y_test,\n",
|
||||||
"metadata": {},
|
" model_explainability=True,\n",
|
||||||
"outputs": [],
|
" path=project_folder)"
|
||||||
"source": [
|
],
|
||||||
"local_run"
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"You can call the submit method on the experiment object and pass the run configuration. For local runs the execution is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"source": [
|
"You will see the currently running iterations printed to the console."
|
||||||
"## Results"
|
],
|
||||||
]
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"source": [
|
"execution_count": null,
|
||||||
"### Widget for monitoring runs\n",
|
"source": [
|
||||||
"\n",
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
"The widget will sit on \"loading\" until the first iteration completes, then an auto-updating graph and table will show up. The widget refreshes once per minute, so you should see the graph update as child runs complete.\n",
|
],
|
||||||
"\n",
|
"cell_type": "code"
|
||||||
"NOTE: The widget displays a link at the bottom. This links to a web UI where you can explore the individual run details."
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"local_run"
|
||||||
"outputs": [],
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"from azureml.widgets import RunDetails\n",
|
},
|
||||||
"RunDetails(local_run).show() "
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"## Results"
|
||||||
"cell_type": "markdown",
|
],
|
||||||
"metadata": {},
|
"cell_type": "markdown"
|
||||||
"source": [
|
},
|
||||||
"### Retrieve the Best Model\n",
|
{
|
||||||
"\n",
|
"metadata": {},
|
||||||
"Below we select the best pipeline from our iterations. The *get_output* method on the run returns the best run and the fitted model from the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
|
"source": [
|
||||||
]
|
"### Widget for monitoring runs\n",
|
||||||
},
|
"\n",
|
||||||
{
|
"The widget will sit on \"loading\" until the first iteration completes, then an auto-updating graph and table will show up. The widget refreshes once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"cell_type": "code",
|
"\n",
|
||||||
"execution_count": null,
|
"NOTE: The widget displays a link at the bottom. This links to a web UI where you can explore the individual run details."
|
||||||
"metadata": {},
|
],
|
||||||
"outputs": [],
|
"cell_type": "markdown"
|
||||||
"source": [
|
},
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
{
|
||||||
"print(best_run)\n",
|
"metadata": {},
|
||||||
"print(fitted_model)"
|
"outputs": [],
|
||||||
]
|
"execution_count": null,
|
||||||
},
|
"source": [
|
||||||
{
|
"from azureml.widgets import RunDetails\n",
|
||||||
"cell_type": "markdown",
|
"RunDetails(local_run).show() "
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"### Best Model's Explanation\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"Retrieve the explanation from the best_run. The explanation information includes:\n",
|
"metadata": {},
|
||||||
"\n",
|
"source": [
|
||||||
"1.\tshap_values: The explanation information generated by the SHAP library\n",
|
"### Retrieve the Best Model\n",
|
||||||
"2.\texpected_values: The expected value of the model applied to the X_train data set.\n",
|
"\n",
|
||||||
"3.\toverall_summary: The model-level feature importance values sorted in descending order\n",
|
"Below we select the best pipeline from our iterations. The *get_output* method on the run returns the best run and the fitted model from the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
|
||||||
"4.\toverall_imp: The feature names sorted in the same order as in overall_summary\n",
|
],
|
||||||
"5.\tper_class_summary: The class-level feature importance values sorted in descending order. Only available for the classification case\n",
|
"cell_type": "markdown"
|
||||||
"6.\tper_class_imp: The feature names sorted in the same order as in per_class_summary. Only available for the classification case\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"Note: The **retrieve_model_explanation()** API only works when AutoML has been configured with the **model_explainability** flag set to **True**."
|
"metadata": {},
|
||||||
]
|
"outputs": [],
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"source": [
|
||||||
"cell_type": "code",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"execution_count": null,
|
"print(best_run)\n",
|
||||||
"metadata": {},
|
"print(fitted_model)"
|
||||||
"outputs": [],
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"from azureml.train.automl.automlexplainer import retrieve_model_explanation\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
|
"metadata": {},
|
||||||
" retrieve_model_explanation(best_run)"
|
"source": [
|
||||||
]
|
"### Best Model's Explanation\n",
|
||||||
},
|
"\n",
|
||||||
{
|
"Retrieve the explanation from the best_run. The explanation information includes:\n",
|
||||||
"cell_type": "code",
|
"\n",
|
||||||
"execution_count": null,
|
"1.\tshap_values: The explanation information generated by the SHAP library\n",
|
||||||
"metadata": {},
|
"2.\texpected_values: The expected value of the model applied to the X_train data set.\n",
|
||||||
"outputs": [],
|
"3.\toverall_summary: The model-level feature importance values sorted in descending order\n",
|
||||||
"source": [
|
"4.\toverall_imp: The feature names sorted in the same order as in overall_summary\n",
|
||||||
"print(overall_summary)\n",
|
"5.\tper_class_summary: The class-level feature importance values sorted in descending order. Only available for the classification case\n",
|
||||||
"print(overall_imp)"
|
"6.\tper_class_imp: The feature names sorted in the same order as in per_class_summary. Only available for the classification case\n",
|
||||||
]
|
"\n",
|
||||||
},
|
"Note: The **retrieve_model_explanation()** API only works when AutoML has been configured with the **model_explainability** flag set to **True**."
|
||||||
{
|
],
|
||||||
"cell_type": "code",
|
"cell_type": "markdown"
|
||||||
"execution_count": null,
|
},
|
||||||
"metadata": {},
|
{
|
||||||
"outputs": [],
|
"metadata": {},
|
||||||
"source": [
|
"outputs": [],
|
||||||
"print(per_class_summary)\n",
|
"execution_count": null,
|
||||||
"print(per_class_imp)"
|
"source": [
|
||||||
]
|
"from azureml.train.automl.automlexplainer import retrieve_model_explanation\n",
|
||||||
},
|
"\n",
|
||||||
{
|
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
|
||||||
"cell_type": "markdown",
|
" retrieve_model_explanation(best_run)"
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"Besides retrieving the existing model explanation information, you can also explain the model with different train/test data"
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"print(overall_summary)\n",
|
||||||
"outputs": [],
|
"print(overall_imp)"
|
||||||
"source": [
|
],
|
||||||
"from azureml.train.automl.automlexplainer import explain_model\n",
|
"cell_type": "code"
|
||||||
"\n",
|
},
|
||||||
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
|
{
|
||||||
" explain_model(fitted_model, X_train, X_test, features=features)"
|
"metadata": {},
|
||||||
]
|
"outputs": [],
|
||||||
},
|
"execution_count": null,
|
||||||
{
|
"source": [
|
||||||
"cell_type": "code",
|
"print(per_class_summary)\n",
|
||||||
"execution_count": null,
|
"print(per_class_imp)"
|
||||||
"metadata": {},
|
],
|
||||||
"outputs": [],
|
"cell_type": "code"
|
||||||
"source": [
|
},
|
||||||
"print(overall_summary)\n",
|
{
|
||||||
"print(overall_imp)"
|
"metadata": {},
|
||||||
]
|
"source": [
|
||||||
}
|
"Besides retrieving the existing model explanation information, you can also explain the model with different train/test data"
|
||||||
],
|
],
|
||||||
"metadata": {
|
"cell_type": "markdown"
|
||||||
"authors": [
|
},
|
||||||
{
|
{
|
||||||
"name": "xif"
|
"metadata": {},
|
||||||
}
|
"outputs": [],
|
||||||
],
|
"execution_count": null,
|
||||||
"kernelspec": {
|
"source": [
|
||||||
"display_name": "Python 3.6",
|
"from azureml.train.automl.automlexplainer import explain_model\n",
|
||||||
"language": "python",
|
"\n",
|
||||||
"name": "python36"
|
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
|
||||||
},
|
" explain_model(fitted_model, X_train, X_test, features=features)"
|
||||||
"language_info": {
|
],
|
||||||
"codemirror_mode": {
|
"cell_type": "code"
|
||||||
"name": "ipython",
|
},
|
||||||
"version": 3
|
{
|
||||||
},
|
"metadata": {},
|
||||||
"file_extension": ".py",
|
"outputs": [],
|
||||||
"mimetype": "text/x-python",
|
"execution_count": null,
|
||||||
"name": "python",
|
"source": [
|
||||||
"nbconvert_exporter": "python",
|
"print(overall_summary)\n",
|
||||||
"pygments_lexer": "ipython3",
|
"print(overall_imp)"
|
||||||
"version": "3.6.6"
|
],
|
||||||
}
|
"cell_type": "code"
|
||||||
},
|
}
|
||||||
"nbformat": 4,
|
],
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -1,9 +1,9 @@
|
|||||||
name: auto-ml-model-explanation
|
name: auto-ml-model-explanation
|
||||||
dependencies:
|
dependencies:
|
||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- pandas_ml
|
||||||
- azureml-explain-model
|
- azureml-explain-model
|
||||||
|
|||||||
@@ -1,8 +1,8 @@
|
|||||||
name: auto-ml-regression-concrete-strength
|
name: auto-ml-regression-concrete-strength
|
||||||
dependencies:
|
dependencies:
|
||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- pandas_ml
|
||||||
|
|||||||
@@ -1,8 +1,8 @@
|
|||||||
name: auto-ml-regression-hardware-performance
|
name: auto-ml-regression-hardware-performance
|
||||||
dependencies:
|
dependencies:
|
||||||
- pip:
|
- pip:
|
||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- azureml-train-automl
|
- azureml-train-automl
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- matplotlib
|
- matplotlib
|
||||||
- pandas_ml
|
- pandas_ml
|
||||||
|
|||||||
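The three environment files above are near-identical pip dependency lists for the AutoML sample notebooks. As a minimal sketch of how such a file is typically consumed (the filename `automl_env.yml` is a placeholder, since the actual paths are not shown in this diff), the environment can be recreated with conda:

```shell
# Create a conda environment from one of the dependency files above.
# "automl_env.yml" is a hypothetical filename -- substitute the repository's actual file.
conda env create -f automl_env.yml

# Activate it by the "name:" field declared in the file.
conda activate auto-ml-model-explanation
```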
@@ -1,407 +1,407 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"metadata": {
|
||||||
{
|
"kernelspec": {
|
||||||
"cell_type": "markdown",
|
"display_name": "Python 3.6",
|
||||||
"metadata": {},
|
"name": "python36",
|
||||||
"source": [
|
"language": "python"
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
},
|
||||||
"\n",
|
"authors": [
|
||||||
"Licensed under the MIT License."
|
{
|
||||||
]
|
"name": "savitam"
|
||||||
},
|
}
|
||||||
{
|
],
|
||||||
"cell_type": "markdown",
|
"language_info": {
|
||||||
"metadata": {},
|
"mimetype": "text/x-python",
|
||||||
"source": [
|
"codemirror_mode": {
|
||||||
""
|
"name": "ipython",
|
||||||
]
|
"version": 3
|
||||||
},
|
},
|
||||||
{
|
"pygments_lexer": "ipython3",
|
||||||
"cell_type": "markdown",
|
"name": "python",
|
||||||
"metadata": {},
|
"file_extension": ".py",
|
||||||
"source": [
|
"nbconvert_exporter": "python",
|
||||||
"# Automated Machine Learning\n",
|
"version": "3.6.6"
|
||||||
"_**Regression with Local Compute**_\n",
|
}
|
||||||
"\n",
|
},
|
||||||
"## Contents\n",
|
"nbformat": 4,
|
||||||
"1. [Introduction](#Introduction)\n",
|
"cells": [
|
||||||
"1. [Setup](#Setup)\n",
|
{
|
||||||
"1. [Data](#Data)\n",
|
"metadata": {},
|
||||||
"1. [Train](#Train)\n",
|
"source": [
|
||||||
"1. [Results](#Results)\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"1. [Test](#Test)\n"
|
"\n",
|
||||||
]
|
"Licensed under the MIT License."
|
||||||
},
|
],
|
||||||
{
|
"cell_type": "markdown"
|
||||||
"cell_type": "markdown",
|
},
|
||||||
"metadata": {},
|
{
|
||||||
"source": [
|
"metadata": {},
|
||||||
"## Introduction\n",
|
"source": [
|
||||||
"In this example we use the scikit-learn's [diabetes dataset](http://scikit-learn.org/stable/datasets/index.html#diabetes-dataset) to showcase how you can use AutoML for a simple regression problem.\n",
|
""
|
||||||
"\n",
|
],
|
||||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
"cell_type": "markdown"
|
||||||
"\n",
|
},
|
||||||
"In this notebook you will learn how to:\n",
|
{
|
||||||
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
"metadata": {},
|
||||||
"2. Configure AutoML using `AutoMLConfig`.\n",
|
"source": [
|
||||||
"3. Train the model using local compute.\n",
|
"# Automated Machine Learning\n",
|
||||||
"4. Explore the results.\n",
|
"_**Regression with Local Compute**_\n",
|
||||||
"5. Test the best fitted model."
|
"\n",
|
||||||
]
|
"## Contents\n",
|
||||||
},
|
"1. [Introduction](#Introduction)\n",
|
||||||
{
|
"1. [Setup](#Setup)\n",
|
||||||
"cell_type": "markdown",
|
"1. [Data](#Data)\n",
|
||||||
"metadata": {},
|
"1. [Train](#Train)\n",
|
||||||
"source": [
|
"1. [Results](#Results)\n",
|
||||||
"## Setup\n",
|
"1. [Test](#Test)\n"
|
||||||
"\n",
|
],
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"cell_type": "markdown"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"source": [
|
||||||
"execution_count": null,
|
"## Introduction\n",
|
||||||
"metadata": {},
|
"In this example we use the scikit-learn's [diabetes dataset](http://scikit-learn.org/stable/datasets/index.html#diabetes-dataset) to showcase how you can use AutoML for a simple regression problem.\n",
|
||||||
"outputs": [],
|
"\n",
|
||||||
"source": [
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"import logging\n",
|
"\n",
|
||||||
"\n",
|
"In this notebook you will learn how to:\n",
|
||||||
"from matplotlib import pyplot as plt\n",
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
"import numpy as np\n",
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
"import pandas as pd\n",
|
"3. Train the model using local compute.\n",
|
||||||
"\n",
|
"4. Explore the results.\n",
|
||||||
"import azureml.core\n",
|
"5. Test the best fitted model."
|
||||||
"from azureml.core.experiment import Experiment\n",
|
],
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"cell_type": "markdown"
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"source": [
|
||||||
"cell_type": "code",
|
"## Setup\n",
|
||||||
"execution_count": null,
|
"\n",
|
||||||
"metadata": {},
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
"outputs": [],
|
],
|
||||||
"source": [
|
"cell_type": "markdown"
|
||||||
"ws = Workspace.from_config()\n",
|
},
|
||||||
"\n",
|
{
|
||||||
"# Choose a name for the experiment and specify the project folder.\n",
|
"metadata": {},
|
||||||
"experiment_name = 'automl-local-regression'\n",
|
"outputs": [],
|
||||||
"project_folder = './sample_projects/automl-local-regression'\n",
|
"execution_count": null,
|
||||||
"\n",
|
"source": [
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"import logging\n",
|
||||||
"\n",
|
"\n",
|
||||||
"output = {}\n",
|
"from matplotlib import pyplot as plt\n",
|
||||||
"output['SDK version'] = azureml.core.VERSION\n",
|
"import numpy as np\n",
|
||||||
"output['Subscription ID'] = ws.subscription_id\n",
|
"import pandas as pd\n",
|
||||||
"output['Workspace Name'] = ws.name\n",
|
"\n",
|
||||||
"output['Resource Group'] = ws.resource_group\n",
|
"import azureml.core\n",
|
||||||
"output['Location'] = ws.location\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"output['Project Directory'] = project_folder\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"output['Experiment Name'] = experiment.name\n",
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
],
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"cell_type": "code"
|
||||||
"outputDf.T"
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "markdown",
|
"execution_count": null,
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"ws = Workspace.from_config()\n",
|
||||||
"## Data\n",
|
"\n",
|
||||||
"This uses scikit-learn's [load_diabetes](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) method."
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
]
|
"experiment_name = 'automl-local-regression'\n",
|
||||||
},
|
"project_folder = './sample_projects/automl-local-regression'\n",
|
||||||
{
|
"\n",
|
||||||
"cell_type": "code",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"execution_count": null,
|
"\n",
|
||||||
"metadata": {},
|
"output = {}\n",
|
||||||
"outputs": [],
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
"source": [
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
"# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
|
"output['Workspace Name'] = ws.name\n",
|
||||||
"from sklearn.datasets import load_diabetes\n",
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
"output['Location'] = ws.location\n",
|
||||||
"\n",
|
"output['Project Directory'] = project_folder\n",
|
||||||
"X, y = load_diabetes(return_X_y = True)\n",
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
"\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"\n",
|
"outputDf.T"
|
||||||
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"
|
],
|
||||||
]
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"source": [
|
||||||
"source": [
|
"## Data\n",
|
||||||
"## Train\n",
|
"This uses scikit-learn's [load_diabetes](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) method."
|
||||||
"\n",
|
],
|
||||||
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
"cell_type": "markdown"
|
||||||
"\n",
|
},
|
||||||
"|Property|Description|\n",
|
{
|
||||||
"|-|-|\n",
|
"metadata": {},
|
||||||
"|**task**|classification or regression|\n",
|
"outputs": [],
|
||||||
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
"execution_count": null,
|
||||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
"source": [
|
||||||
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
"# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
"from sklearn.datasets import load_diabetes\n",
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
|
"\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"X, y = load_diabetes(return_X_y = True)\n",
|
||||||
]
|
"\n",
|
||||||
},
|
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
|
||||||
{
|
"\n",
|
||||||
"cell_type": "code",
|
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"
|
||||||
"execution_count": null,
|
],
|
||||||
"metadata": {},
|
"cell_type": "code"
|
||||||
"outputs": [],
|
},
|
||||||
"source": [
|
{
|
||||||
"automl_config = AutoMLConfig(task = 'regression',\n",
|
"metadata": {},
|
||||||
" iteration_timeout_minutes = 10,\n",
|
"source": [
|
||||||
" iterations = 10,\n",
|
"## Train\n",
|
||||||
" primary_metric = 'spearman_correlation',\n",
|
"\n",
|
||||||
" n_cross_validations = 5,\n",
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
||||||
" debug_log = 'automl.log',\n",
|
"\n",
|
||||||
" verbosity = logging.INFO,\n",
|
"|Property|Description|\n",
|
||||||
" X = X_train, \n",
|
"|-|-|\n",
|
||||||
" y = y_train,\n",
|
"|**task**|classification or regression|\n",
|
||||||
" path = project_folder)"
|
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
]
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
},
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
{
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"cell_type": "markdown",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"metadata": {},
|
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
|
||||||
"source": [
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
],
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
"cell_type": "markdown"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"automl_config = AutoMLConfig(task = 'regression',\n",
|
||||||
"source": [
|
" iteration_timeout_minutes = 10,\n",
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
" iterations = 10,\n",
|
||||||
]
|
" primary_metric = 'spearman_correlation',\n",
|
||||||
},
|
" n_cross_validations = 5,\n",
|
||||||
{
|
" debug_log = 'automl.log',\n",
|
||||||
"cell_type": "code",
|
" verbosity = logging.INFO,\n",
|
||||||
"execution_count": null,
|
" X = X_train, \n",
|
||||||
"metadata": {},
|
" y = y_train,\n",
|
||||||
"outputs": [],
|
" path = project_folder)"
|
||||||
"source": [
|
],
|
||||||
"local_run"
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"source": [
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
"## Results"
|
],
|
||||||
]
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
"metadata": {},
|
"outputs": [],
|
||||||
"source": [
|
"execution_count": null,
|
||||||
"#### Widget for Monitoring Runs\n",
|
"source": [
|
||||||
"\n",
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
],
|
||||||
"\n",
|
"cell_type": "code"
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"outputs": [],
|
||||||
"cell_type": "code",
|
"execution_count": null,
|
||||||
"execution_count": null,
|
"source": [
|
||||||
"metadata": {},
|
"local_run"
|
||||||
"outputs": [],
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"from azureml.widgets import RunDetails\n",
|
},
|
||||||
"RunDetails(local_run).show() "
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"## Results"
|
||||||
"cell_type": "markdown",
|
],
|
||||||
"metadata": {},
|
"cell_type": "markdown"
|
||||||
"source": [
|
},
|
||||||
"\n",
|
{
|
||||||
"#### Retrieve All Child Runs\n",
|
"metadata": {},
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"source": [
|
||||||
]
|
"#### Widget for Monitoring Runs\n",
|
||||||
},
|
"\n",
|
||||||
{
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"cell_type": "code",
|
"\n",
|
||||||
"execution_count": null,
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
"metadata": {},
|
],
|
||||||
"outputs": [],
|
"cell_type": "markdown"
|
||||||
"source": [
|
},
|
||||||
"children = list(local_run.get_children())\n",
|
{
|
||||||
"metricslist = {}\n",
|
"metadata": {},
|
||||||
"for run in children:\n",
|
"outputs": [],
|
||||||
" properties = run.get_properties()\n",
|
"execution_count": null,
|
||||||
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
"source": [
|
||||||
" metricslist[int(properties['iteration'])] = metrics\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"\n",
|
"RunDetails(local_run).show() "
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
],
|
||||||
"rundata"
|
"cell_type": "code"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "markdown",
|
"source": [
|
||||||
"metadata": {},
|
"\n",
|
||||||
"source": [
|
"#### Retrieve All Child Runs\n",
|
||||||
"### Retrieve the Best Model\n",
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
"\n",
|
],
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"cell_type": "markdown"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"children = list(local_run.get_children())\n",
|
||||||
"source": [
|
"metricslist = {}\n",
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"for run in children:\n",
|
||||||
"print(best_run)\n",
|
" properties = run.get_properties()\n",
|
||||||
"print(fitted_model)"
|
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||||
]
|
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||||
},
|
"\n",
|
||||||
{
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
"cell_type": "markdown",
|
"rundata"
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
},
|
||||||
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"### Retrieve the Best Model\n",
|
||||||
"cell_type": "code",
|
"\n",
|
||||||
"execution_count": null,
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
"metadata": {},
|
],
|
||||||
"outputs": [],
|
"cell_type": "markdown"
|
||||||
"source": [
|
},
|
||||||
"lookup_metric = \"root_mean_squared_error\"\n",
|
{
|
||||||
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
"metadata": {},
|
||||||
"print(best_run)\n",
|
"outputs": [],
|
||||||
"print(fitted_model)"
|
"execution_count": null,
|
||||||
]
|
"source": [
|
||||||
},
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
{
|
"print(best_run)\n",
|
||||||
"cell_type": "markdown",
|
"print(fitted_model)"
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"#### Model from a Specific Iteration\n",
|
},
|
||||||
"Show the run and the model from the third iteration:"
|
{
|
||||||
]
|
"metadata": {},
|
||||||
},
|
"source": [
|
||||||
{
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"cell_type": "code",
|
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
|
||||||
"execution_count": null,
|
],
|
||||||
"metadata": {},
|
"cell_type": "markdown"
|
||||||
"outputs": [],
|
},
|
||||||
"source": [
|
{
|
||||||
"iteration = 3\n",
|
"metadata": {},
|
||||||
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
"outputs": [],
|
||||||
"print(third_run)\n",
|
"execution_count": null,
|
||||||
"print(third_model)"
|
"source": [
|
||||||
]
|
"lookup_metric = \"root_mean_squared_error\"\n",
|
||||||
},
|
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
||||||
{
|
"print(best_run)\n",
|
||||||
"cell_type": "markdown",
|
"print(fitted_model)"
|
||||||
"metadata": {},
|
],
|
||||||
"source": [
|
"cell_type": "code"
|
||||||
"## Test"
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"source": [
|
||||||
"cell_type": "markdown",
|
"#### Model from a Specific Iteration\n",
|
||||||
"metadata": {},
|
"Show the run and the model from the third iteration:"
|
||||||
"source": [
|
],
|
||||||
"Predict on training and test set, and calculate residual values."
|
"cell_type": "markdown"
|
||||||
]
|
},
|
||||||
},
|
{
|
||||||
{
|
"metadata": {},
|
||||||
"cell_type": "code",
|
"outputs": [],
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"source": [
|
||||||
"outputs": [],
|
"iteration = 3\n",
|
||||||
"source": [
|
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
||||||
"y_pred_train = fitted_model.predict(X_train)\n",
|
"print(third_run)\n",
|
||||||
"y_residual_train = y_train - y_pred_train\n",
|
"print(third_model)"
|
||||||
"\n",
|
],
|
||||||
"y_pred_test = fitted_model.predict(X_test)\n",
|
"cell_type": "code"
|
||||||
"y_residual_test = y_test - y_pred_test"
|
},
|
||||||
]
|
{
|
||||||
},
|
"metadata": {},
|
||||||
{
|
"source": [
|
||||||
"cell_type": "code",
|
"## Test"
|
||||||
"execution_count": null,
|
],
|
||||||
"metadata": {},
|
"cell_type": "markdown"
|
||||||
"outputs": [],
|
},
|
||||||
"source": [
|
{
|
||||||
"%matplotlib inline\n",
|
"metadata": {},
|
||||||
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
"source": [
|
||||||
"\n",
|
"Predict on training and test set, and calculate residual values."
|
||||||
"# Set up a multi-plot chart.\n",
|
],
|
||||||
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
"cell_type": "markdown"
|
||||||
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
},
|
||||||
"f.set_figheight(6)\n",
|
{
|
||||||
"f.set_figwidth(16)\n",
|
"metadata": {},
|
||||||
"\n",
|
"outputs": [],
|
||||||
"# Plot residual values of training set.\n",
|
"execution_count": null,
|
||||||
"a0.axis([0, 360, -200, 200])\n",
|
"source": [
|
||||||
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
"y_pred_train = fitted_model.predict(X_train)\n",
|
||||||
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
"y_residual_train = y_train - y_pred_train\n",
|
||||||
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
"\n",
|
||||||
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
|
"y_pred_test = fitted_model.predict(X_test)\n",
|
||||||
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
"y_residual_test = y_test - y_pred_test"
|
||||||
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
],
|
||||||
"\n",
|
"cell_type": "code"
|
||||||
"# Plot a histogram.\n",
|
},
|
||||||
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
|
{
|
||||||
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
"metadata": {},
|
||||||
"\n",
|
"outputs": [],
|
||||||
"# Plot residual values of test set.\n",
|
"execution_count": null,
|
||||||
"a1.axis([0, 90, -200, 200])\n",
|
"source": [
|
||||||
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
"%matplotlib inline\n",
|
||||||
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
||||||
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
"\n",
|
||||||
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
|
"# Set up a multi-plot chart.\n",
|
||||||
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
||||||
"a1.set_yticklabels([])\n",
|
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
||||||
"\n",
|
"f.set_figheight(6)\n",
|
||||||
"# Plot a histogram.\n",
|
"f.set_figwidth(16)\n",
|
||||||
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
|
"\n",
|
||||||
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
"# Plot residual values of training set.\n",
|
||||||
"\n",
|
"a0.axis([0, 360, -200, 200])\n",
|
||||||
"plt.show()"
|
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
||||||
]
|
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
}
|
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
||||||
],
|
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
|
||||||
"metadata": {
|
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
||||||
"authors": [
|
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
||||||
{
|
"\n",
|
||||||
"name": "savitam"
|
"# Plot a histogram.\n",
|
||||||
}
|
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
|
||||||
],
|
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
||||||
"kernelspec": {
|
"\n",
|
||||||
"display_name": "Python 3.6",
|
"# Plot residual values of test set.\n",
|
||||||
"language": "python",
|
"a1.axis([0, 90, -200, 200])\n",
|
||||||
"name": "python36"
|
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
||||||
},
|
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
"language_info": {
|
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
||||||
"codemirror_mode": {
|
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
|
||||||
"name": "ipython",
|
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
||||||
"version": 3
|
"a1.set_yticklabels([])\n",
|
||||||
},
|
"\n",
|
||||||
"file_extension": ".py",
|
"# Plot a histogram.\n",
|
||||||
"mimetype": "text/x-python",
|
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
|
||||||
"name": "python",
|
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
||||||
"nbconvert_exporter": "python",
|
"\n",
|
||||||
"pygments_lexer": "ipython3",
|
"plt.show()"
|
||||||
"version": "3.6.6"
|
],
|
||||||
}
|
"cell_type": "code"
|
||||||
},
|
}
|
||||||
"nbformat": 4,
|
],
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
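The child-run aggregation pattern used in the notebook (collect per-iteration metrics into a dict, then pivot it into a DataFrame) can be sketched without the Azure ML SDK. In this sketch, `fake_runs` is purely illustrative stand-in data for what `local_run.get_children()` would return; only the pandas logic mirrors the notebook cell:

```python
import pandas as pd

# Hypothetical stand-ins for AutoML child runs: each carries an iteration
# number (as a string property) and a dict of logged metrics.
fake_runs = [
    {"properties": {"iteration": "0"},
     "metrics": {"root_mean_squared_error": 5.1, "spearman_correlation": 0.81, "status": "ok"}},
    {"properties": {"iteration": "1"},
     "metrics": {"root_mean_squared_error": 4.7, "spearman_correlation": 0.85}},
]

metricslist = {}
for run in fake_runs:
    # Keep only numeric metrics, as the notebook does with isinstance(v, float).
    metrics = {k: v for k, v in run["metrics"].items() if isinstance(v, float)}
    metricslist[int(run["properties"]["iteration"])] = metrics

# Dict-of-dicts becomes a DataFrame with one column per iteration and one row
# per metric; sorting on axis=1 orders the columns by iteration number.
rundata = pd.DataFrame(metricslist).sort_index(axis=1)
print(rundata)
```

Because non-float values (like the `"status"` string above) are filtered out, the resulting table stays purely numeric, which is what makes the per-iteration comparison readable.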