Compare commits
24 Commits
dev01...vizhur/aut
| Author | SHA1 | Date | |
|---|---|---|---|
| | e792ba8278 | | |
| | 4170a394ed | | |
| | 475ea36106 | | |
| | 9e0fc4f0e7 | | |
| | e97e4742ba | | |
| | 14ecfb0bf3 | | |
| | 61b396be4f | | |
| | 3d2552174d | | |
| | cd3c980a6e | | |
| | 249bcac3c7 | | |
| | 4a6bcebccc | | |
| | 56e0ebc5ac | | |
| | 2aa39f2f4a | | |
| | 4d247c1877 | | |
| | f6682f6f6d | | |
| | 26ecf25233 | | |
| | 44c3a486c0 | | |
| | c574f429b8 | | |
| | 77d557a5dc | | |
| | 13dedec4a4 | | |
| | 6f5c52676f | | |
| | 90c105537c | | |
| | ef264b1073 | | |
| | 824ac5e021 | | |
@@ -52,7 +52,6 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
|
|||||||
|
|
||||||
Visit following repos to see projects contributed by Azure ML users:
|
Visit following repos to see projects contributed by Azure ML users:
|
||||||
|
|
||||||
- [AMLSamples](https://github.com/Azure/AMLSamples) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis.
|
|
||||||
- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
|
- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
|
||||||
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
|
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
|
||||||
|
|
||||||
|
|||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "roastala"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.5"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Configuration\n",
|
"# Configuration\n",
|
||||||
@@ -58,20 +82,20 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"### What is an Azure Machine Learning workspace\n",
|
"### What is an Azure Machine Learning workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inferencing, and the monitoring of deployed models."
|
"An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inference, and the monitoring of deployed models."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This section describes activities required before you can access any Azure ML services functionality."
|
"This section describes activities required before you can access any Azure ML services functionality."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### 1. Azure Subscription\n",
|
"### 1. Azure Subscription\n",
|
||||||
@@ -89,26 +113,26 @@
|
|||||||
"```\n",
|
"```\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Once installation is complete, the following cell checks the Azure ML SDK version:"
|
"Once installation is complete, the following cell checks the Azure ML SDK version:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"install"
|
"install"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"This notebook was created using version 1.0.43 of the Azure ML SDK\")\n",
|
"print(\"This notebook was created using version 1.0.48.post1 of the Azure ML SDK\")\n",
|
||||||
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
|
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"If you are using an older version of the SDK then this notebook was created using, you should upgrade your SDK.\n",
|
"If you are using an older version of the SDK then this notebook was created using, you should upgrade your SDK.\n",
|
||||||
@@ -126,10 +150,10 @@
|
|||||||
"```\n",
|
"```\n",
|
||||||
"\n",
|
"\n",
|
||||||
"---"
|
"---"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Configure your Azure ML workspace\n",
|
"## Configure your Azure ML workspace\n",
|
||||||
@@ -155,13 +179,13 @@
|
|||||||
"If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view *config.json* file, and copy-paste the values for subscription ID, resource group and workspace name below.\n",
|
"If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view *config.json* file, and copy-paste the values for subscription ID, resource group and workspace name below.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Replace the default values in the cell below with your workspace parameters"
|
"Replace the default values in the cell below with your workspace parameters"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -169,22 +193,22 @@
|
|||||||
"resource_group = os.getenv(\"RESOURCE_GROUP\", default=\"<my-resource-group>\")\n",
|
"resource_group = os.getenv(\"RESOURCE_GROUP\", default=\"<my-resource-group>\")\n",
|
||||||
"workspace_name = os.getenv(\"WORKSPACE_NAME\", default=\"<my-workspace-name>\")\n",
|
"workspace_name = os.getenv(\"WORKSPACE_NAME\", default=\"<my-workspace-name>\")\n",
|
||||||
"workspace_region = os.getenv(\"WORKSPACE_REGION\", default=\"eastus2\")"
|
"workspace_region = os.getenv(\"WORKSPACE_REGION\", default=\"eastus2\")"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Access your workspace\n",
|
"### Access your workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The following cell uses the Azure ML SDK to attempt to load the workspace specified by your parameters. If this cell succeeds, your notebook library will be configured to access the workspace from all notebooks using the `Workspace.from_config()` method. The cell can fail if the specified workspace doesn't exist or you don't have permissions to access it. "
|
"The following cell uses the Azure ML SDK to attempt to load the workspace specified by your parameters. If this cell succeeds, your notebook library will be configured to access the workspace from all notebooks using the `Workspace.from_config()` method. The cell can fail if the specified workspace doesn't exist or you don't have permissions to access it. "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -195,10 +219,10 @@
|
|||||||
" print(\"Workspace configuration succeeded. Skip the workspace creation steps below\")\n",
|
" print(\"Workspace configuration succeeded. Skip the workspace creation steps below\")\n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
" print(\"Workspace not accessible. Change your parameters or create a new workspace below\")"
|
" print(\"Workspace not accessible. Change your parameters or create a new workspace below\")"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create a new workspace\n",
|
"### Create a new workspace\n",
|
||||||
@@ -215,17 +239,17 @@
|
|||||||
"* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
|
"* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
|
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create workspace"
|
"create workspace"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -240,10 +264,10 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# write the details of the workspace to a configuration file to the notebook library\n",
|
"# write the details of the workspace to a configuration file to the notebook library\n",
|
||||||
"ws.write_config()"
|
"ws.write_config()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create compute resources for your training experiments\n",
|
"### Create compute resources for your training experiments\n",
|
||||||
@@ -258,18 +282,18 @@
|
|||||||
"```shell\n",
|
"```shell\n",
|
||||||
"az vm list-skus -o tsv\n",
|
"az vm list-skus -o tsv\n",
|
||||||
"```\n",
|
"```\n",
|
||||||
"* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while note in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
|
"* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while not in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
|
||||||
"* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
|
"* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
|
"To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
@@ -294,20 +318,20 @@
|
|||||||
" \n",
|
" \n",
|
||||||
" # Wait for the cluster to complete, show the output log\n",
|
" # Wait for the cluster to complete, show the output log\n",
|
||||||
" cpu_cluster.wait_for_completion(show_output=True)"
|
" cpu_cluster.wait_for_completion(show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
|
"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
@@ -331,10 +355,10 @@
|
|||||||
"\n",
|
"\n",
|
||||||
" # Wait for the cluster to complete, show the output log\n",
|
" # Wait for the cluster to complete, show the output log\n",
|
||||||
" gpu_cluster.wait_for_completion(show_output=True)"
|
" gpu_cluster.wait_for_completion(show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"---\n",
|
"---\n",
|
||||||
@@ -344,40 +368,16 @@
|
|||||||
"In this notebook you configured this notebook library to connect easily to an Azure ML workspace. You can copy this notebook to your own libraries to connect them to you workspace, or use it to bootstrap new workspaces completely.\n",
|
"In this notebook you configured this notebook library to connect easily to an Azure ML workspace. You can copy this notebook to your own libraries to connect them to you workspace, or use it to bootstrap new workspaces completely.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you came here from another notebook, you can return there and complete that exercise, or you can try out the [Tutorials](./tutorials) or jump into \"how-to\" notebooks and start creating and deploying models. A good place to start is the [train within notebook](./how-to-use-azureml/training/train-within-notebook) example that walks through a simplified but complete end to end machine learning process."
|
"If you came here from another notebook, you can return there and complete that exercise, or you can try out the [Tutorials](./tutorials) or jump into \"how-to\" notebooks and start creating and deploying models. A good place to start is the [train within notebook](./how-to-use-azureml/training/train-within-notebook) example that walks through a simplified but complete end to end machine learning process."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": []
|
"execution_count": null,
|
||||||
|
"source": [],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "roastala"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.5"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
4
configuration.yml
Normal file
@@ -0,0 +1,4 @@
name: configuration
dependencies:
- pip:
  - azureml-sdk
@@ -1,307 +0,0 @@
|
|||||||
## How to use the RAPIDS on AzureML materials
|
|
||||||
### Setting up requirements
|
|
||||||
The material requires the Azure ML SDK and a Jupyter Notebook server for interactive execution. Please refer to the instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up"). Follow the instructions under **Local Computer** and make sure to run the last step, <span style="font-family: Courier New;">pip install \<new package\></span>, with <span style="font-family: Courier New;">new package = progressbar2 (pip install progressbar2)</span>.
|
|
||||||
|
|
||||||
After following the directions, the user should end up with a conda environment (<span style="font-family: Courier New;">myenv</span>) that can be activated in an Anaconda prompt.
|
|
||||||
|
|
||||||
The user also needs an Azure subscription with a Machine Learning Services quota of 24 nodes or more in the desired region (to be able to select a vmSize with 4 GPUs, as used in the notebook) for the desired VM family ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)). The specific vmSize to be used within the chosen family also needs to be whitelisted for Machine Learning Services usage.
|
|
||||||
|
|
||||||
|
|
||||||
### Getting and running the material
|
|
||||||
Clone the AzureML Notebooks repository from GitHub by running the following command in a local directory:
|
|
||||||
|
|
||||||
* C:\local_directory>git clone https://github.com/Azure/MachineLearningNotebooks.git
|
|
||||||
|
|
||||||
In a conda prompt, navigate to the local directory, activate the conda environment (<span style="font-family: Courier New;">myenv</span>) where the Azure ML SDK was installed, and launch Jupyter Notebook.
|
|
||||||
|
|
||||||
* (<span style="font-family: Courier New;">myenv</span>) C:\local_directory>jupyter notebook
|
|
||||||
|
|
||||||
From the resulting browser at http://localhost:8888/tree, navigate to the master notebook:
|
|
||||||
|
|
||||||
* http://localhost:8888/tree/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
|
|
||||||
|
|
||||||
|
|
||||||
The following notebook will appear:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
### Master Jupyter Notebook
|
|
||||||
The notebook can be executed interactively, step by step, by pressing the Run button (circled in red in the image above).
|
|
||||||
|
|
||||||
The first couple of functional steps import the necessary AzureML libraries. If you experience any errors, please refer back to the instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up").
|
|
||||||
|
|
||||||
|
|
||||||
#### Setting up a Workspace
|
|
||||||
The following step gathers the information necessary to set up a workspace to execute the RAPIDS script. This needs to be done only once, or not at all if you already have a workspace you can use set up on the Azure Portal:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
Be sure to set the correct values for the subscription\_id, resource\_group, workspace\_name, and region before executing the step. For example:
|
|
||||||
|
|
||||||
subscription_id = os.environ.get("SUBSCRIPTION_ID", "1358e503-xxxx-4043-xxxx-65b83xxxx32d")
|
|
||||||
resource_group = os.environ.get("RESOURCE_GROUP", "AML-Rapids-Testing")
|
|
||||||
workspace_name = os.environ.get("WORKSPACE_NAME", "AML_Rapids_Tester")
|
|
||||||
workspace_region = os.environ.get("WORKSPACE_REGION", "West US 2")
|
|
||||||
|
|
||||||
|
|
||||||
The resource\_group and workspace_name can take any value. The region should match the region for which the subscription has the required Machine Learning Services node quota.
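The settings above follow a simple pattern: read each value from an environment variable and fall back to a default. A minimal sketch of that pattern using only the standard library is shown here; `load_workspace_settings` and the placeholder defaults are hypothetical names chosen for illustration and are not part of the notebook.

```python
import os

# Placeholder defaults for illustration only; replace them with your own values.
DEFAULTS = {
    "SUBSCRIPTION_ID": "<my-subscription-id>",
    "RESOURCE_GROUP": "<my-resource-group>",
    "WORKSPACE_NAME": "<my-workspace-name>",
    "WORKSPACE_REGION": "West US 2",
}


def load_workspace_settings(environ=os.environ):
    """Return workspace settings, preferring environment variables over defaults."""
    return {key.lower(): environ.get(key, default) for key, default in DEFAULTS.items()}


settings = load_workspace_settings()

# Flag any value that still holds a placeholder so it is not used by accident.
missing = [name for name, value in settings.items() if value.startswith("<")]
if missing:
    print("Still using placeholder values for:", ", ".join(missing))
```

Passing the environment as a parameter keeps the helper easy to check with a plain dictionary instead of the real process environment.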
|
|
||||||
|
|
||||||
The first time the code is executed, it redirects to the Azure Portal to validate subscription credentials. After the workspace is created, its related information is stored in a local file so that this step can be skipped subsequently. The next step just loads the saved workspace:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
Once a workspace has been created the user could skip its creation and just jump to this step. The configuration file resides in:
|
|
||||||
|
|
||||||
* C:\local_directory\\MachineLearningNotebooks\contrib\RAPIDS\aml_config\config.json
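Whether the creation step can be skipped depends on that file being present. The sketch below checks for it with the standard library only; `read_workspace_config` is a hypothetical helper, and the assumption that the file is JSON containing a `workspace_name` key is labeled as such in the code.

```python
import json
from pathlib import Path


def read_workspace_config(path):
    """Return the saved workspace settings, or None if the file does not exist."""
    config_path = Path(path)
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Relative path taken from the location given above.
config = read_workspace_config(Path("aml_config") / "config.json")
if config is None:
    print("No saved workspace configuration found; run the creation step first.")
else:
    # Assumption: the saved file contains a "workspace_name" entry.
    print("Saved workspace:", config.get("workspace_name"))
```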
|
|
||||||
|
|
||||||
|
|
||||||
#### Creating an AML Compute Target
|
|
||||||
The following step creates an AML Compute Target:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
The vm\_size parameter of the AmlCompute.provisioning\_configuration() call has to belong to one of the VM families ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)) that provide P40 or V100 GPUs, which are the ones supported by RAPIDS. In this particular case a Standard\_NC24s\_V2 was used.
|
|
||||||
|
|
||||||
|
|
||||||
If the output of running the step has an error of the form:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
It is an indication that even though the subscription has a node quota for VMs for that family, it does not have a node quota for Machine Learning Services for that family.
|
|
||||||
You will need to request an increased node quota for that family in that region for **Machine Learning Services**.
|
|
||||||
|
|
||||||
|
|
||||||
Another possible error is the following:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
This indicates that the specified vmSize has not been whitelisted for use with Machine Learning Services, and a request to do so should be filed.
|
|
||||||
|
|
||||||
The successful creation of the compute target would have an output like the following:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
#### RAPIDS script uploading and viewing
|
|
||||||
The next step copies the RAPIDS script process_data.py, which is a slightly modified implementation of the [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb), into a script processing folder and it presents its contents to the user. (The script is discussed in the next section in detail).
|
|
||||||
If the user wants to use a different RAPIDS script, the references to the <span style="font-family: Courier New;">process_data.py</span> script have to be changed
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
#### Data Uploading
|
|
||||||
The RAPIDS script loads and extracts features from the Fannie Mae’s Mortgage Dataset to train an XGBoost prediction model. The script uses two years of data
|
|
||||||
|
|
||||||
The next few steps download and decompress the data and make it available to the script as an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data).
|
|
||||||
|
|
||||||
|
|
||||||
The following functions are used to download and decompress the input data
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||

|
|
||||||

|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
The next step uses those functions to download the following file locally:
|
|
||||||
http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/mortgage_2000-2001.tgz
|
|
||||||
and to decompress it into the local folder path = .\mortgage_2000-2001
|
|
||||||
The step takes several minutes; the intermediate outputs provide progress indicators.
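The download-and-decompress step can be sketched with the standard library alone. This is not the notebook's implementation (which also reports progress); `download_file` and `decompress_tgz` are hypothetical helper names, and the sketch assumes the archive comes from a trusted source.

```python
import tarfile
import urllib.request
from pathlib import Path


def download_file(url, destination):
    """Download url to destination unless the file is already present."""
    destination = Path(destination)
    if not destination.exists():
        urllib.request.urlretrieve(url, str(destination))
    return destination


def decompress_tgz(archive, target_dir):
    """Extract a gzip-compressed tar archive into target_dir and return that path."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Only extract archives you trust: members are written as named in the archive.
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(path=str(target_dir))
    return target_dir


# Example usage (commented out because the download is large):
# archive = download_file(
#     "http://rapidsai-data.s3-website.us-east-2.amazonaws.com/"
#     "notebook-mortgage-data/mortgage_2000-2001.tgz",
#     "mortgage_2000-2001.tgz",
# )
# decompress_tgz(archive, "mortgage_2000-2001")
```

Skipping the download when the file already exists makes the step cheap to re-run after an interrupted session.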
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
The decompressed data should have the following structure:
|
|
||||||
* .\mortgage_2000-2001\acq\Acquisition_<year>Q<num>.txt
|
|
||||||
* .\mortgage_2000-2001\perf\Performance_<year>Q<num>.txt
|
|
||||||
* .\mortgage_2000-2001\names.csv
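Before uploading, the layout listed above can be verified with a short check. The sketch below uses only `pathlib`; `check_mortgage_layout` is a hypothetical helper, and the file-name patterns are derived from the structure shown above.

```python
from pathlib import Path


def check_mortgage_layout(root):
    """Return a list of problems found in the decompressed data folder."""
    root = Path(root)
    problems = []
    if not list((root / "acq").glob("Acquisition_*Q*.txt")):
        problems.append("no acquisition files under acq")
    if not list((root / "perf").glob("Performance_*Q*.txt")):
        problems.append("no performance files under perf")
    if not (root / "names.csv").is_file():
        problems.append("names.csv is missing")
    return problems


issues = check_mortgage_layout("mortgage_2000-2001")
print("Layout looks complete." if not issues else "Problems found: " + "; ".join(issues))
```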
|
|
||||||
|
|
||||||
The data is divided into partitions that roughly correspond to yearly quarters. RAPIDS includes support for multi-node, multi-GPU deployments, enabling scaling up and out on much larger dataset sizes. The user will be able to verify that the number of partitions that the script is able to process increases with the number of GPUs used. The RAPIDS script is implemented for single-machine scenarios. An example supporting multiple nodes will be published later.
|
|
||||||
|
|
||||||
|
|
||||||
The next step uploads the data into the [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) under the reference <span style="font-family: Courier New;">fileroot = mortgage_2000-2001</span>.
|
|
||||||
|
|
||||||
The step takes several minutes to load the data; the output provides a progress indicator.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
Once the data has been loaded into the Azure Machine Learning Datastore, in subsequent runs the user can comment out the ds.upload line and just refer to the <span style="font-family: Courier New;">mortgage_2000-2001</span> datastore reference.
|
|
||||||
|
|
||||||
|
|
||||||
#### Setting up required libraries and environment to run RAPIDS code
|
|
||||||
There are two options to set up the environment to run RAPIDS code. The following steps show how to use a prebuilt conda environment. A recommended alternative is to specify a base Docker image and package dependencies. You can find sample code for that in the notebook.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
#### Wrapper function to submit the RAPIDS script as an Azure Machine Learning experiment
|
|
||||||
|
|
||||||
The next step defines a wrapper function to be used when the user runs the RAPIDS script with different arguments. It takes as arguments <span style="font-family: Times New Roman;">*cpu\_training*</span>, a flag that indicates whether the run is meant to be processed with CPU only; <span style="font-family: Times New Roman;">*gpu\_count*</span>, the number of GPUs to use when GPUs are enabled; and part_count, the number of data partitions to use.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
The core of the function resides in configuring the run by the instantiation of a ScriptRunConfig object, which defines the source_directory for the script to be executed, the name of the script and the arguments to be passed to the script.
|
|
||||||
In addition to the wrapper function arguments, two other arguments are passed: <span style="font-family: Times New Roman;">*data\_dir*</span>, the directory where the data is stored, and <span style="font-family: Times New Roman;">*end_year*</span>, the latest year from which to use partitions.
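A ScriptRunConfig accepts the script arguments as a flat list of strings. The sketch below shows one way such a list could be assembled from the values described above. The flag names (`--data_dir`, `--num_gpu`, `--part_count`, `--end_year`, `--cpu_predictor`) and the helper `build_script_arguments` are hypothetical, so check process_data.py for the flags it actually parses.

```python
def build_script_arguments(data_dir, gpu_count, part_count, end_year, cpu_training=False):
    """Return the command-line argument list for a run (flag names are hypothetical)."""
    arguments = [
        "--data_dir", str(data_dir),
        "--num_gpu", str(gpu_count),
        "--part_count", str(part_count),
        "--end_year", str(end_year),
    ]
    if cpu_training:
        # Hypothetical switch that selects CPU-only processing.
        arguments.append("--cpu_predictor")
    return arguments


# Illustrative values only.
print(build_script_arguments("mortgage_2000-2001", gpu_count=1, part_count=3, end_year=2000))
```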
|
|
||||||
|
|
||||||
|
|
||||||
As mentioned earlier, the size of the data that can be processed increases with the number of GPUs. In the function, the dictionary <span style="font-family: Times New Roman;">*max\_gpu\_count\_data\_partition_mapping*</span> records the maximum number of partitions that we empirically found the system can handle for a given number of GPUs. The function issues a warning when the number of partitions for a given number of GPUs exceeds that maximum, but the script is still executed; the user should then expect an error, because an out-of-memory situation would be encountered.
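The warn-but-continue behavior described above can be sketched as follows. The limits in `MAX_PARTITIONS_BY_GPU_COUNT` are made-up values for illustration, not the empirical figures from the notebook, and `check_partition_count` is a hypothetical helper name.

```python
import warnings

# Hypothetical limits for illustration; the notebook's empirical values may differ.
MAX_PARTITIONS_BY_GPU_COUNT = {1: 3, 2: 4, 3: 6, 4: 8}


def check_partition_count(gpu_count, part_count, limits=MAX_PARTITIONS_BY_GPU_COUNT):
    """Warn (without raising) when part_count exceeds the limit for gpu_count.

    Returns True when the combination is within the limit, False otherwise.
    """
    limit = limits.get(gpu_count)
    if limit is not None and part_count > limit:
        warnings.warn(
            f"{part_count} partitions exceeds the limit of {limit} for "
            f"{gpu_count} GPU(s); the run may fail with an out-of-memory error."
        )
        return False
    return True


# The caller can still submit the run after a warning, as the wrapper does.
check_partition_count(gpu_count=1, part_count=3)
```

Issuing a warning instead of raising an exception leaves the decision to the user, which matches the behavior described for the wrapper function.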
|
|
||||||
If the user wants to use a different RAPIDS script, the reference to the process_data.py script has to be changed
|
|
||||||
|
|
||||||
|
|
||||||
#### Submitting Experiments
|
|
||||||
We are ready to submit experiments: launching the RAPIDS script with different sets of parameters.
|
|
||||||
|
|
||||||
|
|
||||||
The following couple of steps submit experiments under different conditions.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
The user can change variable num\_gpu between one and the number of GPUs supported by the chosen vmSize. Variable part\_count can take any value between 1 and 11, but if it exceeds the maximum for num_gpu, the run would result in an error
|
|
||||||
|
|
||||||
|
|
||||||
If the experiment is successfully submitted, it is placed on a queue for processing, its status appears as Queued, and an output like the following appears:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
When the experiment starts running, its status appears as Running and the output changes to something like this:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
#### Reproducing the performance gains plot results on the Blog Post
|
|
||||||
When the run has finished successfully, its status appears as Completed and the output changes to something like this:
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
This is the output for an experiment run with three partitions and one GPU. Notice that the reported processing time is 49.16 seconds, just as depicted on the performance gains plot in the blog post.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
This output corresponds to a run with three partitions and two GPUs. Notice that the reported processing time is 37.50 seconds, just as depicted on the performance gains plot in the blog post.
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
This output corresponds to an experiment run with three partitions and three GPUs. Notice that the reported processing time is 24.40 seconds, just as depicted on the performance gains plot in the blog post.
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
This output corresponds to an experiment run with three partitions and four GPUs. Notice that the reported processing time is 23.33 seconds, just as depicted on the performance gains plot in the blog post.
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
This output corresponds to an experiment run with three partitions using only the CPU. Notice that the reported processing time is 9 minutes and 1.21 seconds, or 541.21 seconds, just as depicted on the performance gains plot in the blog post.
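To put the reported timings side by side, the short calculation below derives the speedup of each GPU configuration relative to the CPU-only run. It uses only the processing times quoted above for three partitions; the variable names are illustrative.

```python
# Processing times in seconds reported above for three partitions.
cpu_seconds = 9 * 60 + 1.21  # 9 minutes and 1.21 seconds
gpu_seconds = {1: 49.16, 2: 37.50, 3: 24.40, 4: 23.33}

# Speedup of each GPU configuration relative to the CPU-only run.
speedups = {gpus: cpu_seconds / seconds for gpus, seconds in gpu_seconds.items()}

for gpus, speedup in speedups.items():
    print(f"{gpus} GPU(s): {gpu_seconds[gpus]:.2f} s, {speedup:.1f}x faster than CPU-only")
```

Going from three to four GPUs only changes the time from 24.40 to 23.33 seconds, so with three partitions the fourth GPU adds little.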
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
This output corresponds to an experiment run with nine partitions and four GPUs. Notice that the notebook issues a warning signaling that the number of partitions exceeds the maximum that the system can handle with that many GPUs, and the run ends up failing, hence its status of Failed.
|
|
||||||
|
|
||||||
|
|
||||||
##### Freeing Resources
|
|
||||||
In the last step the notebook deletes the compute target. (This step is optional, especially if min_nodes for the cluster is set to 0, in which case the cluster scales down to 0 nodes when it is not in use.)
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
### RAPIDS Script
|
|
||||||
The Master Notebook runs experiments by launching a RAPIDS script with different sets of parameters. In this section, the RAPIDS script (process_data.py in the material) is analyzed.
|
|
||||||
|
|
||||||
The script first imports all the necessary libraries and parses the arguments passed by the Master Notebook.
|
|
||||||
|
|
||||||
All the internal functions used by the script are then defined.
|
|
||||||
|
|
||||||
|
|
||||||
#### Wrapper Auxiliary Functions:
|
|
||||||
The functions below are wrappers around a configuration module for librmm, the Python interface to the RAPIDS Memory Manager:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
A couple of other functions are wrappers for the submission of jobs to the DASK client:
|
|
||||||
|
|
||||||

|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
#### Data Loading Functions:
|
|
||||||
The data is loaded using the following three functions:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
All three functions use the library function cudf.read_csv(), the cuDF counterpart of the well-known Pandas function.
|
|
||||||
|
|
||||||
|
|
||||||
#### Data Transformation and Feature Extraction Functions:
|
|
||||||
The raw data is transformed and processed to extract features by joining, slicing, grouping, aggregating, factoring, etc., the original dataframes, just as is done with Pandas. The following functions in the script are used for that purpose:
|
|
||||||

|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
#### Main() Function
|
|
||||||
The previous functions are used in the Main function to accomplish several steps: set up the Dask client, perform all ETL operations, and set up and train an XGBoost model. The function also assigns the data to be processed by each Dask worker.
|
|
||||||
|
|
||||||
|
|
||||||
##### Setting Up DASK client:
|
|
||||||
The following lines:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
These lines initialize and set up a DASK client with as many workers as the number of GPUs used in the run. A successful setup produces the following output:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
##### All ETL functions are used on single calls to process\_quarter_gpu, one per data partition
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
##### Concatenating the data assigned to each DASK worker
|
|
||||||
The partitions assigned to each worker are concatenated and set up for training.
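The bookkeeping behind this step can be pictured with a small pure-Python sketch. It assumes a round-robin assignment of partitions to workers, which is an assumption made here for illustration (the script may distribute partitions differently), and it uses plain lists as stand-ins for GPU dataframes; the worker names are hypothetical.

```python
from collections import defaultdict


def assign_partitions(partitions, workers):
    """Assign partitions to workers round-robin and group them per worker."""
    per_worker = defaultdict(list)
    for index, partition in enumerate(partitions):
        per_worker[workers[index % len(workers)]].append(partition)
    return dict(per_worker)


def concatenate(parts):
    """Stand-in for a dataframe concat: flattens several lists of rows into one."""
    return [row for part in parts for row in part]


partitions = [["q1_row"], ["q2_row"], ["q3_row"]]  # three toy partitions
workers = ["gpu-worker-0", "gpu-worker-1"]         # hypothetical worker names

assigned = assign_partitions(partitions, workers)
training_data = {worker: concatenate(parts) for worker, parts in assigned.items()}
print(training_data)
```

Each worker ends up with a single concatenated block of data, which is the shape needed before training starts.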
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
##### Setting Training Parameters
|
|
||||||
The parameters used for the training of a gradient boosted decision tree model are set up in the following code block:
|
|
||||||

|
|
||||||
|
|
||||||
Notice how the parameters are modified when using the CPU-only mode.
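A minimal sketch of such a switch is shown below. The parameter values are illustrative rather than the ones used by the script, and `make_training_params` is a hypothetical helper; the sketch only relies on XGBoost's `tree_method` parameter, where `gpu_hist` selects the GPU histogram tree builder and `hist` the CPU one in the XGBoost releases of that period.

```python
def make_training_params(cpu_training=False):
    """Return XGBoost-style parameters, switching the tree builder for CPU-only runs."""
    params = {
        "max_depth": 8,                   # illustrative value
        "learning_rate": 0.1,             # illustrative value
        "objective": "reg:squarederror",  # illustrative objective
        "tree_method": "gpu_hist",        # GPU histogram tree builder
    }
    if cpu_training:
        params["tree_method"] = "hist"    # CPU histogram tree builder
    return params


gpu_params = make_training_params()
cpu_params = make_training_params(cpu_training=True)
print(gpu_params["tree_method"], cpu_params["tree_method"])
```

Keeping a single parameter dictionary and overriding only the device-specific entries helps keep the CPU and GPU runs comparable.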
|
|
||||||
|
|
||||||
|
|
||||||
##### Launching the training of a gradient boosted decision tree model using XGBoost.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
The outputs of the script can be observed in the Master Notebook as the script is executed.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@@ -1,35 +0,0 @@
|
|||||||
name: rapids
|
|
||||||
channels:
|
|
||||||
- nvidia
|
|
||||||
- numba
|
|
||||||
- conda-forge
|
|
||||||
- rapidsai
|
|
||||||
- defaults
|
|
||||||
- pytorch
|
|
||||||
|
|
||||||
dependencies:
|
|
||||||
- arrow-cpp=0.12.0
|
|
||||||
- bokeh
|
|
||||||
- cffi=1.11.5
|
|
||||||
- cmake=3.12
|
|
||||||
- cuda92
|
|
||||||
- cython==0.29
|
|
||||||
- dask=1.1.1
|
|
||||||
- distributed=1.25.3
|
|
||||||
- faiss-gpu=1.5.0
|
|
||||||
- numba=0.42
|
|
||||||
- numpy=1.15.4
|
|
||||||
- nvstrings
|
|
||||||
- pandas=0.23.4
|
|
||||||
- pyarrow=0.12.0
|
|
||||||
- scikit-learn
|
|
||||||
- scipy
|
|
||||||
- cudf
|
|
||||||
- cuml
|
|
||||||
- python=3.6.2
|
|
||||||
- jupyterlab
|
|
||||||
- pip:
|
|
||||||
- file:/rapids/xgboost/python-package/dist/xgboost-0.81-py3-none-any.whl
|
|
||||||
- git+https://github.com/rapidsai/dask-xgboost@dask-cudf
|
|
||||||
- git+https://github.com/rapidsai/dask-cudf@master
|
|
||||||
- git+https://github.com/rapidsai/dask-cuda@master
|
|
||||||
723
contrib/datadrift/azure-ml-datadrift.ipynb
Normal file
@@ -0,0 +1,723 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "rafarmah"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Track Data Drift between Training and Inference Data in Production \n",
|
||||||
|
"\n",
|
||||||
|
"With this notebook, you will learn how to enable the DataDrift service to automatically track and determine whether your inference data is drifting from the data your model was initially trained on. The DataDrift service provides metrics and visualizations to help stakeholders identify which specific features cause the concept drift to occur.\n",
|
||||||
|
"\n",
|
||||||
|
"Please email driftfeedback@microsoft.com with any issues. A member from the DataDrift team will respond shortly. \n",
|
||||||
|
"\n",
|
||||||
|
"The DataDrift Public Preview API can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-contrib-datadrift/?view=azure-ml-py). "
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Prerequisites and Setup"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Install the DataDrift package\n",
|
||||||
|
"\n",
|
||||||
|
"Install the azureml-contrib-datadrift, azureml-opendatasets and lightgbm packages before running this notebook.\n",
|
||||||
|
"```\n",
|
||||||
|
"pip install azureml-contrib-datadrift\n",
"pip install azureml-opendatasets\n",
|
||||||
|
"pip install lightgbm\n",
|
||||||
|
"```"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Import Dependencies"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"import os\n",
|
||||||
|
"import time\n",
|
||||||
|
"from datetime import datetime, timedelta\n",
|
||||||
|
"\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import requests\n",
|
||||||
|
"from azureml.contrib.datadrift import DataDriftDetector, AlertConfiguration\n",
|
||||||
|
"from azureml.opendatasets import NoaaIsdWeather\n",
|
||||||
|
"from azureml.core import Dataset, Workspace, Run\n",
|
||||||
|
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.image import ContainerImage\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"import joblib\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Set up Configuration and Create Azure ML Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Please type in your initials/alias. The prefix is prepended to the names of resources created by this notebook. \n",
|
||||||
|
"prefix = \"dd\"\n",
|
||||||
|
"\n",
|
||||||
|
"# NOTE: Please do not change the model_name, as it's required by the score.py file\n",
|
||||||
|
"model_name = \"driftmodel\"\n",
|
||||||
|
"image_name = \"{}driftimage\".format(prefix)\n",
|
||||||
|
"service_name = \"{}driftservice\".format(prefix)\n",
|
||||||
|
"\n",
|
||||||
|
"# optionally, set email address to receive an email alert for DataDrift\n",
|
||||||
|
"email_address = \"\""
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Generate Train/Testing Data\n",
|
||||||
|
"\n",
|
||||||
|
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You may replace this step with your own dataset. "
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"usaf_list = ['725724', '722149', '723090', '722159', '723910', '720279',\n",
|
||||||
|
" '725513', '725254', '726430', '720381', '723074', '726682',\n",
|
||||||
|
" '725486', '727883', '723177', '722075', '723086', '724053',\n",
|
||||||
|
" '725070', '722073', '726060', '725224', '725260', '724520',\n",
|
||||||
|
" '720305', '724020', '726510', '725126', '722523', '703333',\n",
|
||||||
|
" '722249', '722728', '725483', '722972', '724975', '742079',\n",
|
||||||
|
" '727468', '722193', '725624', '722030', '726380', '720309',\n",
|
||||||
|
" '722071', '720326', '725415', '724504', '725665', '725424',\n",
|
||||||
|
" '725066']\n",
|
||||||
|
"\n",
|
||||||
|
"columns = ['usaf', 'wban', 'datetime', 'latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"def enrich_weather_noaa_data(noaa_df):\n",
|
||||||
|
" hours_in_day = 24\n",
|
||||||
|
" week_in_year = 52\n",
|
||||||
|
" \n",
|
||||||
|
" noaa_df[\"hour\"] = noaa_df[\"datetime\"].dt.hour\n",
|
||||||
|
" noaa_df[\"weekofyear\"] = noaa_df[\"datetime\"].dt.week\n",
|
||||||
|
" \n",
|
||||||
|
" noaa_df[\"sine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*(x.dt.week-1)/week_in_year))\n",
|
||||||
|
" noaa_df[\"cosine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*(x.dt.week-1)/week_in_year))\n",
|
||||||
|
"\n",
|
||||||
|
" noaa_df[\"sine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||||
|
" noaa_df[\"cosine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||||
|
" \n",
|
||||||
|
" return noaa_df\n",
|
||||||
|
"\n",
|
||||||
|
"def add_window_col(input_df):\n",
|
||||||
|
" shift_interval = pd.Timedelta('-7 days') # your X days interval\n",
|
||||||
|
" df_shifted = input_df.copy()\n",
|
||||||
|
" df_shifted['datetime'] = df_shifted['datetime'] - shift_interval\n",
|
||||||
|
" df_shifted.drop(list(input_df.columns.difference(['datetime', 'usaf', 'wban', 'sine_hourofday', 'temperature'])), axis=1, inplace=True)\n",
|
||||||
|
"\n",
|
||||||
|
" # merge, keeping only observations where the 7-day lag is present\n",
|
||||||
|
" df2 = pd.merge(input_df,\n",
|
||||||
|
" df_shifted,\n",
|
||||||
|
" on=['datetime', 'usaf', 'wban', 'sine_hourofday'],\n",
|
||||||
|
" how='inner', # use 'left' to keep observations without lags\n",
|
||||||
|
" suffixes=['', '-7'])\n",
|
||||||
|
" return df2\n",
|
||||||
|
"\n",
|
||||||
|
"def get_noaa_data(start_time, end_time, cols, station_list):\n",
|
||||||
|
" isd = NoaaIsdWeather(start_time, end_time, cols=cols)\n",
|
||||||
|
" # Read into Pandas data frame.\n",
|
||||||
|
" noaa_df = isd.to_pandas_dataframe()\n",
|
||||||
|
" noaa_df = noaa_df.rename(columns={\"stationName\": \"station_name\"})\n",
|
||||||
|
" \n",
|
||||||
|
" df_filtered = noaa_df[noaa_df[\"usaf\"].isin(station_list)]\n",
|
||||||
|
" df_filtered = df_filtered.reset_index(drop=True)\n",
|
||||||
|
" \n",
|
||||||
|
" # Enrich with time features\n",
|
||||||
|
" df_enriched = enrich_weather_noaa_data(df_filtered)\n",
|
||||||
|
" \n",
|
||||||
|
" return df_enriched\n",
|
||||||
|
"\n",
|
||||||
|
"def get_featurized_noaa_df(start_time, end_time, cols, station_list):\n",
|
||||||
|
" df_1 = get_noaa_data(start_time - timedelta(days=7), start_time - timedelta(seconds=1), cols, station_list)\n",
|
||||||
|
" df_2 = get_noaa_data(start_time, end_time, cols, station_list)\n",
|
||||||
|
" noaa_df = pd.concat([df_1, df_2])\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Adding window feature\")\n",
|
||||||
|
" df_window = add_window_col(noaa_df)\n",
|
||||||
|
" \n",
|
||||||
|
" cat_columns = df_window.dtypes == object\n",
|
||||||
|
" cat_columns = cat_columns[cat_columns == True]\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Encoding categorical columns\")\n",
|
||||||
|
" df_encoded = pd.get_dummies(df_window, columns=cat_columns.keys().tolist())\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Dropping unnecessary columns\")\n",
|
||||||
|
" df_featurized = df_encoded.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna().drop_duplicates()\n",
|
||||||
|
" \n",
|
||||||
|
" return df_featurized"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Train model on Jan 1 - 14, 2009 data\n",
|
||||||
|
"df = get_featurized_noaa_df(datetime(2009, 1, 1), datetime(2009, 1, 14, 23, 59, 59), columns, usaf_list)\n",
|
||||||
|
"df.head()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"label = \"temperature\"\n",
|
||||||
|
"x_df = df.drop(label, axis=1)\n",
|
||||||
|
"y_df = df[[label]]\n",
|
||||||
|
"x_train, x_test, y_train, y_test = train_test_split(df, y_df, test_size=0.2, random_state=223)\n",
|
||||||
|
"print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)\n",
|
||||||
|
"\n",
|
||||||
|
"training_dir = 'outputs/training'\n",
|
||||||
|
"training_file = \"training.csv\"\n",
|
||||||
|
"\n",
|
||||||
|
"# Generate training dataframe to register as Training Dataset\n",
|
||||||
|
"os.makedirs(training_dir, exist_ok=True)\n",
|
||||||
|
"training_df = pd.merge(x_train.drop(label, axis=1), y_train, left_index=True, right_index=True)\n",
|
||||||
|
"training_df.to_csv(training_dir + \"/\" + training_file)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create/Register Training Dataset"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"dataset_name = \"dataset\"\n",
|
||||||
|
"name_suffix = datetime.utcnow().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
|
||||||
|
"snapshot_name = \"snapshot-{}\".format(name_suffix)\n",
|
||||||
|
"\n",
|
||||||
|
"dstore = ws.get_default_datastore()\n",
|
||||||
|
"dstore.upload(training_dir, \"data/training\", show_progress=True)\n",
|
||||||
|
"dpath = dstore.path(\"data/training/training.csv\")\n",
|
||||||
|
"trainingDataset = Dataset.auto_read_files(dpath, include_path=True)\n",
|
||||||
|
"trainingDataset = trainingDataset.register(workspace=ws, name=dataset_name, description=\"dset\", exist_ok=True)\n",
|
||||||
|
"\n",
|
||||||
|
"datasets = [(Dataset.Scenario.TRAINING, trainingDataset)]\n",
|
||||||
|
"print(\"dataset registration done.\\n\")\n",
|
||||||
|
"datasets"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train and Save Model"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"import lightgbm as lgb\n",
|
||||||
|
"\n",
|
||||||
|
"train = lgb.Dataset(data=x_train, \n",
|
||||||
|
" label=y_train)\n",
|
||||||
|
"\n",
|
||||||
|
"test = lgb.Dataset(data=x_test, \n",
|
||||||
|
" label=y_test,\n",
|
||||||
|
" reference=train)\n",
|
||||||
|
"\n",
|
||||||
|
"params = {'learning_rate' : 0.1,\n",
|
||||||
|
" 'boosting' : 'gbdt',\n",
|
||||||
|
" 'metric' : 'rmse',\n",
|
||||||
|
" 'feature_fraction' : 1,\n",
|
||||||
|
" 'bagging_fraction' : 1,\n",
|
||||||
|
" 'max_depth': 6,\n",
|
||||||
|
" 'num_leaves' : 31,\n",
|
||||||
|
" 'objective' : 'regression',\n",
|
||||||
|
" 'bagging_freq' : 1,\n",
|
||||||
|
" \"verbose\": -1,\n",
|
||||||
|
" 'min_data_per_leaf': 100}\n",
|
||||||
|
"\n",
|
||||||
|
"model = lgb.train(params, \n",
|
||||||
|
" num_boost_round=500,\n",
|
||||||
|
" train_set=train,\n",
|
||||||
|
" valid_sets=[train, test],\n",
|
||||||
|
" verbose_eval=50,\n",
|
||||||
|
" early_stopping_rounds=25)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"model_file = 'outputs/{}.pkl'.format(model_name)\n",
|
||||||
|
"\n",
|
||||||
|
"os.makedirs('outputs', exist_ok=True)\n",
|
||||||
|
"joblib.dump(model, model_file)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register Model"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"model = Model.register(model_path=model_file,\n",
|
||||||
|
" model_name=model_name,\n",
|
||||||
|
" workspace=ws,\n",
|
||||||
|
" datasets=datasets)\n",
|
||||||
|
"\n",
|
||||||
|
"print(model_name, image_name, service_name, model)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Deploy Model To AKS"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Prepare Environment"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
|
||||||
|
" pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
|
" f.write(myenv.serialize_to_string())"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create Image"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Image creation may take up to 15 minutes.\n",
|
||||||
|
"\n",
|
||||||
|
"image_name = image_name + str(model.version)\n",
|
||||||
|
"\n",
|
||||||
|
"if not image_name in ws.images:\n",
|
||||||
|
" # Use the score.py defined in this directory as the execution script\n",
|
||||||
|
" # NOTE: The Model Data Collector must be enabled in the execution script for DataDrift to run correctly\n",
|
||||||
|
" image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
|
||||||
|
" runtime=\"python\",\n",
|
||||||
|
" conda_file=\"myenv.yml\",\n",
|
||||||
|
" description=\"Image with weather dataset model\")\n",
|
||||||
|
" image = ContainerImage.create(name=image_name,\n",
|
||||||
|
" models=[model],\n",
|
||||||
|
" image_config=image_config,\n",
|
||||||
|
" workspace=ws)\n",
|
||||||
|
"\n",
|
||||||
|
" image.wait_for_creation(show_output=True)\n",
|
||||||
|
"else:\n",
|
||||||
|
" image = ws.images[image_name]"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create Compute Target"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"aks_name = 'dd-demo-e2e'\n",
|
||||||
|
"prov_config = AksCompute.provisioning_configuration()\n",
|
||||||
|
"\n",
|
||||||
|
"if not aks_name in ws.compute_targets:\n",
|
||||||
|
" aks_target = ComputeTarget.create(workspace=ws,\n",
|
||||||
|
" name=aks_name,\n",
|
||||||
|
" provisioning_configuration=prov_config)\n",
|
||||||
|
"\n",
|
||||||
|
" aks_target.wait_for_completion(show_output=True)\n",
|
||||||
|
" print(aks_target.provisioning_state)\n",
|
||||||
|
" print(aks_target.provisioning_errors)\n",
|
||||||
|
"else:\n",
|
||||||
|
" aks_target=ws.compute_targets[aks_name]"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploy Service"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"aks_service_name = service_name\n",
|
||||||
|
"\n",
|
||||||
|
"if not aks_service_name in ws.webservices:\n",
|
||||||
|
" aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)\n",
|
||||||
|
" aks_service = Webservice.deploy_from_image(workspace=ws,\n",
|
||||||
|
" name=aks_service_name,\n",
|
||||||
|
" image=image,\n",
|
||||||
|
" deployment_config=aks_config,\n",
|
||||||
|
" deployment_target=aks_target)\n",
|
||||||
|
" aks_service.wait_for_deployment(show_output=True)\n",
|
||||||
|
" print(aks_service.state)\n",
|
||||||
|
"else:\n",
|
||||||
|
" aks_service = ws.webservices[aks_service_name]"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Run DataDrift Analysis"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Send Scoring Data to Service"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Download Scoring Data"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Score Model on March 15, 2016 data\n",
|
||||||
|
"scoring_df = get_noaa_data(datetime(2016, 3, 15) - timedelta(days=7), datetime(2016, 3, 16), columns, usaf_list)\n",
|
||||||
|
"# Add the window feature column\n",
|
||||||
|
"scoring_df = add_window_col(scoring_df)\n",
|
||||||
|
"\n",
|
||||||
|
"# Drop features not used by the model\n",
|
||||||
|
"print(\"Dropping unnecessary columns\")\n",
|
||||||
|
"scoring_df = scoring_df.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna()\n",
|
||||||
|
"scoring_df.head()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# One Hot Encode the scoring dataset to match the training dataset schema\n",
|
||||||
|
"columns_dict = model.datasets[\"training\"][0].get_profile().columns\n",
|
||||||
|
"extra_cols = ('Path', 'Column1')\n",
|
||||||
|
"for k in extra_cols:\n",
|
||||||
|
" columns_dict.pop(k, None)\n",
|
||||||
|
"training_columns = list(columns_dict.keys())\n",
|
||||||
|
"\n",
|
||||||
|
"categorical_columns = scoring_df.dtypes == object\n",
|
||||||
|
"categorical_columns = categorical_columns[categorical_columns == True]\n",
|
||||||
|
"\n",
|
||||||
|
"test_df = pd.get_dummies(scoring_df[categorical_columns.keys().tolist()])\n",
|
||||||
|
"encoded_df = scoring_df.join(test_df)\n",
|
||||||
|
"\n",
|
||||||
|
"# Populate missing OHE columns with 0 values to match training dataset schema\n",
|
||||||
|
"difference = list(set(training_columns) - set(encoded_df.columns.tolist()))\n",
|
||||||
|
"for col in difference:\n",
|
||||||
|
" encoded_df[col] = 0\n",
|
||||||
|
"encoded_df.head()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Serialize dataframe to list of row dictionaries\n",
|
||||||
|
"encoded_dict = encoded_df.to_dict('records')"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Submit Scoring Data to Service"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"%%time\n",
|
||||||
|
"\n",
|
||||||
|
"# Retrieve the API keys. AML generates two keys.\n",
|
||||||
|
"key1, key2 = aks_service.get_keys()\n",
|
||||||
|
"\n",
|
||||||
|
"total_count = len(scoring_df)\n",
|
||||||
|
"i = 0\n",
|
||||||
|
"load = []\n",
|
||||||
|
"for row in encoded_dict:\n",
|
||||||
|
" load.append(row)\n",
|
||||||
|
" i = i + 1\n",
|
||||||
|
" if i % 100 == 0:\n",
|
||||||
|
" payload = json.dumps({\"data\": load})\n",
|
||||||
|
" \n",
|
||||||
|
" # construct raw HTTP request and send to the service\n",
|
||||||
|
" payload_binary = bytes(payload,encoding = 'utf8')\n",
|
||||||
|
" headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
|
||||||
|
" resp = requests.post(aks_service.scoring_uri, payload_binary, headers=headers)\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"prediction:\", resp.content, \"Progress: {}/{}\".format(i, total_count)) \n",
|
||||||
|
"\n",
|
||||||
|
" load = []\n",
|
||||||
|
" time.sleep(3)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We need to wait up to 10 minutes for the Model Data Collector to dump the model input and inference data to storage in the Workspace, where it's used by the DataDriftDetector job."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"time.sleep(600)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Configure DataDrift"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"services = [service_name]\n",
|
||||||
|
"start = datetime.now() - timedelta(days=2)\n",
|
||||||
|
"end = datetime(year=2020, month=1, day=22, hour=15, minute=16)\n",
|
||||||
|
"feature_list = ['usaf', 'wban', 'latitude', 'longitude', 'station_name', 'p_k', 'sine_hourofday', 'cosine_hourofday', 'temperature-7']\n",
|
||||||
|
"alert_config = AlertConfiguration([email_address]) if email_address else None\n",
|
||||||
|
"\n",
|
||||||
|
"# create() raises an exception if a DataDrift object already exists; in that case, fall back to get()\n",
|
||||||
|
"try:\n",
|
||||||
|
" datadrift = DataDriftDetector.create(ws, model.name, model.version, services, frequency=\"Day\", alert_config=alert_config)\n",
|
||||||
|
"except KeyError:\n",
|
||||||
|
" datadrift = DataDriftDetector.get(ws, model.name, model.version)\n",
|
||||||
|
" \n",
|
||||||
|
"print(\"Details of DataDrift Object:\\n{}\".format(datadrift))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Run an Adhoc DataDriftDetector Run"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"target_date = datetime.today()\n",
|
||||||
|
"run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"exp = Experiment(ws, datadrift._id)\n",
|
||||||
|
"dd_run = Run(experiment=exp, run_id=run)\n",
|
||||||
|
"RunDetails(dd_run).show()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Get Drift Analysis Results"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"children = list(dd_run.get_children())\n",
|
||||||
|
"for child in children:\n",
|
||||||
|
" child.wait_for_completion()\n",
|
||||||
|
"\n",
|
||||||
|
"drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
|
||||||
|
"drift_metrics"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Show all drift figures, one per service.\n",
|
||||||
|
"# If with_details is False (the default), only drift is shown; if it is True, all details are shown.\n",
|
||||||
|
"\n",
|
||||||
|
"drift_figures = datadrift.show(with_details=True)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Enable DataDrift Schedule"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"datadrift.enable_schedule()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
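A note on the scoring loop in the notebook above: it posts a request only when `i % 100 == 0`, so if the number of rows is not a multiple of 100, the trailing rows are never sent to the service. The following is a minimal sketch of one way to batch the records so that the final partial batch is included. The helper names `iter_batches` and `build_payloads` are hypothetical and are not part of the notebook or the Azure ML SDK; only the `{"data": [...]}` payload shape is taken from the notebook.

```python
import json


def iter_batches(records, batch_size=100):
    """Yield consecutive slices of `records`, including the final partial one."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def build_payloads(records, batch_size=100):
    """Serialize each batch in the {"data": [...]} shape the scoring script reads."""
    return [json.dumps({"data": batch}) for batch in iter_batches(records, batch_size)]
```

In the notebook, each payload could then be posted with the existing `requests.post(aks_service.scoring_uri, ...)` call, replacing the manual counter and the `load` list.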
8
contrib/datadrift/azure-ml-datadrift.yml
Normal file
@@ -0,0 +1,8 @@
|
|||||||
|
name: azure-ml-datadrift
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-contrib-datadrift
|
||||||
|
- azureml-opendatasets
|
||||||
|
- lightgbm
|
||||||
|
- azureml-widgets
|
||||||
58
contrib/datadrift/score.py
Normal file
@@ -0,0 +1,58 @@
|
|||||||
|
import pickle
|
||||||
|
import json
|
||||||
|
import numpy
|
||||||
|
import azureml.train.automl
|
||||||
|
from sklearn.externals import joblib
|
||||||
|
from sklearn.linear_model import Ridge
|
||||||
|
from azureml.core.model import Model
|
||||||
|
from azureml.core.run import Run
|
||||||
|
from azureml.monitoring import ModelDataCollector
|
||||||
|
import time
|
||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
|
||||||
|
def init():
|
||||||
|
global model, inputs_dc, prediction_dc, feature_names, categorical_features
|
||||||
|
|
||||||
|
print("Model is initialized" + time.strftime("%H:%M:%S"))
|
||||||
|
model_path = Model.get_model_path(model_name="driftmodel")
|
||||||
|
model = joblib.load(model_path)
|
||||||
|
|
||||||
|
feature_names = ["usaf", "wban", "latitude", "longitude", "station_name", "p_k",
|
||||||
|
"sine_weekofyear", "cosine_weekofyear", "sine_hourofday", "cosine_hourofday",
|
||||||
|
"temperature-7"]
|
||||||
|
|
||||||
|
categorical_features = ["usaf", "wban", "p_k", "station_name"]
|
||||||
|
|
||||||
|
inputs_dc = ModelDataCollector(model_name="driftmodel",
|
||||||
|
identifier="inputs",
|
||||||
|
feature_names=feature_names)
|
||||||
|
|
||||||
|
prediction_dc = ModelDataCollector("driftmodel",
|
||||||
|
identifier="predictions",
|
||||||
|
feature_names=["temperature"])
|
||||||
|
|
||||||
|
|
||||||
|
def run(raw_data):
|
||||||
|
global inputs_dc, prediction_dc
|
||||||
|
|
||||||
|
try:
|
||||||
|
data = json.loads(raw_data)["data"]
|
||||||
|
data = pd.DataFrame(data)
|
||||||
|
|
||||||
|
# Remove the categorical features as the model expects OHE values
|
||||||
|
input_data = data.drop(categorical_features, axis=1)
|
||||||
|
|
||||||
|
result = model.predict(input_data)
|
||||||
|
|
||||||
|
# Collect the non-OHE dataframe
|
||||||
|
collected_df = data[feature_names]
|
||||||
|
|
||||||
|
inputs_dc.collect(collected_df.values)
|
||||||
|
prediction_dc.collect(result)
|
||||||
|
return result.tolist()
|
||||||
|
except Exception as e:
|
||||||
|
error = str(e)
|
||||||
|
|
||||||
|
print(error + time.strftime("%H:%M:%S"))
|
||||||
|
return error
|
||||||
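To make the data flow in `run()` easier to follow, here is a plain-Python sketch of the two column selections it performs: the categorical columns are dropped before prediction (the model expects the one-hot encoded columns), while the original, non-encoded feature values are what the `ModelDataCollector` receives. This is a stand-in for the pandas operations in `score.py`, not the author's implementation; the function name `split_rows` is hypothetical, and the two lists are copied from `init()` above.

```python
import json

# Copied from init() in score.py above.
FEATURE_NAMES = ["usaf", "wban", "latitude", "longitude", "station_name", "p_k",
                 "sine_weekofyear", "cosine_weekofyear", "sine_hourofday",
                 "cosine_hourofday", "temperature-7"]
CATEGORICAL_FEATURES = ["usaf", "wban", "p_k", "station_name"]


def split_rows(raw_data):
    """Return (model_inputs, collected) for a JSON payload of row dictionaries.

    model_inputs mirrors data.drop(categorical_features, axis=1);
    collected mirrors data[feature_names].values.
    """
    rows = json.loads(raw_data)["data"]
    model_inputs = [
        {key: value for key, value in row.items() if key not in CATEGORICAL_FEATURES}
        for row in rows
    ]
    # Raises KeyError if a row lacks one of the expected feature names.
    collected = [[row[name] for name in FEATURE_NAMES] for row in rows]
    return model_inputs, collected
```

One consequence of this design is that every request row must carry both the raw categorical values (for collection) and their one-hot encoded columns (for prediction), which is why the notebook joins the encoded columns onto the original scoring dataframe instead of replacing them.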
@@ -179,6 +179,26 @@ jupyter notebook
|
|||||||
- Simple example of using automated ML for classification with ONNX models
|
- Simple example of using automated ML for classification with ONNX models
|
||||||
- Uses local compute for training
|
- Uses local compute for training
|
||||||
|
|
||||||
|
- [auto-ml-bank-marketing-subscribers-with-deployment.ipynb](bank-marketing-subscribers-with-deployment/auto-ml-bank-marketing-with-deployment.ipynb)
|
||||||
|
- Dataset: UCI's [bank marketing dataset](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)
|
||||||
|
- Simple example of using automated ML for classification to predict term deposit subscriptions for a bank
|
||||||
|
- Uses azure compute for training
|
||||||
|
|
||||||
|
- [auto-ml-creditcard-with-deployment.ipynb](credit-card-fraud-detection-with-deployment/auto-ml-creditcard-with-deployment.ipynb)
|
||||||
|
- Dataset: Kaggle's [credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
|
||||||
|
- Simple example of using automated ML for classification to detect fraudulent credit card transactions
|
||||||
|
- Uses azure compute for training
|
||||||
|
|
||||||
|
- [auto-ml-hardware-performance-with-deployment.ipynb](hardware-performance-prediction-with-deployment/auto-ml-hardware-performance-with-deployment.ipynb)
|
||||||
|
- Dataset: UCI's [computer hardware dataset](https://archive.ics.uci.edu/ml/datasets/Computer+Hardware)
|
||||||
|
- Simple example of using automated ML for regression to predict the performance of certain combinations of hardware components
|
||||||
|
- Uses azure compute for training
|
||||||
|
|
||||||
|
- [auto-ml-concrete-strength-with-deployment.ipynb](predicting-concrete-strength-with-deployment/auto-ml-concrete-strength-with-deployment.ipynb)
|
||||||
|
- Dataset: UCI's [concrete compressive strength dataset](https://www.kaggle.com/pavanraj159/concrete-compressive-strength-data-set)
|
||||||
|
- Simple example of using automated ML for regression to predict the compressive strength of concrete based on different ingredient combinations and quantities
|
||||||
|
- Uses azure compute for training
|
||||||
|
|
||||||
<a name="documentation"></a>
|
<a name="documentation"></a>
|
||||||
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
|
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
|
||||||
|
|
||||||
|
|||||||
@@ -2,6 +2,7 @@ name: azure_automl
|
|||||||
dependencies:
|
dependencies:
|
||||||
# The python interpreter version.
|
# The python interpreter version.
|
||||||
# Currently Azure ML only supports 3.5.2 and later.
|
# Currently Azure ML only supports 3.5.2 and later.
|
||||||
|
- pip
|
||||||
- python>=3.5.2,<3.6.8
|
- python>=3.5.2,<3.6.8
|
||||||
- nb_conda
|
- nb_conda
|
||||||
- matplotlib==2.1.0
|
- matplotlib==2.1.0
|
||||||
|
|||||||
@@ -1,22 +0,0 @@
|
|||||||
name: azure_automl
|
|
||||||
dependencies:
|
|
||||||
# The python interpreter version.
|
|
||||||
# Currently Azure ML only supports 3.5.2 and later.
|
|
||||||
- nomkl
|
|
||||||
- python>=3.5.2,<3.6.8
|
|
||||||
- nb_conda
|
|
||||||
- matplotlib==2.1.0
|
|
||||||
- numpy>=1.11.0,<=1.16.2
|
|
||||||
- cython
|
|
||||||
- urllib3<1.24
|
|
||||||
- scipy>=1.0.0,<=1.1.0
|
|
||||||
- scikit-learn>=0.19.0,<=0.20.3
|
|
||||||
- pandas>=0.22.0,<0.23.0
|
|
||||||
- py-xgboost<=0.80
|
|
||||||
|
|
||||||
- pip:
|
|
||||||
# Required packages for AzureML execution, history, and data preparation.
|
|
||||||
- azureml-sdk[automl,explain]
|
|
||||||
- azureml-widgets
|
|
||||||
- pandas_ml
|
|
||||||
|
|
||||||
@@ -9,6 +9,8 @@ IF "%automl_env_file%"=="" SET automl_env_file="automl_env.yml"
|
|||||||
|
|
||||||
IF NOT EXIST %automl_env_file% GOTO YmlMissing
|
IF NOT EXIST %automl_env_file% GOTO YmlMissing
|
||||||
|
|
||||||
|
IF "%CONDA_EXE%"=="" GOTO CondaMissing
|
||||||
|
|
||||||
call conda activate %conda_env_name% 2>nul:
|
call conda activate %conda_env_name% 2>nul:
|
||||||
|
|
||||||
if not errorlevel 1 (
|
if not errorlevel 1 (
|
||||||
@@ -42,6 +44,15 @@ IF NOT "%options%"=="nolaunch" (
|
|||||||
|
|
||||||
goto End
|
goto End
|
||||||
|
|
||||||
|
:CondaMissing
|
||||||
|
echo Please run this script from an Anaconda Prompt window.
|
||||||
|
echo You can start an Anaconda Prompt window by
|
||||||
|
echo typing Anaconda Prompt on the Start menu.
|
||||||
|
echo If you don't see the Anaconda Prompt app, install Miniconda.
|
||||||
|
echo If you are running an older version of Miniconda or Anaconda,
|
||||||
|
echo you can upgrade using the command: conda update conda
|
||||||
|
goto End
|
||||||
|
|
||||||
:YmlMissing
|
:YmlMissing
|
||||||
echo File %automl_env_file% not found.
|
echo File %automl_env_file% not found.
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,729 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "v-rasav"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.7"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Automated Machine Learning\n",
|
||||||
|
"_**Classification with Deployment using a Bank Marketing Dataset**_\n",
|
||||||
|
"\n",
|
||||||
|
"## Contents\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
"1. [Train](#Train)\n",
|
||||||
|
"1. [Results](#Results)\n",
|
||||||
|
"1. [Deploy](#Deploy)\n",
|
||||||
|
"1. [Test](#Test)\n",
|
||||||
|
"1. [Acknowledgements](#Acknowledgements)"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Introduction\n",
|
||||||
|
"\n",
|
||||||
|
"In this example we use the UCI Bank Marketing dataset to showcase how you can use AutoML for a classification problem and deploy it to an Azure Container Instance (ACI). The classification goal is to predict if the client will subscribe to a term deposit with the bank.\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook you will learn how to:\n",
|
||||||
|
"1. Create an experiment using an existing workspace.\n",
|
||||||
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
|
"3. Train the model using remote compute (AmlCompute).\n",
|
||||||
|
"4. Explore the results.\n",
|
||||||
|
"5. Register the model.\n",
|
||||||
|
"6. Create a container image.\n",
|
||||||
|
"7. Create an Azure Container Instance (ACI) service.\n",
|
||||||
|
"8. Test the ACI service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"\n",
|
||||||
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"import logging\n",
|
||||||
|
"\n",
|
||||||
|
"from matplotlib import pyplot as plt\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import os\n",
|
||||||
|
"from sklearn import datasets\n",
|
||||||
|
"import azureml.dataprep as dprep\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
|
"from azureml.train.automl.run import AutoMLRun"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"\n",
|
||||||
|
"# choose a name for experiment\n",
|
||||||
|
"experiment_name = 'automl-classification-bmarketing'\n",
|
||||||
|
"# project folder\n",
|
||||||
|
"project_folder = './sample_projects/automl-classification-bankmarketing'\n",
|
||||||
|
"\n",
|
||||||
|
"experiment=Experiment(ws, experiment_name)\n",
|
||||||
|
"\n",
|
||||||
|
"output = {}\n",
|
||||||
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
|
"output['Workspace'] = ws.name\n",
|
||||||
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
|
"output['Location'] = ws.location\n",
|
||||||
|
"output['Project Directory'] = project_folder\n",
|
||||||
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
|
"outputDf.T"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create or Attach existing AmlCompute\n",
|
||||||
|
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||||
|
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
||||||
|
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||||
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for your cluster.\n",
|
||||||
|
"amlcompute_cluster_name = \"automlcl\"\n",
|
||||||
|
"\n",
|
||||||
|
"found = False\n",
|
||||||
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||||
|
" found = True\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||||
|
" \n",
|
||||||
|
"if not found:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
|
" max_nodes = 6)\n",
|
||||||
|
"\n",
|
||||||
|
" # Create the cluster.\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
|
" \n",
|
||||||
|
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
|
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
|
" \n",
|
||||||
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here we prepare to load the data for use on Azure compute. First, load the necessary libraries and dependencies, set up the paths for the data, and create the `conda_run_config`."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"if not os.path.isdir('data'):\n",
|
||||||
|
" os.mkdir('data')\n",
|
||||||
|
" \n",
|
||||||
|
"if not os.path.exists(project_folder):\n",
|
||||||
|
" os.makedirs(project_folder)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to AmlCompute\n",
|
||||||
|
"conda_run_config.target = compute_target\n",
|
||||||
|
"conda_run_config.environment.docker.enabled = True\n",
|
||||||
|
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
||||||
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Load Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here we load the bank marketing dataset for use on Azure compute and split it into `X_train` (the features) and `y_train` (the `y` label column). `X_train` and `y_train` are then used to train the model."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv\"\n",
|
||||||
|
"dflow = dprep.auto_read_file(data)\n",
|
||||||
|
"dflow.get_profile()\n",
|
||||||
|
"X_train = dflow.drop_columns(columns=['y'])\n",
|
||||||
|
"y_train = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
|
||||||
|
"dflow.head()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train\n",
|
||||||
|
"\n",
|
||||||
|
"Instantiate an `AutoMLConfig` object. This defines the settings and data used to run the experiment.\n",
|
||||||
|
"\n",
|
||||||
|
"|Property|Description|\n",
|
||||||
|
"|-|-|\n",
|
||||||
|
"|**task**|classification or regression|\n",
|
||||||
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||||
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||||
|
"\n",
|
||||||
|
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
" \"iteration_timeout_minutes\": 5,\n",
|
||||||
|
" \"iterations\": 10,\n",
|
||||||
|
" \"n_cross_validations\": 2,\n",
|
||||||
|
" \"primary_metric\": 'AUC_weighted',\n",
|
||||||
|
" \"preprocess\": True,\n",
|
||||||
|
" \"max_concurrent_iterations\": 5,\n",
|
||||||
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
|
" debug_log = 'automl_errors.log',\n",
|
||||||
|
" path = project_folder,\n",
|
||||||
|
" run_configuration=conda_run_config,\n",
|
||||||
|
" X = X_train,\n",
|
||||||
|
" y = y_train,\n",
|
||||||
|
" **automl_settings\n",
|
||||||
|
" )"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"remote_run = experiment.submit(automl_config, show_output = True)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"remote_run"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Results"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Widget for Monitoring Runs\n",
|
||||||
|
"\n",
|
||||||
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
|
"\n",
|
||||||
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"RunDetails(remote_run).show() "
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploy\n",
|
||||||
|
"\n",
|
||||||
|
"### Retrieve the Best Model\n",
|
||||||
|
"\n",
|
||||||
|
"Below we select the best pipeline from our iterations. The `get_output` method on `remote_run` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"best_run, fitted_model = remote_run.get_output()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Register the Fitted Model for Deployment\n",
|
||||||
|
"If neither `metric` nor `iteration` is specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"description = 'AutoML Model trained on bank marketing data to predict if a client will subscribe to a term deposit'\n",
|
||||||
|
"tags = None\n",
|
||||||
|
"model = remote_run.register_model(description = description, tags = tags)\n",
|
||||||
|
"\n",
|
||||||
|
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create Scoring Script\n",
|
||||||
|
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"%%writefile score.py\n",
|
||||||
|
"import pickle\n",
|
||||||
|
"import json\n",
|
||||||
|
"import numpy\n",
|
||||||
|
"import azureml.train.automl\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"def init():\n",
|
||||||
|
" global model\n",
|
||||||
|
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
||||||
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
|
" model = joblib.load(model_path)\n",
|
||||||
|
"\n",
|
||||||
|
"def run(rawdata):\n",
|
||||||
|
" try:\n",
|
||||||
|
" data = json.loads(rawdata)['data']\n",
|
||||||
|
" data = numpy.array(data)\n",
|
||||||
|
" result = model.predict(data)\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" result = str(e)\n",
|
||||||
|
" return json.dumps({\"error\": result})\n",
|
||||||
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a YAML File for the Environment"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||||
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
|
||||||
|
" pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Substitute the actual version number in the environment file.\n",
|
||||||
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||||
|
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
|
||||||
|
"\n",
|
||||||
|
"# Substitute the actual model id in the script file.\n",
|
||||||
|
"\n",
|
||||||
|
"script_file_name = 'score.py'\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a Container Image\n",
|
||||||
|
"\n",
|
||||||
|
"Next, build the container image. Azure Container Instances (ACI) is suited to deploying a model as a web service when you want to quickly deploy and validate it, or when you are testing a model that is under development."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
|
"\n",
|
||||||
|
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||||
|
" execution_script = script_file_name,\n",
|
||||||
|
" conda_file = conda_env_file_name,\n",
|
||||||
|
" tags = {'area': \"bmData\", 'type': \"automl_classification\"},\n",
|
||||||
|
" description = \"Image for automl classification sample\")\n",
|
||||||
|
"\n",
|
||||||
|
"image = Image.create(name = \"automlsampleimage\",\n",
|
||||||
|
" # this is the model object \n",
|
||||||
|
" models = [model],\n",
|
||||||
|
" image_config = image_config, \n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"\n",
|
||||||
|
"image.wait_for_creation(show_output = True)\n",
|
||||||
|
"\n",
|
||||||
|
"if image.creation_state == 'Failed':\n",
|
||||||
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy the Image as a Web Service on Azure Container Instance\n",
|
||||||
|
"\n",
|
||||||
|
"Deploy an image that contains the model and other assets needed by the service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
|
" memory_gb = 1, \n",
|
||||||
|
" tags = {'area': \"bmData\", 'type': \"automl_classification\"}, \n",
|
||||||
|
" description = 'sample service for Automl Classification')"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"aci_service_name = 'automl-sample-bankmarketing'\n",
|
||||||
|
"print(aci_service_name)\n",
|
||||||
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
|
" image = image,\n",
|
||||||
|
" name = aci_service_name,\n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
|
"print(aci_service.state)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Delete a Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Deletes the specified web service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#aci_service.delete()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Get Logs from a Deployed Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Gets logs from a deployed web service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#aci_service.get_logs()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Test\n",
|
||||||
|
"\n",
|
||||||
|
"Now that the model is trained, split the test data into features and labels in the same way as the training data (the difference is that here the data is split locally), then run the test data through the trained model to get the predicted values."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Imports for the test step (the bank marketing validation data is loaded in the next cell).\n",
|
||||||
|
"from sklearn.datasets import load_diabetes\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"from numpy import array"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv\"\n",
|
||||||
|
"dflow = dprep.auto_read_file(data)\n",
|
||||||
|
"dflow.get_profile()\n",
|
||||||
|
"X_test = dflow.drop_columns(columns=['y'])\n",
|
||||||
|
"y_test = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
|
||||||
|
"dflow.head()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"X_test = X_test.to_pandas_dataframe()\n",
|
||||||
|
"y_test = y_test.to_pandas_dataframe()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"y_pred = fitted_model.predict(X_test)\n",
|
||||||
|
"actual = array(y_test)\n",
|
||||||
|
"actual = actual[:,0]\n",
|
||||||
|
"print(y_pred.shape, \" \", actual.shape)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Calculate metrics for the prediction\n",
|
||||||
|
"\n",
|
||||||
|
"Now visualize the results on a scatter plot that compares the truth (actual) values with the values predicted by the trained model."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"%matplotlib notebook\n",
|
||||||
|
"test_pred = plt.scatter(actual, y_pred, color='b')\n",
|
||||||
|
"test_test = plt.scatter(actual, actual, color='g')\n",
|
||||||
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
|
"plt.show()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
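The section above is titled "Calculate metrics for the prediction", but the cell only draws a scatter plot. A minimal sketch of computing accuracy, precision and recall from two label sequences follows. The label values and the helper name `confusion_counts` are assumptions made for illustration; in the notebook the inputs would be `actual` and `y_pred`.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive label."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

# Hypothetical labels standing in for `actual` and `y_pred`.
actual_labels = ["yes", "no", "no", "yes", "no", "no"]
predicted_labels = ["yes", "no", "yes", "no", "no", "no"]

counts = confusion_counts(actual_labels, predicted_labels, positive="yes")
accuracy = (counts["tp"] + counts["tn"]) / len(actual_labels)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
print(dict(counts), accuracy, precision, recall)
```

For an imbalanced target, precision and recall are usually more informative than accuracy alone, which is one reason to report all three.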
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Acknowledgements"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"This Bank Marketing dataset is made available under the Creative Commons (CC0: Public Domain) License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: https://creativecommons.org/publicdomain/zero/1.0/. The dataset is available at: https://www.kaggle.com/janiobachmann/bank-marketing-dataset .\n",
|
||||||
|
"\n",
|
||||||
|
"_**Acknowledgements**_\n",
|
||||||
|
"This data set is originally available within the UCI Machine Learning Database: https://archive.ics.uci.edu/ml/datasets/bank+marketing\n",
|
||||||
|
"\n",
|
||||||
|
"[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-classification-bank-marketing
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -0,0 +1,712 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "v-rasav"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.7"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Automated Machine Learning\n",
|
||||||
|
"_**Classification with Deployment using Credit Card Dataset**_\n",
|
||||||
|
"\n",
|
||||||
|
"## Contents\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
"1. [Train](#Train)\n",
|
||||||
|
"1. [Results](#Results)\n",
|
||||||
|
"1. [Deploy](#Deploy)\n",
|
||||||
|
"1. [Test](#Test)\n",
|
||||||
|
"1. [Acknowledgements](#Acknowledgements)"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Introduction\n",
|
||||||
|
"\n",
|
||||||
|
"In this example we use the associated credit card dataset to showcase how you can use AutoML for a simple classification problem and deploy it to an Azure Container Instance (ACI). The classification goal is to predict whether a credit card transaction is a fraudulent charge.\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook you will learn how to:\n",
|
||||||
|
"1. Create an experiment using an existing workspace.\n",
|
||||||
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
|
"3. Train the model using remote compute (AmlCompute).\n",
|
||||||
|
"4. Explore the results.\n",
|
||||||
|
"5. Register the model.\n",
|
||||||
|
"6. Create a container image.\n",
|
||||||
|
"7. Create an Azure Container Instance (ACI) service.\n",
|
||||||
|
"8. Test the ACI service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"\n",
|
||||||
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"import logging\n",
|
||||||
|
"\n",
|
||||||
|
"from matplotlib import pyplot as plt\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import os\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"import azureml.dataprep as dprep\n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
|
"from azureml.train.automl.run import AutoMLRun"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"\n",
|
||||||
|
"# choose a name for experiment\n",
|
||||||
|
"experiment_name = 'automl-classification-ccard'\n",
|
||||||
|
"# project folder\n",
|
||||||
|
"project_folder = './sample_projects/automl-classification-creditcard'\n",
|
||||||
|
"\n",
|
||||||
|
"experiment=Experiment(ws, experiment_name)\n",
|
||||||
|
"\n",
|
||||||
|
"output = {}\n",
|
||||||
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
|
"output['Workspace'] = ws.name\n",
|
||||||
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
|
"output['Location'] = ws.location\n",
|
||||||
|
"output['Project Directory'] = project_folder\n",
|
||||||
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
|
"outputDf.T"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create or Attach existing AmlCompute\n",
|
||||||
|
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||||
|
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
||||||
|
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||||
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for your cluster.\n",
|
||||||
|
"amlcompute_cluster_name = \"automlcl\"\n",
|
||||||
|
"\n",
|
||||||
|
"found = False\n",
|
||||||
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||||
|
" found = True\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||||
|
" \n",
|
||||||
|
"if not found:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
|
" max_nodes = 6)\n",
|
||||||
|
"\n",
|
||||||
|
" # Create the cluster.\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
|
" \n",
|
||||||
|
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
|
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
|
" \n",
|
||||||
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Data\n",
|
||||||
|
"\n",
|
||||||
|
"Prepare the local folders and the run configuration used on Azure compute. First create the `data` and project folders, then load the necessary libraries and dependencies to create the `conda_run_config`."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"if not os.path.isdir('data'):\n",
|
||||||
|
" os.mkdir('data')\n",
|
||||||
|
" \n",
|
||||||
|
"if not os.path.exists(project_folder):\n",
|
||||||
|
" os.makedirs(project_folder)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to AmlCompute\n",
|
||||||
|
"conda_run_config.target = compute_target\n",
|
||||||
|
"conda_run_config.environment.docker.enabled = True\n",
|
||||||
|
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
||||||
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Load Data\n",
|
||||||
|
"\n",
|
||||||
|
"Load the credit card dataset with `dprep.auto_read_file`, store the `Class` column in `y` and the remaining columns in `X`. Next split both with `random_split`, using the same percentage and seed, to obtain `X_train` and `y_train` for training the model."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
|
||||||
|
"dflow = dprep.auto_read_file(data)\n",
|
||||||
|
"dflow.get_profile()\n",
|
||||||
|
"X = dflow.drop_columns(columns=['Class'])\n",
|
||||||
|
"y = dflow.keep_columns(columns=['Class'], validate_column_exists=True)\n",
|
||||||
|
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
|
||||||
|
"y_train, y_test = y.random_split(percentage=0.8, seed=223)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
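The cell above splits `X` and `y` separately but passes the same `percentage` and `seed` to both calls. The sketch below illustrates, in plain Python, why a shared seed keeps features and labels aligned. It assumes a split that decides row by row from a seeded generator; this is an illustration of the idea, not a description of how `dprep` implements `random_split`, and `seeded_split` is a hypothetical helper.

```python
import random

def seeded_split(rows, percentage, seed):
    """Send each row to the first or second part using a seeded generator."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < percentage else second).append(row)
    return first, second

features = [[i, i * 2] for i in range(10)]  # hypothetical feature rows
labels = [i % 2 for i in range(10)]         # hypothetical labels, same row order

X_tr, X_te = seeded_split(features, 0.8, seed=223)
y_tr, y_te = seeded_split(labels, 0.8, seed=223)

# Same seed and same row order give the same row-by-row decisions,
# so row k of X_tr still belongs with row k of y_tr.
print(len(X_tr), len(y_tr), len(X_te), len(y_te))
```

With different seeds, or with inputs in a different row order, the two splits would no longer correspond row for row.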
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train\n",
|
||||||
|
"\n",
|
||||||
|
"Instantiate an `AutoMLConfig` object. This defines the settings and data used to run the experiment.\n",
|
||||||
|
"\n",
|
||||||
|
"|Property|Description|\n",
|
||||||
|
"|-|-|\n",
|
||||||
|
"|**task**|classification or regression|\n",
|
||||||
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||||
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||||
|
"\n",
|
||||||
|
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### If you would like to see even better results, increase \"iteration_timeout_minutes\" to 10 or more and increase \"iterations\" to at least 30"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
" \"iteration_timeout_minutes\": 5,\n",
|
||||||
|
" \"iterations\": 10,\n",
|
||||||
|
" \"n_cross_validations\": 2,\n",
|
||||||
|
" \"primary_metric\": 'average_precision_score_weighted',\n",
|
||||||
|
" \"preprocess\": True,\n",
|
||||||
|
" \"max_concurrent_iterations\": 5,\n",
|
||||||
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
|
" debug_log = 'automl_errors_20190417.log',\n",
|
||||||
|
" path = project_folder,\n",
|
||||||
|
" run_configuration=conda_run_config,\n",
|
||||||
|
" X = X_train,\n",
|
||||||
|
" y = y_train,\n",
|
||||||
|
" **automl_settings\n",
|
||||||
|
" )"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
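The cell above collects the tunable options in `automl_settings` and expands them into `AutoMLConfig` with `**`. The sketch below shows that pattern in isolation, and how the longer run suggested earlier could be expressed as an override of the same dictionary. `build_config` is a hypothetical stand-in used only to make the example runnable; it is not part of the SDK.

```python
def build_config(task, path, **settings):
    """Hypothetical stand-in for a config constructor: merges explicit and unpacked keyword arguments."""
    config = {"task": task, "path": path}
    config.update(settings)
    return config

settings = {
    "iteration_timeout_minutes": 5,
    "iterations": 10,
    "n_cross_validations": 2,
}

# Quick run: the dictionary is expanded into keyword arguments.
quick = build_config(task="classification", path="./project", **settings)

# Longer run: copy the settings and override two of them.
longer_settings = {**settings, "iteration_timeout_minutes": 10, "iterations": 30}
longer = build_config(task="classification", path="./project", **longer_settings)

print(quick["iterations"], longer["iterations"])
```

Keeping the settings in one dictionary makes it easy to reuse them across runs while leaving the data and path arguments explicit.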
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Call the `submit` method on the experiment object and pass the `AutoMLConfig` object. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"remote_run = experiment.submit(automl_config, show_output = True)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"remote_run"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Results"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Widget for Monitoring Runs\n",
|
||||||
|
"\n",
|
||||||
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
|
"\n",
|
||||||
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"RunDetails(remote_run).show() "
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploy\n",
|
||||||
|
"\n",
|
||||||
|
"### Retrieve the Best Model\n",
|
||||||
|
"\n",
|
||||||
|
"Below we select the best pipeline from our iterations. The `get_output` method on `remote_run` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"best_run, fitted_model = remote_run.get_output()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Register the Fitted Model for Deployment\n",
|
||||||
|
"If neither `metric` nor `iteration` is specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"description = 'AutoML Model'\n",
|
||||||
|
"tags = None\n",
|
||||||
|
"model = remote_run.register_model(description = description, tags = tags)\n",
|
||||||
|
"\n",
|
||||||
|
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create Scoring Script\n",
|
||||||
|
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"%%writefile score.py\n",
|
||||||
|
"import pickle\n",
|
||||||
|
"import json\n",
|
||||||
|
"import numpy\n",
|
||||||
|
"import azureml.train.automl\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"def init():\n",
|
||||||
|
" global model\n",
|
||||||
|
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
||||||
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
|
" model = joblib.load(model_path)\n",
|
||||||
|
"\n",
|
||||||
|
"def run(rawdata):\n",
|
||||||
|
" try:\n",
|
||||||
|
" data = json.loads(rawdata)['data']\n",
|
||||||
|
" data = numpy.array(data)\n",
|
||||||
|
" result = model.predict(data)\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" result = str(e)\n",
|
||||||
|
" return json.dumps({\"error\": result})\n",
|
||||||
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
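The scoring script above accepts a JSON string with a `data` field and returns a JSON string holding either `result` or `error`. A minimal local sketch of that request/response contract follows, so the payload shape can be checked before building an image. `StubModel` is a hypothetical stand-in for the fitted model, and plain lists are used in place of `numpy` arrays to keep the sketch dependency-free.

```python
import json

class StubModel:
    """Hypothetical stand-in for the fitted model: predicts 1 when the row sum is positive."""
    def predict(self, rows):
        return [1 if sum(row) > 0 else 0 for row in rows]

model = StubModel()

def run(rawdata):
    # Same contract as score.py: read the 'data' field, predict, and
    # return either a 'result' list or an 'error' string as JSON.
    try:
        data = json.loads(rawdata)['data']
        result = model.predict(data)
    except Exception as e:
        return json.dumps({"error": str(e)})
    return json.dumps({"result": list(result)})

# A well-formed request, and one that is missing the 'data' field.
ok_response = json.loads(run(json.dumps({"data": [[0.5, 1.0], [-2.0, 0.5]]})))
bad_response = json.loads(run(json.dumps({"rows": []})))
print(ok_response)
print(bad_response)
```

Because errors are returned in the response body rather than raised, a caller has to check for the `error` key instead of relying on a failure status alone.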
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a YAML File for the Environment"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to match those of the environment that trained the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||||
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
|
||||||
|
" pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Substitute the actual version number in the environment file.\n",
|
||||||
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||||
|
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
|
||||||
|
"\n",
|
||||||
|
"# Substitute the actual model id in the script file.\n",
|
||||||
|
"\n",
|
||||||
|
"script_file_name = 'score.py'\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
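The cell above performs the same read, replace, write sequence twice: once for the environment file and once for the `<<modelid>>` placeholder in `score.py`. The sketch below factors that step into a small helper and exercises it on a temporary file. The helper name `substitute_in_file` and the model id are hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path

def substitute_in_file(path, placeholder, value):
    """Replace every occurrence of placeholder in the file at path with value."""
    content = Path(path).read_text()
    Path(path).write_text(content.replace(placeholder, value))

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "score.py"
    script.write_text("model_path = Model.get_model_path(model_name = '<<modelid>>')\n")

    # Hypothetical model id; in the notebook this would be remote_run.model_id.
    substitute_in_file(script, "<<modelid>>", "AutoML0123456789best")
    patched = script.read_text()

print(patched)
```

Note that `str.replace` is a no-op when the placeholder is absent, so running the substitution a second time leaves the file unchanged.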
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a Container Image\n",
|
||||||
|
"\n",
|
||||||
|
"Next, build the container image. Azure Container Instances (ACI) is suited to deploying a model as a web service when you want to quickly deploy and validate it, or when you are testing a model that is under development."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
|
"\n",
|
||||||
|
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||||
|
" execution_script = script_file_name,\n",
|
||||||
|
" conda_file = conda_env_file_name,\n",
|
||||||
|
" tags = {'area': \"cards\", 'type': \"automl_classification\"},\n",
|
||||||
|
" description = \"Image for automl classification sample\")\n",
|
||||||
|
"\n",
|
||||||
|
"image = Image.create(name = \"automlsampleimage\",\n",
|
||||||
|
" # this is the model object \n",
|
||||||
|
" models = [model],\n",
|
||||||
|
" image_config = image_config, \n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"\n",
|
||||||
|
"image.wait_for_creation(show_output = True)\n",
|
||||||
|
"\n",
|
||||||
|
"if image.creation_state == 'Failed':\n",
|
||||||
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy the Image as a Web Service on Azure Container Instance\n",
|
||||||
|
"\n",
|
||||||
|
"Deploy an image that contains the model and other assets needed by the service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
|
" memory_gb = 1, \n",
|
||||||
|
" tags = {'area': \"cards\", 'type': \"automl_classification\"}, \n",
|
||||||
|
" description = 'sample service for Automl Classification')"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"aci_service_name = 'automl-sample-creditcard'\n",
|
||||||
|
"print(aci_service_name)\n",
|
||||||
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
|
" image = image,\n",
|
||||||
|
" name = aci_service_name,\n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
|
"print(aci_service.state)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Delete a Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Deletes the specified web service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#aci_service.delete()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Get Logs from a Deployed Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Gets logs from a deployed web service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#aci_service.get_logs()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Test\n",
|
||||||
|
"\n",
|
||||||
|
"Now that the model is trained, split the data the same way it was split for training (the difference here is that the data is split locally), then run the test data through the trained model to get the predicted values."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#Randomly select and test\n",
|
||||||
|
"X_test = X_test.to_pandas_dataframe()\n",
|
||||||
|
"y_test = y_test.to_pandas_dataframe()\n"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"y_pred = fitted_model.predict(X_test)\n",
|
||||||
|
"y_pred"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Calculate metrics for the prediction\n",
|
||||||
|
"\n",
|
||||||
|
"Now visualize the data on a scatter plot to compare the truth (actual) values with the predicted values\n",
|
||||||
|
"from the trained model that was returned."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#Randomly select and test\n",
|
||||||
|
"# Plot outputs\n",
|
||||||
|
"%matplotlib notebook\n",
|
||||||
|
"test_pred = plt.scatter(y_test, y_pred, color='b')\n",
|
||||||
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
|
"plt.show()\n",
|
||||||
|
"\n"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Acknowledgements"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"This Credit Card Fraud Detection dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/. The dataset is available at: https://www.kaggle.com/mlg-ulb/creditcardfraud\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Universit\u00e9 Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on https://www.researchgate.net/project/Fraud-detection-5 and the page of the DefeatFraud project\n",
|
||||||
|
"Please cite the following works: \n",
|
||||||
|
"\u2022\tAndrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015\n",
|
||||||
|
"\u2022\tDal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon\n",
|
||||||
|
"\u2022\tDal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE\n",
|
||||||
|
"o\tDal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)\n",
|
||||||
|
"\u2022\tCarcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-A\u00ebl; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier\n",
|
||||||
|
"\u2022\tCarcillo, Fabrizio; Le Borgne, Yann-A\u00ebl; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
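The "Calculate metrics for the prediction" cell in the notebook above only draws a scatter plot of predictions against truth values. As a minimal sketch, assuming binary labels where 1 marks a fraudulent transaction, the counts behind precision and recall could be computed as shown here. The helper name `binary_classification_metrics` and the sample labels are hypothetical and do not come from the notebook.

```python
from typing import Dict, Sequence


def binary_classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Hypothetical helper: confusion counts, precision and recall for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged as fraud
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud that was missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate correctly passed

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Illustrative labels only; these are not results from the notebook.
sample_truth = [0, 0, 1, 1, 0, 1]
sample_pred = [0, 1, 1, 0, 0, 1]
metrics = binary_classification_metrics(sample_truth, sample_pred)
print(metrics)
```

For a heavily imbalanced dataset such as credit card fraud, precision and recall on the positive class are usually more informative than plain accuracy, which is why the sketch reports them instead.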
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-classification-credit-card-fraud
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -29,10 +53,10 @@
|
|||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Deploy](#Deploy)\n",
|
"1. [Deploy](#Deploy)\n",
|
||||||
"1. [Test](#Test)"
|
"1. [Test](#Test)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -50,22 +74,22 @@
|
|||||||
"6. Create a container image.\n",
|
"6. Create a container image.\n",
|
||||||
"7. Create an Azure Container Instance (ACI) service.\n",
|
"7. Create an Azure Container Instance (ACI) service.\n",
|
||||||
"8. Test the ACI service."
|
"8. Test the ACI service."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
@@ -80,13 +104,13 @@
|
|||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
"from azureml.train.automl.run import AutoMLRun"
|
"from azureml.train.automl.run import AutoMLRun"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -108,10 +132,10 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -128,13 +152,13 @@
|
|||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_train = digits.data[10:,:]\n",
|
"X_train = digits.data[10:,:]\n",
|
||||||
@@ -150,36 +174,36 @@
|
|||||||
" X = X_train, \n",
|
" X = X_train, \n",
|
||||||
" y = y_train,\n",
|
" y = y_train,\n",
|
||||||
" path = project_folder)"
|
" path = project_folder)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Deploy\n",
|
"## Deploy\n",
|
||||||
@@ -187,50 +211,50 @@
|
|||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()"
|
"best_run, fitted_model = local_run.get_output()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Register the Fitted Model for Deployment\n",
|
"### Register the Fitted Model for Deployment\n",
|
||||||
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
|
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"description = 'AutoML Model'\n",
|
"description = 'AutoML Model'\n",
|
||||||
"tags = None\n",
|
"tags = None\n",
|
||||||
"model = local_run.register_model(description = description, tags = tags)\n",
|
"model = local_run.register_model(description = description, tags = tags)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(local_run.model_id) # This will be written to the script file later in the notebook."
|
"print(local_run.model_id) # This will be written to the script file later in the notebook."
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create Scoring Script"
|
"### Create Scoring Script"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile score.py\n",
|
"%%writefile score.py\n",
|
||||||
"import pickle\n",
|
"import pickle\n",
|
||||||
@@ -256,56 +280,56 @@
|
|||||||
" result = str(e)\n",
|
" result = str(e)\n",
|
||||||
" return json.dumps({\"error\": result})\n",
|
" return json.dumps({\"error\": result})\n",
|
||||||
" return json.dumps({\"result\":result.tolist()})"
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create a YAML File for the Environment"
|
"### Create a YAML File for the Environment"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. The following cells create a file, myenv.yml, which specifies the dependencies from the run."
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. The following cells create a file, myenv.yml, which specifies the dependencies from the run."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)"
|
"ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"dependencies = ml_run.get_run_sdk_dependencies(iteration = 7)"
|
"dependencies = ml_run.get_run_sdk_dependencies(iteration = 7)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||||
" print('{}\\t{}'.format(p, dependencies[p]))"
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -314,13 +338,13 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"conda_env_file_name = 'myenv.yml'\n",
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
"myenv.save_to_file('.', conda_env_file_name)"
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Substitute the actual version number in the environment file.\n",
|
"# Substitute the actual version number in the environment file.\n",
|
||||||
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||||
@@ -341,20 +365,20 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"with open(script_file_name, 'w') as cefw:\n",
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
" cefw.write(content.replace('<<modelid>>', local_run.model_id))"
|
" cefw.write(content.replace('<<modelid>>', local_run.model_id))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create a Container Image"
|
"### Create a Container Image"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.image import Image, ContainerImage\n",
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -374,20 +398,20 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"if image.creation_state == 'Failed':\n",
|
"if image.creation_state == 'Failed':\n",
|
||||||
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Deploy the Image as a Web Service on Azure Container Instance"
|
"### Deploy the Image as a Web Service on Azure Container Instance"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -395,13 +419,13 @@
|
|||||||
" memory_gb = 1, \n",
|
" memory_gb = 1, \n",
|
||||||
" tags = {'area': \"digits\", 'type': \"automl_classification\"}, \n",
|
" tags = {'area': \"digits\", 'type': \"automl_classification\"}, \n",
|
||||||
" description = 'sample service for Automl Classification')"
|
" description = 'sample service for Automl Classification')"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -413,52 +437,52 @@
|
|||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
"print(aci_service.state)"
|
"print(aci_service.state)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Delete a Web Service"
|
"### Delete a Web Service"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"#aci_service.delete()"
|
"#aci_service.delete()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Get Logs from a Deployed Web Service"
|
"### Get Logs from a Deployed Web Service"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"#aci_service.get_logs()"
|
"#aci_service.get_logs()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test"
|
"## Test"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"#Randomly select digits and test\n",
|
"#Randomly select digits and test\n",
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
@@ -478,33 +502,9 @@
|
|||||||
" ax1.set_title(title)\n",
|
" ax1.set_title(title)\n",
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" plt.show()"
|
" plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
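Both notebooks above rewrite `score.py` by reading the file, replacing the `<<modelid>>` placeholder, and writing the result back. A minimal sketch of that substitution as a reusable function follows. The helper name `substitute_placeholder`, the temporary file, and the value `example-model-id` are assumptions made for illustration and are not part of the notebooks.

```python
import tempfile
from pathlib import Path


def substitute_placeholder(path: Path, placeholder: str, value: str) -> int:
    """Hypothetical helper: replace every occurrence of `placeholder` in a text file.

    Returns the number of occurrences replaced and raises if none were found.
    """
    content = path.read_text(encoding="utf-8")
    occurrences = content.count(placeholder)
    if occurrences == 0:
        raise ValueError(f"placeholder {placeholder!r} not found in {path}")
    path.write_text(content.replace(placeholder, value), encoding="utf-8")
    return occurrences


# Demonstrate on a throwaway file so no real scoring script is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "score.py"
    script_path.write_text("model_name = '<<modelid>>'\n", encoding="utf-8")
    replaced = substitute_placeholder(script_path, "<<modelid>>", "example-model-id")
    updated_script = script_path.read_text(encoding="utf-8")

print(replaced, updated_script)
```

Raising when the placeholder is missing is a design choice for this sketch: it surfaces the case where the substitution cell is run a second time, after the placeholder has already been replaced, which a plain `str.replace` would silently ignore.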
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-classification-with-deployment
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -31,10 +55,10 @@
|
|||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"1. [Test](#Test)\n",
|
"1. [Test](#Test)\n",
|
||||||
"\n"
|
"\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -50,22 +74,22 @@
|
|||||||
"2. Configure AutoML using `AutoMLConfig`.\n",
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
"3. Train the model using local compute with ONNX compatible config on.\n",
|
"3. Train the model using local compute with ONNX compatible config on.\n",
|
||||||
"4. Explore the results and save the ONNX model."
|
"4. Explore the results and save the ONNX model."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -79,13 +103,13 @@
|
|||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig, constants"
|
"from azureml.train.automl import AutoMLConfig, constants"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -106,22 +130,22 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data\n",
|
"## Data\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This uses scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) method."
|
"This uses scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) method."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iris = datasets.load_iris()\n",
|
"iris = datasets.load_iris()\n",
|
||||||
"X_train, X_test, y_train, y_test = train_test_split(iris.data, \n",
|
"X_train, X_test, y_train, y_test = train_test_split(iris.data, \n",
|
||||||
@@ -129,15 +153,31 @@
|
|||||||
" test_size=0.2, \n",
|
" test_size=0.2, \n",
|
||||||
" random_state=0)\n",
|
" random_state=0)\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Ensure the x_train and x_test are pandas DataFrame."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
"# Convert the X_train and X_test to pandas DataFrame and set column names,\n",
|
"# Convert the X_train and X_test to pandas DataFrame and set column names,\n",
|
||||||
"# This is needed for initializing the input variable names of ONNX model, \n",
|
"# This is needed for initializing the input variable names of ONNX model, \n",
|
||||||
"# and the prediction with the ONNX model using the inference helper.\n",
|
"# and the prediction with the ONNX model using the inference helper.\n",
|
||||||
"X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n",
|
"X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n",
|
||||||
"X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])"
|
"X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train with enable ONNX compatible models config on\n",
|
"## Train with enable ONNX compatible models config on\n",
|
||||||
@@ -156,13 +196,20 @@
|
|||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
"|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|\n",
|
"|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Set the preprocess=True, currently the InferenceHelper only supports this mode."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
@@ -175,43 +222,43 @@
|
|||||||
" preprocess=True,\n",
|
" preprocess=True,\n",
|
||||||
" enable_onnx_compatible_models=True,\n",
|
" enable_onnx_compatible_models=True,\n",
|
||||||
" path = project_folder)"
|
" path = project_folder)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results"
|
"## Results"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for Monitoring Runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
@@ -219,20 +266,20 @@
|
|||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(local_run).show() "
|
"RunDetails(local_run).show() "
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best ONNX Model\n",
|
"### Retrieve the Best ONNX Model\n",
|
||||||
@@ -240,47 +287,47 @@
|
|||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n",
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Set the parameter return_onnx_model=True to retrieve the best ONNX model, instead of the Python model."
|
"Set the parameter return_onnx_model=True to retrieve the best ONNX model, instead of the Python model."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, onnx_mdl = local_run.get_output(return_onnx_model=True)"
|
"best_run, onnx_mdl = local_run.get_output(return_onnx_model=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Save the best ONNX model"
|
"### Save the best ONNX model"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.automl.core.onnx_convert import OnnxConverter\n",
|
"from azureml.automl.core.onnx_convert import OnnxConverter\n",
|
||||||
"onnx_fl_path = \"./best_model.onnx\"\n",
|
"onnx_fl_path = \"./best_model.onnx\"\n",
|
||||||
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
|
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Predict with the ONNX model, using onnxruntime package"
|
"### Predict with the ONNX model, using onnxruntime package"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import sys\n",
|
"import sys\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
@@ -299,7 +346,7 @@
|
|||||||
" onnxrt_present = False\n",
|
" onnxrt_present = False\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def get_onnx_res(run):\n",
|
"def get_onnx_res(run):\n",
|
||||||
" res_path = '_debug_y_trans_converter.json'\n",
|
" res_path = 'onnx_resource.json'\n",
|
||||||
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
|
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
|
||||||
" with open(res_path) as f:\n",
|
" with open(res_path) as f:\n",
|
||||||
" onnx_res = json.load(f)\n",
|
" onnx_res = json.load(f)\n",
|
||||||
@@ -316,43 +363,19 @@
|
|||||||
" print(pred_prob_onnx)\n",
|
" print(pred_prob_onnx)\n",
|
||||||
"else:\n",
|
"else:\n",
|
||||||
" if not python_version_compatible:\n",
|
" if not python_version_compatible:\n",
|
||||||
" print('Please use Python version 3.6 to run the inference helper.') \n",
|
" print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n",
|
||||||
" if not onnxrt_present:\n",
|
" if not onnxrt_present:\n",
|
||||||
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
|
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": []
|
"execution_count": null,
|
||||||
|
"source": [],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
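The last hunk of the ONNX notebook widens the inference-helper message from Python 3.6 to "3.6 or 3.7". The code that sets `python_version_compatible` and `onnxrt_present` lies outside the lines shown in this comparison, so the following is only a sketch of what such a guard could look like. The helper names `is_supported_python` and `is_package_present`, and the version bounds, are assumptions taken from the message text, not from the SDK.

```python
import importlib.util
import sys

# Assumption: the supported versions come from the printed message
# ("3.6 or 3.7"), not from any SDK constant.
SUPPORTED_MINORS = {(3, 6), (3, 7)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the supported versions."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


def is_package_present(name):
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


python_version_compatible = is_supported_python()
onnxrt_present = is_package_present("onnxruntime")

if not python_version_compatible:
    print("Please use Python version 3.6 or 3.7 to run the inference helper.")
if not onnxrt_present:
    print("Please install the onnxruntime package to do the prediction with ONNX model.")
```

Checking for the package with `find_spec` avoids importing it just to test availability; a `try`/`except ImportError` around the import would work equally well.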
@@ -0,0 +1,9 @@
|
|||||||
|
name: auto-ml-classification-with-onnx
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
|
- onnxruntime
|
||||||
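Most changed lines in the notebook hunks of this comparison appear to move keys around without changing values: `cell_type` moves from the top of each cell to the bottom, and the top-level `metadata` block moves from the end of the file to the start. One way to confirm that a hunk is a pure reordering is to compare the parsed JSON rather than the text. The two cell strings below are hand-written stand-ins modelled on those hunks, not the full notebook files.

```python
import json

# Hand-written stand-ins for one notebook cell before and after the change.
old_cell = '''
{
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
  "source": ["local_run"]
}
'''
new_cell = '''
{
  "metadata": {},
  "outputs": [],
  "execution_count": null,
  "source": ["local_run"],
  "cell_type": "code"
}
'''


def same_json(a, b):
    """Compare two JSON documents by value; object key order is ignored."""
    return json.loads(a) == json.loads(b)


def canonical(text):
    """Re-serialize with sorted keys so a text diff shows only real changes."""
    return json.dumps(json.loads(text), sort_keys=True, indent=1)


print(same_json(old_cell, new_cell))
print(canonical(old_cell) == canonical(new_cell))
```

Running both versions of a notebook through something like `canonical` before diffing would leave only the substantive edits visible, such as the renamed `res_path` and the updated Python version message.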
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -30,10 +54,10 @@
|
|||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"1. [Test](#Test)"
|
"1. [Test](#Test)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -50,22 +74,22 @@
|
|||||||
"3. Train the model on whitelisted models using local compute. \n",
|
"3. Train the model on whitelisted models using local compute. \n",
|
||||||
"4. Explore the results.\n",
|
"4. Explore the results.\n",
|
||||||
"5. Test the best fitted model."
|
"5. Test the best fitted model."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"#Note: This notebook will install tensorflow if not already installed in the environment.\n",
|
"#Note: This notebook will install tensorflow if not already installed in the environment.\n",
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
@@ -90,13 +114,13 @@
|
|||||||
" whitelist_models=[\"TensorFlowLinearClassifier\", \"TensorFlowDNN\"]\n",
|
" whitelist_models=[\"TensorFlowLinearClassifier\", \"TensorFlowDNN\"]\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -117,32 +141,32 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data\n",
|
"## Data\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
|
"This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Exclude the first 100 rows from training so that they can be used for test.\n",
|
"# Exclude the first 100 rows from training so that they can be used for test.\n",
|
||||||
"X_train = digits.data[100:,:]\n",
|
"X_train = digits.data[100:,:]\n",
|
||||||
"y_train = digits.target[100:]"
|
"y_train = digits.target[100:]"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -160,13 +184,13 @@
|
|||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||||
"|**whitelist_models**|List of models that AutoML should use. The possible values are listed [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings).|"
|
"|**whitelist_models**|List of models that AutoML should use. The possible values are listed [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings).|"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
@@ -179,43 +203,43 @@
|
|||||||
" enable_tf=True,\n",
|
" enable_tf=True,\n",
|
||||||
" whitelist_models=whitelist_models,\n",
|
" whitelist_models=whitelist_models,\n",
|
||||||
" path = project_folder)"
|
" path = project_folder)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results"
|
"## Results"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for Monitoring Runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
@@ -223,32 +247,32 @@
|
|||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(local_run).show() "
|
"RunDetails(local_run).show() "
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"children = list(local_run.get_children())\n",
|
"children = list(local_run.get_children())\n",
|
||||||
"metricslist = {}\n",
|
"metricslist = {}\n",
|
||||||
@@ -259,102 +283,102 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
"rundata"
|
"rundata"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run and the model that has the smallest `log_loss` value:"
|
"Show the run and the model that has the smallest `log_loss` value:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"lookup_metric = \"log_loss\"\n",
|
"lookup_metric = \"log_loss\"\n",
|
||||||
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a Specific Iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and the model from the third iteration:"
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 3\n",
|
"iteration = 3\n",
|
||||||
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
||||||
"print(third_run)\n",
|
"print(third_run)\n",
|
||||||
"print(third_model)"
|
"print(third_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test\n",
|
"## Test\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data"
|
"#### Load Test Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_test = digits.data[:10, :]\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
"y_test = digits.target[:10]\n",
|
"y_test = digits.target[:10]\n",
|
||||||
"images = digits.images[:10]"
|
"images = digits.images[:10]"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing Our Best Fitted Model\n",
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"We will try to predict 2 digits and see how our model works."
|
"We will try to predict 2 digits and see how our model works."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Randomly select digits and test.\n",
|
"# Randomly select digits and test.\n",
|
||||||
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
@@ -367,33 +391,9 @@
|
|||||||
" ax1.set_title(title)\n",
|
" ax1.set_title(title)\n",
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" plt.show()"
|
" plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-classification-with-whitelisting
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -31,10 +55,10 @@
|
|||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"1. [Test](#Test)\n",
|
"1. [Test](#Test)\n",
|
||||||
"\n"
|
"\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -49,22 +73,22 @@
|
|||||||
"3. Train the model using local compute.\n",
|
"3. Train the model using local compute.\n",
|
||||||
"4. Explore the results.\n",
|
"4. Explore the results.\n",
|
||||||
"5. Test the best fitted model."
|
"5. Test the best fitted model."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -77,10 +101,10 @@
|
|||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Accessing the Azure ML workspace requires authentication with Azure.\n",
|
"Accessing the Azure ML workspace requires authentication with Azure.\n",
|
||||||
@@ -103,13 +127,13 @@
|
|||||||
"ws = Workspace.from_config(auth = auth)\n",
|
"ws = Workspace.from_config(auth = auth)\n",
|
||||||
"```\n",
|
"```\n",
|
||||||
"For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth)"
|
"For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -130,32 +154,32 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data\n",
|
"## Data\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
|
"This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Exclude the first 100 rows from training so that they can be used for test.\n",
|
"# Exclude the first 100 rows from training so that they can be used for test.\n",
|
||||||
"X_train = digits.data[100:,:]\n",
|
"X_train = digits.data[100:,:]\n",
|
||||||
"y_train = digits.target[100:]"
|
"y_train = digits.target[100:]"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -177,75 +201,75 @@
|
|||||||
"* If you specify neither the `iterations` nor the `experiment_timeout_minutes`, automated ML keeps running iterations while it continues to see improvements in the scores.\n",
|
"* If you specify neither the `iterations` nor the `experiment_timeout_minutes`, automated ML keeps running iterations while it continues to see improvements in the scores.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The following example doesn't specify `iterations` or `experiment_timeout_minutes` and so runs until the scores stop improving.\n"
|
"The following example doesn't specify `iterations` or `experiment_timeout_minutes` and so runs until the scores stop improving.\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" primary_metric = 'AUC_weighted',\n",
|
" primary_metric = 'AUC_weighted',\n",
|
||||||
" X = X_train, \n",
|
" X = X_train, \n",
|
||||||
" y = y_train,\n",
|
" y = y_train,\n",
|
||||||
" n_cross_validations = 3)"
|
" n_cross_validations = 3)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Optionally, you can continue an interrupted local run by calling `continue_experiment` without the `iterations` parameter, or run more iterations for a completed run by specifying the `iterations` parameter:"
|
"Optionally, you can continue an interrupted local run by calling `continue_experiment` without the `iterations` parameter, or run more iterations for a completed run by specifying the `iterations` parameter:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = local_run.continue_experiment(X = X_train, \n",
|
"local_run = local_run.continue_experiment(X = X_train, \n",
|
||||||
" y = y_train, \n",
|
" y = y_train, \n",
|
||||||
" show_output = True,\n",
|
" show_output = True,\n",
|
||||||
" iterations = 5)"
|
" iterations = 5)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results"
|
"## Results"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for Monitoring Runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
@@ -253,32 +277,32 @@
|
|||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(local_run).show() "
|
"RunDetails(local_run).show() "
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"children = list(local_run.get_children())\n",
|
"children = list(local_run.get_children())\n",
|
||||||
"metricslist = {}\n",
|
"metricslist = {}\n",
|
||||||
@@ -289,41 +313,41 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
"rundata"
|
"rundata"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"print(best_run)"
|
"print(best_run)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Print the properties of the model\n",
|
"#### Print the properties of the model\n",
|
||||||
"The fitted_model is a Python object and you can read the different properties of the object.\n",
|
"The fitted_model is a Python object and you can read the different properties of the object.\n",
|
||||||
"The following shows printing hyperparameters for each step in the pipeline."
|
"The following shows printing hyperparameters for each step in the pipeline."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from pprint import pprint\n",
|
"from pprint import pprint\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -346,98 +370,98 @@
|
|||||||
" print()\n",
|
" print()\n",
|
||||||
" \n",
|
" \n",
|
||||||
"print_model(fitted_model)"
|
"print_model(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run and the model that has the smallest `log_loss` value:"
|
"Show the run and the model that has the smallest `log_loss` value:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"lookup_metric = \"log_loss\"\n",
|
"lookup_metric = \"log_loss\"\n",
|
||||||
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
||||||
"print(best_run)"
|
"print(best_run)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print_model(fitted_model)"
|
"print_model(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a Specific Iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and the model from the third iteration:"
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 3\n",
|
"iteration = 3\n",
|
||||||
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
||||||
"print(third_run)"
|
"print(third_run)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print_model(third_model)"
|
"print_model(third_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test \n",
|
"## Test \n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data"
|
"#### Load Test Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_test = digits.data[:10, :]\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
"y_test = digits.target[:10]\n",
|
"y_test = digits.target[:10]\n",
|
||||||
"images = digits.images[:10]"
|
"images = digits.images[:10]"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing Our Best Fitted Model\n",
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"We will try to predict 2 digits and see how our model works."
|
"We will try to predict 2 digits and see how our model works."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Randomly select digits and test.\n",
|
"# Randomly select digits and test.\n",
|
||||||
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
@@ -450,33 +474,9 @@
|
|||||||
" ax1.set_title(title)\n",
|
" ax1.set_title(title)\n",
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" plt.show()"
|
" plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-classification
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -1,27 +1,51 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.5"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
"_**Prepare Data using `azureml.dataprep` for Remote Execution (DSVM)**_\n",
|
"_**Prepare Data using `azureml.dataprep` for Remote Execution (AmlCompute)**_\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## Contents\n",
|
"## Contents\n",
|
||||||
"1. [Introduction](#Introduction)\n",
|
"1. [Introduction](#Introduction)\n",
|
||||||
@@ -30,10 +54,10 @@
|
|||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"1. [Test](#Test)"
|
"1. [Test](#Test)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -45,29 +69,29 @@
|
|||||||
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
|
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
|
||||||
"2. Pass the `Dataflow` to AutoML for a local run.\n",
|
"2. Pass the `Dataflow` to AutoML for a local run.\n",
|
||||||
"3. Pass the `Dataflow` to AutoML for a remote run."
|
"3. Pass the `Dataflow` to AutoML for a remote run."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more Linux distributions."
|
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more Linux distributions."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"import time\n",
|
"import time\n",
|
||||||
@@ -80,13 +104,13 @@
|
|||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"import azureml.dataprep as dprep\n",
|
"import azureml.dataprep as dprep\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
" \n",
|
" \n",
|
||||||
@@ -108,20 +132,20 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data"
|
"## Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
|
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
|
||||||
"# The data referenced here is a 1 MB simple random sample of the Chicago Crime data.\n",
|
"# The data referenced here is a 1 MB simple random sample of the Chicago Crime data.\n",
|
||||||
@@ -130,21 +154,21 @@
|
|||||||
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
|
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
|
||||||
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
|
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
|
||||||
"dflow.get_profile()"
|
"dflow.get_profile()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Because `Primary Type` is our y data, we need to drop the rows where this column is null.\n",
|
"# Because `Primary Type` is our y data, we need to drop the rows where this column is null.\n",
|
||||||
"dflow = dflow.drop_nulls('Primary Type')\n",
|
"dflow = dflow.drop_nulls('Primary Type')\n",
|
||||||
"dflow.head(5)"
|
"dflow.head(5)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Review the Data Preparation Result\n",
|
"### Review the Data Preparation Result\n",
|
||||||
@@ -152,32 +176,32 @@
|
|||||||
"You can peek at the result of a `Dataflow` over any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the `Dataflow`, which makes it fast even against large datasets.\n",
|
"You can peek at the result of a `Dataflow` over any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the `Dataflow`, which makes it fast even against large datasets.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
|
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
|
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
|
||||||
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
|
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This creates a general AutoML settings object applicable for both local and remote runs."
|
"This creates a general AutoML settings object applicable for both local and remote runs."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"iteration_timeout_minutes\" : 10,\n",
|
" \"iteration_timeout_minutes\" : 10,\n",
|
||||||
@@ -186,20 +210,20 @@
|
|||||||
" \"preprocess\" : True,\n",
|
" \"preprocess\" : True,\n",
|
||||||
" \"verbosity\" : logging.INFO\n",
|
" \"verbosity\" : logging.INFO\n",
|
||||||
"}"
|
"}"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create or Attach an AmlCompute cluster"
|
"### Create or Attach an AmlCompute cluster"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.compute import AmlCompute\n",
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
"from azureml.core.compute import ComputeTarget\n",
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
@@ -231,13 +255,13 @@
|
|||||||
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" # For a more detailed view of current AmlCompute status, use get_status()."
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.runconfig import RunConfiguration\n",
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
@@ -252,22 +276,22 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
||||||
"conda_run_config.environment.python.conda_dependencies = cd"
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Pass Data with `Dataflow` Objects\n",
|
"### Pass Data with `Dataflow` Objects\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The `Dataflow` objects captured above can also be passed to the `submit` method for a remote run. AutoML will serialize the `Dataflow` object and send it to the remote compute target. The `Dataflow` will not be evaluated locally."
|
"The `Dataflow` objects captured above can also be passed to the `submit` method for a remote run. AutoML will serialize the `Dataflow` object and send it to the remote compute target. The `Dataflow` will not be evaluated locally."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
@@ -276,73 +300,73 @@
|
|||||||
" X = X,\n",
|
" X = X,\n",
|
||||||
" y = y,\n",
|
" y = y,\n",
|
||||||
" **automl_settings)"
|
" **automl_settings)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"remote_run = experiment.submit(automl_config, show_output = True)"
|
"remote_run = experiment.submit(automl_config, show_output = True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"remote_run"
|
"remote_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Pre-process cache cleanup\n",
|
"### Pre-process cache cleanup\n",
|
||||||
"The preprocessed data is cached in the user's default file store. When the run is complete, the cache can be cleaned by running the cell below."
|
"The preprocessed data is cached in the user's default file store. When the run is complete, the cache can be cleaned by running the cell below."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"remote_run.clean_preprocessor_cache()"
|
"remote_run.clean_preprocessor_cache()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Cancelling Runs\n",
|
"### Cancelling Runs\n",
|
||||||
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
|
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
|
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
|
||||||
"# remote_run.cancel()\n",
|
"# remote_run.cancel()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Cancel iteration 1 and move on to iteration 2.\n",
|
"# Cancel iteration 1 and move on to iteration 2.\n",
|
||||||
"# remote_run.cancel_iteration(1)"
|
"# remote_run.cancel_iteration(1)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results"
|
"## Results"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for Monitoring Runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
@@ -350,31 +374,31 @@
|
|||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(remote_run).show()"
|
"RunDetails(remote_run).show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"children = list(remote_run.get_children())\n",
|
"children = list(remote_run.get_children())\n",
|
||||||
"metricslist = {}\n",
|
"metricslist = {}\n",
|
||||||
@@ -385,101 +409,101 @@
|
|||||||
" \n",
|
" \n",
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
"rundata"
|
"rundata"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = remote_run.get_output()\n",
|
"best_run, fitted_model = remote_run.get_output()\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run and the model that has the smallest `log_loss` value:"
|
"Show the run and the model that has the smallest `log_loss` value:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"lookup_metric = \"log_loss\"\n",
|
"lookup_metric = \"log_loss\"\n",
|
||||||
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a Specific Iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and the model from the first iteration:"
|
"Show the run and the model from the first iteration:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 0\n",
|
"iteration = 0\n",
|
||||||
"best_run, fitted_model = remote_run.get_output(iteration = iteration)\n",
|
"best_run, fitted_model = remote_run.get_output(iteration = iteration)\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test\n",
|
"## Test\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data\n",
|
"#### Load Test Data\n",
|
||||||
"The test data should go through the same preparation steps as the training data. Otherwise, it might fail at the preprocessing step."
|
"The test data should go through the same preparation steps as the training data. Otherwise, it might fail at the preprocessing step."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
|
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
|
||||||
"dflow_test = dflow_test.drop_nulls('Primary Type')"
|
"dflow_test = dflow_test.drop_nulls('Primary Type')"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing Our Best Fitted Model\n",
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"We will use a confusion matrix to see how our model performs."
|
"We will use a confusion matrix to see how our model performs."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from pandas_ml import ConfusionMatrix\n",
|
"from pandas_ml import ConfusionMatrix\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -494,33 +518,9 @@
|
|||||||
"print(cm)\n",
|
"print(cm)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"cm.plot()"
|
"cm.plot()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.5"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-dataprep-remote-execution
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.5"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -30,10 +54,10 @@
|
|||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"1. [Test](#Test)"
|
"1. [Test](#Test)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -45,29 +69,29 @@
|
|||||||
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
|
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
|
||||||
"2. Pass the `Dataflow` to AutoML for a local run.\n",
|
"2. Pass the `Dataflow` to AutoML for a local run.\n",
|
||||||
"3. Pass the `Dataflow` to AutoML for a remote run."
|
"3. Pass the `Dataflow` to AutoML for a remote run."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more Linux distros."
|
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more Linux distros."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -78,13 +102,13 @@
|
|||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"import azureml.dataprep as dprep\n",
|
"import azureml.dataprep as dprep\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
" \n",
|
" \n",
|
||||||
@@ -106,20 +130,20 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data"
|
"## Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
|
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
|
||||||
"# The data referenced here is a 1 MB simple random sample of the Chicago Crime data.\n",
|
"# The data referenced here is a 1 MB simple random sample of the Chicago Crime data.\n",
|
||||||
@@ -128,21 +152,21 @@
|
|||||||
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
|
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
|
||||||
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
|
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
|
||||||
"dflow.get_profile()"
|
"dflow.get_profile()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Because `Primary Type` is our y data, we need to drop the rows where this column is null.\n",
|
"# Because `Primary Type` is our y data, we need to drop the rows where this column is null.\n",
|
||||||
"dflow = dflow.drop_nulls('Primary Type')\n",
|
"dflow = dflow.drop_nulls('Primary Type')\n",
|
||||||
"dflow.head(5)"
|
"dflow.head(5)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Review the Data Preparation Result\n",
|
"### Review the Data Preparation Result\n",
|
||||||
@@ -150,32 +174,32 @@
|
|||||||
"You can peek at the result of a `Dataflow` over any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the `Dataflow`, which makes it fast even against large datasets.\n",
|
"You can peek at the result of a `Dataflow` over any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the `Dataflow`, which makes it fast even against large datasets.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
|
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
|
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
|
||||||
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
|
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This creates a general AutoML settings object applicable for both local and remote runs."
|
"This creates a general AutoML settings object applicable for both local and remote runs."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"iteration_timeout_minutes\" : 10,\n",
|
" \"iteration_timeout_minutes\" : 10,\n",
|
||||||
@@ -184,57 +208,57 @@
|
|||||||
" \"preprocess\" : True,\n",
|
" \"preprocess\" : True,\n",
|
||||||
" \"verbosity\" : logging.INFO\n",
|
" \"verbosity\" : logging.INFO\n",
|
||||||
"}"
|
"}"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Pass Data with `Dataflow` Objects\n",
|
"### Pass Data with `Dataflow` Objects\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The `Dataflow` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `Dataflow` for model training."
|
"The `Dataflow` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `Dataflow` for model training."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
" X = X,\n",
|
" X = X,\n",
|
||||||
" y = y,\n",
|
" y = y,\n",
|
||||||
" **automl_settings)"
|
" **automl_settings)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results"
|
"## Results"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for Monitoring Runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
@@ -242,31 +266,31 @@
|
|||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(local_run).show()"
|
"RunDetails(local_run).show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"children = list(local_run.get_children())\n",
|
"children = list(local_run.get_children())\n",
|
||||||
"metricslist = {}\n",
|
"metricslist = {}\n",
|
||||||
@@ -277,101 +301,101 @@
|
|||||||
" \n",
|
" \n",
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
|
||||||
"rundata"
|
"rundata"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run and the model that has the smallest `log_loss` value:"
|
"Show the run and the model that has the smallest `log_loss` value:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"lookup_metric = \"log_loss\"\n",
|
"lookup_metric = \"log_loss\"\n",
|
||||||
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a Specific Iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and the model from the first iteration:"
|
"Show the run and the model from the first iteration:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 0\n",
|
"iteration = 0\n",
|
||||||
"best_run, fitted_model = local_run.get_output(iteration = iteration)\n",
|
"best_run, fitted_model = local_run.get_output(iteration = iteration)\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test\n",
|
"## Test\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data\n",
|
"#### Load Test Data\n",
|
||||||
"The test data should go through the same preparation steps as the training data. Otherwise, scoring may fail at the preprocessing step."
|
"The test data should go through the same preparation steps as the training data. Otherwise, scoring may fail at the preprocessing step."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
|
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
|
||||||
"dflow_test = dflow_test.drop_nulls('Primary Type')"
|
"dflow_test = dflow_test.drop_nulls('Primary Type')"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing Our Best Fitted Model\n",
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"We will use a confusion matrix to see how well our model performs."
|
"We will use a confusion matrix to see how well our model performs."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from pandas_ml import ConfusionMatrix\n",
|
"from pandas_ml import ConfusionMatrix\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -385,33 +409,9 @@
|
|||||||
"print(cm)\n",
|
"print(cm)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"cm.plot()"
|
"cm.plot()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.5"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-dataprep
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -29,10 +53,10 @@
|
|||||||
"1. [Explore](#Explore)\n",
|
"1. [Explore](#Explore)\n",
|
||||||
"1. [Download](#Download)\n",
|
"1. [Download](#Download)\n",
|
||||||
"1. [Register](#Register)"
|
"1. [Register](#Register)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -45,20 +69,20 @@
|
|||||||
"2. List all AutoML runs in an experiment.\n",
|
"2. List all AutoML runs in an experiment.\n",
|
||||||
"3. Get details for an AutoML run, including settings, run widget, and all metrics.\n",
|
"3. Get details for an AutoML run, including settings, run widget, and all metrics.\n",
|
||||||
"4. Download a fitted pipeline for any iteration."
|
"4. Download a fitted pipeline for any iteration."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup"
|
"## Setup"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
@@ -66,36 +90,36 @@
|
|||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl.run import AutoMLRun"
|
"from azureml.train.automl.run import AutoMLRun"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()"
|
"ws = Workspace.from_config()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Explore"
|
"## Explore"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### List Experiments"
|
"### List Experiments"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"experiment_list = Experiment.list(workspace=ws)\n",
|
"experiment_list = Experiment.list(workspace=ws)\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -106,21 +130,21 @@
|
|||||||
" \n",
|
" \n",
|
||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"summary_df.T"
|
"summary_df.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### List runs for an experiment\n",
|
"### List runs for an experiment\n",
|
||||||
"Set `experiment_name` to any experiment name from the result of the Experiment.list cell to load the AutoML runs."
|
"Set `experiment_name` to any experiment name from the result of the Experiment.list cell to load the AutoML runs."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"experiment_name = 'automl-local-classification' # Replace this with any experiment name from the previous cell.\n",
|
"experiment_name = 'automl-local-classification' # Replace this with any experiment name from the previous cell.\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -146,22 +170,22 @@
|
|||||||
"from IPython.display import display\n",
|
"from IPython.display import display\n",
|
||||||
"display(projname_html)\n",
|
"display(projname_html)\n",
|
||||||
"display(summary_df.T)"
|
"display(summary_df.T)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Get details for a run\n",
|
"### Get details for a run\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Copy the project name and run id from the previous cell output to find more details on a particular run."
|
"Copy the project name and run id from the previous cell output to find more details on a particular run."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"run_id = automl_runs_project[0] # Replace with your own run_id from above run ids\n",
|
"run_id = automl_runs_project[0] # Replace with your own run_id from above run ids\n",
|
||||||
"assert (run_id in summary_df.keys()), \"Run id not found! Please set run id to a value from above run ids\"\n",
|
"assert (run_id in summary_df.keys()), \"Run id not found! Please set run id to a value from above run ids\"\n",
|
||||||
@@ -207,143 +231,119 @@
|
|||||||
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
|
||||||
"display(HTML('<h3>Metrics</h3>'))\n",
|
"display(HTML('<h3>Metrics</h3>'))\n",
|
||||||
"display(rundata)\n"
|
"display(rundata)\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Download"
|
"## Download"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Download the Best Model for Any Given Metric"
|
"### Download the Best Model for Any Given Metric"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
||||||
"best_run, fitted_model = ml_run.get_output(metric = metric)\n",
|
"best_run, fitted_model = ml_run.get_output(metric = metric)\n",
|
||||||
"fitted_model"
|
"fitted_model"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Download the Model for Any Given Iteration"
|
"### Download the Model for Any Given Iteration"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 1 # Replace with an iteration number.\n",
|
"iteration = 1 # Replace with an iteration number.\n",
|
||||||
"best_run, fitted_model = ml_run.get_output(iteration = iteration)\n",
|
"best_run, fitted_model = ml_run.get_output(iteration = iteration)\n",
|
||||||
"fitted_model"
|
"fitted_model"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Register"
|
"## Register"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Register fitted model for deployment\n",
|
"### Register fitted model for deployment\n",
|
||||||
"If neither `metric` nor `iteration` is specified in the `register_model` call, the iteration with the best primary metric is registered."
|
"If neither `metric` nor `iteration` is specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"description = 'AutoML Model'\n",
|
"description = 'AutoML Model'\n",
|
||||||
"tags = None\n",
|
"tags = None\n",
|
||||||
"ml_run.register_model(description = description, tags = tags)\n",
|
"ml_run.register_model(description = description, tags = tags)\n",
|
||||||
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Register the Best Model for Any Given Metric"
|
"### Register the Best Model for Any Given Metric"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
||||||
"description = 'AutoML Model'\n",
|
"description = 'AutoML Model'\n",
|
||||||
"tags = None\n",
|
"tags = None\n",
|
||||||
"ml_run.register_model(description = description, tags = tags, metric = metric)\n",
|
"ml_run.register_model(description = description, tags = tags, metric = metric)\n",
|
||||||
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Register the Model for Any Given Iteration"
|
"### Register the Model for Any Given Iteration"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 1 # Replace with an iteration number.\n",
|
"iteration = 1 # Replace with an iteration number.\n",
|
||||||
"description = 'AutoML Model'\n",
|
"description = 'AutoML Model'\n",
|
||||||
"tags = None\n",
|
"tags = None\n",
|
||||||
"ml_run.register_model(description = description, tags = tags, iteration = iteration)\n",
|
"ml_run.register_model(description = description, tags = tags, iteration = iteration)\n",
|
||||||
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-exploring-previous-runs
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "xiaga@microsoft.com, tosingli@microsoft.com, erwright@microsoft.com"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.8"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -29,69 +53,69 @@
|
|||||||
"1. [Data](#Data)\n",
|
"1. [Data](#Data)\n",
|
||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Evaluate](#Evaluate)"
|
"1. [Evaluate](#Evaluate)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
"In this example, we show how AutoML can be used for bike share forecasting.\n",
|
"This notebook demonstrates demand forecasting for a bike-sharing service using AutoML.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The purpose is to demonstrate how to take advantage of the built-in holiday featurization, access the feature names, and further demonstrate how to work with the `forecast` function. Please also look at the additional forecasting notebooks, which document lagging, rolling windows, forecast quantiles, other ways to use the forecast function, and forecaster deployment.\n",
|
"AutoML highlights here include built-in holiday featurization, accessing engineered feature names, and working with the `forecast` function. Please also look at the additional forecasting notebooks, which document lagging, rolling windows, forecast quantiles, other ways to use the forecast function, and forecaster deployment.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"Notebook synopsis:\n",
|
||||||
"1. Creating an Experiment in an existing Workspace\n",
|
"1. Creating an Experiment in an existing Workspace\n",
|
||||||
"2. Instantiating AutoMLConfig with new task type \"forecasting\" for timeseries data training, and other timeseries related settings: for this dataset we use the basic one: \"time_column_name\" \n",
|
"2. Configuration and local run of AutoML for a time-series model with lag and holiday features \n",
|
||||||
"3. Training the Model using local compute\n",
|
"3. Viewing the engineered names for featurized data and featurization summary for all raw features\n",
|
||||||
"4. Exploring the results\n",
|
"4. Evaluating the fitted model using a rolling test "
|
||||||
"5. Viewing the engineered names for featurized data and featurization summary for all raw features\n",
|
],
|
||||||
"6. Testing the fitted model"
|
"cell_type": "markdown"
|
||||||
]
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n"
|
"## Setup\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"import warnings\n",
|
"import warnings\n",
|
||||||
|
"\n",
|
||||||
|
"from pandas.tseries.frequencies import to_offset\n",
|
||||||
|
"\n",
|
||||||
"# Squash warning messages for cleaner output in the notebook\n",
|
"# Squash warning messages for cleaner output in the notebook\n",
|
||||||
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
"from matplotlib import pyplot as plt\n",
|
"from matplotlib import pyplot as plt\n",
|
||||||
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
|
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -113,30 +137,31 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data\n",
|
"## Data\n",
|
||||||
"Read bike share demand data from file, and preview data."
|
"Read bike share demand data from file, and preview data."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"data = pd.read_csv('bike-no.csv', parse_dates=['date'])"
|
"data = pd.read_csv('bike-no.csv', parse_dates=['date'])\n",
|
||||||
]
|
"data.head()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Let's set up what we know abou the dataset. \n",
|
"Let's set up what we know about the dataset. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Target column** is what we want to forecast.\n",
|
"**Target column** is what we want to forecast.\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -145,33 +170,33 @@
|
|||||||
"**Grain** is another word for an individual time series in your dataset. Grains are identified by values of the columns listed `grain_column_names`, for example \"store\" and \"item\" if your data has multiple time series of sales, one series for each combination of store and item sold.\n",
|
"**Grain** is another word for an individual time series in your dataset. Grains are identified by values of the columns listed `grain_column_names`, for example \"store\" and \"item\" if your data has multiple time series of sales, one series for each combination of store and item sold.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This dataset has only one time series. Please see the [orange juice notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales) for an example of a multi-time series dataset."
|
"This dataset has only one time series. Please see the [orange juice notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales) for an example of a multi-time series dataset."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"target_column_name = 'cnt'\n",
|
"target_column_name = 'cnt'\n",
|
||||||
"time_column_name = 'date'\n",
|
"time_column_name = 'date'\n",
|
||||||
"grain_column_names = []"
|
"grain_column_names = []"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Split the data\n",
|
"## Split the data\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The first split we make is into train and test sets. Note we are splitting on time."
|
"The first split we make is into train and test sets. Note we are splitting on time."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"train = data[data[time_column_name] < '2012-09-01']\n",
|
"train = data[data[time_column_name] < '2012-09-01']\n",
|
||||||
"test = data[data[time_column_name] >= '2012-09-01']\n",
|
"test = data[data[time_column_name] >= '2012-09-01']\n",
|
||||||
@@ -186,32 +211,28 @@
|
|||||||
"print(y_train.shape)\n",
|
"print(y_train.shape)\n",
|
||||||
"print(X_test.shape)\n",
|
"print(X_test.shape)\n",
|
||||||
"print(y_test.shape)"
|
"print(y_test.shape)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Setting forecaster maximum horizon \n",
|
"### Setting forecaster maximum horizon \n",
|
||||||
"\n",
|
"\n",
|
||||||
"Assuming your test data forms a full and regular time series(regular time intervals and no holes), \n",
|
"The forecast horizon is the number of periods into the future that the model should predict. Here, we set the horizon to 14 periods (i.e. 14 days). Notice that this is much shorter than the number of days in the test set; we will need to use a rolling test to evaluate the performance on the whole test set. For more discussion of forecast horizons and guiding principles for setting them, please see the [energy demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand). "
|
||||||
"the maximum horizon you will need to forecast is the length of the longest grain in your test set."
|
],
|
||||||
]
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"if len(grain_column_names) == 0:\n",
|
"max_horizon = 14"
|
||||||
" max_horizon = len(X_test)\n",
|
],
|
||||||
"else:\n",
|
"cell_type": "code"
|
||||||
" max_horizon = X_test.groupby(grain_column_names)[time_column_name].count().max()"
|
|
||||||
]
|
|
||||||
},
|
},
|
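The horizon setting above means the test set has to be covered in several 14-day windows. A minimal sketch of that arithmetic follows, assuming the test set is daily and runs from the split date through 2012-12-31; the end date is an assumption for illustration and is not shown in this diff.

```python
import math
import pandas as pd

max_horizon = 14  # same value as in the notebook cell

# Assumed test period: daily data from the split date to the end of 2012.
test_dates = pd.date_range('2012-09-01', '2012-12-31', freq='D')

# Number of rolling windows needed to cover every test date.
n_windows = math.ceil(len(test_dates) / max_horizon)

# Forecast origins: the first date of each window.
origins = test_dates[::max_horizon]

print(len(test_dates), n_windows, origins[-1].date())
```

Under that assumption the final window is shorter than the horizon, which is why a rolling loop has to tolerate a partial last window.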
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -229,107 +250,106 @@
|
|||||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"|**country_or_region**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (e.g. 'US', 'GB').|\n",
|
"|**country_or_region**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (e.g. 'US', 'GB').|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"time_column_name = 'date'\n",
|
|
||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"time_column_name\": time_column_name,\n",
|
" 'time_column_name': time_column_name,\n",
|
||||||
" # these columns are a breakdown of the total and therefore a leak\n",
|
" 'max_horizon': max_horizon,\n",
|
||||||
" \"drop_column_names\": ['casual', 'registered'],\n",
|
|
||||||
" # knowing the country/region allows Automated ML to bring in holidays\n",
|
" # knowing the country/region allows Automated ML to bring in holidays\n",
|
||||||
" \"country_or_region\" : 'US',\n",
|
" 'country_or_region': 'US',\n",
|
||||||
" \"max_horizon\" : max_horizon,\n",
|
" 'target_lags': 1,\n",
|
||||||
" \"target_lags\": 1 \n",
|
" # these columns are a breakdown of the total and therefore a leak\n",
|
||||||
|
" 'drop_column_names': ['casual', 'registered']\n",
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"automl_config = AutoMLConfig(task = 'forecasting', \n",
|
"automl_config = AutoMLConfig(task='forecasting', \n",
|
||||||
" primary_metric='normalized_root_mean_squared_error',\n",
|
" primary_metric='normalized_root_mean_squared_error',\n",
|
||||||
" iterations = 10,\n",
|
" iterations=10,\n",
|
||||||
" iteration_timeout_minutes = 5,\n",
|
" iteration_timeout_minutes=5,\n",
|
||||||
" X = X_train,\n",
|
" X=X_train,\n",
|
||||||
" y = y_train,\n",
|
" y=y_train,\n",
|
||||||
" n_cross_validations = 3, \n",
|
" n_cross_validations=3, \n",
|
||||||
" path=project_folder,\n",
|
" path=project_folder,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" verbosity=logging.INFO,\n",
|
||||||
" **automl_settings)"
|
" **automl_settings)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We will now run the experiment, starting with 10 iterations of model search. Experiment can be continued for more iterations if the results are not yet good. You will see the currently running iterations printing to the console."
|
"We will now run the experiment, starting with 10 iterations of model search. The experiment can be continued for more iterations if more accurate results are required. You will see the currently running iterations printing to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Displaying the run objects gives you links to the visual tools in the Azure Portal. Go try them!"
|
"Displaying the run objects gives you links to the visual tools in the Azure Portal. Go try them!"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"Below we select the best pipeline from our iterations. The get_output method on local_run returns the best run and the fitted model for the last fit invocation. There are overloads on get_output that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
|
"Below we select the best pipeline from our iterations. The get_output method on local_run returns the best run and the fitted model for the last fit invocation. There are overloads on get_output that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"fitted_model.steps"
|
"fitted_model.steps"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### View the engineered names for featurized data\n",
|
"### View the engineered names for featurized data\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can access the engineered feature names generated in time-series featurization. Note that a number of named holiday periods are represented. We recommend that you have at least one year of data when using this feature to ensure that all yearly holidays are captured in the training featurization."
|
"You can access the engineered feature names generated in time-series featurization. Note that a number of named holiday periods are represented. We recommend that you have at least one year of data when using this feature to ensure that all yearly holidays are captured in the training featurization."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()"
|
"fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### View the featurization summary\n",
|
"### View the featurization summary\n",
|
||||||
@@ -341,62 +361,59 @@
|
|||||||
"- Type detected\n",
|
"- Type detected\n",
|
||||||
"- If feature was dropped\n",
|
"- If feature was dropped\n",
|
||||||
"- List of feature transformations for the raw feature"
|
"- List of feature transformations for the raw feature"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()"
|
"fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Test the Best Fitted Model\n",
|
"## Evaluate"
|
||||||
"\n",
|
],
|
||||||
"Predict on training and test set, and calculate residual values.\n",
|
"cell_type": "markdown"
|
||||||
"\n",
|
},
|
||||||
"We always score on the original dataset whose schema matches the scheme of the training dataset."
|
{
|
||||||
]
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We now use the best fitted model from the AutoML Run to make forecasts for the test set. \n",
|
||||||
|
"\n",
|
||||||
|
"We always score on the original dataset whose schema matches the training set schema."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X_test.head()"
|
"X_test.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"y_query = y_test.copy().astype(np.float)\n",
|
|
||||||
"y_query.fill(np.NaN)\n",
|
|
||||||
"y_fcst, X_trans = fitted_model.forecast(X_test, y_query)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
|
"We now define some functions for aligning output to input and for producing rolling forecasts over the full test set. As previously stated, the forecast horizon of 14 days is shorter than the length of the test set - which is about 120 days. To get predictions over the full test set, we iterate over the test set, making forecasts 14 days at a time and combining the results. We also make sure that each 14-day forecast uses up-to-date actuals - the current context - to construct lag features. \n",
|
||||||
|
"\n",
|
||||||
"It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
|
"It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name = 'predicted'):\n",
|
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name='predicted',\n",
|
||||||
|
" horizon_colname='horizon_origin'):\n",
|
||||||
" \"\"\"\n",
|
" \"\"\"\n",
|
||||||
" Demonstrates how to get the output aligned to the inputs\n",
|
" Demonstrates how to get the output aligned to the inputs\n",
|
||||||
" using pandas indexes. Helps understand what happened if\n",
|
" using pandas indexes. Helps understand what happened if\n",
|
||||||
@@ -408,7 +425,8 @@
|
|||||||
" * model was asked to predict past max_horizon -> increase max horizon\n",
|
" * model was asked to predict past max_horizon -> increase max horizon\n",
|
||||||
" * data at start of X_test was needed for lags -> provide previous periods\n",
|
" * data at start of X_test was needed for lags -> provide previous periods\n",
|
||||||
" \"\"\"\n",
|
" \"\"\"\n",
|
||||||
" df_fcst = pd.DataFrame({predicted_column_name : y_predicted})\n",
|
" df_fcst = pd.DataFrame({predicted_column_name : y_predicted,\n",
|
||||||
|
" horizon_colname: X_trans[horizon_colname]})\n",
|
||||||
" # y and X outputs are aligned by forecast() function contract\n",
|
" # y and X outputs are aligned by forecast() function contract\n",
|
||||||
" df_fcst.index = X_trans.index\n",
|
" df_fcst.index = X_trans.index\n",
|
||||||
" \n",
|
" \n",
|
||||||
@@ -427,15 +445,81 @@
|
|||||||
" clean = together[together[[target_column_name, predicted_column_name]].notnull().all(axis=1)]\n",
|
" clean = together[together[[target_column_name, predicted_column_name]].notnull().all(axis=1)]\n",
|
||||||
" return(clean)\n",
|
" return(clean)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"df_all = align_outputs(y_fcst, X_trans, X_test, y_test)\n"
|
"def do_rolling_forecast(fitted_model, X_test, y_test, max_horizon, freq='D'):\n",
|
||||||
]
|
" \"\"\"\n",
|
||||||
|
" Produce forecasts on a rolling origin over the given test set.\n",
|
||||||
|
" \n",
|
||||||
|
" Each iteration makes a forecast for the next 'max_horizon' periods \n",
|
||||||
|
" with respect to the current origin, then advances the origin by the horizon time duration. \n",
|
||||||
|
" The prediction context for each forecast is set so that the forecaster uses \n",
|
||||||
|
" the actual target values prior to the current origin time for constructing lag features.\n",
|
||||||
|
" \n",
|
||||||
|
" This function returns a concatenated DataFrame of rolling forecasts.\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" df_list = []\n",
|
||||||
|
" origin_time = X_test[time_column_name].min()\n",
|
||||||
|
" while origin_time <= X_test[time_column_name].max():\n",
|
||||||
|
" # Set the horizon time - end date of the forecast\n",
|
||||||
|
" horizon_time = origin_time + max_horizon * to_offset(freq)\n",
|
||||||
|
" \n",
|
||||||
|
" # Extract test data from an expanding window up-to the horizon \n",
|
||||||
|
" expand_wind = (X_test[time_column_name] < horizon_time)\n",
|
||||||
|
" X_test_expand = X_test[expand_wind]\n",
|
||||||
|
"        y_query_expand = np.zeros(len(X_test_expand)).astype(float)\n",
|
||||||
|
" y_query_expand.fill(np.NaN)\n",
|
||||||
|
" \n",
|
||||||
|
" if origin_time != X_test[time_column_name].min():\n",
|
||||||
|
" # Set the context by including actuals up-to the origin time\n",
|
||||||
|
" test_context_expand_wind = (X_test[time_column_name] < origin_time)\n",
|
||||||
|
" context_expand_wind = (X_test_expand[time_column_name] < origin_time)\n",
|
||||||
|
" y_query_expand[context_expand_wind] = y_test[test_context_expand_wind]\n",
|
||||||
|
" \n",
|
||||||
|
" # Make a forecast out to the maximum horizon\n",
|
||||||
|
" y_fcst, X_trans = fitted_model.forecast(X_test_expand, y_query_expand)\n",
|
||||||
|
" \n",
|
||||||
|
" # Align forecast with test set for dates within the current rolling window \n",
|
||||||
|
" trans_tindex = X_trans.index.get_level_values(time_column_name)\n",
|
||||||
|
" trans_roll_wind = (trans_tindex >= origin_time) & (trans_tindex < horizon_time)\n",
|
||||||
|
" test_roll_wind = expand_wind & (X_test[time_column_name] >= origin_time)\n",
|
||||||
|
" df_list.append(align_outputs(y_fcst[trans_roll_wind], X_trans[trans_roll_wind],\n",
|
||||||
|
" X_test[test_roll_wind], y_test[test_roll_wind]))\n",
|
||||||
|
" \n",
|
||||||
|
" # Advance the origin time\n",
|
||||||
|
" origin_time = horizon_time\n",
|
||||||
|
" \n",
|
||||||
|
" return pd.concat(df_list, ignore_index=True)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
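The rolling-origin idea in `do_rolling_forecast` can be illustrated without AutoML. The sketch below is not the notebook's implementation: it swaps the fitted model for a hypothetical naive last-value forecaster (`rolling_origin_forecast` is a made-up name) so that the role of the context, meaning the actuals revealed up to each new origin, is easy to see.

```python
import numpy as np

def rolling_origin_forecast(history, actuals, horizon):
    """Forecast `actuals` in windows of `horizon` periods.

    Hypothetical stand-in for a real forecaster: every period in a window
    is predicted as the last actual known at that window's origin.
    """
    preds = np.empty(len(actuals), dtype=float)
    last_known = history[-1]
    for start in range(0, len(actuals), horizon):
        stop = min(start + horizon, len(actuals))  # last window may be partial
        preds[start:stop] = last_known             # forecast from the current origin
        last_known = actuals[stop - 1]             # reveal actuals up to the new origin
    return preds

history = np.array([10.0, 12.0])
actuals = np.array([11.0, 13.0, 15.0, 14.0, 16.0])
preds = rolling_origin_forecast(history, actuals, horizon=2)
print(preds)
```

Each window only sees actuals that precede its origin, which mirrors how the notebook fills `y_query_expand` with known targets before calling `forecast`.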
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
|
"df_all = do_rolling_forecast(fitted_model, X_test, y_test, max_horizon)\n",
|
||||||
|
"df_all"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We now calculate some error metrics for the forecasts and visualize the predictions vs. the actuals."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"def APE(actual, pred):\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" Calculate absolute percentage error.\n",
|
||||||
|
" Returns a vector of APE values with same length as actual/pred.\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
" return 100*np.abs((actual - pred)/actual)\n",
|
||||||
|
"\n",
|
||||||
"def MAPE(actual, pred):\n",
|
"def MAPE(actual, pred):\n",
|
||||||
" \"\"\"\n",
|
" \"\"\"\n",
|
||||||
" Calculate mean absolute percentage error.\n",
|
" Calculate mean absolute percentage error.\n",
|
||||||
@@ -445,15 +529,14 @@
|
|||||||
" not_zero = ~np.isclose(actual, 0.0)\n",
|
" not_zero = ~np.isclose(actual, 0.0)\n",
|
||||||
" actual_safe = actual[not_na & not_zero]\n",
|
" actual_safe = actual[not_na & not_zero]\n",
|
||||||
" pred_safe = pred[not_na & not_zero]\n",
|
" pred_safe = pred[not_na & not_zero]\n",
|
||||||
" APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n",
|
" return np.mean(APE(actual_safe, pred_safe))"
|
||||||
" return np.mean(APE)"
|
],
|
||||||
]
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(\"Simple forecasting model\")\n",
|
"print(\"Simple forecasting model\")\n",
|
||||||
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
||||||
@@ -468,33 +551,54 @@
|
|||||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"The MAPE seems high; it is being skewed by an actual with a small absolute value. For a more informative evaluation, we can calculate the metrics by forecast horizon:"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
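To see how a single small actual can skew MAPE, here is a toy calculation. The numbers are made up for illustration, and `ape` is a hypothetical lowercase copy of the notebook's `APE` helper.

```python
import numpy as np

def ape(actual, pred):
    """Absolute percentage error per observation, in percent."""
    return 100 * np.abs((actual - pred) / actual)

# Made-up values: the third actual is tiny compared with the others.
actual = np.array([100.0, 200.0, 1.0])
pred = np.array([110.0, 180.0, 5.0])

errors = ape(actual, pred)
mape = errors.mean()
median_ape = np.median(errors)
print(errors, mape, median_ape)
```

The mean is dominated by the one observation with a near-zero actual, while the median stays close to the typical error. This is one reason to look at the error distribution and not only its mean.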
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"df_all.groupby('horizon_origin').apply(\n",
|
||||||
|
" lambda df: pd.Series({'MAPE': MAPE(df[target_column_name], df['predicted']),\n",
|
||||||
|
" 'RMSE': np.sqrt(mean_squared_error(df[target_column_name], df['predicted'])),\n",
|
||||||
|
" 'MAE': mean_absolute_error(df[target_column_name], df['predicted'])}))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
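The per-horizon breakdown can also be tried on a small frame. The sketch below assumes a toy stand-in for `df_all` with made-up values, and it uses column-wise aggregation instead of `groupby(...).apply(...)`; that is a substitution chosen for brevity, not the notebook's method.

```python
import pandas as pd

# Toy stand-in for df_all; all values are invented for illustration.
toy = pd.DataFrame({
    'horizon_origin': [1, 2, 1, 2],
    'cnt': [100.0, 200.0, 50.0, 400.0],
    'predicted': [110.0, 180.0, 40.0, 440.0],
})

# Per-row errors first, then average them within each horizon.
toy['abs_err'] = (toy['cnt'] - toy['predicted']).abs()
toy['ape'] = 100 * toy['abs_err'] / toy['cnt']

by_horizon = toy.groupby('horizon_origin')[['abs_err', 'ape']].mean()
by_horizon.columns = ['MAE', 'MAPE']
print(by_horizon)
```

Computing row-level errors before grouping keeps the aggregation to a plain mean, which makes it straightforward to add other summaries such as a median.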
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"It's also interesting to see the distributions of APE (absolute percentage error) by horizon. On a log scale, the outlying APE in the horizon-3 group is clear."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"df_all_APE = df_all.assign(APE=APE(df_all[target_column_name], df_all['predicted']))\n",
|
||||||
|
"APEs = [df_all_APE[df_all['horizon_origin'] == h].APE.values for h in range(1, max_horizon + 1)]\n",
|
||||||
|
"\n",
|
||||||
|
"%matplotlib notebook\n",
|
||||||
|
"plt.boxplot(APEs)\n",
|
||||||
|
"plt.yscale('log')\n",
|
||||||
|
"plt.xlabel('horizon')\n",
|
||||||
|
"plt.ylabel('APE (%)')\n",
|
||||||
|
"plt.title('Absolute Percentage Errors by Forecast Horizon')\n",
|
||||||
|
"\n",
|
||||||
|
"plt.show()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "xiaga@microsoft.com, tosingli@microsoft.com"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.7"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,9 @@
|
|||||||
|
name: auto-ml-forecasting-bike-share
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
|
- statsmodels
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "xiaga, tosingli, erwright"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.8"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -28,67 +52,66 @@
|
|||||||
"1. [Setup](#Setup)\n",
|
"1. [Setup](#Setup)\n",
|
||||||
"1. [Data](#Data)\n",
|
"1. [Data](#Data)\n",
|
||||||
"1. [Train](#Train)"
|
"1. [Train](#Train)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
"In this example, we show how AutoML can be used for energy demand forecasting.\n",
|
"In this example, we show how AutoML can be used to forecast a single time-series in the energy demand application area. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"Notebook synopsis:\n",
|
||||||
"1. Creating an Experiment in an existing Workspace\n",
|
"1. Creating an Experiment in an existing Workspace\n",
|
||||||
"2. Instantiating AutoMLConfig with new task type \"forecasting\" for timeseries data training, and other timeseries related settings: for this dataset we use the basic one: \"time_column_name\" \n",
|
"2. Configuration and local run of AutoML for a simple time-series model\n",
|
||||||
"3. Training the Model using local compute\n",
|
"3. View engineered features and prediction results\n",
|
||||||
"4. Exploring the results\n",
|
"4. Configuration and local run of AutoML for a time-series model with lag and rolling window features\n",
|
||||||
"5. Viewing the engineered names for featurized data and featurization summary for all raw features\n",
|
"5. Estimate feature importance"
|
||||||
"6. Testing the fitted model"
|
],
|
||||||
]
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n"
|
"## Setup\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"import warnings\n",
|
"import warnings\n",
|
||||||
|
"\n",
|
||||||
"# Squash warning messages for cleaner output in the notebook\n",
|
"# Squash warning messages for cleaner output in the notebook\n",
|
||||||
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
"from matplotlib import pyplot as plt\n",
|
"from matplotlib import pyplot as plt\n",
|
||||||
"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score"
|
"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -110,63 +133,101 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data\n",
|
"## Data\n",
|
||||||
"Read energy demanding data from file, and preview data."
|
"We will use energy consumption data from New York City for model training. The data is stored in a tabular format and includes energy demand and basic weather data at an hourly frequency. Pandas CSV reader is used to read the file into memory. Special attention is given to the \"timeStamp\" column in the data since it contains text which should be parsed as datetime-type objects. "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"data = pd.read_csv(\"nyc_energy.csv\", parse_dates=['timeStamp'])\n",
|
"data = pd.read_csv(\"nyc_energy.csv\", parse_dates=['timeStamp'])\n",
|
||||||
"data.head()"
|
"data.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We must now define the schema of this dataset. Every time-series must have a time column and a target. The target quantity is what will be eventually forecasted by a trained model. In this case, the target is the \"demand\" column. The other columns, \"temp\" and \"precip,\" are implicitly designated as features."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# let's take note of what columns means what in the data\n",
|
"# Dataset schema\n",
|
||||||
"time_column_name = 'timeStamp'\n",
|
"time_column_name = 'timeStamp'\n",
|
||||||
"target_column_name = 'demand'"
|
"target_column_name = 'demand'"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Split the data into train and test sets\n"
|
"### Forecast Horizon\n",
|
||||||
]
|
"\n",
|
||||||
|
"In addition to the data schema, we must also specify the forecast horizon. A forecast horizon is a time span into the future (or just beyond the latest date in the training data) where forecasts of the target quantity are needed. Choosing a forecast horizon is application specific, but a rule-of-thumb is that **the horizon should be the time-frame where you need actionable decisions based on the forecast.** The horizon usually has a strong relationship with the frequency of the time-series data, that is, the sampling interval of the target quantity and the features. For instance, the NYC energy demand data has an hourly frequency. A decision that requires a demand forecast to the hour is unlikely to be made weeks or months in advance, particularly if we expect weather to be a strong determinant of demand. We may have fairly accurate meteorological forecasts of the hourly temperature and precipitation on the time-scale of a day or two, however.\n",
|
||||||
|
"\n",
|
||||||
|
"Given the above discussion, we generally recommend that users set forecast horizons to less than 100 time periods (i.e. less than 100 hours in the NYC energy example). Furthermore, **AutoML's memory use and computation time increase in proportion to the length of the horizon**, so the user should consider carefully how they set this value. If a long horizon forecast really is necessary, it may be good practice to aggregate the series to a coarser time scale. \n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"Forecast horizons in AutoML are given as integer multiples of the time-series frequency. In this example, we set the horizon to 48 hours."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X_train = data[data[time_column_name] < '2017-02-01']\n",
|
"max_horizon = 48"
|
||||||
"X_test = data[data[time_column_name] >= '2017-02-01']\n",
|
],
|
||||||
"y_train = X_train.pop(target_column_name).values\n",
|
"cell_type": "code"
|
||||||
"y_test = X_test.pop(target_column_name).values"
|
},
|
||||||
]
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Split the data into train and test sets\n",
|
||||||
|
"We now split the data into a train and a test set so that we may evaluate model performance. We note that the tail of the dataset contains a large number of NA values in the target column, so we designate the test set as the 48-hour window ending on the latest date of known energy demand. "
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Find time point to split on\n",
|
||||||
|
"latest_known_time = data[~pd.isnull(data[target_column_name])][time_column_name].max()\n",
|
||||||
|
"split_time = latest_known_time - pd.Timedelta(hours=max_horizon)\n",
|
||||||
|
"\n",
|
||||||
|
"# Split into train/test sets\n",
|
||||||
|
"X_train = data[data[time_column_name] <= split_time]\n",
|
||||||
|
"X_test = data[(data[time_column_name] > split_time) & (data[time_column_name] <= latest_known_time)]\n",
|
||||||
|
"\n",
|
||||||
|
"# Move the target values into their own arrays \n",
|
||||||
|
"y_train = X_train.pop(target_column_name).values\n",
|
||||||
|
"y_test = X_test.pop(target_column_name).values"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
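The split logic in the cell above can be illustrated on a toy hourly series. This is a minimal sketch in plain Python rather than pandas; `split_by_horizon` and `toy_rows` are hypothetical names introduced only for illustration, and the series is synthetic.

```python
from datetime import datetime, timedelta

def split_by_horizon(rows, max_horizon_hours):
    """Split (timestamp, target) rows so the test set is the last
    `max_horizon_hours` hours that still have a known target.

    `rows` must be sorted by time; an unknown target is None.
    """
    # Latest timestamp at which the target is actually known
    latest_known_time = max(t for t, y in rows if y is not None)
    split_time = latest_known_time - timedelta(hours=max_horizon_hours)

    train = [(t, y) for t, y in rows if t <= split_time]
    test = [(t, y) for t, y in rows if split_time < t <= latest_known_time]
    return train, test

# Synthetic data: 100 hourly points, the last 10 with unknown demand
start = datetime(2017, 1, 1)
toy_rows = [(start + timedelta(hours=i), float(i) if i < 90 else None)
            for i in range(100)]

toy_train, toy_test = split_by_horizon(toy_rows, max_horizon_hours=48)
print(len(toy_train), len(toy_test))
```

Because the split point is derived from the last known target rather than the last row, the trailing rows with missing demand end up in neither set.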
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
"We now instantiate an AutoMLConfig object. This config defines the settings and data used to run the experiment. For forecasting tasks, we must provide extra configuration related to the time-series data schema and forecasting context. Here, only the name of the time column and the maximum forecast horizon are needed. Other settings are described below:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"|Property|Description|\n",
|
"|Property|Description|\n",
|
||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
@@ -176,98 +237,98 @@
|
|||||||
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], target values.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], target values.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
"|**n_cross_validations**|Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
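The table above mentions Rolling Origin Validation for `n_cross_validations`. The sketch below shows one common form of that idea: every validation window lies strictly after its training data. It is an illustration under assumptions, not AutoML's actual implementation; the function name `rolling_origin_splits` and the choice of non-overlapping windows of length `horizon` are hypothetical.

```python
def rolling_origin_splits(n_samples, n_splits, horizon):
    """Return a list of (train_indices, validation_indices) pairs.

    Each validation window has length `horizon` and starts right after
    the end of its training data, so no future value leaks into training.
    """
    splits = []
    for k in range(n_splits, 0, -1):
        val_end = n_samples - (k - 1) * horizon
        val_start = val_end - horizon
        if val_start <= 0:
            raise ValueError("series too short for the requested splits")
        train_idx = list(range(0, val_start))
        val_idx = list(range(val_start, val_end))
        splits.append((train_idx, val_idx))
    return splits

toy_splits = rolling_origin_splits(n_samples=10, n_splits=3, horizon=2)
for train_idx, val_idx in toy_splits:
    print(len(train_idx), val_idx)
```

Unlike shuffled k-fold splitting, the training set here only ever grows forward in time, which is what "temporally consistent" refers to.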
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_settings = {\n",
|
"time_series_settings = {\n",
|
||||||
" \"time_column_name\": time_column_name \n",
|
" 'time_column_name': time_column_name,\n",
|
||||||
|
" 'max_horizon': max_horizon\n",
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"automl_config = AutoMLConfig(task='forecasting',\n",
|
||||||
"automl_config = AutoMLConfig(task = 'forecasting',\n",
|
" debug_log='automl_nyc_energy_errors.log',\n",
|
||||||
" debug_log = 'automl_nyc_energy_errors.log',\n",
|
|
||||||
" primary_metric='normalized_root_mean_squared_error',\n",
|
" primary_metric='normalized_root_mean_squared_error',\n",
|
||||||
" iterations = 10,\n",
|
" iterations=10,\n",
|
||||||
" iteration_timeout_minutes = 5,\n",
|
" iteration_timeout_minutes=5,\n",
|
||||||
" X = X_train,\n",
|
" X=X_train,\n",
|
||||||
" y = y_train,\n",
|
" y=y_train,\n",
|
||||||
" n_cross_validations = 3,\n",
|
" n_cross_validations=3,\n",
|
||||||
" path=project_folder,\n",
|
" path=project_folder,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" **automl_settings)"
|
" **time_series_settings)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Submitting the configuration will start a new run in this experiment. For local runs, the execution is synchronous. Depending on the data and number of iterations, this can run for a while. Parameters controlling concurrency may speed up the process, depending on your hardware.\n",
|
"Submitting the configuration will start a new run in this experiment. For local runs, the execution is synchronous. Depending on the data and number of iterations, this can run for a while. Parameters controlling concurrency may speed up the process, depending on your hardware.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You will see the currently running iterations printing to the console."
|
"You will see the currently running iterations printing to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method on `local_run` returns the best run and the fitted model for the last fit invocation. There are overloads on `get_output` that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
|
"Below we select the best pipeline from our iterations. The `get_output` method on `local_run` returns the best run and the fitted model for the last fit invocation. There are overloads on `get_output` that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"fitted_model.steps"
|
"fitted_model.steps"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### View the engineered names for featurized data\n",
|
"### View the engineered names for featurized data\n",
|
||||||
"Below we display the engineered feature names generated for the featurized data using the time-series featurization."
|
"Below we display the engineered feature names generated for the featurized data using the time-series featurization."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()"
|
"fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Test the Best Fitted Model\n",
|
"### Test the Best Fitted Model\n",
|
||||||
@@ -277,13 +338,13 @@
|
|||||||
"We need to pass the recent values of the target variable `y`, whereas the scikit-compatible `predict` function only takes the non-target variables `X`. In our case, the test data immediately follows the training data, and we fill the `y` variable with `NaN`. The `NaN` serves as a question mark for the forecaster to fill with the actuals. Using the forecast function will produce forecasts using the shortest possible forecast horizon. The last time at which a definite (non-NaN) value is seen is the _forecast origin_ - the last time when the value of the target is known. \n",
|
"We need to pass the recent values of the target variable `y`, whereas the scikit-compatible `predict` function only takes the non-target variables `X`. In our case, the test data immediately follows the training data, and we fill the `y` variable with `NaN`. The `NaN` serves as a question mark for the forecaster to fill with the actuals. Using the forecast function will produce forecasts using the shortest possible forecast horizon. The last time at which a definite (non-NaN) value is seen is the _forecast origin_ - the last time when the value of the target is known. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"Using the `predict` method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use."
|
"Using the `predict` method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Replace ALL values in y_pred by NaN. \n",
|
"# Replace ALL values in y_pred by NaN. \n",
|
||||||
"# The forecast origin will be at the beginning of the first forecast period\n",
|
"# The forecast origin will be at the beginning of the first forecast period\n",
|
||||||
@@ -294,13 +355,13 @@
|
|||||||
"# This contains the assumptions that were made in the forecast\n",
|
"# This contains the assumptions that were made in the forecast\n",
|
||||||
"# and helps align the forecast to the original data\n",
|
"# and helps align the forecast to the original data\n",
|
||||||
"y_fcst, X_trans = fitted_model.forecast(X_test, y_query)"
|
"y_fcst, X_trans = fitted_model.forecast(X_test, y_query)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# limit the evaluation to data where y_test has actuals\n",
|
"# limit the evaluation to data where y_test has actuals\n",
|
||||||
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name = 'predicted'):\n",
|
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name = 'predicted'):\n",
|
||||||
@@ -336,36 +397,37 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"df_all = align_outputs(y_fcst, X_trans, X_test, y_test)\n",
|
"df_all = align_outputs(y_fcst, X_trans, X_test, y_test)\n",
|
||||||
"df_all.head()"
|
"df_all.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Looking at `X_trans` is also useful to see what featurization happened to the data."
|
"Looking at `X_trans` is also useful to see what featurization happened to the data."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X_trans"
|
"X_trans"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Calculate accuracy metrics\n"
|
"### Calculate accuracy metrics\n",
|
||||||
]
|
"Finally, we calculate some accuracy metrics for the forecast and plot the predictions vs. the actuals over the time range in the test set."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"def MAPE(actual, pred):\n",
|
"def MAPE(actual, pred):\n",
|
||||||
" \"\"\"\n",
|
" \"\"\"\n",
|
||||||
@@ -378,13 +440,13 @@
|
|||||||
" pred_safe = pred[not_na & not_zero]\n",
|
" pred_safe = pred[not_na & not_zero]\n",
|
||||||
" APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n",
|
" APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n",
|
||||||
" return np.mean(APE)"
|
" return np.mean(APE)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
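As a worked example of the metric defined in the cell above, here is a plain-Python sketch of the same calculation: points where the actual value is NaN or zero are dropped before averaging the absolute percentage errors. The name `mape` and the toy numbers are assumptions made for illustration; the notebook's own `MAPE` function operates on NumPy arrays.

```python
import math

def mape(actual, pred):
    """Mean absolute percentage error, skipping NaN and zero actuals."""
    pairs = [(a, p) for a, p in zip(actual, pred)
             if not math.isnan(a) and a != 0]
    if not pairs:
        return float('nan')  # nothing left to average
    return sum(100 * abs((a - p) / a) for a, p in pairs) / len(pairs)

# Two usable points with 10% error each; the zero and NaN actuals are ignored
toy_mape = mape([100.0, 200.0, 0.0, float('nan')],
                [110.0, 180.0, 5.0, 1.0])
print(toy_mape)
```

Dropping zero actuals avoids division by zero, at the cost of ignoring forecast errors at those points.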
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(\"Simple forecasting model\")\n",
|
"print(\"Simple forecasting model\")\n",
|
||||||
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
||||||
@@ -395,99 +457,107 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# Plot outputs\n",
|
"# Plot outputs\n",
|
||||||
"%matplotlib notebook\n",
|
"%matplotlib notebook\n",
|
||||||
"test_pred = plt.scatter(df_all[target_column_name], df_all['predicted'], color='b')\n",
|
"pred, = plt.plot(df_all[time_column_name], df_all['predicted'], color='b')\n",
|
||||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
"actual, = plt.plot(df_all[time_column_name], df_all[target_column_name], color='g')\n",
|
||||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
"plt.xticks(fontsize=8)\n",
|
||||||
|
"plt.legend((pred, actual), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
|
"plt.title('Prediction vs. Actual Time-Series')\n",
|
||||||
|
"\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The distribution looks a little heavy tailed: we underestimate the excursions of the extremes. A normal-quantile transform of the target might help, but let's first try using some past data with the lags and rolling window transforms.\n"
|
"The distribution looks a little heavy tailed: we underestimate the excursions of the extremes. A normal-quantile transform of the target might help, but let's first try using some past data with the lags and rolling window transforms.\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Using lags and rolling window features to improve the forecast"
|
"### Using lags and rolling window features"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data.\n",
|
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Now that we configured target lags, that is the previous values of the target variables, and the prediction is no longer horizon-less. We therefore must specify the `max_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features."
|
"Now we configure target lags, that is, the previous values of the target variable, so the prediction is no longer horizon-less. We must therefore still specify the `max_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
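To make the idea of lag and rolling window features concrete, the sketch below builds them by hand for a short series. It is only an illustration: the function `lag_and_rolling_features`, the feature names, and the handling of missing history are hypothetical, and AutoML's own featurization (including how it accounts for the forecast horizon) is not reproduced here.

```python
def lag_and_rolling_features(y, n_lags, window):
    """For each index t, build lag features y[t-1] .. y[t-n_lags] and the
    min, max and sum over the `window` values that precede t.

    Features that would need history before the start of the series are None.
    """
    rows = []
    for t in range(len(y)):
        row = {}
        for lag in range(1, n_lags + 1):
            row["lag%d" % lag] = y[t - lag] if t - lag >= 0 else None
        past = y[max(0, t - window):t]
        has_full_window = len(past) == window
        row["window_min"] = min(past) if has_full_window else None
        row["window_max"] = max(past) if has_full_window else None
        row["window_sum"] = sum(past) if has_full_window else None
        rows.append(row)
    return rows

toy_features = lag_and_rolling_features([3, 1, 4, 1, 5, 9], n_lags=2, window=3)
print(toy_features[4])
```

Every feature at index `t` uses only values before `t`. When forecasting several steps ahead, the most recent values are not yet known at prediction time, which is one way to see why the horizon has to be part of the configuration once such features are used.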
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_settings_lags = {\n",
|
"time_series_settings_with_lags = {\n",
|
||||||
" 'time_column_name': time_column_name,\n",
|
" 'time_column_name': time_column_name,\n",
|
||||||
" 'target_lags': 1,\n",
|
" 'max_horizon': max_horizon,\n",
|
||||||
" 'target_rolling_window_size': 5,\n",
|
" 'target_lags': 12,\n",
|
||||||
" # you MUST set the max_horizon when using lags and rolling windows\n",
|
" 'target_rolling_window_size': 4\n",
|
||||||
" # it is optional when looking-back features are not used \n",
|
|
||||||
" 'max_horizon': len(y_test), # only one grain\n",
|
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"automl_config_lags = AutoMLConfig(task='forecasting',\n",
|
||||||
"automl_config_lags = AutoMLConfig(task = 'forecasting',\n",
|
" debug_log='automl_nyc_energy_errors.log',\n",
|
||||||
" debug_log = 'automl_nyc_energy_errors.log',\n",
|
" primary_metric='normalized_root_mean_squared_error',\n",
|
||||||
" primary_metric='normalized_root_mean_squared_error',\n",
|
" blacklist_models=['ElasticNet'],\n",
|
||||||
" iterations = 10,\n",
|
" iterations=10,\n",
|
||||||
" iteration_timeout_minutes = 5,\n",
|
" iteration_timeout_minutes=10,\n",
|
||||||
" X = X_train,\n",
|
" X=X_train,\n",
|
||||||
" y = y_train,\n",
|
" y=y_train,\n",
|
||||||
" n_cross_validations = 3,\n",
|
" n_cross_validations=3,\n",
|
||||||
" path=project_folder,\n",
|
" path=project_folder,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" verbosity=logging.INFO,\n",
|
||||||
" **automl_settings_lags)"
|
" **time_series_settings_with_lags)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We now start a new local run, this time with lag and rolling window featurization. AutoML applies featurizations in the setup stage, prior to iterating over ML models. The full training set is featurized first, followed by featurization of each of the CV splits. Lag and rolling window features introduce additional complexity, so the run will take longer than in the previous example that lacked these featurizations."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run_lags = experiment.submit(automl_config_lags, show_output=True)"
|
"local_run_lags = experiment.submit(automl_config_lags, show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run_lags, fitted_model_lags = local_run_lags.get_output()\n",
|
"best_run_lags, fitted_model_lags = local_run_lags.get_output()\n",
|
||||||
"y_fcst_lags, X_trans_lags = fitted_model_lags.forecast(X_test, y_query)\n",
|
"y_fcst_lags, X_trans_lags = fitted_model_lags.forecast(X_test, y_query)\n",
|
||||||
"df_lags = align_outputs(y_fcst_lags, X_trans_lags, X_test, y_test)\n",
|
"df_lags = align_outputs(y_fcst_lags, X_trans_lags, X_test, y_test)\n",
|
||||||
"df_lags.head()"
|
"df_lags.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X_trans_lags"
|
"X_trans_lags"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(\"Forecasting model with lags\")\n",
|
"print(\"Forecasting model with lags\")\n",
|
||||||
"rmse = np.sqrt(mean_squared_error(df_lags[target_column_name], df_lags['predicted']))\n",
|
"rmse = np.sqrt(mean_squared_error(df_lags[target_column_name], df_lags['predicted']))\n",
|
||||||
@@ -498,69 +568,46 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# Plot outputs\n",
|
"# Plot outputs\n",
|
||||||
"%matplotlib notebook\n",
|
"%matplotlib notebook\n",
|
||||||
"test_pred = plt.scatter(df_lags[target_column_name], df_lags['predicted'], color='b')\n",
|
"pred, = plt.plot(df_lags[time_column_name], df_lags['predicted'], color='b')\n",
|
||||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
"actual, = plt.plot(df_lags[time_column_name], df_lags[target_column_name], color='g')\n",
|
||||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
"plt.xticks(fontsize=8)\n",
|
||||||
|
"plt.legend((pred, actual), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
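The error metric computed in the cell above can be checked by hand on a few numbers. The sketch below uses made-up `actual` and `predicted` arrays (assumptions, not notebook outputs) and plain NumPy in place of `sklearn.metrics`.

```python
import numpy as np

# Made-up values standing in for df_lags[target_column_name] and df_lags['predicted'].
actual = np.array([10.0, 12.0, 14.0, 16.0])
predicted = np.array([11.0, 12.0, 13.0, 18.0])

errors = predicted - actual

# Root mean squared error: penalizes large misses more heavily.
rmse = np.sqrt(np.mean(errors ** 2))

# Mean absolute error: average size of a miss, in the target's units.
mae = np.mean(np.abs(errors))

print('RMSE: {0:.3f}, MAE: {1:.3f}'.format(rmse, mae))
```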
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### What features matter for the forecast?"
|
"### What features matter for the forecast?"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.train.automl.automlexplainer import explain_model\n",
|
"from azureml.train.automl.automlexplainer import explain_model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# feature names are everything in the transformed data except the target\n",
|
"# feature names are everything in the transformed data except the target\n",
|
||||||
"features = X_trans.columns[:-1]\n",
|
"features = X_trans_lags.columns[:-1]\n",
|
||||||
"expl = explain_model(fitted_model, X_train, X_test, features = features, best_run=best_run_lags, y_train = y_train)\n",
|
"expl = explain_model(fitted_model_lags, X_train.copy(), X_test.copy(), features=features, best_run=best_run_lags, y_train=y_train)\n",
|
||||||
"# unpack the tuple\n",
|
"# unpack the tuple\n",
|
||||||
"shap_values, expected_values, feat_overall_imp, feat_names, per_class_summary, per_class_imp = expl\n",
|
"shap_values, expected_values, feat_overall_imp, feat_names, per_class_summary, per_class_imp = expl\n",
|
||||||
"best_run_lags"
|
"best_run_lags"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Please go to the Azure Portal's best run to see the top features chart.\n",
|
"Please go to the Azure Portal's best run to see the top features chart.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The informative features make all sorts of intuitive sense. Temperature is a strong driver of heating and cooling demand in NYC. Apart from that, the daily life cycle, expressed by `hour`, and the weekly cycle, expressed by `wday`, drive people's energy use habits."
|
"The informative features make all sorts of intuitive sense. Temperature is a strong driver of heating and cooling demand in NYC. Apart from that, the daily life cycle, expressed by `hour`, and the weekly cycle, expressed by `wday`, drive people's energy use habits."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "xiaga, tosingli"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.7"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,10 @@
|
|||||||
|
name: auto-ml-forecasting-energy-demand
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
|
- statsmodels
|
||||||
|
- azureml-explain-model
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "erwright, tosingli"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.8"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -30,66 +54,60 @@
|
|||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Predict](#Predict)\n",
|
"1. [Predict](#Predict)\n",
|
||||||
"1. [Operationalize](#Operationalize)"
|
"1. [Operationalize](#Operationalize)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
"In this example, we use AutoML to find and tune a time-series forecasting model.\n",
|
"In this example, we use AutoML to train, select, and operationalize a time-series forecasting model for multiple time-series.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook, you will:\n",
|
|
||||||
"1. Create an Experiment in an existing Workspace\n",
|
|
||||||
"2. Instantiate an AutoMLConfig \n",
|
|
||||||
"3. Find and train a forecasting model using local compute\n",
|
|
||||||
"4. Evaluate the performance of the model\n",
|
|
||||||
"\n",
|
|
||||||
"The examples in the following code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
|
"The examples in the following code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup"
|
"## Setup"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"import warnings\n",
|
"import warnings\n",
|
||||||
|
"\n",
|
||||||
"# Squash warning messages for cleaner output in the notebook\n",
|
"# Squash warning messages for cleaner output in the notebook\n",
|
||||||
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
|
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment is a named object in a Workspace which represents a predictive task, the output of which is a trained model and a set of evaluation metrics for the model. "
|
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem. "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -111,79 +129,79 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data\n",
|
"## Data\n",
|
||||||
"You are now ready to load the historical orange juice sales data. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type."
|
"You are now ready to load the historical orange juice sales data. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"time_column_name = 'WeekStarting'\n",
|
"time_column_name = 'WeekStarting'\n",
|
||||||
"data = pd.read_csv(\"dominicks_OJ.csv\", parse_dates=[time_column_name])\n",
|
"data = pd.read_csv(\"dominicks_OJ.csv\", parse_dates=[time_column_name])\n",
|
||||||
"data.head()"
|
"data.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating if the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also include the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred. \n",
|
"Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating if the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also include the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we thus define the **grain** - the columns whose values determine the boundaries between time-series: "
|
"The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we thus define the **grain** - the columns whose values determine the boundaries between time-series: "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"grain_column_names = ['Store', 'Brand']\n",
|
"grain_column_names = ['Store', 'Brand']\n",
|
||||||
"nseries = data.groupby(grain_column_names).ngroups\n",
|
"nseries = data.groupby(grain_column_names).ngroups\n",
|
||||||
"print('Data contains {0} individual time-series.'.format(nseries))"
|
"print('Data contains {0} individual time-series.'.format(nseries))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"For demonstration purposes, we extract sales time-series for just a few of the stores:"
|
"For demonstration purposes, we extract sales time-series for just a few of the stores:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"use_stores = [2, 5, 8]\n",
|
"use_stores = [2, 5, 8]\n",
|
||||||
"data_subset = data[data.Store.isin(use_stores)]\n",
|
"data_subset = data[data.Store.isin(use_stores)]\n",
|
||||||
"nseries = data_subset.groupby(grain_column_names).ngroups\n",
|
"nseries = data_subset.groupby(grain_column_names).ngroups\n",
|
||||||
"print('Data subset contains {0} individual time-series.'.format(nseries))"
|
"print('Data subset contains {0} individual time-series.'.format(nseries))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Data Splitting\n",
|
"### Data Splitting\n",
|
||||||
"We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the grain columns."
|
"We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the grain columns."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"n_test_periods = 20\n",
|
"n_test_periods = 20\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -196,10 +214,10 @@
|
|||||||
" return df_head, df_tail\n",
|
" return df_head, df_tail\n",
|
||||||
"\n",
|
"\n",
|
||||||
"X_train, X_test = split_last_n_by_grain(data_subset, n_test_periods)"
|
"X_train, X_test = split_last_n_by_grain(data_subset, n_test_periods)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
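The body of the split helper is elided in this diff, so the following is only a sketch of how a per-series "last n periods" split could be written with pandas. The function name `split_last_n_per_series` and its extra arguments are hypothetical and need not match the notebook's `split_last_n_by_grain`.

```python
import pandas as pd

def split_last_n_per_series(df, n, time_col, grain_cols):
    """Hold out the last n periods of every series as the test set.

    Assumes df has a unique index, so held-out rows can be dropped by label.
    """
    # Sort by time within each series so tail(n) picks the latest rows.
    df_sorted = df.sort_values(grain_cols + [time_col])
    df_tail = df_sorted.groupby(grain_cols, group_keys=False).tail(n)
    df_head = df_sorted.drop(df_tail.index)
    return df_head, df_tail

# Toy data: two series with five weekly observations each.
toy = pd.DataFrame({
    'Store': [2] * 5 + [5] * 5,
    'Brand': ['a'] * 5 + ['b'] * 5,
    'WeekStarting': list(pd.date_range('1990-06-14', periods=5, freq='7D')) * 2,
    'Quantity': range(10),
})
train, test = split_last_n_per_series(toy, 2, 'WeekStarting', ['Store', 'Brand'])
print(len(train), len(test))
```

Grouping by the grain columns before taking the tail is what keeps the split stratified by series: every series contributes the same number of test periods.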
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Modeling\n",
|
"## Modeling\n",
|
||||||
@@ -214,20 +232,20 @@
|
|||||||
"AutoML will currently train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series.\n",
|
"AutoML will currently train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You are almost ready to start an AutoML training job. First, we need to separate the target column from the rest of the DataFrame: "
|
"You are almost ready to start an AutoML training job. First, we need to separate the target column from the rest of the DataFrame: "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"target_column_name = 'Quantity'\n",
|
"target_column_name = 'Quantity'\n",
|
||||||
"y_train = X_train.pop(target_column_name).values"
|
"y_train = X_train.pop(target_column_name).values"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -236,7 +254,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
|
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organization that needs to estimate the next month of sales would set the horizon accordingly. \n",
|
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organization that needs to estimate the next month of sales would set the horizon accordingly. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *X_valid* and *y_valid* parameters of AutoMLConfig.\n",
|
"Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *X_valid* and *y_valid* parameters of AutoMLConfig.\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -257,19 +275,19 @@
|
|||||||
"|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
|
"|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
|
||||||
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
|
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
|
||||||
"|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|"
|
"|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
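The rolling origin validation idea mentioned in the cell above can be sketched for a single series. This shows one common variant (expanding training window, non-overlapping validation blocks) and is not a description of AutoML's internal fold construction; `rolling_origin_folds` and its parameters are hypothetical names.

```python
def rolling_origin_folds(n_obs, n_folds, horizon):
    """Return (train_indices, valid_indices) pairs for one time-ordered series.

    Each validation block covers `horizon` consecutive observations, and the
    training indices are everything strictly before that block.
    """
    folds = []
    for k in range(n_folds, 0, -1):
        valid_end = n_obs - (k - 1) * horizon
        valid_start = valid_end - horizon
        if valid_start <= 0:
            raise ValueError('series too short for the requested folds')
        train_idx = list(range(valid_start))
        valid_idx = list(range(valid_start, valid_end))
        folds.append((train_idx, valid_idx))
    return folds

folds = rolling_origin_folds(n_obs=10, n_folds=3, horizon=2)
for train_idx, valid_idx in folds:
    print(train_idx, valid_idx)
```

Unlike K-Fold, no fold ever trains on observations that come after its validation block, which is the property that matters for time-series data.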
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"time_series_settings = {\n",
|
"time_series_settings = {\n",
|
||||||
" 'time_column_name': time_column_name,\n",
|
" 'time_column_name': time_column_name,\n",
|
||||||
" 'grain_column_names': grain_column_names,\n",
|
" 'grain_column_names': grain_column_names,\n",
|
||||||
" 'drop_column_names': ['logQuantity'],\n",
|
" 'drop_column_names': ['logQuantity'],\n",
|
||||||
" 'max_horizon': n_test_periods # optional\n",
|
" 'max_horizon': n_test_periods\n",
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"automl_config = AutoMLConfig(task='forecasting',\n",
|
"automl_config = AutoMLConfig(task='forecasting',\n",
|
||||||
@@ -278,88 +296,89 @@
|
|||||||
" iterations=10,\n",
|
" iterations=10,\n",
|
||||||
" X=X_train,\n",
|
" X=X_train,\n",
|
||||||
" y=y_train,\n",
|
" y=y_train,\n",
|
||||||
" n_cross_validations=5,\n",
|
" n_cross_validations=3,\n",
|
||||||
" enable_ensembling=False,\n",
|
" enable_ensembling=False,\n",
|
||||||
" path=project_folder,\n",
|
" path=project_folder,\n",
|
||||||
" verbosity=logging.INFO,\n",
|
" verbosity=logging.INFO,\n",
|
||||||
" **time_series_settings)"
|
" **time_series_settings)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can now submit a new training run. For local runs, the execution is synchronous. Depending on the data and number of iterations this operation may take several minutes.\n",
|
"You can now submit a new training run. For local runs, the execution is synchronous. Depending on the data and number of iterations this operation may take several minutes.\n",
|
||||||
"Information from each iteration will be printed to the console."
|
"Information from each iteration will be printed to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"Each run within an Experiment stores serialized (i.e. pickled) pipelines from the AutoML iterations. We can now retrieve the pipeline with the best performance on the validation dataset:"
|
"Each run within an Experiment stores serialized (i.e. pickled) pipelines from the AutoML iterations. We can now retrieve the pipeline with the best performance on the validation dataset:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_pipeline = local_run.get_output()\n",
|
"best_run, fitted_pipeline = local_run.get_output()\n",
|
||||||
"fitted_pipeline.steps"
|
"fitted_pipeline.steps"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Predict\n",
|
"# Forecasting\n",
|
||||||
|
"\n",
|
||||||
"Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. First, we remove the target values from the test set:"
|
"Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. First, we remove the target values from the test set:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"y_test = X_test.pop(target_column_name).values"
|
"y_test = X_test.pop(target_column_name).values"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X_test.head()"
|
"X_test.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data. \n",
|
"To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"We will first create a query `y_query`, which is aligned index-for-index to `X_test`. This is a vector of target values where each `NaN` serves the function of the question mark to be replaced by forecast. Passing definite values in the `y` argument allows the `forecast` function to make predictions on data that does not immediately follow the train data which contains `y`. In each grain, the last time point where the model sees a definite value of `y` is that grain's _forecast origin_."
|
"We will first create a query `y_query`, which is aligned index-for-index to `X_test`. This is a vector of target values where each `NaN` serves the function of the question mark to be replaced by forecast. Passing definite values in the `y` argument allows the `forecast` function to make predictions on data that does not immediately follow the train data which contains `y`. In each grain, the last time point where the model sees a definite value of `y` is that grain's _forecast origin_."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Replace ALL values in y_query by NaN.\n",
|
"# Replace ALL values in y_query by NaN.\n",
|
||||||
"# The forecast origin will be at the beginning of the first forecast period.\n",
|
"# The forecast origin will be at the beginning of the first forecast period.\n",
|
||||||
@@ -370,19 +389,19 @@
|
|||||||
"# This contains the assumptions that were made in the forecast\n",
|
"# This contains the assumptions that were made in the forecast\n",
|
||||||
"# and helps align the forecast to the original data\n",
|
"# and helps align the forecast to the original data\n",
|
||||||
"y_pred, X_trans = fitted_pipeline.forecast(X_test, y_query)"
|
"y_pred, X_trans = fitted_pipeline.forecast(X_test, y_query)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"If you are used to scikit-learn pipelines, perhaps you expected `predict(X_test)`. However, forecasting requires a more general interface that also supplies the past target `y` values. Please use `forecast(X, y)`, as `predict(X)` is reserved for internal purposes on forecasting models.\n",
|
"If you are used to scikit-learn pipelines, perhaps you expected `predict(X_test)`. However, forecasting requires a more general interface that also supplies the past target `y` values. Please use `forecast(X, y)`, as `predict(X)` is reserved for internal purposes on forecasting models.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The [energy demand forecasting notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) demonstrates the use of the forecast function in more detail in the context of using lags and rolling window features. "
|
"The [energy demand forecasting notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) demonstrates the use of the forecast function in more detail in the context of using lags and rolling window features. "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Evaluate\n",
|
"# Evaluate\n",
|
||||||
@@ -390,13 +409,13 @@
|
|||||||
"To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, including the mean absolute percentage error (MAPE). \n",
|
"To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, including the mean absolute percentage error (MAPE). \n",
|
||||||
"\n",
|
"\n",
|
||||||
"It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
|
"It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name = 'predicted'):\n",
|
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name = 'predicted'):\n",
|
||||||
" \"\"\"\n",
|
" \"\"\"\n",
|
||||||
@@ -431,13 +450,13 @@
|
|||||||
" return(clean)\n",
|
" return(clean)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"df_all = align_outputs(y_pred, X_trans, X_test, y_test)"
|
"df_all = align_outputs(y_pred, X_trans, X_test, y_test)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"def MAPE(actual, pred):\n",
|
"def MAPE(actual, pred):\n",
|
||||||
" \"\"\"\n",
|
" \"\"\"\n",
|
||||||
@@ -450,13 +469,13 @@
|
|||||||
" pred_safe = pred[not_na & not_zero]\n",
|
" pred_safe = pred[not_na & not_zero]\n",
|
||||||
" APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n",
|
" APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n",
|
||||||
" return np.mean(APE)"
|
" return np.mean(APE)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
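The `MAPE` helper in the hunk above is only partly visible, so the following is a minimal standalone sketch of the same idea in plain Python rather than the notebook's code. The masking rules (skip pairs containing `NaN`, skip zero actuals) are inferred from the `not_na` and `not_zero` names and are an assumption, as is the lowercase function name `mape`.

```python
import math


def mape(actual, pred):
    """Mean absolute percentage error over the usable (actual, pred) pairs.

    Assumption: a pair is skipped when either value is NaN or when the
    actual value is zero, since the percentage error is undefined there.
    """
    pairs = [
        (a, p)
        for a, p in zip(actual, pred)
        if not (math.isnan(a) or math.isnan(p)) and a != 0
    ]
    if not pairs:
        # Nothing left to average; report NaN instead of raising.
        return float("nan")
    return sum(100 * abs((a - p) / a) for a, p in pairs) / len(pairs)


# Two usable pairs, each 10% off; the zero and NaN actuals are skipped.
score = mape([100.0, 200.0, 0.0, float("nan")], [110.0, 180.0, 5.0, 1.0])
print(score)
```

Dropping zero actuals keeps the metric finite, but it also means the score says nothing about periods with no sales, which may matter for sparse series.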
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(\"Simple forecasting model\")\n",
|
"print(\"Simple forecasting model\")\n",
|
||||||
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
||||||
@@ -473,49 +492,49 @@
|
|||||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Operationalize"
|
"# Operationalize"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"_Operationalization_ means getting the model into the cloud so that others can run it after you close the notebook. We will create a Docker container with the model, running on Azure Container Instances."
|
"_Operationalization_ means getting the model into the cloud so that others can run it after you close the notebook. We will create a Docker container with the model, running on Azure Container Instances."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"description = 'AutoML OJ forecaster'\n",
|
"description = 'AutoML OJ forecaster'\n",
|
||||||
"tags = None\n",
|
"tags = None\n",
|
||||||
"model = local_run.register_model(description = description, tags = tags)\n",
|
"model = local_run.register_model(description = description, tags = tags)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(local_run.model_id)"
|
"print(local_run.model_id)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Develop the scoring script\n",
|
"### Develop the scoring script\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Serializing and deserializing complex data frames may be tricky. We first develop the `run()` function of the scoring script locally, then write it into a scoring script. It is much easier to debug any quirks of the scoring function without crossing two compute environments. For this exercise, we handle a common quirk of how pandas dataframes serialize time stamp values."
|
"Serializing and deserializing complex data frames may be tricky. We first develop the `run()` function of the scoring script locally, then write it into a scoring script. It is much easier to debug any quirks of the scoring function without crossing two compute environments. For this exercise, we handle a common quirk of how pandas dataframes serialize time stamp values."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# this is where we test the run function of the scoring script interactively\n",
|
"# this is where we test the run function of the scoring script interactively\n",
|
||||||
"# before putting it in the scoring script\n",
|
"# before putting it in the scoring script\n",
|
||||||
@@ -554,13 +573,13 @@
|
|||||||
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
||||||
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
||||||
" })"
|
" })"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# test the run function here before putting in the scoring script\n",
|
"# test the run function here before putting in the scoring script\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
@@ -574,20 +593,20 @@
|
|||||||
"y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
|
"y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
|
||||||
"y_fcst_all['forecast'] = res_dict['forecast']\n",
|
"y_fcst_all['forecast'] = res_dict['forecast']\n",
|
||||||
"y_fcst_all.head()"
|
"y_fcst_all.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
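The timestamp quirk handled above with `pd.to_datetime(..., unit = 'ms')` comes from pandas writing datetime values to JSON as epoch milliseconds by default. The sketch below reproduces the decoding step with the standard library only; the column name `week_start`, the payload values, and the helper `epoch_ms_to_datetime` are hypothetical and not taken from the notebook.

```python
import json
from datetime import datetime, timezone


def epoch_ms_to_datetime(ms):
    """Convert epoch milliseconds, as found in the JSON payload, to a UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Hypothetical payload: one time column serialized as {row_label: epoch_ms}.
payload = json.dumps({"week_start": {"0": 604800000, "1": 1209600000}})

# After json.loads the values are plain integers, not timestamps,
# so they have to be converted back explicitly.
column = json.loads(payload)["week_start"]
restored = {row: epoch_ms_to_datetime(ms) for row, ms in column.items()}

for row, stamp in restored.items():
    print(row, stamp.isoformat())
```

Without the explicit conversion, the time column would come back as integers and any date-based alignment with the original frame would silently fail to match.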
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Now that the function works locally in the notebook, let's write it down into the scoring script. The scoring script is authored by the data scientist. Adjust it to taste, adding inputs, outputs and processing as needed."
|
"Now that the function works locally in the notebook, let's write it down into the scoring script. The scoring script is authored by the data scientist. Adjust it to taste, adding inputs, outputs and processing as needed."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile score_fcast.py\n",
|
"%%writefile score_fcast.py\n",
|
||||||
"import pickle\n",
|
"import pickle\n",
|
||||||
@@ -640,13 +659,13 @@
|
|||||||
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
||||||
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
||||||
" })"
|
" })"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# get the model\n",
|
"# get the model\n",
|
||||||
"from azureml.train.automl.run import AutoMLRun\n",
|
"from azureml.train.automl.run import AutoMLRun\n",
|
||||||
@@ -654,13 +673,13 @@
|
|||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)\n",
|
"ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)\n",
|
||||||
"best_iteration = int(str.split(best_run.id,'_')[-1]) # the iteration number is a postfix of the run ID."
|
"best_iteration = int(str.split(best_run.id,'_')[-1]) # the iteration number is a postfix of the run ID."
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
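The `best_iteration` line above relies on the iteration number being the last underscore-separated piece of the run ID. A small sketch of that parsing step follows; the run ID shown is made up for illustration, and the helper name `iteration_from_run_id` is an assumption.

```python
def iteration_from_run_id(run_id):
    """Return the trailing integer after the last underscore in a run ID."""
    # rsplit with maxsplit=1 only looks at the final underscore, so
    # underscores earlier in the ID do not affect the result.
    return int(run_id.rsplit("_", 1)[-1])


example_id = "AutoML_1234abcd_7"  # hypothetical run ID
print(iteration_from_run_id(example_id))
```

If the suffix is not numeric, `int` raises `ValueError`, which is usually preferable to continuing with a wrong iteration number.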
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# get the best model's dependencies and write them into this file\n",
|
"# get the best model's dependencies and write them into this file\n",
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
@@ -674,13 +693,13 @@
|
|||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv.save_to_file('.', conda_env_file_name)"
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# this is the script file name we wrote a few cells above\n",
|
"# this is the script file name we wrote a few cells above\n",
|
||||||
"script_file_name = 'score_fcast.py'\n",
|
"script_file_name = 'score_fcast.py'\n",
|
||||||
@@ -702,20 +721,20 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"with open(script_file_name, 'w') as cefw:\n",
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
" cefw.write(content.replace('<<modelid>>', local_run.model_id))"
|
" cefw.write(content.replace('<<modelid>>', local_run.model_id))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create a Container Image"
|
"### Create a Container Image"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.image import Image, ContainerImage\n",
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -735,20 +754,20 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"if image.creation_state == 'Failed':\n",
|
"if image.creation_state == 'Failed':\n",
|
||||||
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Deploy the Image as a Web Service on Azure Container Instance"
|
"### Deploy the Image as a Web Service on Azure Container Instance"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -756,13 +775,13 @@
|
|||||||
" memory_gb = 2, \n",
|
" memory_gb = 2, \n",
|
||||||
" tags = {'type': \"automl-forecasting\"},\n",
|
" tags = {'type': \"automl-forecasting\"},\n",
|
||||||
" description = \"Automl forecasting sample service\")"
|
" description = \"Automl forecasting sample service\")"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -775,20 +794,20 @@
|
|||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
"print(aci_service.state)"
|
"print(aci_service.state)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Call the service"
|
"### Call the service"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# we send the data to the service serialized into a json string\n",
|
"# we send the data to the service serialized into a json string\n",
|
||||||
"test_sample = json.dumps({'X':X_test.to_json(), 'y' : y_query.tolist()})\n",
|
"test_sample = json.dumps({'X':X_test.to_json(), 'y' : y_query.tolist()})\n",
|
||||||
@@ -802,59 +821,35 @@
|
|||||||
" y_fcst_all['forecast'] = res_dict['forecast'] \n",
|
" y_fcst_all['forecast'] = res_dict['forecast'] \n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
" print(res_dict)"
|
" print(res_dict)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
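The request built above nests one JSON document inside another (`X_test.to_json()` is already a string) and sends a `y` list that is all `NaN`. The sketch below shows how that round-trips with Python's `json` module; the feature values in `x_records` are hypothetical, and only the `X` and `y` keys come from the notebook.

```python
import json
import math

# Stand-ins for the notebook's objects (values are hypothetical).
y_query = [float("nan")] * 3
x_records = {"price": {"0": 1.99, "1": 2.09, "2": 1.89}}

# Outer document: "X" holds an already-serialized JSON string, "y" a list.
test_sample = json.dumps({"X": json.dumps(x_records), "y": y_query})

# Receiving side: decode the outer document, then decode "X" a second time.
decoded = json.loads(test_sample)
x_back = json.loads(decoded["X"])

print(test_sample)
print(all(math.isnan(v) for v in decoded["y"]))
```

Python writes `NaN` as a bare token, which strict JSON parsers reject. This works when both ends use Python's `json` module, but a client in another language may need `null` instead.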
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"y_fcst_all.head()"
|
"y_fcst_all.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Delete the web service if desired"
|
"### Delete the web service if desired"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"serv = Webservice(ws, 'automl-forecast-01')\n",
|
"serv = Webservice(ws, 'automl-forecast-01')\n",
|
||||||
"# serv.delete() # don't do it accidentally"
|
"# serv.delete() # don't do it accidentally"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "erwright, tosingli"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.7"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,9 @@
|
|||||||
|
name: auto-ml-forecasting-orange-juice-sales
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
|
- statsmodels
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -30,10 +54,10 @@
|
|||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"1. [Test](#Test)\n"
|
"1. [Test](#Test)\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -53,22 +77,22 @@
|
|||||||
"- **Blacklisting** certain pipelines\n",
|
"- **Blacklisting** certain pipelines\n",
|
||||||
"- Specifying **target metrics** to indicate stopping criteria\n",
|
"- Specifying **target metrics** to indicate stopping criteria\n",
|
||||||
"- Handling **missing data** in the input"
|
"- Handling **missing data** in the input"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -81,13 +105,13 @@
|
|||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -108,20 +132,20 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data"
|
"## Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_train = digits.data[10:,:]\n",
|
"X_train = digits.data[10:,:]\n",
|
||||||
@@ -135,21 +159,21 @@
|
|||||||
"rng.shuffle(missing_samples)\n",
|
"rng.shuffle(missing_samples)\n",
|
||||||
"missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)\n",
|
"missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)\n",
|
||||||
"X_train[np.where(missing_samples)[0], missing_features] = np.nan"
|
"X_train[np.where(missing_samples)[0], missing_features] = np.nan"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"df = pd.DataFrame(data = X_train)\n",
|
"df = pd.DataFrame(data = X_train)\n",
|
||||||
"df['Label'] = pd.Series(y_train, index=df.index)\n",
|
"df['Label'] = pd.Series(y_train, index=df.index)\n",
|
||||||
"df.head()"
|
"df.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -168,13 +192,13 @@
|
|||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
@@ -188,43 +212,43 @@
|
|||||||
" X = X_train, \n",
|
" X = X_train, \n",
|
||||||
" y = y_train,\n",
|
" y = y_train,\n",
|
||||||
" path = project_folder)"
|
" path = project_folder)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results"
|
"## Results"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for Monitoring Runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
@@ -232,32 +256,32 @@
|
|||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(local_run).show() "
|
"RunDetails(local_run).show() "
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"children = list(local_run.get_children())\n",
|
"children = list(local_run.get_children())\n",
|
||||||
"metricslist = {}\n",
|
"metricslist = {}\n",
|
||||||
@@ -268,81 +292,81 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
"rundata"
|
"rundata"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()"
|
"best_run, fitted_model = local_run.get_output()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run and the model with the highest `accuracy` value:"
|
"Show the run and the model with the highest `accuracy` value:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# lookup_metric = \"accuracy\"\n",
|
"# lookup_metric = \"accuracy\"\n",
|
||||||
"# best_run, fitted_model = local_run.get_output(metric = lookup_metric)"
|
"# best_run, fitted_model = local_run.get_output(metric = lookup_metric)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a Specific Iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and the model from the third iteration:"
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# iteration = 3\n",
|
"# iteration = 3\n",
|
||||||
"# best_run, fitted_model = local_run.get_output(iteration = iteration)"
|
"# best_run, fitted_model = local_run.get_output(iteration = iteration)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### View the engineered names for featurized data\n",
|
"#### View the engineered names for featurized data\n",
|
||||||
"Below we display the engineered feature names generated for the featurized data using the preprocessing featurization."
|
"Below we display the engineered feature names generated for the featurized data using the preprocessing featurization."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"fitted_model.named_steps['datatransformer'].get_engineered_feature_names()"
|
"fitted_model.named_steps['datatransformer'].get_engineered_feature_names()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### View the featurization summary\n",
|
"#### View the featurization summary\n",
|
||||||
@@ -352,29 +376,29 @@
|
|||||||
"- Type detected\n",
|
"- Type detected\n",
|
||||||
"- If feature was dropped\n",
|
"- If feature was dropped\n",
|
||||||
"- List of feature transformations for the raw feature"
|
"- List of feature transformations for the raw feature"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"fitted_model.named_steps['datatransformer'].get_featurization_summary()"
|
"fitted_model.named_steps['datatransformer'].get_featurization_summary()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test"
|
"## Test"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_test = digits.data[:10, :]\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
@@ -392,33 +416,9 @@
|
|||||||
" ax1.set_title(title)\n",
|
" ax1.set_title(title)\n",
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" plt.show()\n"
|
" plt.show()\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
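Review note (not part of the diff): the "Retrieve All Child Runs" and "Best Model Based on Any Other Metric" cells in the notebook above collect one metrics dictionary per iteration and then pick the iteration that is best for a chosen metric. The stand-alone sketch below illustrates that selection with made-up numbers. The `pick_best_iteration` helper, the `HIGHER_IS_BETTER` table, and the sample metric values are hypothetical and are not part of the Azure ML SDK; they only mimic the kind of result a call such as `local_run.get_output(metric = lookup_metric)` is described as returning.

```python
# Hypothetical per-iteration metrics, shaped like the `metricslist` dictionary
# in the notebook (iteration number -> {metric name: value}).
metricslist = {
    0: {"accuracy": 0.91, "log_loss": 0.35},
    1: {"accuracy": 0.95, "log_loss": 0.22},
    2: {"accuracy": 0.93, "log_loss": 0.18},
}

# Assumption for this sketch: the direction that counts as "better" per metric.
HIGHER_IS_BETTER = {"accuracy": True, "log_loss": False}


def pick_best_iteration(metrics_by_iteration, metric):
    """Return the iteration number whose value for `metric` is best."""
    candidates = {
        iteration: values[metric]
        for iteration, values in metrics_by_iteration.items()
        if metric in values
    }
    if not candidates:
        raise ValueError(f"no iteration logged the metric {metric!r}")
    # Use max for metrics where larger is better, min otherwise.
    choose = max if HIGHER_IS_BETTER[metric] else min
    return choose(candidates, key=candidates.get)


best_by_accuracy = pick_best_iteration(metricslist, "accuracy")
best_by_log_loss = pick_best_iteration(metricslist, "log_loss")
print(best_by_accuracy, best_by_log_loss)
```

The point of the sketch is that "best" depends on the metric's direction: for `accuracy` the largest value wins, while for a loss-style metric the smallest value wins.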
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-missing-data-blacklist-early-termination
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "xif"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -29,10 +53,10 @@
|
|||||||
"1. [Data](#Data)\n",
|
"1. [Data](#Data)\n",
|
||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Results](#Results)"
|
"1. [Results](#Results)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -46,22 +70,22 @@
|
|||||||
"3. Training the model using local compute and explaining the model\n",
|
"3. Training the model using local compute and explaining the model\n",
|
||||||
"4. Visualizing the model's feature importance in a widget\n",
|
"4. Visualizing the model's feature importance in a widget\n",
|
||||||
"5. Exploring the best model's explanation"
|
"5. Exploring the best model's explanation"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -70,13 +94,13 @@
|
|||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -98,20 +122,20 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data"
|
"## Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from sklearn import datasets\n",
|
"from sklearn import datasets\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -130,10 +154,10 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"X_train = pd.DataFrame(X_train, columns=features)\n",
|
"X_train = pd.DataFrame(X_train, columns=features)\n",
|
||||||
"X_test = pd.DataFrame(X_test, columns=features)"
|
"X_test = pd.DataFrame(X_test, columns=features)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -152,13 +176,13 @@
|
|||||||
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
"|**model_explainability**|Indicate to explain each trained pipeline or not |\n",
|
"|**model_explainability**|Indicate to explain each trained pipeline or not |\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
@@ -172,43 +196,43 @@
|
|||||||
" y_valid = y_test,\n",
|
" y_valid = y_test,\n",
|
||||||
" model_explainability=True,\n",
|
" model_explainability=True,\n",
|
||||||
" path=project_folder)"
|
" path=project_folder)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can call the submit method on the experiment object and pass the run configuration. For local runs the execution is synchronous. Depending on the data and number of iterations this can run for a while.\n",
|
"You can call the submit method on the experiment object and pass the run configuration. For local runs the execution is synchronous. Depending on the data and number of iterations this can run for a while.\n",
|
||||||
"You will see the currently running iterations printing to the console."
|
"You will see the currently running iterations printing to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results"
|
"## Results"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Widget for monitoring runs\n",
|
"### Widget for monitoring runs\n",
|
||||||
@@ -216,40 +240,40 @@
|
|||||||
"The widget will sit on \"loading\" until the first iteration completes, then you will see an auto-updating graph and table show up. It refreshes once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will sit on \"loading\" until the first iteration completes, then you will see an auto-updating graph and table show up. It refreshes once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(local_run).show() "
|
"RunDetails(local_run).show() "
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The *get_output* method on `local_run` returns the best run and the fitted model. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The *get_output* method on `local_run` returns the best run and the fitted model. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Best Model's explanation\n",
|
"### Best Model's explanation\n",
|
||||||
@@ -264,94 +288,70 @@
|
|||||||
"6.\tper_class_imp: The feature names sorted in the same order as in per_class_summary. Only available for the classification case\n",
|
"6.\tper_class_imp: The feature names sorted in the same order as in per_class_summary. Only available for the classification case\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Note: The **retrieve_model_explanation()** API only works if AutoML has been configured with the **'model_explainability'** flag set to **True**."
|
"Note: The **retrieve_model_explanation()** API only works if AutoML has been configured with the **'model_explainability'** flag set to **True**."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.train.automl.automlexplainer import retrieve_model_explanation\n",
|
"from azureml.train.automl.automlexplainer import retrieve_model_explanation\n",
|
||||||
"\n",
|
"\n",
|
||||||
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
|
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
|
||||||
" retrieve_model_explanation(best_run)"
|
" retrieve_model_explanation(best_run)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(overall_summary)\n",
|
"print(overall_summary)\n",
|
||||||
"print(overall_imp)"
|
"print(overall_imp)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(per_class_summary)\n",
|
"print(per_class_summary)\n",
|
||||||
"print(per_class_imp)"
|
"print(per_class_imp)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Besides retrieving the existing model explanation, you can also explain the model with different train/test data."
|
"Besides retrieving the existing model explanation, you can also explain the model with different train/test data."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.train.automl.automlexplainer import explain_model\n",
|
"from azureml.train.automl.automlexplainer import explain_model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
|
"shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
|
||||||
" explain_model(fitted_model, X_train, X_test, features=features)"
|
" explain_model(fitted_model, X_train, X_test, features=features)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(overall_summary)\n",
|
"print(overall_summary)\n",
|
||||||
"print(overall_imp)"
|
"print(overall_imp)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "xif"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
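Review note (not part of the diff): the explanation cells in the notebook above print `overall_summary` and `overall_imp` as two separate lists. Assuming, by analogy with the notebook's description of `per_class_imp`, that `overall_imp` holds the feature names in the same order as the importance values in `overall_summary`, the two lists can be paired for easier reading. The values, the feature names, and the `top_features` helper below are hypothetical and only illustrate the pairing.

```python
# Hypothetical stand-ins for the lists printed in the notebook:
# importance values sorted in descending order, and the feature names
# assumed to be in the same order.
overall_summary = [0.42, 0.31, 0.08]
overall_imp = ["feature_a", "feature_b", "feature_c"]


def top_features(names, scores, k):
    """Pair each feature name with its score and keep the first k pairs."""
    if len(names) != len(scores):
        raise ValueError("names and scores must have the same length")
    return list(zip(names, scores))[:k]


ranked = top_features(overall_imp, overall_summary, 2)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

If the two lists were not aligned in a real run, the pairing would be meaningless, so it is worth checking the returned order against the SDK documentation before relying on it.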
@@ -0,0 +1,9 @@
|
|||||||
|
name: auto-ml-model-explanation
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
|
- azureml-explain-model
|
||||||
@@ -0,0 +1,800 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "v-rasav"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.7.1"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Automated Machine Learning\n",
|
||||||
|
"_**Regression with Deployment using the Concrete Compressive Strength Dataset**_\n",
|
||||||
|
"\n",
|
||||||
|
"## Contents\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
"1. [Data](#Data)\n",
|
||||||
|
"1. [Train](#Train)\n",
|
||||||
|
"1. [Results](#Results)\n",
|
||||||
|
"1. [Test](#Test)\n",
|
||||||
|
"1. [Acknowledgements](#Acknowledgements)"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Introduction\n",
|
||||||
|
"In this example we use the Predicting Compressive Strength of Concrete Dataset to showcase how you can use AutoML for a regression problem. The regression goal is to predict the compressive strength of concrete based on different ingredient combinations and the quantities of those ingredients.\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, if you haven't already, go through the [configuration](../../../configuration.ipynb) notebook first to establish your connection to the AzureML Workspace.\n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook you will learn how to:\n",
|
||||||
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
|
"3. Train the model using local compute.\n",
|
||||||
|
"4. Explore the results.\n",
|
||||||
|
"5. Test the best fitted model."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"import logging\n",
|
||||||
|
"\n",
|
||||||
|
"from matplotlib import pyplot as plt\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import os\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"import azureml.dataprep as dprep\n",
|
||||||
|
" \n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
|
"experiment_name = 'automl-regression-concrete'\n",
|
||||||
|
"project_folder = './sample_projects/automl-regression-concrete'\n",
|
||||||
|
"\n",
|
||||||
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
|
"\n",
|
||||||
|
"output = {}\n",
|
||||||
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
|
"output['Workspace Name'] = ws.name\n",
|
||||||
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
|
"output['Location'] = ws.location\n",
|
||||||
|
"output['Project Directory'] = project_folder\n",
|
||||||
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
|
"outputDf.T"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create or Attach existing AmlCompute\n",
|
||||||
|
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||||
|
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
||||||
|
"If an AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
|
||||||
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for your cluster.\n",
|
||||||
|
"amlcompute_cluster_name = \"automlcl\"\n",
|
||||||
|
"\n",
|
||||||
|
"found = False\n",
|
||||||
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||||
|
" found = True\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||||
|
" \n",
|
||||||
|
"if not found:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
|
" max_nodes = 6)\n",
|
||||||
|
"\n",
|
||||||
|
" # Create the cluster.\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
|
" \n",
|
||||||
|
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
|
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
|
" \n",
|
||||||
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Data\n",
|
||||||
|
"\n",
|
||||||
|
"Prepare to load the data for use on Azure compute. To do this, first load the necessary libraries and dependencies, set up paths for the data, and create the `conda_run_config`."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"if not os.path.isdir('data'):\n",
|
||||||
|
" os.mkdir('data')\n",
|
||||||
|
" \n",
|
||||||
|
"if not os.path.exists(project_folder):\n",
|
||||||
|
" os.makedirs(project_folder)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to AmlCompute\n",
|
||||||
|
"conda_run_config.target = compute_target\n",
|
||||||
|
"conda_run_config.environment.docker.enabled = True\n",
|
||||||
|
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
|
||||||
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Load Data\n",
|
||||||
|
"\n",
|
||||||
|
"Load the concrete strength dataset into the X and y variables. Next, split the data using `random_split` and keep X_train and y_train for training the model; X_test and y_test are held out for the test step."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/compresive_strength_concrete.csv\"\n",
|
||||||
|
"dflow = dprep.auto_read_file(data)\n",
|
||||||
|
"dflow.get_profile()\n",
|
||||||
|
"X = dflow.drop_columns(columns=['CONCRETE'])\n",
|
||||||
|
"y = dflow.keep_columns(columns=['CONCRETE'], validate_column_exists=True)\n",
|
||||||
|
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
|
||||||
|
"y_train, y_test = y.random_split(percentage=0.8, seed=223) \n",
|
||||||
|
"dflow.head()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train\n",
|
||||||
|
"\n",
|
||||||
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
||||||
|
"\n",
|
||||||
|
"|Property|Description|\n",
|
||||||
|
"|-|-|\n",
|
||||||
|
"|**task**|classification or regression|\n",
|
||||||
|
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
|
"|**y**|(sparse) array-like, shape = [n_samples, ], target values.|\n",
|
||||||
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||||
|
"\n",
|
||||||
|
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### If you would like to see even better results, increase \"iteration_timeout_minutes\" to 10 or more and increase \"iterations\" to at least 30."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
" \"iteration_timeout_minutes\": 5,\n",
|
||||||
|
" \"iterations\": 10,\n",
|
||||||
|
" \"n_cross_validations\": 5,\n",
|
||||||
|
" \"primary_metric\": 'spearman_correlation',\n",
|
||||||
|
" \"preprocess\": True,\n",
|
||||||
|
" \"max_concurrent_iterations\": 5,\n",
|
||||||
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(task = 'regression',\n",
|
||||||
|
" debug_log = 'automl.log',\n",
|
||||||
|
" path = project_folder,\n",
|
||||||
|
" run_configuration=conda_run_config,\n",
|
||||||
|
" X = X_train,\n",
|
||||||
|
" y = y_train,\n",
|
||||||
|
" **automl_settings\n",
|
||||||
|
" )"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"remote_run = experiment.submit(automl_config, show_output = True)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"remote_run"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Results\n",
|
||||||
|
"Widget for Monitoring Runs\n",
|
||||||
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
|
"Note: The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"RunDetails(remote_run).show() "
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"\n",
|
||||||
|
"Retrieve All Child Runs\n",
|
||||||
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"children = list(remote_run.get_children())\n",
|
||||||
|
"metricslist = {}\n",
|
||||||
|
"for run in children:\n",
|
||||||
|
" properties = run.get_properties()\n",
|
||||||
|
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||||
|
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||||
|
"\n",
|
||||||
|
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
|
||||||
|
"rundata"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Retrieve the Best Model\n",
|
||||||
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"best_run, fitted_model = remote_run.get_output()\n",
|
||||||
|
"print(best_run)\n",
|
||||||
|
"print(fitted_model)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Best Model Based on Any Other Metric\n",
|
||||||
|
"Show the run and the model that has the smallest root_mean_squared_error value (which turned out to be the same as the one with largest spearman_correlation value):"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"lookup_metric = \"root_mean_squared_error\"\n",
|
||||||
|
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
||||||
|
"print(best_run)\n",
|
||||||
|
"print(fitted_model)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"iteration = 3\n",
|
||||||
|
"third_run, third_model = remote_run.get_output(iteration = iteration)\n",
|
||||||
|
"print(third_run)\n",
|
||||||
|
"print(third_model)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register the Fitted Model for Deployment\n",
|
||||||
|
"If neither metric nor iteration is specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"description = 'AutoML Model'\n",
|
||||||
|
"tags = None\n",
|
||||||
|
"model = remote_run.register_model(description = description, tags = tags)\n",
|
||||||
|
"\n",
|
||||||
|
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create Scoring Script\n",
|
||||||
|
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"%%writefile score.py\n",
|
||||||
|
"import pickle\n",
|
||||||
|
"import json\n",
|
||||||
|
"import numpy\n",
|
||||||
|
"import azureml.train.automl\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"def init():\n",
|
||||||
|
" global model\n",
|
||||||
|
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
||||||
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
|
" model = joblib.load(model_path)\n",
|
||||||
|
"\n",
|
||||||
|
"def run(rawdata):\n",
|
||||||
|
" try:\n",
|
||||||
|
" data = json.loads(rawdata)['data']\n",
|
||||||
|
" data = numpy.array(data)\n",
|
||||||
|
" result = model.predict(data)\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" result = str(e)\n",
|
||||||
|
" return json.dumps({\"error\": result})\n",
|
||||||
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a YAML File for the Environment"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||||
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Substitute the actual version number in the environment file.\n",
|
||||||
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||||
|
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
|
||||||
|
"\n",
|
||||||
|
"# Substitute the actual model id in the script file.\n",
|
||||||
|
"\n",
|
||||||
|
"script_file_name = 'score.py'\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a Container Image\n",
|
||||||
|
"\n",
|
||||||
|
"Next, use Azure Container Instances to deploy the model as a web service. This option is suited to quickly deploying and validating your model\n",
|
||||||
|
"or to testing a model that is under development."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
|
"\n",
|
||||||
|
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||||
|
" execution_script = script_file_name,\n",
|
||||||
|
" conda_file = conda_env_file_name,\n",
|
||||||
|
" tags = {'area': \"digits\", 'type': \"automl_regression\"},\n",
|
||||||
|
" description = \"Image for automl regression sample\")\n",
|
||||||
|
"\n",
|
||||||
|
"image = Image.create(name = \"automlsampleimage\",\n",
|
||||||
|
" # this is the model object \n",
|
||||||
|
" models = [model],\n",
|
||||||
|
" image_config = image_config, \n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"\n",
|
||||||
|
"image.wait_for_creation(show_output = True)\n",
|
||||||
|
"\n",
|
||||||
|
"if image.creation_state == 'Failed':\n",
|
||||||
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy the Image as a Web Service on Azure Container Instance\n",
|
||||||
|
"\n",
|
||||||
|
"Deploy an image that contains the model and other assets needed by the service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
|
" memory_gb = 1, \n",
|
||||||
|
" tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
|
||||||
|
" description = 'sample service for Automl Regression')"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"aci_service_name = 'automl-sample-concrete'\n",
|
||||||
|
"print(aci_service_name)\n",
|
||||||
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
|
" image = image,\n",
|
||||||
|
" name = aci_service_name,\n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
|
"print(aci_service.state)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Delete a Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Deletes the specified web service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#aci_service.delete()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Get Logs from a Deployed Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Gets logs from a deployed web service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#aci_service.get_logs()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Test\n",
|
||||||
|
"\n",
|
||||||
|
"Now that the model is trained, materialize the same train/test split that was used for training (the difference here is that the data is loaded locally), and then run the test data through the trained model to get the predicted values."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"X_test = X_test.to_pandas_dataframe()\n",
|
||||||
|
"y_test = y_test.to_pandas_dataframe()\n",
|
||||||
|
"y_test = np.array(y_test)\n",
|
||||||
|
"y_test = y_test[:,0]\n",
|
||||||
|
"X_train = X_train.to_pandas_dataframe()\n",
|
||||||
|
"y_train = y_train.to_pandas_dataframe()\n",
|
||||||
|
"y_train = np.array(y_train)\n",
|
||||||
|
"y_train = y_train[:,0]"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### Predict on training and test set, and calculate residual values."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"y_pred_train = fitted_model.predict(X_train)\n",
|
||||||
|
"y_residual_train = y_train - y_pred_train\n",
|
||||||
|
"\n",
|
||||||
|
"y_pred_test = fitted_model.predict(X_test)\n",
|
||||||
|
"y_residual_test = y_test - y_pred_test\n",
|
||||||
|
"\n",
|
||||||
|
"y_residual_train.shape"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"%matplotlib inline\n",
|
||||||
|
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
||||||
|
"\n",
|
||||||
|
"# Set up a multi-plot chart.\n",
|
||||||
|
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
||||||
|
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
||||||
|
"f.set_figheight(6)\n",
|
||||||
|
"f.set_figwidth(16)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot residual values of training set.\n",
|
||||||
|
"a0.axis([0, 360, -200, 200])\n",
|
||||||
|
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
||||||
|
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
|
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
||||||
|
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
|
||||||
|
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
||||||
|
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot a histogram.\n",
|
||||||
|
"#a0.hist(y_residual_train, orientation = 'horizontal', color = ['b']*len(y_residual_train), bins = 10, histtype = 'step')\n",
|
||||||
|
"#a0.hist(y_residual_train, orientation = 'horizontal', color = ['b']*len(y_residual_train), alpha = 0.2, bins = 10)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot residual values of test set.\n",
|
||||||
|
"a1.axis([0, 90, -200, 200])\n",
|
||||||
|
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
||||||
|
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
|
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
||||||
|
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
|
||||||
|
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
||||||
|
"a1.set_yticklabels([])\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot a histogram.\n",
|
||||||
|
"#a1.hist(y_residual_test, orientation = 'horizontal', color = ['b']*len(y_residual_test), bins = 10, histtype = 'step')\n",
|
||||||
|
"#a1.hist(y_residual_test, orientation = 'horizontal', color = ['b']*len(y_residual_test), alpha = 0.2, bins = 10)\n",
|
||||||
|
"\n",
|
||||||
|
"plt.show()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Calculate metrics for the prediction\n",
|
||||||
|
"\n",
|
||||||
|
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
||||||
|
"from the trained model that was returned."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Plot outputs\n",
|
||||||
|
"%matplotlib notebook\n",
|
||||||
|
"test_pred = plt.scatter(y_test, y_pred_test, color='b')\n",
|
||||||
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
|
"plt.show()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Acknowledgements\n",
|
||||||
|
"\n",
|
||||||
|
"This Predicting Compressive Strength of Concrete Dataset is made available under the CC0 1.0 Universal (CC0 1.0)\n",
|
||||||
|
"Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the CC0 1.0 Universal (CC0 1.0)\n",
|
||||||
|
"Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/ . The dataset itself can be found here: https://www.kaggle.com/pavanraj159/concrete-compressive-strength-data-set and http://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength\n",
|
||||||
|
"\n",
|
||||||
|
"I-Cheng Yeh, \"Modeling of strength of high performance concrete using artificial neural networks,\" Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998). \n",
|
||||||
|
"\n",
|
||||||
|
"Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-regression-concrete-strength
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -0,0 +1,800 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "v-rasav"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.7.1"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Automated Machine Learning\n",
|
||||||
|
"_**Regression with Deployment using Hardware Performance Dataset**_\n",
|
||||||
|
"\n",
|
||||||
|
"## Contents\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
"1. [Data](#Data)\n",
|
||||||
|
"1. [Train](#Train)\n",
|
||||||
|
"1. [Results](#Results)\n",
|
||||||
|
"1. [Test](#Test)\n",
|
||||||
|
"1. [Acknowledgements](#Acknowledgements)"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Introduction\n",
|
||||||
|
"In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The regression goal is to predict the performance of certain combinations of hardware parts.\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, if you haven't already, go through the [configuration](../../../configuration.ipynb) notebook first to establish your connection to the AzureML Workspace.\n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook you will learn how to:\n",
|
||||||
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
|
"3. Train the model using remote compute (AmlCompute).\n",
|
||||||
|
"4. Explore the results.\n",
|
||||||
|
"5. Test the best fitted model."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"import logging\n",
|
||||||
|
"\n",
|
||||||
|
"from matplotlib import pyplot as plt\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import os\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"import azureml.dataprep as dprep\n",
|
||||||
|
" \n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
|
"experiment_name = 'automl-regression-hardware'\n",
|
||||||
|
"project_folder = './sample_projects/automl-remote-regression'\n",
|
||||||
|
"\n",
|
||||||
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
|
"\n",
|
||||||
|
"output = {}\n",
|
||||||
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
|
"output['Workspace Name'] = ws.name\n",
|
||||||
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
|
"output['Location'] = ws.location\n",
|
||||||
|
"output['Project Directory'] = project_folder\n",
|
||||||
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
|
"outputDf.T"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create or Attach existing AmlCompute\n",
|
||||||
|
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||||
|
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
||||||
|
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||||
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for your cluster.\n",
|
||||||
|
"amlcompute_cluster_name = \"automlcl\"\n",
|
||||||
|
"\n",
|
||||||
|
"found = False\n",
|
||||||
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||||
|
" found = True\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||||
|
" \n",
|
||||||
|
"if not found:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
|
" max_nodes = 6)\n",
|
||||||
|
"\n",
|
||||||
|
" # Create the cluster.\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
|
" \n",
|
||||||
|
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
|
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
|
" \n",
|
||||||
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here load the data in the get_data script to be utilized in azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"if not os.path.isdir('data'):\n",
|
||||||
|
" os.mkdir('data')\n",
|
||||||
|
" \n",
|
||||||
|
"if not os.path.exists(project_folder):\n",
|
||||||
|
" os.makedirs(project_folder)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to AmlCompute\n",
|
||||||
|
"conda_run_config.target = compute_target\n",
|
||||||
|
"conda_run_config.environment.docker.enabled = True\n",
|
||||||
|
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
|
||||||
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Load Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here create the script to be run in azure compute for loading the data, load the hardware dataset into the X and y variables. Next split the data using train_test_split and return X_train and y_train for training the model."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n",
|
||||||
|
"dflow = dprep.auto_read_file(data)\n",
|
||||||
|
"dflow.get_profile()\n",
|
||||||
|
"X = dflow.drop_columns(columns=['ERP'])\n",
|
||||||
|
"y = dflow.keep_columns(columns=['ERP'], validate_column_exists=True)\n",
|
||||||
|
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
|
||||||
|
"y_train, y_test = y.random_split(percentage=0.8, seed=223) \n",
|
||||||
|
"dflow.head()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"\n",
|
||||||
|
"## Train\n",
|
||||||
|
"\n",
|
||||||
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
||||||
|
"\n",
|
||||||
|
"|Property|Description|\n",
|
||||||
|
"|-|-|\n",
|
||||||
|
"|**task**|classification or regression|\n",
|
||||||
|
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
|
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
|
||||||
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||||
|
"\n",
|
||||||
|
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### If you would like to see even better results increase \"iteration_time_out minutes\" to 10+ mins and increase \"iterations\" to a minimum of 30"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
" \"iteration_timeout_minutes\": 5,\n",
|
||||||
|
" \"iterations\": 10,\n",
|
||||||
|
" \"n_cross_validations\": 5,\n",
|
||||||
|
" \"primary_metric\": 'spearman_correlation',\n",
|
||||||
|
" \"preprocess\": True,\n",
|
||||||
|
" \"max_concurrent_iterations\": 5,\n",
|
||||||
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(task = 'regression',\n",
|
||||||
|
" debug_log = 'automl_errors_20190417.log',\n",
|
||||||
|
" path = project_folder,\n",
|
||||||
|
" run_configuration=conda_run_config,\n",
|
||||||
|
" X = X_train,\n",
|
||||||
|
" y = y_train,\n",
|
||||||
|
" **automl_settings\n",
|
||||||
|
" )"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"remote_run = experiment.submit(automl_config, show_output = False)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"remote_run"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Results"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Widget for Monitoring Runs\n",
|
||||||
|
"\n",
|
||||||
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
|
"\n",
|
||||||
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"RunDetails(remote_run).show() "
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Wait until the run finishes.\n",
|
||||||
|
"remote_run.wait_for_completion(show_output = True)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Retrieve All Child Runs\n",
|
||||||
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"children = list(remote_run.get_children())\n",
|
||||||
|
"metricslist = {}\n",
|
||||||
|
"for run in children:\n",
|
||||||
|
" properties = run.get_properties()\n",
|
||||||
|
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||||
|
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||||
|
"\n",
|
||||||
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
|
"rundata"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Retrieve the Best Model\n",
|
||||||
|
"Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"best_run, fitted_model = remote_run.get_output()\n",
|
||||||
|
"print(best_run)\n",
|
||||||
|
"print(fitted_model)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
|
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"lookup_metric = \"root_mean_squared_error\"\n",
|
||||||
|
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
||||||
|
"print(best_run)\n",
|
||||||
|
"print(fitted_model)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"iteration = 3\n",
|
||||||
|
"third_run, third_model = remote_run.get_output(iteration = iteration)\n",
|
||||||
|
"print(third_run)\n",
|
||||||
|
"print(third_model)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register the Fitted Model for Deployment\n",
|
||||||
|
"If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"description = 'AutoML Model'\n",
|
||||||
|
"tags = None\n",
|
||||||
|
"model = remote_run.register_model(description = description, tags = tags)\n",
|
||||||
|
"\n",
|
||||||
|
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create Scoring Script\n",
|
||||||
|
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"%%writefile score.py\n",
|
||||||
|
"import pickle\n",
|
||||||
|
"import json\n",
|
||||||
|
"import numpy\n",
|
||||||
|
"import azureml.train.automl\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"def init():\n",
|
||||||
|
" global model\n",
|
||||||
|
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
||||||
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
|
" model = joblib.load(model_path)\n",
|
||||||
|
"\n",
|
||||||
|
"def run(rawdata):\n",
|
||||||
|
" try:\n",
|
||||||
|
" data = json.loads(rawdata)['data']\n",
|
||||||
|
" data = numpy.array(data)\n",
|
||||||
|
" result = model.predict(data)\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" result = str(e)\n",
|
||||||
|
" return json.dumps({\"error\": result})\n",
|
||||||
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a YAML File for the Environment"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||||
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"# Substitute the actual version number in the environment file.\n",
|
||||||
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||||
|
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
|
||||||
|
"\n",
|
||||||
|
"# Substitute the actual model id in the script file.\n",
|
||||||
|
"\n",
|
||||||
|
"script_file_name = 'score.py'\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a Container Image\n",
|
||||||
|
"\n",
|
||||||
|
"Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
|
||||||
|
"or when testing a model that is under development."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
|
"\n",
|
||||||
|
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||||
|
" execution_script = script_file_name,\n",
|
||||||
|
" conda_file = conda_env_file_name,\n",
|
||||||
|
" tags = {'area': \"digits\", 'type': \"automl_regression\"},\n",
|
||||||
|
" description = \"Image for automl regression sample\")\n",
|
||||||
|
"\n",
|
||||||
|
"image = Image.create(name = \"automlsampleimage\",\n",
|
||||||
|
" # this is the model object \n",
|
||||||
|
" models = [model],\n",
|
||||||
|
" image_config = image_config, \n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"\n",
|
||||||
|
"image.wait_for_creation(show_output = True)\n",
|
||||||
|
"\n",
|
||||||
|
"if image.creation_state == 'Failed':\n",
|
||||||
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy the Image as a Web Service on Azure Container Instance\n",
|
||||||
|
"\n",
|
||||||
|
"Deploy an image that contains the model and other assets needed by the service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
|
" memory_gb = 1, \n",
|
||||||
|
" tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
|
||||||
|
" description = 'sample service for Automl Regression')"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"aci_service_name = 'automl-sample-hardware'\n",
|
||||||
|
"print(aci_service_name)\n",
|
||||||
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
|
" image = image,\n",
|
||||||
|
" name = aci_service_name,\n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
|
"print(aci_service.state)"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Delete a Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Deletes the specified web service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#aci_service.delete()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Get Logs from a Deployed Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Gets logs from a deployed web service."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"#aci_service.get_logs()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Test\n",
|
||||||
|
"\n",
|
||||||
|
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"X_test = X_test.to_pandas_dataframe()\n",
|
||||||
|
"y_test = y_test.to_pandas_dataframe()\n",
|
||||||
|
"y_test = np.array(y_test)\n",
|
||||||
|
"y_test = y_test[:,0]\n",
|
||||||
|
"X_train = X_train.to_pandas_dataframe()\n",
|
||||||
|
"y_train = y_train.to_pandas_dataframe()\n",
|
||||||
|
"y_train = np.array(y_train)\n",
|
||||||
|
"y_train = y_train[:,0]"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### Predict on training and test set, and calculate residual values."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"y_pred_train = fitted_model.predict(X_train)\n",
|
||||||
|
"y_residual_train = y_train - y_pred_train\n",
|
||||||
|
"\n",
|
||||||
|
"y_pred_test = fitted_model.predict(X_test)\n",
|
||||||
|
"y_residual_test = y_test - y_pred_test"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Calculate metrics for the prediction\n",
|
||||||
|
"\n",
|
||||||
|
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
||||||
|
"from the trained model that was returned."
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"%matplotlib inline\n",
|
||||||
|
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
||||||
|
"\n",
|
||||||
|
"# Set up a multi-plot chart.\n",
|
||||||
|
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
||||||
|
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
||||||
|
"f.set_figheight(6)\n",
|
||||||
|
"f.set_figwidth(16)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot residual values of training set.\n",
|
||||||
|
"a0.axis([0, 360, -200, 200])\n",
|
||||||
|
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
||||||
|
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
|
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
||||||
|
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)),fontsize = 12)\n",
|
||||||
|
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
||||||
|
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot residual values of test set.\n",
|
||||||
|
"a1.axis([0, 90, -200, 200])\n",
|
||||||
|
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
||||||
|
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
|
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
||||||
|
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)),fontsize = 12)\n",
|
||||||
|
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
||||||
|
"a1.set_yticklabels([])\n",
|
||||||
|
"\n",
|
||||||
|
"plt.show()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"%matplotlib notebook\n",
|
||||||
|
"test_pred = plt.scatter(y_test, y_pred_test, color='')\n",
|
||||||
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
|
"plt.show()"
|
||||||
|
],
|
||||||
|
"cell_type": "code"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Acknowledgements\n",
|
||||||
|
"This Predicting Hardware Performance Dataset is made available under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/ . The dataset itself can be found here: https://www.kaggle.com/faizunnabi/comp-hardware-performance and https://archive.ics.uci.edu/ml/datasets/Computer+Hardware\n",
|
||||||
|
"\n",
|
||||||
|
"_**Citation Found Here**_\n"
|
||||||
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-regression-hardware-performance
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -30,10 +54,10 @@
|
|||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"1. [Test](#Test)\n"
|
"1. [Test](#Test)\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -47,22 +71,22 @@
|
|||||||
"3. Train the model using local compute.\n",
|
"3. Train the model using local compute.\n",
|
||||||
"4. Explore the results.\n",
|
"4. Explore the results.\n",
|
||||||
"5. Test the best fitted model."
|
"5. Test the best fitted model."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -74,13 +98,13 @@
|
|||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -101,21 +125,21 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data\n",
|
"## Data\n",
|
||||||
"This uses scikit-learn's [load_diabetes](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) method."
|
"This uses scikit-learn's [load_diabetes](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) method."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
|
"# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
|
||||||
"from sklearn.datasets import load_diabetes\n",
|
"from sklearn.datasets import load_diabetes\n",
|
||||||
@@ -126,10 +150,10 @@
|
|||||||
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
|
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
|
||||||
"\n",
|
"\n",
|
||||||
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"
|
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -146,13 +170,13 @@
|
|||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], target values.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], target values.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'regression',\n",
|
"automl_config = AutoMLConfig(task = 'regression',\n",
|
||||||
" iteration_timeout_minutes = 10,\n",
|
" iteration_timeout_minutes = 10,\n",
|
||||||
@@ -164,43 +188,43 @@
|
|||||||
" X = X_train, \n",
|
" X = X_train, \n",
|
||||||
" y = y_train,\n",
|
" y = y_train,\n",
|
||||||
" path = project_folder)"
|
" path = project_folder)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results"
|
"## Results"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for Monitoring Runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
@@ -208,32 +232,32 @@
|
|||||||
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(local_run).show() "
|
"RunDetails(local_run).show() "
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"children = list(local_run.get_children())\n",
|
"children = list(local_run.get_children())\n",
|
||||||
"metricslist = {}\n",
|
"metricslist = {}\n",
|
||||||
@@ -244,100 +268,100 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
"rundata"
|
"rundata"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with the largest `spearman_correlation` value):"
|
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with the largest `spearman_correlation` value):"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"lookup_metric = \"root_mean_squared_error\"\n",
|
"lookup_metric = \"root_mean_squared_error\"\n",
|
||||||
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a Specific Iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and the model from the third iteration:"
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 3\n",
|
"iteration = 3\n",
|
||||||
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
||||||
"print(third_run)\n",
|
"print(third_run)\n",
|
||||||
"print(third_model)"
|
"print(third_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test"
|
"## Test"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Predict on the training and test sets, and calculate the residual values."
|
"Predict on the training and test sets, and calculate the residual values."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"y_pred_train = fitted_model.predict(X_train)\n",
|
"y_pred_train = fitted_model.predict(X_train)\n",
|
||||||
"y_residual_train = y_train - y_pred_train\n",
|
"y_residual_train = y_train - y_pred_train\n",
|
||||||
"\n",
|
"\n",
|
||||||
"y_pred_test = fitted_model.predict(X_test)\n",
|
"y_pred_test = fitted_model.predict(X_test)\n",
|
||||||
"y_residual_test = y_test - y_pred_test"
|
"y_residual_test = y_test - y_pred_test"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"%matplotlib inline\n",
|
"%matplotlib inline\n",
|
||||||
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
||||||
@@ -375,33 +399,9 @@
|
|||||||
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
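Review note: many of the notebook hunks above move keys such as `cell_type`, `execution_count`, and the top-level `metadata` block without visibly changing their values. The snippet below is a minimal sketch of how a reviewer might confirm that two versions of a cell differ only in key order. The two cell strings and the `canonical` helper are hypothetical illustrations modelled on the hunks above; they are not part of this change set.

```python
import json


def canonical(notebook_json: str) -> str:
    """Re-serialize JSON with sorted keys so that key order no longer matters."""
    return json.dumps(json.loads(notebook_json), sort_keys=True, indent=1)


# Hypothetical "before" cell: cell_type and execution_count come first.
old_cell = (
    '{"cell_type": "code", "execution_count": null, "metadata": {}, '
    '"outputs": [], "source": ["local_run"]}'
)

# Hypothetical "after" cell: same values, keys reordered as in the hunks above.
new_cell = (
    '{"metadata": {}, "outputs": [], "execution_count": null, '
    '"source": ["local_run"], "cell_type": "code"}'
)

# Parsed JSON objects compare equal regardless of key order,
# and the canonical text form is identical as well.
same = canonical(old_cell) == canonical(new_cell)
print(same)
```

Running the same comparison on the full old and new `.ipynb` files would show whether anything other than key order changed, for example the `authors` entry in the notebook metadata.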
@@ -0,0 +1,9 @@
|
|||||||
|
name: auto-ml-regression
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-train-automl
|
||||||
|
- azureml-widgets
|
||||||
|
- matplotlib
|
||||||
|
- pandas_ml
|
||||||
|
- paramiko<2.5.0
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -30,10 +54,10 @@
|
|||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"1. [Test](#Test)"
|
"1. [Test](#Test)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -55,22 +79,22 @@
|
|||||||
"- **Cancellation** of individual iterations or the entire run\n",
|
"- **Cancellation** of individual iterations or the entire run\n",
|
||||||
"- Retrieving models for any iteration or logged metric\n",
|
"- Retrieving models for any iteration or logged metric\n",
|
||||||
"- Specifying AutoML settings as `**kwargs`"
|
"- Specifying AutoML settings as `**kwargs`"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"import os\n",
|
"import os\n",
|
||||||
@@ -85,13 +109,13 @@
|
|||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -112,10 +136,10 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create or Attach existing AmlCompute\n",
|
"### Create or Attach existing AmlCompute\n",
|
||||||
@@ -124,13 +148,13 @@
|
|||||||
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.compute import AmlCompute\n",
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
"from azureml.core.compute import ComputeTarget\n",
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
@@ -160,23 +184,23 @@
|
|||||||
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" # For a more detailed view of current AmlCompute status, use get_status()."
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data\n",
|
"## Data\n",
|
||||||
"For remote executions, you need to make the data accessible from the remote compute.\n",
|
"For remote executions, you need to make the data accessible from the remote compute.\n",
|
||||||
"This can be done by uploading the data to DataStore.\n",
|
"This can be done by uploading the data to DataStore.\n",
|
||||||
"In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
|
"In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"data_train = datasets.load_digits()\n",
|
"data_train = datasets.load_digits()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -198,13 +222,13 @@
|
|||||||
" path_on_compute='/tmp/azureml_runs',\n",
|
" path_on_compute='/tmp/azureml_runs',\n",
|
||||||
" mode='download', # download files from datastore to compute target\n",
|
" mode='download', # download files from datastore to compute target\n",
|
||||||
" overwrite=False)"
|
" overwrite=False)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.runconfig import RunConfiguration\n",
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
@@ -222,13 +246,13 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
||||||
"conda_run_config.environment.python.conda_dependencies = cd"
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile $project_folder/get_data.py\n",
|
"%%writefile $project_folder/get_data.py\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -239,10 +263,10 @@
|
|||||||
" y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
|
" y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
|
||||||
"\n",
|
"\n",
|
||||||
" return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
|
" return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -258,13 +282,13 @@
|
|||||||
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|"
|
"|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_settings = {\n",
|
"automl_settings = {\n",
|
||||||
" \"iteration_timeout_minutes\": 10,\n",
|
" \"iteration_timeout_minutes\": 10,\n",
|
||||||
@@ -283,53 +307,53 @@
|
|||||||
" data_script = project_folder + \"/get_data.py\",\n",
|
" data_script = project_folder + \"/get_data.py\",\n",
|
||||||
" **automl_settings\n",
|
" **automl_settings\n",
|
||||||
" )\n"
|
" )\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
|
||||||
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
|
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"remote_run = experiment.submit(automl_config, show_output = False)"
|
"remote_run = experiment.submit(automl_config, show_output = False)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"remote_run"
|
"remote_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results\n",
|
"## Results\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Loading executed runs\n",
|
"#### Loading executed runs\n",
|
||||||
"In case you need to load a previously executed run, enable the cell below and replace the `run_id` value."
|
"In case you need to load a previously executed run, enable the cell below and replace the `run_id` value."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "raw",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"remote_run = AutoMLRun(experiment = experiment, run_id = 'AutoML_5db13491-c92a-4f1d-b622-8ab8d973a058')"
|
"remote_run = AutoMLRun(experiment = experiment, run_id = 'AutoML_5db13491-c92a-4f1d-b622-8ab8d973a058')"
|
||||||
]
|
],
|
||||||
|
"cell_type": "raw"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for Monitoring Runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
@@ -339,51 +363,51 @@
|
|||||||
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
|
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"remote_run"
|
"remote_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(remote_run).show() "
|
"RunDetails(remote_run).show() "
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Wait until the run finishes.\n",
|
"# Wait until the run finishes.\n",
|
||||||
"remote_run.wait_for_completion(show_output = True)"
|
"remote_run.wait_for_completion(show_output = True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"children = list(remote_run.get_children())\n",
|
"children = list(remote_run.get_children())\n",
|
||||||
"metricslist = {}\n",
|
"metricslist = {}\n",
|
||||||
@@ -394,123 +418,123 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(axis=1)\n",
|
||||||
"rundata"
|
"rundata"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
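The middle of the child-run cell above is elided by the hunk boundary, so the following is only a sketch of the general idea (one column of metrics per iteration, columns sorted by iteration number). It uses plain dictionaries instead of real child runs and the standard library instead of pandas; `fake_children`, `collect_metrics`, `as_table` and every metric value are hypothetical names and data invented for illustration.

```python
# Hypothetical stand-ins for child runs: (iteration, metrics) pairs.
# Real child runs come from remote_run.get_children(); these values are invented.
fake_children = [
    (2, {"accuracy": 0.91, "log_loss": 0.40}),
    (0, {"accuracy": 0.85, "log_loss": 0.62}),
    (1, {"accuracy": 0.88, "log_loss": 0.51}),
]


def collect_metrics(children):
    """Map iteration number -> {metric name: value}, keeping only float metrics."""
    metricslist = {}
    for iteration, metrics in children:
        metricslist[iteration] = {
            name: value for name, value in metrics.items() if isinstance(value, float)
        }
    return metricslist


def as_table(metricslist):
    """Return (sorted iteration columns, rows keyed by metric name)."""
    columns = sorted(metricslist)
    metric_names = sorted({name for m in metricslist.values() for name in m})
    rows = {
        name: [metricslist[col].get(name) for col in columns]
        for name in metric_names
    }
    return columns, rows


columns, rows = as_table(collect_metrics(fake_children))
print("iteration", columns)
for name, values in rows.items():
    print(name, values)
```

Sorting the columns plays the same role as `sort_index` along axis 1 in the notebook: iterations may finish out of order, and the table is easier to read when they are listed in sequence.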
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Cancelling Runs\n",
|
"### Cancelling Runs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
|
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
|
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
|
||||||
"# remote_run.cancel()\n",
|
"# remote_run.cancel()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Cancel iteration 1 and move onto iteration 2.\n",
|
"# Cancel iteration 1 and move onto iteration 2.\n",
|
||||||
"# remote_run.cancel_iteration(1)"
|
"# remote_run.cancel_iteration(1)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = remote_run.get_output()\n",
|
"best_run, fitted_model = remote_run.get_output()\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run and the model that have the smallest `log_loss` value:"
|
"Show the run and the model that have the smallest `log_loss` value:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"lookup_metric = \"log_loss\"\n",
|
"lookup_metric = \"log_loss\"\n",
|
||||||
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
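Selecting a "best" run for an arbitrary metric depends on whether that metric should be minimized (as with `log_loss`) or maximized (as with accuracy). The sketch below illustrates that idea only; it is not how `get_output` is implemented. The helper `pick_best`, the `MINIMIZE` set and the metric values are all assumptions made for this example.

```python
# Metrics where a smaller value is better; everything else is maximized.
# This set is an assumption made for the sketch.
MINIMIZE = {"log_loss"}

# Invented per-iteration metrics standing in for logged child-run metrics.
runs = {
    0: {"accuracy": 0.85, "log_loss": 0.62},
    1: {"accuracy": 0.93, "log_loss": 0.51},
    2: {"accuracy": 0.91, "log_loss": 0.40},
}


def pick_best(runs, metric):
    """Return the iteration whose value for `metric` is best."""
    candidates = {i: m[metric] for i, m in runs.items() if metric in m}
    if not candidates:
        raise ValueError(f"no run logged metric {metric!r}")
    chooser = min if metric in MINIMIZE else max
    return chooser(candidates, key=candidates.get)


print(pick_best(runs, "log_loss"))   # iteration with the smallest log_loss
print(pick_best(runs, "accuracy"))   # iteration with the largest accuracy
```

Note that the two metrics can disagree about which iteration is best, which is why it can be useful to look up a model by a metric other than the primary one.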
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a Specific Iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and the model from the third iteration:"
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 3\n",
|
"iteration = 3\n",
|
||||||
"third_run, third_model = remote_run.get_output(iteration=iteration)\n",
|
"third_run, third_model = remote_run.get_output(iteration=iteration)\n",
|
||||||
"print(third_run)\n",
|
"print(third_run)\n",
|
||||||
"print(third_model)"
|
"print(third_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test\n",
|
"## Test\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data"
|
"#### Load Test Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_test = digits.data[:10, :]\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
"y_test = digits.target[:10]\n",
|
"y_test = digits.target[:10]\n",
|
||||||
"images = digits.images[:10]"
|
"images = digits.images[:10]"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing Our Best Fitted Model"
|
"#### Testing Our Best Fitted Model"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Randomly select digits and test.\n",
|
"# Randomly select digits and test.\n",
|
||||||
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
@@ -523,33 +547,9 @@
|
|||||||
" ax1.set_title(title)\n",
|
" ax1.set_title(title)\n",
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" plt.show()"
|
" plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-remote-amlcompute
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
  - azureml-sdk
|
||||||
|
  - azureml-train-automl
|
||||||
|
  - azureml-widgets
|
||||||
|
  - matplotlib
|
||||||
|
  - pandas_ml
|
||||||
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.5"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -28,10 +52,10 @@
|
|||||||
"1. [Setup](#Setup)\n",
|
"1. [Setup](#Setup)\n",
|
||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Test](#Test)\n"
|
"1. [Test](#Test)\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -40,22 +64,22 @@
|
|||||||
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you will learn how to configure AutoML to use `sample_weight` and you will see the difference sample weight makes to the test results."
|
"In this notebook you will learn how to configure AutoML to use `sample_weight` and you will see the difference sample weight makes to the test results."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -68,13 +92,13 @@
|
|||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -98,22 +122,22 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Instantiate two `AutoMLConfig` objects. One will be used with `sample_weight` and one without."
|
"Instantiate two `AutoMLConfig` objects. One will be used with `sample_weight` and one without."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_train = digits.data[100:,:]\n",
|
"X_train = digits.data[100:,:]\n",
|
||||||
@@ -145,63 +169,63 @@
|
|||||||
" y = y_train,\n",
|
" y = y_train,\n",
|
||||||
" sample_weight = sample_weight,\n",
|
" sample_weight = sample_weight,\n",
|
||||||
" path = project_folder)"
|
" path = project_folder)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
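The lines that build `sample_weight` fall outside the visible hunk, so the following is a hedged sketch of what a per-row weight vector can look like and why it changes which mistakes matter. It assumes rows labelled 4 are weighted more heavily, which is consistent with the notebook's later remark about predicting 4s; `make_sample_weight`, `weighted_accuracy`, the boost factor and the toy labels are hypothetical.

```python
def make_sample_weight(labels, favoured_label=4, boost=5.0):
    """Give `favoured_label` rows weight `boost` and every other row weight 1.0."""
    return [boost if y == favoured_label else 1.0 for y in labels]


def weighted_accuracy(y_true, y_pred, weights):
    """Fraction of the total weight that sits on correctly predicted rows."""
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / sum(weights)


# Toy labels and two invented sets of predictions.
y_true = [4, 4, 1, 7, 9]
pred_a = [9, 4, 1, 7, 9]  # misses one 4
pred_b = [4, 4, 1, 7, 4]  # gets both 4s but mislabels a 9 as 4

weights = make_sample_weight(y_true)
print(weighted_accuracy(y_true, pred_a, weights))
print(weighted_accuracy(y_true, pred_b, weights))
```

Both prediction sets get four of five rows right, but under these weights the one that never misses a 4 scores higher, even though it wrongly predicts 4 elsewhere. A model trained with such weights is pushed in the same direction.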
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call the `submit` method on the experiment objects and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
"Call the `submit` method on the experiment objects and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_classifier, show_output = True)\n",
|
"local_run = experiment.submit(automl_classifier, show_output = True)\n",
|
||||||
"sample_weight_run = sample_weight_experiment.submit(automl_sample_weight, show_output = True)\n",
|
"sample_weight_run = sample_weight_experiment.submit(automl_sample_weight, show_output = True)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"best_run_sample_weight, fitted_model_sample_weight = sample_weight_run.get_output()"
|
"best_run_sample_weight, fitted_model_sample_weight = sample_weight_run.get_output()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test\n",
|
"## Test\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data"
|
"#### Load Test Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_test = digits.data[:100, :]\n",
|
"X_test = digits.data[:100, :]\n",
|
||||||
"y_test = digits.target[:100]\n",
|
"y_test = digits.target[:100]\n",
|
||||||
"images = digits.images[:100]"
|
"images = digits.images[:100]"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Compare the Models\n",
|
"#### Compare the Models\n",
|
||||||
"The sample weight model is more likely to correctly predict 4s. However, it is also more likely to predict 4 for some images that are not labelled as 4."
|
"The sample weight model is more likely to correctly predict 4s. However, it is also more likely to predict 4 for some images that are not labelled as 4."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Iterate over every test digit and test.\n",
|
"# Iterate over every test digit and test.\n",
|
||||||
"for index in range(0,len(y_test)):\n",
|
"for index in range(0,len(y_test)):\n",
|
||||||
@@ -215,33 +239,9 @@
|
|||||||
" ax1.set_title(title)\n",
|
" ax1.set_title(title)\n",
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" plt.show()"
|
" plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.5"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: auto-ml-sample-weight
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
  - azureml-sdk
|
||||||
|
  - azureml-train-automl
|
||||||
|
  - azureml-widgets
|
||||||
|
  - matplotlib
|
||||||
|
  - pandas_ml
|
||||||