Compare commits
1 commit · update-spa ... vizhur/aut

| Author | SHA1 | Date |
|---|---|---|
|  | e792ba8278 |  |

```diff
@@ -1,23 +1,47 @@
 {
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3.6",
+      "name": "python36",
+      "language": "python"
+    },
+    "authors": [
+      {
+        "name": "roastala"
+      }
+    ],
+    "language_info": {
+      "mimetype": "text/x-python",
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "pygments_lexer": "ipython3",
+      "name": "python",
+      "file_extension": ".py",
+      "nbconvert_exporter": "python",
+      "version": "3.6.5"
+    }
+  },
+  "nbformat": 4,
   "cells": [
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "Copyright (c) Microsoft Corporation. All rights reserved.\n",
         "\n",
         "Licensed under the MIT License."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         ""
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "# Configuration\n",
@@ -59,19 +83,19 @@
         "### What is an Azure Machine Learning workspace\n",
         "\n",
         "An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inference, and the monitoring of deployed models."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Setup\n",
         "\n",
         "This section describes activities required before you can access any Azure ML services functionality."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### 1. Azure Subscription\n",
@@ -89,26 +113,26 @@
         "```\n",
         "\n",
         "Once installation is complete, the following cell checks the Azure ML SDK version:"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {
         "tags": [
           "install"
         ]
       },
       "outputs": [],
+      "execution_count": null,
       "source": [
         "import azureml.core\n",
         "\n",
-        "print(\"This notebook was created using version 1.0.48 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.0.48.post1 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "If you are using an older version of the SDK then this notebook was created using, you should upgrade your SDK.\n",
@@ -126,10 +150,10 @@
         "```\n",
         "\n",
         "---"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Configure your Azure ML workspace\n",
@@ -155,13 +179,13 @@
         "If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view *config.json* file, and copy-paste the values for subscription ID, resource group and workspace name below.\n",
         "\n",
         "Replace the default values in the cell below with your workspace parameters"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "import os\n",
         "\n",
@@ -169,22 +193,22 @@
         "resource_group = os.getenv(\"RESOURCE_GROUP\", default=\"<my-resource-group>\")\n",
         "workspace_name = os.getenv(\"WORKSPACE_NAME\", default=\"<my-workspace-name>\")\n",
         "workspace_region = os.getenv(\"WORKSPACE_REGION\", default=\"eastus2\")"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### Access your workspace\n",
         "\n",
         "The following cell uses the Azure ML SDK to attempt to load the workspace specified by your parameters. If this cell succeeds, your notebook library will be configured to access the workspace from all notebooks using the `Workspace.from_config()` method. The cell can fail if the specified workspace doesn't exist or you don't have permissions to access it. "
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "from azureml.core import Workspace\n",
         "\n",
@@ -195,10 +219,10 @@
         "    print(\"Workspace configuration succeeded. Skip the workspace creation steps below\")\n",
         "except:\n",
         "    print(\"Workspace not accessible. Change your parameters or create a new workspace below\")"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### Create a new workspace\n",
@@ -215,17 +239,17 @@
         "* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
         "\n",
         "If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {
         "tags": [
           "create workspace"
         ]
       },
       "outputs": [],
+      "execution_count": null,
       "source": [
         "from azureml.core import Workspace\n",
         "\n",
@@ -240,10 +264,10 @@
         "\n",
         "# write the details of the workspace to a configuration file to the notebook library\n",
         "ws.write_config()"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### Create compute resources for your training experiments\n",
@@ -263,13 +287,13 @@
         "\n",
         "\n",
         "To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "from azureml.core.compute import ComputeTarget, AmlCompute\n",
         "from azureml.core.compute_target import ComputeTargetException\n",
@@ -294,20 +318,20 @@
         "    \n",
         "    # Wait for the cluster to complete, show the output log\n",
         "    cpu_cluster.wait_for_completion(show_output=True)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "from azureml.core.compute import ComputeTarget, AmlCompute\n",
         "from azureml.core.compute_target import ComputeTargetException\n",
@@ -331,10 +355,10 @@
         "\n",
         "    # Wait for the cluster to complete, show the output log\n",
         "    gpu_cluster.wait_for_completion(show_output=True)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "---\n",
@@ -344,40 +368,16 @@
         "In this notebook you configured this notebook library to connect easily to an Azure ML workspace. You can copy this notebook to your own libraries to connect them to you workspace, or use it to bootstrap new workspaces completely.\n",
         "\n",
         "If you came here from another notebook, you can return there and complete that exercise, or you can try out the [Tutorials](./tutorials) or jump into \"how-to\" notebooks and start creating and deploying models. A good place to start is the [train within notebook](./how-to-use-azureml/training/train-within-notebook) example that walks through a simplified but complete end to end machine learning process."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
-      "source": []
+      "execution_count": null,
+      "source": [],
+      "cell_type": "code"
     }
   ],
-  "metadata": {
-    "authors": [
-      {
-        "name": "roastala"
-      }
-    ],
-    "kernelspec": {
-      "display_name": "Python 3.6",
-      "language": "python",
-      "name": "python36"
-    },
-    "language_info": {
-      "codemirror_mode": {
-        "name": "ipython",
-        "version": 3
-      },
-      "file_extension": ".py",
-      "mimetype": "text/x-python",
-      "name": "python",
-      "nbconvert_exporter": "python",
-      "pygments_lexer": "ipython3",
-      "version": "3.6.5"
-    }
-  },
-  "nbformat": 4,
   "nbformat_minor": 2
 }
```
@@ -1,307 +0,0 @@
## How to use the RAPIDS on AzureML materials

### Setting up requirements

The material requires the Azure ML SDK and a Jupyter Notebook server for interactive execution. Please refer to the instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up"). Follow the instructions under **Local Computer**, and make sure to run the last step, <span style="font-family: Courier New;">pip install \<new package\></span>, with <span style="font-family: Courier New;">progressbar2</span> as the new package (<span style="font-family: Courier New;">pip install progressbar2</span>).

After following the directions, you should end up with a conda environment (<span style="font-family: Courier New;">myenv</span>) that can be activated in an Anaconda prompt.

You also need an Azure subscription with a Machine Learning Services quota of 24 nodes or more in the desired region (so that a vmSize with 4 GPUs, as used in the notebook, can be selected) for the desired VM family ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)). The specific vmSize used within the chosen family also needs to be whitelisted for Machine Learning Services usage.
### Getting and running the material

Clone the Azure ML Notebooks repository from GitHub by running the following command in a local directory:

* C:\local_directory>git clone https://github.com/Azure/MachineLearningNotebooks.git

In a conda prompt, navigate to the local directory, activate the conda environment (<span style="font-family: Courier New;">myenv</span>) where the Azure ML SDK was installed, and launch Jupyter Notebook:

* (<span style="font-family: Courier New;">myenv</span>) C:\local_directory>jupyter notebook

From the resulting browser page at http://localhost:8888/tree, navigate to the master notebook:

* http://localhost:8888/tree/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
The following notebook will appear:
### Master Jupyter Notebook

The notebook can be executed interactively step by step by pressing the Run button (circled in red in the image above).

The first couple of functional steps import the necessary Azure ML libraries. If you experience any errors, please refer back to the [environment setup](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up") instructions.
#### Setting up a Workspace

The following step gathers the information necessary to set up a workspace in which to execute the RAPIDS script. This needs to be done only once, or not at all if you already have a usable workspace set up in the Azure Portal:


Be sure to set the correct values for subscription\_id, resource\_group, workspace\_name, and region before executing the step. For example:

```python
subscription_id = os.environ.get("SUBSCRIPTION_ID", "1358e503-xxxx-4043-xxxx-65b83xxxx32d")
resource_group = os.environ.get("RESOURCE_GROUP", "AML-Rapids-Testing")
workspace_name = os.environ.get("WORKSPACE_NAME", "AML_Rapids_Tester")
workspace_region = os.environ.get("WORKSPACE_REGION", "West US 2")
```

resource\_group and workspace\_name can take any value; the region should match the region for which the subscription has the required Machine Learning Services node quota.
The first time the code is executed, it redirects to the Azure Portal to validate the subscription credentials. After the workspace is created, its related information is stored in a local file so that this step can subsequently be skipped; the step immediately after it just loads the saved workspace.



Once a workspace has been created, you can skip its creation and jump straight to this step. The configuration file resides in:

* C:\local_directory\\MachineLearningNotebooks\contrib\RAPIDS\aml_config\config.json
#### Creating an AML Compute Target

The following step creates an AML Compute Target:


The vm\_size parameter of the AmlCompute.provisioning\_configuration() call has to be a member of one of the VM families ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)), since those are the ones provisioned with P40 or V100 GPUs, the GPUs supported by RAPIDS. In this particular case, a Standard\_NC24s\_V2 was used.
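As an illustrative aside, the chosen vm\_size could be sanity-checked locally before provisioning. This helper is not part of the notebook; the size patterns are assumptions inferred from the family list above:

```python
import re

# Hypothetical helper: check a vm_size against the GPU VM families named above
# (NCv2, NCv3, ND, NDv2). The regexes are assumptions based on Azure VM naming.
_RAPIDS_FAMILY_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"^Standard_NC\d+r?s_v2$",   # NCv2 (P100)
        r"^Standard_NC\d+r?s_v3$",   # NCv3 (V100)
        r"^Standard_ND\d+r?s?$",     # ND   (P40)
        r"^Standard_ND\d+r?s?_v2$",  # NDv2 (V100)
    )
]

def is_rapids_compatible(vm_size: str) -> bool:
    """True if vm_size falls in one of the supported GPU families."""
    return any(p.match(vm_size) for p in _RAPIDS_FAMILY_PATTERNS)
```

For example, is_rapids_compatible("Standard_NC24s_v2") is True, while a CPU-only size such as "Standard_D2_v2" or the K80-based "Standard_NC6" is rejected.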
If the output of running the step contains an error of the form:



it indicates that even though the subscription has a node quota of VMs for that family, it does not have a node quota for Machine Learning Services for that family. You will need to request a node quota increase for that family in that region for **Machine Learning Services**.

Another possible error is the following:



This indicates that the specified vmSize has not been whitelisted for usage with Machine Learning Services, and a request to do so should be filed.

Successful creation of the compute target produces an output like the following:


#### RAPIDS script uploading and viewing

The next step copies the RAPIDS script process_data.py, which is a slightly modified implementation of the [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb), into a script-processing folder and presents its contents to the user. (The script is discussed in detail in the next section.)

If you want to use a different RAPIDS script, the references to the <span style="font-family: Courier New;">process_data.py</span> script have to be changed.


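The copy step itself amounts to something like the following sketch (the folder and path names here are illustrative, not taken from the notebook):

```python
import os
import shutil

def stage_script(script_path: str, script_folder: str) -> str:
    """Copy the RAPIDS script into the folder that will be submitted with the run."""
    os.makedirs(script_folder, exist_ok=True)
    return shutil.copy(script_path, script_folder)
```

For instance, stage_script("process_data.py", "script_folder") returns the destination path of the copied script.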
#### Data Uploading

The RAPIDS script loads and extracts features from Fannie Mae's mortgage dataset to train an XGBoost prediction model. The script uses two years of data.

The next few steps download and decompress the data and make it available to the script as an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data).

The following functions are used to download and decompress the input data:





The next step uses those functions to download the file

http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/mortgage_2000-2001.tgz

and to decompress it into the local folder path = .\mortgage_2000-2001.

The step takes several minutes; the intermediate outputs provide progress indicators.


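A minimal standard-library sketch of this download-and-decompress step (the function names are illustrative; the URL and target folder come from the text above):

```python
import os
import tarfile
import urllib.request

DATA_URL = ("http://rapidsai-data.s3-website.us-east-2.amazonaws.com/"
            "notebook-mortgage-data/mortgage_2000-2001.tgz")

def archive_name(url: str) -> str:
    """Local file name for the archive referenced by a URL."""
    return url.rsplit("/", 1)[-1]

def download(url: str, dest_dir: str = ".") -> str:
    """Download the archive once; skip the transfer if it is already present."""
    path = os.path.join(dest_dir, archive_name(url))
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path

def decompress(archive_path: str, dest_dir: str) -> None:
    """Extract the .tgz archive into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest_dir)
```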
The decompressed data should have the following structure:

* .\mortgage_2000-2001\acq\Acquisition_<year>Q<num>.txt
* .\mortgage_2000-2001\perf\Performance_<year>Q<num>.txt
* .\mortgage_2000-2001\names.csv
The data is divided into partitions that roughly correspond to calendar quarters. RAPIDS includes support for multi-node, multi-GPU deployments, enabling scaling up and out over much larger dataset sizes. You will be able to verify that the number of partitions the script can process increases with the number of GPUs used. The RAPIDS script here is implemented for single-machine scenarios; an example supporting multiple nodes will be published later.
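The quarterly layout above can be enumerated with a small helper (illustrative only, not part of the script):

```python
import os

def performance_partitions(root: str, start_year: int, end_year: int):
    """Yield the expected per-quarter performance partition paths."""
    for year in range(start_year, end_year + 1):
        for quarter in (1, 2, 3, 4):
            yield os.path.join(root, "perf", f"Performance_{year}Q{quarter}.txt")
```

For the two years of data used here (2000 and 2001) this yields eight quarterly paths.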
The next step uploads the data into the [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) under the reference <span style="font-family: Courier New;">fileroot = mortgage_2000-2001</span>.

The step takes several minutes; the output provides a progress indicator.



Once the data has been loaded into the Azure Machine Learning Datastore, in subsequent runs you can comment out the ds.upload line and simply reference the <span style="font-family: Courier New;">mortgage_2000-2001</span> datastore reference.
#### Setting up required libraries and environment to run RAPIDS code

There are two options for setting up the environment to run RAPIDS code. The following step shows how to use a prebuilt conda environment. A recommended alternative is to specify a base Docker image and package dependencies; you can find sample code for that in the notebook.


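For orientation, a prebuilt conda environment of the kind this step relies on might look like the following sketch; the channels and packages listed here are illustrative assumptions, not copied from the notebook:

```yaml
# Illustrative environment sketch only; match names and versions to the
# notebook's actual setup.
name: rapids
channels:
  - rapidsai
  - nvidia
  - conda-forge
dependencies:
  - python=3.6
  - cudf
  - dask
  - dask-cuda
  - xgboost
```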
#### Wrapper function to submit the RAPIDS script as an Azure Machine Learning experiment

The next step defines a wrapper function used to run the RAPIDS script with different arguments. It takes as arguments <span style="font-family: Times New Roman;">*cpu\_training*</span>, a flag that indicates whether the run is meant to be processed CPU-only; <span style="font-family: Times New Roman;">*gpu\_count*</span>, the number of GPUs to be used, if any; and <span style="font-family: Times New Roman;">*part_count*</span>, the number of data partitions to be used.



The core of the function configures the run by instantiating a ScriptRunConfig object, which defines the source_directory for the script to be executed, the name of the script, and the arguments to be passed to it.

In addition to the wrapper function arguments, two other arguments are passed: <span style="font-family: Times New Roman;">*data\_dir*</span>, the directory where the data is stored, and <span style="font-family: Times New Roman;">*end_year*</span>, the largest year to use partitions from.

As mentioned earlier, the size of the data that can be processed increases with the number of GPUs. In the function, the dictionary <span style="font-family: Times New Roman;">*max\_gpu\_count\_data\_partition_mapping*</span> maps each GPU count to the maximum number of partitions we empirically found the system can handle. The function issues a warning when the number of partitions for a given number of GPUs exceeds that maximum; the script is still executed, but you should expect the run to fail with an out-of-memory error.

If you want to use a different RAPIDS script, the reference to the process_data.py script has to be changed.
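The wrapper's argument handling can be sketched as follows. This is a reconstruction, not the notebook's code: the command-line flag names and the values in the mapping are placeholders, and the notebook's empirically determined limits may differ.

```python
import warnings

# Placeholder limits: maximum partitions per GPU count (illustrative values).
max_gpu_count_data_partition_mapping = {1: 3, 2: 4, 3: 6, 4: 8}

def build_script_arguments(cpu_training, gpu_count, part_count,
                           data_dir="mortgage_2000-2001", end_year=2001):
    """Assemble the argument list handed to process_data.py via ScriptRunConfig."""
    if not cpu_training:
        limit = max_gpu_count_data_partition_mapping.get(gpu_count, 0)
        if part_count > limit:
            # The run is still submitted, but an out-of-memory failure is expected.
            warnings.warn(f"{part_count} partitions exceeds the maximum ({limit}) "
                          f"for {gpu_count} GPU(s)")
    return ["--cpu_training", str(cpu_training),
            "--gpu_count", str(gpu_count),
            "--part_count", str(part_count),
            "--data_dir", data_dir,
            "--end_year", str(end_year)]
```

Calling it with an oversized part_count, e.g. build_script_arguments(False, 4, 9), emits the warning but still returns the argument list, mirroring the behavior described above.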
#### Submitting Experiments

We are now ready to submit experiments: launching the RAPIDS script with different sets of parameters.

The following couple of steps submit experiments under different conditions.



You can vary num\_gpu between one and the number of GPUs supported by the chosen vmSize. part\_count can take any value between 1 and 11, but if it exceeds the maximum for num_gpu, the run will end in an error.
If the experiment is successfully submitted, it is placed in a queue for processing, its status appears as Queued, and an output like the following is shown:



When the experiment starts running, its status appears as Running and the output changes to something like this:


#### Reproducing the performance gains plot results on the Blog Post

When the run has finished successfully, its status appears as Completed and the output changes to something like this:



This is the output for an experiment run with three partitions and one GPU; notice that the reported processing time is 49.16 seconds, just as depicted in the performance gains plot in the blog post.



This output corresponds to a run with three partitions and two GPUs; notice that the reported processing time is 37.50 seconds, just as depicted in the performance gains plot in the blog post.



This output corresponds to an experiment run with three partitions and three GPUs; notice that the reported processing time is 24.40 seconds, just as depicted in the performance gains plot in the blog post.



This output corresponds to an experiment run with three partitions and four GPUs; notice that the reported processing time is 23.33 seconds, just as depicted in the performance gains plot in the blog post.



This output corresponds to an experiment run with three partitions using only the CPU; notice that the reported processing time is 9 minutes and 1.21 seconds (541.21 seconds), just as depicted in the performance gains plot in the blog post.
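Simple arithmetic on the times quoted above gives the speedups over the CPU-only baseline:

```python
# Speedups over the CPU-only run, computed from the times reported above.
cpu_seconds = 9 * 60 + 1.21  # 9 min 1.21 s = 541.21 s
gpu_seconds = {1: 49.16, 2: 37.50, 3: 24.40, 4: 23.33}

speedups = {gpus: round(cpu_seconds / t, 1) for gpus, t in gpu_seconds.items()}
print(speedups)
```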

|
|
||||||
|
|
||||||
This output corresponds to an experiment run with nine partitions and four GPUs, notice that the notebook throws a warning signaling that the number of partitions exceed the maximum that the system can handle with those many GPUs and the run ends up failing, hence having and status of Failed.
|
|
||||||
|
|
||||||
|
|
||||||
##### Freeing Resources
|
|
||||||
In the last step the notebook deletes the compute target. (This step is optional especially if the min_nodes in the cluster is set to 0 with which the cluster will scale down to 0 nodes when there is no usage.)
|
|
||||||
|
|
||||||

### RAPIDS Script

The Master Notebook runs experiments by launching a RAPIDS script with different sets of parameters. In this section, the RAPIDS script, process_data.py in the material, is analyzed.
The script first imports all the necessary libraries and parses the arguments passed by the Master Notebook.
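The argument parsing can be sketched with the standard library; the argument names below are assumptions, mirroring the parameters described in this walkthrough (number of GPUs, number of partitions, CPU-predictor flag), not the script's actual flags:

```python
import argparse

# Hypothetical argument names, mirroring the parameters the Master Notebook
# passes to the script (num_gpu, data_part_count, cpu_predictor).
parser = argparse.ArgumentParser(description="RAPIDS mortgage ETL + training")
parser.add_argument("--num_gpu", type=int, default=1,
                    help="number of GPUs (and Dask workers) to use")
parser.add_argument("--part_count", type=int, default=1,
                    help="number of data partitions to process")
parser.add_argument("--cpu_predictor", action="store_true",
                    help="train on CPU; GPUs are used only for ETL")

# Parse a sample command line instead of sys.argv for illustration.
args = parser.parse_args(["--num_gpu", "2", "--part_count", "3"])
print(args.num_gpu, args.part_count, args.cpu_predictor)
```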
Then, all the internal functions used by the script are defined.
#### Wrapper Auxiliary Functions:

The functions below are wrappers around the configuration module of librmm, the RAPIDS Memory Manager Python interface:


A couple of other functions are wrappers for the submission of jobs to the Dask client:



#### Data Loading Functions:

The data is loaded through the following three functions:


All three functions use cudf.read_csv(), the cuDF version of the well-known pandas counterpart.
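Because cuDF deliberately mirrors the pandas API, the same call can be sketched with pandas on toy data; in the script the call is `cudf.read_csv(...)` with the same keyword arguments:

```python
import io
import pandas as pd

# In the script this is cudf.read_csv(...); cuDF accepts the same arguments.
raw = io.StringIO("loan_id|balance\n100|2500.0\n101|1800.5\n")
df = pd.read_csv(raw, sep="|", dtype={"loan_id": "int64"})
print(df.shape)  # (2, 2)
```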
#### Data Transformation and Feature Extraction Functions:

The raw data is transformed and processed to extract features by joining, slicing, grouping, aggregating, factorizing, etc., the original dataframes, just as is done with pandas. The following functions in the script are used for that purpose:



#### Main() Function

The previous functions are used in the main() function to accomplish several steps: setting up the Dask client, performing all the ETL operations, and setting up and training an XGBoost model. The function also assigns the data to be processed by each Dask worker.
##### Setting Up the Dask Client:

The following lines:



initialize and set up a Dask client with a number of workers corresponding to the number of GPUs to be used in the run. A successful execution of the setup results in the following output:


##### All ETL functions are used in single calls to process\_quarter_gpu, one per data partition


##### Concatenating the data assigned to each Dask worker

The partitions assigned to each worker are concatenated and set up for training.


##### Setting Training Parameters

The parameters used for training a gradient boosted decision tree model are set up in the following code block:



Notice how the parameters are modified when using the CPU-only mode.
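A sketch of that parameter switch; the names follow XGBoost's documented parameters, but the GPU method name has varied across XGBoost releases, so treat the exact values as assumptions rather than the script's literal settings:

```python
def make_xgb_params(cpu_predictor: bool) -> dict:
    """Build XGBoost training parameters, switching the tree method
    and predictor when CPU-only mode is requested."""
    params = {
        "max_depth": 8,
        "eta": 0.1,
        "objective": "binary:logistic",
        "tree_method": "gpu_hist",   # GPU-accelerated histogram algorithm
    }
    if cpu_predictor:
        # CPU-only mode: fall back to the CPU histogram implementation.
        params["tree_method"] = "hist"
        params["predictor"] = "cpu_predictor"
    return params

print(make_xgb_params(cpu_predictor=True)["tree_method"])  # hist
```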
##### Launching the training of a gradient boosted decision tree model using XGBoost



The outputs of the script can be observed in the Master Notebook as the script executes:


@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "ksivas"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# NVIDIA RAPIDS in Azure Machine Learning"
|
"# NVIDIA RAPIDS in Azure Machine Learning"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL\u00c3\u201a\u00c2\u00a0and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model\u00c2\u00a0in Azure.\n",
|
"The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL\u00c3\u201a\u00c2\u00a0and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model\u00c2\u00a0in Azure.\n",
|
||||||
@@ -35,30 +59,30 @@
|
|||||||
"* An Azure subscription to create a Machine Learning Workspace\n",
|
"* An Azure subscription to create a Machine Learning Workspace\n",
|
||||||
"* Familiarity with the Azure ML SDK (refer to [notebook samples](https://github.com/Azure/MachineLearningNotebooks))\n",
|
"* Familiarity with the Azure ML SDK (refer to [notebook samples](https://github.com/Azure/MachineLearningNotebooks))\n",
|
||||||
"* A Jupyter notebook environment with Azure Machine Learning SDK installed. Refer to instructions to [setup the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local)"
|
"* A Jupyter notebook environment with Azure Machine Learning SDK installed. Refer to instructions to [setup the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Verify if Azure ML SDK is installed"
|
"### Verify if Azure ML SDK is installed"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"from azureml.core import Workspace, Experiment\n",
|
"from azureml.core import Workspace, Experiment\n",
|
||||||
@@ -68,17 +92,17 @@
|
|||||||
"from azureml.core.runconfig import RunConfiguration\n",
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
"from azureml.core import ScriptRunConfig\n",
|
"from azureml.core import ScriptRunConfig\n",
|
||||||
"from azureml.widgets import RunDetails"
|
"from azureml.widgets import RunDetails"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create Azure ML Workspace"
|
"### Create Azure ML Workspace"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The following step is optional if you already have a workspace. If you want to use an existing workspace, then\n",
|
"The following step is optional if you already have a workspace. If you want to use an existing workspace, then\n",
|
||||||
@@ -86,13 +110,13 @@
|
|||||||
" \n",
|
" \n",
|
||||||
"<font color='red'>Important</font>: in the code cell below, be sure to set the correct values for the subscription_id, \n",
|
"<font color='red'>Important</font>: in the code cell below, be sure to set the correct values for the subscription_id, \n",
|
||||||
"resource_group, workspace_name, region before executing this code cell."
|
"resource_group, workspace_name, region before executing this code cell."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", \"<subscription_id>\")\n",
|
"subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", \"<subscription_id>\")\n",
|
||||||
"resource_group = os.environ.get(\"RESOURCE_GROUP\", \"<resource_group>\")\n",
|
"resource_group = os.environ.get(\"RESOURCE_GROUP\", \"<resource_group>\")\n",
|
||||||
@@ -103,20 +127,20 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# write config to a local directory for future use\n",
|
"# write config to a local directory for future use\n",
|
||||||
"ws.write_config()"
|
"ws.write_config()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Load existing Workspace"
|
"### Load existing Workspace"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"# if a locally-saved configuration file for the workspace is not available, use the following to load workspace\n",
|
"# if a locally-saved configuration file for the workspace is not available, use the following to load workspace\n",
|
||||||
@@ -130,17 +154,17 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"if not os.path.isdir(scripts_folder):\n",
|
"if not os.path.isdir(scripts_folder):\n",
|
||||||
" os.mkdir(scripts_folder)"
|
" os.mkdir(scripts_folder)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create AML Compute Target"
|
"### Create AML Compute Target"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Because NVIDIA RAPIDS requires P40 or V100 GPUs, the user needs to specify compute targets from one of [NC_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview) virtual machine types in Azure; these are the families of virtual machines in Azure that are provisioned with these GPUs.\n",
|
"Because NVIDIA RAPIDS requires P40 or V100 GPUs, the user needs to specify compute targets from one of [NC_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview) virtual machine types in Azure; these are the families of virtual machines in Azure that are provisioned with these GPUs.\n",
|
||||||
@@ -148,13 +172,13 @@
|
|||||||
"Pick one of the supported VM SKUs based on the number of GPUs you want to use for ETL and training in RAPIDS.\n",
|
"Pick one of the supported VM SKUs based on the number of GPUs you want to use for ETL and training in RAPIDS.\n",
|
||||||
" \n",
|
" \n",
|
||||||
"The script in this notebook is implemented for single-machine scenarios. An example supporting multiple nodes will be published later."
|
"The script in this notebook is implemented for single-machine scenarios. An example supporting multiple nodes will be published later."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"gpu_cluster_name = \"gpucluster\"\n",
|
"gpu_cluster_name = \"gpucluster\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -170,27 +194,27 @@
|
|||||||
" # create the cluster\n",
|
" # create the cluster\n",
|
||||||
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n",
|
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n",
|
||||||
" gpu_cluster.wait_for_completion(show_output=True)"
|
" gpu_cluster.wait_for_completion(show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Script to process data and train model"
|
"### Script to process data and train model"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The _process_data.py_ script used in the step below is a slightly modified implementation of [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb)."
|
"The _process_data.py_ script used in the step below is a slightly modified implementation of [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb)."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# copy process_data.py into the script folder\n",
|
"# copy process_data.py into the script folder\n",
|
||||||
"import shutil\n",
|
"import shutil\n",
|
||||||
@@ -198,41 +222,41 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"with open(os.path.join(scripts_folder, './process_data.py'), 'r') as process_data_script:\n",
|
"with open(os.path.join(scripts_folder, './process_data.py'), 'r') as process_data_script:\n",
|
||||||
" print(process_data_script.read())"
|
" print(process_data_script.read())"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Data required to run this sample"
|
"### Data required to run this sample"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"This sample uses [Fannie Mae's Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html). Once you obtain access to the data, you will need to make this data available in an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data), for use in this sample. The following code shows how to do that."
|
"This sample uses [Fannie Mae's Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html). Once you obtain access to the data, you will need to make this data available in an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data), for use in this sample. The following code shows how to do that."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Downloading Data"
|
"### Downloading Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<font color='red'>Important</font>: Python package progressbar2 is necessary to run the following cell. If it is not available in your environment where this notebook is running, please install it."
|
"<font color='red'>Important</font>: Python package progressbar2 is necessary to run the following cell. If it is not available in your environment where this notebook is running, please install it."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import tarfile\n",
|
"import tarfile\n",
|
||||||
"import hashlib\n",
|
"import hashlib\n",
|
||||||
@@ -287,13 +311,13 @@
|
|||||||
" pbar.finish()\n",
|
" pbar.finish()\n",
|
||||||
" print(\"...All {0} files have been decompressed\".format(numFiles))\n",
|
" print(\"...All {0} files have been decompressed\".format(numFiles))\n",
|
||||||
" tar.close()"
|
" tar.close()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"fileroot = 'mortgage_2000-2001'\n",
|
"fileroot = 'mortgage_2000-2001'\n",
|
||||||
"path = '.\\\\{0}'.format(fileroot)\n",
|
"path = '.\\\\{0}'.format(fileroot)\n",
|
||||||
@@ -305,20 +329,20 @@
|
|||||||
" filename = download_file(fileroot)\n",
|
" filename = download_file(fileroot)\n",
|
||||||
" decompress_file(filename,path)\n",
|
" decompress_file(filename,path)\n",
|
||||||
" print(\"Input Data has been Downloaded and Decompressed\")"
|
" print(\"Input Data has been Downloaded and Decompressed\")"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Uploading Data to Workspace"
|
"### Uploading Data to Workspace"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ds = ws.get_default_datastore()\n",
|
"ds = ws.get_default_datastore()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -328,17 +352,17 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# data already uploaded to the datastore\n",
|
"# data already uploaded to the datastore\n",
|
||||||
"data_ref = DataReference(data_reference_name='data', datastore=ds, path_on_datastore=fileroot)"
|
"data_ref = DataReference(data_reference_name='data', datastore=ds, path_on_datastore=fileroot)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create AML run configuration to launch a machine learning job"
|
"### Create AML run configuration to launch a machine learning job"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"RunConfiguration is used to submit jobs to Azure Machine Learning service. When creating RunConfiguration for a job, users can either \n",
|
"RunConfiguration is used to submit jobs to Azure Machine Learning service. When creating RunConfiguration for a job, users can either \n",
|
||||||
@@ -347,27 +371,27 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"The second option is the recommended option in AML. \n",
|
"The second option is the recommended option in AML. \n",
|
||||||
"The following steps have code for both options. You can pick the one that is more appropriate for your requirements. "
|
"The following steps have code for both options. You can pick the one that is more appropriate for your requirements. "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Specify prebuilt conda environment"
|
"#### Specify prebuilt conda environment"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The following code shows how to use an existing image from [Docker Hub](https://hub.docker.com/r/rapidsai/rapidsai/) that has a prebuilt conda environment named 'rapids' when creating a RunConfiguration. Note that this conda environment does not include azureml-defaults package that is required for using AML functionality like metrics tracking, model management etc. This package is automatically installed when you use 'Specify package dependencies' option and that is why it is the recommended option to create RunConfiguraiton in AML."
|
"The following code shows how to use an existing image from [Docker Hub](https://hub.docker.com/r/rapidsai/rapidsai/) that has a prebuilt conda environment named 'rapids' when creating a RunConfiguration. Note that this conda environment does not include azureml-defaults package that is required for using AML functionality like metrics tracking, model management etc. This package is automatically installed when you use 'Specify package dependencies' option and that is why it is the recommended option to create RunConfiguraiton in AML."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"run_config = RunConfiguration()\n",
|
"run_config = RunConfiguration()\n",
|
||||||
"run_config.framework = 'python'\n",
|
"run_config.framework = 'python'\n",
|
||||||
@@ -382,27 +406,27 @@
|
|||||||
"# run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
|
"# run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
|
||||||
"run_config.environment.spark.precache_packages = False\n",
|
"run_config.environment.spark.precache_packages = False\n",
|
||||||
"run_config.data_references={'data':data_ref.to_config()}"
|
"run_config.data_references={'data':data_ref.to_config()}"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Specify package dependencies"
|
"#### Specify package dependencies"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The following code shows how to list package dependencies in a conda environment definition file (rapids.yml) when creating a RunConfiguration"
|
"The following code shows how to list package dependencies in a conda environment definition file (rapids.yml) when creating a RunConfiguration"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')\n",
|
"# cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')\n",
|
||||||
"# run_config = RunConfiguration(conda_dependencies=cd)\n",
|
"# run_config = RunConfiguration(conda_dependencies=cd)\n",
|
||||||
@@ -416,20 +440,20 @@
|
|||||||
"# run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
|
"# run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
|
||||||
"# run_config.environment.spark.precache_packages = False\n",
|
"# run_config.environment.spark.precache_packages = False\n",
|
||||||
"# run_config.data_references={'data':data_ref.to_config()}"
|
"# run_config.data_references={'data':data_ref.to_config()}"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Wrapper function to submit Azure Machine Learning experiment"
|
"### Wrapper function to submit Azure Machine Learning experiment"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# parameter cpu_predictor indicates if training should be done on CPU. If set to true, GPUs are used *only* for ETL and *not* for training\n",
|
"# parameter cpu_predictor indicates if training should be done on CPU. If set to true, GPUs are used *only* for ETL and *not* for training\n",
|
||||||
"# parameter num_gpu indicates number of GPUs to use among the GPUs available in the VM for ETL and if cpu_predictor is false, for training as well \n",
|
"# parameter num_gpu indicates number of GPUs to use among the GPUs available in the VM for ETL and if cpu_predictor is false, for training as well \n",
|
||||||
@@ -466,20 +490,20 @@
|
|||||||
" run = exp.submit(config=src)\n",
|
" run = exp.submit(config=src)\n",
|
||||||
" RunDetails(run).show()\n",
|
" RunDetails(run).show()\n",
|
||||||
" return run"
|
" return run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Submit experiment (ETL & training on GPU)"
|
"### Submit experiment (ETL & training on GPU)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"cpu_predictor = False\n",
|
"cpu_predictor = False\n",
|
||||||
"# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n",
|
"# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n",
|
||||||
@@ -487,22 +511,22 @@
|
|||||||
"data_part_count = 1\n",
|
"data_part_count = 1\n",
|
||||||
"# train using CPU, use GPU for both ETL and training\n",
|
"# train using CPU, use GPU for both ETL and training\n",
|
||||||
   "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submit experiment (ETL on GPU, training on CPU)\n",
    "\n",
    "To observe performance difference between GPU-accelerated RAPIDS based training with CPU-only training, set 'cpu_predictor' predictor to 'True' and rerun the experiment"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "cpu_predictor = True\n",
    "# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n",
@@ -510,50 +534,26 @@
    "data_part_count = 1\n",
    "# train using CPU, use GPU for ETL\n",
    "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete cluster"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "# delete the cluster\n",
    "# gpu_cluster.delete()"
-  ]
+  ],
+  "cell_type": "code"
  }
 ],
- "metadata": {
-  "authors": [
-   {
-    "name": "ksivas"
-   }
-  ],
-  "kernelspec": {
-   "display_name": "Python 3.6",
-   "language": "python",
-   "name": "python36"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.6"
-  }
- },
- "nbformat": 4,
 "nbformat_minor": 2
}
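The notebook hunks in this comparison only reorder JSON keys: `cell_type` moves after `source`, `execution_count` after `outputs`, and the file-level `metadata` block to the top of the document. JSON object key order carries no meaning, so old and new notebooks parse to identical structures. A minimal stdlib-only sketch (the sample cell strings are illustrative, not taken from the diff):

```python
import json

# Two cell objects that differ only in key order, mirroring the diff:
# the old layout puts "cell_type" first, the new one puts it last.
old_cell = '{"cell_type": "code", "metadata": {}, "outputs": [], "source": ["x = 1"]}'
new_cell = '{"metadata": {}, "outputs": [], "source": ["x = 1"], "cell_type": "code"}'

# json.loads yields plain dicts, which compare by content, not key order,
# so both layouts are the same notebook cell.
assert json.loads(old_cell) == json.loads(new_cell)
```

This is why such a commit is safe to apply even though nearly every line of the file appears changed.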
(49 binary image files deleted in this comparison; sizes range from 2.4 KiB to 554 KiB.)
@@ -1,35 +0,0 @@
-name: rapids
-channels:
-  - nvidia
-  - numba
-  - conda-forge
-  - rapidsai
-  - defaults
-  - pytorch
-
-dependencies:
-  - arrow-cpp=0.12.0
-  - bokeh
-  - cffi=1.11.5
-  - cmake=3.12
-  - cuda92
-  - cython==0.29
-  - dask=1.1.1
-  - distributed=1.25.3
-  - faiss-gpu=1.5.0
-  - numba=0.42
-  - numpy=1.15.4
-  - nvstrings
-  - pandas=0.23.4
-  - pyarrow=0.12.0
-  - scikit-learn
-  - scipy
-  - cudf
-  - cuml
-  - python=3.6.2
-  - jupyterlab
-  - pip:
-    - file:/rapids/xgboost/python-package/dist/xgboost-0.81-py3-none-any.whl
-    - git+https://github.com/rapidsai/dask-xgboost@dask-cudf
-    - git+https://github.com/rapidsai/dask-cudf@master
-    - git+https://github.com/rapidsai/dask-cuda@master
@@ -1,7 +1,31 @@
 {
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3.6",
+   "name": "python36",
+   "language": "python"
+  },
+  "authors": [
+   {
+    "name": "rafarmah"
+   }
+  ],
+  "language_info": {
+   "mimetype": "text/x-python",
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "pygments_lexer": "ipython3",
+   "name": "python",
+   "file_extension": ".py",
+   "nbconvert_exporter": "python",
+   "version": "3.6.6"
+  }
+ },
+ "nbformat": 4,
 "cells": [
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Track Data Drift between Training and Inference Data in Production \n",
@@ -11,24 +35,24 @@
    "Please email driftfeedback@microsoft.com with any issues. A member from the DataDrift team will respond shortly. \n",
    "\n",
    "The DataDrift Public Preview API can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-contrib-datadrift/?view=azure-ml-py). "
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    ""
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Prerequisites and Setup"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install the DataDrift package\n",
@@ -38,20 +62,20 @@
    "pip install azureml-contrib-datadrift\n",
    "pip install lightgbm\n",
    "```"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Dependencies"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "import json\n",
    "import os\n",
@@ -73,22 +97,22 @@
    "from azureml.widgets import RunDetails\n",
    "from sklearn.externals import joblib\n",
    "from sklearn.model_selection import train_test_split\n"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up Configuraton and Create Azure ML Workspace\n",
    "\n",
    "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "# Please type in your initials/alias. The prefix is prepended to the names of resources created by this notebook. \n",
    "prefix = \"dd\"\n",
@@ -100,32 +124,32 @@
    "\n",
    "# optionally, set email address to receive an email alert for DataDrift\n",
    "email_address = \"\""
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "ws = Workspace.from_config()\n",
    "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Train/Testing Data\n",
    "\n",
    "For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You may replace this step with your own dataset. "
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "usaf_list = ['725724', '722149', '723090', '722159', '723910', '720279',\n",
    "             '725513', '725254', '726430', '720381', '723074', '726682',\n",
@@ -201,24 +225,24 @@
    "    df_featurized = df_encoded.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna().drop_duplicates()\n",
    "    \n",
    "    return df_featurized"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "# Train model on Jan 1 - 14, 2009 data\n",
    "df = get_featurized_noaa_df(datetime(2009, 1, 1), datetime(2009, 1, 14, 23, 59, 59), columns, usaf_list)\n",
    "df.head()"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "label = \"temperature\"\n",
    "x_df = df.drop(label, axis=1)\n",
@@ -233,20 +257,20 @@
    "os.makedirs(training_dir, exist_ok=True)\n",
    "training_df = pd.merge(x_train.drop(label, axis=1), y_train, left_index=True, right_index=True)\n",
    "training_df.to_csv(training_dir + \"/\" + training_file)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create/Register Training Dataset"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "dataset_name = \"dataset\"\n",
    "name_suffix = datetime.utcnow().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
@@ -261,20 +285,20 @@
    "datasets = [(Dataset.Scenario.TRAINING, trainingDataset)]\n",
    "print(\"dataset registration done.\\n\")\n",
    "datasets"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train and Save Model"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "import lightgbm as lgb\n",
    "\n",
@@ -303,32 +327,32 @@
    "                  valid_sets=[train, test],\n",
    "                  verbose_eval=50,\n",
    "                  early_stopping_rounds=25)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "model_file = 'outputs/{}.pkl'.format(model_name)\n",
    "\n",
    "os.makedirs('outputs', exist_ok=True)\n",
    "joblib.dump(model, model_file)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Register Model"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "model = Model.register(model_path=model_file,\n",
    "                       model_name=model_name,\n",
@@ -336,52 +360,52 @@
    "                       datasets=datasets)\n",
    "\n",
    "print(model_name, image_name, service_name, model)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploy Model To AKS"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
-  "source": []
+  "source": [],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare Environment"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
    "                                 pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
    "\n",
    "with open(\"myenv.yml\",\"w\") as f:\n",
    "    f.write(myenv.serialize_to_string())"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Image"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "# Image creation may take up to 15 minutes.\n",
    "\n",
@@ -402,20 +426,20 @@
    "    image.wait_for_creation(show_output=True)\n",
    "else:\n",
    "    image = ws.images[image_name]"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Compute Target"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "aks_name = 'dd-demo-e2e'\n",
    "prov_config = AksCompute.provisioning_configuration()\n",
@@ -430,20 +454,20 @@
    "    print(aks_target.provisioning_errors)\n",
    "else:\n",
    "    aks_target=ws.compute_targets[aks_name]"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploy Service"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "aks_service_name = service_name\n",
    "\n",
@@ -458,34 +482,34 @@
    "    print(aks_service.state)\n",
    "else:\n",
    "    aks_service = ws.webservices[aks_service_name]"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run DataDrift Analysis"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Send Scoring Data to Service"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download Scoring Data"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "# Score Model on March 15, 2016 data\n",
    "scoring_df = get_noaa_data(datetime(2016, 3, 15) - timedelta(days=7), datetime(2016, 3, 16), columns, usaf_list)\n",
@@ -496,13 +520,13 @@
    "print(\"Dropping unnecessary columns\")\n",
    "scoring_df = scoring_df.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna()\n",
    "scoring_df.head()"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "# One Hot Encode the scoring dataset to match the training dataset schema\n",
    "columns_dict = model.datasets[\"training\"][0].get_profile().columns\n",
@@ -522,30 +546,30 @@
    "for col in difference:\n",
    "    encoded_df[col] = 0\n",
    "encoded_df.head()"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "# Serialize dataframe to list of row dictionaries\n",
    "encoded_dict = encoded_df.to_dict('records')"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submit Scoring Data to Service"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "%%time\n",
    "\n",
@@ -570,36 +594,36 @@
    "\n",
    "        load = []\n",
    "        time.sleep(3)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to wait up to 10 minutes for the Model Data Collector to dump the model input and inference data to storage in the Workspace, where it's used by the DataDriftDetector job."
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "time.sleep(600)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure DataDrift"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "services = [service_name]\n",
    "start = datetime.now() - timedelta(days=2)\n",
@@ -614,48 +638,48 @@
    "    datadrift = DataDriftDetector.get(ws, model.name, model.version)\n",
    "    \n",
    "print(\"Details of DataDrift Object:\\n{}\".format(datadrift))"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run an Adhoc DataDriftDetector Run"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "target_date = datetime.today()\n",
    "run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "exp = Experiment(ws, datadrift._id)\n",
    "dd_run = Run(experiment=exp, run_id=run)\n",
    "RunDetails(dd_run).show()"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get Drift Analysis Results"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "children = list(dd_run.get_children())\n",
    "for child in children:\n",
@@ -663,61 +687,37 @@
    "\n",
    "drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
    "drift_metrics"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "# Show all drift figures, one per serivice.\n",
    "# If setting with_details is False (by default), only drift will be shown; if it's True, all details will be shown.\n",
    "\n",
    "drift_figures = datadrift.show(with_details=True)"
-  ]
+  ],
+  "cell_type": "code"
  },
  {
-  "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Enable DataDrift Schedule"
-  ]
+  ],
+  "cell_type": "markdown"
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "execution_count": null,
   "source": [
    "datadrift.enable_schedule()"
-  ]
+  ],
+  "cell_type": "code"
  }
 ],
- "metadata": {
-  "authors": [
-   {
-    "name": "rafarmah"
-   }
-  ],
-  "kernelspec": {
-   "display_name": "Python 3.6",
-   "language": "python",
-   "name": "python36"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.6"
-  }
- },
- "nbformat": 4,
 "nbformat_minor": 2
}
@@ -1,22 +0,0 @@
-name: azure_automl
-dependencies:
-# The python interpreter version.
-# Currently Azure ML only supports 3.5.2 and later.
-- nomkl
-- python>=3.5.2,<3.6.8
-- nb_conda
-- matplotlib==2.1.0
-- numpy>=1.11.0,<=1.16.2
-- cython
-- urllib3<1.24
-- scipy>=1.0.0,<=1.1.0
-- scikit-learn>=0.19.0,<=0.20.3
-- pandas>=0.22.0,<0.23.0
-- py-xgboost<=0.80
-
-- pip:
-# Required packages for AzureML execution, history, and data preparation.
-- azureml-sdk[automl,explain]
-- azureml-widgets
-- pandas_ml
-
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "v-rasav"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.7"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -31,10 +55,10 @@
 "1. [Deploy](#Deploy)\n",
 "1. [Test](#Test)\n",
 "1. [Acknowledgements](#Acknowledgements)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -52,22 +76,22 @@
 "6. Create a container image.\n",
 "7. Create an Azure Container Instance (ACI) service.\n",
 "8. Test the ACI service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import json\n",
 "import logging\n",
@@ -85,13 +109,13 @@
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig\n",
 "from azureml.train.automl.run import AutoMLRun"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -113,10 +137,10 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Create or Attach existing AmlCompute\n",
@@ -124,13 +148,13 @@
 "#### Creation of AmlCompute takes approximately 5 minutes. \n",
 "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
 "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.compute import AmlCompute\n",
 "from azureml.core.compute import ComputeTarget\n",
@@ -160,35 +184,35 @@
 " compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
 " \n",
 " # For a more detailed view of current AmlCompute status, use get_status()."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Data\n",
 "\n",
 "Here load the data in the get_data() script to be utilized in azure compute. To do this first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_Run_config."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "if not os.path.isdir('data'):\n",
 " os.mkdir('data')\n",
 " \n",
 "if not os.path.exists(project_folder):\n",
 " os.makedirs(project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.runconfig import RunConfiguration\n",
 "from azureml.core.conda_dependencies import CondaDependencies\n",
@@ -204,22 +228,22 @@
 "\n",
 "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
 "conda_run_config.environment.python.conda_dependencies = cd"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Load Data\n",
 "\n",
 "Here we create the script to be run in azure comput for loading the data, we load the bank marketing dataset into X_train and y_train. Next X_train and y_train is returned for training the model."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv\"\n",
 "dflow = dprep.auto_read_file(data)\n",
@@ -227,10 +251,10 @@
 "X_train = dflow.drop_columns(columns=['y'])\n",
 "y_train = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
 "dflow.head()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
@@ -249,13 +273,13 @@
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
 "\n",
 "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_settings = {\n",
 " \"iteration_timeout_minutes\": 5,\n",
@@ -275,43 +299,43 @@
 " y = y_train,\n",
 " **automl_settings\n",
 " )"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
 "In this example, we specify `show_output = True` to print currently running iterations to the console."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
@@ -319,20 +343,20 @@
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(remote_run).show() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Deploy\n",
@@ -340,51 +364,51 @@
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = remote_run.get_output()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Register the Fitted Model for Deployment\n",
 "If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "description = 'AutoML Model trained on bank marketing data to predict if a client will subscribe to a term deposit'\n",
 "tags = None\n",
 "model = remote_run.register_model(description = description, tags = tags)\n",
 "\n",
 "print(remote_run.model_id) # This will be written to the script file later in the notebook."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create Scoring Script\n",
 "The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile score.py\n",
 "import pickle\n",
@@ -410,46 +434,46 @@
 " result = str(e)\n",
 " return json.dumps({\"error\": result})\n",
 " return json.dumps({\"result\":result.tolist()})"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create a YAML File for the Environment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
 " print('{}\\t{}'.format(p, dependencies[p]))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies\n",
 "\n",
@@ -458,13 +482,13 @@
 "\n",
 "conda_env_file_name = 'myenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Substitute the actual version number in the environment file.\n",
 "# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
@@ -485,23 +509,23 @@
 "\n",
 "with open(script_file_name, 'w') as cefw:\n",
 " cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create a Container Image\n",
 "\n",
 "Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
 "or when testing a model that is under development."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.image import Image, ContainerImage\n",
 "\n",
@@ -521,22 +545,22 @@
 "\n",
 "if image.creation_state == 'Failed':\n",
 " print(\"Image build log at: \" + image.image_build_log_uri)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Deploy the Image as a Web Service on Azure Container Instance\n",
 "\n",
 "Deploy an image that contains the model and other assets needed by the service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import AciWebservice\n",
 "\n",
@@ -544,13 +568,13 @@
 " memory_gb = 1, \n",
 " tags = {'area': \"bmData\", 'type': \"automl_classification\"}, \n",
 " description = 'sample service for Automl Classification')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import Webservice\n",
 "\n",
@@ -562,70 +586,70 @@
 " workspace = ws)\n",
 "aci_service.wait_for_deployment(True)\n",
 "print(aci_service.state)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Delete a Web Service\n",
 "\n",
 "Deletes the specified web service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#aci_service.delete()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Get Logs from a Deployed Web Service\n",
 "\n",
 "Gets logs from a deployed web service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#aci_service.get_logs()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test\n",
 "\n",
 "Now that the model is trained split our data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Load the bank marketing datasets.\n",
 "from sklearn.datasets import load_diabetes\n",
 "from sklearn.model_selection import train_test_split\n",
 "from numpy import array"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv\"\n",
 "dflow = dprep.auto_read_file(data)\n",
@@ -633,62 +657,62 @@
          "X_test = dflow.drop_columns(columns=['y'])\n",
          "y_test = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
          "dflow.head()"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "X_test = X_test.to_pandas_dataframe()\n",
          "y_test = y_test.to_pandas_dataframe()"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "y_pred = fitted_model.predict(X_test)\n",
          "actual = array(y_test)\n",
          "actual = actual[:,0]\n",
          "print(y_pred.shape, \" \", actual.shape)"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "### Calculate metrics for the prediction\n",
          "\n",
          "Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
          "from the trained model that was returned."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "%matplotlib notebook\n",
          "test_pred = plt.scatter(actual, y_pred, color='b')\n",
          "test_test = plt.scatter(actual, actual, color='g')\n",
          "plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
          "plt.show()"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "## Acknowledgements"
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "This Bank Marketing dataset is made available under the Creative Commons (CCO: Public Domain) License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: https://creativecommons.org/publicdomain/zero/1.0/ and is available at: https://www.kaggle.com/janiobachmann/bank-marketing-dataset .\n",
@@ -697,33 +721,9 @@
          "This data set is originally available within the UCI Machine Learning Database: https://archive.ics.uci.edu/ml/datasets/bank+marketing\n",
          "\n",
          "[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014"
-        ]
+        ],
+        "cell_type": "markdown"
        }
    ],
-    "metadata": {
-        "authors": [
-            {
-                "name": "v-rasav"
-            }
-        ],
-        "kernelspec": {
-            "display_name": "Python 3.6",
-            "language": "python",
-            "name": "python36"
-        },
-        "language_info": {
-            "codemirror_mode": {
-                "name": "ipython",
-                "version": 3
-            },
-            "file_extension": ".py",
-            "mimetype": "text/x-python",
-            "name": "python",
-            "nbconvert_exporter": "python",
-            "pygments_lexer": "ipython3",
-            "version": "3.6.7"
-        }
-    },
-    "nbformat": 4,
    "nbformat_minor": 2
}
@@ -1,23 +1,47 @@
{
+    "metadata": {
+        "kernelspec": {
+            "display_name": "Python 3.6",
+            "name": "python36",
+            "language": "python"
+        },
+        "authors": [
+            {
+                "name": "v-rasav"
+            }
+        ],
+        "language_info": {
+            "mimetype": "text/x-python",
+            "codemirror_mode": {
+                "name": "ipython",
+                "version": 3
+            },
+            "pygments_lexer": "ipython3",
+            "name": "python",
+            "file_extension": ".py",
+            "nbconvert_exporter": "python",
+            "version": "3.6.7"
+        }
+    },
+    "nbformat": 4,
    "cells": [
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "Copyright (c) Microsoft Corporation. All rights reserved.\n",
          "\n",
          "Licensed under the MIT License."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          ""
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "# Automated Machine Learning\n",
@@ -31,10 +55,10 @@
          "1. [Deploy](#Deploy)\n",
          "1. [Test](#Test)\n",
          "1. [Acknowledgements](#Acknowledgements)"
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "## Introduction\n",
@@ -52,22 +76,22 @@
          "6. Create a container image.\n",
          "7. Create an Azure Container Instance (ACI) service.\n",
          "8. Test the ACI service."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "## Setup\n",
          "\n",
          "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "import logging\n",
          "\n",
@@ -82,13 +106,13 @@
          "from azureml.core.workspace import Workspace\n",
          "from azureml.train.automl import AutoMLConfig\n",
          "from azureml.train.automl.run import AutoMLRun"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "ws = Workspace.from_config()\n",
          "\n",
@@ -110,10 +134,10 @@
          "pd.set_option('display.max_colwidth', -1)\n",
          "outputDf = pd.DataFrame(data = output, index = [''])\n",
          "outputDf.T"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "## Create or Attach existing AmlCompute\n",
@@ -121,13 +145,13 @@
          "#### Creation of AmlCompute takes approximately 5 minutes. \n",
          "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
          "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "from azureml.core.compute import AmlCompute\n",
          "from azureml.core.compute import ComputeTarget\n",
@@ -157,35 +181,35 @@
          "    compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
          "    \n",
          "    # For a more detailed view of current AmlCompute status, use get_status()."
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "# Data\n",
          "\n",
          "Here load the data in the get_data script to be utilized in azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "if not os.path.isdir('data'):\n",
          "    os.mkdir('data')\n",
          "    \n",
          "if not os.path.exists(project_folder):\n",
          "    os.makedirs(project_folder)"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "from azureml.core.runconfig import RunConfiguration\n",
          "from azureml.core.conda_dependencies import CondaDependencies\n",
@@ -201,22 +225,22 @@
          "\n",
          "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
          "conda_run_config.environment.python.conda_dependencies = cd"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "### Load Data\n",
          "\n",
          "Here create the script to be run in azure compute for loading the data, load the credit card dataset into cards and store the Class column (y) in the y variable and store the remaining data in the x variable. Next split the data using train_test_split and return X_train and y_train for training the model."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
          "dflow = dprep.auto_read_file(data)\n",
@@ -225,10 +249,10 @@
          "y = dflow.keep_columns(columns=['Class'], validate_column_exists=True)\n",
          "X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
          "y_train, y_test = y.random_split(percentage=0.8, seed=223)"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "## Train\n",
@@ -247,20 +271,20 @@
          "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
          "\n",
          "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "##### If you would like to see even better results increase \"iteration_time_out minutes\" to 10+ mins and increase \"iterations\" to a minimum of 30"
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "automl_settings = {\n",
          "    \"iteration_timeout_minutes\": 5,\n",
@@ -280,43 +304,43 @@
          "                             y = y_train,\n",
          "                             **automl_settings\n",
          "                            )"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
          "In this example, we specify `show_output = True` to print currently running iterations to the console."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "remote_run = experiment.submit(automl_config, show_output = True)"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "remote_run"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "## Results"
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "#### Widget for Monitoring Runs\n",
@@ -324,20 +348,20 @@
          "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
          "\n",
          "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details"
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "from azureml.widgets import RunDetails\n",
          "RunDetails(remote_run).show() "
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "## Deploy\n",
@@ -345,51 +369,51 @@
          "### Retrieve the Best Model\n",
          "\n",
          "Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "best_run, fitted_model = remote_run.get_output()"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "### Register the Fitted Model for Deployment\n",
          "If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "description = 'AutoML Model'\n",
          "tags = None\n",
          "model = remote_run.register_model(description = description, tags = tags)\n",
          "\n",
          "print(remote_run.model_id) # This will be written to the script file later in the notebook."
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "### Create Scoring Script\n",
          "The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "%%writefile score.py\n",
          "import pickle\n",
@@ -414,59 +438,59 @@
          "        result = str(e)\n",
          "        return json.dumps({\"error\": result})\n",
          "    return json.dumps({\"result\":result.tolist()})"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "### Create a YAML File for the Environment"
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
          "    print('{}\\t{}'.format(p, dependencies[p]))"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
          "                                 pip_packages=['azureml-sdk[automl]'])\n",
          "\n",
          "conda_env_file_name = 'myenv.yml'\n",
          "myenv.save_to_file('.', conda_env_file_name)"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "# Substitute the actual version number in the environment file.\n",
          "# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
@@ -487,23 +511,23 @@
          "\n",
          "with open(script_file_name, 'w') as cefw:\n",
          "    cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "### Create a Container Image\n",
          "\n",
          "Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
          "or when testing a model that is under development."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "from azureml.core.image import Image, ContainerImage\n",
          "\n",
@@ -523,22 +547,22 @@
          "\n",
          "if image.creation_state == 'Failed':\n",
          "    print(\"Image build log at: \" + image.image_build_log_uri)"
-        ]
+        ],
+        "cell_type": "code"
        },
        {
-        "cell_type": "markdown",
         "metadata": {},
         "source": [
          "### Deploy the Image as a Web Service on Azure Container Instance\n",
          "\n",
          "Deploy an image that contains the model and other assets needed by the service."
-        ]
+        ],
+        "cell_type": "markdown"
        },
        {
-        "cell_type": "code",
-        "execution_count": null,
         "metadata": {},
         "outputs": [],
+        "execution_count": null,
         "source": [
          "from azureml.core.webservice import AciWebservice\n",
          "\n",
@@ -546,13 +570,13 @@
|
|||||||
" memory_gb = 1, \n",
|
" memory_gb = 1, \n",
|
||||||
" tags = {'area': \"cards\", 'type': \"automl_classification\"}, \n",
|
" tags = {'area': \"cards\", 'type': \"automl_classification\"}, \n",
|
||||||
" description = 'sample service for Automl Classification')"
|
" description = 'sample service for Automl Classification')"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -564,89 +588,89 @@
|
|||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
"print(aci_service.state)"
|
"print(aci_service.state)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Delete a Web Service\n",
|
"### Delete a Web Service\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Deletes the specified web service."
|
"Deletes the specified web service."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"#aci_service.delete()"
|
"#aci_service.delete()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Get Logs from a Deployed Web Service\n",
|
"### Get Logs from a Deployed Web Service\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Gets logs from a deployed web service."
|
"Gets logs from a deployed web service."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"#aci_service.get_logs()"
|
"#aci_service.get_logs()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test\n",
|
"## Test\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"#Randomly select and test\n",
|
"#Randomly select and test\n",
|
||||||
"X_test = X_test.to_pandas_dataframe()\n",
|
"X_test = X_test.to_pandas_dataframe()\n",
|
||||||
"y_test = y_test.to_pandas_dataframe()\n"
|
"y_test = y_test.to_pandas_dataframe()\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"y_pred = fitted_model.predict(X_test)\n",
|
"y_pred = fitted_model.predict(X_test)\n",
|
||||||
"y_pred"
|
"y_pred"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Calculate metrics for the prediction\n",
|
"### Calculate metrics for the prediction\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
||||||
"from the trained model that was returned."
|
"from the trained model that was returned."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"#Randomly select and test\n",
|
"#Randomly select and test\n",
|
||||||
"# Plot outputs\n",
|
"# Plot outputs\n",
|
||||||
@@ -656,17 +680,17 @@
|
|||||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
"plt.show()\n",
|
"plt.show()\n",
|
||||||
"\n"
|
"\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Acknowledgements"
|
"## Acknowledgements"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"This Credit Card fraud Detection dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/ and is available at: https://www.kaggle.com/mlg-ulb/creditcardfraud\n",
|
"This Credit Card fraud Detection dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/ and is available at: https://www.kaggle.com/mlg-ulb/creditcardfraud\n",
|
||||||
@@ -680,33 +704,9 @@
|
|||||||
"o\tDal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)\n",
|
"o\tDal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)\n",
|
||||||
"\u00e2\u20ac\u00a2\tCarcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-A\u00c3\u00abl; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier\n",
|
"\u00e2\u20ac\u00a2\tCarcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-A\u00c3\u00abl; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier\n",
|
||||||
"\u00e2\u20ac\u00a2\tCarcillo, Fabrizio; Le Borgne, Yann-A\u00c3\u00abl; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing"
|
"\u00e2\u20ac\u00a2\tCarcillo, Fabrizio; Le Borgne, Yann-A\u00c3\u00abl; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "v-rasav"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.7"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "savitam"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.6"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -29,10 +53,10 @@
 "1. [Train](#Train)\n",
 "1. [Deploy](#Deploy)\n",
 "1. [Test](#Test)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -50,22 +74,22 @@
 "6. Create a container image.\n",
 "7. Create an Azure Container Instance (ACI) service.\n",
 "8. Test the ACI service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import json\n",
 "import logging\n",
@@ -80,13 +104,13 @@
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig\n",
 "from azureml.train.automl.run import AutoMLRun"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -108,10 +132,10 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
@@ -128,13 +152,13 @@
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "digits = datasets.load_digits()\n",
 "X_train = digits.data[10:,:]\n",
@@ -150,36 +174,36 @@
 " X = X_train, \n",
 " y = y_train,\n",
 " path = project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
 "In this example, we specify `show_output = True` to print currently running iterations to the console."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Deploy\n",
@@ -187,50 +211,50 @@
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = local_run.get_output()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Register the Fitted Model for Deployment\n",
 "If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "description = 'AutoML Model'\n",
 "tags = None\n",
 "model = local_run.register_model(description = description, tags = tags)\n",
 "\n",
 "print(local_run.model_id) # This will be written to the script file later in the notebook."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create Scoring Script"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile score.py\n",
 "import pickle\n",
@@ -256,56 +280,56 @@
 " result = str(e)\n",
 " return json.dumps({\"error\": result})\n",
 " return json.dumps({\"result\":result.tolist()})"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create a YAML File for the Environment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. The following cells create a file, myenv.yml, which specifies the dependencies from the run."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "experiment = Experiment(ws, experiment_name)\n",
 "ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "dependencies = ml_run.get_run_sdk_dependencies(iteration = 7)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
 " print('{}\\t{}'.format(p, dependencies[p]))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies\n",
 "\n",
@@ -314,13 +338,13 @@
 "\n",
 "conda_env_file_name = 'myenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Substitute the actual version number in the environment file.\n",
 "# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
@@ -341,20 +365,20 @@
 "\n",
 "with open(script_file_name, 'w') as cefw:\n",
 " cefw.write(content.replace('<<modelid>>', local_run.model_id))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create a Container Image"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.image import Image, ContainerImage\n",
 "\n",
@@ -374,20 +398,20 @@
 "\n",
 "if image.creation_state == 'Failed':\n",
 " print(\"Image build log at: \" + image.image_build_log_uri)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Deploy the Image as a Web Service on Azure Container Instance"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import AciWebservice\n",
 "\n",
@@ -395,13 +419,13 @@
 " memory_gb = 1, \n",
 " tags = {'area': \"digits\", 'type': \"automl_classification\"}, \n",
 " description = 'sample service for Automl Classification')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import Webservice\n",
 "\n",
@@ -413,52 +437,52 @@
 " workspace = ws)\n",
 "aci_service.wait_for_deployment(True)\n",
 "print(aci_service.state)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Delete a Web Service"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#aci_service.delete()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Get Logs from a Deployed Web Service"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#aci_service.get_logs()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#Randomly select digits and test\n",
 "digits = datasets.load_digits()\n",
@@ -478,33 +502,9 @@
 " ax1.set_title(title)\n",
 " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
 " plt.show()"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "savitam"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.6"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3.6",
+      "name": "python36",
+      "language": "python"
+    },
+    "authors": [
+      {
+        "name": "savitam"
+      }
+    ],
+    "language_info": {
+      "mimetype": "text/x-python",
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "pygments_lexer": "ipython3",
+      "name": "python",
+      "file_extension": ".py",
+      "nbconvert_exporter": "python",
+      "version": "3.6.6"
+    }
+  },
+  "nbformat": 4,
   "cells": [
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "Copyright (c) Microsoft Corporation. All rights reserved.\n",
         "\n",
         "Licensed under the MIT License."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         ""
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "# Automated Machine Learning\n",
@@ -31,10 +55,10 @@
         "1. [Results](#Results)\n",
         "1. [Test](#Test)\n",
         "\n"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Introduction\n",
@@ -50,22 +74,22 @@
         "2. Configure AutoML using `AutoMLConfig`.\n",
         "3. Train the model using local compute with ONNX compatible config on.\n",
         "4. Explore the results and save the ONNX model."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Setup\n",
         "\n",
         "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "import logging\n",
         "\n",
@@ -79,13 +103,13 @@
         "from azureml.core.experiment import Experiment\n",
         "from azureml.core.workspace import Workspace\n",
         "from azureml.train.automl import AutoMLConfig, constants"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "ws = Workspace.from_config()\n",
         "\n",
@@ -106,22 +130,22 @@
         "pd.set_option('display.max_colwidth', -1)\n",
         "outputDf = pd.DataFrame(data = output, index = [''])\n",
         "outputDf.T"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Data\n",
         "\n",
         "This uses scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) method."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "iris = datasets.load_iris()\n",
         "X_train, X_test, y_train, y_test = train_test_split(iris.data, \n",
@@ -130,30 +154,30 @@
         " random_state=0)\n",
         "\n",
         "\n"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### Ensure the x_train and x_test are pandas DataFrame."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "# Convert the X_train and X_test to pandas DataFrame and set column names,\n",
         "# This is needed for initializing the input variable names of ONNX model, \n",
         "# and the prediction with the ONNX model using the inference helper.\n",
         "X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n",
         "X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Train with enable ONNX compatible models config on\n",
@@ -172,20 +196,20 @@
         "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
         "|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|\n",
         "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### Set the preprocess=True, currently the InferenceHelper only supports this mode."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "automl_config = AutoMLConfig(task = 'classification',\n",
         " debug_log = 'automl_errors.log',\n",
@@ -198,43 +222,43 @@
         " preprocess=True,\n",
         " enable_onnx_compatible_models=True,\n",
         " path = project_folder)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
         "In this example, we specify `show_output = True` to print currently running iterations to the console."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "local_run = experiment.submit(automl_config, show_output = True)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "local_run"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Results"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "#### Widget for Monitoring Runs\n",
@@ -242,20 +266,20 @@
         "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
         "\n",
         "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "from azureml.widgets import RunDetails\n",
         "RunDetails(local_run).show() "
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### Retrieve the Best ONNX Model\n",
@@ -263,47 +287,47 @@
         "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n",
         "\n",
         "Set the parameter return_onnx_model=True to retrieve the best ONNX model, instead of the Python model."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "best_run, onnx_mdl = local_run.get_output(return_onnx_model=True)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### Save the best ONNX model"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "from azureml.automl.core.onnx_convert import OnnxConverter\n",
         "onnx_fl_path = \"./best_model.onnx\"\n",
         "OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### Predict with the ONNX model, using onnxruntime package"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "import sys\n",
         "import json\n",
@@ -342,40 +366,16 @@
         " print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n",
         " if not onnxrt_present:\n",
         " print('Please install the onnxruntime package to do the prediction with ONNX model.')"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
-      "source": []
+      "execution_count": null,
+      "source": [],
+      "cell_type": "code"
     }
   ],
-  "metadata": {
-    "authors": [
-      {
-        "name": "savitam"
-      }
-    ],
-    "kernelspec": {
-      "display_name": "Python 3.6",
-      "language": "python",
-      "name": "python36"
-    },
-    "language_info": {
-      "codemirror_mode": {
-        "name": "ipython",
-        "version": 3
-      },
-      "file_extension": ".py",
-      "mimetype": "text/x-python",
-      "name": "python",
-      "nbconvert_exporter": "python",
-      "pygments_lexer": "ipython3",
-      "version": "3.6.6"
-    }
-  },
-  "nbformat": 4,
   "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3.6",
+      "name": "python36",
+      "language": "python"
+    },
+    "authors": [
+      {
+        "name": "savitam"
+      }
+    ],
+    "language_info": {
+      "mimetype": "text/x-python",
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "pygments_lexer": "ipython3",
+      "name": "python",
+      "file_extension": ".py",
+      "nbconvert_exporter": "python",
+      "version": "3.6.6"
+    }
+  },
+  "nbformat": 4,
   "cells": [
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "Copyright (c) Microsoft Corporation. All rights reserved.\n",
         "\n",
         "Licensed under the MIT License."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         ""
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "# Automated Machine Learning\n",
@@ -30,10 +54,10 @@
         "1. [Train](#Train)\n",
         "1. [Results](#Results)\n",
         "1. [Test](#Test)"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Introduction\n",
@@ -50,22 +74,22 @@
         "3. Train the model on a whilelisted models using local compute. \n",
         "4. Explore the results.\n",
         "5. Test the best fitted model."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Setup\n",
         "\n",
         "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "#Note: This notebook will install tensorflow if not already installed in the enviornment..\n",
         "import logging\n",
@@ -90,13 +114,13 @@
         " whitelist_models=[\"TensorFlowLinearClassifier\", \"TensorFlowDNN\"]\n",
         "\n",
         "from azureml.train.automl import AutoMLConfig"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "ws = Workspace.from_config()\n",
         "\n",
@@ -117,32 +141,32 @@
         "pd.set_option('display.max_colwidth', -1)\n",
         "outputDf = pd.DataFrame(data = output, index = [''])\n",
         "outputDf.T"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Data\n",
         "\n",
         "This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "digits = datasets.load_digits()\n",
         "\n",
         "# Exclude the first 100 rows from training so that they can be used for test.\n",
         "X_train = digits.data[100:,:]\n",
         "y_train = digits.target[100:]"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Train\n",
@@ -160,13 +184,13 @@
         "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
         "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
         "|**whitelist_models**|List of models that AutoML should use. The possible values are listed [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings).|"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "automl_config = AutoMLConfig(task = 'classification',\n",
         " debug_log = 'automl_errors.log',\n",
@@ -179,43 +203,43 @@
         " enable_tf=True,\n",
         " whitelist_models=whitelist_models,\n",
         " path = project_folder)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
         "In this example, we specify `show_output = True` to print currently running iterations to the console."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "local_run = experiment.submit(automl_config, show_output = True)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "local_run"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "## Results"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "#### Widget for Monitoring Runs\n",
@@ -223,32 +247,32 @@
         "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
         "\n",
         "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "from azureml.widgets import RunDetails\n",
         "RunDetails(local_run).show() "
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "\n",
         "#### Retrieve All Child Runs\n",
         "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "children = list(local_run.get_children())\n",
         "metricslist = {}\n",
@@ -259,102 +283,102 @@
         "\n",
         "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
         "rundata"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "### Retrieve the Best Model\n",
         "\n",
         "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "best_run, fitted_model = local_run.get_output()\n",
         "print(best_run)\n",
         "print(fitted_model)"
-      ]
+      ],
+      "cell_type": "code"
    },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "#### Best Model Based on Any Other Metric\n",
         "Show the run and the model that has the smallest `log_loss` value:"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "lookup_metric = \"log_loss\"\n",
         "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
         "print(best_run)\n",
         "print(fitted_model)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "#### Model from a Specific Iteration\n",
         "Show the run and the model from the third iteration:"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "iteration = 3\n",
         "third_run, third_model = local_run.get_output(iteration = iteration)\n",
         "print(third_run)\n",
         "print(third_model)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
|
"source": [
|
||||||
"## Test\n",
|
"## Test\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data"
|
"#### Load Test Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_test = digits.data[:10, :]\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
"y_test = digits.target[:10]\n",
|
"y_test = digits.target[:10]\n",
|
||||||
"images = digits.images[:10]"
|
"images = digits.images[:10]"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing Our Best Fitted Model\n",
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"We will try to predict 2 digits and see how our model works."
|
"We will try to predict 2 digits and see how our model works."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Randomly select digits and test.\n",
|
"# Randomly select digits and test.\n",
|
||||||
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
@@ -367,33 +391,9 @@
|
|||||||
" ax1.set_title(title)\n",
|
" ax1.set_title(title)\n",
|
||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" plt.show()"
|
" plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
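The hunks above only reorder keys in the notebook's JSON; the "Retrieve All Child Runs" logic they touch is unchanged. As a minimal, self-contained sketch of that metric-aggregation pattern — the `children` list here is fabricated stand-in data, not real azureml `Run` objects, which would supply the metrics via `run.get_metrics()`:

```python
import pandas as pd

# Fabricated stand-ins for the child runs of an AutoML parent run.
children = [
    {"id": "AutoML_run_0", "metrics": {"AUC_weighted": 0.91, "log_loss": 0.40}},
    {"id": "AutoML_run_1", "metrics": {"AUC_weighted": 0.95, "log_loss": 0.22}},
]

# Collect per-iteration metrics into a dict of dicts, keyed by run id.
metricslist = {}
for run in children:
    metricslist[run["id"]] = run["metrics"]

# One column per child run, one row per metric name, columns sorted by id.
rundata = pd.DataFrame(metricslist).sort_index(axis=1)
print(rundata)
```

The notebook's `sort_index(1)` spelling is the same operation; `axis=1` is the explicit form.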
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "savitam"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.6"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -31,10 +55,10 @@
 "1. [Results](#Results)\n",
 "1. [Test](#Test)\n",
 "\n"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -49,22 +73,22 @@
 "3. Train the model using local compute.\n",
 "4. Explore the results.\n",
 "5. Test the best fitted model."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "\n",
@@ -77,10 +101,10 @@
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Accessing the Azure ML workspace requires authentication with Azure.\n",
@@ -103,13 +127,13 @@
 "ws = Workspace.from_config(auth = auth)\n",
 "```\n",
 "For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -130,32 +154,32 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data\n",
 "\n",
 "This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "digits = datasets.load_digits()\n",
 "\n",
 "# Exclude the first 100 rows from training so that they can be used for test.\n",
 "X_train = digits.data[100:,:]\n",
 "y_train = digits.target[100:]"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
@@ -177,75 +201,75 @@
 "* If you specify neither the `iterations` nor the `experiment_timeout_minutes`, automated ML keeps running iterations while it continues to see improvements in the scores.\n",
 "\n",
 "The following example doesn't specify `iterations` or `experiment_timeout_minutes` and so runs until the scores stop improving.\n"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
 " primary_metric = 'AUC_weighted',\n",
 " X = X_train, \n",
 " y = y_train,\n",
 " n_cross_validations = 3)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
 "In this example, we specify `show_output = True` to print currently running iterations to the console."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Optionally, you can continue an interrupted local run by calling `continue_experiment` without the `iterations` parameter, or run more iterations for a completed run by specifying the `iterations` parameter:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = local_run.continue_experiment(X = X_train, \n",
 " y = y_train, \n",
 " show_output = True,\n",
 " iterations = 5)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
@@ -253,32 +277,32 @@
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(local_run).show() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "#### Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(local_run.get_children())\n",
 "metricslist = {}\n",
@@ -289,41 +313,41 @@
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = local_run.get_output()\n",
 "print(best_run)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Print the properties of the model\n",
 "The fitted_model is a python object and you can read the different properties of the object.\n",
 "The following shows printing hyperparameters for each step in the pipeline."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from pprint import pprint\n",
 "\n",
@@ -346,98 +370,98 @@
 " print()\n",
 " \n",
 "print_model(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric\n",
 "Show the run and the model that has the smallest `log_loss` value:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "lookup_metric = \"log_loss\"\n",
 "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
 "print(best_run)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "print_model(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Model from a Specific Iteration\n",
 "Show the run and the model from the third iteration:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "iteration = 3\n",
 "third_run, third_model = local_run.get_output(iteration = iteration)\n",
 "print(third_run)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "print_model(third_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test \n",
 "\n",
 "#### Load Test Data"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "digits = datasets.load_digits()\n",
 "X_test = digits.data[:10, :]\n",
 "y_test = digits.target[:10]\n",
 "images = digits.images[:10]"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Testing Our Best Fitted Model\n",
 "We will try to predict 2 digits and see how our model works."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Randomly select digits and test.\n",
 "for index in np.random.choice(len(y_test), 2, replace = False):\n",
@@ -450,33 +474,9 @@
 " ax1.set_title(title)\n",
 " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
 " plt.show()"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "savitam"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.6"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
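The notebook above also defines a `print_model` helper that walks the fitted pipeline's `steps` and pretty-prints each step's hyperparameters. A minimal stand-in sketch of that pattern — the `FakeStep`/`FakePipeline` classes here are fabricated placeholders for an AutoML `fitted_model`, which exposes the same `steps` / `get_params()` shape:

```python
from pprint import pprint

class FakeStep:
    """Placeholder for a pipeline step; real steps are sklearn estimators."""
    def __init__(self, params):
        self._params = params
    def get_params(self):
        return self._params

class FakePipeline:
    """Placeholder for fitted_model: a list of (name, step) pairs."""
    steps = [
        ("scaler", FakeStep({"with_mean": True})),
        ("classifier", FakeStep({"C": 1.0, "penalty": "l2"})),
    ]

def print_model(model):
    # Print each step's name followed by its hyperparameters.
    for name, step in model.steps:
        print(name)
        pprint(step.get_params())
        print()

print_model(FakePipeline())
```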
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.5"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -30,10 +54,10 @@
|
|||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Results](#Results)\n",
|
"1. [Results](#Results)\n",
|
||||||
"1. [Test](#Test)"
|
"1. [Test](#Test)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -45,29 +69,29 @@
|
|||||||
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
|
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
|
||||||
"2. Pass the `Dataflow` to AutoML for a local run.\n",
|
"2. Pass the `Dataflow` to AutoML for a local run.\n",
|
||||||
"3. Pass the `Dataflow` to AutoML for a remote run."
|
"3. Pass the `Dataflow` to AutoML for a remote run."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros."
|
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"import time\n",
|
"import time\n",
|
||||||
@@ -80,13 +104,13 @@
 "from azureml.core.workspace import Workspace\n",
 "import azureml.dataprep as dprep\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 " \n",
@@ -108,20 +132,20 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
 "# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
@@ -130,21 +154,21 @@
 "example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
 "dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
 "dflow.get_profile()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# As `Primary Type` is our y data, we need to drop the values those are null in this column.\n",
 "dflow = dflow.drop_nulls('Primary Type')\n",
 "dflow.head(5)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Review the Data Preparation Result\n",
@@ -152,32 +176,32 @@
 "You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
 "\n",
 "`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
 "y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
 "\n",
 "This creates a general AutoML settings object applicable for both local and remote runs."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_settings = {\n",
 " \"iteration_timeout_minutes\" : 10,\n",
@@ -186,20 +210,20 @@
 " \"preprocess\" : True,\n",
 " \"verbosity\" : logging.INFO\n",
 "}"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create or Attach an AmlCompute cluster"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.compute import AmlCompute\n",
 "from azureml.core.compute import ComputeTarget\n",
@@ -231,13 +255,13 @@
 " compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
 "\n",
 " # For a more detailed view of current AmlCompute status, use get_status()."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.runconfig import RunConfiguration\n",
 "from azureml.core.conda_dependencies import CondaDependencies\n",
@@ -252,22 +276,22 @@
 "\n",
 "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
 "conda_run_config.environment.python.conda_dependencies = cd"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Pass Data with `Dataflow` Objects\n",
 "\n",
 "The `Dataflow` objects captured above can also be passed to the `submit` method for a remote run. AutoML will serialize the `Dataflow` object and send it to the remote compute target. The `Dataflow` will not be evaluated locally."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
 " debug_log = 'automl_errors.log',\n",
@@ -276,73 +300,73 @@
 " X = X,\n",
 " y = y,\n",
 " **automl_settings)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Pre-process cache cleanup\n",
 "The preprocess data gets cache at user default file store. When the run is completed the cache can be cleaned by running below cell"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run.clean_preprocessor_cache()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Cancelling Runs\n",
 "You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Cancel the ongoing experiment and stop scheduling new iterations.\n",
 "# remote_run.cancel()\n",
 "\n",
 "# Cancel iteration 1 and move onto iteration 2.\n",
 "# remote_run.cancel_iteration(1)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
@@ -350,31 +374,31 @@
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(remote_run).show()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(remote_run.get_children())\n",
 "metricslist = {}\n",
@@ -385,101 +409,101 @@
 " \n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = remote_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric\n",
 "Show the run and the model that has the smallest `log_loss` value:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "lookup_metric = \"log_loss\"\n",
 "best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Model from a Specific Iteration\n",
 "Show the run and the model from the first iteration:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "iteration = 0\n",
 "best_run, fitted_model = remote_run.get_output(iteration = iteration)\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test\n",
 "\n",
 "#### Load Test Data\n",
 "For the test data, it should have the same preparation step as the train data. Otherwise it might get failed at the preprocessing step."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
 "dflow_test = dflow_test.drop_nulls('Primary Type')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Testing Our Best Fitted Model\n",
 "We will use confusion matrix to see how our model works."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from pandas_ml import ConfusionMatrix\n",
 "\n",
@@ -494,33 +518,9 @@
 "print(cm)\n",
 "\n",
 "cm.plot()"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "savitam"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.5"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "savitam"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.5"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -30,10 +54,10 @@
 "1. [Train](#Train)\n",
 "1. [Results](#Results)\n",
 "1. [Test](#Test)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -45,29 +69,29 @@
 "1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
 "2. Pass the `Dataflow` to AutoML for a local run.\n",
 "3. Pass the `Dataflow` to AutoML for a remote run."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "\n",
@@ -78,13 +102,13 @@
 "from azureml.core.workspace import Workspace\n",
 "import azureml.dataprep as dprep\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 " \n",
@@ -106,20 +130,20 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
 "# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
@@ -128,21 +152,21 @@
 "example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
 "dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
 "dflow.get_profile()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# As `Primary Type` is our y data, we need to drop the values those are null in this column.\n",
 "dflow = dflow.drop_nulls('Primary Type')\n",
 "dflow.head(5)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Review the Data Preparation Result\n",
@@ -150,32 +174,32 @@
 "You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
 "\n",
 "`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
 "y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
 "\n",
 "This creates a general AutoML settings object applicable for both local and remote runs."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_settings = {\n",
 " \"iteration_timeout_minutes\" : 10,\n",
@@ -184,57 +208,57 @@
 " \"preprocess\" : True,\n",
 " \"verbosity\" : logging.INFO\n",
 "}"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Pass Data with `Dataflow` Objects\n",
 "\n",
 "The `Dataflow` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `Dataflow` for model training."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
 " debug_log = 'automl_errors.log',\n",
 " X = X,\n",
 " y = y,\n",
 " **automl_settings)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
@@ -242,31 +266,31 @@
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.widgets import RunDetails\n",
|
"from azureml.widgets import RunDetails\n",
|
||||||
"RunDetails(local_run).show()"
|
"RunDetails(local_run).show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"children = list(local_run.get_children())\n",
|
"children = list(local_run.get_children())\n",
|
||||||
"metricslist = {}\n",
|
"metricslist = {}\n",
|
||||||
@@ -277,101 +301,101 @@
|
|||||||
" \n",
|
" \n",
|
||||||
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
"rundata"
|
"rundata"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model Based on Any Other Metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run and the model that has the smallest `log_loss` value:"
|
"Show the run and the model that has the smallest `log_loss` value:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"lookup_metric = \"log_loss\"\n",
|
"lookup_metric = \"log_loss\"\n",
|
||||||
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a Specific Iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and the model from the first iteration:"
|
"Show the run and the model from the first iteration:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 0\n",
|
"iteration = 0\n",
|
||||||
"best_run, fitted_model = local_run.get_output(iteration = iteration)\n",
|
"best_run, fitted_model = local_run.get_output(iteration = iteration)\n",
|
||||||
"print(best_run)\n",
|
"print(best_run)\n",
|
||||||
"print(fitted_model)"
|
"print(fitted_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test\n",
|
"## Test\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data\n",
|
"#### Load Test Data\n",
|
||||||
"For the test data, it should have the same preparation step as the train data. Otherwise it might get failed at the preprocessing step."
|
"For the test data, it should have the same preparation step as the train data. Otherwise it might get failed at the preprocessing step."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
|
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
|
||||||
"dflow_test = dflow_test.drop_nulls('Primary Type')"
|
"dflow_test = dflow_test.drop_nulls('Primary Type')"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing Our Best Fitted Model\n",
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"We will use confusion matrix to see how our model works."
|
"We will use confusion matrix to see how our model works."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from pandas_ml import ConfusionMatrix\n",
|
"from pandas_ml import ConfusionMatrix\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -385,33 +409,9 @@
|
|||||||
"print(cm)\n",
|
"print(cm)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"cm.plot()"
|
"cm.plot()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.5"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
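Every hunk in this diff follows the same mechanical shape: `"cell_type"` moves from the first key of each cell to the last, `"execution_count"` moves below `"outputs"`, and the notebook-level `"metadata"` and `"nbformat"` keys move from the bottom of the file to the top; no cell content changes. This looks like the output of a JSON round-trip through a serializer that does not preserve key order (an assumption; the commit does not say how it was produced). A quick stdlib check shows why such a diff is content-neutral:

```python
import json

# Hypothetical before/after serializations of one notebook cell,
# mirroring the key reordering seen in the hunks above.
before = '{"cell_type": "markdown", "metadata": {}, "source": ["## Train"]}'
after = '{"metadata": {}, "source": ["## Train"], "cell_type": "markdown"}'

# JSON objects are unordered, so both strings parse to equal dicts:
# the diff is purely textual, not semantic.
print(json.loads(before) == json.loads(after))  # prints True
```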
@@ -1,23 +1,47 @@
 {
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3.6",
+   "name": "python36",
+   "language": "python"
+  },
+  "authors": [
+   {
+    "name": "savitam"
+   }
+  ],
+  "language_info": {
+   "mimetype": "text/x-python",
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "pygments_lexer": "ipython3",
+   "name": "python",
+   "file_extension": ".py",
+   "nbconvert_exporter": "python",
+   "version": "3.6.6"
+  }
+ },
+ "nbformat": 4,
 "cells": [
  {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "Copyright (c) Microsoft Corporation. All rights reserved.\n",
     "\n",
     "Licensed under the MIT License."
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     ""
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "# Automated Machine Learning\n",
@@ -29,10 +53,10 @@
     "1. [Explore](#Explore)\n",
     "1. [Download](#Download)\n",
     "1. [Register](#Register)"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Introduction\n",
@@ -45,20 +69,20 @@
     "2. List all AutoML runs in an experiment.\n",
     "3. Get details for an AutoML run, including settings, run widget, and all metrics.\n",
     "4. Download a fitted pipeline for any iteration."
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Setup"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "import pandas as pd\n",
     "import json\n",
@@ -66,36 +90,36 @@
     "from azureml.core.experiment import Experiment\n",
     "from azureml.core.workspace import Workspace\n",
     "from azureml.train.automl.run import AutoMLRun"
-   ]
+   ],
+   "cell_type": "code"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "ws = Workspace.from_config()"
-   ]
+   ],
+   "cell_type": "code"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Explore"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### List Experiments"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "experiment_list = Experiment.list(workspace=ws)\n",
     "\n",
@@ -106,21 +130,21 @@
     " \n",
     "pd.set_option('display.max_colwidth', -1)\n",
     "summary_df.T"
-   ]
+   ],
+   "cell_type": "code"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### List runs for an experiment\n",
     "Set `experiment_name` to any experiment name from the result of the Experiment.list cell to load the AutoML runs."
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "experiment_name = 'automl-local-classification' # Replace this with any project name from previous cell.\n",
     "\n",
@@ -146,22 +170,22 @@
     "from IPython.display import display\n",
     "display(projname_html)\n",
     "display(summary_df.T)"
-   ]
+   ],
+   "cell_type": "code"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### Get details for a run\n",
     "\n",
     "Copy the project name and run id from the previous cell output to find more details on a particular run."
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "run_id = automl_runs_project[0] # Replace with your own run_id from above run ids\n",
     "assert (run_id in summary_df.keys()), \"Run id not found! Please set run id to a value from above run ids\"\n",
@@ -207,143 +231,119 @@
     "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
     "display(HTML('<h3>Metrics</h3>'))\n",
     "display(rundata)\n"
-   ]
+   ],
+   "cell_type": "code"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Download"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### Download the Best Model for Any Given Metric"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "metric = 'AUC_weighted' # Replace with a metric name.\n",
     "best_run, fitted_model = ml_run.get_output(metric = metric)\n",
     "fitted_model"
-   ]
+   ],
+   "cell_type": "code"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### Download the Model for Any Given Iteration"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "iteration = 1 # Replace with an iteration number.\n",
     "best_run, fitted_model = ml_run.get_output(iteration = iteration)\n",
     "fitted_model"
-   ]
+   ],
+   "cell_type": "code"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Register"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### Register fitted model for deployment\n",
     "If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "description = 'AutoML Model'\n",
     "tags = None\n",
     "ml_run.register_model(description = description, tags = tags)\n",
     "print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
-   ]
+   ],
+   "cell_type": "code"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### Register the Best Model for Any Given Metric"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "metric = 'AUC_weighted' # Replace with a metric name.\n",
     "description = 'AutoML Model'\n",
     "tags = None\n",
     "ml_run.register_model(description = description, tags = tags, metric = metric)\n",
     "print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
-   ]
+   ],
+   "cell_type": "code"
   },
   {
-   "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### Register the Model for Any Given Iteration"
-   ]
+   ],
+   "cell_type": "markdown"
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
    "outputs": [],
+   "execution_count": null,
    "source": [
     "iteration = 1 # Replace with an iteration number.\n",
     "description = 'AutoML Model'\n",
     "tags = None\n",
     "ml_run.register_model(description = description, tags = tags, iteration = iteration)\n",
     "print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
-   ]
+   ],
+   "cell_type": "code"
   }
  ],
- "metadata": {
-  "authors": [
-   {
-    "name": "savitam"
-   }
-  ],
-  "kernelspec": {
-   "display_name": "Python 3.6",
-   "language": "python",
-   "name": "python36"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.6"
-  }
- },
- "nbformat": 4,
  "nbformat_minor": 2
 }
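The same reordering seen in these hunks could be reproduced with a few lines of Python; `reorder_keys` below is a hypothetical helper (not part of the Azure ML SDK or nbformat), shown only to illustrate the key order the committed notebook cells end up with:

```python
import json

def reorder_keys(obj, preferred):
    """Return a copy of obj with the keys in `preferred` first, in that
    order; any remaining keys keep their original relative order."""
    out = {k: obj[k] for k in preferred if k in obj}
    out.update({k: v for k, v in obj.items() if k not in out})
    return out

# A code cell in the original key order...
cell = {"cell_type": "code", "execution_count": None,
        "metadata": {}, "outputs": [], "source": []}
# ...rewritten into the order the committed notebooks use
# (dicts preserve insertion order in Python 3.7+).
cell = reorder_keys(cell, ["metadata", "outputs", "execution_count",
                           "source", "cell_type"])
print(json.dumps(list(cell)))
# prints ["metadata", "outputs", "execution_count", "source", "cell_type"]
```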
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "xiaga@microsoft.com, tosingli@microsoft.com, erwright@microsoft.com"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.8"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -29,10 +53,10 @@
|
|||||||
"1. [Data](#Data)\n",
|
"1. [Data](#Data)\n",
|
||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Evaluate](#Evaluate)"
|
"1. [Evaluate](#Evaluate)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -47,20 +71,20 @@
|
|||||||
"2. Configuration and local run of AutoML for a time-series model with lag and holiday features \n",
|
"2. Configuration and local run of AutoML for a time-series model with lag and holiday features \n",
|
||||||
"3. Viewing the engineered names for featurized data and featurization summary for all raw features\n",
|
"3. Viewing the engineered names for featurized data and featurization summary for all raw features\n",
|
||||||
"4. Evaluating the fitted model using a rolling test "
|
"4. Evaluating the fitted model using a rolling test "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n"
|
"## Setup\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
@@ -78,20 +102,20 @@
|
|||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
"from matplotlib import pyplot as plt\n",
|
"from matplotlib import pyplot as plt\n",
|
||||||
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
|
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"ws = Workspace.from_config()\n",
"\n",
@@ -113,28 +137,28 @@
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"## Data\n",
"Read bike share demand data from file, and preview data."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"data = pd.read_csv('bike-no.csv', parse_dates=['date'])\n",
"data.head()"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"Let's set up what we know about the dataset. \n",
@@ -146,33 +170,33 @@
"**Grain** is another word for an individual time series in your dataset. Grains are identified by values of the columns listed `grain_column_names`, for example \"store\" and \"item\" if your data has multiple time series of sales, one series for each combination of store and item sold.\n",
"\n",
"This dataset has only one time series. Please see the [orange juice notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales) for an example of a multi-time series dataset."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"target_column_name = 'cnt'\n",
"time_column_name = 'date'\n",
"grain_column_names = []"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"## Split the data\n",
"\n",
"The first split we make is into train and test sets. Note we are splitting on time."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"train = data[data[time_column_name] < '2012-09-01']\n",
"test = data[data[time_column_name] >= '2012-09-01']\n",
@@ -187,28 +211,28 @@
"print(y_train.shape)\n",
"print(X_test.shape)\n",
"print(y_test.shape)"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"### Setting forecaster maximum horizon \n",
"\n",
"The forecast horizon is the number of periods into the future that the model should predict. Here, we set the horizon to 14 periods (i.e. 14 days). Notice that this is much shorter than the number of days in the test set; we will need to use a rolling test to evaluate the performance on the whole test set. For more discussion of forecast horizons and guiding principles for setting them, please see the [energy demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand). "
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"max_horizon = 14"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"## Train\n",
@@ -226,13 +250,13 @@
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**country_or_region**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (i.e. 'US', 'GB').|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"automl_settings = {\n",
" 'time_column_name': time_column_name,\n",
@@ -254,78 +278,78 @@
" path=project_folder,\n",
" verbosity=logging.INFO,\n",
" **automl_settings)"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"We will now run the experiment, starting with 10 iterations of model search. The experiment can be continued for more iterations if more accurate results are required. You will see the currently running iterations printing to the console."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"local_run = experiment.submit(automl_config, show_output=True)"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"Displaying the run objects gives you links to the visual tools in the Azure Portal. Go try them!"
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"local_run"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"Below we select the best pipeline from our iterations. The get_output method on automl_classifier returns the best run and the fitted model for the last fit invocation. There are overloads on get_output that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"fitted_model.steps"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"### View the engineered names for featurized data\n",
"\n",
"You can access the engineered feature names generated in time-series featurization. Note that a number of named holiday periods are represented. We recommend that you have at least one year of data when using this feature to ensure that all yearly holidays are captured in the training featurization."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"### View the featurization summary\n",
@@ -337,56 +361,56 @@
"- Type detected\n",
"- If feature was dropped\n",
"- List of feature transformations for the raw feature"
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"## Evaluate"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"We now use the best fitted model from the AutoML Run to make forecasts for the test set. \n",
"\n",
"We always score on the original dataset whose schema matches the training set schema."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"X_test.head()"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"We now define some functions for aligning output to input and for producing rolling forecasts over the full test set. As previously stated, the forecast horizon of 14 days is shorter than the length of the test set - which is about 120 days. To get predictions over the full test set, we iterate over the test set, making forecasts 14 days at a time and combining the results. We also make sure that each 14-day forecast uses up-to-date actuals - the current context - to construct lag features. \n",
"\n",
"It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name='predicted',\n",
" horizon_colname='horizon_origin'):\n",
@@ -464,30 +488,30 @@
" origin_time = horizon_time\n",
" \n",
" return pd.concat(df_list, ignore_index=True)"
],
"cell_type": "code"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"df_all = do_rolling_forecast(fitted_model, X_test, y_test, max_horizon)\n",
"df_all"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"We now calculate some error metrics for the forecasts and visualize the predictions vs. the actuals."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"def APE(actual, pred):\n",
" \"\"\"\n",
@@ -506,13 +530,13 @@
" actual_safe = actual[not_na & not_zero]\n",
" pred_safe = pred[not_na & not_zero]\n",
" return np.mean(APE(actual_safe, pred_safe))"
],
"cell_type": "code"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"print(\"Simple forecasting model\")\n",
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
@@ -527,39 +551,39 @@
"test_test = plt.scatter(y_test, y_test, color='g')\n",
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
"plt.show()"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"The MAPE seems high; it is being skewed by an actual with a small absolute value. For a more informative evaluation, we can calculate the metrics by forecast horizon:"
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"df_all.groupby('horizon_origin').apply(\n",
" lambda df: pd.Series({'MAPE': MAPE(df[target_column_name], df['predicted']),\n",
" 'RMSE': np.sqrt(mean_squared_error(df[target_column_name], df['predicted'])),\n",
" 'MAE': mean_absolute_error(df[target_column_name], df['predicted'])}))"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"It's also interesting to see the distributions of APE (absolute percentage error) by horizon. On a log scale, the outlying APE in the horizon-3 group is clear."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"df_all_APE = df_all.assign(APE=APE(df_all[target_column_name], df_all['predicted']))\n",
"APEs = [df_all_APE[df_all['horizon_origin'] == h].APE.values for h in range(1, max_horizon + 1)]\n",
@@ -572,33 +596,9 @@
"plt.title('Absolute Percentage Errors by Forecast Horizon')\n",
"\n",
"plt.show()"
],
"cell_type": "code"
}
],
"nbformat_minor": 2
}
@@ -1,23 +1,47 @@
{
"metadata": {
"kernelspec": {
"display_name": "Python 3.6",
"name": "python36",
"language": "python"
},
"authors": [
{
"name": "xiaga, tosingli, erwright"
}
],
"language_info": {
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"name": "python",
"file_extension": ".py",
"nbconvert_exporter": "python",
"version": "3.6.8"
}
},
"nbformat": 4,
"cells": [
{
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
""
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"# Automated Machine Learning\n",
@@ -28,10 +52,10 @@
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"## Introduction\n",
@@ -45,20 +69,20 @@
"3. View engineered features and prediction results\n",
"4. Configuration and local run of AutoML for a time-series model with lag and rolling window features\n",
"5. Estimate feature importance"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"## Setup\n"
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"import azureml.core\n",
"import pandas as pd\n",
@@ -74,20 +98,20 @@
"from azureml.train.automl import AutoMLConfig\n",
"from matplotlib import pyplot as plt\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"ws = Workspace.from_config()\n",
"\n",
@@ -109,46 +133,46 @@
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"## Data\n",
"We will use energy consumption data from New York City for model training. The data is stored in a tabular format and includes energy demand and basic weather data at an hourly frequency. Pandas CSV reader is used to read the file into memory. Special attention is given to the \"timeStamp\" column in the data since it contains text which should be parsed as datetime-type objects. "
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"data = pd.read_csv(\"nyc_energy.csv\", parse_dates=['timeStamp'])\n",
"data.head()"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"We must now define the schema of this dataset. Every time-series must have a time column and a target. The target quantity is what will be eventually forecasted by a trained model. In this case, the target is the \"demand\" column. The other columns, \"temp\" and \"precip,\" are implicitly designated as features."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"# Dataset schema\n",
"time_column_name = 'timeStamp'\n",
"target_column_name = 'demand'"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"### Forecast Horizon\n",
@@ -159,30 +183,30 @@
"\n",
"\n",
"Forecast horizons in AutoML are given as integer multiples of the time-series frequency. In this example, we set the horizon to 48 hours."
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"max_horizon = 48"
],
"cell_type": "code"
},
{
"metadata": {},
"source": [
"### Split the data into train and test sets\n",
"We now split the data into a train and a test set so that we may evaluate model performance. We note that the tail of the dataset contains a large number of NA values in the target column, so we designate the test set as the 48 hour window ending on the latest date of known energy demand. "
],
"cell_type": "markdown"
},
{
"metadata": {},
"outputs": [],
"execution_count": null,
"source": [
"# Find time point to split on\n",
"latest_known_time = data[~pd.isnull(data[target_column_name])][time_column_name].max()\n",
@@ -195,10 +219,10 @@
|
|||||||
"# Move the target values into their own arrays \n",
|
"# Move the target values into their own arrays \n",
|
||||||
"y_train = X_train.pop(target_column_name).values\n",
|
"y_train = X_train.pop(target_column_name).values\n",
|
||||||
"y_test = X_test.pop(target_column_name).values"
|
"y_test = X_test.pop(target_column_name).values"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -215,13 +239,13 @@
|
|||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way.|\n",
|
"|**n_cross_validations**|Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"time_series_settings = {\n",
|
"time_series_settings = {\n",
|
||||||
" 'time_column_name': time_column_name,\n",
|
" 'time_column_name': time_column_name,\n",
|
||||||
@@ -239,72 +263,72 @@
|
|||||||
" path=project_folder,\n",
|
" path=project_folder,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" **time_series_settings)"
|
" **time_series_settings)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Submitting the configuration will start a new run in this experiment. For local runs, the execution is synchronous. Depending on the data and number of iterations, this can run for a while. Parameters controlling concurrency may speed up the process, depending on your hardware.\n",
|
"Submitting the configuration will start a new run in this experiment. For local runs, the execution is synchronous. Depending on the data and number of iterations, this can run for a while. Parameters controlling concurrency may speed up the process, depending on your hardware.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You will see the currently running iterations printing to the console."
|
"You will see the currently running iterations printing to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"Below we select the best pipeline from our iterations. The get_output method on automl_classifier returns the best run and the fitted model for the last fit invocation. There are overloads on get_output that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
|
"Below we select the best pipeline from our iterations. The get_output method on automl_classifier returns the best run and the fitted model for the last fit invocation. There are overloads on get_output that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_model = local_run.get_output()\n",
|
"best_run, fitted_model = local_run.get_output()\n",
|
||||||
"fitted_model.steps"
|
"fitted_model.steps"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### View the engineered names for featurized data\n",
|
"### View the engineered names for featurized data\n",
|
||||||
"Below we display the engineered feature names generated for the featurized data using the time-series featurization."
|
"Below we display the engineered feature names generated for the featurized data using the time-series featurization."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()"
|
"fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Test the Best Fitted Model\n",
|
"### Test the Best Fitted Model\n",
|
||||||
@@ -314,13 +338,13 @@
|
|||||||
"We need to pass the recent values of the target variable `y`, whereas the scikit-compatible `predict` function only takes the non-target variables `X`. In our case, the test data immediately follows the training data, and we fill the `y` variable with `NaN`. The `NaN` serves as a question mark for the forecaster to fill with the actuals. Using the forecast function will produce forecasts using the shortest possible forecast horizon. The last time at which a definite (non-NaN) value is seen is the _forecast origin_ - the last time when the value of the target is known. \n",
|
"We need to pass the recent values of the target variable `y`, whereas the scikit-compatible `predict` function only takes the non-target variables `X`. In our case, the test data immediately follows the training data, and we fill the `y` variable with `NaN`. The `NaN` serves as a question mark for the forecaster to fill with the actuals. Using the forecast function will produce forecasts using the shortest possible forecast horizon. The last time at which a definite (non-NaN) value is seen is the _forecast origin_ - the last time when the value of the target is known. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"Using the `predict` method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use."
|
"Using the `predict` method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Replace ALL values in y_pred by NaN. \n",
|
"# Replace ALL values in y_pred by NaN. \n",
|
||||||
"# The forecast origin will be at the beginning of the first forecast period\n",
|
"# The forecast origin will be at the beginning of the first forecast period\n",
|
||||||
@@ -331,13 +355,13 @@
|
|||||||
"# This contains the assumptions that were made in the forecast\n",
|
"# This contains the assumptions that were made in the forecast\n",
|
||||||
"# and helps align the forecast to the original data\n",
|
"# and helps align the forecast to the original data\n",
|
||||||
"y_fcst, X_trans = fitted_model.forecast(X_test, y_query)"
|
"y_fcst, X_trans = fitted_model.forecast(X_test, y_query)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# limit the evaluation to data where y_test has actuals\n",
|
"# limit the evaluation to data where y_test has actuals\n",
|
||||||
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name = 'predicted'):\n",
|
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name = 'predicted'):\n",
|
||||||
@@ -373,37 +397,37 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"df_all = align_outputs(y_fcst, X_trans, X_test, y_test)\n",
|
"df_all = align_outputs(y_fcst, X_trans, X_test, y_test)\n",
|
||||||
"df_all.head()"
|
"df_all.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Looking at `X_trans` is also useful to see what featurization happened to the data."
|
"Looking at `X_trans` is also useful to see what featurization happened to the data."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X_trans"
|
"X_trans"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Calculate accuracy metrics\n",
|
"### Calculate accuracy metrics\n",
|
||||||
"Finally, we calculate some accuracy metrics for the forecast and plot the predictions vs. the actuals over the time range in the test set."
|
"Finally, we calculate some accuracy metrics for the forecast and plot the predictions vs. the actuals over the time range in the test set."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"def MAPE(actual, pred):\n",
|
"def MAPE(actual, pred):\n",
|
||||||
" \"\"\"\n",
|
" \"\"\"\n",
|
||||||
@@ -416,13 +440,13 @@
|
|||||||
" pred_safe = pred[not_na & not_zero]\n",
|
" pred_safe = pred[not_na & not_zero]\n",
|
||||||
" APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n",
|
" APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n",
|
||||||
" return np.mean(APE)"
|
" return np.mean(APE)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(\"Simple forecasting model\")\n",
|
"print(\"Simple forecasting model\")\n",
|
||||||
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
||||||
@@ -440,36 +464,36 @@
|
|||||||
"plt.title('Prediction vs. Actual Time-Series')\n",
|
"plt.title('Prediction vs. Actual Time-Series')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The distribution looks a little heavy tailed: we underestimate the excursions of the extremes. A normal-quantile transform of the target might help, but let's first try using some past data with the lags and rolling window transforms.\n"
|
"The distribution looks a little heavy tailed: we underestimate the excursions of the extremes. A normal-quantile transform of the target might help, but let's first try using some past data with the lags and rolling window transforms.\n"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Using lags and rolling window features"
|
"### Using lags and rolling window features"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation.\n",
|
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Now that we configured target lags, that is the previous values of the target variables, and the prediction is no longer horizon-less. We therefore must still specify the `max_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features."
|
"Now that we configured target lags, that is the previous values of the target variables, and the prediction is no longer horizon-less. We therefore must still specify the `max_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"time_series_settings_with_lags = {\n",
|
"time_series_settings_with_lags = {\n",
|
||||||
" 'time_column_name': time_column_name,\n",
|
" 'time_column_name': time_column_name,\n",
|
||||||
@@ -490,50 +514,50 @@
|
|||||||
" path=project_folder,\n",
|
" path=project_folder,\n",
|
||||||
" verbosity=logging.INFO,\n",
|
" verbosity=logging.INFO,\n",
|
||||||
" **time_series_settings_with_lags)"
|
" **time_series_settings_with_lags)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We now start a new local run, this time with lag and rolling window featurization. AutoML applies featurizations in the setup stage, prior to iterating over ML models. The full training set is featurized first, followed by featurization of each of the CV splits. Lag and rolling window features introduce additional complexity, so the run will take longer than in the previous example that lacked these featurizations."
|
"We now start a new local run, this time with lag and rolling window featurization. AutoML applies featurizations in the setup stage, prior to iterating over ML models. The full training set is featurized first, followed by featurization of each of the CV splits. Lag and rolling window features introduce additional complexity, so the run will take longer than in the previous example that lacked these featurizations."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run_lags = experiment.submit(automl_config_lags, show_output=True)"
|
"local_run_lags = experiment.submit(automl_config_lags, show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run_lags, fitted_model_lags = local_run_lags.get_output()\n",
|
"best_run_lags, fitted_model_lags = local_run_lags.get_output()\n",
|
||||||
"y_fcst_lags, X_trans_lags = fitted_model_lags.forecast(X_test, y_query)\n",
|
"y_fcst_lags, X_trans_lags = fitted_model_lags.forecast(X_test, y_query)\n",
|
||||||
"df_lags = align_outputs(y_fcst_lags, X_trans_lags, X_test, y_test)\n",
|
"df_lags = align_outputs(y_fcst_lags, X_trans_lags, X_test, y_test)\n",
|
||||||
"df_lags.head()"
|
"df_lags.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X_trans_lags"
|
"X_trans_lags"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(\"Forecasting model with lags\")\n",
|
"print(\"Forecasting model with lags\")\n",
|
||||||
"rmse = np.sqrt(mean_squared_error(df_lags[target_column_name], df_lags['predicted']))\n",
|
"rmse = np.sqrt(mean_squared_error(df_lags[target_column_name], df_lags['predicted']))\n",
|
||||||
@@ -549,20 +573,20 @@
|
|||||||
"plt.xticks(fontsize=8)\n",
|
"plt.xticks(fontsize=8)\n",
|
||||||
"plt.legend((pred, actual), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
"plt.legend((pred, actual), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### What features matter for the forecast?"
|
"### What features matter for the forecast?"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.train.automl.automlexplainer import explain_model\n",
|
"from azureml.train.automl.automlexplainer import explain_model\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -572,42 +596,18 @@
|
|||||||
"# unpack the tuple\n",
|
"# unpack the tuple\n",
|
||||||
"shap_values, expected_values, feat_overall_imp, feat_names, per_class_summary, per_class_imp = expl\n",
|
"shap_values, expected_values, feat_overall_imp, feat_names, per_class_summary, per_class_imp = expl\n",
|
||||||
"best_run_lags"
|
"best_run_lags"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Please go to the Azure Portal's best run to see the top features chart.\n",
|
"Please go to the Azure Portal's best run to see the top features chart.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The informative features make all sorts of intuitive sense. Temperature is a strong driver of heating and cooling demand in NYC. Apart from that, the daily life cycle, expressed by `hour`, and the weekly cycle, expressed by `wday` drives people's energy use habits."
|
"The informative features make all sorts of intuitive sense. Temperature is a strong driver of heating and cooling demand in NYC. Apart from that, the daily life cycle, expressed by `hour`, and the weekly cycle, expressed by `wday` drives people's energy use habits."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "xiaga, tosingli, erwright"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.8"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "erwright, tosingli"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.8"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -30,10 +54,10 @@
 "1. [Train](#Train)\n",
 "1. [Predict](#Predict)\n",
 "1. [Operationalize](#Operationalize)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -42,20 +66,20 @@
 "Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
 "\n",
 "The examples in the follow code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import azureml.core\n",
 "import pandas as pd\n",
@@ -70,20 +94,20 @@
 "from azureml.core.experiment import Experiment\n",
 "from azureml.train.automl import AutoMLConfig\n",
 "from sklearn.metrics import mean_absolute_error, mean_squared_error"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem. "
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -105,79 +129,79 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data\n",
 "You are now ready to load the historical orange juice sales data. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "time_column_name = 'WeekStarting'\n",
 "data = pd.read_csv(\"dominicks_OJ.csv\", parse_dates=[time_column_name])\n",
 "data.head()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating if the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also include the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred. \n",
 "\n",
 "The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we thus define the **grain** - the columns whose values determine the boundaries between time-series: "
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "grain_column_names = ['Store', 'Brand']\n",
 "nseries = data.groupby(grain_column_names).ngroups\n",
 "print('Data contains {0} individual time-series.'.format(nseries))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "For demonstration purposes, we extract sales time-series for just a few of the stores:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "use_stores = [2, 5, 8]\n",
 "data_subset = data[data.Store.isin(use_stores)]\n",
 "nseries = data_subset.groupby(grain_column_names).ngroups\n",
|
||||||
"print('Data subset contains {0} individual time-series.'.format(nseries))"
|
"print('Data subset contains {0} individual time-series.'.format(nseries))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Data Splitting\n",
|
"### Data Splitting\n",
|
||||||
"We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the grain columns."
|
"We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the grain columns."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"n_test_periods = 20\n",
|
"n_test_periods = 20\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -190,10 +214,10 @@
|
|||||||
" return df_head, df_tail\n",
|
" return df_head, df_tail\n",
|
||||||
"\n",
|
"\n",
|
||||||
"X_train, X_test = split_last_n_by_grain(data_subset, n_test_periods)"
|
"X_train, X_test = split_last_n_by_grain(data_subset, n_test_periods)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Modeling\n",
|
"## Modeling\n",
|
||||||
@@ -208,20 +232,20 @@
|
|||||||
"AutoML will currently train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series.\n",
|
"AutoML will currently train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You are almost ready to start an AutoML training job. First, we need to separate the target column from the rest of the DataFrame: "
|
"You are almost ready to start an AutoML training job. First, we need to separate the target column from the rest of the DataFrame: "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"target_column_name = 'Quantity'\n",
|
"target_column_name = 'Quantity'\n",
|
||||||
"y_train = X_train.pop(target_column_name).values"
|
"y_train = X_train.pop(target_column_name).values"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -251,13 +275,13 @@
|
|||||||
"|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
|
"|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
|
||||||
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
|
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
|
||||||
"|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|"
|
"|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"time_series_settings = {\n",
|
"time_series_settings = {\n",
|
||||||
" 'time_column_name': time_column_name,\n",
|
" 'time_column_name': time_column_name,\n",
|
||||||
@@ -277,84 +301,84 @@
|
|||||||
" path=project_folder,\n",
|
" path=project_folder,\n",
|
||||||
" verbosity=logging.INFO,\n",
|
" verbosity=logging.INFO,\n",
|
||||||
" **time_series_settings)"
|
" **time_series_settings)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can now submit a new training run. For local runs, the execution is synchronous. Depending on the data and number of iterations this operation may take several minutes.\n",
|
"You can now submit a new training run. For local runs, the execution is synchronous. Depending on the data and number of iterations this operation may take several minutes.\n",
|
||||||
"Information from each iteration will be printed to the console."
|
"Information from each iteration will be printed to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"Each run within an Experiment stores serialized (i.e. pickled) pipelines from the AutoML iterations. We can now retrieve the pipeline with the best performance on the validation dataset:"
|
"Each run within an Experiment stores serialized (i.e. pickled) pipelines from the AutoML iterations. We can now retrieve the pipeline with the best performance on the validation dataset:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"best_run, fitted_pipeline = local_run.get_output()\n",
|
"best_run, fitted_pipeline = local_run.get_output()\n",
|
||||||
"fitted_pipeline.steps"
|
"fitted_pipeline.steps"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Forecasting\n",
|
"# Forecasting\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. First, we remove the target values from the test set:"
|
"Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. First, we remove the target values from the test set:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"y_test = X_test.pop(target_column_name).values"
|
"y_test = X_test.pop(target_column_name).values"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"X_test.head()"
|
"X_test.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data. \n",
|
"To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"We will first create a query `y_query`, which is aligned index-for-index to `X_test`. This is a vector of target values where each `NaN` serves the function of the question mark to be replaced by forecast. Passing definite values in the `y` argument allows the `forecast` function to make predictions on data that does not immediately follow the train data which contains `y`. In each grain, the last time point where the model sees a definite value of `y` is that grain's _forecast origin_."
|
"We will first create a query `y_query`, which is aligned index-for-index to `X_test`. This is a vector of target values where each `NaN` serves the function of the question mark to be replaced by forecast. Passing definite values in the `y` argument allows the `forecast` function to make predictions on data that does not immediately follow the train data which contains `y`. In each grain, the last time point where the model sees a definite value of `y` is that grain's _forecast origin_."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Replace ALL values in y_pred by NaN.\n",
|
"# Replace ALL values in y_pred by NaN.\n",
|
||||||
"# The forecast origin will be at the beginning of the first forecast period.\n",
|
"# The forecast origin will be at the beginning of the first forecast period.\n",
|
||||||
@@ -365,19 +389,19 @@
|
|||||||
"# This contains the assumptions that were made in the forecast\n",
|
"# This contains the assumptions that were made in the forecast\n",
|
||||||
"# and helps align the forecast to the original data\n",
|
"# and helps align the forecast to the original data\n",
|
||||||
"y_pred, X_trans = fitted_pipeline.forecast(X_test, y_query)"
|
"y_pred, X_trans = fitted_pipeline.forecast(X_test, y_query)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"If you are used to scikit pipelines, perhaps you expected `predict(X_test)`. However, forecasting requires a more general interface that also supplies the past target `y` values. Please use `forecast(X,y)` as `predict(X)` is reserved for internal purposes on forecasting models.\n",
|
"If you are used to scikit pipelines, perhaps you expected `predict(X_test)`. However, forecasting requires a more general interface that also supplies the past target `y` values. Please use `forecast(X,y)` as `predict(X)` is reserved for internal purposes on forecasting models.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The [energy demand forecasting notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) demonstrates the use of the forecast function in more detail in the context of using lags and rolling window features. "
|
"The [energy demand forecasting notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) demonstrates the use of the forecast function in more detail in the context of using lags and rolling window features. "
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Evaluate\n",
|
"# Evaluate\n",
|
||||||
@@ -385,13 +409,13 @@
|
|||||||
"To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, included the mean absolute percentage error (MAPE). \n",
|
"To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, included the mean absolute percentage error (MAPE). \n",
|
||||||
"\n",
|
"\n",
|
||||||
"It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
|
"It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name = 'predicted'):\n",
|
"def align_outputs(y_predicted, X_trans, X_test, y_test, predicted_column_name = 'predicted'):\n",
|
||||||
" \"\"\"\n",
|
" \"\"\"\n",
|
||||||
@@ -426,13 +450,13 @@
|
|||||||
" return(clean)\n",
|
" return(clean)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"df_all = align_outputs(y_pred, X_trans, X_test, y_test)"
|
"df_all = align_outputs(y_pred, X_trans, X_test, y_test)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"def MAPE(actual, pred):\n",
|
"def MAPE(actual, pred):\n",
|
||||||
" \"\"\"\n",
|
" \"\"\"\n",
|
||||||
@@ -445,13 +469,13 @@
|
|||||||
" pred_safe = pred[not_na & not_zero]\n",
|
" pred_safe = pred[not_na & not_zero]\n",
|
||||||
" APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n",
|
" APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n",
|
||||||
" return np.mean(APE)"
|
" return np.mean(APE)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"print(\"Simple forecasting model\")\n",
|
"print(\"Simple forecasting model\")\n",
|
||||||
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
"rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all['predicted']))\n",
|
||||||
@@ -468,49 +492,49 @@
|
|||||||
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Operationalize"
|
"# Operationalize"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"_Operationalization_ means getting the model into the cloud so that other can run it after you close the notebook. We will create a docker running on Azure Container Instances with the model."
|
"_Operationalization_ means getting the model into the cloud so that other can run it after you close the notebook. We will create a docker running on Azure Container Instances with the model."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"description = 'AutoML OJ forecaster'\n",
|
"description = 'AutoML OJ forecaster'\n",
|
||||||
"tags = None\n",
|
"tags = None\n",
|
||||||
"model = local_run.register_model(description = description, tags = tags)\n",
|
"model = local_run.register_model(description = description, tags = tags)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(local_run.model_id)"
|
"print(local_run.model_id)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Develop the scoring script\n",
|
"### Develop the scoring script\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Serializing and deserializing complex data frames may be tricky. We first develop the `run()` function of the scoring script locally, then write it into a scoring script. It is much easier to debug any quirks of the scoring function without crossing two compute environments. For this exercise, we handle a common quirk of how pandas dataframes serialize time stamp values."
|
"Serializing and deserializing complex data frames may be tricky. We first develop the `run()` function of the scoring script locally, then write it into a scoring script. It is much easier to debug any quirks of the scoring function without crossing two compute environments. For this exercise, we handle a common quirk of how pandas dataframes serialize time stamp values."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# this is where we test the run function of the scoring script interactively\n",
|
"# this is where we test the run function of the scoring script interactively\n",
|
||||||
"# before putting it in the scoring script\n",
|
"# before putting it in the scoring script\n",
|
||||||
@@ -549,13 +573,13 @@
|
|||||||
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
||||||
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
||||||
" })"
|
" })"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# test the run function here before putting in the scoring script\n",
|
"# test the run function here before putting in the scoring script\n",
|
||||||
"import json\n",
|
"import json\n",
|
||||||
@@ -569,20 +593,20 @@
|
|||||||
"y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
|
"y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
|
||||||
"y_fcst_all['forecast'] = res_dict['forecast']\n",
|
"y_fcst_all['forecast'] = res_dict['forecast']\n",
|
||||||
"y_fcst_all.head()"
|
"y_fcst_all.head()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Now that the function works locally in the notebook, let's write it down into the scoring script. The scoring script is authored by the data scientist. Adjust it to taste, adding inputs, outputs and processing as needed."
|
"Now that the function works locally in the notebook, let's write it down into the scoring script. The scoring script is authored by the data scientist. Adjust it to taste, adding inputs, outputs and processing as needed."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile score_fcast.py\n",
|
"%%writefile score_fcast.py\n",
|
||||||
"import pickle\n",
|
"import pickle\n",
|
||||||
@@ -635,13 +659,13 @@
|
|||||||
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
" return json.dumps({\"forecast\": forecast_as_list, # return the minimum over the wire: \n",
|
||||||
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
" \"index\": index_as_df.to_json() # no forecast and its featurized values\n",
|
||||||
" })"
|
" })"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# get the model\n",
|
"# get the model\n",
|
||||||
"from azureml.train.automl.run import AutoMLRun\n",
|
"from azureml.train.automl.run import AutoMLRun\n",
|
||||||
@@ -649,13 +673,13 @@
|
|||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)\n",
|
"ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)\n",
|
||||||
"best_iteration = int(str.split(best_run.id,'_')[-1]) # the iteration number is a postfix of the run ID."
|
"best_iteration = int(str.split(best_run.id,'_')[-1]) # the iteration number is a postfix of the run ID."
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# get the best model's dependencies and write them into this file\n",
|
"# get the best model's dependencies and write them into this file\n",
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
@@ -669,13 +693,13 @@
|
|||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"myenv.save_to_file('.', conda_env_file_name)"
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# this is the script file name we wrote a few cells above\n",
|
"# this is the script file name we wrote a few cells above\n",
|
||||||
"script_file_name = 'score_fcast.py'\n",
|
"script_file_name = 'score_fcast.py'\n",
|
||||||
@@ -697,20 +721,20 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"with open(script_file_name, 'w') as cefw:\n",
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
" cefw.write(content.replace('<<modelid>>', local_run.model_id))"
|
" cefw.write(content.replace('<<modelid>>', local_run.model_id))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create a Container Image"
|
"### Create a Container Image"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.image import Image, ContainerImage\n",
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -730,20 +754,20 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"if image.creation_state == 'Failed':\n",
|
"if image.creation_state == 'Failed':\n",
|
||||||
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Deploy the Image as a Web Service on Azure Container Instance"
|
"### Deploy the Image as a Web Service on Azure Container Instance"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import AciWebservice\n",
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -751,13 +775,13 @@
|
|||||||
" memory_gb = 2, \n",
|
" memory_gb = 2, \n",
|
||||||
" tags = {'type': \"automl-forecasting\"},\n",
|
" tags = {'type': \"automl-forecasting\"},\n",
|
||||||
" description = \"Automl forecasting sample service\")"
|
" description = \"Automl forecasting sample service\")"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -770,20 +794,20 @@
|
|||||||
" workspace = ws)\n",
|
" workspace = ws)\n",
|
||||||
"aci_service.wait_for_deployment(True)\n",
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
"print(aci_service.state)"
|
"print(aci_service.state)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
 "source": [
 "### Call the service"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# we send the data to the service serialized into a json string\n",
 "test_sample = json.dumps({'X':X_test.to_json(), 'y' : y_query.tolist()})\n",
@@ -797,59 +821,35 @@
 " y_fcst_all['forecast'] = res_dict['forecast'] \n",
 "except:\n",
 " print(res_dict)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "y_fcst_all.head()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Delete the web service if desired"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "serv = Webservice(ws, 'automl-forecast-01')\n",
 "# serv.delete() # don't do it accidentally"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "erwright, tosingli"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.8"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "savitam"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.6"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -30,10 +54,10 @@
 "1. [Train](#Train)\n",
 "1. [Results](#Results)\n",
 "1. [Test](#Test)\n"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -53,22 +77,22 @@
 "- **Blacklisting** certain pipelines\n",
 "- Specifying **target metrics** to indicate stopping criteria\n",
 "- Handling **missing data** in the input"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "\n",
@@ -81,13 +105,13 @@
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -108,20 +132,20 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "digits = datasets.load_digits()\n",
 "X_train = digits.data[10:,:]\n",
@@ -135,21 +159,21 @@
 "rng.shuffle(missing_samples)\n",
 "missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)\n",
 "X_train[np.where(missing_samples)[0], missing_features] = np.nan"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "df = pd.DataFrame(data = X_train)\n",
 "df['Label'] = pd.Series(y_train, index=df.index)\n",
 "df.head()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
@@ -168,13 +192,13 @@
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
 " debug_log = 'automl_errors.log',\n",
@@ -188,43 +212,43 @@
 " X = X_train, \n",
 " y = y_train,\n",
 " path = project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
 "In this example, we specify `show_output = True` to print currently running iterations to the console."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
@@ -232,32 +256,32 @@
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(local_run).show() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "#### Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(local_run.get_children())\n",
 "metricslist = {}\n",
@@ -268,81 +292,81 @@
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = local_run.get_output()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric\n",
 "Show the run and the model which has the smallest `accuracy` value:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# lookup_metric = \"accuracy\"\n",
 "# best_run, fitted_model = local_run.get_output(metric = lookup_metric)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Model from a Specific Iteration\n",
 "Show the run and the model from the third iteration:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# iteration = 3\n",
 "# best_run, fitted_model = local_run.get_output(iteration = iteration)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### View the engineered names for featurized data\n",
 "Below we display the engineered feature names generated for the featurized data using the preprocessing featurization."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "fitted_model.named_steps['datatransformer'].get_engineered_feature_names()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### View the featurization summary\n",
@@ -352,29 +376,29 @@
 "- Type detected\n",
 "- If feature was dropped\n",
 "- List of feature transformations for the raw feature"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "fitted_model.named_steps['datatransformer'].get_featurization_summary()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "digits = datasets.load_digits()\n",
 "X_test = digits.data[:10, :]\n",
@@ -392,33 +416,9 @@
 " ax1.set_title(title)\n",
 " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
 " plt.show()\n"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "savitam"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.6"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "xif"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
@@ -29,10 +53,10 @@
|
|||||||
"1. [Data](#Data)\n",
|
"1. [Data](#Data)\n",
|
||||||
"1. [Train](#Train)\n",
|
"1. [Train](#Train)\n",
|
||||||
"1. [Results](#Results)"
|
"1. [Results](#Results)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Introduction\n",
|
"## Introduction\n",
|
||||||
@@ -46,22 +70,22 @@
|
|||||||
"3. Training the Model using local compute and explain the model\n",
|
"3. Training the Model using local compute and explain the model\n",
|
||||||
"4. Visualization model's feature importance in widget\n",
|
"4. Visualization model's feature importance in widget\n",
|
||||||
"5. Explore best model's explanation"
|
"5. Explore best model's explanation"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -70,13 +94,13 @@
|
|||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig"
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -98,20 +122,20 @@
|
|||||||
"pd.set_option('display.max_colwidth', -1)\n",
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
"outputDf.T"
|
"outputDf.T"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Data"
|
"## Data"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from sklearn import datasets\n",
|
"from sklearn import datasets\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -130,10 +154,10 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"X_train = pd.DataFrame(X_train, columns=features)\n",
|
"X_train = pd.DataFrame(X_train, columns=features)\n",
|
||||||
"X_test = pd.DataFrame(X_test, columns=features)"
|
"X_test = pd.DataFrame(X_test, columns=features)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train\n",
|
"## Train\n",
|
||||||
@@ -152,13 +176,13 @@
|
|||||||
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
"|**model_explainability**|Indicate to explain each trained pipeline or not |\n",
|
"|**model_explainability**|Indicate to explain each trained pipeline or not |\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
@@ -172,43 +196,43 @@
|
|||||||
" y_valid = y_test,\n",
|
" y_valid = y_test,\n",
|
||||||
" model_explainability=True,\n",
|
" model_explainability=True,\n",
|
||||||
" path=project_folder)"
|
" path=project_folder)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
||||||
"You will see the currently running iterations printing to the console."
|
"You will see the currently running iterations printing to the console."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"local_run"
|
"local_run"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Results"
|
"## Results"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Widget for monitoring runs\n",
|
"### Widget for monitoring runs\n",
|
||||||
@@ -216,40 +240,40 @@
|
|||||||
 "The widget will sit on \"loading\" until the first iteration completes; then an auto-updating graph and table will show up. It refreshes once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "NOTE: The widget displays a link at the bottom. This links to a web UI where you can explore the individual run details."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(local_run).show() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = local_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Best Model's explanation\n",
@@ -264,94 +288,70 @@
 "6.\tper_class_imp: The feature names sorted in the same order as in per_class_summary. Only available for the classification case\n",
 "\n",
 "Note: The **retrieve_model_explanation()** API only works if AutoML has been configured with the **'model_explainability'** flag set to **True**."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.train.automl.automlexplainer import retrieve_model_explanation\n",
 "\n",
 "shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
 " retrieve_model_explanation(best_run)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "print(overall_summary)\n",
 "print(overall_imp)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "print(per_class_summary)\n",
 "print(per_class_imp)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Besides retrieving the existing model explanation information, you can also explain the model with different train/test data"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.train.automl.automlexplainer import explain_model\n",
 "\n",
 "shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
 " explain_model(fitted_model, X_train, X_test, features=features)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "print(overall_summary)\n",
 "print(overall_imp)"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "xif"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.6"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "v-rasav"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.7.1"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -31,10 +55,10 @@
 "1. [Results](#Results)\n",
 "1. [Test](#Test)\n",
 "1. [Acknowledgements](#Acknowledgements)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -48,21 +72,21 @@
 "3. Train the model using local compute.\n",
 "4. Explore the results.\n",
 "5. Test the best fitted model."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "\n",
@@ -78,13 +102,13 @@
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -105,10 +129,10 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Create or Attach existing AmlCompute\n",
@@ -116,13 +140,13 @@
 "#### Creation of AmlCompute takes approximately 5 minutes. \n",
 "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
 "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.compute import AmlCompute\n",
 "from azureml.core.compute import ComputeTarget\n",
@@ -152,35 +176,35 @@
 " compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
 " \n",
 " # For a more detailed view of current AmlCompute status, use get_status()."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Data\n",
 "\n",
 "Here, load the data in the get_data script to be used on Azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "if not os.path.isdir('data'):\n",
 " os.mkdir('data')\n",
 " \n",
 "if not os.path.exists(project_folder):\n",
 " os.makedirs(project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.runconfig import RunConfiguration\n",
 "from azureml.core.conda_dependencies import CondaDependencies\n",
@@ -196,22 +220,22 @@
 "\n",
 "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
 "conda_run_config.environment.python.conda_dependencies = cd"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Load Data\n",
 "\n",
 "Here, create the script to be run on Azure compute for loading the data: load the concrete strength dataset into the X and y variables, then split the data using train_test_split and return X_train and y_train for training the model."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/compresive_strength_concrete.csv\"\n",
 "dflow = dprep.auto_read_file(data)\n",
@@ -221,10 +245,10 @@
 "X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
 "y_train, y_test = y.random_split(percentage=0.8, seed=223) \n",
 "dflow.head()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
@@ -243,20 +267,20 @@
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
 "\n",
 "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "##### If you would like to see even better results, increase \"iteration_timeout_minutes\" to 10+ minutes and increase \"iterations\" to a minimum of 30"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_settings = {\n",
 " \"iteration_timeout_minutes\": 5,\n",
@@ -276,60 +300,60 @@
 " y = y_train,\n",
 " **automl_settings\n",
 " )"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results\n",
 "Widget for Monitoring Runs\n",
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "Note: The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(remote_run).show() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(remote_run.get_children())\n",
 "metricslist = {}\n",
@@ -340,93 +364,93 @@
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Retrieve the Best Model\n",
 "Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = remote_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Best Model Based on Any Other Metric\n",
 "Show the run and the model that has the smallest root_mean_squared_error value (which turned out to be the same as the one with largest spearman_correlation value):"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "lookup_metric = \"root_mean_squared_error\"\n",
 "best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "iteration = 3\n",
 "third_run, third_model = remote_run.get_output(iteration = iteration)\n",
 "print(third_run)\n",
 "print(third_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Register the Fitted Model for Deployment\n",
 "If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "description = 'AutoML Model'\n",
 "tags = None\n",
 "model = remote_run.register_model(description = description, tags = tags)\n",
 "\n",
 "print(remote_run.model_id) # This will be written to the script file later in the notebook."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create Scoring Script\n",
 "The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile score.py\n",
 "import pickle\n",
@@ -451,46 +475,46 @@
 " result = str(e)\n",
 " return json.dumps({\"error\": result})\n",
 " return json.dumps({\"result\":result.tolist()})"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create a YAML File for the Environment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
 " print('{}\\t{}'.format(p, dependencies[p]))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies\n",
 "\n",
@@ -498,13 +522,13 @@
 "\n",
 "conda_env_file_name = 'myenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Substitute the actual version number in the environment file.\n",
 "# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
@@ -525,23 +549,23 @@
"\n",
|
"\n",
|
||||||
"with open(script_file_name, 'w') as cefw:\n",
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create a Container Image\n",
|
"### Create a Container Image\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
|
"Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
|
||||||
"or when testing a model that is under development."
|
"or when testing a model that is under development."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.image import Image, ContainerImage\n",
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -561,22 +585,22 @@
 "\n",
 "if image.creation_state == 'Failed':\n",
 " print(\"Image build log at: \" + image.image_build_log_uri)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Deploy the Image as a Web Service on Azure Container Instance\n",
 "\n",
 "Deploy an image that contains the model and other assets needed by the service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import AciWebservice\n",
 "\n",
@@ -584,13 +608,13 @@
 " memory_gb = 1, \n",
 " tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
 " description = 'sample service for Automl Regression')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import Webservice\n",
 "\n",
@@ -602,58 +626,58 @@
 " workspace = ws)\n",
 "aci_service.wait_for_deployment(True)\n",
 "print(aci_service.state)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Delete a Web Service\n",
 "\n",
 "Deletes the specified web service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#aci_service.delete()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Get Logs from a Deployed Web Service\n",
 "\n",
 "Gets logs from a deployed web service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#aci_service.get_logs()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Test\n",
 "\n",
 "Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "X_test = X_test.to_pandas_dataframe()\n",
 "y_test = y_test.to_pandas_dataframe()\n",
@@ -663,20 +687,20 @@
 "y_train = y_train.to_pandas_dataframe()\n",
 "y_train = np.array(y_train)\n",
 "y_train = y_train[:,0]"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "##### Predict on training and test set, and calculate residual values."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "y_pred_train = fitted_model.predict(X_train)\n",
 "y_residual_train = y_train - y_pred_train\n",
@@ -685,13 +709,13 @@
 "y_residual_test = y_test - y_pred_test\n",
 "\n",
 "y_residual_train.shape"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%matplotlib inline\n",
 "from sklearn.metrics import mean_squared_error, r2_score\n",
@@ -729,23 +753,23 @@
 "#a1.hist(y_residual_test, orientation = 'horizontal', color = ['b']*len(y_residual_test), alpha = 0.2, bins = 10)\n",
 "\n",
 "plt.show()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Calculate metrics for the prediction\n",
 "\n",
 "Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
 "from the trained model that was returned."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Plot outputs\n",
 "%matplotlib notebook\n",
@@ -753,10 +777,10 @@
 "test_test = plt.scatter(y_test, y_test, color='g')\n",
 "plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
 "plt.show()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Acknowledgements\n",
@@ -768,33 +792,9 @@
 "I-Cheng Yeh, \"Modeling of strength of high performance concrete using artificial neural networks,\" Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998). \n",
 "\n",
 "Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science."
-]
+],
+"cell_type": "markdown"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "v-rasav"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.1"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "v-rasav"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.7.1"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -31,10 +55,10 @@
 "1. [Results](#Results)\n",
 "1. [Test](#Test)\n",
 "1. [Acknowledgements](#Acknowledgements)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -48,21 +72,21 @@
 "3. Train the model using local compute.\n",
 "4. Explore the results.\n",
 "5. Test the best fitted model."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "\n",
@@ -78,13 +102,13 @@
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -105,10 +129,10 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Create or Attach existing AmlCompute\n",
@@ -116,13 +140,13 @@
 "#### Creation of AmlCompute takes approximately 5 minutes. \n",
 "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
 "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.compute import AmlCompute\n",
 "from azureml.core.compute import ComputeTarget\n",
@@ -152,35 +176,35 @@
 " compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
 " \n",
 " # For a more detailed view of current AmlCompute status, use get_status()."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Data\n",
 "\n",
 "Here load the data in the get_data script to be utilized in azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "if not os.path.isdir('data'):\n",
 " os.mkdir('data')\n",
 " \n",
 "if not os.path.exists(project_folder):\n",
 " os.makedirs(project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.runconfig import RunConfiguration\n",
 "from azureml.core.conda_dependencies import CondaDependencies\n",
@@ -196,22 +220,22 @@
 "\n",
 "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
 "conda_run_config.environment.python.conda_dependencies = cd"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Load Data\n",
 "\n",
 "Here create the script to be run in azure compute for loading the data, load the hardware dataset into the X and y variables. Next split the data using train_test_split and return X_train and y_train for training the model."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n",
 "dflow = dprep.auto_read_file(data)\n",
@@ -221,10 +245,10 @@
 "X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
 "y_train, y_test = y.random_split(percentage=0.8, seed=223) \n",
 "dflow.head()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
@@ -244,20 +268,20 @@
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
 "\n",
 "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "##### If you would like to see even better results increase \"iteration_time_out minutes\" to 10+ mins and increase \"iterations\" to a minimum of 30"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_settings = {\n",
 " \"iteration_timeout_minutes\": 5,\n",
@@ -277,35 +301,35 @@
 " y = y_train,\n",
 " **automl_settings\n",
 " )"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run = experiment.submit(automl_config, show_output = False)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
@@ -313,41 +337,41 @@
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(remote_run).show() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Wait until the run finishes.\n",
 "remote_run.wait_for_completion(show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(remote_run.get_children())\n",
 "metricslist = {}\n",
@@ -358,93 +382,93 @@
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Retrieve the Best Model\n",
 "Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = remote_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric\n",
 "Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "lookup_metric = \"root_mean_squared_error\"\n",
 "best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "iteration = 3\n",
 "third_run, third_model = remote_run.get_output(iteration = iteration)\n",
 "print(third_run)\n",
 "print(third_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Register the Fitted Model for Deployment\n",
 "If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "description = 'AutoML Model'\n",
 "tags = None\n",
 "model = remote_run.register_model(description = description, tags = tags)\n",
 "\n",
 "print(remote_run.model_id) # This will be written to the script file later in the notebook."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create Scoring Script\n",
 "The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"%%writefile score.py\n",
|
"%%writefile score.py\n",
|
||||||
"import pickle\n",
|
"import pickle\n",
|
||||||
@@ -469,58 +493,58 @@
|
|||||||
" result = str(e)\n",
|
" result = str(e)\n",
|
||||||
" return json.dumps({\"error\": result})\n",
|
" return json.dumps({\"error\": result})\n",
|
||||||
" return json.dumps({\"result\":result.tolist()})"
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
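The `run()` function in the scoring script above parses the incoming JSON, calls the model, and wraps either the prediction or the error message in `json.dumps`. A minimal stand-alone sketch of that pattern, using a hypothetical stand-in model (the real score.py unpickles the registered AutoML model inside `init()`):

```python
import json

class MeanModel:
    """Hypothetical stand-in model: predicts the mean of each input row."""
    def predict(self, rows):
        return [sum(r) / len(r) for r in rows]

model = MeanModel()

def run(raw_data):
    # Mirror the error-handling shape of the scoring script:
    # any failure is reported as {"error": ...}, success as {"result": ...}.
    try:
        data = json.loads(raw_data)["data"]
        result = model.predict(data)
    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})
    return json.dumps({"result": list(result)})

print(run(json.dumps({"data": [[1.0, 3.0], [2.0, 4.0]]})))
# → {"result": [2.0, 3.0]}
```

The same try/except envelope is what lets the deployed service return a readable error body instead of an opaque HTTP 500.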
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create a YAML File for the Environment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
 "    print('{}\\t{}'.format(p, dependencies[p]))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
 "\n",
 "conda_env_file_name = 'myenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Substitute the actual version number in the environment file.\n",
 "# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
@@ -541,23 +565,23 @@
 "\n",
 "with open(script_file_name, 'w') as cefw:\n",
 "    cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
-]
+],
+"cell_type": "code"
 },
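The cell above rewrites the scoring script on disk, replacing the `<<modelid>>` placeholder with the registered model's id via a plain `str.replace`. The same substitution can be sketched in memory, with an illustrative template line and a hypothetical model id standing in for `remote_run.model_id`:

```python
# Hypothetical template line from a scoring script; '<<modelid>>' is the
# placeholder token the notebook substitutes before deployment.
template = "model_path = Model.get_model_path(model_name = '<<modelid>>')"
model_id = "AutoMLabc1234567"  # stand-in for remote_run.model_id

content = template.replace('<<modelid>>', model_id)
print(content)
```

`str.replace` leaves the string untouched when the token is absent, so running the substitution twice is harmless.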
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create a Container Image\n",
 "\n",
 "Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
 "or when testing a model that is under development."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.image import Image, ContainerImage\n",
 "\n",
@@ -577,22 +601,22 @@
 "\n",
 "if image.creation_state == 'Failed':\n",
 "    print(\"Image build log at: \" + image.image_build_log_uri)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Deploy the Image as a Web Service on Azure Container Instance\n",
 "\n",
 "Deploy an image that contains the model and other assets needed by the service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import AciWebservice\n",
 "\n",
@@ -600,13 +624,13 @@
 "                                               memory_gb = 1, \n",
 "                                               tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
 "                                               description = 'sample service for Automl Regression')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import Webservice\n",
 "\n",
@@ -618,58 +642,58 @@
 "                                           workspace = ws)\n",
 "aci_service.wait_for_deployment(True)\n",
 "print(aci_service.state)"
-]
+],
+"cell_type": "code"
 },
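Once the ACI service is deployed, it accepts the same JSON envelope that `run()` in score.py parses. A small sketch of building that request payload, with hypothetical feature values for the ten diabetes columns used in this notebook (the `{"data": [...]}` shape is an assumption matching the scoring script above):

```python
import json

# Hypothetical input: one row with the ten diabetes features
# (age, gender, bmi, bp, s1..s6), already normalized.
rows = [[0.01, -0.04, 0.06, 0.01, -0.04, -0.03, -0.04, -0.002, 0.02, -0.02]]

# The service expects a JSON string, not a Python object.
payload = json.dumps({"data": rows})
print(payload)

# With the live service this would be sent as:
#   response = aci_service.run(input_data = payload)
# and response would be the JSON string produced by run() in score.py.
```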
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Delete a Web Service\n",
 "\n",
 "Deletes the specified web service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#aci_service.delete()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Get Logs from a Deployed Web Service\n",
 "\n",
 "Gets logs from a deployed web service."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#aci_service.get_logs()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test\n",
 "\n",
 "Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "X_test = X_test.to_pandas_dataframe()\n",
 "y_test = y_test.to_pandas_dataframe()\n",
@@ -679,43 +703,43 @@
 "y_train = y_train.to_pandas_dataframe()\n",
 "y_train = np.array(y_train)\n",
 "y_train = y_train[:,0]"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "##### Predict on training and test set, and calculate residual values."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "y_pred_train = fitted_model.predict(X_train)\n",
 "y_residual_train = y_train - y_pred_train\n",
 "\n",
 "y_pred_test = fitted_model.predict(X_test)\n",
 "y_residual_test = y_test - y_pred_test"
-]
+],
+"cell_type": "code"
 },
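The residuals computed above feed the RMSE and R² metrics the next cell imports from `sklearn.metrics`. A tiny worked example of both formulas from first principles, with hypothetical truth/prediction vectors:

```python
import math

# Hypothetical truth and prediction vectors, mirroring
# y_residual_test = y_test - y_pred_test above.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
residuals = [t - p for t, p in zip(y_true, y_pred)]

# RMSE: root of the mean squared residual.
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))

# R^2: 1 - (residual sum of squares / total sum of squares).
mean_y = sum(y_true) / len(y_true)
ss_res = sum(r * r for r in residuals)
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(round(rmse, 4), round(r2, 4))
# → 0.6124 0.9486
```

These match `mean_squared_error(y_true, y_pred) ** 0.5` and `r2_score(y_true, y_pred)` on the same inputs.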
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Calculate metrics for the prediction\n",
 "\n",
 "Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
 "from the trained model that was returned."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%matplotlib inline\n",
 "from sklearn.metrics import mean_squared_error, r2_score\n",
@@ -745,56 +769,32 @@
 "a1.set_yticklabels([])\n",
 "\n",
 "plt.show()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%matplotlib notebook\n",
 "test_pred = plt.scatter(y_test, y_pred_test, color='')\n",
 "test_test = plt.scatter(y_test, y_test, color='g')\n",
 "plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
 "plt.show()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Acknowledgements\n",
 "This Predicting Hardware Performance Dataset is made available under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/ . The dataset itself can be found here: https://www.kaggle.com/faizunnabi/comp-hardware-performance and https://archive.ics.uci.edu/ml/datasets/Computer+Hardware\n",
 "\n",
 "_**Citation Found Here**_\n"
-]
+],
+"cell_type": "markdown"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "v-rasav"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.1"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "savitam"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.6"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -30,10 +54,10 @@
 "1. [Train](#Train)\n",
 "1. [Results](#Results)\n",
 "1. [Test](#Test)\n"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -47,22 +71,22 @@
 "3. Train the model using local compute.\n",
 "4. Explore the results.\n",
 "5. Test the best fitted model."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "\n",
@@ -74,13 +98,13 @@
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -101,21 +125,21 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data\n",
 "This uses scikit-learn's [load_diabetes](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) method."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
 "from sklearn.datasets import load_diabetes\n",
@@ -126,10 +150,10 @@
 "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
 "\n",
 "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"
-]
+],
+"cell_type": "code"
 },
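The `train_test_split(..., test_size = 0.2, random_state = 0)` call above carves a reproducible 80/20 split. A stdlib-only sketch of the mechanics, shuffling indices with a fixed seed and taking the last 20% as the test set (scikit-learn's exact shuffling differs; this only illustrates the idea):

```python
import random

# Toy data standing in for the diabetes X, y arrays.
X = [[i] for i in range(10)]
y = [i * 2 for i in range(10)]

idx = list(range(len(X)))
random.Random(0).shuffle(idx)      # fixed seed -> the same split every run
cut = int(len(idx) * 0.8)          # 80% train, 20% test
train_idx, test_idx = idx[:cut], idx[cut:]

X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

print(len(X_train), len(X_test))
# → 8 2
```

Shuffling index positions rather than the rows themselves keeps each `X[i]` paired with its `y[i]`, which is the invariant the split must preserve.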
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
@@ -146,13 +170,13 @@
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_config = AutoMLConfig(task = 'regression',\n",
 "                             iteration_timeout_minutes = 10,\n",
@@ -164,43 +188,43 @@
 "                             X = X_train, \n",
 "                             y = y_train,\n",
 "                             path = project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
 "In this example, we specify `show_output = True` to print currently running iterations to the console."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
@@ -208,32 +232,32 @@
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(local_run).show() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "#### Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(local_run.get_children())\n",
 "metricslist = {}\n",
@@ -244,100 +268,100 @@
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = local_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric\n",
 "Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "lookup_metric = \"root_mean_squared_error\"\n",
 "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a Specific Iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and the model from the third iteration:"
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 3\n",
|
"iteration = 3\n",
|
||||||
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
"third_run, third_model = local_run.get_output(iteration = iteration)\n",
|
||||||
"print(third_run)\n",
|
"print(third_run)\n",
|
||||||
"print(third_model)"
|
"print(third_model)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Test"
|
"## Test"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Predict on training and test set, and calculate residual values."
|
"Predict on training and test set, and calculate residual values."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"y_pred_train = fitted_model.predict(X_train)\n",
|
"y_pred_train = fitted_model.predict(X_train)\n",
|
||||||
"y_residual_train = y_train - y_pred_train\n",
|
"y_residual_train = y_train - y_pred_train\n",
|
||||||
"\n",
|
"\n",
|
||||||
"y_pred_test = fitted_model.predict(X_test)\n",
|
"y_pred_test = fitted_model.predict(X_test)\n",
|
||||||
"y_residual_test = y_test - y_pred_test"
|
"y_residual_test = y_test - y_pred_test"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"%matplotlib inline\n",
|
"%matplotlib inline\n",
|
||||||
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
||||||
@@ -375,33 +399,9 @@
|
|||||||
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "savitam"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "savitam"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.6"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -30,10 +54,10 @@
 "1. [Train](#Train)\n",
 "1. [Results](#Results)\n",
 "1. [Test](#Test)"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -55,22 +79,22 @@
 "- **Cancellation** of individual iterations or the entire run\n",
 "- Retrieving models for any iteration or logged metric\n",
 "- Specifying AutoML settings as `**kwargs`"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "import os\n",
@@ -85,13 +109,13 @@
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -112,10 +136,10 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Create or Attach existing AmlCompute\n",
@@ -124,13 +148,13 @@
 "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
 "\n",
 "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.compute import AmlCompute\n",
 "from azureml.core.compute import ComputeTarget\n",
@@ -160,23 +184,23 @@
 " compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
 "\n",
 " # For a more detailed view of current AmlCompute status, use get_status()."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data\n",
 "For remote executions, you need to make the data accessible from the remote compute.\n",
 "This can be done by uploading the data to DataStore.\n",
 "In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "data_train = datasets.load_digits()\n",
 "\n",
@@ -198,13 +222,13 @@
 " path_on_compute='/tmp/azureml_runs',\n",
 " mode='download', # download files from datastore to compute target\n",
 " overwrite=False)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.runconfig import RunConfiguration\n",
 "from azureml.core.conda_dependencies import CondaDependencies\n",
@@ -222,13 +246,13 @@
 "\n",
 "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
 "conda_run_config.environment.python.conda_dependencies = cd"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile $project_folder/get_data.py\n",
 "\n",
@@ -239,10 +263,10 @@
 " y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
 "\n",
 " return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
@@ -258,13 +282,13 @@
 "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
 "|**n_cross_validations**|Number of cross validation splits.|\n",
 "|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_settings = {\n",
 " \"iteration_timeout_minutes\": 10,\n",
@@ -283,53 +307,53 @@
 " data_script = project_folder + \"/get_data.py\",\n",
 " **automl_settings\n",
 " )\n"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
 "In this example, we specify `show_output = False` to suppress console output while the run is in progress."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run = experiment.submit(automl_config, show_output = False)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results\n",
 "\n",
 "#### Loading executed runs\n",
 "In case you need to load a previously executed run, enable the cell below and replace the `run_id` value."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "raw",
 "metadata": {},
 "source": [
 "remote_run = AutoMLRun(experiment = experiment, run_id = 'AutoML_5db13491-c92a-4f1d-b622-8ab8d973a058')"
-]
+],
+"cell_type": "raw"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
@@ -339,51 +363,51 @@
 "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "remote_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(remote_run).show() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Wait until the run finishes.\n",
 "remote_run.wait_for_completion(show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "#### Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(remote_run.get_children())\n",
 "metricslist = {}\n",
@@ -394,123 +418,123 @@
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Cancelling Runs\n",
 "\n",
 "You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Cancel the ongoing experiment and stop scheduling new iterations.\n",
 "# remote_run.cancel()\n",
 "\n",
 "# Cancel iteration 1 and move onto iteration 2.\n",
 "# remote_run.cancel_iteration(1)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = remote_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric\n",
 "Show the run and the model which has the smallest `log_loss` value:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "lookup_metric = \"log_loss\"\n",
 "best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Model from a Specific Iteration\n",
 "Show the run and the model from the third iteration:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "iteration = 3\n",
 "third_run, third_model = remote_run.get_output(iteration=iteration)\n",
 "print(third_run)\n",
 "print(third_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test\n",
 "\n",
 "#### Load Test Data"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "digits = datasets.load_digits()\n",
 "X_test = digits.data[:10, :]\n",
 "y_test = digits.target[:10]\n",
 "images = digits.images[:10]"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Testing Our Best Fitted Model"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Randomly select digits and test.\n",
 "for index in np.random.choice(len(y_test), 2, replace = False):\n",
@@ -523,33 +547,9 @@
 " ax1.set_title(title)\n",
 " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
 " plt.show()"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "savitam"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.6"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "savitam"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.5"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -28,10 +52,10 @@
 "1. [Setup](#Setup)\n",
 "1. [Train](#Train)\n",
 "1. [Test](#Test)\n"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -40,22 +64,22 @@
 "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
 "\n",
 "In this notebook you will learn how to configure AutoML to use `sample_weight` and you will see the difference sample weight makes to the test results."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "\n",
@@ -68,13 +92,13 @@
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -98,22 +122,22 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
 "\n",
 "Instantiate two `AutoMLConfig` objects. One will be used with `sample_weight` and one without."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "digits = datasets.load_digits()\n",
 "X_train = digits.data[100:,:]\n",
@@ -145,63 +169,63 @@
 " y = y_train,\n",
 " sample_weight = sample_weight,\n",
 " path = project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment objects and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
 "In this example, we specify `show_output = True` to print currently running iterations to the console."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = experiment.submit(automl_classifier, show_output = True)\n",
 "sample_weight_run = sample_weight_experiment.submit(automl_sample_weight, show_output = True)\n",
 "\n",
 "best_run, fitted_model = local_run.get_output()\n",
 "best_run_sample_weight, fitted_model_sample_weight = sample_weight_run.get_output()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test\n",
 "\n",
 "#### Load Test Data"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "digits = datasets.load_digits()\n",
 "X_test = digits.data[:100, :]\n",
 "y_test = digits.target[:100]\n",
 "images = digits.images[:100]"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Compare the Models\n",
 "The prediction from the sample weight model is more likely to correctly predict 4's. However, it is also more likely to predict 4 for some images that are not labelled as 4."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Randomly select digits and test.\n",
 "for index in range(0,len(y_test)):\n",
@@ -215,33 +239,9 @@
 " ax1.set_title(title)\n",
 " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
 " plt.show()"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "savitam"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.5"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "savitam"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.6"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Automated Machine Learning\n",
@@ -30,10 +54,10 @@
 "1. [Train](#Train)\n",
 "1. [Results](#Results)\n",
 "1. [Test](#Test)\n"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Introduction\n",
@@ -50,22 +74,22 @@
 "In addition this notebook showcases the following features\n",
 "- Explicit train test splits \n",
 "- Handling **sparse data** in the input"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "\n",
@@ -75,13 +99,13 @@
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
@@ -103,20 +127,20 @@
 "pd.set_option('display.max_colwidth', -1)\n",
 "outputDf = pd.DataFrame(data = output, index = [''])\n",
 "outputDf.T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from sklearn.datasets import fetch_20newsgroups\n",
 "from sklearn.feature_extraction.text import HashingVectorizer\n",
@@ -145,10 +169,10 @@
 "summary_df['Train Set'] = [X_train.shape[0], X_train.shape[1]]\n",
 "summary_df['Validation Set'] = [X_valid.shape[0], X_valid.shape[1]]\n",
 "summary_df"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train\n",
@@ -167,13 +191,13 @@
 "|**X_valid**|(sparse) array-like, shape = [n_samples, n_features] for the custom validation set.|\n",
 "|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
 " debug_log = 'automl_errors.log',\n",
@@ -187,43 +211,43 @@
 " X_valid = X_valid, \n",
 " y_valid = y_valid, \n",
 " path = project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
 "In this example, we specify `show_output = True` to print currently running iterations to the console."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = experiment.submit(automl_config, show_output=True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Widget for Monitoring Runs\n",
@@ -231,32 +255,32 @@
 "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
 "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(local_run).show() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "#### Retrieve All Child Runs\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(local_run.get_children())\n",
 "metricslist = {}\n",
@@ -267,74 +291,74 @@
 " \n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = local_run.get_output()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric\n",
 "Show the run and the model which has the smallest `accuracy` value:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# lookup_metric = \"accuracy\"\n",
 "# best_run, fitted_model = local_run.get_output(metric = lookup_metric)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Model from a Specific Iteration\n",
 "Show the run and the model from the third iteration:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# iteration = 3\n",
 "# best_run, fitted_model = local_run.get_output(iteration = iteration)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Load test data.\n",
 "from pandas_ml import ConfusionMatrix\n",
@@ -355,33 +379,9 @@
 "cm = ConfusionMatrix(y_test_strings, y_pred_strings)\n",
 "print(cm)\n",
 "cm.plot()"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "savitam"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.6"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,17 +0,0 @@
--- This shows using the AutoMLPredict stored procedure to predict using a forecasting model for the nyc_energy dataset.
-
-DECLARE @Model NVARCHAR(MAX) = (SELECT TOP 1 Model FROM dbo.aml_model
-WHERE ExperimentName = 'automl-sql-forecast'
-ORDER BY CreatedDate DESC)
-
-EXEC dbo.AutoMLPredict @input_query='
-SELECT CAST(timeStamp AS NVARCHAR(30)) AS timeStamp,
-demand,
-precip,
-temp
-FROM nyc_energy
-WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL
-AND timeStamp >= ''2017-02-01''',
-@label_column='demand',
-@model=@model
-WITH RESULT SETS ((timeStamp NVARCHAR(30), actual_demand FLOAT, precip FLOAT, temp FLOAT, predicted_demand FLOAT))
@@ -1,50 +1,66 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "sql"
+},
+"authors": [
+{
+"name": "jeffshep"
+}
+],
+"language_info": {
+"name": "sql",
+"version": ""
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Train a model and use it for prediction\r\n",
 "\r\n",
 "Before running this notebook, run the auto-ml-sql-setup.ipynb notebook."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Set the default database"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "USE [automl]\r\n",
 "GO"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Use the AutoMLTrain stored procedure to create a forecasting model for the nyc_energy dataset."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "INSERT INTO dbo.aml_model(RunId, ExperimentName, Model, LogFileText, WorkspaceName)\r\n",
 "EXEC dbo.AutoMLTrain @input_query='\r\n",
@@ -64,20 +80,20 @@
 "@is_validate_column='is_validate_column',\r\n",
 "@experiment_name='automl-sql-forecast',\r\n",
 "@primary_metric='normalized_root_mean_squared_error'"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Use the AutoMLPredict stored procedure to predict using the forecasting model for the nyc_energy dataset."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "DECLARE @Model NVARCHAR(MAX) = (SELECT TOP 1 Model FROM dbo.aml_model\r\n",
 " WHERE ExperimentName = 'automl-sql-forecast'\r\n",
@@ -94,20 +110,20 @@
 "@label_column='demand',\r\n",
 "@model=@model\r\n",
 "WITH RESULT SETS ((timeStamp NVARCHAR(30), actual_demand FLOAT, precip FLOAT, temp FLOAT, predicted_demand FLOAT))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## List all the metrics for all iterations for the most recent training run."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"DECLARE @RunId NVARCHAR(43)\r\n",
|
"DECLARE @RunId NVARCHAR(43)\r\n",
|
||||||
"DECLARE @ExperimentName NVARCHAR(255)\r\n",
|
"DECLARE @ExperimentName NVARCHAR(255)\r\n",
|
||||||
@@ -117,25 +133,9 @@
|
|||||||
"ORDER BY CreatedDate DESC\r\n",
|
"ORDER BY CreatedDate DESC\r\n",
|
||||||
"\r\n",
|
"\r\n",
|
||||||
"EXEC dbo.AutoMLGetMetrics @RunId, @ExperimentName"
|
"EXEC dbo.AutoMLGetMetrics @RunId, @ExperimentName"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "jeffshep"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "sql",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"name": "sql",
|
|
||||||
"version": ""
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
@@ -1,7 +1,23 @@
 {
+    "metadata": {
+        "kernelspec": {
+            "display_name": "Python 3.6",
+            "name": "python36",
+            "language": "sql"
+        },
+        "authors": [
+            {
+                "name": "jeffshep"
+            }
+        ],
+        "language_info": {
+            "name": "sql",
+            "version": ""
+        }
+    },
+    "nbformat": 4,
     "cells": [
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "# Set up Azure ML Automated Machine Learning on SQL Server 2019 CTP 2.4 big data cluster\r\n",
@@ -57,43 +73,43 @@
                 "./python -m pip install --upgrade numpy \r\n",
                 "\r\n",
                 "./python -m pip install --upgrade sklearn\r\n"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 ""
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "-- Enable external scripts to allow invoking Python\r\n",
                 "sp_configure 'external scripts enabled',1 \r\n",
                 "reconfigure with override \r\n",
                 "GO\r\n"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "-- Use database 'automl'\r\n",
                 "USE [automl]\r\n",
                 "GO"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "-- This is a table to hold the Azure ML connection information.\r\n",
                 "SET ANSI_NULLS ON\r\n",
@@ -111,20 +127,20 @@
                 "\t[ConfigFile] [nvarchar](255) NULL\r\n",
                 ") ON [PRIMARY]\r\n",
                 "GO"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "# Copy the values from create-for-rbac above into the cell below"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "-- Use the following values:\r\n",
                 "-- Leave the name as 'Default'\r\n",
@@ -141,13 +157,13 @@
                 " N'/tmp/aml/config.json' -- Path\r\n",
                 " );\r\n",
                 "GO"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "-- This is a table to hold the results from the AutoMLTrain procedure.\r\n",
                 "SET ANSI_NULLS ON\r\n",
@@ -169,13 +185,13 @@
                 "\r\n",
                 "ALTER TABLE [dbo].[aml_model] ADD DEFAULT (getutcdate()) FOR [CreatedDate]\r\n",
                 "GO\r\n"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "-- This stored procedure uses automated machine learning to train several models\r\n",
                 "-- and return the best model.\r\n",
@@ -411,13 +427,13 @@
                 "\t, @config_file = @config_file\r\n",
                 "WITH RESULT SETS ((best_run NVARCHAR(250), experiment_name NVARCHAR(100), fitted_model VARCHAR(MAX), log_file_text NVARCHAR(MAX), workspace NVARCHAR(100)))\r\n",
                 "END"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "-- This procedure returns a list of metrics for each iteration of a training run.\r\n",
                 "SET ANSI_NULLS ON\r\n",
@@ -489,13 +505,13 @@
                 "\t, @config_file = @config_file\r\n",
                 "WITH RESULT SETS ((iteration INT, metric_name NVARCHAR(100), metric_value FLOAT))\r\n",
                 "END"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "-- This procedure predicts values based on a model returned by AutoMLTrain and a dataset.\r\n",
                 "-- It returns the dataset with a new column added, which is the predicted value.\r\n",
@@ -538,25 +554,9 @@
                 " , @model = @model \r\n",
                 "\t, @label_column = @label_column\r\n",
                 "END"
-            ]
+            ],
+            "cell_type": "code"
         }
     ],
-    "metadata": {
-        "authors": [
-            {
-                "name": "jeffshep"
-            }
-        ],
-        "kernelspec": {
-            "display_name": "Python 3.6",
-            "language": "sql",
-            "name": "python36"
-        },
-        "language_info": {
-            "name": "sql",
-            "version": ""
-        }
-    },
-    "nbformat": 4,
     "nbformat_minor": 2
 }
@@ -1,23 +1,47 @@
 {
+    "metadata": {
+        "kernelspec": {
+            "display_name": "Python 3.6",
+            "name": "python36",
+            "language": "python"
+        },
+        "authors": [
+            {
+                "name": "rogehe"
+            }
+        ],
+        "language_info": {
+            "mimetype": "text/x-python",
+            "codemirror_mode": {
+                "name": "ipython",
+                "version": 3
+            },
+            "pygments_lexer": "ipython3",
+            "name": "python",
+            "file_extension": ".py",
+            "nbconvert_exporter": "python",
+            "version": "3.6.6"
+        }
+    },
+    "nbformat": 4,
     "cells": [
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
                 "\n",
                 "Licensed under the MIT License."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 ""
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "# Automated Machine Learning\n",
@@ -29,10 +53,10 @@
                 "1. [Data](#Data)\n",
                 "1. [Train](#Train)\n",
                 "\n"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Introduction\n",
@@ -40,22 +64,22 @@
                 "In this example we will explore AutoML's subsampling feature. This is useful for training on large datasets to speed up the convergence.\n",
                 "\n",
                 "The setup is quiet similar to a normal classification, with the exception of the `enable_subsampling` option. Keep in mind that even with the `enable_subsampling` flag set, subsampling will only be run for large datasets (>= 50k rows) and large (>= 85) or no iteration restrictions.\n"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Setup\n",
                 "\n",
                 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "import logging\n",
                 "\n",
@@ -67,13 +91,13 @@
                 "from azureml.core.workspace import Workspace\n",
                 "from azureml.train.automl import AutoMLConfig\n",
                 "from azureml.train.automl.run import AutoMLRun"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "ws = Workspace.from_config()\n",
                 "\n",
@@ -93,22 +117,22 @@
                 "output['Experiment Name'] = experiment.name\n",
                 "pd.set_option('display.max_colwidth', -1)\n",
                 "pd.DataFrame(data = output, index = ['']).T"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Data\n",
                 "\n",
                 "We will create a simple dataset using the numpy sin function just for this example. We need just over 50k rows."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "base = np.arange(60000)\n",
                 "cos = np.cos(base)\n",
@@ -117,10 +141,10 @@
                 "# Exclude the first 100 rows from training so that they can be used for test.\n",
                 "X_train = np.hstack((base.reshape(-1, 1), cos.reshape(-1, 1)))\n",
                 "y_train = y"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Train\n",
@@ -133,13 +157,13 @@
                 "|**iterations**|Number of iterations. Subsampling requires a lot of iterations at smaller percent so in order for subsampling to be used we need to set iterations to be a high number.|\n",
                 "|**experiment_timeout_minutes**|The experiment timeout, it's set to 5 right now to shorten the demo but it should probably be higher if we want to finish all the iterations.|\n",
                 "\n"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "automl_config = AutoMLConfig(task = 'classification',\n",
                 "                             debug_log = 'automl_errors.log',\n",
@@ -152,57 +176,33 @@
                 "                             y = y_train,\n",
                 "                             enable_subsampling=True,\n",
                 "                             path = project_folder)"
-            ]
+            ],
+            "cell_type": "code"
        },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
                 "In this example, we specify `show_output = True` to print currently running iterations to the console."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "local_run = experiment.submit(automl_config, show_output = True)"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
-            "source": []
+            "execution_count": null,
+            "source": [],
+            "cell_type": "code"
         }
     ],
-    "metadata": {
-        "authors": [
-            {
-                "name": "rogehe"
-            }
-        ],
-        "kernelspec": {
-            "display_name": "Python 3.6",
-            "language": "python",
-            "name": "python36"
-        },
-        "language_info": {
-            "codemirror_mode": {
-                "name": "ipython",
-                "version": 3
-            },
-            "file_extension": ".py",
-            "mimetype": "text/x-python",
-            "name": "python",
-            "nbconvert_exporter": "python",
-            "pygments_lexer": "ipython3",
-            "version": "3.6.6"
-        }
-    },
-    "nbformat": 4,
     "nbformat_minor": 2
 }
@@ -1,7 +1,33 @@
 {
+    "metadata": {
+        "name": "build-model-run-history-03",
+        "kernelspec": {
+            "display_name": "Python 3.6",
+            "name": "python36",
+            "language": "python"
+        },
+        "authors": [
+            {
+                "name": "pasha"
+            }
+        ],
+        "language_info": {
+            "mimetype": "text/x-python",
+            "codemirror_mode": {
+                "name": "ipython",
+                "version": 3
+            },
+            "pygments_lexer": "ipython3",
+            "name": "python",
+            "file_extension": ".py",
+            "nbconvert_exporter": "python",
+            "version": "3.6.6"
+        },
+        "notebookId": 3836944406456339
+    },
+    "nbformat": 4,
     "cells": [
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "Azure ML & Azure Databricks notebooks by Parashar Shah.\n",
@@ -9,27 +35,27 @@
                 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
                 "\n",
                 "Licensed under the MIT License."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 ""
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "#Model Building"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "import os\n",
                 "import pprint\n",
@@ -40,37 +66,37 @@
                 "from pyspark.ml.classification import LogisticRegression\n",
                 "from pyspark.ml.evaluation import BinaryClassificationEvaluator\n",
                 "from pyspark.ml.tuning import CrossValidator, ParamGridBuilder"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "import azureml.core\n",
                 "\n",
                 "# Check core SDK version number\n",
                 "print(\"SDK version:\", azureml.core.VERSION)"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "# Set auth to be used by workspace related APIs.\n",
                 "# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
                 "# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
                 "auth = None"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "# import the Workspace class and check the azureml SDK version\n",
                 "from azureml.core import Workspace\n",
@@ -80,13 +106,13 @@
                 "      'Azure region: ' + ws.location, \n",
                 "      'Subscription id: ' + ws.subscription_id, \n",
                 "      'Resource group: ' + ws.resource_group, sep = '\\n')"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "#get the train and test datasets\n",
                 "train_data_path = \"AdultCensusIncomeTrain\"\n",
@@ -99,20 +125,20 @@
                 "print(\"test: ({}, {})\".format(test.count(), len(test.columns)))\n",
                 "\n",
                 "train.printSchema()"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "#Define Model"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "label = \"income\"\n",
                 "dtypes = dict(train.dtypes)\n",
@@ -142,13 +168,13 @@
                 "\n",
                 "# assemble the encoded feature columns in to a column named \"features\"\n",
                 "assembler = VectorAssembler(inputCols=featureCols, outputCol=\"features\")"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "from azureml.core.run import Run\n",
                 "from azureml.core.experiment import Experiment\n",
@@ -216,24 +242,24 @@
                 "root_run.complete()\n",
                 "root_run_id = root_run.id\n",
                 "print (\"run id:\", root_run.id)"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "metrics = root_run.get_metrics(recursive=True)\n",
                 "best_run_id = max(metrics, key = lambda k: metrics[k]['au_roc'])\n",
                 "print(best_run_id, metrics[best_run_id]['au_roc'], metrics[best_run_id]['reg'])"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "#Get the best run\n",
                 "child_runs = {}\n",
@@ -242,31 +268,31 @@
                 "    child_runs[r.id] = r\n",
                 "    \n",
                 "best_run = child_runs[best_run_id]"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "#Download the model from the best run to a local folder\n",
                 "best_model_file_name = \"best_model.zip\"\n",
                 "best_run.download_file(name = 'outputs/' + model_name, output_file_path = best_model_file_name)"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "#Model Evaluation"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "##unzip the model to dbfs (as load() seems to require that) and load it.\n",
                 "if os.path.isfile(model_dbfs) or os.path.isdir(model_dbfs):\n",
@@ -274,25 +300,25 @@
                 "shutil.unpack_archive(best_model_file_name, model_dbfs)\n",
                 "\n",
                 "model_p_best = PipelineModel.load(model_name)"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "# make prediction\n",
                 "pred = model_p_best.transform(test)\n",
                 "output = pred[['hours_per_week','age','workclass','marital_status','income','prediction']]\n",
                 "display(output.limit(5))"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "# evaluate. note only 2 metrics are supported out of the box by Spark ML.\n",
|
||||||
"bce = BinaryClassificationEvaluator(rawPredictionCol='rawPrediction')\n",
|
"bce = BinaryClassificationEvaluator(rawPredictionCol='rawPrediction')\n",
|
||||||
@@ -301,80 +327,54 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"print(\"Area under ROC: {}\".format(au_roc))\n",
|
"print(\"Area under ROC: {}\".format(au_roc))\n",
|
||||||
"print(\"Area Under PR: {}\".format(au_prc))"
|
"print(\"Area Under PR: {}\".format(au_prc))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#Model Persistence"
|
"#Model Persistence"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"##NOTE: by default the model is saved to and loaded from /dbfs/ instead of cwd!\n",
|
"##NOTE: by default the model is saved to and loaded from /dbfs/ instead of cwd!\n",
|
||||||
"model_p_best.write().overwrite().save(model_name)\n",
|
"model_p_best.write().overwrite().save(model_name)\n",
|
||||||
"print(\"saved model to {}\".format(model_dbfs))"
|
"print(\"saved model to {}\".format(model_dbfs))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"%sh\n",
|
"%sh\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ls -la /dbfs/AdultCensus_runHistory.mml/*"
|
"ls -la /dbfs/AdultCensus_runHistory.mml/*"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"dbutils.notebook.exit(\"success\")"
|
"dbutils.notebook.exit(\"success\")"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "pasha"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
},
|
|
||||||
"name": "build-model-run-history-03",
|
|
||||||
"notebookId": 3836944406456339
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 1
|
"nbformat_minor": 1
|
||||||
}
|
}
|
||||||
@@ -1,7 +1,33 @@
 {
+"metadata": {
+"name": "deploy-to-aci-04",
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "pasha"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.6"
+},
+"notebookId": 3836944406456376
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Azure ML & Azure Databricks notebooks by Parashar Shah.\n",
@@ -9,53 +35,53 @@
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Please ensure you have run all previous notebooks in sequence before running this.\n",
 "\n",
 "Please Register Azure Container Instance(ACI) using Azure Portal: https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-supported-services#portal in your subscription before using the SDK to deploy your ML model to ACI."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import azureml.core\n",
 "\n",
 "# Check core SDK version number\n",
 "print(\"SDK version:\", azureml.core.VERSION)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Set auth to be used by workspace related APIs.\n",
 "# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
 "# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
 "auth = None"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core import Workspace\n",
 "\n",
@@ -64,13 +90,13 @@
 " 'Azure region: ' + ws.location, \n",
 " 'Subscription id: ' + ws.subscription_id, \n",
 " 'Resource group: ' + ws.resource_group, sep = '\\n')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "##NOTE: service deployment always gets the model from the current working dir.\n",
 "import os\n",
@@ -81,13 +107,13 @@
 "print(\"copy model from dbfs to local\")\n",
 "model_local = \"file:\" + os.getcwd() + \"/\" + model_name\n",
 "dbutils.fs.cp(model_name, model_local, True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#Register the model\n",
 "from azureml.core.model import Model\n",
@@ -97,13 +123,13 @@
 " workspace = ws)\n",
 "\n",
 "print(mymodel.name, mymodel.description, mymodel.version)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#%%writefile score_sparkml.py\n",
 "score_sparkml = \"\"\"\n",
@@ -154,13 +180,13 @@
 " \n",
 "with open(\"score_sparkml.py\", \"w\") as file:\n",
 " file.write(score_sparkml)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies \n",
 "\n",
@@ -168,13 +194,13 @@
 "\n",
 "with open(\"mydeployenv.yml\",\"w\") as f:\n",
 " f.write(myacienv.serialize_to_string())"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#deploy to ACI\n",
 "from azureml.core.webservice import AciWebservice, Webservice\n",
@@ -184,13 +210,13 @@
 " memory_gb = 2, \n",
 " tags = {'name':'Databricks Azure ML ACI'}, \n",
 " description = 'This is for ADB and AML example. Azure Databricks & Azure ML SDK demo with ACI by Parashar.')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# this will take 10-15 minutes to finish\n",
 "\n",
@@ -215,44 +241,44 @@
 " )\n",
 "\n",
 "myservice.wait_for_deployment(show_output=True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "help(Webservice)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# List images by ws\n",
 "\n",
 "for i in ContainerImage.list(workspace = ws):\n",
 " print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#for using the Web HTTP API \n",
 "print(myservice.scoring_uri)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import json\n",
 "\n",
@@ -263,62 +289,36 @@
 "test_json = json.dumps(test.toJSON().collect())\n",
 "\n",
 "print(test_json)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#using data defined above predict if income is >50K (1) or <=50K (0)\n",
 "myservice.run(input_data=test_json)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#comment to not delete the web service\n",
 "myservice.delete()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "pasha"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.6"
-},
-"name": "deploy-to-aci-04",
-"notebookId": 3836944406456376
-},
-"nbformat": 4,
 "nbformat_minor": 1
 }
@@ -1,7 +1,33 @@
 {
+"metadata": {
+"name": "deploy-to-aks-existingimage-05",
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "pasha"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.6"
+},
+"notebookId": 1030695628045968
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Azure ML & Azure Databricks notebooks by Parashar Shah.\n",
@@ -9,51 +35,51 @@
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "This notebook uses image from ACI notebook for deploying to AKS."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import azureml.core\n",
 "\n",
 "# Check core SDK version number\n",
 "print(\"SDK version:\", azureml.core.VERSION)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Set auth to be used by workspace related APIs.\n",
 "# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
 "# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
 "auth = None"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core import Workspace\n",
 "\n",
@@ -62,36 +88,36 @@
 " 'Azure region: ' + ws.location, \n",
 " 'Subscription id: ' + ws.subscription_id, \n",
 " 'Resource group: ' + ws.resource_group, sep = '\\n')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# List images by ws\n",
 "\n",
 "from azureml.core.image import ContainerImage\n",
 "for i in ContainerImage.list(workspace = ws):\n",
 " print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.image import Image\n",
 "myimage = Image(workspace=ws, name=\"aciws\")"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#create AKS compute\n",
 "#it may take 20-25 minutes to create a new cluster\n",
@@ -112,23 +138,23 @@
 "\n",
 "print(aks_target.provisioning_state)\n",
 "print(aks_target.provisioning_errors)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import Webservice\n",
 "help( Webservice.deploy_from_image)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import Webservice, AksWebservice\n",
 "from azureml.core.image import ContainerImage\n",
@@ -149,33 +175,33 @@
 " )\n",
 "\n",
 "aks_service.wait_for_deployment(show_output=True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "aks_service.deployment_status"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#for using the Web HTTP API \n",
 "print(aks_service.scoring_uri)\n",
 "print(aks_service.get_keys())"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import json\n",
 "\n",
@@ -186,65 +212,39 @@
 "test_json = json.dumps(test.toJSON().collect())\n",
 "\n",
 "print(test_json)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#using data defined above predict if income is >50K (1) or <=50K (0)\n",
 "aks_service.run(input_data=test_json)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#comment to not delete the web service\n",
 "aks_service.delete()\n",
 "#image.delete()\n",
 "#model.delete()\n",
 "aks_target.delete() "
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "pasha"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.6"
-},
-"name": "deploy-to-aks-existingimage-05",
-"notebookId": 1030695628045968
-},
-"nbformat": 4,
 "nbformat_minor": 1
 }
@@ -1,7 +1,33 @@
 {
+  "metadata": {
+    "name": "ingest-data-02",
+    "kernelspec": {
+      "display_name": "Python 3.6",
+      "name": "python36",
+      "language": "python"
+    },
+    "authors": [
+      {
+        "name": "pasha"
+      }
+    ],
+    "language_info": {
+      "mimetype": "text/x-python",
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "pygments_lexer": "ipython3",
+      "name": "python",
+      "file_extension": ".py",
+      "nbconvert_exporter": "python",
+      "version": "3.6.6"
+    },
+    "notebookId": 3836944406456362
+  },
+  "nbformat": 4,
   "cells": [
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "Azure ML & Azure Databricks notebooks by Parashar Shah.\n",
@@ -9,37 +35,37 @@
         "Copyright (c) Microsoft Corporation. All rights reserved.\n",
         "\n",
         "Licensed under the MIT License."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         ""
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "#Data Ingestion"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "import os\n",
         "import urllib"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "# Download AdultCensusIncome.csv from Azure CDN. This file has 32,561 rows.\n",
         "dataurl = \"https://amldockerdatasets.azureedge.net/AdultCensusIncome.csv\"\n",
@@ -51,53 +77,53 @@
         "else:\n",
         " print(\"downloading {} to {}\".format(datafile, datafile_dbfs))\n",
         " urllib.request.urlretrieve(dataurl, datafile_dbfs)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "# Create a Spark dataframe out of the csv file.\n",
         "data_all = sqlContext.read.format('csv').options(header='true', inferSchema='true', ignoreLeadingWhiteSpace='true', ignoreTrailingWhiteSpace='true').load(datafile)\n",
         "print(\"({}, {})\".format(data_all.count(), len(data_all.columns)))\n",
         "data_all.printSchema()"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "#renaming columns\n",
         "columns_new = [col.replace(\"-\", \"_\") for col in data_all.columns]\n",
         "data_all = data_all.toDF(*columns_new)\n",
         "data_all.printSchema()"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "display(data_all.limit(5))"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "#Data Preparation"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "# Choose feature columns and the label column.\n",
         "label = \"income\"\n",
@@ -113,20 +139,20 @@
         "\n",
         "print(\"train ({}, {})\".format(train.count(), len(train.columns)))\n",
         "print(\"test ({}, {})\".format(test.count(), len(test.columns)))"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "#Data Persistence"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "# Write the train and test data sets to intermediate storage\n",
         "train_data_path = \"AdultCensusIncomeTrain\"\n",
@@ -138,49 +164,23 @@
         "train.write.mode('overwrite').parquet(train_data_path)\n",
         "test.write.mode('overwrite').parquet(test_data_path)\n",
         "print(\"train and test datasets saved to {} and {}\".format(train_data_path_dbfs, test_data_path_dbfs))"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
-      "source": []
+      "execution_count": null,
+      "source": [],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         ""
-      ]
+      ],
+      "cell_type": "markdown"
     }
   ],
-  "metadata": {
-    "authors": [
-      {
-        "name": "pasha"
-      }
-    ],
-    "kernelspec": {
-      "display_name": "Python 3.6",
-      "language": "python",
-      "name": "python36"
-    },
-    "language_info": {
-      "codemirror_mode": {
-        "name": "ipython",
-        "version": 3
-      },
-      "file_extension": ".py",
-      "mimetype": "text/x-python",
-      "name": "python",
-      "nbconvert_exporter": "python",
-      "pygments_lexer": "ipython3",
-      "version": "3.6.6"
-    },
-    "name": "ingest-data-02",
-    "notebookId": 3836944406456362
-  },
-  "nbformat": 4,
   "nbformat_minor": 1
 }
@@ -1,7 +1,33 @@
 {
+  "metadata": {
+    "name": "installation-and-configuration-01",
+    "kernelspec": {
+      "display_name": "Python 3.6",
+      "name": "python36",
+      "language": "python"
+    },
+    "authors": [
+      {
+        "name": "pasha"
+      }
+    ],
+    "language_info": {
+      "mimetype": "text/x-python",
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "pygments_lexer": "ipython3",
+      "name": "python",
+      "file_extension": ".py",
+      "nbconvert_exporter": "python",
+      "version": "3.6.6"
+    },
+    "notebookId": 3688394266452835
+  },
+  "nbformat": 4,
   "cells": [
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "Azure ML & Azure Databricks notebooks by Parashar Shah.\n",
@@ -9,17 +35,17 @@
         "Copyright (c) Microsoft Corporation. All rights reserved.\n",
         "\n",
         "Licensed under the MIT License."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         ""
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "We support installing AML SDK as library from GUI. When attaching a library follow this https://docs.databricks.com/user-guide/libraries.html and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
@@ -28,22 +54,22 @@
         "* Source: Upload Python Egg or PyPi\n",
         "* PyPi Name: `azureml-sdk[databricks]`\n",
         "* Select Install Library"
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "import azureml.core\n",
         "\n",
         "# Check core SDK version number - based on build number of preview/master.\n",
         "print(\"SDK version:\", azureml.core.VERSION)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         "Please specify the Azure subscription Id, resource group name, workspace name, and the region in which you want to create the Azure Machine Learning Workspace.\n",
@@ -53,37 +79,37 @@
         "For the resource_group, use the name of the resource group that contains your Azure Databricks Workspace.\n",
         "\n",
         "NOTE: If you provide a resource group name that does not exist, the resource group will be automatically created. This may or may not succeed in your environment, depending on the permissions you have on your Azure Subscription."
-      ]
+      ],
+      "cell_type": "markdown"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "# subscription_id = \"<your-subscription-id>\"\n",
         "# resource_group = \"<your-existing-resource-group>\"\n",
         "# workspace_name = \"<a-new-or-existing-workspace; it is unrelated to Databricks workspace>\"\n",
         "# workspace_region = \"<your-resource group-region>\""
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "# Set auth to be used by workspace related APIs.\n",
         "# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
         "# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
         "auth = None"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "# import the Workspace class and check the azureml SDK version\n",
         "# exist_ok checks if workspace exists or not.\n",
@@ -96,23 +122,23 @@
         " location = workspace_region,\n",
         " auth = auth,\n",
         " exist_ok=True)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "#get workspace details\n",
         "ws.get_details()"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "ws = Workspace(workspace_name = workspace_name,\n",
         " subscription_id = subscription_id,\n",
@@ -123,22 +149,22 @@
         "ws.write_config()\n",
         "#if you need to give a different path/filename please use this\n",
         "#write_config(path=\"/databricks/driver/aml_config/\",file_name=<alias_conf.cfg>)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "help(Workspace)"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
       "metadata": {},
       "outputs": [],
+      "execution_count": null,
       "source": [
         "# import the Workspace class and check the azureml SDK version\n",
         "from azureml.core import Workspace\n",
@@ -149,42 +175,16 @@
         " 'Azure region: ' + ws.location, \n",
         " 'Subscription id: ' + ws.subscription_id, \n",
         " 'Resource group: ' + ws.resource_group, sep = '\\n')"
-      ]
+      ],
+      "cell_type": "code"
     },
     {
-      "cell_type": "markdown",
       "metadata": {},
       "source": [
         ""
-      ]
+      ],
+      "cell_type": "markdown"
     }
   ],
-  "metadata": {
-    "authors": [
-      {
-        "name": "pasha"
-      }
-    ],
-    "kernelspec": {
-      "display_name": "Python 3.6",
-      "language": "python",
-      "name": "python36"
-    },
-    "language_info": {
-      "codemirror_mode": {
-        "name": "ipython",
-        "version": 3
-      },
-      "file_extension": ".py",
-      "mimetype": "text/x-python",
-      "name": "python",
-      "nbconvert_exporter": "python",
-      "pygments_lexer": "ipython3",
-      "version": "3.6.6"
-    },
-    "name": "installation-and-configuration-01",
-    "notebookId": 3688394266452835
-  },
-  "nbformat": 4,
   "nbformat_minor": 1
 }
@@ -1,16 +1,45 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"name": "auto-ml-classification-local-adb",
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "sasum"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.6.5"
|
||||||
|
},
|
||||||
|
"notebookId": 587284549713154
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated ML on Azure Databricks\n",
|
"# Automated ML on Azure Databricks\n",
|
||||||
@@ -27,10 +56,10 @@
|
|||||||
"7. Test the best fitted model.\n",
|
"7. Test the best fitted model.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Before running this notebook, please follow the <a href=\"https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks\" target=\"_blank\">readme for using Automated ML on Azure Databricks</a> for installing necessary libraries to your cluster."
|
"Before running this notebook, please follow the <a href=\"https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks\" target=\"_blank\">readme for using Automated ML on Azure Databricks</a> for installing necessary libraries to your cluster."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We support installing AML SDK with Automated ML as library from GUI. When attaching a library follow <a href=\"https://docs.databricks.com/user-guide/libraries.html\" target=\"_blank\">this link</a> and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
|
"We support installing AML SDK with Automated ML as library from GUI. When attaching a library follow <a href=\"https://docs.databricks.com/user-guide/libraries.html\" target=\"_blank\">this link</a> and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
|
||||||
@@ -39,28 +68,28 @@
|
|||||||
"* Source: Upload Python Egg or PyPi\n",
|
"* Source: Upload Python Egg or PyPi\n",
|
||||||
"* PyPi Name: `azureml-sdk[automl_databricks]`\n",
|
"* PyPi Name: `azureml-sdk[automl_databricks]`\n",
|
||||||
"* Select Install Library"
|
"* Select Install Library"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Check the Azure ML Core SDK Version to Validate Your Installation"
|
"### Check the Azure ML Core SDK Version to Validate Your Installation"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"SDK Version:\", azureml.core.VERSION)"
|
"print(\"SDK Version:\", azureml.core.VERSION)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Initialize an Azure ML Workspace\n",
|
"## Initialize an Azure ML Workspace\n",
|
||||||
@@ -76,22 +105,22 @@
|
|||||||
"* Your subscription id. Use the `id` value from the `az account show` command output above.\n",
|
"* Your subscription id. Use the `id` value from the `az account show` command output above.\n",
|
||||||
"* The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group. The resource group will be created if it doesn't exist. Resource groups can be created and viewed in the [Azure portal](https://portal.azure.com)\n",
|
"* The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group. The resource group will be created if it doesn't exist. Resource groups can be created and viewed in the [Azure portal](https://portal.azure.com)\n",
|
||||||
"* Supported regions include `eastus2`, `eastus`,`westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`."
|
"* Supported regions include `eastus2`, `eastus`,`westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
|
"subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
|
||||||
"resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
|
"resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
|
||||||
"workspace_name = \"<workspace to be created>\" #your workspace name\n",
|
"workspace_name = \"<workspace to be created>\" #your workspace name\n",
|
||||||
"workspace_region = \"<azureregion>\" #your region"
|
"workspace_region = \"<azureregion>\" #your region"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Creating a Workspace\n",
|
"## Creating a Workspace\n",
|
||||||
@@ -105,13 +134,13 @@
|
|||||||
"If workspace creation fails for any reason other than already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n",
|
"If workspace creation fails for any reason other than already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** Creation of a new workspace can take several minutes."
|
"**Note:** Creation of a new workspace can take several minutes."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Import the Workspace class and check the Azure ML SDK version.\n",
|
"# Import the Workspace class and check the Azure ML SDK version.\n",
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
@@ -122,21 +151,21 @@
|
|||||||
" location = workspace_region, \n",
|
" location = workspace_region, \n",
|
||||||
" exist_ok=True)\n",
|
" exist_ok=True)\n",
|
||||||
"ws.get_details()"
|
"ws.get_details()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Configuring Your Local Environment\n",
|
"## Configuring Your Local Environment\n",
|
||||||
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
|
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -146,21 +175,21 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
||||||
"ws.write_config()"
|
"ws.write_config()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create a Folder to Host Sample Projects\n",
|
"## Create a Folder to Host Sample Projects\n",
|
||||||
"Finally, create a folder where all the sample projects will be hosted."
|
"Finally, create a folder where all the sample projects will be hosted."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -170,22 +199,22 @@
|
|||||||
" os.mkdir(sample_projects_folder)\n",
|
" os.mkdir(sample_projects_folder)\n",
|
||||||
" \n",
|
" \n",
|
||||||
"print('Sample projects will be created in {}.'.format(sample_projects_folder))"
|
"print('Sample projects will be created in {}.'.format(sample_projects_folder))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create an Experiment\n",
|
"## Create an Experiment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"import os\n",
|
"import os\n",
|
||||||
@@ -202,13 +231,13 @@
|
|||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
"from azureml.train.automl.run import AutoMLRun"
|
"from azureml.train.automl.run import AutoMLRun"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Choose a name for the experiment and specify the project folder.\n",
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
"experiment_name = 'automl-local-classification'\n",
|
"experiment_name = 'automl-local-classification'\n",
|
||||||
@@ -226,48 +255,48 @@
 "output['Experiment Name'] = experiment.name\n",
 "pd.set_option('display.max_colwidth', -1)\n",
 "pd.DataFrame(data = output, index = ['']).T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Diagnostics\n",
 "\n",
 "Opt-in diagnostics for better experience, quality, and security of future releases."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.telemetry import set_diagnostics_collection\n",
 "set_diagnostics_collection(send_diagnostics = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Registering Datastore"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "A Datastore saves connection information for a storage service (e.g. Azure Blob, Azure Data Lake, Azure SQL) to your workspace so you can access the data without exposing credentials in your code. The first thing you will need to do is register a datastore; you can refer to our [python SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) on how to register datastores. __Note: for best security practices, please do not check code that registers datastores with secrets into your source control__\n",
 "\n",
 "The code below registers a datastore pointing to a publicly readable blob container."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core import Datastore\n",
 "\n",
@@ -281,10 +310,10 @@
 " account_name = account_name,\n",
 " overwrite = True\n",
 ")"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Below is an example of how to register a private blob container\n",
@@ -308,17 +337,17 @@
 " client_secret = 'client-secret-of-service-principal'\n",
 ")\n",
 "```"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Load Training Data Using DataPrep"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Automated ML takes a Dataflow as input.\n",
@@ -333,13 +362,13 @@
 "If you just need to ingest data without doing any preparation, you can directly use AzureML Data Prep (Data Prep) to do so. The code below demonstrates this scenario. Data Prep also has data preparation capabilities; we have many [sample notebooks](https://github.com/Microsoft/AMLDataPrepDocs) demonstrating the capabilities.\n",
 "\n",
 "You will get the datastore you registered previously and pass it to Data Prep for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import azureml.dataprep as dprep\n",
 "from azureml.data.datapath import DataPath\n",
@@ -348,36 +377,36 @@
 "\n",
 "X_train = dprep.read_csv(datastore.path('X.csv'))\n",
 "y_train = dprep.read_csv(datastore.path('y.csv')).to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Review the Data Preparation Result\n",
 "You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only j records for all the steps in the Dataflow, which makes it fast even against large datasets."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "X_train.get_profile()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "y_train.get_profile()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Configure AutoML\n",
@@ -399,13 +428,13 @@
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
 "|**preprocess**|Set this to True to enable pre-processing of data, e.g. string to numeric using one-hot encoding.|\n",
 "|**exit_score**|Target score for the experiment. It is associated with the metric, e.g. exit_score=0.995 will exit the experiment once that score is reached.|"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
 " debug_log = 'automl_errors.log',\n",
@@ -420,91 +449,91 @@
 " X = X_train, \n",
 " y = y_train,\n",
 " path = project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train the Models\n",
 "\n",
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Continue experiment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run.continue_experiment(iterations=2,\n",
 " X=X_train, \n",
 " y=y_train,\n",
 " spark_context=sc,\n",
 " show_output=True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Explore the Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Portal URL for Monitoring Runs\n",
 "\n",
 "The following will provide a link to the web interface to explore individual run details and status. In the future we might support output displayed in the notebook."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "displayHTML(\"<a href={} target='_blank'>Your experiment in Azure Portal: {}</a>\".format(local_run.get_portal_url(), local_run.id))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "The following will show the child runs and wait for the parent run to complete."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Retrieve All Child Runs after the experiment is completed (in portal)\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(local_run.get_children())\n",
 "metricslist = {}\n",
@@ -515,67 +544,67 @@
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model after the above run is complete \n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = local_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric after the above run is complete based on the child run\n",
 "Show the run and the model that has the smallest `log_loss` value:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "lookup_metric = \"log_loss\"\n",
 "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### View the engineered names for featurized data\n",
 "Below we display the engineered feature names generated for the featurized data using the preprocessing featurization."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "fitted_model.named_steps['datatransformer'].get_engineered_feature_names()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### View the featurization summary\n",
@@ -585,52 +614,52 @@
 "- Type detected\n",
 "- If feature was dropped\n",
 "- List of feature transformations for the raw feature"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "fitted_model.named_steps['datatransformer'].get_featurization_summary()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Test the Best Fitted Model\n",
 "\n",
 "#### Load Test Data - you can split the dataset beforehand & pass the Train dataset to AutoML and use the Test dataset to evaluate the best model."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "blob_location = \"https://{}.blob.core.windows.net/{}\".format(account_name, container_name)\n",
 "X_test = pd.read_csv(\"{}/X_valid.csv\".format(blob_location), header=0)\n",
 "y_test = pd.read_csv(\"{}/y_valid.csv\".format(blob_location), header=0)\n",
 "images = pd.read_csv(\"{}/images.csv\".format(blob_location), header=None)\n",
 "images = np.reshape(images.values, (100,8,8))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Testing Our Best Fitted Model\n",
 "We will try to predict digits and see how our model works. This is just an example to show you."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Randomly select digits and test.\n",
 "for index in np.random.choice(len(y_test), 2, replace = False):\n",
@@ -643,61 +672,32 @@
 " ax1.set_title(title)\n",
 " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
 " display(fig)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "When deploying an automated ML trained model, please specify _pip_packages=['azureml-sdk[automl]']_ in your CondaDependencies.\n",
 "\n",
 "Please refer to only the **Deploy** section in this notebook - <a href=\"https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment\" target=\"_blank\">Deployment of Automated ML trained model</a>"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
-"source": []
+"execution_count": null,
+"source": [],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "savitam"
-},
-{
-"name": "sasum"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.5"
-},
-"name": "auto-ml-classification-local-adb",
-"notebookId": 587284549713154
-},
-"nbformat": 4,
 "nbformat_minor": 1
 }
@@ -1,16 +1,45 @@
 {
+"metadata": {
+"name": "auto-ml-classification-local-adb",
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "savitam"
+},
+{
+"name": "sasum"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.5"
+},
+"notebookId": 2733885892129020
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "We support installing the AML SDK as a library from the GUI. When attaching a library, follow https://docs.databricks.com/user-guide/libraries.html and add the string below as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
@@ -19,10 +48,10 @@
 "* Source: Upload Python Egg or PyPi\n",
 "* PyPi Name: `azureml-sdk[automl_databricks]`\n",
 "* Select Install Library"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# AutoML : Classification with Local Compute on Azure DataBricks with deployment to ACI\n",
@@ -41,10 +70,10 @@
 "\n",
 "Prerequisites:\n",
 "Before running this notebook, please follow the readme for installing necessary libraries to your cluster."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Register Machine Learning Services Resource Provider\n",
@@ -54,28 +83,28 @@
 "Select the subscription that you want to use.\n",
 "Click on Resource providers\n",
 "Click the Register link next to Microsoft.MachineLearningServices"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Check the Azure ML Core SDK Version to Validate Your Installation"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import azureml.core\n",
 "\n",
 "print(\"SDK Version:\", azureml.core.VERSION)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Initialize an Azure ML Workspace\n",
@@ -91,22 +120,22 @@
 "* Your subscription id. Use the `id` value from the `az account show` command output above.\n",
 "* The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group. The resource group will be created if it doesn't exist. Resource groups can be created and viewed in the [Azure portal](https://portal.azure.com)\n",
 "* Supported regions include `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
 "resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
 "workspace_name = \"<workspace to be created>\" #your workspace name\n",
 "workspace_region = \"<azureregion>\" #your region"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Creating a Workspace\n",
@@ -120,13 +149,13 @@
|
|||||||
"If workspace creation fails for any reason other than already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n",
|
"If workspace creation fails for any reason other than already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note:** Creation of a new workspace can take several minutes."
|
"**Note:** Creation of a new workspace can take several minutes."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Import the Workspace class and check the Azure ML SDK version.\n",
|
"# Import the Workspace class and check the Azure ML SDK version.\n",
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
@@ -137,21 +166,21 @@
|
|||||||
" location = workspace_region, \n",
|
" location = workspace_region, \n",
|
||||||
" exist_ok=True)\n",
|
" exist_ok=True)\n",
|
||||||
"ws.get_details()"
|
"ws.get_details()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Configuring Your Local Environment\n",
|
"## Configuring Your Local Environment\n",
|
||||||
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
|
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -162,21 +191,21 @@
|
|||||||
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
||||||
"ws.write_config()\n",
|
"ws.write_config()\n",
|
||||||
 "ws.write_config(path=\"/databricks/driver/aml_config/\", file_name=\"<alias_conf.cfg>\")"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Create a Folder to Host Sample Projects\n",
 "Finally, create a folder where all the sample projects will be hosted."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import os\n",
 "\n",
@@ -186,22 +215,22 @@
 " os.mkdir(sample_projects_folder)\n",
 " \n",
 "print('Sample projects will be created in {}.'.format(sample_projects_folder))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Create an Experiment\n",
 "\n",
 "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import logging\n",
 "import os\n",
@@ -219,13 +248,13 @@
 "from azureml.core.workspace import Workspace\n",
 "from azureml.train.automl import AutoMLConfig\n",
 "from azureml.train.automl.run import AutoMLRun"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Choose a name for the experiment and specify the project folder.\n",
 "experiment_name = 'automl-local-classification'\n",
@@ -243,48 +272,48 @@
 "output['Experiment Name'] = experiment.name\n",
 "pd.set_option('display.max_colwidth', -1)\n",
 "pd.DataFrame(data = output, index = ['']).T"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Diagnostics\n",
 "\n",
 "Opt-in diagnostics for better experience, quality, and security of future releases."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.telemetry import set_diagnostics_collection\n",
 "set_diagnostics_collection(send_diagnostics = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Registering Datastore"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "A Datastore saves connection information to a storage service (e.g. Azure Blob, Azure Data Lake, Azure SQL) in your workspace so you can access the service without exposing credentials in your code. The first thing you will need to do is register a datastore; you can refer to our [python SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) on how to register datastores. __Note: as a security best practice, do not check code that registers datastores with secrets into your source control__\n",
 "\n",
 "The code below registers a datastore pointing to a publicly readable blob container."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core import Datastore\n",
 "\n",
@@ -298,10 +327,10 @@
 " account_name = account_name,\n",
 " overwrite = True\n",
 ")"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Below is an example on how to register a private blob container\n",
@@ -325,17 +354,17 @@
 " client_secret = 'client-secret-of-service-principal'\n",
 ")\n",
 "```"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Load Training Data Using DataPrep"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Automated ML takes a Dataflow as input.\n",
@@ -350,13 +379,13 @@
 "If you just need to ingest data without doing any preparation, you can directly use AzureML Data Prep (Data Prep) to do so. The code below demonstrates this scenario. Data Prep also has data preparation capabilities, we have many [sample notebooks](https://github.com/Microsoft/AMLDataPrepDocs) demonstrating the capabilities.\n",
 "\n",
 "You will get the datastore you registered previously and pass it to Data Prep for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import azureml.dataprep as dprep\n",
 "from azureml.data.datapath import DataPath\n",
@@ -365,36 +394,36 @@
 "\n",
 "X_train = dprep.read_csv(datastore.path('X.csv'))\n",
 "y_train = dprep.read_csv(datastore.path('y.csv')).to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Review the Data Preparation Result\n",
 "You can peek the result of a Dataflow at any range using skip(i) and head(j). Doing so evaluates only j records for all the steps in the Dataflow, which makes it fast even against large datasets."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "X_train.get_profile()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "y_train.get_profile()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Configure AutoML\n",
@@ -416,13 +445,13 @@
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
 "|**preprocess**|Set this to True to enable pre-processing of data, e.g. string-to-numeric conversion using one-hot encoding.|\n",
 "|**exit_score**|Target score for the experiment. It is associated with the metric, e.g. exit_score=0.995 will end the experiment once that score is reached.|"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
 " debug_log = 'automl_errors.log',\n",
@@ -437,71 +466,71 @@
 " X = X_train, \n",
 " y = y_train,\n",
 " path = project_folder)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Train the Models\n",
 "\n",
 "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_run = experiment.submit(automl_config, show_output = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Explore the Results"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Portal URL for Monitoring Runs\n",
 "\n",
 "The following will provide a link to the web interface to explore individual run details and status. In the future we might support output displayed in the notebook."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "displayHTML(\"<a href={} target='_blank'>Azure Portal: {}</a>\".format(local_run.get_portal_url(), local_run.id))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "The following will show the child runs and waits for the parent run to complete."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Retrieve All Child Runs after the experiment is completed (in portal)\n",
 "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "children = list(local_run.get_children())\n",
 "metricslist = {}\n",
@@ -513,81 +542,81 @@
 "\n",
 "rundata = pd.DataFrame(metricslist).sort_index(1)\n",
 "rundata"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Retrieve the Best Model after the above run is complete \n",
 "\n",
 "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "best_run, fitted_model = local_run.get_output()\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Best Model Based on Any Other Metric after the above run is complete based on the child run\n",
 "Show the run and the model that has the smallest `log_loss` value:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "lookup_metric = \"log_loss\"\n",
 "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
 "print(best_run)\n",
 "print(fitted_model)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Register the Fitted Model for Deployment\n",
 "If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "description = 'AutoML Model'\n",
 "tags = None\n",
 "model = local_run.register_model(description = description, tags = tags)\n",
 "local_run.model_id # This will be written to the scoring script file later in the notebook."
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Create Scoring Script\n",
 "Replace model_id with name of model from output of above register cell"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile score.py\n",
 "import pickle\n",
@@ -612,13 +641,13 @@
 " result = str(e)\n",
 " return json.dumps({\"error\": result})\n",
 " return json.dumps({\"result\":result.tolist()})"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#Replace <<model_id>>\n",
 "content = \"\"\n",
@@ -628,20 +657,20 @@
 "new_content = content.replace(\"<<model_id>>\", local_run.model_id)\n",
 "with open(\"score.py\", \"w\") as fw:\n",
 " fw.write(new_content)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Create a YAML File for the Environment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies\n",
 "\n",
@@ -649,20 +678,20 @@
 "\n",
 "conda_env_file_name = 'mydeployenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Create ACI config"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#deploy to ACI\n",
 "from azureml.core.webservice import AciWebservice, Webservice\n",
@@ -672,21 +701,21 @@
 " memory_gb = 2, \n",
 " tags = {'name':'Databricks Azure ML ACI'}, \n",
 " description = 'This is for ADB and AutoML example.')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Deploy the Image as a Web Service on Azure Container Instance\n",
 "Replace servicename with any meaningful name of service"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# this will take 10-15 minutes to finish\n",
 "\n",
@@ -715,53 +744,53 @@
 " )\n",
 "\n",
 "myservice.wait_for_deployment(show_output=True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "#for using the Web HTTP API \n",
 "print(myservice.scoring_uri)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Test the Best Fitted Model\n",
 "\n",
 "#### Load Test Data - you can split the dataset beforehand & pass Train dataset to AutoML and use Test dataset to evaluate the best model."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "blob_location = \"https://{}.blob.core.windows.net/{}\".format(account_name, container_name)\n",
 "X_test = pd.read_csv(\"{}/X_valid.csv\".format(blob_location), header=0)\n",
"y_test = pd.read_csv(\"{}/y_valid.csv\".format(blob_location), header=0)\n",
|
"y_test = pd.read_csv(\"{}/y_valid.csv\".format(blob_location), header=0)\n",
|
||||||
"images = pd.read_csv(\"{}/images.csv\".format(blob_location), header=None)\n",
|
"images = pd.read_csv(\"{}/images.csv\".format(blob_location), header=None)\n",
|
||||||
"images = np.reshape(images.values, (100,8,8))"
|
"images = np.reshape(images.values, (100,8,8))"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing Our Best Fitted Model\n",
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"We will try to predict digits and see how our model works. This is just an example to show you."
|
"We will try to predict digits and see how our model works. This is just an example to show you."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"# Randomly select digits and test.\n",
|
"# Randomly select digits and test.\n",
|
||||||
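As a quick sanity check on the test-data cell in the hunk above: the notebook assembles blob URLs with `str.format`, one container-relative path per CSV. A minimal standalone sketch of that construction — the account and container names here are hypothetical placeholders, not real resources:

```python
# Mirrors the notebook's URL construction; "myaccount" and "mycontainer"
# are placeholder names standing in for real storage account/container values.
account_name = "myaccount"
container_name = "mycontainer"

blob_location = "https://{}.blob.core.windows.net/{}".format(account_name, container_name)

# Each CSV is then addressed relative to the container root.
x_valid_url = "{}/X_valid.csv".format(blob_location)
y_valid_url = "{}/y_valid.csv".format(blob_location)

print(x_valid_url)
```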
@@ -777,63 +806,34 @@
 " ax1.set_title(title)\n",
 " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
 " display(fig)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "### Delete the service"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "myservice.delete()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "savitam"
-},
-{
-"name": "sasum"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.5"
-},
-"name": "auto-ml-classification-local-adb",
-"notebookId": 2733885892129020
-},
-"nbformat": 4,
 "nbformat_minor": 1
 }
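The test cells in the notebook above read `images.csv` as 100 flattened rows and reshape them to `(100, 8, 8)` before plotting. A minimal NumPy sketch of that reshape step, using synthetic stand-in data rather than the real CSV:

```python
import numpy as np

# Synthetic stand-in for the 100 flattened 8x8 digit images read from images.csv.
flat = np.arange(100 * 64).reshape(100, 64)

# The same reshape the notebook applies: one (8, 8) image per original row.
images = np.reshape(flat, (100, 8, 8))

print(images.shape)  # (100, 8, 8)
```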
@@ -1,16 +0,0 @@
-# Using Databricks as a Compute Target from Azure Machine Learning Pipeline
-
-To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. This notebook demonstrates the use of DatabricksStep in Azure Machine Learning Pipeline.
-
-The notebook will show:
-
-1. Running an arbitrary Databricks notebook that the customer has in Databricks workspace
-2. Running an arbitrary Python script that the customer has in DBFS
-3. Running an arbitrary Python script that is available on local computer (will upload to DBFS, and then run in Databricks)
-4. Running a JAR job that the customer has in DBFS.
-
-## Before you begin:
-1. **Create an Azure Databricks workspace** in the same subscription where you have your Azure Machine Learning workspace.
-You will need details of this workspace later on to define DatabricksStep. [More information](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.Databricks%2Fworkspaces).
-2. **Create PAT (access token)** at the Azure Databricks portal. [More information](https://docs.databricks.com/api/latest/authentication.html#generate-a-token).
-3. **Add demo notebook to ADB** This notebook has a sample you can use as is. Launch Azure Databricks attached to your Azure Machine Learning workspace and add a new notebook.
-4. **Create/attach a Blob storage** for use from ADB
@@ -1,15 +1,39 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "diray"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.6.2"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved. \n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Using Databricks as a Compute Target from Azure Machine Learning Pipeline\n",
@@ -27,18 +51,18 @@
 "2. **Create PAT (access token)**: Manually create a Databricks access token at the Azure Databricks portal. See [this](https://docs.databricks.com/api/latest/authentication.html#generate-a-token) for more information.\n",
 "3. **Add demo notebook to ADB**: This notebook has a sample you can use as is. Launch Azure Databricks attached to your Azure Machine Learning workspace and add a new notebook. \n",
 "4. **Create/attach a Blob storage** for use from ADB"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Add demo notebook to ADB Workspace\n",
 "Copy and paste the code below to create a new notebook in your ADB workspace."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "```python\n",
@@ -69,20 +93,20 @@
 "z = o + \"/output.txt\"\n",
 "df2.write.csv(z)\n",
 "```"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Azure Machine Learning and Pipeline SDK-specific imports"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import os\n",
 "import azureml.core\n",
@@ -97,29 +121,29 @@
 "\n",
 "# Check core SDK version number\n",
 "print(\"SDK version:\", azureml.core.VERSION)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Initialize Workspace\n",
 "\n",
 "Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "ws = Workspace.from_config()\n",
 "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Attach Databricks compute target\n",
@@ -130,13 +154,13 @@
 "- **Databricks Access Token** - The access token you created in ADB\n",
 "\n",
 "**The Databricks workspace needs to be in the same subscription as your AML workspace.**"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Replace with your account info before running.\n",
 " \n",
@@ -161,10 +185,10 @@
 " access_token= db_access_token)\n",
 " databricks_compute=ComputeTarget.attach(ws, db_compute_name, config)\n",
 " databricks_compute.wait_for_completion(True)\n"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Data Connections with Inputs and Outputs\n",
@@ -177,18 +201,18 @@
 "Databricks lets you interact with Azure Blob and ADLS in two ways.\n",
 "- **Direct Access**: Databricks allows you to interact with Azure Blob or ADLS URIs directly. The input or output URIs will be mapped to a Databricks widget param in the Databricks notebook.\n",
 "- **Mounting**: You will be supplied with additional parameters and secrets that will enable you to mount your ADLS or Azure Blob input or output location in your Databricks notebook."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Direct Access: Python sample code\n",
 "If you have a data reference named \"input\" it will represent the URI of the input and you can access it directly in the Databricks python notebook like so:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "```python\n",
@@ -196,18 +220,18 @@
 "y = getArgument(\"input\")\n",
 "df = spark.read.csv(y)\n",
 "```"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Mounting: Python sample code for Azure Blob\n",
 "Given an Azure Blob data reference named \"input\" the following widget params will be made available in the Databricks notebook:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "```python\n",
@@ -232,18 +256,18 @@
 " mount_point = \"/mnt/input\",\n",
 " extra_configs = {myinput_blob_config:dbutils.secrets.get(scope = \"amlscope\", key = myinput_blob_secretname)})\n",
 "```"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Mounting: Python sample code for ADLS\n",
 "Given an ADLS data reference named \"input\" the following widget params will be made available in the Databricks notebook:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "```python\n",
@@ -276,21 +300,21 @@
 " mount_point = \"/mnt/output\",\n",
 " extra_configs = configs)\n",
 "```"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Use Databricks from Azure Machine Learning Pipeline\n",
 "To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. Let's define a datasource (via DataReference) and intermediate data (via PipelineData) to be used in DatabricksStep."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Use the default blob storage\n",
 "def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
@@ -303,10 +327,10 @@
 " data_reference_name=\"input\")\n",
 "\n",
 "step_1_output = PipelineData(\"output\", datastore=def_blob_store)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Add a DatabricksStep\n",
@@ -390,21 +414,21 @@
 "runconfig = RunConfiguration()\n",
 "runconfig.load(path='<directory_where_runconfig_is_stored>', name='<runconfig_file_name>')\n",
 "```"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### 1. Running the demo notebook already added to the Databricks workspace\n",
 "Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable notebook_path when you run the code cell below:"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "notebook_path=os.getenv(\"DATABRICKS_NOTEBOOK_PATH\", \"<my-databricks-notebook-path>\") # Databricks notebook path\n",
 "\n",
@@ -419,46 +443,46 @@
 " compute_target=databricks_compute,\n",
 " allow_reuse=True\n",
 ")"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Build and submit the Experiment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "steps = [dbNbStep]\n",
 "pipeline = Pipeline(workspace=ws, steps=steps)\n",
 "pipeline_run = Experiment(ws, 'DB_Notebook_demo').submit(pipeline)\n",
 "pipeline_run.wait_for_completion()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### View Run Details"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(pipeline_run).show()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### 2. Running a Python script from DBFS\n",
@@ -471,13 +495,13 @@
 "```\n",
 "\n",
 "The code in the cell below assumes that you have completed the previous step of uploading the script `train-db-dbfs.py` to the root folder in DBFS."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "python_script_path = os.getenv(\"DATABRICKS_PYTHON_SCRIPT_PATH\", \"<my-databricks-python-script-path>\") # Databricks python script path\n",
 "\n",
@@ -491,46 +515,46 @@
 " compute_target=databricks_compute,\n",
 " allow_reuse=True\n",
 ")"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Build and submit the Experiment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "steps = [dbPythonInDbfsStep]\n",
 "pipeline = Pipeline(workspace=ws, steps=steps)\n",
 "pipeline_run = Experiment(ws, 'DB_Python_demo').submit(pipeline)\n",
 "pipeline_run.wait_for_completion()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### View Run Details"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(pipeline_run).show()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### 3. Running a Python script in Databricks that is currently on the local computer\n",
@@ -539,13 +563,13 @@
 "The commented-out code below assumes that you have `train-db-local.py` in the `scripts` subdirectory under the current working directory.\n",
 "\n",
 "In this case, the Python script will be uploaded first to DBFS, and then the script will be run in Databricks."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "python_script_name = \"train-db-local.py\"\n",
 "source_directory = \".\"\n",
@@ -560,46 +584,46 @@
 " compute_target=databricks_compute,\n",
 " allow_reuse=True\n",
 ")"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Build and submit the Experiment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "steps = [dbPythonInLocalMachineStep]\n",
 "pipeline = Pipeline(workspace=ws, steps=steps)\n",
 "pipeline_run = Experiment(ws, 'DB_Python_Local_demo').submit(pipeline)\n",
 "pipeline_run.wait_for_completion()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### View Run Details"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(pipeline_run).show()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "### 4. Running a JAR job that is already added in DBFS\n",
@@ -610,13 +634,13 @@
 "```\n",
 "dbfs cp ./train-db-dbfs.jar dbfs:/train-db-dbfs.jar\n",
 "```"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "main_jar_class_name = \"com.microsoft.aeva.Main\"\n",
 "jar_library_dbfs_path = os.getenv(\"DATABRICKS_JAR_LIB_PATH\", \"<my-databricks-jar-lib-path>\") # Databricks jar library path\n",
@@ -632,84 +656,60 @@
 " compute_target=databricks_compute,\n",
 " allow_reuse=True\n",
 ")"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### Build and submit the Experiment"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "steps = [dbJarInDbfsStep]\n",
 "pipeline = Pipeline(workspace=ws, steps=steps)\n",
 "pipeline_run = Experiment(ws, 'DB_JAR_demo').submit(pipeline)\n",
 "pipeline_run.wait_for_completion()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "#### View Run Details"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.widgets import RunDetails\n",
 "RunDetails(pipeline_run).show()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Next: ADLA as a Compute Target\n",
 "To use ADLA as a compute target from Azure Machine Learning Pipeline, an AdlaStep is used. This [notebook](./aml-pipelines-use-adla-as-compute-target.ipynb) demonstrates the use of AdlaStep in Azure Machine Learning Pipeline."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "diray"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.6.2"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
@@ -1,55 +0,0 @@

**Azure HDInsight**

Azure HDInsight is a fully managed cloud Hadoop & Spark offering that gives you optimized open-source analytic clusters for Spark, Hive, MapReduce, HBase, Storm, and Kafka. HDInsight Spark clusters provide kernels that you can use with the Jupyter notebook on [Apache Spark](https://spark.apache.org/) for testing your applications.

How Azure HDInsight works with Azure Machine Learning service:

- You can train a model using Spark clusters and deploy the model to ACI/AKS from within Azure HDInsight.

- You can also use [automated machine learning](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-automated-ml) capabilities integrated within Azure HDInsight.

You can use Azure HDInsight as a compute target from an [Azure Machine Learning pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines).

**Set up your HDInsight cluster**

Create an [HDInsight cluster](https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters).

**Quick create: Basic cluster setup**

This article walks you through setup in the [Azure portal](https://portal.azure.com/), where you can create an HDInsight cluster using *Quick create* or *Custom*.



Follow the instructions on the screen to do a basic cluster setup. Details are provided below for:

- [Resource group name](https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters#resource-group-name)

- [Cluster types and configuration](https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters#cluster-types) (the cluster must be Spark 2.3 (HDI 3.6) or greater)

- Cluster login and SSH username

- [Location](https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters#location)

**Import the sample HDI notebook in Jupyter**

**Important links:**

Create HDI cluster:
<https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters>


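The cluster-type requirement above (Spark 2.3 on HDI 3.6 or greater) is easy to miss during provisioning. As a minimal illustrative sketch — a hypothetical helper, not part of any Azure SDK — pipeline code could guard against an unsupported cluster with a simple version check:

```python
def spark_version_ok(version: str, minimum=(2, 3)) -> bool:
    """Return True if a Spark version string like '2.3.1' meets the minimum (major, minor)."""
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= minimum

# The HDI quick-create flow shows cluster versions such as 'Spark 2.3 (HDI 3.6)'.
print(spark_version_ok("2.3.1"))  # True: meets the Spark 2.3 floor
print(spark_version_ok("2.2.0"))  # False: too old for this walkthrough
```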
@@ -1,612 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated ML on Azure HDInsight\n",
"\n",
"In this example we use the scikit-learn's <a href=\"http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset\" target=\"_blank\">digit dataset</a> to showcase how you can use AutoML for a simple classification problem.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create Azure Machine Learning Workspace object and initialize your notebook directory to easily reload this object from a configuration file.\n",
"2. Create an `Experiment` in an existing `Workspace`.\n",
"3. Configure Automated ML using `AutoMLConfig`.\n",
"4. Train the model using Azure HDInsight.\n",
"5. Explore the results.\n",
"6. Test the best fitted model.\n",
"\n",
"Before running this notebook, please follow the readme for using Automated ML on Azure HDI for installing necessary libraries to your cluster."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Check the Azure ML Core SDK Version to Validate Your Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"import pandas as pd\n",
"from azureml.core.authentication import ServicePrincipalAuthentication\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun\n",
"import logging\n",
"\n",
"print(\"SDK Version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize an Azure ML Workspace\n",
"### What is an Azure ML Workspace and Why Do I Need One?\n",
"\n",
"An Azure ML workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n",
"\n",
"\n",
"### What do I Need?\n",
"\n",
"To create or access an Azure ML workspace, you will need to import the Azure ML library and specify following information:\n",
"* A name for your workspace. You can choose one.\n",
"* Your subscription id. Use the `id` value from the `az account show` command output above.\n",
"* The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group. The resource group will be created if it doesn't exist. Resource groups can be created and viewed in the [Azure portal](https://portal.azure.com)\n",
"* Supported regions include `eastus2`, `eastus`,`westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"import pandas as pd\n",
"from azureml.core.authentication import ServicePrincipalAuthentication\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun\n",
"import logging\n",
"\n",
"subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
"resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
"workspace_name = \"<workspace to be created>\" #your workspace name\n",
"workspace_region = \"<azureregion>\" #your region\n",
"\n",
"\n",
"tenant_id = \"<tenant_id>\"\n",
"app_id = \"<app_id>\"\n",
"app_key = \"<app_key>\"\n",
"\n",
"auth_sp = ServicePrincipalAuthentication(tenant_id = tenant_id,\n",
" service_principal_id = app_id,\n",
" service_principal_password = app_key)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating a Workspace\n",
"If you already have access to an Azure ML workspace you want to use, you can skip this cell. Otherwise, this cell will create an Azure ML workspace for you in the specified subscription, provided you have the correct permissions for the given `subscription_id`.\n",
"\n",
"This will fail when:\n",
"1. The workspace already exists.\n",
"2. You do not have permission to create a workspace in the resource group.\n",
"3. You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription.\n",
"\n",
"If workspace creation fails for any reason other than already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n",
"\n",
"**Note:** Creation of a new workspace can take several minutes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuring Your Local Environment\n",
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace(workspace_name = workspace_name,\n",
" subscription_id = subscription_id,\n",
" resource_group = resource_group,\n",
" auth = auth_sp)\n",
"\n",
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
"ws.write_config()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Folder to Host Sample Projects\n",
"Finally, create a folder where all the sample projects will be hosted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"sample_projects_folder = './sample_projects'\n",
"\n",
"if not os.path.isdir(sample_projects_folder):\n",
"    os.mkdir(sample_projects_folder)\n",
"    \n",
"print('Sample projects will be created in {}.'.format(sample_projects_folder))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Experiment\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import random\n",
"import time\n",
"\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib.pyplot import imshow\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-classification-hdi'\n",
"project_folder = './sample_projects/automl-local-classification-hdi'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"pd.DataFrame(data = output, index = ['']).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Registering Datastore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Datastore is the way to save connection information to a storage service (e.g. Azure Blob, Azure Data Lake, Azure SQL) information to your workspace so you can access them without exposing credentials in your code. The first thing you will need to do is register a datastore, you can refer to our [python SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) on how to register datastores. __Note: for best security practices, please do not check in code that contains registering datastores with secrets into your source control__\n",
"\n",
"The code below registers a datastore pointing to a publicly readable blob container."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Datastore\n",
"\n",
"datastore_name = 'demo_training'\n",
"container_name = 'digits' \n",
"account_name = 'automlpublicdatasets'\n",
"Datastore.register_azure_blob_container(\n",
"    workspace = ws, \n",
"    datastore_name = datastore_name, \n",
"    container_name = container_name, \n",
"    account_name = account_name,\n",
"    overwrite = True\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is an example on how to register a private blob container\n",
"```python\n",
"datastore = Datastore.register_azure_blob_container(\n",
"    workspace = ws, \n",
"    datastore_name = 'example_datastore', \n",
"    container_name = 'example-container', \n",
"    account_name = 'storageaccount',\n",
"    account_key = 'accountkey'\n",
")\n",
"```\n",
"The example below shows how to register an Azure Data Lake store. Please make sure you have granted the necessary permissions for the service principal to access the data lake.\n",
"```python\n",
"datastore = Datastore.register_azure_data_lake(\n",
"    workspace = ws,\n",
"    datastore_name = 'example_datastore',\n",
"    store_name = 'adlsstore',\n",
"    tenant_id = 'tenant-id-of-service-principal',\n",
"    client_id = 'client-id-of-service-principal',\n",
"    client_secret = 'client-secret-of-service-principal'\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Training Data Using DataPrep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Automated ML takes a Dataflow as input.\n",
"\n",
"If you are familiar with Pandas and have done your data preparation work in Pandas already, you can use the `read_pandas_dataframe` method in dprep to convert the DataFrame to a Dataflow.\n",
"```python\n",
"df = pd.read_csv(...)\n",
"# apply some transforms\n",
"dprep.read_pandas_dataframe(df, temp_folder='/path/accessible/by/both/driver/and/worker')\n",
"```\n",
"\n",
"If you just need to ingest data without doing any preparation, you can directly use AzureML Data Prep (Data Prep) to do so. The code below demonstrates this scenario. Data Prep also has data preparation capabilities, we have many [sample notebooks](https://github.com/Microsoft/AMLDataPrepDocs) demonstrating the capabilities.\n",
"\n",
"You will get the datastore you registered previously and pass it to Data Prep for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.dataprep as dprep\n",
"from azureml.data.datapath import DataPath\n",
"\n",
"datastore = Datastore.get(workspace = ws, datastore_name = datastore_name)\n",
"\n",
"X_train = dprep.read_csv(datastore.path('X.csv'))\n",
"y_train = dprep.read_csv(datastore.path('y.csv')).to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Review the Data Preparation Result\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only j records for all the steps in the Dataflow, which makes it fast even against large datasets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train.get_profile()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_train.get_profile()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure AutoML\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**spark_context**|Spark Context object. for HDInsight, use spark_context=sc|\n",
"|**max_concurrent_iterations**|Maximum number of iterations to execute in parallel. This should be <= number of worker nodes in your Azure HDInsight cluster.|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
"|**preprocess**|set this to True to enable pre-processing of data eg. string to numeric using one-hot encoding|\n",
"|**exit_score**|Target score for experiment. It is associated with the metric. eg. exit_score=0.995 will exit experiment after that|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" iteration_timeout_minutes = 10,\n",
" iterations = 3,\n",
" preprocess = True,\n",
" n_cross_validations = 10,\n",
" max_concurrent_iterations = 2, #change it based on number of worker nodes\n",
" verbosity = logging.INFO,\n",
" spark_context=sc, #HDI /spark related\n",
" X = X_train, \n",
" y = y_train,\n",
" path = project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the Models\n",
"\n",
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following will show the child runs and waits for the parent run to complete."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve All Child Runs after the experiment is completed (in portal)\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
"    properties = run.get_properties()\n",
"    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n",
"    metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model after the above run is complete \n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric after the above run is complete based on the child run\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test the Best Fitted Model\n",
"\n",
"#### Load Test Data - you can split the dataset beforehand & pass Train dataset to AutoML and use Test dataset to evaluate the best model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"blob_location = \"https://{}.blob.core.windows.net/{}\".format(account_name, container_name)\n",
"X_test = pd.read_csv(\"{}./X_valid.csv\".format(blob_location), header=0)\n",
"y_test = pd.read_csv(\"{}/y_valid.csv\".format(blob_location), header=0)\n",
"images = pd.read_csv(\"{}/images.csv\".format(blob_location), header=None)\n",
"images = np.reshape(images.values, (100,8,8))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing Our Best Fitted Model\n",
"We will try to predict digits and see how our model works. This is just an example to show you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Randomly select digits and test.\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
"    print(index)\n",
"    predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
"    label = y_test.values[index]\n",
"    title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
"    fig = plt.figure(3, figsize = (5,5))\n",
"    ax1 = fig.add_axes((0,0,.8,.8))\n",
"    ax1.set_title(title)\n",
"    plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
"    display(fig)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When deploying an automated ML trained model, please specify _pippackages=['azureml-sdk[automl]']_ in your CondaDependencies.\n",
"\n",
"Please refer to only the **Deploy** section in this notebook - <a href=\"https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment\" target=\"_blank\">Deployment of Automated ML trained model</a>"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
},
{
"name": "sasum"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "Python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "python",
"version": 3
},
"mimetype": "text/x-python",
"name": "pyspark3",
"pygments_lexer": "python3"
},
"name": "auto-ml-classification-local-adb",
"notebookId": 587284549713154
},
"nbformat": 4,
"nbformat_minor": 1
}
@@ -1,709 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Track Data Drift between Training and Inference Data in Production \n",
|
|
||||||
"\n",
|
|
||||||
"With this notebook, you will learn how to enable the DataDrift service to automatically track and determine whether your inference data is drifting from the data your model was initially trained on. The DataDrift service provides metrics and visualizations to help stakeholders identify which specific features cause the concept drift to occur.\n",
|
|
||||||
"\n",
|
|
||||||
"Please email driftfeedback@microsoft.com with any issues. A member from the DataDrift team will respond shortly. \n",
|
|
||||||
"\n",
|
|
||||||
"The DataDrift Public Preview API can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-contrib-datadrift/?view=azure-ml-py). "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Prerequisites and Setup"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Install the DataDrift package\n",
|
|
||||||
"\n",
|
|
||||||
"Install the azureml-contrib-datadrift, azureml-contrib-opendatasets and lightgbm packages before running this notebook.\n",
|
|
||||||
"```\n",
|
|
||||||
"pip install azureml-contrib-datadrift\n",
|
|
||||||
"pip install azureml-contrib-opendatasets\n",
|
|
||||||
"pip install lightgbm\n",
|
|
||||||
"```"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Import Dependencies"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import json\n",
|
|
||||||
"import os\n",
|
|
||||||
"import time\n",
|
|
||||||
"from datetime import datetime, timedelta\n",
|
|
||||||
"\n",
|
|
||||||
"import numpy as np\n",
|
|
||||||
"import pandas as pd\n",
|
|
||||||
"import requests\n",
|
|
||||||
"from azureml.contrib.datadrift import DataDriftDetector, AlertConfiguration\n",
|
|
||||||
"from azureml.contrib.opendatasets import NoaaIsdWeather\n",
|
|
||||||
"from azureml.core import Dataset, Workspace, Run\n",
|
|
||||||
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
|
||||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
|
||||||
"from azureml.core.experiment import Experiment\n",
|
|
||||||
"from azureml.core.image import ContainerImage\n",
|
|
||||||
"from azureml.core.model import Model\n",
|
|
||||||
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
|
||||||
"from azureml.widgets import RunDetails\n",
|
|
||||||
"from sklearn.externals import joblib\n",
|
|
||||||
"from sklearn.model_selection import train_test_split\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Set up Configuration and Create Azure ML Workspace\n",
|
|
||||||
"\n",
|
|
||||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Please type in your initials/alias. The prefix is prepended to the names of resources created by this notebook. \n",
|
|
||||||
"prefix = \"dd\"\n",
|
|
||||||
"\n",
|
|
||||||
"# NOTE: Please do not change the model_name, as it's required by the score.py file\n",
|
|
||||||
"model_name = \"driftmodel\"\n",
|
|
||||||
"image_name = \"{}driftimage\".format(prefix)\n",
|
|
||||||
"service_name = \"{}driftservice\".format(prefix)\n",
|
|
||||||
"\n",
|
|
||||||
"# optionally, set email address to receive an email alert for DataDrift\n",
|
|
||||||
"email_address = \"\""
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"ws = Workspace.from_config()\n",
|
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Generate Train/Testing Data\n",
|
|
||||||
"\n",
|
|
||||||
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You may replace this step with your own dataset. "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"usaf_list = ['725724', '722149', '723090', '722159', '723910', '720279',\n",
|
|
||||||
" '725513', '725254', '726430', '720381', '723074', '726682',\n",
|
|
||||||
" '725486', '727883', '723177', '722075', '723086', '724053',\n",
|
|
||||||
" '725070', '722073', '726060', '725224', '725260', '724520',\n",
|
|
||||||
" '720305', '724020', '726510', '725126', '722523', '703333',\n",
|
|
||||||
" '722249', '722728', '725483', '722972', '724975', '742079',\n",
|
|
||||||
" '727468', '722193', '725624', '722030', '726380', '720309',\n",
|
|
||||||
" '722071', '720326', '725415', '724504', '725665', '725424',\n",
|
|
||||||
" '725066']\n",
|
|
||||||
"\n",
|
|
||||||
"columns = ['usaf', 'wban', 'datetime', 'latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n",
|
|
||||||
"\n",
|
|
||||||
"def enrich_weather_noaa_data(noaa_df):\n",
|
|
||||||
" hours_in_day = 23\n",
|
|
||||||
" week_in_year = 52\n",
|
|
||||||
" \n",
|
|
||||||
"\n",
|
|
||||||
" noaa_df = noaa_df.assign(hour=noaa_df[\"datetime\"].dt.hour,\n",
|
|
||||||
" weekofyear=noaa_df[\"datetime\"].dt.week,\n",
|
|
||||||
" sine_weekofyear=noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*(x.dt.week-1)/week_in_year)),\n",
|
|
||||||
" cosine_weekofyear=noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*(x.dt.week-1)/week_in_year)),\n",
|
|
||||||
" sine_hourofday=noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*x.dt.hour/hours_in_day)),\n",
|
|
||||||
" cosine_hourofday=noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*x.dt.hour/hours_in_day))\n",
|
|
||||||
" )\n",
|
|
||||||
" \n",
|
|
||||||
" return noaa_df\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"def add_window_col(input_df):\n",
|
|
||||||
" shift_interval = pd.Timedelta('-7 days') # your X days interval\n",
|
|
||||||
" df_shifted = input_df.copy()\n",
|
|
||||||
" df_shifted.loc[:,'datetime'] = df_shifted['datetime'] - shift_interval\n",
|
|
||||||
" df_shifted.drop(list(input_df.columns.difference(['datetime', 'usaf', 'wban', 'sine_hourofday', 'temperature'])), axis=1, inplace=True)\n",
|
|
||||||
"\n",
|
|
||||||
" # merge, keeping only observations where -1 lag is present\n",
|
|
||||||
" df2 = pd.merge(input_df,\n",
|
|
||||||
" df_shifted,\n",
|
|
||||||
" on=['datetime', 'usaf', 'wban', 'sine_hourofday'],\n",
|
|
||||||
" how='inner', # use 'left' to keep observations without lags\n",
|
|
||||||
" suffixes=['', '-7'])\n",
|
|
||||||
" return df2\n",
|
|
||||||
"\n",
|
|
||||||
"def get_noaa_data(start_time, end_time, cols, station_list):\n",
|
|
||||||
" isd = NoaaIsdWeather(start_time, end_time, cols=cols)\n",
|
|
||||||
" # Read into Pandas data frame.\n",
|
|
||||||
" noaa_df = isd.to_pandas_dataframe()\n",
|
|
||||||
" noaa_df = noaa_df.rename(columns={\"stationName\": \"station_name\"})\n",
|
|
||||||
" \n",
|
|
||||||
" df_filtered = noaa_df[noaa_df[\"usaf\"].isin(station_list)]\n",
|
|
||||||
" df_filtered = df_filtered.reset_index(drop=True)\n",
|
|
||||||
" \n",
|
|
||||||
" # Enrich with time features\n",
|
|
||||||
" df_enriched = enrich_weather_noaa_data(df_filtered)\n",
|
|
||||||
" \n",
|
|
||||||
" return df_enriched\n",
|
|
||||||
"\n",
|
|
||||||
"def get_featurized_noaa_df(start_time, end_time, cols, station_list):\n",
|
|
||||||
" df_1 = get_noaa_data(start_time - timedelta(days=7), start_time - timedelta(seconds=1), cols, station_list)\n",
|
|
||||||
" df_2 = get_noaa_data(start_time, end_time, cols, station_list)\n",
|
|
||||||
" noaa_df = pd.concat([df_1, df_2])\n",
|
|
||||||
" \n",
|
|
||||||
" print(\"Adding window feature\")\n",
|
|
||||||
" df_window = add_window_col(noaa_df)\n",
|
|
||||||
" \n",
|
|
||||||
" cat_columns = df_window.dtypes == object\n",
|
|
||||||
" cat_columns = cat_columns[cat_columns == True]\n",
|
|
||||||
" \n",
|
|
||||||
" print(\"Encoding categorical columns\")\n",
|
|
||||||
" df_encoded = pd.get_dummies(df_window, columns=cat_columns.keys().tolist())\n",
|
|
||||||
" \n",
|
|
||||||
" print(\"Dropping unnecessary columns\")\n",
|
|
||||||
" df_featurized = df_encoded.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna().drop_duplicates()\n",
|
|
||||||
" \n",
|
|
||||||
" return df_featurized"
|
|
||||||
]
|
|
||||||
},
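The featurization cell above encodes hour-of-day and week-of-year as sine/cosine pairs so that the ends of each cycle (hour 23 and hour 0) sit next to each other in feature space. A minimal standalone sketch of that cyclical encoding; the helper name and the toy frame are illustrative, not from the notebook:

```python
import numpy as np
import pandas as pd

def encode_cyclical(df, col, period):
    # Map a cyclical integer column onto the unit circle, as the
    # sine_hourofday/cosine_hourofday features above do for hours.
    radians = 2 * np.pi * df[col] / period
    return df.assign(**{"sine_" + col: np.sin(radians),
                        "cosine_" + col: np.cos(radians)})

hours = pd.DataFrame({"hour": [0, 6, 12, 23]})
encoded = encode_cyclical(hours, "hour", 23)
```

With a period of 23, hour 23 maps back onto the same point as hour 0, which is exactly the adjacency a raw integer hour column would lose.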
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Train model on Jan 1 - 14, 2009 data\n",
|
|
||||||
"df = get_featurized_noaa_df(datetime(2009, 1, 1), datetime(2009, 1, 14, 23, 59, 59), columns, usaf_list)\n",
|
|
||||||
"df.head()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"label = \"temperature\"\n",
|
|
||||||
"x_df = df.drop(label, axis=1)\n",
|
|
||||||
"y_df = df[[label]]\n",
|
|
||||||
"x_train, x_test, y_train, y_test = train_test_split(df, y_df, test_size=0.2, random_state=223)\n",
|
|
||||||
"print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)\n",
|
|
||||||
"\n",
|
|
||||||
"training_dir = 'outputs/training'\n",
|
|
||||||
"training_file = \"training.csv\"\n",
|
|
||||||
"\n",
|
|
||||||
"# Generate training dataframe to register as Training Dataset\n",
|
|
||||||
"os.makedirs(training_dir, exist_ok=True)\n",
|
|
||||||
"training_df = pd.merge(x_train.drop(label, axis=1), y_train, left_index=True, right_index=True)\n",
|
|
||||||
"training_df.to_csv(training_dir + \"/\" + training_file)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Create/Register Training Dataset"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"dataset_name = \"dataset\"\n",
|
|
||||||
"name_suffix = datetime.utcnow().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
|
|
||||||
"snapshot_name = \"snapshot-{}\".format(name_suffix)\n",
|
|
||||||
"\n",
|
|
||||||
"dstore = ws.get_default_datastore()\n",
|
|
||||||
"dstore.upload(training_dir, \"data/training\", show_progress=True)\n",
|
|
||||||
"dpath = dstore.path(\"data/training/training.csv\")\n",
|
|
||||||
"trainingDataset = Dataset.auto_read_files(dpath, include_path=True)\n",
|
|
||||||
"trainingDataset = trainingDataset.register(workspace=ws, name=dataset_name, description=\"dset\", exist_ok=True)\n",
|
|
||||||
"\n",
|
|
||||||
"trainingDataSnapshot = trainingDataset.create_snapshot(snapshot_name=snapshot_name, compute_target=None, create_data_snapshot=True)\n",
|
|
||||||
"datasets = [(Dataset.Scenario.TRAINING, trainingDataSnapshot)]\n",
|
|
||||||
"print(\"dataset registration done.\\n\")\n",
|
|
||||||
"datasets"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Train and Save Model"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import lightgbm as lgb\n",
|
|
||||||
"\n",
|
|
||||||
"train = lgb.Dataset(data=x_train, \n",
|
|
||||||
" label=y_train)\n",
|
|
||||||
"\n",
|
|
||||||
"test = lgb.Dataset(data=x_test, \n",
|
|
||||||
" label=y_test,\n",
|
|
||||||
" reference=train)\n",
|
|
||||||
"\n",
|
|
||||||
"params = {'learning_rate' : 0.1,\n",
|
|
||||||
" 'boosting' : 'gbdt',\n",
|
|
||||||
" 'metric' : 'rmse',\n",
|
|
||||||
" 'feature_fraction' : 1,\n",
|
|
||||||
" 'bagging_fraction' : 1,\n",
|
|
||||||
" 'max_depth': 6,\n",
|
|
||||||
" 'num_leaves' : 31,\n",
|
|
||||||
" 'objective' : 'regression',\n",
|
|
||||||
" 'bagging_freq' : 1,\n",
|
|
||||||
" \"verbose\": -1,\n",
|
|
||||||
" 'min_data_per_leaf': 100}\n",
|
|
||||||
"\n",
|
|
||||||
"model = lgb.train(params, \n",
|
|
||||||
" num_boost_round=500,\n",
|
|
||||||
" train_set=train,\n",
|
|
||||||
" valid_sets=[train, test],\n",
|
|
||||||
" verbose_eval=50,\n",
|
|
||||||
" early_stopping_rounds=25)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"model_file = 'outputs/{}.pkl'.format(model_name)\n",
|
|
||||||
"\n",
|
|
||||||
"os.makedirs('outputs', exist_ok=True)\n",
|
|
||||||
"joblib.dump(model, model_file)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Register Model"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"model = Model.register(model_path=model_file,\n",
|
|
||||||
" model_name=model_name,\n",
|
|
||||||
" workspace=ws,\n",
|
|
||||||
" datasets=datasets)\n",
|
|
||||||
"\n",
|
|
||||||
"print(model_name, image_name, service_name, model)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Deploy Model To AKS"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": []
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Prepare Environment"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
|
|
||||||
" pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
|
|
||||||
"\n",
|
|
||||||
"with open(\"myenv.yml\",\"w\") as f:\n",
|
|
||||||
" f.write(myenv.serialize_to_string())"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Create Image"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Image creation may take up to 15 minutes.\n",
|
|
||||||
"\n",
|
|
||||||
"image_name = image_name + str(model.version)\n",
|
|
||||||
"\n",
|
|
||||||
"if image_name not in ws.images:\n",
|
|
||||||
" # Use the score.py defined in this directory as the execution script\n",
|
|
||||||
" # NOTE: The Model Data Collector must be enabled in the execution script for DataDrift to run correctly\n",
|
|
||||||
" image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
|
|
||||||
" runtime=\"python\",\n",
|
|
||||||
" conda_file=\"myenv.yml\",\n",
|
|
||||||
" description=\"Image with weather dataset model\")\n",
|
|
||||||
" image = ContainerImage.create(name=image_name,\n",
|
|
||||||
" models=[model],\n",
|
|
||||||
" image_config=image_config,\n",
|
|
||||||
" workspace=ws)\n",
|
|
||||||
"\n",
|
|
||||||
" image.wait_for_creation(show_output=True)\n",
|
|
||||||
"else:\n",
|
|
||||||
" image = ws.images[image_name]"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Create Compute Target"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"aks_name = 'dd-demo-e2e'\n",
|
|
||||||
"prov_config = AksCompute.provisioning_configuration()\n",
|
|
||||||
"\n",
|
|
||||||
"if aks_name not in ws.compute_targets:\n",
|
|
||||||
" aks_target = ComputeTarget.create(workspace=ws,\n",
|
|
||||||
" name=aks_name,\n",
|
|
||||||
" provisioning_configuration=prov_config)\n",
|
|
||||||
"\n",
|
|
||||||
" aks_target.wait_for_completion(show_output=True)\n",
|
|
||||||
" print(aks_target.provisioning_state)\n",
|
|
||||||
" print(aks_target.provisioning_errors)\n",
|
|
||||||
"else:\n",
|
|
||||||
" aks_target=ws.compute_targets[aks_name]"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Deploy Service"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"aks_service_name = service_name\n",
|
|
||||||
"\n",
|
|
||||||
"if aks_service_name not in ws.webservices:\n",
|
|
||||||
" aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)\n",
|
|
||||||
" aks_service = Webservice.deploy_from_image(workspace=ws,\n",
|
|
||||||
" name=aks_service_name,\n",
|
|
||||||
" image=image,\n",
|
|
||||||
" deployment_config=aks_config,\n",
|
|
||||||
" deployment_target=aks_target)\n",
|
|
||||||
" aks_service.wait_for_deployment(show_output=True)\n",
|
|
||||||
" print(aks_service.state)\n",
|
|
||||||
"else:\n",
|
|
||||||
" aks_service = ws.webservices[aks_service_name]"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Run DataDrift Analysis"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Send Scoring Data to Service"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Download Scoring Data"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Score Model on March 15, 2016 data\n",
|
|
||||||
"scoring_df = get_noaa_data(datetime(2016, 3, 15) - timedelta(days=7), datetime(2016, 3, 16), columns, usaf_list)\n",
|
|
||||||
"# Add the window feature column\n",
|
|
||||||
"scoring_df = add_window_col(scoring_df)\n",
|
|
||||||
"\n",
|
|
||||||
"# Drop features not used by the model\n",
|
|
||||||
"print(\"Dropping unnecessary columns\")\n",
|
|
||||||
"scoring_df = scoring_df.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna()\n",
|
|
||||||
"scoring_df.head()"
|
|
||||||
]
|
|
||||||
},
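The `add_window_col` helper reused above builds the `temperature-7` lag by shifting a copy of the frame forward seven days and self-merging on the timestamp. A toy sketch of that shift-and-merge lag pattern; the daily frequency and column values here are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.date_range("2009-01-01", periods=10, freq="D"),
    "temperature": range(10),
})

# Shift a copy forward 7 days: a row dated t now carries the value from t-7
shifted = df.copy()
shifted["datetime"] = shifted["datetime"] + pd.Timedelta("7 days")

# Inner self-merge keeps only rows that have a 7-day-old observation
lagged = pd.merge(df, shifted, on="datetime", how="inner", suffixes=["", "-7"])
```

The inner join drops the first week of rows, mirroring why the notebook fetches an extra leading week of data before featurizing.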
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# One Hot Encode the scoring dataset to match the training dataset schema\n",
|
|
||||||
"columns_dict = model.datasets[\"training\"][0].get_profile().columns\n",
|
|
||||||
"extra_cols = ('Path', 'Column1')\n",
|
|
||||||
"for k in extra_cols:\n",
|
|
||||||
" columns_dict.pop(k, None)\n",
|
|
||||||
"training_columns = list(columns_dict.keys())\n",
|
|
||||||
"\n",
|
|
||||||
"categorical_columns = scoring_df.dtypes == object\n",
|
|
||||||
"categorical_columns = categorical_columns[categorical_columns == True]\n",
|
|
||||||
"\n",
|
|
||||||
"test_df = pd.get_dummies(scoring_df[categorical_columns.keys().tolist()])\n",
|
|
||||||
"encoded_df = scoring_df.join(test_df)\n",
|
|
||||||
"\n",
|
|
||||||
"# Populate missing OHE columns with 0 values to match training dataset schema\n",
|
|
||||||
"difference = list(set(training_columns) - set(encoded_df.columns.tolist()))\n",
|
|
||||||
"for col in difference:\n",
|
|
||||||
" encoded_df[col] = 0\n",
|
|
||||||
"encoded_df.head()"
|
|
||||||
]
|
|
||||||
},
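A standalone sketch of the schema-alignment step performed above: one-hot encode the scoring rows, then zero-fill any dummy columns the training dataset had but this batch lacks. The column and station names below are made up for illustration:

```python
import pandas as pd

# Columns the model was trained on (hypothetical)
training_columns = ["temp", "station_A", "station_B", "station_C"]

scoring = pd.DataFrame({"temp": [20.5], "station": ["station_A"]})
# One-hot encode the categorical column; only station_A appears in this batch
encoded = pd.get_dummies(scoring, columns=["station"], prefix="", prefix_sep="")

# Zero-fill dummies seen at training time but missing from this batch
for col in set(training_columns) - set(encoded.columns):
    encoded[col] = 0
encoded = encoded[training_columns]  # restore the training column order
```

Without the zero-fill, any batch that happens to miss a category would produce a narrower matrix than the model expects.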
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Serialize dataframe to list of row dictionaries\n",
|
|
||||||
"encoded_dict = encoded_df.to_dict('records')"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Submit Scoring Data to Service"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"%%time\n",
|
|
||||||
"\n",
|
|
||||||
"# Retrieve the API keys. AML generates two keys.\n",
|
|
||||||
"key1, key2 = aks_service.get_keys()\n",
|
|
||||||
"\n",
|
|
||||||
"total_count = len(scoring_df)\n",
|
|
||||||
"i = 0\n",
|
|
||||||
"load = []\n",
|
|
||||||
"for row in encoded_dict:\n",
|
|
||||||
" load.append(row)\n",
|
|
||||||
" i = i + 1\n",
|
|
||||||
" if i % 100 == 0:\n",
|
|
||||||
" payload = json.dumps({\"data\": load})\n",
|
|
||||||
" \n",
|
|
||||||
" # construct raw HTTP request and send to the service\n",
|
|
||||||
" payload_binary = bytes(payload,encoding = 'utf8')\n",
|
|
||||||
" headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
|
|
||||||
" resp = requests.post(aks_service.scoring_uri, payload_binary, headers=headers)\n",
|
|
||||||
" \n",
|
|
||||||
" print(\"prediction:\", resp.content, \"Progress: {}/{}\".format(i, total_count)) \n",
|
|
||||||
"\n",
|
|
||||||
" load = []\n",
|
|
||||||
" time.sleep(3)"
|
|
||||||
]
|
|
||||||
},
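The scoring loop above submits rows in batches of 100. A minimal sketch of just the batching/serialization pattern, without the HTTP call; the row contents are dummies:

```python
import json

rows = [{"x": i} for i in range(250)]  # dummy rows standing in for encoded_dict

payloads = []
batch = []
for row in rows:
    batch.append(row)
    if len(batch) == 100:
        # Serialize each full batch as the {"data": [...]} body the service expects
        payloads.append(json.dumps({"data": batch}))
        batch = []
# Note: as in the loop above, a trailing partial batch (here 50 rows) is never sent
```

Batching keeps each request body small and, combined with the `time.sleep(3)` above, avoids flooding the endpoint.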
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Configure DataDrift"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"services = [service_name]\n",
|
|
||||||
"start = datetime.now() - timedelta(days=2)\n",
|
|
||||||
"end = datetime(year=2020, month=1, day=22, hour=15, minute=16)\n",
|
|
||||||
"feature_list = ['usaf', 'wban', 'latitude', 'longitude', 'station_name', 'p_k', 'sine_hourofday', 'cosine_hourofday', 'temperature-7']\n",
|
|
||||||
"alert_config = AlertConfiguration([email_address]) if email_address else None\n",
|
|
||||||
"\n",
|
|
||||||
"# If a DataDriftDetector already exists for this model and version, create() raises a KeyError; fall back to get()\n",
|
|
||||||
"try:\n",
|
|
||||||
" datadrift = DataDriftDetector.create(ws, model.name, model.version, services, frequency=\"Day\", alert_config=alert_config)\n",
|
|
||||||
"except KeyError:\n",
|
|
||||||
" datadrift = DataDriftDetector.get(ws, model.name, model.version)\n",
|
|
||||||
" \n",
|
|
||||||
"print(\"Details of DataDrift Object:\\n{}\".format(datadrift))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Trigger an Ad Hoc DataDriftDetector Run"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"target_date = datetime.today()\n",
|
|
||||||
"run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"exp = Experiment(ws, datadrift._id)\n",
|
|
||||||
"dd_run = Run(experiment=exp, run_id=run)\n",
|
|
||||||
"RunDetails(dd_run).show()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Get Drift Analysis Results"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"children = list(dd_run.get_children())\n",
|
|
||||||
"for child in children:\n",
|
|
||||||
" child.wait_for_completion()\n",
|
|
||||||
"\n",
|
|
||||||
"drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
|
|
||||||
"drift_metrics"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Show all drift figures, one per service.\n",
|
|
||||||
"# If with_details is False (the default), only the drift measure is shown; if True, all details are shown.\n",
|
|
||||||
"\n",
|
|
||||||
"drift_figures = datadrift.show(with_details=True)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Enable DataDrift Schedule"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"datadrift.enable_schedule()"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "rafarmah"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.6"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||
@@ -1,3 +0,0 @@
|
|||||||
## Using data drift APIs
|
|
||||||
|
|
||||||
1. [Detect data drift for a model](azure-ml-datadrift.ipynb): Detect data drift for a deployed model.
|
|
||||||
@@ -1,58 +0,0 @@
|
|||||||
import pickle
|
|
||||||
import json
|
|
||||||
import numpy
|
|
||||||
import azureml.train.automl
|
|
||||||
from sklearn.externals import joblib
|
|
||||||
from sklearn.linear_model import Ridge
|
|
||||||
from azureml.core.model import Model
|
|
||||||
from azureml.core.run import Run
|
|
||||||
from azureml.monitoring import ModelDataCollector
|
|
||||||
import time
|
|
||||||
import pandas as pd
|
|
||||||
|
|
||||||
|
|
||||||
def init():
|
|
||||||
global model, inputs_dc, prediction_dc, feature_names, categorical_features
|
|
||||||
|
|
||||||
print("Model is initialized" + time.strftime("%H:%M:%S"))
|
|
||||||
model_path = Model.get_model_path(model_name="driftmodel")
|
|
||||||
model = joblib.load(model_path)
|
|
||||||
|
|
||||||
feature_names = ["usaf", "wban", "latitude", "longitude", "station_name", "p_k",
|
|
||||||
"sine_weekofyear", "cosine_weekofyear", "sine_hourofday", "cosine_hourofday",
|
|
||||||
"temperature-7"]
|
|
||||||
|
|
||||||
categorical_features = ["usaf", "wban", "p_k", "station_name"]
|
|
||||||
|
|
||||||
inputs_dc = ModelDataCollector(model_name="driftmodel",
|
|
||||||
identifier="inputs",
|
|
||||||
feature_names=feature_names)
|
|
||||||
|
|
||||||
prediction_dc = ModelDataCollector("driftmodel",
|
|
||||||
identifier="predictions",
|
|
||||||
feature_names=["temperature"])
|
|
||||||
|
|
||||||
|
|
||||||
def run(raw_data):
|
|
||||||
global inputs_dc, prediction_dc
|
|
||||||
|
|
||||||
try:
|
|
||||||
data = json.loads(raw_data)["data"]
|
|
||||||
data = pd.DataFrame(data)
|
|
||||||
|
|
||||||
# Remove the categorical features as the model expects OHE values
|
|
||||||
input_data = data.drop(categorical_features, axis=1)
|
|
||||||
|
|
||||||
result = model.predict(input_data)
|
|
||||||
|
|
||||||
# Collect the non-OHE dataframe
|
|
||||||
collected_df = data[feature_names]
|
|
||||||
|
|
||||||
inputs_dc.collect(collected_df.values)
|
|
||||||
prediction_dc.collect(result)
|
|
||||||
return result.tolist()
|
|
||||||
except Exception as e:
|
|
||||||
error = str(e)
|
|
||||||
|
|
||||||
print(error + time.strftime("%H:%M:%S"))
|
|
||||||
return error
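To see the request/response contract of the scoring script above in isolation, here is a hypothetical local smoke test with a stub model; `StubModel` and the single-row payload are invented for illustration and stand in for the registered lightgbm model:

```python
import json
import pandas as pd

feature_names = ["usaf", "temperature-7"]
categorical_features = ["usaf"]

class StubModel:
    # Stand-in for the real model: echoes the 7-day lag feature as the prediction
    def predict(self, df):
        return df["temperature-7"].values

model = StubModel()

# The service body is {"data": [...]}, one dict per row
raw_data = json.dumps({"data": [{"usaf": "725724", "temperature-7": 3.5}]})
data = pd.DataFrame(json.loads(raw_data)["data"])

# As in run(): drop raw categoricals before predicting, reply with a JSON-able list
result = model.predict(data.drop(categorical_features, axis=1))
response = result.tolist()
```

This mirrors why `run()` returns `result.tolist()`: numpy arrays are not JSON-serializable, plain lists are.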
|
|
||||||
@@ -1,30 +1,54 @@
|
|||||||
{
|
{
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"name": "python36",
|
||||||
|
"language": "python"
|
||||||
|
},
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "aashishb"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"language_info": {
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"name": "python",
|
||||||
|
"file_extension": ".py",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"version": "3.7.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Licensed under the MIT License."
|
"Licensed under the MIT License."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Register Model and deploy as Webservice\n",
|
"# Register Model and deploy as Webservice\n",
|
||||||
@@ -33,78 +57,78 @@
|
|||||||
"\n",
|
"\n",
|
||||||
" 1. Register Model\n",
|
" 1. Register Model\n",
|
||||||
" 2. Deploy Model as Webservice"
|
" 2. Deploy Model as Webservice"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites\n",
|
"## Prerequisites\n",
|
||||||
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Initialize Workspace\n",
|
"## Initialize Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Initialize a workspace object from persisted configuration."
|
"Initialize a workspace object from persisted configuration."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create workspace"
|
"create workspace"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Register Model"
|
"### Register Model"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can add tags and descriptions to your Models. Note you need to have a `sklearn_regression_model.pkl` file in the current directory. This file is generated by the 01 notebook. The below call registers that file as a Model with the same name `sklearn_regression_model.pkl` in the workspace.\n",
|
"You can add tags and descriptions to your Models. Note you need to have a `sklearn_regression_model.pkl` file in the current directory. This file is generated by the 01 notebook. The below call registers that file as a Model with the same name `sklearn_regression_model.pkl` in the workspace.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric."
|
"Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"register model from file"
|
"register model from file"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -113,10 +137,10 @@
|
|||||||
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
" tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
|
||||||
" description = \"Ridge regression model to predict diabetes\",\n",
|
" description = \"Ridge regression model to predict diabetes\",\n",
|
||||||
" workspace = ws)"
|
" workspace = ws)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Inference Configuration\n",
|
"## Create Inference Configuration\n",
|
||||||
@@ -139,17 +163,17 @@
|
|||||||
" - entry_script = contains logic specific to initializing your model and running predictions\n",
|
" - entry_script = contains logic specific to initializing your model and running predictions\n",
|
||||||
" - conda_file = manages conda and python package dependencies.\n",
|
" - conda_file = manages conda and python package dependencies.\n",
|
||||||
" - extra_docker_file_steps = optional: any extra steps you want to inject into docker file"
|
" - extra_docker_file_steps = optional: any extra steps you want to inject into docker file"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"create image"
|
"create image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import InferenceConfig\n",
|
"from azureml.core.model import InferenceConfig\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -157,22 +181,22 @@
|
|||||||
" entry_script=\"score.py\",\n",
|
" entry_script=\"score.py\",\n",
|
||||||
" conda_file=\"myenv.yml\", \n",
|
" conda_file=\"myenv.yml\", \n",
|
||||||
" extra_docker_file_steps=\"helloworld.txt\")"
|
" extra_docker_file_steps=\"helloworld.txt\")"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Deploy Model as Webservice on Azure Container Instance\n",
|
"### Deploy Model as Webservice on Azure Container Instance\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Note that the service creation can take few minutes."
|
"Note that the service creation can take few minutes."
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import AciWebservice, Webservice\n",
|
"from azureml.core.webservice import AciWebservice, Webservice\n",
|
||||||
"from azureml.exceptions import WebserviceException\n",
|
"from azureml.exceptions import WebserviceException\n",
|
||||||
@@ -194,20 +218,20 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"service.wait_for_deployment(True)\n",
|
"service.wait_for_deployment(True)\n",
|
||||||
"print(service.state)"
|
"print(service.state)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Test web service"
|
"#### Test web service"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"import json\n",
|
"import json\n",
|
||||||
"test_sample = json.dumps({'data': [\n",
|
"test_sample = json.dumps({'data': [\n",
|
||||||
@@ -218,18 +242,17 @@
|
|||||||
"test_sample_encoded = bytes(test_sample,encoding = 'utf8')\n",
|
"test_sample_encoded = bytes(test_sample,encoding = 'utf8')\n",
|
||||||
"prediction = service.run(input_data=test_sample_encoded)\n",
|
"prediction = service.run(input_data=test_sample_encoded)\n",
|
||||||
"print(prediction)"
|
"print(prediction)"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Delete ACI to clean up"
|
"#### Delete ACI to clean up"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"tags": [
|
"tags": [
|
||||||
"deploy service",
|
"deploy service",
|
||||||
@@ -237,12 +260,13 @@
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
"source": [
|
"source": [
|
||||||
"service.delete()"
|
"service.delete()"
|
||||||
]
|
],
|
||||||
|
"cell_type": "code"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Model Profiling\n",
|
"### Model Profiling\n",
|
||||||
@@ -257,33 +281,9 @@
|
|||||||
"print(profiling_results)\n",
|
"print(profiling_results)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"```"
|
"```"
|
||||||
]
|
],
|
||||||
|
"cell_type": "markdown"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "aashishb"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.7.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
}
|
}
|
||||||
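Every hunk in the diff above makes the same mechanical change: `cell_type` moves from the first key of each cell object to the last, `execution_count` moves below `outputs`, and the notebook-level `metadata` block moves from the bottom of the file to the top. Since JSON objects are order-insensitive, the notebook content is semantically unchanged. A minimal sketch demonstrating this, using an abbreviated cell shaped like the ones in the diff (the `source` text is a stand-in):

```python
import json

# Old serialization: "cell_type" is the first key (left column of the diff).
old_cell = json.loads(
    '{"cell_type": "markdown", "metadata": {}, "source": ["## Register Model"]}'
)

# New serialization: "cell_type" is the last key (right column of the diff).
new_cell = json.loads(
    '{"metadata": {}, "source": ["## Register Model"], "cell_type": "markdown"}'
)

# The key order differs, but the parsed objects are equal: the commit is a
# pure re-serialization, not a content change.
assert list(old_cell) != list(new_cell)
assert old_cell == new_cell
print("cells are semantically identical")
```

This is why tools like `nbformat` can rewrite a whole notebook without touching its behavior: only the serialization order changes.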
@@ -1,23 +1,47 @@
 {
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3.6",
+"name": "python36",
+"language": "python"
+},
+"authors": [
+{
+"name": "raymondl"
+}
+],
+"language_info": {
+"mimetype": "text/x-python",
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"pygments_lexer": "ipython3",
+"name": "python",
+"file_extension": ".py",
+"nbconvert_exporter": "python",
+"version": "3.7.0"
+}
+},
+"nbformat": 4,
 "cells": [
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
 "\n",
 "Licensed under the MIT License."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Register model and deploy locally with advanced usages\n",
@@ -28,78 +52,78 @@
 " 2. Deploy the image as a web service in a local Docker container.\n",
 " 3. Quickly test changes to your entry script by reloading the local service.\n",
 " 4. Optionally, you can also make changes to model, conda or extra_docker_file_steps and update local service"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Prerequisites\n",
 "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "# Check core SDK version number\n",
 "import azureml.core\n",
 "\n",
 "print(\"SDK version:\", azureml.core.VERSION)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Initialize Workspace\n",
 "\n",
 "Initialize a workspace object from persisted configuration."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {
 "tags": [
 "create workspace"
 ]
 },
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core import Workspace\n",
 "\n",
 "ws = Workspace.from_config()\n",
 "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Register Model"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "You can add tags and descriptions to your models. we are using `sklearn_regression_model.pkl` file in the current directory as a model with the same name `sklearn_regression_model.pkl` in the workspace.\n",
 "\n",
 "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model, framework, category, target customer etc. Note that tags must be alphanumeric."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {
 "tags": [
 "register model from file"
 ]
 },
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.model import Model\n",
 "\n",
@@ -108,20 +132,20 @@
 " tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
 " description = \"Ridge regression model to predict diabetes\",\n",
 " workspace = ws)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Manage your dependencies in a folder"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import os\n",
 "\n",
@@ -131,20 +155,20 @@
 "os.makedirs(\"C:/abc/x/y\", exist_ok = True)\n",
 "os.makedirs(\"C:/abc/env\", exist_ok = True)\n",
 "os.makedirs(\"C:/abc/dockerstep\", exist_ok = True)"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Show `score.py`. Note that the `sklearn_regression_model.pkl` in the `get_model_path` call is referring to a model named `sklearn_regression_model.pkl` registered under the workspace. It is NOT referencing the local file."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile C:/abc/x/y/score.py\n",
 "import pickle\n",
@@ -184,13 +208,13 @@
 " except Exception as e:\n",
 " error = str(e)\n",
 " return error"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile C:/abc/env/myenv.yml\n",
 "name: project_environment\n",
@@ -201,23 +225,23 @@
 " - scikit-learn\n",
 " - numpy\n",
 " - inference-schema[numpy-support]"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile C:/abc/dockerstep/customDockerStep.txt\n",
 "RUN echo \"this is test\""
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile C:/abc/extradata.json\n",
 "{\n",
@@ -229,10 +253,10 @@
 " }\n",
 " ]\n",
 "}"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Create Inference Configuration\n",
@@ -242,13 +266,13 @@
 " - entry_script = contains logic specific to initializing your model and running predictions\n",
 " - conda_file = manages conda and python package dependencies.\n",
 " - extra_docker_file_steps = optional: any extra steps you want to inject into docker file"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.model import InferenceConfig\n",
 "\n",
@@ -257,10 +281,10 @@
 " entry_script=\"x/y/score.py\",\n",
 " conda_file=\"env/myenv.yml\", \n",
 " extra_docker_file_steps=\"dockerstep/customDockerStep.txt\")"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Deploy Model as a Local Docker Web Service\n",
@@ -280,11 +304,10 @@
 "sometimes you have to reshare c drive as docker \n",
 "\n",
 "<img src=\"./dockerSharedDrive.JPG\" align=\"left\"/>"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {
 "tags": [
 "deploy service",
@@ -292,6 +315,7 @@
 ]
 },
 "outputs": [],
+"execution_count": null,
 "source": [
 "from azureml.core.webservice import LocalWebservice\n",
 "\n",
@@ -301,52 +325,52 @@
 "local_service = Model.deploy(ws, \"test\", [model], inference_config, deployment_config)\n",
 "\n",
 "local_service.wait_for_deployment()"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "print('Local service port: {}'.format(local_service.port))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Check Status and Get Container Logs\n"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "print(local_service.get_logs())"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Test Web Service"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "Call the web service with some input data to get a prediction."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "import json\n",
 "\n",
@@ -360,22 +384,22 @@
 "sample_input = bytes(sample_input, encoding='utf-8')\n",
 "\n",
 "print(local_service.run(input_data=sample_input))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Reload Service\n",
 "\n",
 "You can update your score.py file and then call `reload()` to quickly restart the service. This will only reload your execution script and dependency files, it will not rebuild the underlying Docker image. As a result, `reload()` is fast, but if you do need to rebuild the image -- to add a new Conda or pip package, for instance -- you will have to call `update()`, instead (see below)."
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "%%writefile C:/abc/x/y/score.py\n",
 "import pickle\n",
@@ -416,13 +440,13 @@
 " except Exception as e:\n",
 " error = str(e)\n",
 " return error"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_service.reload()\n",
 "print(\"--------------------------------------------------------------\")\n",
@@ -430,10 +454,10 @@
 "# after reload now if you call run this will return updated return message\n",
 "\n",
 "print(local_service.run(input_data=sample_input))"
-]
+],
+"cell_type": "code"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Update Service\n",
@@ -446,49 +470,25 @@
 " deployment_config = local_config,\n",
 " inference_config = inference_config)\n",
 "```"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Delete Service"
-]
+],
+"cell_type": "markdown"
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
 "outputs": [],
+"execution_count": null,
 "source": [
 "local_service.delete()"
-]
+],
+"cell_type": "code"
 }
 ],
-"metadata": {
-"authors": [
-{
-"name": "raymondl"
-}
-],
-"kernelspec": {
-"display_name": "Python 3.6",
-"language": "python",
-"name": "python36"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.0"
-}
-},
-"nbformat": 4,
 "nbformat_minor": 2
 }
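The local-deployment notebook above builds its request payload the same way the ACI notebook does: a JSON document with a `data` key, encoded to UTF-8 bytes before being passed to `run()`. A runnable sketch of that round trip (the two 10-feature rows are made-up stand-ins for real diabetes records, and the decoding step mirrors what the notebook's `score.py` does with `json.loads`):

```python
import json

# Hypothetical payload in the shape the notebooks' entry script expects:
# a dict with a "data" key holding feature rows (values are stand-ins).
sample = {"data": [[1.0] * 10, [2.0] * 10]}

# The notebooks encode the payload to UTF-8 bytes before calling run().
sample_input = bytes(json.dumps(sample), encoding="utf-8")

# Inside score.py, run() receives the raw payload and json.loads() restores it.
decoded = json.loads(sample_input)
assert decoded == sample
print("rows:", len(decoded["data"]))  # → rows: 2
```

Because both sides agree on this JSON-over-bytes contract, the same payload works unchanged against the local Docker service and the ACI service.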
@@ -1,23 +1,47 @@
 {
+    "metadata": {
+        "kernelspec": {
+            "display_name": "Python 3.6",
+            "name": "python36",
+            "language": "python"
+        },
+        "authors": [
+            {
+                "name": "raymondl"
+            }
+        ],
+        "language_info": {
+            "mimetype": "text/x-python",
+            "codemirror_mode": {
+                "name": "ipython",
+                "version": 3
+            },
+            "pygments_lexer": "ipython3",
+            "name": "python",
+            "file_extension": ".py",
+            "nbconvert_exporter": "python",
+            "version": "3.7.0"
+        }
+    },
+    "nbformat": 4,
     "cells": [
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "Copyright (c) Microsoft Corporation. All rights reserved.\n",
                 "\n",
                 "Licensed under the MIT License."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 ""
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "# Register model and deploy locally\n",
@@ -28,74 +52,74 @@
                 " 2. Deploy the image as a web service in a local Docker container.\n",
                 " 3. Quickly test changes to your entry script by reloading the local service.\n",
                 " 4. Optionally, you can also make changes to the model, conda file, or extra_docker_file_steps and update the local service."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Prerequisites\n",
                 "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "# Check core SDK version number\n",
                 "import azureml.core\n",
                 "\n",
                 "print(\"SDK version:\", azureml.core.VERSION)"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Initialize Workspace\n",
                 "\n",
                 "Initialize a workspace object from persisted configuration."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "from azureml.core import Workspace\n",
                 "\n",
                 "ws = Workspace.from_config()\n",
                 "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Register Model"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "You can add tags and descriptions to your models. We are using the `sklearn_regression_model.pkl` file in the current directory as a model with the same name `sklearn_regression_model.pkl` in the workspace.\n",
                 "\n",
                 "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model, framework, category, target customer etc. Note that tags must be alphanumeric."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {
                 "tags": [
                     "register model from file"
                 ]
             },
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "from azureml.core.model import Model\n",
                 "\n",
@@ -104,30 +128,30 @@
                 "                       tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
                 "                       description = \"Ridge regression model to predict diabetes\",\n",
                 "                       workspace = ws)"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Create Inference Configuration"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "from azureml.core.model import InferenceConfig\n",
                 "\n",
                 "inference_config = InferenceConfig(runtime= \"python\", \n",
                 "                                   entry_script=\"score.py\",\n",
                 "                                   conda_file=\"myenv.yml\")"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Deploy Model as a Local Docker Web Service\n",
@@ -147,13 +171,13 @@
                 "Sometimes you have to re-share the C drive in Docker.\n",
                 "\n",
                 "<img src=\"./dockerSharedDrive.JPG\" align=\"left\"/>"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "from azureml.core.webservice import LocalWebservice\n",
                 "\n",
@@ -163,52 +187,52 @@
                 "local_service = Model.deploy(ws, \"test\", [model], inference_config, deployment_config)\n",
                 "\n",
                 "local_service.wait_for_deployment()"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "print('Local service port: {}'.format(local_service.port))"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Check Status and Get Container Logs\n"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "print(local_service.get_logs())"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Test Web Service"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "Call the web service with some input data to get a prediction."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "import json\n",
                 "\n",
@@ -222,22 +246,22 @@
                 "sample_input = bytes(sample_input, encoding='utf-8')\n",
                 "\n",
                 "print(local_service.run(input_data=sample_input))"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Reload Service\n",
                 "\n",
                 "You can update your score.py file and then call `reload()` to quickly restart the service. This will only reload your execution script and dependency files, it will not rebuild the underlying Docker image. As a result, `reload()` is fast, but if you do need to rebuild the image -- to add a new Conda or pip package, for instance -- you will have to call `update()`, instead (see below)."
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "%%writefile score.py\n",
                 "import pickle\n",
@@ -271,13 +295,13 @@
                 "    except Exception as e:\n",
                 "        error = str(e)\n",
                 "        return error"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "local_service.reload()\n",
                 "print(\"--------------------------------------------------------------\")\n",
@@ -285,10 +309,10 @@
                 "# after reload now if you call run this will return updated return message\n",
                 "\n",
                 "print(local_service.run(input_data=sample_input))"
-            ]
+            ],
+            "cell_type": "code"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Update Service\n",
@@ -301,49 +325,25 @@
                 "                       deployment_config = local_config,\n",
                 "                       inference_config = inference_config)\n",
                 "```"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "markdown",
             "metadata": {},
             "source": [
                 "## Delete Service"
-            ]
+            ],
+            "cell_type": "markdown"
         },
         {
-            "cell_type": "code",
-            "execution_count": null,
             "metadata": {},
             "outputs": [],
+            "execution_count": null,
             "source": [
                 "local_service.delete()"
-            ]
+            ],
+            "cell_type": "code"
         }
     ],
-    "metadata": {
-        "authors": [
-            {
-                "name": "raymondl"
-            }
-        ],
-        "kernelspec": {
-            "display_name": "Python 3.6",
-            "language": "python",
-            "name": "python36"
-        },
-        "language_info": {
-            "codemirror_mode": {
-                "name": "ipython",
-                "version": 3
-            },
-            "file_extension": ".py",
-            "mimetype": "text/x-python",
-            "name": "python",
-            "nbconvert_exporter": "python",
-            "pygments_lexer": "ipython3",
-            "version": "3.7.0"
-        }
-    },
-    "nbformat": 4,
     "nbformat_minor": 2
 }
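The test cell in the notebook above builds its request payload as UTF-8-encoded JSON bytes before calling `local_service.run`. A minimal standalone sketch of that pattern (the feature values here are placeholders, not real model input):

```python
import json

# Build the request payload the way the notebook's test cell does:
# serialize to JSON text, then encode as UTF-8 bytes.
sample_input = json.dumps({"data": [[1.0, 2.0, 3.0]]})  # placeholder feature row
sample_input = bytes(sample_input, encoding="utf-8")

print(type(sample_input).__name__)
print(sample_input.decode("utf-8"))
```

The web service expects raw bytes, which is why the JSON string is explicitly encoded rather than passed as `str`.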
@@ -1,102 +0,0 @@
# Notebooks for Microsoft Azure Machine Learning Hardware Accelerated Models SDK

Easily create and train a model using various deep neural networks (DNNs) as a featurizer for deployment to Azure or a Data Box Edge device for ultra-low-latency inferencing using FPGAs. These models are currently available:

* ResNet 50
* ResNet 152
* DenseNet-121
* VGG-16
* SSD-VGG

To learn more about the azureml-accel-model classes, see the section [Model Classes](#model-classes) below or the [Azure ML Accel Models SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel?view=azure-ml-py).

### Step 1: Create an Azure ML workspace
Follow [these instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python) to install the Azure ML SDK on your local machine, create an Azure ML workspace, and set up your notebook environment, which is required for the next step.

### Step 2: Check your FPGA quota
Use the Azure CLI to check whether you have quota.

```shell
az vm list-usage --location "eastus" -o table
```

The other locations are ``southeastasia``, ``westeurope``, and ``westus2``.

Under the "Name" column, look for "Standard PBS Family vCPUs" and ensure you have at least 6 vCPUs of quota.

If you do not have quota, then submit a request form [here](https://aka.ms/accelerateAI).
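As a sketch of what to look for in that command's output, the helper below scans table text for the PBS quota row. The exact column layout of `az vm list-usage -o table` is an assumption here, and the function name is ours, not part of any SDK:

```python
def pbs_quota_ok(table_text, needed=6):
    """Scan `az vm list-usage -o table` output for the FPGA quota row.

    Returns True when the "Standard PBS Family vCPUs" quota limit covers
    at least `needed` vCPUs. Assumes the limit is the last column.
    """
    for line in table_text.splitlines():
        if "Standard PBS Family vCPUs" in line:
            limit = int(line.split()[-1])  # last column is the quota limit
            return limit >= needed
    return False  # no PBS row: no FPGA quota in this region


# Hypothetical table fragment mimicking the CLI output shape
sample = """\
Name                         CurrentValue    Limit
-------------------------  --------------  -------
Standard PBS Family vCPUs               0        6
"""
print(pbs_quota_ok(sample))
```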

### Step 3: Install the Azure ML Accelerated Models SDK
Once you have set up your environment, install the Azure ML Accel Models SDK. This package requires tensorflow >= 1.6,<2.0 to be installed.

If you already have tensorflow >= 1.6,<2.0 installed in your development environment, you can install the SDK package using:

```
pip install azureml-accel-models
```

If you do not have tensorflow >= 1.6,<2.0 and are using a CPU-only development environment, our SDK with tensorflow can be installed using:

```
pip install azureml-accel-models[cpu]
```

If your machine supports GPU (for example, on an [Azure DSVM](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview)), then you can leverage the tensorflow-gpu functionality using:

```
pip install azureml-accel-models[gpu]
```
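The `>= 1.6,<2.0` constraint can be checked ahead of time against an installed version string (e.g. `tensorflow.__version__`). A minimal sketch; the helper is our own, not part of the SDK:

```python
def tf_version_supported(version: str) -> bool:
    """True when `version` satisfies the SDK's tensorflow >=1.6,<2.0 range."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 6) and major < 2

for v in ("1.5.0", "1.13.1", "2.0.0"):
    print(v, tf_version_supported(v))
```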

### Step 4: Follow our notebooks

The notebooks in this repo walk through the following scenarios:
* [Quickstart](accelerated-models-quickstart.ipynb), deploy and inference a ResNet50 model trained on ImageNet
* [Object Detection](accelerated-models-object-detection.ipynb), deploy and inference an SSD-VGG model that can do object detection
* [Training models](accelerated-models-training.ipynb), train one of our accelerated models on the Kaggle Cats and Dogs dataset to see how to improve accuracy on custom datasets

<a name="model-classes"></a>
## Model Classes
As stated above, we support 5 Accelerated Models. Here's more information on their input and output tensors.

**Available models and output tensors**

The available models and the corresponding default classifier output tensors are below. This is the value that you would use during inferencing if you used the default classifier.
* Resnet50, QuantizedResnet50
``
output_tensors = "classifier_1/resnet_v1_50/predictions/Softmax:0"
``
* Resnet152, QuantizedResnet152
``
output_tensors = "classifier/resnet_v1_152/predictions/Softmax:0"
``
* Densenet121, QuantizedDensenet121
``
output_tensors = "classifier/densenet121/predictions/Softmax:0"
``
* Vgg16, QuantizedVgg16
``
output_tensors = "classifier/vgg_16/fc8/squeezed:0"
``
* SsdVgg, QuantizedSsdVgg
``
output_tensors = ['ssd_300_vgg/block4_box/Reshape_1:0', 'ssd_300_vgg/block7_box/Reshape_1:0', 'ssd_300_vgg/block8_box/Reshape_1:0', 'ssd_300_vgg/block9_box/Reshape_1:0', 'ssd_300_vgg/block10_box/Reshape_1:0', 'ssd_300_vgg/block11_box/Reshape_1:0', 'ssd_300_vgg/block4_box/Reshape:0', 'ssd_300_vgg/block7_box/Reshape:0', 'ssd_300_vgg/block8_box/Reshape:0', 'ssd_300_vgg/block9_box/Reshape:0', 'ssd_300_vgg/block10_box/Reshape:0', 'ssd_300_vgg/block11_box/Reshape:0']
``

For more information, please reference the azureml.accel.models package in the [Azure ML Python SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.models?view=azure-ml-py).
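The classifier defaults above can be collected into a small lookup table for use when building an inferencing client. The dict itself is our own convenience, not an SDK API; the tensor names are copied from the list above (SSD-VGG is omitted because its default is the 12-tensor list rather than a single name):

```python
# Default classifier output tensors for the single-output model classes above.
DEFAULT_OUTPUT_TENSORS = {
    "Resnet50": "classifier_1/resnet_v1_50/predictions/Softmax:0",
    "Resnet152": "classifier/resnet_v1_152/predictions/Softmax:0",
    "Densenet121": "classifier/densenet121/predictions/Softmax:0",
    "Vgg16": "classifier/vgg_16/fc8/squeezed:0",
}

print(DEFAULT_OUTPUT_TENSORS["Resnet50"])
```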

**Input tensors**

The input_tensors value defaults to "Placeholder:0" and is created in the [Image Preprocessing](#construct-model) step in the line:
``
in_images = tf.placeholder(tf.string)
``

You can change the input_tensors name by doing this:
``
in_images = tf.placeholder(tf.string, name="images")
``

## Resources
* [Read more about FPGAs](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-accelerate-with-fpgas)
@@ -1,494 +0,0 @@
|
|||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
||||||
"\n",
|
|
||||||
"Licensed under the MIT License."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Azure ML Hardware Accelerated Object Detection"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"This tutorial will show you how to deploy an object detection service based on the SSD-VGG model in just a few minutes using the Azure Machine Learning Accelerated AI service.\n",
|
|
||||||
"\n",
|
|
||||||
"We will use the SSD-VGG model accelerated on an FPGA. Our Accelerated Models Service handles translating deep neural networks (DNN) into an FPGA program.\n",
|
|
||||||
"\n",
|
|
||||||
"The steps in this notebook are: \n",
|
|
||||||
"1. [Setup Environment](#set-up-environment)\n",
|
|
||||||
"* [Construct Model](#construct-model)\n",
|
|
||||||
" * Image Preprocessing\n",
|
|
||||||
" * Featurizer\n",
|
|
||||||
" * Save Model\n",
|
|
||||||
" * Save input and output tensor names\n",
|
|
||||||
"* [Create Image](#create-image)\n",
|
|
||||||
"* [Deploy Image](#deploy-image)\n",
|
|
||||||
"* [Test the Service](#test-service)\n",
|
|
||||||
" * Create Client\n",
|
|
||||||
" * Serve the model\n",
|
|
||||||
"* [Cleanup](#cleanup)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"<a id=\"set-up-environment\"></a>\n",
|
|
||||||
"## 1. Set up Environment\n",
|
|
||||||
"### 1.a. Imports"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import os\n",
|
|
||||||
"import tensorflow as tf"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### 1.b. Retrieve Workspace\n",
|
|
||||||
"If you haven't created a Workspace, please follow [this notebook](\"../../../configuration.ipynb\") to do so. If you have, run the codeblock below to retrieve it. "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.core import Workspace\n",
|
|
||||||
"\n",
|
|
||||||
"ws = Workspace.from_config()\n",
|
|
||||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"<a id=\"construct-model\"></a>\n",
|
|
||||||
"## 2. Construct model\n",
|
|
||||||
"### 2.a. Image preprocessing\n",
|
|
||||||
"We'd like our service to accept JPEG images as input. However the input to SSD-VGG is a float tensor of shape \\[1, 300, 300, 3\\]. The first dimension is batch, then height, width, and channels (i.e. NHWC). To bridge this gap, we need code that decodes JPEG images and resizes them appropriately for input to SSD-VGG. The Accelerated AI service can execute TensorFlow graphs as part of the service and we'll use that ability to do the image preprocessing. This code defines a TensorFlow graph that preprocesses an array of JPEG images (as TensorFlow strings) and produces a tensor that is ready to be featurized by SSD-VGG.\n",
|
|
||||||
"\n",
|
|
||||||
"**Note:** Expect to see TF deprecation warnings until we port our SDK over to use Tensorflow 2.0."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Input images as a two-dimensional tensor containing an arbitrary number of images represented a strings\n",
|
|
||||||
"import azureml.accel.models.utils as utils\n",
|
|
||||||
"tf.reset_default_graph()\n",
|
|
||||||
"\n",
|
|
||||||
"in_images = tf.placeholder(tf.string)\n",
|
|
||||||
"image_tensors = utils.preprocess_array(in_images, output_width=300, output_height=300, preserve_aspect_ratio=False)\n",
|
|
||||||
"print(image_tensors.shape)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### 2.b. Featurizer\n",
|
|
||||||
"The SSD-VGG model is different from our other models in that it generates 12 tensor outputs. These corresponds to x,y displacements of the anchor boxes and the detection confidence (for 21 classes). Because these outputs are not convenient to work with, we will later use a pre-defined post-processing utility to transform the outputs into a simplified list of bounding boxes with their respective class and confidence.\n",
|
|
||||||
"\n",
|
|
||||||
"For more information about the output tensors, take this example: the output tensor 'ssd_300_vgg/block4_box/Reshape_1:0' has a shape of [None, 37, 37, 4, 21]. This gives the pre-softmax confidence for 4 anchor boxes situated at each site of a 37 x 37 grid imposed on the image, one confidence score for each of the 21 classes. The first dimension is the batch dimension. Likewise, 'ssd_300_vgg/block4_box/Reshape:0' has shape [None, 37, 37, 4, 4] and encodes the (cx, cy) center shift and rescaling (sw, sh) relative to each anchor box. Refer to the [SSD-VGG paper](https://arxiv.org/abs/1512.02325) to understand how these are computed. The other 10 tensors are defined similarly."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from azureml.accel.models import SsdVgg\n",
|
|
||||||
"\n",
|
|
||||||
"saved_model_dir = os.path.join(os.path.expanduser('~'), 'models')\n",
|
|
||||||
"model_graph = SsdVgg(saved_model_dir, is_frozen = True)\n",
|
|
||||||
"\n",
|
|
||||||
"print('SSD-VGG Input Tensors:')\n",
|
|
||||||
"for idx, input_name in enumerate(model_graph.input_tensor_list):\n",
|
|
||||||
" print('{}, {}'.format(input_name, model_graph.get_input_dims(idx)))\n",
|
|
||||||
" \n",
|
|
||||||
"print('SSD-VGG Output Tensors:')\n",
|
|
||||||
"for idx, output_name in enumerate(model_graph.output_tensor_list):\n",
|
|
||||||
" print('{}, {}'.format(output_name, model_graph.get_output_dims(idx)))\n",
|
|
||||||
"\n",
|
|
||||||
"ssd_outputs = model_graph.import_graph_def(image_tensors, is_training=False)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### 2.c. Save Model\n",
|
|
||||||
"Now that we loaded both parts of the tensorflow graph (preprocessor and SSD-VGG featurizer), we can save the graph and associated variables to a directory which we can register as an Azure ML Model."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"model_name = \"ssdvgg\"\n",
|
|
||||||
"model_save_path = os.path.join(saved_model_dir, model_name, \"saved_model\")\n",
|
|
||||||
"print(\"Saving model in {}\".format(model_save_path))\n",
|
|
||||||
"\n",
|
|
||||||
"output_map = {}\n",
|
|
||||||
"for i, output in enumerate(ssd_outputs):\n",
|
|
||||||
" output_map['out_{}'.format(i)] = output\n",
|
|
||||||
"\n",
|
|
||||||
"with tf.Session() as sess:\n",
|
|
||||||
" model_graph.restore_weights(sess)\n",
|
|
||||||
" tf.saved_model.simple_save(sess, \n",
|
|
||||||
" model_save_path, \n",
|
|
||||||
" inputs={'images': in_images}, \n",
|
|
||||||
" outputs=output_map)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### 2.d. Important! Save names of input and output tensors\n",
|
|
||||||
"\n",
|
|
||||||
"These input and output tensors that were created during the preprocessing and classifier steps are also going to be used when **converting the model** to an Accelerated Model that can run on FPGA's and for **making an inferencing request**. It is very important to save this information!"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"tags": [
|
|
||||||
"register model from file"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"input_tensors = in_images.name\n",
|
|
||||||
"# We will use the list of output tensors during inferencing\n",
|
|
||||||
"output_tensors = [output.name for output in ssd_outputs]\n",
|
|
||||||
"# However, for multiple output tensors, our AccelOnnxConverter will \n",
|
|
||||||
"# accept comma-delimited strings (lists will cause error)\n",
|
|
||||||
"output_tensors_str = \",\".join(output_tensors)\n",
|
|
||||||
"\n",
|
|
||||||
"print(input_tensors)\n",
|
|
||||||
"print(output_tensors)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"<a id=\"create-image\"></a>\n",
|
|
||||||
"## 3. Create AccelContainerImage\n",
|
|
||||||
"Below we will execute all the same steps as in the [Quickstart](./accelerated-models-quickstart.ipynb#create-image) to package the model we have saved locally into an accelerated Docker image saved in our workspace. To complete all the steps, it may take a few minutes. For more details on each step, check out the [Quickstart section on model registration](./accelerated-models-quickstart.ipynb#register-model)."
|
|
||||||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core.model import Model\n",
"from azureml.core.image import Image\n",
"from azureml.accel import AccelOnnxConverter\n",
"from azureml.accel import AccelContainerImage\n",
"\n",
"# Retrieve workspace\n",
"ws = Workspace.from_config()\n",
"print(\"Successfully retrieved workspace:\", ws.name, ws.resource_group, ws.location, ws.subscription_id, '\\n')\n",
"\n",
"# Register model\n",
"registered_model = Model.register(workspace = ws,\n",
"                                  model_path = model_save_path,\n",
"                                  model_name = model_name)\n",
"print(\"Successfully registered: \", registered_model.name, registered_model.description, registered_model.version, '\\n', sep = '\\t')\n",
"\n",
"# Convert model\n",
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors_str)\n",
"# If it fails, you can run wait_for_completion again with show_output=True.\n",
"convert_request.wait_for_completion(show_output=False)\n",
"converted_model = convert_request.result\n",
"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
"      converted_model.id, converted_model.created_time, '\\n')\n",
"\n",
"# Package into AccelContainerImage\n",
"image_config = AccelContainerImage.image_configuration()\n",
"# Image name must be lowercase\n",
"image_name = \"{}-image\".format(model_name)\n",
"image = Image.create(name = image_name,\n",
"                     models = [converted_model],\n",
"                     image_config = image_config, \n",
"                     workspace = ws)\n",
"image.wait_for_creation()\n",
"print(\"Created AccelContainerImage: {} {} {}\\n\".format(image.name, image.creation_state, image.image_location))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"deploy-image\"></a>\n",
"## 4. Deploy image\n",
"Once you have an Azure ML Accelerated Image in your Workspace, you can deploy it to one of two destinations: a Databox Edge machine or an AKS cluster. \n",
"\n",
"### 4.a. Deploy to Databox Edge Machine using IoT Hub\n",
"See the sample [here](https://github.com/Azure-Samples/aml-real-time-ai/) for using the Azure IoT CLI extension for deploying your Docker image to your Databox Edge Machine.\n",
"\n",
"### 4.b. Deploy to AKS Cluster\n",
"As in the [Quickstart section on image deployment](./accelerated-models-quickstart.ipynb#deploy-image), we are going to create an AKS cluster with FPGA-enabled machines, then deploy our service to it.\n",
"#### Create AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Uses the specific FPGA enabled VM (sku: Standard_PB6s)\n",
"# Standard_PB6s are available in: eastus, westus2, westeurope, southeastasia\n",
"prov_config = AksCompute.provisioning_configuration(vm_size = \"Standard_PB6s\",\n",
"                                                    agent_count = 1, \n",
"                                                    location = \"eastus\")\n",
"\n",
"aks_name = 'aks-pb6-obj'\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
"                                  name = aks_name, \n",
"                                  provisioning_configuration = prov_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Provisioning an AKS cluster might take a while (15 or so minutes), and we want to wait until it's successfully provisioned before we can deploy a service to it. If you interrupt this cell, provisioning of the cluster will continue. You can re-run it or check the status in your Workspace under Compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy AccelContainerImage to AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice, AksWebservice\n",
"\n",
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
"# Authentication is enabled by default, but for testing we specify False\n",
"aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
"                                                num_replicas=1,\n",
"                                                auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
"                                           name = aks_service_name,\n",
"                                           image = image,\n",
"                                           deployment_config = aks_config,\n",
"                                           deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-service\"></a>\n",
"## 5. Test the service\n",
"<a id=\"create-client\"></a>\n",
"### 5.a. Create Client\n",
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the docker image to get predictions. \n",
"\n",
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
"\n",
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using the grpc client in AzureML Accelerated Models SDK\n",
"from azureml.accel.client import PredictionClient\n",
"\n",
"address = aks_service.scoring_uri\n",
"ssl_enabled = address.startswith(\"https\")\n",
"address = address[address.find('/')+2:].strip('/')\n",
"port = 443 if ssl_enabled else 80\n",
"\n",
"# Initialize AzureML Accelerated Models client\n",
"client = PredictionClient(address=address,\n",
"                          port=port,\n",
"                          use_ssl=ssl_enabled,\n",
"                          service_name=aks_service.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can adapt the client [code](https://github.com/Azure/aml-real-time-ai/blob/master/pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](https://github.com/Azure/aml-real-time-ai/blob/master/sample-clients/csharp).\n",
"\n",
"The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"serve-model\"></a>\n",
"### 5.b. Serve the model\n",
"The SSD-VGG model returns the confidence and bounding boxes for all possible anchor boxes. As mentioned earlier, we will use a post-processing routine to transform this into a list of bounding boxes (y1, x1, y2, x2) where x, y are fractional coordinates measured from left and top respectively. A respective list of classes and scores is also returned to tag each bounding box. Below we make use of this information to draw the bounding boxes on top of the original image. Note that in the post-processing routine we select a confidence threshold of 0.5."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cv2\n",
"from matplotlib import pyplot as plt\n",
"\n",
"colors_tableau = [(255, 255, 255), (31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),\n",
"                  (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),\n",
"                  (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),\n",
"                  (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),\n",
"                  (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]\n",
"\n",
"\n",
"def draw_boxes_on_img(img, classes, scores, bboxes, thickness=2):\n",
"    shape = img.shape\n",
"    for i in range(bboxes.shape[0]):\n",
"        bbox = bboxes[i]\n",
"        color = colors_tableau[classes[i]]\n",
"        # Draw bounding box...\n",
"        p1 = (int(bbox[0] * shape[0]), int(bbox[1] * shape[1]))\n",
"        p2 = (int(bbox[2] * shape[0]), int(bbox[3] * shape[1]))\n",
"        cv2.rectangle(img, p1[::-1], p2[::-1], color, thickness)\n",
"        # Draw text...\n",
"        s = '%s/%.3f' % (classes[i], scores[i])\n",
"        p1 = (p1[0]-5, p1[1])\n",
"        cv2.putText(img, s, p1[::-1], cv2.FONT_HERSHEY_DUPLEX, 0.4, color, 1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.accel._external.ssdvgg_utils as ssdvgg_utils\n",
"\n",
"result = client.score_file(path=\"meeting.jpg\", input_name=input_tensors, outputs=output_tensors)\n",
"classes, scores, bboxes = ssdvgg_utils.postprocess(result, select_threshold=0.5)\n",
"\n",
"img = cv2.imread('meeting.jpg', 1)\n",
"img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n",
"draw_boxes_on_img(img, classes, scores, bboxes)\n",
"plt.imshow(img)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"cleanup\"></a>\n",
"## 6. Cleanup\n",
"It's important to clean up your resources, so that you won't incur unnecessary costs. In the [next notebook](./accelerated-models-training.ipynb) you will learn how to train a classifier on a new dataset using transfer learning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.delete()\n",
"aks_target.delete()\n",
"image.delete()\n",
"registered_model.delete()\n",
"converted_model.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "coverste"
},
{
"name": "paledger"
},
{
"name": "sukha"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -1,548 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure ML Hardware Accelerated Models Quickstart"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial will show you how to deploy an image recognition service based on the ResNet 50 classifier using the Azure Machine Learning Accelerated Models service. Get more information about our service from our [documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-accelerate-with-fpgas), [API reference](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel?view=azure-ml-py), or [forum](https://aka.ms/aml-forum).\n",
"\n",
"We will use an accelerated ResNet50 featurizer running on an FPGA. Our Accelerated Models Service handles translating deep neural networks (DNN) into an FPGA program.\n",
"\n",
"For more information about using other models besides Resnet50, see the [README](./README.md).\n",
"\n",
"The steps covered in this notebook are: \n",
"1. [Set up environment](#set-up-environment)\n",
"* [Construct model](#construct-model)\n",
"  * Image Preprocessing\n",
"  * Featurizer (Resnet50)\n",
"  * Classifier\n",
"  * Save Model\n",
"* [Register Model](#register-model)\n",
"* [Convert into Accelerated Model](#convert-model)\n",
"* [Create Image](#create-image)\n",
"* [Deploy](#deploy-image)\n",
"* [Test service](#test-service)\n",
"* [Clean-up](#clean-up)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"set-up-environment\"></a>\n",
"## 1. Set up environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve Workspace\n",
"If you haven't created a Workspace, please follow [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) to do so. If you have, run the code block below to retrieve it. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"construct-model\"></a>\n",
"## 2. Construct model\n",
"\n",
"There are three parts to the model we are deploying: pre-processing, featurizer with ResNet50, and classifier with ImageNet dataset. Then we will save this complete TensorFlow model graph locally before registering it to your Azure ML Workspace.\n",
"\n",
"### 2.a. Image preprocessing\n",
"We'd like our service to accept JPEG images as input. However, the input to ResNet50 is a tensor. So we need code that decodes JPEG images and does the preprocessing required by ResNet50. The Accelerated AI service can execute TensorFlow graphs as part of the service, and we'll use that ability to do the image preprocessing. This code defines a TensorFlow graph that preprocesses an array of JPEG images (as strings) and produces a tensor that is ready to be featurized by ResNet50.\n",
"\n",
"**Note:** Expect to see TF deprecation warnings until we port our SDK over to use TensorFlow 2.0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Input images as a two-dimensional tensor containing an arbitrary number of images represented as strings\n",
"import azureml.accel.models.utils as utils\n",
"tf.reset_default_graph()\n",
"\n",
"in_images = tf.placeholder(tf.string)\n",
"image_tensors = utils.preprocess_array(in_images)\n",
"print(image_tensors.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.b. Featurizer\n",
"We use ResNet50 as a featurizer. In this step we initialize the model. This downloads a TensorFlow checkpoint of the quantized ResNet50."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.accel.models import QuantizedResnet50\n",
"save_path = os.path.expanduser('~/models')\n",
"model_graph = QuantizedResnet50(save_path, is_frozen = True)\n",
"feature_tensor = model_graph.import_graph_def(image_tensors)\n",
"print(model_graph.version)\n",
"print(feature_tensor.name)\n",
"print(feature_tensor.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.c. Classifier\n",
"The model we downloaded includes a classifier which takes the output of the ResNet50 and identifies an image. This classifier is trained on the ImageNet dataset. We are going to use this classifier for our service. The next [notebook](./accelerated-models-training.ipynb) shows how to train a classifier for a different data set. The input to the classifier is a tensor matching the output of our ResNet50 featurizer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"classifier_output = model_graph.get_default_classifier(feature_tensor)\n",
"print(classifier_output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.d. Save Model\n",
"Now that we have loaded all three parts of the TensorFlow graph (preprocessor, ResNet50 featurizer, and the classifier), we can save the graph and associated variables to a directory, which we can register as an Azure ML Model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# model_name must be lowercase\n",
"model_name = \"resnet50\"\n",
"model_save_path = os.path.join(save_path, model_name)\n",
"print(\"Saving model in {}\".format(model_save_path))\n",
"\n",
"with tf.Session() as sess:\n",
"    model_graph.restore_weights(sess)\n",
"    tf.saved_model.simple_save(sess, model_save_path,\n",
"                               inputs={'images': in_images},\n",
"                               outputs={'output_alias': classifier_output})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.e. Important! Save names of input and output tensors\n",
"\n",
"The input and output tensors created during the preprocessing and classifier steps will also be used when **converting the model** to an Accelerated Model that can run on FPGAs and when **making an inferencing request**. It is very important to save this information! You can see our defaults for all the models in the [README](./README.md).\n",
"\n",
"By default for Resnet50, these are the values you should see when running the cell below: \n",
"* input_tensors = \"Placeholder:0\"\n",
"* output_tensors = \"classifier/resnet_v1_50/predictions/Softmax:0\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"input_tensors = in_images.name\n",
"output_tensors = classifier_output.name\n",
"\n",
"print(input_tensors)\n",
"print(output_tensors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"register-model\"></a>\n",
"## 3. Register Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can add tags and descriptions to your models. Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"registered_model = Model.register(workspace = ws,\n",
"                                  model_path = model_save_path,\n",
"                                  model_name = model_name)\n",
"\n",
"print(\"Successfully registered: \", registered_model.name, registered_model.description, registered_model.version, sep = '\\t')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"convert-model\"></a>\n",
"## 4. Convert Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For conversion you need to provide the names of the input and output tensors. This information can be found in the model_graph you saved in step 2.e. above.\n",
"\n",
"**Note**: Conversion may take a while; for FPGA models it averages about 1-3 minutes, depending on the model type."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"outputs": [],
"source": [
"from azureml.accel import AccelOnnxConverter\n",
"\n",
"convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors)\n",
"# If it fails, you can run wait_for_completion again with show_output=True.\n",
"convert_request.wait_for_completion(show_output = False)\n",
"# If the above call succeeded, get the converted model\n",
"converted_model = convert_request.result\n",
"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
"      converted_model.id, converted_model.created_time, '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"create-image\"></a>\n",
"## 5. Package the model into an Image"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can add tags and descriptions to the image. Note that for FPGA models, an image can only contain a **single** model.\n",
"\n",
"**Note**: The following command can take a few minutes. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import Image\n",
"from azureml.accel import AccelContainerImage\n",
"\n",
"image_config = AccelContainerImage.image_configuration()\n",
"# Image name must be lowercase\n",
"image_name = \"{}-image\".format(model_name)\n",
"\n",
"image = Image.create(name = image_name,\n",
"                     models = [converted_model],\n",
"                     image_config = image_config, \n",
"                     workspace = ws)\n",
"image.wait_for_creation(show_output = False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"deploy-image\"></a>\n",
"## 6. Deploy\n",
"Once you have an Azure ML Accelerated Image in your Workspace, you can deploy it to one of two destinations: a Databox Edge machine or an AKS cluster. \n",
"\n",
"### 6.a. Databox Edge Machine using IoT Hub\n",
"See the sample [here](https://github.com/Azure-Samples/aml-real-time-ai/) for using the Azure IoT CLI extension for deploying your Docker image to your Databox Edge Machine.\n",
"\n",
"### 6.b. Azure Kubernetes Service (AKS) using Azure ML Service\n",
"We are going to create an AKS cluster with FPGA-enabled machines, then deploy our service to it. For more information, see [AKS official docs](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#aks).\n",
"\n",
"#### Create AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Uses the specific FPGA enabled VM (sku: Standard_PB6s)\n",
"# Standard_PB6s are available in: eastus, westus2, westeurope, southeastasia\n",
"prov_config = AksCompute.provisioning_configuration(vm_size = \"Standard_PB6s\",\n",
"                                                    agent_count = 1, \n",
"                                                    location = \"eastus\")\n",
"\n",
"aks_name = 'my-aks-pb6'\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
"                                  name = aks_name, \n",
"                                  provisioning_configuration = prov_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Provisioning an AKS cluster might take a while (15 or so minutes), and we want to wait until it's successfully provisioned before we can deploy a service to it. If you interrupt this cell, provisioning of the cluster will continue. You can also check the status in your Workspace under Compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy AccelContainerImage to AKS ComputeTarget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice, AksWebservice\n",
"\n",
"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
"# Authentication is enabled by default, but for testing we specify False\n",
"aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
"                                                num_replicas=1,\n",
"                                                auth_enabled = False)\n",
"\n",
"aks_service_name ='my-aks-service'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n",
"                                           name = aks_service_name,\n",
"                                           image = image,\n",
"                                           deployment_config = aks_config,\n",
"                                           deployment_target = aks_target)\n",
"aks_service.wait_for_deployment(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"test-service\"></a>\n",
"## 7. Test the service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7.a. Create Client\n",
"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the docker image to get predictions.\n",
"\n",
"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice, see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
"\n",
"**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using the grpc client in AzureML Accelerated Models SDK\n",
"from azureml.accel.client import PredictionClient\n",
"\n",
"address = aks_service.scoring_uri\n",
"ssl_enabled = address.startswith(\"https\")\n",
"address = address[address.find('/')+2:].strip('/')\n",
"port = 443 if ssl_enabled else 80\n",
"\n",
"# Initialize AzureML Accelerated Models client\n",
"client = PredictionClient(address=address,\n",
"                          port=port,\n",
"                          use_ssl=ssl_enabled,\n",
"                          service_name=aks_service.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can adapt the client [code](https://github.com/Azure/aml-real-time-ai/blob/master/pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](https://github.com/Azure/aml-real-time-ai/blob/master/sample-clients/csharp).\n",
"\n",
"The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7.b. Serve the model\n",
"To understand the results, we need a mapping to the human-readable ImageNet classes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"classes_entries = requests.get(\"https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt\").text.splitlines()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Score image with input and output tensor names\n",
"results = client.score_file(path=\"./snowleopardgaze.jpg\", \n",
"                            input_name=input_tensors, \n",
"                            outputs=output_tensors)\n",
"\n",
"# map results [class_id] => [confidence]\n",
"results = enumerate(results)\n",
"# sort results by confidence\n",
"sorted_results = sorted(results, key=lambda x: x[1], reverse=True)\n",
"# print top 5 results\n",
"for top in sorted_results[:5]:\n",
"    print(classes_entries[top[0]], 'confidence:', top[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"clean-up\"></a>\n",
"## 8. Clean-up\n",
"Run the cell below to delete your webservice, image, and model (must be done in that order). In the [next notebook](./accelerated-models-training.ipynb) you will learn how to train a classfier on a new dataset using transfer learning and finetune the weights."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"aks_service.delete()\n",
|
|
||||||
"aks_target.delete()\n",
|
|
||||||
"image.delete()\n",
|
|
||||||
"registered_model.delete()\n",
|
|
||||||
"converted_model.delete()"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "coverste"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"name": "paledger"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"name": "aibhalla"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3.6",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python36"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
|
||||||