Mirror of https://github.com/Azure/MachineLearningNotebooks.git
Synced 2025-12-20 09:37:04 -05:00

Compare commits (91 commits): cli ... sdk-codete
| Author | SHA1 | Date |
|---|---|---|
|  | 6a145086d8 |  |
|  | cb695c91ce |  |
|  | de505d67bd |  |
|  | f19cfa4630 |  |
|  | 7eed2e4b56 |  |
|  | 57b0f701f8 |  |
|  | 7db93bcb1d |  |
|  | fcbe925640 |  |
|  | bedfbd649e |  |
|  | fb760f648d |  |
|  | a9a0713d2f |  |
|  | c9d018b52c |  |
|  | 53dbd0afcf |  |
|  | e3a64b1f16 |  |
|  | 732eecfc7c |  |
|  | 6995c086ff |  |
|  | 80bba4c7ae |  |
|  | 3c581b533f |  |
|  | cc688caa4e |  |
|  | da225e116e |  |
|  | 73c5d02880 |  |
|  | e472b54f1b |  |
|  | 716c6d8bb1 |  |
|  | 23189c6f40 |  |
|  | 361b57ed29 |  |
|  | 3f531fd211 |  |
|  | 111f5e8d73 |  |
|  | 96c59d5c2b |  |
|  | ce3214b7c6 |  |
|  | 53199d17de |  |
|  | 54c883412c |  |
|  | d58d57ca44 |  |
|  | b3cc1b61a2 |  |
|  | a4792d95ac |  |
|  | 216aa8b6a1 |  |
|  | 9814955b37 |  |
|  | c96e9fdd5a |  |
|  | 47bd530c6b |  |
|  | 7e53333af6 |  |
|  | 0888050389 |  |
|  | fb567152a4 |  |
|  | 6d50401af4 |  |
|  | b1bde7328b |  |
|  | 7fc6b29de8 |  |
|  | cff9606bf9 |  |
|  | 532799a22c |  |
|  | 90454d5a32 |  |
|  | 076b206515 |  |
|  | b8b660e5a8 |  |
|  | 6005c0987d |  |
|  | 34eec6abc2 |  |
|  | 208c36b903 |  |
|  | 80e8a5e323 |  |
|  | 989511c581 |  |
|  | d5c247b005 |  |
|  | 2bdd131b0c |  |
|  | 2c391a4486 |  |
|  | 87b6114156 |  |
|  | 9b701ebaeb |  |
|  | 758b0ee808 |  |
|  | eeb4d92d7c |  |
|  | b4df74c72e |  |
|  | 231c1062a8 |  |
|  | 92be6bfd19 |  |
|  | b0b0756aed |  |
|  | ff19151d0a |  |
|  | 933c1ffc4e |  |
|  | f75faaa31e |  |
|  | ae8874ad32 |  |
|  | 6c3abe2d03 |  |
|  | 4627080ff4 |  |
|  | 69af6e36fe |  |
|  | e27ab9a58e |  |
|  | c85e7e52af |  |
|  | 5598e07729 |  |
|  | d9b62ad651 |  |
|  | 8aa287dadf |  |
|  | 9ab092a4d0 |  |
|  | 1a1a81621f |  |
|  | d93daa3f38 |  |
|  | 2fb910b0e0 |  |
|  | 2879e00884 |  |
|  | b574bfd3cf |  |
|  | 6a3b814394 |  |
|  | 1009ffab36 |  |
|  | 995fb1ac8c |  |
|  | e418e4fbb2 |  |
|  | cdbfa203e1 |  |
|  | a9a9635e72 |  |
|  | b568dc364f |  |
|  | 59bdd5a858 |  |
.amlignore (new file, 7 lines)

@@ -0,0 +1,7 @@
+.ipynb_checkpoints
+azureml-logs
+.azureml
+.git
+outputs
+azureml-setup
+docs
.vscode/settings.json (vendored, new file, 3 lines)

@@ -0,0 +1,3 @@
+{
+    "python.pythonPath": "C:\\Users\\sgilley\\.azureml\\envs\\jan3\\python.exe"
+}
@@ -14,32 +14,39 @@
    "metadata": {},
    "source": [
     "# 00. Installation and configuration\n",
+    "This notebook configures your library of notebooks to connect to an Azure Machine Learning Workspace. In this case, a library contains all of the notebooks in the current folder and any nested folders. You can configure this notebook to use an existing workspace or create a new workspace.\n",
     "\n",
-    "## Prerequisites:\n",
+    "## What is an Azure ML Workspace and why do I need one?\n",
     "\n",
-    "### 1. Install Azure ML SDK\n",
-    "Follow [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment).\n",
+    "An AML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an AML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prerequisites"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Access Azure Subscription\n",
     "\n",
-    "### 2. Install some additional packages\n",
-    "This Notebook requires some additional libraries. In the conda environment, run below commands: \n",
-    "```shell\n",
+    "In order to create an AML Workspace, first you need access to an Azure Subscription. You can [create your own](https://azure.microsoft.com/en-us/free/) or get your existing subscription information from the [Azure portal](https://portal.azure.com).\n",
+    "\n",
+    "### 2. If you're running on your own local environment, install Azure ML SDK and other libraries\n",
+    "\n",
+    "If you are running in your own environment, follow [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment). If you are running in Azure Notebooks or another Microsoft managed environment, the SDK is already installed.\n",
+    "\n",
+    "Also install following libraries to your environment. Many of the example notebooks depend on them\n",
+    "\n",
+    "```\n",
     "(myenv) $ conda install -y matplotlib tqdm scikit-learn\n",
     "```\n",
     "\n",
-    "### 3. Make sure your subscription is registered to use ACI.\n",
-    "This Notebook makes use of Azure Container Instance (ACI). You need to ensure your subscription has been registered to use ACI in order be able to deploy a dev/test web service.\n",
-    "```shell\n",
-    "# check to see if ACI is already registered\n",
-    "(myenv) $ az provider show -n Microsoft.ContainerInstance -o table\n",
-    "\n",
-    "# if ACI is not registered, run this command.\n",
-    "# note you need to be the subscription owner in order to execute this command successfully.\n",
-    "(myenv) $ az provider register -n Microsoft.ContainerInstance\n",
-    "```\n",
-    "\n",
-    "In this example you will optionally create an Azure Machine Learning Workspace and initialize your notebook directory to easily use this workspace. Typically you will only need to run this once per notebook directory, and all other notebooks in this directory or any sub-directories will automatically use the settings you indicate here.\n",
-    "\n",
-    "This notebook also contains optional cells to install and update the require Azure Machine Learning libraries."
+    "Once installation is complete, check the Azure ML SDK version:"
    ]
   },
   {
@@ -52,7 +59,6 @@
    },
    "outputs": [],
    "source": [
-    "# Check core SDK version number for debugging purposes\n",
     "import azureml.core\n",
     "\n",
     "print(\"SDK Version:\", azureml.core.VERSION)"
@@ -62,20 +68,63 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Initialize an Azure ML Workspace\n",
-    "### What is an Azure ML Workspace and why do I need one?\n",
+    "### 3. Make sure your subscription is registered to use ACI\n",
+    "Azure Machine Learning makes use of Azure Container Instance (ACI). You need to ensure your subscription has been registered to use ACI in order be able to deploy a dev/test web service. If you have run through the quickstart experience you have already performed this step. Otherwise you will need to use the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and execute the following commands.\n",
     "\n",
-    "An AML Workspace is an Azure resource that organaizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an AML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n",
+    "```shell\n",
+    "# check to see if ACI is already registered\n",
+    "(myenv) $ az provider show -n Microsoft.ContainerInstance -o table\n",
     "\n",
-    "### What do I need\n",
+    "# if ACI is not registered, run this command.\n",
+    "# note you need to be the subscription owner in order to execute this command successfully.\n",
+    "(myenv) $ az provider register -n Microsoft.ContainerInstance\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Set up your Azure Machine Learning workspace\n",
     "\n",
-    "In order to use an AML Workspace, first you need access to an Azure Subscription. You can [create your own](https://azure.microsoft.com/en-us/free/) or get your existing subscription information from the [Azure portal](https://portal.azure.com). Inside your subscription, you will need access to a _resource group_, which organizes Azure resources and provides a default region for the resources in a group. You can see what resource groups to which you have access, or create a new one in the [Azure portal](https://portal.azure.com)\n",
+    "### Option 1: You have workspace already\n",
+    "If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view *config.json* file, and copy-paste the values for subscription ID, resource group and workspace name below.\n",
     "\n",
-    "You can also easily create a new resource group using azure-cli.\n",
+    "If you have a workspace created another way, [these instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#create-workspace-configuration-file) describe how to get your subscription and workspace information.\n",
     "\n",
-    "```sh\n",
-    "(myenv) $ az group create -n my_resource_group -l eastus2\n",
-    "```\n",
+    "If this cell succeeds, you're done configuring this library! Otherwise continue to follow the instructions in the rest of the notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.core import Workspace\n",
+    "\n",
+    "subscription_id ='<subscription-id>'\n",
+    "resource_group ='<resource-group>'\n",
+    "workspace_name = '<workspace-name>'\n",
+    "\n",
+    "try:\n",
+    "    ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)\n",
+    "    ws.write_config()\n",
+    "    print('Workspace configuration succeeded. You are all set!')\n",
+    "except:\n",
+    "    print('Workspace not found. Run the cells below.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Option 2: You don't have workspace yet\n",
+    "\n",
+    "\n",
+    "#### Requirements\n",
+    "\n",
+    "Inside your Azure subscription, you will need access to a _resource group_, which organizes Azure resources and provides a default region for the resources in a group. You can see what resource groups to which you have access, or create a new one in the [Azure portal](https://portal.azure.com). If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n",
     "\n",
     "To create or access an Azure ML Workspace, you will need to import the AML library and the following information:\n",
     "* A name for your workspace\n",
@@ -89,8 +138,17 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Supported Azure Regions\n",
-    "Please specify the Azure subscription Id, resource group name, workspace name, and the region in which you want to create the workspace, for example \"eastus2\". "
+    "#### Supported Azure Regions\n",
+    "Specify a region where your workspace will be located from the list of [Azure Machine Learning regions](https://linktoregions)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "workspace_region = \"eastus2\""
    ]
   },
   {
@@ -101,21 +159,22 @@
    "source": [
     "import os\n",
     "\n",
-    "subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", \"<my-subscription-id>\")\n",
-    "resource_group = os.environ.get(\"RESOURCE_GROUP\", \"<my-rg>\")\n",
-    "workspace_name = os.environ.get(\"WORKSPACE_NAME\", \"<my-workspace>\")\n",
-    "workspace_region = os.environ.get(\"WORKSPACE_REGION\", \"eastus2\")"
+    "subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", subscription_id)\n",
+    "resource_group = os.environ.get(\"RESOURCE_GROUP\", resource_group)\n",
+    "workspace_name = os.environ.get(\"WORKSPACE_NAME\", workspace_name)\n",
+    "workspace_region = os.environ.get(\"WORKSPACE_REGION\", workspace_region)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Creating a workspace\n",
-    "If you already have access to an AML Workspace you want to use, you can skip this cell. Otherwise, this cell will create an AML workspace for you in a subscription provided you have the correct permissions.\n",
+    "#### Create the workspace\n",
+    "This cell will create an AML workspace for you in a subscription provided you have the correct permissions.\n",
     "\n",
     "This will fail when:\n",
     "1. You do not have permission to create a workspace in the resource group\n",
+    "2. You do not have permission to create a resource group if it's non-existing.\n",
     "2. You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
     "\n",
     "If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
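As an aside, the `os.environ.get(VAR, fallback)` pattern the new cell switches to is plain Python and can be exercised without any Azure ML dependencies. A minimal sketch (the helper name `resolve_setting` is illustrative, not part of the SDK):

```python
import os

def resolve_setting(env_var: str, fallback: str) -> str:
    """Return the environment override if set, otherwise the in-notebook value."""
    return os.environ.get(env_var, fallback)

# In-notebook default, overridable from the environment as in the diff above
workspace_region = resolve_setting("WORKSPACE_REGION", "eastus2")
print(workspace_region)
```

This lets CI or a shared environment inject real values while the notebook keeps safe defaults.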
@@ -138,58 +197,12 @@
     " subscription_id = subscription_id,\n",
     " resource_group = resource_group, \n",
     " location = workspace_region,\n",
+    " create_resource_group = True,\n",
     " exist_ok = True)\n",
-    "ws.get_details()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Configuring your local environment\n",
-    "You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "tags": [
-     "create workspace"
-    ]
-   },
-   "outputs": [],
-   "source": [
-    "ws = Workspace(workspace_name = workspace_name,\n",
-    "               subscription_id = subscription_id,\n",
-    "               resource_group = resource_group)\n",
-    "\n",
-    "# persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
+    "ws.get_details()\n",
     "ws.write_config()"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "You can then load the workspace from this config file from any notebook in the current directory."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "tags": [
-     "create workspace"
-    ]
-   },
-   "outputs": [],
-   "source": [
-    "# load workspace configuratio from ./aml_config/config.json file.\n",
-    "my_workspace = Workspace.from_config()\n",
-    "my_workspace.get_details()"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
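The `exist_ok = True` flag added in this hunk makes workspace creation idempotent: re-running the cell reuses an existing workspace instead of raising. The same get-or-create contract exists in the standard library, e.g. `os.makedirs`; a minimal sketch of the behavior:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "aml_config")
    os.makedirs(target)                  # first call creates the directory
    os.makedirs(target, exist_ok=True)   # second call is a no-op instead of raising
    print(os.path.isdir(target))
```

Without `exist_ok=True`, the second call would raise `FileExistsError`, which is exactly the failure mode the notebook diff is avoiding for repeated cell execution.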
@@ -215,7 +228,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.4"
+   "version": "3.6.6"
   }
  },
  "nbformat": 4,
@@ -457,7 +457,8 @@
    },
    "outputs": [],
    "source": [
-    "models = ws.models(name='best_model')\n",
+    "from azureml.core.model import Model\n",
+    "models = Model.list(workspace=ws, name='best_model')\n",
     "for m in models:\n",
     "    print(m.name, m.version)"
    ]
@@ -787,6 +788,11 @@
   }
  ],
  "metadata": {
+  "authors": [
+   {
+    "name": "roastala"
+   }
+  ],
  "kernelspec": {
   "display_name": "Python 3.6",
   "language": "python",
@@ -218,7 +218,7 @@
     "run_config_system_managed = RunConfiguration()\n",
     "\n",
     "run_config_system_managed.environment.python.user_managed_dependencies = False\n",
-    "run_config_system_managed.prepare_environment = True\n",
+    "run_config_system_managed.auto_prepare_environment = True\n",
     "\n",
     "# Specify conda dependencies with scikit-learn\n",
     "cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
@@ -297,7 +297,7 @@
     "run_config_docker = RunConfiguration()\n",
     "\n",
     "run_config_docker.environment.python.user_managed_dependencies = False\n",
-    "run_config_docker.prepare_environment = True\n",
+    "run_config_docker.auto_prepare_environment = True\n",
     "run_config_docker.environment.docker.enabled = True\n",
     "run_config_docker.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
     "\n",
@@ -442,6 +442,11 @@
   }
  ],
  "metadata": {
+  "authors": [
+   {
+    "name": "roastala"
+   }
+  ],
  "kernelspec": {
   "display_name": "Python 3.6",
   "language": "python",
@@ -15,7 +15,7 @@ os.makedirs('./outputs', exist_ok=True)
 
 X, y = load_diabetes(return_X_y=True)
 
-run = Run.get_submitted_run()
+run = Run.get_context()
 
 X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.2,
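The point of the `Run.get_submitted_run()` → `Run.get_context()` change in this hunk is that `get_context()` also works when the training script runs outside a submitted job, falling back to an offline stand-in. That dual-mode pattern can be sketched in plain Python; the classes and the environment variable below are hypothetical illustrations, not Azure ML's actual internals:

```python
import os

class OfflineRun:
    """Hypothetical stand-in used when no managed run is active."""
    def log(self, name, value):
        print(f"[offline] {name}={value}")

class ManagedRun(OfflineRun):
    """Hypothetical logger bound to a submitted job."""
    def log(self, name, value):
        print(f"[managed] {name}={value}")

def get_context():
    # A made-up env var marks "running inside a submitted job".
    return ManagedRun() if os.environ.get("IN_SUBMITTED_RUN") else OfflineRun()

run = get_context()
run.log("mse", 0.25)  # the training script is identical in both environments
```

The benefit mirrors the diff: the same `train.py` can be smoke-tested locally and then submitted unchanged.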
@@ -261,6 +261,11 @@
   }
  ],
  "metadata": {
+  "authors": [
+   {
+    "name": "roastala"
+   }
+  ],
  "kernelspec": {
   "display_name": "Python 3.6",
   "language": "python",
@@ -14,7 +14,7 @@ os.makedirs('./outputs', exist_ok=True)
 
 X, y = load_diabetes(return_X_y=True)
 
-run = Run.get_submitted_run()
+run = Run.get_context()
 
 X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.2,
@@ -13,12 +13,16 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 04. Train in a remote VM (MLC managed DSVM)\n",
+    "# 04. Train in a remote Linux VM\n",
     "* Create Workspace\n",
-    "* Create Project\n",
     "* Create `train.py` file\n",
-    "* Create DSVM as Machine Learning Compute (MLC) resource\n",
-    "* Configure & execute a run in a conda environment in the default miniconda Docker container on DSVM"
+    "* Create (or attach) DSVM as compute resource.\n",
+    "* Upoad data files into default datastore\n",
+    "* Configure & execute a run in a few different ways\n",
+    "    - Use system-built conda\n",
+    "    - Use existing Python environment\n",
+    "    - Use Docker \n",
+    "* Find the best model in the run"
    ]
   },
   {
@@ -80,7 +84,6 @@
     "experiment_name = 'train-on-remote-vm'\n",
     "\n",
     "from azureml.core import Experiment\n",
-    "\n",
     "exp = Experiment(workspace=ws, name=experiment_name)"
    ]
   },
@@ -88,9 +91,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## View `train.py`\n",
-    "\n",
-    "For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train.py` in a cell to show the file."
+    "Let's also create a local folder to hold the training script."
    ]
   },
   {
@@ -99,7 +100,87 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "with open('./train.py', 'r') as training_script:\n",
+    "import os\n",
+    "script_folder = './vm-run'\n",
+    "os.makedirs(script_folder, exist_ok=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Upload data files into datastore\n",
+    "Every workspace comes with a default datastore (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and access it from the compute target."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# get the default datastore\n",
+    "ds = ws.get_default_datastore()\n",
+    "print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Load diabetes data from `scikit-learn` and save it as 2 local files."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.datasets import load_diabetes\n",
+    "import numpy as np\n",
+    "\n",
+    "training_data = load_diabetes()\n",
+    "np.save(file='./features.npy', arr=training_data['data'])\n",
+    "np.save(file='./labels.npy', arr=training_data['target'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let's upload the 2 files into the default datastore under a path named `diabetes`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds.upload_files(['./features.npy', './labels.npy'], target_path='diabetes', overwrite=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## View `train.py`\n",
+    "\n",
+    "For convenience, we created a training script for you. It is printed below as a text, but you can also run `%pfile ./train.py` in a cell to show the file. Please pay special attention on how we are loading the features and labels from files in the `data_folder` path, which is passed in as an argument of the training script (shown later)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# copy train.py into the script folder\n",
+    "import shutil\n",
+    "shutil.copy('./train.py', os.path.join(script_folder, 'train.py'))\n",
+    "\n",
+    "with open(os.path.join(script_folder, './train.py'), 'r') as training_script:\n",
     "    print(training_script.read())"
    ]
   },
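The new cells above write the features and labels with `np.save` before uploading them to the datastore; the local save/load round trip is worth seeing in isolation. A minimal sketch, using a temporary directory and a small stand-in array instead of the notebook's diabetes data and `./features.npy` path:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    features_path = os.path.join(tmp, "features.npy")
    data = np.arange(6, dtype=float).reshape(3, 2)  # stand-in for the feature matrix
    np.save(file=features_path, arr=data)           # same call shape as the notebook cell
    restored = np.load(features_path)
    print(np.array_equal(data, restored))
```

The `.npy` format preserves dtype and shape exactly, which is why the training script can reload the arrays on the compute target without any extra metadata.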
@@ -111,7 +192,7 @@
     "\n",
     "**Note**: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n",
     " \n",
-    "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can append the port number to the address like the example below."
+    "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can specify the port number in the provisioning configuration object."
    ]
   },
   {
@@ -139,7 +220,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Attach an existing Linux DSVM as a compute target\n"
+    "## Attach an existing Linux DSVM\n",
+    "You can also attach an existing Linux VM as a compute target. The default port is 22."
    ]
   },
   {
@@ -151,15 +233,220 @@
     "'''\n",
     "from azureml.core.compute import RemoteCompute \n",
     "# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase \n",
-    " dsvm_compute = RemoteCompute.attach(ws,name=\"attach-from-sdk6\",username=<username>,address=<ipaddress>,ssh_port=22,password=<password>)\n",
-    "'''"
+    "attached_dsvm_compute = RemoteCompute.attach(workspace=ws,\n",
+    "                                             name=\"attached_vm\",\n",
+    "                                             username='<usename>',\n",
+    "                                             address='<ip_adress_or_fqdn>',\n",
+    "                                             ssh_port=22,\n",
+    "                                             password='<password>')\n",
+    "attached_dsvm_compute.wait_for_completion(show_output=True)\n",
+    "'''\n"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Configure & Run"
+    "## Configure & Run\n",
+    "First let's create a `DataReferenceConfiguration` object to inform the system what data folder to download to the copmute target."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.core.runconfig import DataReferenceConfiguration\n",
+    "dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
+    "                                path_on_datastore='diabetes', \n",
+    "                                mode='download', # download files from datastore to compute target\n",
+    "                                overwrite=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we can try a few different ways to run the training script in the VM."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Conda run\n",
+    "You can ask the system to build a conda environment based on your dependency specification, and submit your script to run there. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.core.runconfig import RunConfiguration\n",
+    "from azureml.core.conda_dependencies import CondaDependencies\n",
+    "\n",
+    "# create a new RunConfig object\n",
+    "conda_run_config = RunConfiguration(framework=\"python\")\n",
+    "\n",
+    "# Set compute target to the Linux DSVM\n",
+    "conda_run_config.target = dsvm_compute.name\n",
+    "\n",
+    "# set the data reference of the run configuration\n",
+    "conda_run_config.data_references = {ds.name: dr}\n",
+    "\n",
+    "# specify CondaDependencies obj\n",
+    "conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.core import Run\n",
+    "from azureml.core import ScriptRunConfig\n",
+    "\n",
+    "src = ScriptRunConfig(source_directory=script_folder, \n",
+    "                      script='train.py', \n",
+    "                      run_config=conda_run_config, \n",
+    "                      # pass the datastore reference as a parameter to the training script\n",
+    "                      arguments=['--data-folder', str(ds.as_download())] \n",
+    "                     ) \n",
+    "run = exp.submit(config=src)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "run.wait_for_completion(show_output=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Show the run object. You can navigate to the Azure portal to see detailed information about the run."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "run"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Native VM run\n",
+    "You can also configure to use an exiting Python environment in the VM to execute the script without asking the system to create a conda environment for you."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"vm_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to the Linux DSVM\n",
|
||||||
|
"vm_run_config.target = dsvm_compute.name\n",
|
||||||
|
"\n",
|
||||||
|
"# set the data reference of the run coonfiguration\n",
|
||||||
|
"conda_run_config.data_references = {ds.name: dr}\n",
|
||||||
|
"\n",
|
||||||
|
"# Let system know that you will configure the Python environment yourself.\n",
|
||||||
|
"vm_run_config.environment.python.user_managed_dependencies = True"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"The below run will likely fail because `train.py` needs dependency `azureml`, `scikit-learn` and others, which are not found in that Python environment. "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"src = ScriptRunConfig(source_directory=script_folder, \n",
|
||||||
|
" script='train.py', \n",
|
||||||
|
" run_config=vm_run_config,\n",
|
||||||
|
" # pass the datastore reference as a parameter to the training script\n",
|
||||||
|
" arguments=['--data-folder', str(ds.as_download())])\n",
|
||||||
|
"run = exp.submit(config=src)\n",
|
||||||
|
"run.wait_for_completion(show_output=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You can choose to SSH into the VM and install Azure ML SDK, and any other missing dependencies, in that Python environment. For demonstration purposes, we simply are going to create another script `train2.py` that doesn't have azureml dependencies, and submit it instead."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%writefile $script_folder/train2.py\n",
|
||||||
|
"\n",
|
||||||
|
"print('####################################')\n",
|
||||||
|
"print('Hello World (without Azure ML SDK)!')\n",
|
||||||
|
"print('####################################')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now let's try again. And this time it should work fine."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"src = ScriptRunConfig(source_directory=script_folder, \n",
|
||||||
|
" script='train2.py', \n",
|
||||||
|
" run_config=vm_run_config)\n",
|
||||||
|
"run = exp.submit(config=src)\n",
|
||||||
|
"run.wait_for_completion(show_output=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Note even in this case you get a run record with some basic statistics."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"run"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -167,7 +454,7 @@
"metadata": {},
"source": [
"### Configure a Docker run with new conda environment on the VM\n",
"You can also execute in a Docker container in the VM. If you choose this option, the system pulls down a base Docker image, builds a new conda environment in it if you ask for one (you can skip this step if you are using a custom Docker image with a preconfigured Python environment), starts a container, and runs your script in it. The image is also uploaded into the ACR (Azure Container Registry) instance associated with your workspace, and is reused in subsequent runs if your dependencies don't change."
]
},
{
@@ -181,26 +468,23 @@
"\n",
"\n",
"# create a new RunConfig object\n",
"docker_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to the Linux DSVM\n",
"docker_run_config.target = dsvm_compute.name\n",
"\n",
"# Use Docker in the remote VM\n",
"docker_run_config.environment.docker.enabled = True\n",
"\n",
"# Use CPU base image from DockerHub\n",
"docker_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"print('Base Docker image is:', docker_run_config.environment.docker.base_image)\n",
"\n",
"# set the data reference of the run configuration\n",
"docker_run_config.data_references = {ds.name: dr}\n",
"\n",
"# specify CondaDependencies obj\n",
"docker_run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
]
},
{
@@ -217,11 +501,21 @@
"metadata": {},
"outputs": [],
"source": [
"src = ScriptRunConfig(source_directory=script_folder, \n",
"                      script='train.py', \n",
"                      run_config=docker_run_config,\n",
"                      # pass the datastore reference as a parameter to the training script\n",
"                      arguments=['--data-folder', str(ds.as_download())])\n",
"run = exp.submit(config=src)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
@@ -241,19 +535,17 @@
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Find the best model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have tried various execution modes, we can find the best model from the last run."
]
},
{
@@ -273,10 +565,13 @@
"metadata": {},
"outputs": [],
"source": [
"# find the index where MSE is the smallest\n",
"indices = list(range(0, len(metrics['mse'])))\n",
"min_mse_index = min(indices, key=lambda x: metrics['mse'][x])\n",
"\n",
"print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n",
"    metrics['mse'][min_mse_index], \n",
"    metrics['alpha'][min_mse_index]\n",
"))"
]
},
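The index-based minimum search in the cell above can be exercised outside a workspace; a minimal stdlib-only sketch, where the `metrics` dict is made-up sample data standing in for `run.get_metrics()` output, not values from a real run:

```python
# Hypothetical metrics, standing in for run.get_metrics() output
metrics = {'alpha': [0.0, 0.5, 1.0], 'mse': [3.2, 2.7, 2.9]}

# argmin over the 'mse' list: pick the index with the smallest MSE
min_mse_index = min(range(len(metrics['mse'])), key=metrics['mse'].__getitem__)

print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(
    metrics['mse'][min_mse_index],
    metrics['alpha'][min_mse_index]))
```

This avoids a NumPy dependency: `min` over the index range with a key function plays the role of `np.argmin`.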
@@ -298,6 +593,11 @@
}
],
"metadata": {
"authors": [
{
"name": "haining"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -313,7 +613,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
@@ -2,7 +2,8 @@
# Licensed under the MIT license.

import os
import argparse

from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
@@ -12,10 +13,18 @@ from sklearn.externals import joblib
import numpy as np

os.makedirs('./outputs', exist_ok=True)

parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str,
                    dest='data_folder', help='data folder')
args = parser.parse_args()

print('Data folder is at:', args.data_folder)
print('List all files: ', os.listdir(args.data_folder))

X = np.load(os.path.join(args.data_folder, 'features.npy'))
y = np.load(os.path.join(args.data_folder, 'labels.npy'))

run = Run.get_context()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
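The `--data-folder` parsing pattern used by `train.py` above can be tried without submitting a run; a minimal sketch, where the folder path is a made-up example rather than a real mounted datastore:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str,
                    dest='data_folder', help='data folder')

# parse an explicit argument list instead of sys.argv, as a local test
args = parser.parse_args(['--data-folder', '/tmp/diabetes'])
print('Data folder is at:', args.data_folder)
```

When the script runs on the compute target, Azure ML substitutes the real download path for the `str(ds.as_download())` argument, so the script itself never hard-codes a location.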
@@ -77,7 +77,6 @@
"experiment_name = 'train-on-spark'\n",
"\n",
"from azureml.core import Experiment\n",
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
@@ -107,13 +106,95 @@
"## Configure & Run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure an ACI run\n",
"Before you try running on an actual Spark cluster, you can use a Docker image with Spark already baked in, and run it in ACI (Azure Container Instances)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# use pyspark framework\n",
"aci_run_config = RunConfiguration(framework=\"pyspark\")\n",
"\n",
"# use ACI to run the Spark job\n",
"aci_run_config.target = 'containerinstance'\n",
"aci_run_config.container_instance.region = 'eastus2'\n",
"aci_run_config.container_instance.cpu_cores = 1\n",
"aci_run_config.container_instance.memory_gb = 2\n",
"\n",
"# specify base Docker image to use\n",
"aci_run_config.environment.docker.enabled = True\n",
"aci_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_MMLSPARK_CPU_IMAGE\n",
"\n",
"# specify CondaDependencies\n",
"cd = CondaDependencies()\n",
"cd.add_conda_package('numpy')\n",
"aci_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit script to ACI to run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"script_run_config = ScriptRunConfig(source_directory = '.',\n",
"                                    script= 'train-spark.py',\n",
"                                    run_config = aci_run_config)\n",
"run = exp.submit(script_run_config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note** you can also create a new VM, or attach an existing VM, and use Docker-based execution to run the Spark job. Please see the `04.train-in-vm` notebook for an example of how to configure and run in Docker mode in a VM."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Attach an HDI cluster\n",
"Now we can use a real Spark cluster, HDInsight for Spark, to run this job. To use an HDI compute target:\n",
" 1. Create a Spark for HDI cluster in Azure. Here are some [quick instructions](https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-spark-sql). Make sure you use the Ubuntu flavor, NOT CentOS.\n",
" 2. Enter the IP address, username and password below"
]
},
@@ -124,22 +205,22 @@
"outputs": [],
"source": [
"from azureml.core.compute import HDInsightCompute\n",
"from azureml.exceptions import ComputeTargetException\n",
"\n",
"try:\n",
"    # if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase\n",
"    hdi_compute = HDInsightCompute.attach(workspace=ws, \n",
"                                          name=\"myhdi\", \n",
"                                          address=\"<myhdi-ssh>.azurehdinsight.net\", \n",
"                                          ssh_port=22, \n",
"                                          username='<ssh-username>', \n",
"                                          password='<ssh-pwd>')\n",
"\n",
"except ComputeTargetException as e:\n",
"    print(\"Caught = {}\".format(e.message))\n",
"    \n",
"    \n",
"hdi_compute.wait_for_completion(show_output=True)"
]
},
{
@@ -159,28 +240,16 @@
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"\n",
"# use pyspark framework\n",
"hdi_run_config = RunConfiguration(framework=\"pyspark\")\n",
"\n",
"# Set compute target to the HDI cluster\n",
"hdi_run_config.target = hdi_compute.name\n",
"\n",
"# specify a CondaDependencies object to ask the system to install numpy\n",
"cd = CondaDependencies()\n",
"cd.add_conda_package('numpy')\n",
"hdi_run_config.environment.python.conda_dependencies = cd"
]
},
{
@@ -196,10 +265,12 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"script_run_config = ScriptRunConfig(source_directory = '.',\n",
"                                    script= 'train-spark.py',\n",
"                                    run_config = hdi_run_config)\n",
"run = exp.submit(config=script_run_config)"
]
},
{
@@ -218,7 +289,9 @@
"metadata": {},
"outputs": [],
"source": [
"# get all metrics logged in the run\n",
"metrics = run.get_metrics()\n",
"print(metrics)"
]
},
{
@@ -226,14 +299,15 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "aashishb"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -249,7 +323,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
@@ -18,7 +18,7 @@ from pyspark.sql.types import DoubleType, IntegerType, StringType
from azureml.core.run import Run

# initialize logger
run = Run.get_context()

# start Spark session
spark = pyspark.sql.SparkSession.builder.appName('Iris').getOrCreate()
@@ -129,9 +129,9 @@
},
"outputs": [],
"source": [
"regression_models = Model.list(workspace=ws, tags=['area'])\n",
"for m in regression_models:\n",
"    print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
]
},
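The listing loop above only relies on model objects exposing `name`, `version`, `description`, and `tags` attributes; a stdlib-only sketch with stand-in objects (made-up entries, not a real registry query):

```python
from types import SimpleNamespace

# stand-ins for what Model.list(...) returns; attribute names match the loop above
regression_models = [
    SimpleNamespace(name="sklearn_regression_model.pkl", version=1,
                    description="Ridge regression model", tags={"area": "diabetes"}),
    SimpleNamespace(name="sklearn_regression_model.pkl", version=2,
                    description="Ridge regression model", tags={"area": "diabetes"}),
]

for m in regression_models:
    print("Name:", m.name, "\tVersion:", m.version, "\tDescription:", m.description, m.tags)
```

Registering the same `model_name` again bumps `version`, which is why the loop prints the version alongside the name.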
{
@@ -397,6 +397,11 @@
}
],
"metadata": {
"authors": [
{
"name": "raymondl"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -312,6 +312,11 @@
}
],
"metadata": {
"authors": [
{
"name": "raymondl"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -102,7 +102,7 @@
"### b. In your init function add:\n",
"```python \n",
"global inputs_dc, prediction_dc\n",
"inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\", \"feat4\", \"feat5\", \"Feat6\"])\n",
"prediction_dc = ModelDataCollector(\"best_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"])```\n",
" \n",
"* Identifier: Identifier is later used for building the folder structure in your Blob, it can be used to divide \"raw\" data versus \"processed\".\n",
@@ -286,7 +286,7 @@
"    create_name= 'myaks4'\n",
"    aks_target = AksCompute.attach(workspace = ws, \n",
"                                   name = create_name, \n",
"                                   resource_id=resource_id)\n",
"    ## Wait for the operation to complete\n",
"    aks_target.wait_for_provisioning(True)```"
]
@@ -424,6 +424,11 @@
}
],
"metadata": {
"authors": [
{
"name": "marthalc"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
|||||||
@@ -0,0 +1,415 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Enabling App Insights for Services in Production\n",
|
||||||
|
"With this notebook, you can learn how to enable App Insights for standard service monitoring, plus, we provide examples for doing custom logging within a scoring files in a model. \n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"## What does Application Insights monitor?\n",
|
||||||
|
"It monitors request rates, response times, failure rates, etc. For more information visit [App Insights docs.](https://docs.microsoft.com/en-us/azure/application-insights/app-insights-overview)\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"## What is different compared to standard production deployment process?\n",
|
||||||
|
"If you want to enable generic App Insights for a service run:\n",
|
||||||
|
"```python\n",
|
||||||
|
"aks_service= Webservice(ws, \"aks-w-dc2\")\n",
|
||||||
|
"aks_service.update(enable_app_insights=True)```\n",
|
||||||
|
"Where \"aks-w-dc2\" is your service name. You can also do this from the Azure Portal under your Workspace--> deployments--> Select deployment--> Edit--> Advanced Settings--> Select \"Enable AppInsights diagnostics\"\n",
|
||||||
|
"\n",
|
||||||
|
"If you want to log custom traces, you will follow the standard deplyment process for AKS and you will:\n",
|
||||||
|
"1. Update scoring file.\n",
|
||||||
|
"2. Update aks configuration.\n",
|
||||||
|
"3. Build new image and deploy it. "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 1. Import your dependencies"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Workspace, Run\n",
|
||||||
|
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
||||||
|
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
||||||
|
"from azureml.core.image import Image\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"print(azureml.core.VERSION)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 2. Set up your configuration and create a workspace\n",
|
||||||
|
"Follow Notebook 00 instructions to do this.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 3. Register Model\n",
|
||||||
|
"Register an existing trained model, add descirption and tags."
|
||||||
|
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Register the model\n",
"from azureml.core.model import Model\n",
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
"                       model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n",
"                       tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
"                       description = \"Ridge regression model to predict diabetes\",\n",
"                       workspace = ws)\n",
"\n",
"print(model.name, model.description, model.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. *Update your scoring file with custom print statements*\n",
"Here is an example:\n",
"### a. In your init function add:\n",
"```python\n",
"print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
"```\n",
"\n",
"### b. In your run function add:\n",
"```python\n",
"print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n",
"print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import pickle\n",
"import json\n",
"import numpy \n",
"from sklearn.externals import joblib\n",
"from sklearn.linear_model import Ridge\n",
"from azureml.core.model import Model\n",
"from azureml.monitoring import ModelDataCollector\n",
"import time\n",
"\n",
"def init():\n",
"    global model\n",
"    #Print statement for appinsights custom traces:\n",
"    print (\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
"    \n",
"    # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n",
"    # this call should return the path to the model.pkl file on the local disk.\n",
"    model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n",
"    \n",
"    # deserialize the model file back into a sklearn model\n",
"    model = joblib.load(model_path)\n",
"    \n",
"    global inputs_dc, prediction_dc\n",
"    \n",
"    # this setup will help us save our inputs under the \"inputs\" path in our Azure Blob\n",
"    inputs_dc = ModelDataCollector(model_name=\"sklearn_regression_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\"]) \n",
"    \n",
"    # this setup will help us save our predictions under the \"predictions\" path in our Azure Blob\n",
"    prediction_dc = ModelDataCollector(\"sklearn_regression_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"]) \n",
"    \n",
"# note you can pass in multiple rows for scoring\n",
"def run(raw_data):\n",
"    global inputs_dc, prediction_dc\n",
"    try:\n",
"        data = json.loads(raw_data)['data']\n",
"        data = numpy.array(data)\n",
"        result = model.predict(data)\n",
"        \n",
"        #Print statement for appinsights custom traces:\n",
"        print (\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n",
"        \n",
"        #this call is saving our input data into our blob\n",
"        inputs_dc.collect(data) \n",
"        #this call is saving our prediction data into our blob\n",
"        prediction_dc.collect(result)\n",
"        \n",
"        #Print statement for appinsights custom traces:\n",
"        print (\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n",
"        \n",
"        return json.dumps({\"result\": result.tolist()})\n",
"        \n",
"    except Exception as e:\n",
"        result = str(e)\n",
"        print (result + time.strftime(\"%H:%M:%S\"))\n",
"        return json.dumps({\"error\": result})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. *Create myenv.yml file*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
"    f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Create your new Image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
"                                                  runtime = \"python\",\n",
"                                                  conda_file = \"myenv.yml\",\n",
"                                                  description = \"Image with ridge regression model\",\n",
"                                                  tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
"                                                  )\n",
"\n",
"image = ContainerImage.create(name = \"myimage1\",\n",
"                              # this is the model object\n",
"                              models = [model],\n",
"                              image_config = image_config,\n",
"                              workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Deploy to AKS service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create AKS compute if you haven't done so (Notebook 11)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the default configuration (can also provide parameters to customize)\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'my-aks-test1' \n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
"                                  name = aks_name, \n",
"                                  provisioning_configuration = prov_config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you already have an AKS cluster, you can attach it to your workspace instead:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python \n",
"%%time\n",
"resource_id = '/subscriptions/<subscriptionid>/resourcegroups/<resourcegroupname>/providers/Microsoft.ContainerService/managedClusters/<aksservername>'\n",
"create_name = 'myaks4'\n",
"aks_target = AksCompute.attach(workspace = ws, \n",
"                               name = create_name, \n",
"                               resource_id = resource_id)\n",
"# Wait for the operation to complete\n",
"aks_target.wait_for_provisioning(True)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a. *Activate App Insights by updating the AKS Webservice configuration*\n",
"To enable App Insights in your service, update your AKS deployment configuration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Set the web service configuration\n",
"aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b. Deploy your service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_service_name = 'aks-w-dc3'\n",
"\n",
"aks_service = Webservice.deploy_from_image(workspace = ws, \n",
"                                           name = aks_service_name,\n",
"                                           image = image,\n",
"                                           deployment_config = aks_config,\n",
"                                           deployment_target = aks_target\n",
"                                           )\n",
"aks_service.wait_for_deployment(show_output = True)\n",
"print(aks_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Test your service "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"import json\n",
"\n",
"test_sample = json.dumps({'data': [\n",
"    [1,28,13,45,54,6,57,8,8,10], \n",
"    [101,9,8,37,6,45,4,3,2,41]\n",
"]})\n",
"test_sample = bytes(test_sample,encoding = 'utf8')\n",
"\n",
"prediction = aks_service.run(input_data = test_sample)\n",
"print(prediction)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. See your service telemetry in App Insights\n",
"1. Go to the [Azure Portal](https://portal.azure.com/)\n",
"2. Under All resources, select the subscription/resource group where you created your workspace, then select the App Insights type\n",
"3. Click on the App Insights resource. You'll see a high-level dashboard with information on requests, server response time and availability.\n",
"4. Click on the top banner \"Analytics\"\n",
"5. In the \"Schema\" section select \"traces\" and run your query.\n",
"6. Voila! All your custom traces should be there."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Disable App Insights"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.update(enable_app_insights=False)"
]
}
],
"metadata": {
"authors": [
{
"name": "marthalc"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
README.md
@@ -1,3 +1,9 @@
+Get the full documentation for Azure Machine Learning service at:
+
+https://docs.microsoft.com/azure/machine-learning/service/
+
+<br>
+
 # Sample notebooks for Azure Machine Learning service
 
 To run the notebooks in this repository use one of these methods:
@@ -5,27 +11,23 @@ To run the notebooks in this repository use one of these methods:
 ## Use Azure Notebooks - Jupyter based notebooks in the Azure cloud
 
 1. [](https://aka.ms/aml-clone-azure-notebooks)
-[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks if they are not already there.
+[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks.
-1. Create a workspace and its configuration file (**config.json**) using [these instructions](https://aka.ms/aml-how-to-configure-environment).
+1. Follow the instructions in the [00.configuration](00.configuration.ipynb) notebook to create and connect to a workspace.
-1. Select `+New` in the Azure Notebook toolbar to add your **config.json** file to the imported folder.
+1. Open one of the sample notebooks.
-
-1. Open the notebook.
 
-**Make sure the Azure Notebook kernal is set to `Python 3.6`** when you open a notebook.
+**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook.
 
 
 
 ## **Use your own notebook server**
 
-1. Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to:
-   * Create a workspace and its configuration file (**config.json**).
-   * Configure your notebook server.
+1. Setup a Jupyter Notebook server and [install the Azure Machine Learning SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python).
 1. Clone [this repository](https://aka.ms/aml-notebooks).
-1. Add your **config.json** file to the cloned folder
 1. You may need to install other packages for specific notebooks
 1. Start your notebook server.
-1. Open the notebook you want to run.
+1. Follow the instructions in the [00.configuration](00.configuration.ipynb) notebook to create and connect to a workspace.
+1. Open one of the sample notebooks.
 
 > Note: **Looking for automated machine learning samples?**
 > For your convenience, you can use an installation script instead of the steps below for the automated ML notebooks. Go to the [automl folder README](automl/README.md) and follow the instructions. The script installs all packages needed for notebooks in that folder.
aml_config/conda_dependencies.yml (new file)
@@ -0,0 +1,15 @@
# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
aml_config/docker.runconfig (new file)
@@ -0,0 +1,115 @@
# The script to run.
script: train.py
# The arguments to the script file.
arguments: []
# The name of the compute target to use for this run.
target: local
# Framework to execute inside. Allowed values are "Python", "PySpark", "CNTK", "TensorFlow", and "PyTorch".
framework: PySpark
# Communicator for the given framework. Allowed values are "None", "ParameterServer", "OpenMpi", and "IntelMpi".
communicator: None
# Automatically prepare the run environment as part of the run itself.
autoPrepareEnvironment: true
# Maximum allowed duration for the run.
maxRunDurationSeconds:
# Number of nodes to use for running job.
nodeCount: 1
# Environment details.
environment:
  # Environment variables set for the run.
  environmentVariables:
    EXAMPLE_ENV_VAR: EXAMPLE_VALUE
  # Python details
  python:
    # user_managed_dependencies=True indicates that the environment will be user managed. False indicates that AzureML will manage the user environment.
    userManagedDependencies: false
    # The python interpreter path
    interpreterPath: python
    # Path to the conda dependencies file to use for this run. If a project
    # contains multiple programs with different sets of dependencies, it may be
    # convenient to manage those environments with separate files.
    condaDependenciesFile: aml_config/conda_dependencies.yml
  # Docker details
  docker:
    # Set True to perform this run inside a Docker container.
    enabled: true
    # Base image used for Docker-based runs.
    baseImage: mcr.microsoft.com/azureml/base:0.2.0
    # Set False if necessary to work around shared volume bugs.
    sharedVolumes: true
    # Run with NVidia Docker extension to support GPUs.
    gpuSupport: false
    # Extra arguments to the Docker run command.
    arguments: []
    # Image registry that contains the base image.
    baseImageRegistry:
      # DNS name or IP address of azure container registry (ACR)
      address:
      # The username for ACR
      username:
      # The password for ACR
      password:
  # Spark details
  spark:
    # List of spark repositories.
    repositories:
    - https://mmlspark.azureedge.net/maven
    packages:
    - group: com.microsoft.ml.spark
      artifact: mmlspark_2.11
      version: '0.12'
    precachePackages: true
  # Databricks details
  databricks:
    # List of maven libraries.
    mavenLibraries: []
    # List of PyPi libraries
    pypiLibraries: []
    # List of RCran libraries
    rcranLibraries: []
    # List of JAR libraries
    jarLibraries: []
    # List of Egg libraries
    eggLibraries: []
# History details.
history:
  # Enable history tracking -- this allows status, logs, metrics, and outputs
  # to be collected for a run.
  outputCollection: true
  # whether to take snapshots for history.
  snapshotProject: true
# Spark configuration details.
spark:
  configuration:
    spark.app.name: Azure ML Experiment
    spark.yarn.maxAppAttempts: 1
# HDI details.
hdi:
  # Yarn deploy mode. Options are cluster and client.
  yarnDeployMode: cluster
# Tensorflow details.
tensorflow:
  # The number of worker tasks.
  workerCount: 1
  # The number of parameter server tasks.
  parameterServerCount: 1
# Mpi details.
mpi:
  # When using MPI, number of processes per node.
  processCountPerNode: 1
# data reference configuration details
dataReferences: {}
# Project share datastore reference.
sourceDirectoryDataStore:
# AmlCompute details.
amlcompute:
  # VM size of the cluster to be created. Allowed values are Azure VM sizes. The list of VM sizes is available at 'https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-sizes-specs'
  vmSize:
  # VM priority of the cluster to be created. Allowed values are "dedicated", "lowpriority".
  vmPriority:
  # A bool that indicates if the cluster has to be retained after job completion.
  retainCluster: false
  # Name of the cluster to be created. If not specified, runId will be used as cluster name.
  name:
  # Maximum number of nodes in the AmlCompute cluster to be created. Minimum number of nodes will always be set to 0.
  clusterMaxNodeCount: 1
aml_config/local.runconfig (new file)
@@ -0,0 +1,115 @@
# The script to run.
script: train.py
# The arguments to the script file.
arguments: []
# The name of the compute target to use for this run.
target: local
# Framework to execute inside. Allowed values are "Python", "PySpark", "CNTK", "TensorFlow", and "PyTorch".
framework: Python
# Communicator for the given framework. Allowed values are "None", "ParameterServer", "OpenMpi", and "IntelMpi".
communicator: None
# Automatically prepare the run environment as part of the run itself.
autoPrepareEnvironment: true
# Maximum allowed duration for the run.
maxRunDurationSeconds:
# Number of nodes to use for running job.
nodeCount: 1
# Environment details.
environment:
  # Environment variables set for the run.
  environmentVariables:
    EXAMPLE_ENV_VAR: EXAMPLE_VALUE
  # Python details
  python:
    # user_managed_dependencies=True indicates that the environment will be user managed. False indicates that AzureML will manage the user environment.
    userManagedDependencies: false
    # The python interpreter path
    interpreterPath: python
    # Path to the conda dependencies file to use for this run. If a project
    # contains multiple programs with different sets of dependencies, it may be
    # convenient to manage those environments with separate files.
    condaDependenciesFile: aml_config/conda_dependencies.yml
  # Docker details
  docker:
    # Set True to perform this run inside a Docker container.
    enabled: false
    # Base image used for Docker-based runs.
    baseImage: mcr.microsoft.com/azureml/base:0.2.0
    # Set False if necessary to work around shared volume bugs.
    sharedVolumes: true
    # Run with NVidia Docker extension to support GPUs.
    gpuSupport: false
    # Extra arguments to the Docker run command.
    arguments: []
    # Image registry that contains the base image.
    baseImageRegistry:
      # DNS name or IP address of azure container registry (ACR)
      address:
      # The username for ACR
      username:
      # The password for ACR
      password:
  # Spark details
  spark:
    # List of spark repositories.
    repositories:
    - https://mmlspark.azureedge.net/maven
    packages:
    - group: com.microsoft.ml.spark
      artifact: mmlspark_2.11
      version: '0.12'
    precachePackages: true
  # Databricks details
  databricks:
    # List of maven libraries.
    mavenLibraries: []
    # List of PyPi libraries
    pypiLibraries: []
    # List of RCran libraries
    rcranLibraries: []
    # List of JAR libraries
    jarLibraries: []
    # List of Egg libraries
    eggLibraries: []
# History details.
history:
  # Enable history tracking -- this allows status, logs, metrics, and outputs
  # to be collected for a run.
  outputCollection: true
  # whether to take snapshots for history.
  snapshotProject: true
# Spark configuration details.
spark:
  configuration:
    spark.app.name: Azure ML Experiment
    spark.yarn.maxAppAttempts: 1
# HDI details.
hdi:
  # Yarn deploy mode. Options are cluster and client.
  yarnDeployMode: cluster
# Tensorflow details.
tensorflow:
  # The number of worker tasks.
  workerCount: 1
  # The number of parameter server tasks.
  parameterServerCount: 1
# Mpi details.
mpi:
  # When using MPI, number of processes per node.
  processCountPerNode: 1
# data reference configuration details
dataReferences: {}
# Project share datastore reference.
sourceDirectoryDataStore:
# AmlCompute details.
amlcompute:
  # VM size of the cluster to be created. Allowed values are Azure VM sizes. The list of VM sizes is available at 'https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-sizes-specs'
  vmSize:
  # VM priority of the cluster to be created. Allowed values are "dedicated", "lowpriority".
  vmPriority:
  # A bool that indicates if the cluster has to be retained after job completion.
  retainCluster: false
  # Name of the cluster to be created. If not specified, runId will be used as cluster name.
  name:
  # Maximum number of nodes in the AmlCompute cluster to be created. Minimum number of nodes will always be set to 0.
  clusterMaxNodeCount: 1
aml_config/project.json (new file)
@@ -0,0 +1 @@
{"Id": "local-compute", "Scope": "/subscriptions/65a1016d-0f67-45d2-b838-b8f373d6d52e/resourceGroups/sheri/providers/Microsoft.MachineLearningServices/workspaces/sheritestqs3/projects/local-compute"}
@@ -13,53 +13,14 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# AutoML 00. configuration\n",
|
"# AutoML 00. Configuration\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this example you will create an Azure Machine Learning Workspace and initialize your notebook directory to easily use this workspace. Typically you will only need to run this once per notebook directory, and all other notebooks in this directory or any sub-directories will automatically use the settings you indicate here.\n",
|
"In this example you will create an Azure Machine Learning `Workspace` object and initialize your notebook directory to easily reload this object from a configuration file. Typically you will only need to run this once per notebook directory, and all other notebooks in this directory or any sub-directories will automatically use the settings you indicate here.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## Prerequisites:\n",
|
"## Prerequisites:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Before running this notebook, run the automl_setup script described in README.md.\n"
|
"Before running this notebook, run the `automl_setup` script described in README.md.\n"
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Connect to your Azure Subscription\n",
|
|
||||||
"\n",
|
|
||||||
"In order to use an AML Workspace, first you need access to an Azure Subscription. You can [create your own](https://azure.microsoft.com/en-us/free/) or get your existing subscription information from the [Azure portal](https://portal.azure.com).\n",
|
|
||||||
"\n",
|
|
||||||
"First login to azure and follow prompts to authenticate. Then check that your subscription is correct"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"!az login"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"!az account show"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"If you have multiple subscriptions and need to change the active one, you can use a command\n",
|
|
||||||
"```shell\n",
|
|
||||||
"az account set -s <subscription-id>\n",
|
|
||||||
"```"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -68,27 +29,20 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Register Machine Learning Services Resource Provider\n",
|
"### Register Machine Learning Services Resource Provider\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This step is required to use the Azure ML services backing the SDK."
|
"Microsoft.MachineLearningServices only needs to be registed once in the subscription.\n",
|
||||||
]
|
"To register it:\n",
|
||||||
},
|
"1. Start the Azure portal.\n",
|
||||||
{
|
"2. Select your `All services` and then `Subscription`.\n",
|
||||||
"cell_type": "code",
|
"3. Select the subscription that you want to use.\n",
|
||||||
"execution_count": null,
|
"4. Click on `Resource providers`\n",
|
||||||
"metadata": {},
|
"3. Click the `Register` link next to Microsoft.MachineLearningServices"
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# register the new RP\n",
|
|
||||||
"!az provider register -n Microsoft.MachineLearningServices\n",
|
|
||||||
"\n",
|
|
||||||
"# check the registration status\n",
|
|
||||||
"!az provider show -n Microsoft.MachineLearningServices"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Check core SDK version number for validate your installation and for debugging purposes"
|
"### Check the Azure ML Core SDK Version to Validate Your Installation"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -107,17 +61,17 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Initialize an Azure ML Workspace\n",
|
"## Initialize an Azure ML Workspace\n",
|
||||||
"### What is an Azure ML Workspace and why do I need one?\n",
|
"### What is an Azure ML Workspace and Why Do I Need One?\n",
|
||||||
"\n",
|
"\n",
|
||||||
"An AML Workspace is an Azure resource that organaizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an AML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n",
|
"An Azure ML workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### What do I need\n",
|
"### What Do I Need?\n",
|
||||||
"\n",
|
"\n",
|
||||||
"To create or access an Azure ML Workspace, you will need to import the AML library and specify following information:\n",
|
"To create or access an Azure ML workspace, you will need to import the Azure ML library and specify the following information:\n",
|
||||||
"* A name for your workspace. You can choose one.\n",
|
"* A name for your workspace. You can choose one.\n",
|
||||||
"* Your subscription id. Use *id* value from *az account show* output above. \n",
|
"* Your subscription id. Use the `id` value from the `az account show` command output above.\n",
|
||||||
"* The resource group name. Resource group organizes Azure resources and provides default region for the resources in the group. You can either specify a new one, in which case it gets created for your Workspace, or use an existing one or create a new one from [Azure portal](https://portal.azure.com)\n",
|
"* The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group. The resource group will be created if it doesn't exist. Resource groups can be created and viewed in the [Azure portal](https://portal.azure.com).\n",
|
||||||
"* Supported regions include `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`."
|
"* Supported regions include `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -137,17 +91,17 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Creating a workspace\n",
|
"## Creating a Workspace\n",
|
||||||
"If you already have access to an AML Workspace you want to use, you can skip this cell. Otherwise, this cell will create an AML workspace for you in a subscription provided you have the correct permissions for the given `subscription_id`.\n",
|
"If you already have access to an Azure ML workspace you want to use, you can skip this cell. Otherwise, this cell will create an Azure ML workspace for you in the specified subscription, provided you have the correct permissions for the given `subscription_id`.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This will fail when:\n",
|
"This will fail when:\n",
|
||||||
"1. The workspace already exists\n",
|
"1. The workspace already exists.\n",
|
||||||
"2. You do not have permission to create a workspace in the resource group\n",
|
"2. You do not have permission to create a workspace in the resource group.\n",
|
||||||
"3. You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
|
"3. You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If workspace creation fails for any reason other than already existing, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources.\n",
|
"If workspace creation fails for any reason other than already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note** The workspace creation can take several minutes."
|
"**Note:** Creation of a new workspace can take several minutes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -156,7 +110,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# import the Workspace class and check the azureml SDK version\n",
|
"# Import the Workspace class and check the Azure ML SDK version.\n",
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ws = Workspace.create(name = workspace_name,\n",
|
"ws = Workspace.create(name = workspace_name,\n",
|
||||||
@@ -170,7 +124,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Configuring your local environment\n",
|
"## Configuring Your Local Environment\n",
|
||||||
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
|
"You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -186,7 +140,7 @@
|
|||||||
" subscription_id = subscription_id,\n",
|
" subscription_id = subscription_id,\n",
|
||||||
" resource_group = resource_group)\n",
|
" resource_group = resource_group)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
||||||
"ws.write_config()"
|
"ws.write_config()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
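`write_config` persists the workspace identifiers to `./aml_config/config.json`, and `Workspace.from_config` (used in the next cell) reads them back. The round trip can be sketched with the standard library alone; the field values below are placeholders, and in practice the file is written and consumed by the Azure ML SDK:

```python
import json
import os
import tempfile

# Placeholder identifiers; ws.write_config() persists the real ones.
config = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my_resource_group",
    "workspace_name": "my_workspace",
}

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "aml_config", "config.json")
    os.makedirs(os.path.dirname(path))
    with open(path, "w") as f:
        json.dump(config, f)       # roughly what ws.write_config() stores
    with open(path) as f:
        loaded = json.load(f)      # roughly what Workspace.from_config() reads
    assert loaded == config
```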
@@ -203,7 +157,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# load workspace configuratio from ./aml_config/config.json file.\n",
|
"# Load workspace configuration from ./aml_config/config.json file.\n",
|
||||||
"my_workspace = Workspace.from_config()\n",
|
"my_workspace = Workspace.from_config()\n",
|
||||||
"my_workspace.get_details()"
|
"my_workspace.get_details()"
|
||||||
]
|
]
|
||||||
@@ -212,8 +166,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create a folder to host all sample projects\n",
|
"## Create a Folder to Host All Sample Projects\n",
|
||||||
"Lastly, create a folder where all the sample projects will be hosted."
|
"Finally, create a folder where all the sample projects will be hosted."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -242,6 +196,11 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -13,27 +13,27 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# AutoML 01: Classification with local compute\n",
|
"# AutoML 01: Classification with Local Compute\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n",
|
"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"In this notebook you will learn how to:\n",
|
||||||
"1. Creating an Experiment in an existing Workspace\n",
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
"2. Instantiating AutoMLConfig\n",
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
"3. Training the Model using local compute\n",
|
"3. Train the model using local compute.\n",
|
||||||
"4. Exploring the results\n",
|
"4. Explore the results.\n",
|
||||||
"5. Testing the fitted model\n"
|
"5. Test the best fitted model.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Experiment\n",
|
"## Create an Experiment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -67,9 +67,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for experiment\n",
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
"experiment_name = 'automl-local-classification'\n",
|
"experiment_name = 'automl-local-classification'\n",
|
||||||
"# project folder\n",
|
|
||||||
"project_folder = './sample_projects/automl-local-classification'\n",
|
"project_folder = './sample_projects/automl-local-classification'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
@@ -92,7 +91,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Diagnostics\n",
|
"## Diagnostics\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases"
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -109,7 +108,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Load Digits Dataset"
|
"## Load Training Data\n",
|
||||||
|
"\n",
|
||||||
|
"This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -123,25 +124,25 @@
|
|||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Exclude the first 100 rows from training so that they can be used for test.\n",
|
"# Exclude the first 100 rows from training so that they can be used for test.\n",
|
||||||
"X_digits = digits.data[100:,:]\n",
|
"X_train = digits.data[100:,:]\n",
|
||||||
"y_digits = digits.target[100:]"
|
"y_train = digits.target[100:]"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
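The holdout split above reserves the first 100 rows for testing later in the notebook. A self-contained sketch of the same slicing pattern, using a synthetic array in place of `digits.data` (the notebook itself loads the real data via scikit-learn's `load_digits`):

```python
import numpy as np

# Synthetic stand-in for digits.data (1797 samples, 64 features) and digits.target.
data = np.zeros((1797, 64))
target = np.arange(1797) % 10

# Exclude the first 100 rows from training so that they can be used for test.
X_train = data[100:, :]
y_train = target[100:]

# The held-out rows become the test set later on.
X_test = data[:100, :]
y_test = target[:100]

print(X_train.shape, X_test.shape)  # (1697, 64) (100, 64)
```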
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Instantiate Auto ML Config\n",
|
"## Configure AutoML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"|Property|Description|\n",
|
"|Property|Description|\n",
|
||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
"|**task**|classification or regression|\n",
|
"|**task**|classification or regression|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
|
||||||
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
|
"|**max_time_sec**|Time limit in seconds for each iteration.|\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data |\n",
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
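The `X` and `y` rows of the table describe ordinary NumPy array shapes, which can be checked with synthetic data (instantiating `AutoMLConfig` itself requires the `azureml.train.automl` package and an Azure subscription, so it is not attempted here):

```python
import numpy as np

n_samples, n_features = 200, 64
X = np.random.rand(n_samples, n_features)     # shape = [n_samples, n_features]
y = np.random.randint(0, 10, size=n_samples)  # shape = [n_samples, ]; integer class labels

assert X.shape == (n_samples, n_features)
assert y.shape == (n_samples,)
assert np.issubdtype(y.dtype, np.integer)
```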
@@ -160,8 +161,8 @@
|
|||||||
" iterations = 50,\n",
|
" iterations = 50,\n",
|
||||||
" n_cross_validations = 3,\n",
|
" n_cross_validations = 3,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" X = X_digits, \n",
|
" X = X_train, \n",
|
||||||
" y = y_digits,\n",
|
" y = y_train,\n",
|
||||||
" path = project_folder)"
|
" path = project_folder)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -169,10 +170,10 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Training the Model\n",
|
"## Train the Models\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"You will see the currently running iterations printing to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -184,11 +185,32 @@
|
|||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"local_run"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Optionally, you can continue an interrupted local run by calling continue_experiment without the <b>iterations</b> parameter, or run more iterations to a completed run by specifying the <b>iterations</b> parameter:"
|
"Optionally, you can continue an interrupted local run by calling `continue_experiment` without the `iterations` parameter, or run more iterations for a completed run by specifying the `iterations` parameter:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"local_run = local_run.continue_experiment(X = X_train, \n",
|
||||||
|
" y = y_train, \n",
|
||||||
|
" show_output = True,\n",
|
||||||
|
" iterations = 5)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -201,33 +223,21 @@
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "markdown",
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
"source": [
|
||||||
"local_run = local_run.continue_experiment(X = X_digits, \n",
|
"## Explore the Results"
|
||||||
" y = y_digits, \n",
|
|
||||||
" show_output = True,\n",
|
|
||||||
" iterations = 5)"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Exploring the results"
|
"#### Widget for Monitoring Runs\n",
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"#### Widget for monitoring runs\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -246,7 +256,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -272,7 +282,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -290,8 +300,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model based on any other metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Give me the run and the model that has the smallest `log_loss`:"
|
"Show the run and the model that has the smallest `log_loss` value:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -310,8 +320,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a specific iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Give me the run and the model from the 3rd iteration:"
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -330,7 +340,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Testing the Fitted Model \n",
|
"### Test the Best Fitted Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data"
|
"#### Load Test Data"
|
||||||
]
|
]
|
||||||
@@ -342,8 +352,8 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_digits = digits.data[:10, :]\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
"y_digits = digits.target[:10]\n",
|
"y_test = digits.target[:10]\n",
|
||||||
"images = digits.images[:10]"
|
"images = digits.images[:10]"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -351,7 +361,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing our best pipeline\n",
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"We will try to predict 2 digits and see how our model works."
|
"We will try to predict 2 digits and see how our model works."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -361,11 +371,11 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#Randomly select digits and test\n",
|
"# Randomly select digits and test.\n",
|
||||||
"for index in np.random.choice(len(y_digits), 2):\n",
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
" print(index)\n",
|
" print(index)\n",
|
||||||
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
|
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
||||||
" label = y_digits[index]\n",
|
" label = y_test[index]\n",
|
||||||
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
||||||
" fig = plt.figure(1, figsize = (3,3))\n",
|
" fig = plt.figure(1, figsize = (3,3))\n",
|
||||||
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
||||||
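The change above passes `replace = False` to `np.random.choice`, which guarantees the two sampled test indices are distinct (with the default `replace = True`, the same digit could be drawn twice). A quick illustration:

```python
import numpy as np

y_test = np.arange(10)  # stand-in for the 10 held-out labels

# Sample 2 distinct indices; replace=False forbids duplicates.
indices = np.random.choice(len(y_test), 2, replace=False)
assert len(set(indices)) == 2

print(indices)
```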
@@ -376,6 +386,11 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -13,27 +13,27 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# AutoML 02: Regression with local compute\n",
|
"# AutoML 02: Regression with Local Compute\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this example we use the scikit learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) to showcase how you can use AutoML for a simple regression problem.\n",
|
"In this example we use scikit-learn's [diabetes dataset](http://scikit-learn.org/stable/datasets/index.html#diabetes-dataset) to showcase how you can use AutoML for a simple regression problem.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"In this notebook you will learn how to:\n",
|
||||||
"1. Creating an Experiment using an existing Workspace\n",
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
"2. Instantiating AutoMLConfig\n",
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
"3. Training the Model using local compute\n",
|
"3. Train the model using local compute.\n",
|
||||||
"4. Exploring the results\n",
|
"4. Explore the results.\n",
|
||||||
"5. Testing the fitted model"
|
"5. Test the best fitted model.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Experiment\n",
|
"## Create an Experiment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -67,9 +67,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for the experiment\n",
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
"experiment_name = 'automl-local-regression'\n",
|
"experiment_name = 'automl-local-regression'\n",
|
||||||
"# project folder\n",
|
|
||||||
"project_folder = './sample_projects/automl-local-regression'\n",
|
"project_folder = './sample_projects/automl-local-regression'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
@@ -92,7 +91,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Diagnostics\n",
|
"## Diagnostics\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases"
|
"Opt in to diagnostics for a better experience, quality, and security of future releases."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -109,7 +108,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Read Data"
|
"### Load Training Data\n",
|
||||||
|
"This uses scikit-learn's [load_diabetes](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) method."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -118,7 +118,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# load diabetes dataset, a well-known built-in small dataset that comes with scikit-learn\n",
|
"# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.\n",
|
||||||
"from sklearn.datasets import load_diabetes\n",
|
"from sklearn.datasets import load_diabetes\n",
|
||||||
"from sklearn.linear_model import Ridge\n",
|
"from sklearn.linear_model import Ridge\n",
|
||||||
"from sklearn.metrics import mean_squared_error\n",
|
"from sklearn.metrics import mean_squared_error\n",
|
||||||
@@ -135,17 +135,17 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Instantiate Auto ML Config\n",
|
"## Configure AutoML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"|Property|Description|\n",
|
"|Property|Description|\n",
|
||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
"|**task**|classification or regression|\n",
|
"|**task**|classification or regression|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Regression supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i><br><i>normalized_root_mean_squared_log_error</i>|\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
|
"|**max_time_sec**|Time limit in seconds for each iteration.|\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
@@ -173,10 +173,10 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Training the Model\n",
|
"## Train the Models\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"You will see the currently running iterations printing to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -201,18 +201,18 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Exploring the results"
|
"## Explore the Results"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for monitoring runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -231,7 +231,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -257,7 +257,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
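The metric-based selection that `get_output` performs can be illustrated with a small self-contained sketch. This is plain Python, not the Azure ML SDK; the run records below are made up for illustration, standing in for the metrics AutoML logs per iteration:

```python
# Hypothetical child-run records, one per AutoML iteration.
child_runs = [
    {"iteration": 0, "spearman_correlation": 0.89, "root_mean_squared_error": 62.1},
    {"iteration": 1, "spearman_correlation": 0.93, "root_mean_squared_error": 55.4},
    {"iteration": 2, "spearman_correlation": 0.91, "root_mean_squared_error": 58.7},
]

def best_run(runs, metric, maximize=True):
    """Return the run record with the best value for the given logged metric."""
    return (max if maximize else min)(runs, key=lambda r: r[metric])

# Best by largest spearman_correlation, and by smallest root_mean_squared_error.
print(best_run(child_runs, "spearman_correlation")["iteration"])                     # → 1
print(best_run(child_runs, "root_mean_squared_error", maximize=False)["iteration"])  # → 1
```

In this made-up data the run with the largest `spearman_correlation` is also the one with the smallest `root_mean_squared_error`, mirroring the observation in the notebook text.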
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -275,8 +275,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model based on any other metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run and model that has the smallest `root_mean_squared_error` (which turned out to be the same as the one with largest `spearman_correlation` value):"
|
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with the largest `spearman_correlation` value):"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -295,9 +295,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a specific iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"\n",
|
"Show the run and the model from the third iteration:"
|
||||||
"Simply show the run and model from the 3rd iteration:"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -316,7 +315,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Testing the Fitted Model"
|
"### Test the Best Fitted Model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -351,13 +350,13 @@
|
|||||||
"from sklearn import datasets\n",
|
"from sklearn import datasets\n",
|
||||||
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# set up a multi-plot chart\n",
|
"# Set up a multi-plot chart.\n",
|
||||||
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
||||||
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
||||||
"f.set_figheight(6)\n",
|
"f.set_figheight(6)\n",
|
||||||
"f.set_figwidth(16)\n",
|
"f.set_figwidth(16)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# plot residual values of training set\n",
|
"# Plot residual values of training set.\n",
|
||||||
"a0.axis([0, 360, -200, 200])\n",
|
"a0.axis([0, 360, -200, 200])\n",
|
||||||
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
||||||
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
@@ -365,11 +364,12 @@
|
|||||||
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
|
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
|
||||||
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
||||||
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
||||||
"# plot histogram\n",
|
"\n",
|
||||||
|
"# Plot a histogram.\n",
|
||||||
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n",
|
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n",
|
||||||
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n",
|
"a0.hist(y_residual_train, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# plot residual values of test set\n",
|
"# Plot residual values of test set.\n",
|
||||||
"a1.axis([0, 90, -200, 200])\n",
|
"a1.axis([0, 90, -200, 200])\n",
|
||||||
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
||||||
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
@@ -377,15 +377,21 @@
|
|||||||
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
|
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
|
||||||
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
||||||
"a1.set_yticklabels([])\n",
|
"a1.set_yticklabels([])\n",
|
||||||
"# plot histogram\n",
|
"\n",
|
||||||
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step');\n",
|
"# Plot a histogram.\n",
|
||||||
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10);\n",
|
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', bins = 10, histtype = 'step')\n",
|
||||||
|
"a1.hist(y_residual_test, orientation = 'horizontal', color = 'b', alpha = 0.2, bins = 10)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -15,33 +15,33 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"# AutoML 03: Remote Execution using DSVM (Ubuntu)\n",
|
"# AutoML 03: Remote Execution using DSVM (Ubuntu)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n",
|
"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"In this notebook you will learn how to:\n",
|
||||||
"1. Creating an Experiment using an existing Workspace\n",
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
"2. Attaching an existing DSVM to a workspace\n",
|
"2. Attach an existing DSVM to a workspace.\n",
|
||||||
"3. Instantiating AutoMLConfig \n",
|
"3. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
"4. Training the Model using the DSVM\n",
|
"4. Train the model using the DSVM.\n",
|
||||||
"5. Exploring the results\n",
|
"5. Explore the results.\n",
|
||||||
"6. Testing the fitted model\n",
|
"6. Test the best fitted model.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In addition this notebook showcases the following features\n",
|
"In addition, this notebook showcases the following features:\n",
|
||||||
"- **Parallel** Executions for iterations\n",
|
"- **Parallel** executions for iterations\n",
|
||||||
"- Asyncronous tracking of progress\n",
|
"- **Asynchronous** tracking of progress\n",
|
||||||
"- **Cancelling** individual iterations or the entire run\n",
|
"- **Cancellation** of individual iterations or the entire run\n",
|
||||||
"- Retrieving models for any iteration or logged metric\n",
|
"- Retrieving models for any iteration or logged metric\n",
|
||||||
"- specify automl settings as **kwargs**\n"
|
"- Specifying AutoML settings as `**kwargs`\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Experiment\n",
|
"## Create an Experiment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a workspace. For AutoML you would need to create a <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -75,9 +75,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for the run history container in the workspace\n",
|
"# Choose a name for the run history container in the workspace.\n",
|
||||||
"experiment_name = 'automl-remote-dsvm4'\n",
|
"experiment_name = 'automl-remote-dsvm4'\n",
|
||||||
"# project folder\n",
|
|
||||||
"project_folder = './sample_projects/automl-remote-dsvm4'\n",
|
"project_folder = './sample_projects/automl-remote-dsvm4'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
@@ -100,7 +99,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Diagnostics\n",
|
"## Diagnostics\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases"
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -118,9 +117,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create a Remote Linux DSVM\n",
|
"## Create a Remote Linux DSVM\n",
|
||||||
"Note: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n",
|
"**Note:** If creation fails with a message about Marketplace purchase eligibility, start creation of a DSVM through the [Azure portal](https://portal.azure.com), and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled this setting, you can exit the portal without actually creating the DSVM, and creation of the DSVM through the notebook should work.\n"
|
||||||
"\n",
|
|
||||||
"**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you can switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this."
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -134,9 +131,9 @@
|
|||||||
"dsvm_name = 'mydsvm'\n",
|
"dsvm_name = 'mydsvm'\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
|
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
|
||||||
" print('found existing dsvm.')\n",
|
" print('Found an existing DSVM.')\n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
" print('creating new dsvm.')\n",
|
" print('Creating a new DSVM.')\n",
|
||||||
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n",
|
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n",
|
||||||
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
|
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
|
||||||
" dsvm_compute.wait_for_completion(show_output = True)"
|
" dsvm_compute.wait_for_completion(show_output = True)"
|
||||||
@@ -147,7 +144,8 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Get Data File\n",
|
"## Create Get Data File\n",
|
||||||
"For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file."
|
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
|
||||||
|
"In this example, the `get_data()` function returns data using scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -175,29 +173,29 @@
|
|||||||
"def get_data():\n",
|
"def get_data():\n",
|
||||||
" \n",
|
" \n",
|
||||||
" digits = datasets.load_digits()\n",
|
" digits = datasets.load_digits()\n",
|
||||||
" X_digits = digits.data[100:,:]\n",
|
" X_train = digits.data[100:,:]\n",
|
||||||
" y_digits = digits.target[100:]\n",
|
" y_train = digits.target[100:]\n",
|
||||||
"\n",
|
"\n",
|
||||||
" return { \"X\" : X_digits, \"y\" : y_digits }"
|
" return { \"X\" : X_train, \"y\" : y_train }"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Instantiate AutoML <a class=\"anchor\" id=\"Instatiate-AutoML-Remote-DSVM\"></a>\n",
|
"## Configure AutoML <a class=\"anchor\" id=\"Instantiate-AutoML-Remote-DSVM\"></a>\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n",
|
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.</i>\n",
|
"**Note:** When using Remote DSVM, you can't pass Numpy arrays directly to the fit method.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"|Property|Description|\n",
|
"|Property|Description|\n",
|
||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
|
||||||
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
|
"|**max_time_sec**|Time limit in seconds for each iteration.|\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM."
|
"|**concurrent_iterations**|Maximum number of iterations to execute in parallel. This should be less than the number of cores on the DSVM.|"
|
||||||
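The `**kwargs` pattern mentioned above (collecting settings in a dictionary and expanding it into keyword arguments) can be sketched without the SDK. The setting names below are the ones from the table; the receiving function is a hypothetical stand-in for `AutoMLConfig`, not the real class:

```python
# Settings collected in a plain dictionary, as the notebook does with automl_settings.
automl_settings = {
    "max_time_sec": 600,
    "iterations": 10,
    "n_cross_validations": 5,
    "primary_metric": "AUC_weighted",
    "concurrent_iterations": 2,
}

def make_config(task, **kwargs):
    """Stand-in for AutoMLConfig: merges the task with any settings passed as **kwargs."""
    config = {"task": task}
    config.update(kwargs)
    return config

# The ** operator unpacks the dictionary into individual keyword arguments.
config = make_config("classification", **automl_settings)
print(config["primary_metric"])  # → AUC_weighted
```

Passing a dictionary this way lets you keep all tunable settings in one place and reuse them across runs.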
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -229,7 +227,18 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<b>Note</b> that the first run on a new DSVM may take a several minutes to preparing the environment."
|
"**Note:** The first run on a new DSVM may take several minutes to prepare the environment."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train the Models\n",
|
||||||
|
"\n",
|
||||||
|
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
|
||||||
|
"\n",
|
||||||
|
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -245,10 +254,10 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Exploring the Results\n",
|
"## Explore the Results\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Loading executed runs\n",
|
"#### Loading Executed Runs\n",
|
||||||
"In case you need to load a previously executed run given a run id please enable the below cell"
|
"In case you need to load a previously executed run, enable the cell below and replace the `run_id` value."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -262,13 +271,13 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for monitoring runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n",
|
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
|
||||||
"\n",
|
"\n",
|
||||||
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -287,7 +296,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# wait till the run finishes\n",
|
"# Wait until the run finishes.\n",
|
||||||
"remote_run.wait_for_completion(show_output = True)"
|
"remote_run.wait_for_completion(show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -297,7 +306,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -321,9 +330,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Canceling runs\n",
|
"## Cancelling Runs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions"
|
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -332,10 +341,10 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Cancel the ongoing experiment and stop scheduling new iterations\n",
|
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
|
||||||
"# remote_run.cancel()\n",
|
"# remote_run.cancel()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Cancel iteration 1 and move onto iteration 2\n",
|
"# Cancel iteration 1 and move onto iteration 2.\n",
|
||||||
"# remote_run.cancel_iteration(1)"
|
"# remote_run.cancel_iteration(1)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -345,7 +354,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -363,8 +372,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model based on any other metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run/model which has the smallest `log_loss` value."
|
"Show the run and the model which has the smallest `log_loss` value:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -383,8 +392,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a specific iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and model from the 3rd iteration."
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -403,7 +412,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Testing the Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n",
|
"### Test the Best Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data"
|
"#### Load Test Data"
|
||||||
]
|
]
|
||||||
@@ -415,8 +424,8 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_digits = digits.data[:10, :]\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
"y_digits = digits.target[:10]\n",
|
"y_test = digits.target[:10]\n",
|
||||||
"images = digits.images[:10]"
|
"images = digits.images[:10]"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -424,7 +433,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing our best pipeline"
|
"#### Test Our Best Fitted Model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -433,11 +442,11 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#Randomly select digits and test\n",
|
"# Randomly select digits and test.\n",
|
||||||
"for index in np.random.choice(len(y_digits), 2):\n",
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
" print(index)\n",
|
" print(index)\n",
|
||||||
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
|
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
||||||
" label = y_digits[index]\n",
|
" label = y_test[index]\n",
|
||||||
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
||||||
" fig = plt.figure(1, figsize=(3,3))\n",
|
" fig = plt.figure(1, figsize=(3,3))\n",
|
||||||
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
||||||
@@ -448,6 +457,11 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -15,33 +15,33 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"# AutoML 03: Remote Execution using Batch AI\n",
|
"# AutoML 03: Remote Execution using Batch AI\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this example we use the scikit learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) to showcase how you can use AutoML for a simple classification problem.\n",
|
"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [setup](setup.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"In this notebook you will learn how to:\n",
|
||||||
"1. Creating an Experiment using an existing Workspace\n",
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
"2. Attaching an existing Batch AI compute to a workspace\n",
|
"2. Attach an existing Batch AI compute to a workspace.\n",
|
||||||
"3. Instantiating AutoMLConfig \n",
|
"3. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
"4. Training the Model using the Batch AI\n",
|
"4. Train the model using Batch AI.\n",
|
||||||
"5. Exploring the results\n",
|
"5. Explore the results.\n",
|
||||||
"6. Testing the fitted model\n",
|
"6. Test the best fitted model.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In addition this notebook showcases the following features\n",
|
"In addition, this notebook showcases the following features:\n",
|
||||||
"- **Parallel** Executions for iterations\n",
|
"- **Parallel** executions for iterations\n",
|
||||||
"- Asyncronous tracking of progress\n",
|
"- **Asynchronous** tracking of progress\n",
|
||||||
"- **Cancelling** individual iterations or the entire run\n",
|
"- **Cancellation** of individual iterations or the entire run\n",
|
||||||
"- Retrieving models for any iteration or logged metric\n",
|
"- Retrieving models for any iteration or logged metric\n",
|
||||||
"- specify automl settings as **kwargs**\n"
|
"- Specifying AutoML settings as `**kwargs`\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Experiment\n",
|
"## Create an Experiment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a workspace. For AutoML you would need to create a <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -75,9 +75,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for the run history container in the workspace\n",
|
"# Choose a name for the run history container in the workspace.\n",
|
||||||
"experiment_name = 'automl-remote-batchai'\n",
|
"experiment_name = 'automl-remote-batchai'\n",
|
||||||
"# project folder\n",
|
|
||||||
"project_folder = './sample_projects/automl-remote-batchai'\n",
|
"project_folder = './sample_projects/automl-remote-batchai'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
@@ -100,7 +99,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Diagnostics\n",
|
"## Diagnostics\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases"
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -120,9 +119,9 @@
|
|||||||
"## Create Batch AI Cluster\n",
|
"## Create Batch AI Cluster\n",
|
||||||
"The cluster is created as Machine Learning Compute and will appear under your workspace.\n",
|
"The cluster is created as Machine Learning Compute and will appear under your workspace.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<b>Note</b>: The cluster creation can take over 10 minutes, please be patient.\n",
|
"**Note:** The creation of the Batch AI cluster can take over 10 minutes; please be patient.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As with other Azure services, there are limits on certain resources (for eg. BatchAI cluster size) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
"As with other Azure services, there are limits on certain resources (e.g. Batch AI cluster size) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -134,35 +133,35 @@
|
|||||||
"from azureml.core.compute import BatchAiCompute\n",
|
"from azureml.core.compute import BatchAiCompute\n",
|
||||||
"from azureml.core.compute import ComputeTarget\n",
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for your cluster\n",
|
"# Choose a name for your cluster.\n",
|
||||||
"batchai_cluster_name = ws.name + \"cpu\"\n",
|
"batchai_cluster_name = \"mybatchai\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"found = False\n",
|
"found = False\n",
|
||||||
"# see if this compute target already exists in the workspace\n",
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
"for ct in ws.compute_targets():\n",
|
"for ct_name, ct in ws.compute_targets().items():\n",
|
||||||
" print(ct.name, ct.type)\n",
|
" print(ct.name, ct.type)\n",
|
||||||
" if (ct.name == batchai_cluster_name and ct.type == 'BatchAI'):\n",
|
" if (ct.name == batchai_cluster_name and ct.type == 'BatchAI'):\n",
|
||||||
" found = True\n",
|
" found = True\n",
|
||||||
" print('found compute target. just use it.')\n",
|
" print('Found existing compute target.')\n",
|
||||||
" compute_target = ct\n",
|
" compute_target = ct\n",
|
||||||
" break\n",
|
" break\n",
|
||||||
" \n",
|
" \n",
|
||||||
"if not found:\n",
|
"if not found:\n",
|
||||||
" print('creating a new compute target...')\n",
|
" print('Creating a new compute target...')\n",
|
||||||
" provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
" provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
" #vm_priority = 'lowpriority', # optional\n",
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
" autoscale_enabled = True,\n",
|
" autoscale_enabled = True,\n",
|
||||||
" cluster_min_nodes = 1, \n",
|
" cluster_min_nodes = 1, \n",
|
||||||
" cluster_max_nodes = 4)\n",
|
" cluster_max_nodes = 4)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" # create the cluster\n",
|
" # Create the cluster.\n",
|
||||||
" compute_target = ComputeTarget.create(ws, batchai_cluster_name, provisioning_config)\n",
|
" compute_target = ComputeTarget.create(ws, batchai_cluster_name, provisioning_config)\n",
|
||||||
" \n",
|
" \n",
|
||||||
" # can poll for a minimum number of nodes and for a specific timeout. \n",
|
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
" # if no min node count is provided it will use the scale settings for the cluster\n",
|
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
" \n",
|
" \n",
|
||||||
" # For a more detailed view of current BatchAI cluster status, use the 'status' property "
|
" # For a more detailed view of current Batch AI cluster status, use the 'status' property."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -170,7 +169,8 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Get Data File\n",
|
"## Create Get Data File\n",
|
||||||
"For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file."
|
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
|
||||||
|
"In this example, the `get_data()` function returns data using scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -198,10 +198,10 @@
|
|||||||
"def get_data():\n",
|
"def get_data():\n",
|
||||||
" \n",
|
" \n",
|
||||||
" digits = datasets.load_digits()\n",
|
" digits = datasets.load_digits()\n",
|
||||||
" X_digits = digits.data\n",
|
" X_train = digits.data\n",
|
||||||
" y_digits = digits.target\n",
|
" y_train = digits.target\n",
|
||||||
"\n",
|
"\n",
|
||||||
" return { \"X\" : X_digits, \"y\" : y_digits }"
|
" return { \"X\" : X_train, \"y\" : y_train }"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -210,17 +210,17 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Instantiate AutoML <a class=\"anchor\" id=\"Instatiate-AutoML-Remote-DSVM\"></a>\n",
|
"## Instantiate AutoML <a class=\"anchor\" id=\"Instatiate-AutoML-Remote-DSVM\"></a>\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n",
|
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.</i>\n",
|
"**Note:** When using Batch AI, you can't pass Numpy arrays directly to the fit method.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"|Property|Description|\n",
|
"|Property|Description|\n",
|
||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
|
||||||
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
|
"|**max_time_sec**|Time limit in seconds for each iteration.|\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM."
|
"|**concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|"
|
||||||
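The settings in the table above are passed to `AutoMLConfig` as keyword arguments. A minimal sketch of that `**kwargs` pattern, using a hypothetical `make_config` stand-in rather than the real azureml SDK class (the setting names are taken from the table; the function is an assumption for illustration only):

```python
# Sketch of the **kwargs pattern described above. make_config is a
# hypothetical stand-in for AutoMLConfig, not the azureml SDK class.
automl_settings = {
    "primary_metric": "AUC_weighted",
    "max_time_sec": 3600,
    "iterations": 10,
    "n_cross_validations": 5,
    "concurrent_iterations": 2,
}

def make_config(task, **kwargs):
    # Merge the per-experiment settings into a single configuration dict.
    return {"task": task, **kwargs}

config = make_config("classification", **automl_settings)
```

Keeping the settings in one dictionary makes it easy to reuse the same configuration across local and remote runs.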
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -248,6 +248,16 @@
|
|||||||
" )\n"
|
" )\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train the Models\n",
|
||||||
|
"\n",
|
||||||
|
"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n",
|
||||||
|
"In this example, we specify `show_output = False` to suppress console output while the run is in progress."
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -261,10 +271,10 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Exploring the Results\n",
|
"## Explore the Results\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Loading executed runs\n",
|
"#### Loading executed runs\n",
|
||||||
"In case you need to load a previously executed run given a run id please enable the below cell"
|
"In case you need to load a previously executed run, enable the cell below and replace the `run_id` value."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -278,13 +288,13 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for monitoring runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n",
|
"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -312,7 +322,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# wait till the run finishes\n",
|
"# Wait until the run finishes.\n",
|
||||||
"remote_run.wait_for_completion(show_output = True)"
|
"remote_run.wait_for_completion(show_output = True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -322,7 +332,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -346,9 +356,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Canceling runs\n",
|
"## Cancelling Runs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions"
|
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -357,10 +367,10 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Cancel the ongoing experiment and stop scheduling new iterations\n",
|
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
|
||||||
"# remote_run.cancel()\n",
|
"# remote_run.cancel()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Cancel iteration 1 and move onto iteration 2\n",
|
"# Cancel iteration 1 and move onto iteration 2.\n",
|
||||||
"# remote_run.cancel_iteration(1)"
|
"# remote_run.cancel_iteration(1)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -370,7 +380,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -388,8 +398,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model based on any other metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Show the run/model which has the smallest `log_loss` value."
|
"Show the run and the model which has the smallest `log_loss` value:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -408,8 +418,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a specific iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Show the run and model from the 3rd iteration."
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -424,25 +434,6 @@
|
|||||||
"print(third_model)"
|
"print(third_model)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Register fitted model for deployment"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"description = 'AutoML Model'\n",
|
|
||||||
"tags = None\n",
|
|
||||||
"remote_run.register_model(description=description, tags=tags)\n",
|
|
||||||
"remote_run.model_id # Use this id to deploy the model as a web service in Azure"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
@@ -459,8 +450,8 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_digits = digits.data[:10, :]\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
"y_digits = digits.target[:10]\n",
|
"y_test = digits.target[:10]\n",
|
||||||
"images = digits.images[:10]"
|
"images = digits.images[:10]"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -468,7 +459,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing our best pipeline"
|
"#### Testing Our Best Fitted Model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -477,11 +468,11 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#Randomly select digits and test\n",
|
"# Randomly select digits and test.\n",
|
||||||
"for index in np.random.choice(len(y_digits), 2):\n",
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
" print(index)\n",
|
" print(index)\n",
|
||||||
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
|
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
||||||
" label = y_digits[index]\n",
|
" label = y_test[index]\n",
|
||||||
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
||||||
" fig = plt.figure(1, figsize=(3,3))\n",
|
" fig = plt.figure(1, figsize=(3,3))\n",
|
||||||
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
||||||
@@ -489,16 +480,14 @@
|
|||||||
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
||||||
" plt.show()"
|
" plt.show()"
|
||||||
]
|
]
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": []
|
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -13,36 +13,36 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Auto ML : Remote Execution with Text data from Blobstorage\n",
|
"# Auto ML 04: Remote Execution with Text Data from Azure Blob Storage\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this example we use the [Burning Man 2016 dataset](https://innovate.burningman.org/datasets-page/) to showcase how you can use AutoML to handle text data from a Azure blobstorage.\n",
|
"In this example we use the [Burning Man 2016 dataset](https://innovate.burningman.org/datasets-page/) to showcase how you can use AutoML to handle text data from Azure Blob Storage.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"In this notebook you will learn how to:\n",
|
||||||
"1. Creating an Experiment using an existing Workspace\n",
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
"2. Attaching an existing DSVM to a workspace\n",
|
"2. Attach an existing DSVM to a workspace.\n",
|
||||||
"3. Instantiating AutoMLConfig \n",
|
"3. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
"4. Training the Model using the DSVM\n",
|
"4. Train the model using the DSVM.\n",
|
||||||
"5. Exploring the results\n",
|
"5. Explore the results.\n",
|
||||||
"6. Testing the fitted model\n",
|
"6. Test the best fitted model.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In addition this notebook showcases the following features\n",
|
"In addition this notebook showcases the following features\n",
|
||||||
"- **Parallel** Executions for iterations\n",
|
"- **Parallel** executions for iterations\n",
|
||||||
"- Asyncronous tracking of progress\n",
|
"- **Asynchronous** tracking of progress\n",
|
||||||
"- **Cancelling** individual iterations or the entire run\n",
|
"- **Cancellation** of individual iterations or the entire run\n",
|
||||||
"- Retrieving models for any iteration or logged metric\n",
|
"- Retrieving models for any iteration or logged metric\n",
|
||||||
"- specify automl settings as **kwargs**\n",
|
"- Specifying AutoML settings as `**kwargs`\n",
|
||||||
"- handling **text** data with **preprocess** flag\n"
|
"- Handling **text** data using the `preprocess` flag\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Experiment\n",
|
"## Create an Experiment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -76,9 +76,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for the run history container in the workspace\n",
|
"# Choose a name for the run history container in the workspace.\n",
|
||||||
"experiment_name = 'automl-remote-dsvm-blobstore'\n",
|
"experiment_name = 'automl-remote-dsvm-blobstore'\n",
|
||||||
"# project folder\n",
|
|
||||||
"project_folder = './sample_projects/automl-remote-dsvm-blobstore'\n",
|
"project_folder = './sample_projects/automl-remote-dsvm-blobstore'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
@@ -101,7 +100,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Diagnostics\n",
|
"## Diagnostics\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases"
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -119,11 +118,11 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Attach a Remote Linux DSVM\n",
|
"## Attach a Remote Linux DSVM\n",
|
||||||
"To use remote docker commpute target:\n",
|
"To use a remote Docker compute target:\n",
|
||||||
"1. Create a Linux DSVM in Azure. Here is some [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor, NOT CentOS. Make sure that disk space is available under /tmp because AutoML creates files under /tmp/azureml_runs. The DSVM should have more cores than the number of parallel runs that you plan to enable. It should also have at least 4Gb per core.\n",
|
"1. Create a Linux DSVM in Azure, following these [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor (not CentOS). Make sure that disk space is available under `/tmp` because AutoML creates files under `/tmp/azureml_runs`. The DSVM should have more cores than the number of parallel runs that you plan to enable. It should also have at least 4GB per core.\n",
|
||||||
"2. Enter the IP address, username and password below\n",
|
"2. Enter the IP address, user name and password below.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you can switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this."
|
"**Note:** By default, SSH runs on port 22 and you don't need to change the port number below. If you've configured SSH to use a different port, change `dsvm_ssh_port` accordingly. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on changing SSH ports for security reasons."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -133,14 +132,32 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.compute import RemoteCompute\n",
|
"from azureml.core.compute import RemoteCompute\n",
|
||||||
|
"import time\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Add your VM information below\n",
|
"# Add your VM information below\n",
|
||||||
"dsvm_name = 'mydsvm1'\n",
|
"# If a compute with the specified compute_name already exists, it will be used and the dsvm_ip_addr, dsvm_ssh_port, \n",
|
||||||
|
"# dsvm_username and dsvm_password will be ignored.\n",
|
||||||
|
"compute_name = 'mydsvm'\n",
|
||||||
"dsvm_ip_addr = '<<ip_addr>>'\n",
|
"dsvm_ip_addr = '<<ip_addr>>'\n",
|
||||||
|
"dsvm_ssh_port = 22\n",
|
||||||
"dsvm_username = '<<username>>'\n",
|
"dsvm_username = '<<username>>'\n",
|
||||||
"dsvm_password = '<<password>>'\n",
|
"dsvm_password = '<<password>>'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"dsvm_compute = RemoteCompute.attach(workspace=ws, name=dsvm_name, address=dsvm_ip_addr, username=dsvm_username, password=dsvm_password, ssh_port=22)"
|
"if compute_name in ws.compute_targets():\n",
|
||||||
|
" print('Using existing compute.')\n",
|
||||||
|
" dsvm_compute = ws.compute_targets()[compute_name]\n",
|
||||||
|
"else:\n",
|
||||||
|
" RemoteCompute.attach(workspace=ws, name=compute_name, address=dsvm_ip_addr, username=dsvm_username, password=dsvm_password, ssh_port=dsvm_ssh_port)\n",
|
||||||
|
"\n",
|
||||||
|
" while ws.compute_targets()[compute_name].provisioning_state == 'Creating':\n",
|
||||||
|
" time.sleep(1)\n",
|
||||||
|
"\n",
|
||||||
|
" dsvm_compute = ws.compute_targets()[compute_name]\n",
|
||||||
|
" \n",
|
||||||
|
" if dsvm_compute.provisioning_state == 'Failed':\n",
|
||||||
|
" print('Attach failed.')\n",
|
||||||
|
" print(dsvm_compute.provisioning_errors)\n",
|
||||||
|
" dsvm_compute.delete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -148,9 +165,8 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Get Data File\n",
|
"## Create Get Data File\n",
|
||||||
"For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
|
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
|
||||||
"\n",
|
"In this example, the `get_data()` function returns a [dictionary](README.md#getdata)."
|
||||||
"The *get_data()* function returns a [dictionary](README.md#getdata)."
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -176,18 +192,18 @@
|
|||||||
"from sklearn.preprocessing import LabelEncoder\n",
|
"from sklearn.preprocessing import LabelEncoder\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def get_data():\n",
|
"def get_data():\n",
|
||||||
" # Burning man 2016 data\n",
|
" # Load Burning Man 2016 data.\n",
|
||||||
" df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n",
|
" df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n",
|
||||||
" delimiter=\"\\t\", quotechar='\"')\n",
|
" delimiter=\"\\t\", quotechar='\"')\n",
|
||||||
" # get integer labels\n",
|
" # Get integer labels.\n",
|
||||||
" le = LabelEncoder()\n",
|
" le = LabelEncoder()\n",
|
||||||
" le.fit(df[\"Label\"].values)\n",
|
" le.fit(df[\"Label\"].values)\n",
|
||||||
" y = le.transform(df[\"Label\"].values)\n",
|
" y = le.transform(df[\"Label\"].values)\n",
|
||||||
" df = df.drop([\"Label\"], axis=1)\n",
|
" X = df.drop([\"Label\"], axis=1)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" df_train, _, y_train, _ = train_test_split(df, y, test_size=0.1, random_state=42)\n",
|
" X_train, _, y_train, _ = train_test_split(X, y, test_size = 0.1, random_state = 42)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" return { \"X\" : df, \"y\" : y }"
|
" return { \"X\" : X_train, \"y\" : y_train }"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -196,7 +212,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### View data\n",
|
"### View data\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can execute the *get_data()* function locally to view the *train* data"
|
"You can execute the `get_data()` function locally to view the training data."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -218,21 +234,21 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Instantiate AutoML <a class=\"anchor\" id=\"Instatiate-AutoML-Remote-DSVM\"></a>\n",
|
"## Configure AutoML <a class=\"anchor\" id=\"Instatiate-AutoML-Remote-DSVM\"></a>\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n",
|
"You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local executions too.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.</i>\n",
|
"**Note:** When using Remote DSVM, you can't pass Numpy arrays directly to the fit method.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"|Property|Description|\n",
|
"|Property|Description|\n",
|
||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
|
||||||
"|**max_time_sec**|Time limit in seconds for each iteration|\n",
|
"|**max_time_sec**|Time limit in seconds for each iteration.|\n",
|
||||||
"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
"|**n_cross_validations**|Number of cross validation splits|\n",
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
"|**concurrent_iterations**|Max number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM\n",
|
"|**concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|\n",
|
||||||
"|**preprocess**| *True/False* <br>Setting this to *True* enables AutoML to perform preprocessing <br>on the input to handle *missing data*, and perform some common *feature extraction*|\n",
|
"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
|
||||||
"|**max_cores_per_iteration**| Indicates how many cores on the compute target would be used to train a single pipeline.<br> Default is *1*, you can set it to *-1* to use all cores|"
|
"|**max_cores_per_iteration**|Indicates how many cores on the compute target would be used to train a single pipeline.<br>Default is *1*; you can set it to *-1* to use all cores.|"
|
||||||
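As a rough illustration of what the `preprocess` flag enables (missing-value handling plus simple feature extraction from text), here is a toy sketch in plain Python. This is an assumption-laden simplification for intuition only; AutoML's actual preprocessing is more sophisticated and its internals are not shown in this notebook:

```python
# Toy illustration of preprocessing: impute a missing numeric value with the
# column mean, and extract a simple length feature from a text column.
rows = [
    {"amount": 10.0, "desc": "fire art installation"},
    {"amount": None, "desc": "music camp"},
    {"amount": 30.0, "desc": "yoga"},
]

# Impute missing 'amount' values with the mean of the observed values.
observed = [r["amount"] for r in rows if r["amount"] is not None]
mean_amount = sum(observed) / len(observed)
for r in rows:
    if r["amount"] is None:
        r["amount"] = mean_amount

# Simple feature extraction from text: word count of the description.
for r in rows:
    r["desc_word_count"] = len(r["desc"].split())
```

With `preprocess` set to *True*, steps in this spirit are applied automatically to the data returned by `get_data()`.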
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -262,9 +278,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Training the Model <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
+"## Train the Models <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
 "\n",
-"For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets/models even when the experiment is running to retreive the best model up to that point. Once you are satisfied with the model you can cancel a particular iteration or the whole run."
+"Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run."
 ]
 },
 {
||||||
@@ -281,13 +297,13 @@
 "metadata": {},
 "source": [
 "## Exploring the Results <a class=\"anchor\" id=\"Exploring-the-Results-Remote-DSVM\"></a>\n",
-"#### Widget for monitoring runs\n",
+"#### Widget for Monitoring Runs\n",
 "\n",
-"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
+"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
-"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs\n",
+"You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n",
 "\n",
-"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
+"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
 ]
 },
 {
@@ -306,7 +322,7 @@
 "source": [
 "\n",
 "#### Retrieve All Child Runs\n",
-"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
+"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
 ]
 },
 {
@@ -330,8 +346,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Canceling runs\n",
+"## Cancelling Runs\n",
-"You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions"
+"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
 ]
 },
 {
@@ -340,10 +356,10 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Cancel the ongoing experiment and stop scheduling new iterations\n",
+"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
 "remote_run.cancel()\n",
 "\n",
-"# Cancel iteration 1 and move onto iteration 2\n",
+"# Cancel iteration 1 and move onto iteration 2.\n",
 "# remote_run.cancel_iteration(1)"
 ]
 },
@@ -353,7 +369,7 @@
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
-"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
+"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
 ]
 },
 {
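The `get_output` overloads described in this hunk boil down to selecting one child run by a logged metric or by iteration index. A minimal pure-Python sketch of that selection logic (the run list and metric values here are hypothetical illustrations, not the AutoML SDK API):

```python
# Hypothetical child-run metrics, keyed by iteration index.
child_metrics = {0: 0.91, 1: 0.95, 2: 0.89, 3: 0.93}

def best_iteration(metrics):
    """Return the iteration whose logged metric is highest,
    mirroring what get_output(metric=...) does conceptually."""
    return max(metrics, key=metrics.get)

print(best_iteration(child_metrics))  # iteration 1 has the best score
```

Retrieving a model "from a specific iteration", as the later cells do, is then just a dictionary lookup by iteration index instead of a `max` over the metric values.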
@@ -371,7 +387,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#### Best Model based on any other metric"
+"#### Best Model Based on Any Other Metric\n",
+"Show the run and the model which has the smallest `accuracy` value:"
 ]
 },
 {
@@ -388,7 +405,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#### Model from a specific iteration"
+"#### Model from a Specific Iteration"
 ]
 },
 {
@@ -401,25 +418,6 @@
 "zero_run, zero_model = remote_run.get_output(iteration = iteration)"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Register fitted model for deployment"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"description = 'AutoML Model'\n",
-"tags = None\n",
-"remote_run.register_model(description=description, tags=tags)\n",
-"remote_run.model_id # Use this id to deploy the model as a web service in Azure"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -445,12 +443,12 @@
 "le = LabelEncoder()\n",
 "le.fit(df[\"Label\"].values)\n",
 "y = le.transform(df[\"Label\"].values)\n",
-"df = df.drop([\"Label\"], axis=1)\n",
+"X = df.drop([\"Label\"], axis=1)\n",
 "\n",
-"_, df_test, _, y_test = train_test_split(df, y, test_size=0.1, random_state=42)\n",
+"_, X_test, _, y_test = train_test_split(X, y, test_size=0.1, random_state=42)\n",
 "\n",
 "\n",
-"ypred = fitted_model.predict(df_test.values)\n",
+"ypred = fitted_model.predict(X_test.values)\n",
 "\n",
 "\n",
 "ypred_strings = le.inverse_transform(ypred)\n",
@@ -462,16 +460,14 @@
 "\n",
 "cm.plot()"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "savitam"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -13,33 +13,32 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# AutoML 05 : Blacklisting models, Early termination and handling missing data\n",
+"# AutoML 05: Blacklisting Models, Early Termination, and Handling Missing Data\n",
 "\n",
-"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for handling missing values in data. We also provide a stopping metric indicating a target for the primary metric so that AutoML can terminate the run without necessarly going through all the iterations. Finally, if you want to avoid a certain pipeline, we allow you to specify a black list of algos that AutoML will ignore for this run.\n",
+"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for handling missing values in data. We also provide a stopping metric indicating a target for the primary metric so that AutoML can terminate the run without necessarily going through all the iterations. Finally, if you want to avoid a certain pipeline, we allow you to specify a blacklist of algorithms that AutoML will ignore for this run.\n",
 "\n",
 "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
 "\n",
-"In this notebook you would see\n",
+"In this notebook you will learn how to:\n",
-"1. Creating an Experiment using an existing Workspace\n",
+"1. Create an `Experiment` in an existing `Workspace`.\n",
-"2. Instantiating AutoMLConfig\n",
+"2. Configure AutoML using `AutoMLConfig`.\n",
-"4. Training the Model\n",
+"3. Train the model.\n",
-"5. Exploring the results\n",
+"4. Explore the results.\n",
-"6. Testing the fitted model\n",
+"5. Test the best fitted model.\n",
 "\n",
 "In addition this notebook showcases the following features\n",
-"- **Blacklist** certain pipelines\n",
+"- **Blacklisting** certain pipelines\n",
-"- Specify a **target metrics** to indicate stopping criteria\n",
+"- Specifying **target metrics** to indicate stopping criteria\n",
-"- Handling **Missing Data** in the input\n"
+"- Handling **missing data** in the input\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"## Create an Experiment\n",
 "\n",
-"## Create Experiment\n",
-"\n",
-"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
+"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
 ]
 },
 {
@@ -73,9 +72,8 @@
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
-"# choose a name for the experiment\n",
+"# Choose a name for the experiment.\n",
 "experiment_name = 'automl-local-missing-data'\n",
-"# project folder\n",
 "project_folder = './sample_projects/automl-local-missing-data'\n",
 "\n",
 "experiment = Experiment(ws, experiment_name)\n",
@@ -98,7 +96,7 @@
 "source": [
 "## Diagnostics\n",
 "\n",
-"Opt-in diagnostics for better experience, quality, and security of future releases"
+"Opt-in diagnostics for better experience, quality, and security of future releases."
 ]
 },
 {
@@ -115,7 +113,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Creating Missing Data"
+"### Creating missing data"
 ]
 },
 {
@@ -127,17 +125,17 @@
 "from scipy import sparse\n",
 "\n",
 "digits = datasets.load_digits()\n",
-"X_digits = digits.data[10:,:]\n",
+"X_train = digits.data[10:,:]\n",
-"y_digits = digits.target[10:]\n",
+"y_train = digits.target[10:]\n",
 "\n",
-"# Add missing values in 75% of the lines\n",
+"# Add missing values in 75% of the lines.\n",
 "missing_rate = 0.75\n",
-"n_missing_samples = int(np.floor(X_digits.shape[0] * missing_rate))\n",
+"n_missing_samples = int(np.floor(X_train.shape[0] * missing_rate))\n",
-"missing_samples = np.hstack((np.zeros(X_digits.shape[0] - n_missing_samples, dtype=np.bool), np.ones(n_missing_samples, dtype=np.bool)))\n",
+"missing_samples = np.hstack((np.zeros(X_train.shape[0] - n_missing_samples, dtype=np.bool), np.ones(n_missing_samples, dtype=np.bool)))\n",
 "rng = np.random.RandomState(0)\n",
 "rng.shuffle(missing_samples)\n",
-"missing_features = rng.randint(0, X_digits.shape[1], n_missing_samples)\n",
+"missing_features = rng.randint(0, X_train.shape[1], n_missing_samples)\n",
-"X_digits[np.where(missing_samples)[0], missing_features] = np.nan"
+"X_train[np.where(missing_samples)[0], missing_features] = np.nan"
 ]
 },
 {
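The missing-value injection in this hunk can be exercised standalone. A self-contained numpy sketch of the same pattern (the feature matrix here is a random stand-in, and `dtype=bool` replaces the deprecated `np.bool`):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 8)  # stand-in feature matrix

missing_rate = 0.75
n_missing = int(np.floor(X.shape[0] * missing_rate))

# Mark 75% of the rows, then put one NaN in a random column of each.
mask = np.hstack((np.zeros(X.shape[0] - n_missing, dtype=bool),
                  np.ones(n_missing, dtype=bool)))
rng.shuffle(mask)
cols = rng.randint(0, X.shape[1], n_missing)
X[np.where(mask)[0], cols] = np.nan

print(int(np.isnan(X).sum()))  # 75 NaN entries, one per selected row
```

Because each selected row receives exactly one NaN, the count of NaN cells equals the count of affected rows; AutoML's `preprocess = True` is what later imputes these values.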
@@ -146,8 +144,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"df = pd.DataFrame(data=X_digits)\n",
+"df = pd.DataFrame(data = X_train)\n",
-"df['Label'] = pd.Series(y_digits, index=df.index)\n",
+"df['Label'] = pd.Series(y_train, index=df.index)\n",
 "df.head()"
 ]
 },
@@ -155,21 +153,20 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Instantiate Auto ML Config\n",
+"## Configure AutoML\n",
 "\n",
-"\n",
-"This defines the settings and data used to run the experiment.\n",
+"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment. This includes setting `exit_score`, which should cause the run to complete before the `iterations` count is reached.\n",
 "\n",
 "|Property|Description|\n",
 "|-|-|\n",
 "|**task**|classification or regression|\n",
-"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
+"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
-"|**max_time_sec**|Time limit in seconds for each iteration|\n",
+"|**max_time_sec**|Time limit in seconds for each iteration.|\n",
-"|**iterations**|Number of iterations. In each iteration Auto ML trains the data with a specific pipeline|\n",
+"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
-"|**n_cross_validations**|Number of cross validation splits|\n",
+"|**n_cross_validations**|Number of cross validation splits.|\n",
-"|**preprocess**| *True/False* <br>Setting this to *True* enables Auto ML to perform preprocessing <br>on the input to handle *missing data*, and perform some common *feature extraction*|\n",
+"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
-"|**exit_score**|*double* value indicating the target for *primary_metric*. <br> Once the target is surpassed the run terminates|\n",
+"|**exit_score**|*double* value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
-"|**blacklist_algos**|*Array* of *strings* indicating pipelines to ignore for Auto ML.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGDClassifierWrapper</i><br><i>NBWrapper</i><br><i>BernoulliNB</i><br><i>SVCWrapper</i><br><i>LinearSVMWrapper</i><br><i>KNeighborsClassifier</i><br><i>DecisionTreeClassifier</i><br><i>RandomForestClassifier</i><br><i>ExtraTreesClassifier</i><br><i>LightGBMClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet<i><br><i>GradientBoostingRegressor<i><br><i>DecisionTreeRegressor<i><br><i>KNeighborsRegressor<i><br><i>LassoLars<i><br><i>SGDRegressor<i><br><i>RandomForestRegressor<i><br><i>ExtraTreesRegressor<i>|\n",
+"|**blacklist_algos**|*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGDClassifierWrapper</i><br><i>NBWrapper</i><br><i>BernoulliNB</i><br><i>SVCWrapper</i><br><i>LinearSVMWrapper</i><br><i>KNeighborsClassifier</i><br><i>DecisionTreeClassifier</i><br><i>RandomForestClassifier</i><br><i>ExtraTreesClassifier</i><br><i>LightGBMClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet<i><br><i>GradientBoostingRegressor<i><br><i>DecisionTreeRegressor<i><br><i>KNeighborsRegressor<i><br><i>LassoLars<i><br><i>SGDRegressor<i><br><i>RandomForestRegressor<i><br><i>ExtraTreesRegressor<i>|\n",
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
@@ -188,11 +185,11 @@
 " iterations = 20,\n",
 " n_cross_validations = 5,\n",
 " preprocess = True,\n",
-" exit_score = 0.994,\n",
+" exit_score = 0.9984,\n",
 " blacklist_algos = ['KNeighborsClassifier','LinearSVMWrapper'],\n",
 " verbosity = logging.INFO,\n",
-" X = X_digits, \n",
+" X = X_train, \n",
-" y = y_digits,\n",
+" y = y_train,\n",
 " path = project_folder)"
 ]
 },
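The `exit_score` behavior configured in this hunk amounts to stopping the iteration loop as soon as the primary metric surpasses the target, instead of running all scheduled iterations. A toy sketch of that control flow (the score sequence is made up; this is not the SDK's internal loop):

```python
def run_iterations(scores, exit_score):
    """Consume per-iteration scores in order; stop early once one
    surpasses exit_score, like AutoML's exit_score setting."""
    completed = []
    for score in scores:
        completed.append(score)
        if score > exit_score:
            break
    return completed

# Only 4 of the 6 planned iterations run; the 4th beats the 0.9984 target.
done = run_iterations([0.95, 0.97, 0.99, 0.9990, 0.96, 0.98], exit_score=0.9984)
print(len(done))  # 4
```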
@@ -200,10 +197,10 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Training the Model\n",
+"## Train the Models\n",
 "\n",
-"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
+"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
-"You will see the currently running iterations printing to the console."
+"In this example, we specify `show_output = True` to print currently running iterations to the console."
 ]
 },
 {
@@ -219,18 +216,18 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Exploring the results"
+"## Explore the Results"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#### Widget for monitoring runs\n",
+"#### Widget for Monitoring Runs\n",
 "\n",
-"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
+"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
 "\n",
-"NOTE: The widget will display a link at the bottom. This will not currently work, but will eventually link to a web-ui to explore the individual run details."
+"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
 ]
 },
 {
@@ -249,7 +246,7 @@
 "source": [
 "\n",
 "#### Retrieve All Child Runs\n",
-"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
+"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
 ]
 },
 {
@@ -275,7 +272,7 @@
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
-"Below we select the best pipeline from our iterations. Each pipeline is a tuple of three elements. The first element is the score for the pipeline the second element is the string description of the pipeline and the last element are the pipeline objects used for each fold in the cross-validation."
+"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
 ]
 },
 {
@@ -291,7 +288,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#### Best Model based on any other metric"
+"#### Best Model Based on Any Other Metric\n",
+"Show the run and the model which has the smallest `accuracy` value:"
 ]
 },
 {
@@ -308,7 +306,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#### Model from a specific iteration"
+"#### Model from a Specific Iteration\n",
+"Show the run and the model from the third iteration:"
 ]
 },
 {
@@ -325,26 +324,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Register fitted model for deployment"
|
"### Testing the best Fitted Model"
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"description = 'AutoML Model'\n",
|
|
||||||
"tags = None\n",
|
|
||||||
"local_run.register_model(description=description, tags=tags)\n",
|
|
||||||
"local_run.model_id # Use this id to deploy the model as a web service in Azure"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Testing the Fitted Model "
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -354,15 +334,15 @@
 "outputs": [],
 "source": [
 "digits = datasets.load_digits()\n",
-"X_digits = digits.data[:10, :]\n",
+"X_test = digits.data[:10, :]\n",
-"y_digits = digits.target[:10]\n",
+"y_test = digits.target[:10]\n",
 "images = digits.images[:10]\n",
 "\n",
-"#Randomly select digits and test\n",
+"# Randomly select digits and test.\n",
-"for index in np.random.choice(len(y_digits), 2):\n",
+"for index in np.random.choice(len(y_test), 2, replace = False):\n",
 " print(index)\n",
-" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
+" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
-" label = y_digits[index]\n",
+" label = y_test[index]\n",
 " title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
 " fig = plt.figure(1, figsize=(3,3))\n",
 " ax1 = fig.add_axes((0,0,.8,.8))\n",
@@ -373,6 +353,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "savitam"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -13,31 +13,31 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# AutoML 06: Custom CV splits, handling sparse data\n",
+"# AutoML 06: Custom CV Splits and Handling Sparse Data\n",
 "\n",
-"In this example we use the scikit learn's [20newsgroup](In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for handling sparse data and specify custom cross validation splits.\n",
+"In this example we use scikit-learn's [20newsgroup dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_20newsgroups.html) to showcase how you can use AutoML for handling sparse data and how to specify custom cross-validation splits.\n",
 "\n",
 "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
 "\n",
-"In this notebook you would see\n",
+"In this notebook you will learn how to:\n",
-"1. Creating an Experiment using an existing Workspace\n",
+"1. Create an `Experiment` in an existing `Workspace`.\n",
-"2. Instantiating AutoMLConfig\n",
+"2. Configure AutoML using `AutoMLConfig`.\n",
-"4. Training the Model\n",
+"3. Train the model.\n",
-"5. Exploring the results\n",
+"4. Explore the results.\n",
-"6. Testing the fitted model\n",
+"5. Test the best fitted model.\n",
 "\n",
 "In addition this notebook showcases the following features\n",
 "- **Custom CV** splits \n",
-"- Handling **Sparse Data** in the input"
+"- Handling **sparse data** in the input"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Create Experiment\n",
+"## Create an Experiment\n",
 "\n",
-"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
+"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
 ]
 },
 {
@@ -96,7 +96,7 @@
 "source": [
 "## Diagnostics\n",
 "\n",
-"Opt-in diagnostics for better experience, quality, and security of future releases"
+"Opt-in diagnostics for better experience, quality, and security of future releases."
 ]
 },
 {
@@ -137,17 +137,17 @@
 " shuffle = True, random_state = 42,\n",
 " remove = remove)\n",
 "\n",
-"X_train, X_validation, y_train, y_validation = train_test_split(data_train.data, data_train.target, test_size=0.33, random_state=42)\n",
+"X_train, X_valid, y_train, y_valid = train_test_split(data_train.data, data_train.target, test_size = 0.33, random_state = 42)\n",
 "\n",
 "\n",
 "vectorizer = HashingVectorizer(stop_words = 'english', alternate_sign = False,\n",
 " n_features = 2**16)\n",
 "X_train = vectorizer.transform(X_train)\n",
-"X_validation = vectorizer.transform(X_validation)\n",
+"X_valid = vectorizer.transform(X_valid)\n",
 "\n",
 "summary_df = pd.DataFrame(index = ['No of Samples', 'No of Features'])\n",
 "summary_df['Train Set'] = [X_train.shape[0], X_train.shape[1]]\n",
-"summary_df['Validation Set'] = [X_validation.shape[0], X_validation.shape[1]]\n",
+"summary_df['Validation Set'] = [X_valid.shape[0], X_valid.shape[1]]\n",
 "summary_df"
 ]
 },
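The `HashingVectorizer` used in this hunk relies on the hashing trick: each token is hashed into one of `n_features` buckets and that bucket's count is incremented, so no vocabulary has to be stored and the output stays a fixed width. A minimal pure-Python sketch of the idea (using `hashlib` for a stable hash; this is an illustration, not scikit-learn's implementation):

```python
import hashlib

def hash_features(tokens, n_features=2**4):
    """Map a token list to a fixed-length count vector via the hashing trick."""
    vec = [0] * n_features
    for tok in tokens:
        # Stable hash of the token, reduced to a bucket index.
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % n_features] += 1
    return vec

vec = hash_features("the quick brown fox the lazy dog".split())
print(sum(vec))  # 7 tokens counted, with or without bucket collisions
```

With `n_features = 2**16` as above, collisions are rare for typical vocabularies, and the resulting matrix is naturally sparse, which is exactly the case this notebook feeds to AutoML.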
@@ -155,21
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Instantiate Auto ML Config\n",
+"## Configure AutoML\n",
 "\n",
-"This defines the settings and data used to run the experiment.\n",
+"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
 "\n",
 "|Property|Description|\n",
 "|-|-|\n",
 "|**task**|classification or regression|\n",
-"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
+"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
-"|**max_time_sec**|Time limit in seconds for each iteration|\n",
+"|**max_time_sec**|Time limit in seconds for each iteration.|\n",
-"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
+"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
-"|**preprocess**| *True/False* <br>Setting this to *True* enables Auto ML to perform preprocessing <br>on the input to handle *missing data*, and perform some common *feature extraction*<br>*Note: If input data is Sparse you cannot use preprocess=True*|\n",
+"|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and to perform some common *feature extraction*.<br>**Note:** If input data is sparse, you cannot use *True*.|\n",
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
|
"|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
|
||||||
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features] for the custom Validation set|\n",
|
"|**X_valid**|(sparse) array-like, shape = [n_samples, n_features] for the custom validation set.|\n",
|
||||||
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. for the custom Validation set|\n",
|
"|**y_valid**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification for the custom validation set.|\n",
|
||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -188,8 +188,8 @@
|
|||||||
" verbosity = logging.INFO,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" X = X_train, \n",
|
" X = X_train, \n",
|
||||||
" y = y_train,\n",
|
" y = y_train,\n",
|
||||||
" X_valid = X_validation, \n",
|
" X_valid = X_valid, \n",
|
||||||
" y_valid = y_validation, \n",
|
" y_valid = y_valid, \n",
|
||||||
" path = project_folder)"
|
" path = project_folder)"
|
||||||
]
|
]
|
||||||
},
|
},
|
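The hunk above shows only part of the call; a fuller configuration sketch combining it with the parameters from the table (values illustrative, requires the azureml-sdk and an existing workspace, so not runnable standalone):

```python
import logging
from azureml.train.automl import AutoMLConfig

# Illustrative values; preprocess must stay False here because the
# hashed input matrix is sparse.
automl_config = AutoMLConfig(task = 'classification',
                             primary_metric = 'AUC_weighted',
                             max_time_sec = 3600,
                             iterations = 10,
                             preprocess = False,
                             verbosity = logging.INFO,
                             X = X_train,
                             y = y_train,
                             X_valid = X_valid,
                             y_valid = y_valid,
                             path = project_folder)
```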
||||||
@@ -197,10 +197,10 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Training the Model\n",
|
"## Train the Models\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"You will see the currently running iterations printing to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -216,18 +216,18 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Exploring the results"
|
"## Explore the Results"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for monitoring runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -246,7 +246,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"\n",
|
"\n",
|
||||||
"#### Retrieve All Child Runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use sdk methods to fetch all the child runs and see individual metrics that we log. "
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
]
|
||||||
},
|
},
|
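What such an aggregation produces can be illustrated without the SDK; a sketch with hypothetical per-child metrics (run names and values invented) assembled into a table shaped like the `rundata` frame used below:

```python
import pandas as pd

# Hypothetical metrics, shaped like {run_name: {metric: value}}.
children_metrics = {
    'AutoML_xxx_0': {'accuracy': 0.81, 'AUC_weighted': 0.88},
    'AutoML_xxx_1': {'accuracy': 0.85, 'AUC_weighted': 0.91},
    'AutoML_xxx_2': {'accuracy': 0.83, 'AUC_weighted': 0.90},
}
# Columns become the child runs, rows the logged metrics.
rundata = pd.DataFrame(children_metrics).sort_index()
print(rundata)
```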
||||||
{
|
{
|
||||||
@@ -266,20 +266,13 @@
|
|||||||
"rundata"
|
"rundata"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": []
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
]
|
||||||
},
|
},
|
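Conceptually, `get_output(metric = ...)` picks the iteration that scored best on the requested metric; a plain-Python illustration (scores invented, and assuming larger is better for the chosen metric):

```python
# Hypothetical per-iteration scores for one logged metric.
scores = {0: 0.81, 1: 0.86, 2: 0.84}

# Select the iteration with the best (largest) score.
best_iteration = max(scores, key=scores.get)
print(best_iteration, scores[best_iteration])  # 1 0.86
```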
||||||
{
|
{
|
||||||
@@ -295,7 +288,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model based on any other metric"
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
|
"Show the run and the model which has the smallest `accuracy` value:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -312,7 +306,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Model from a specific iteration"
|
"#### Model from a Specific Iteration\n",
|
||||||
|
"Show the run and the model from the third iteration:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -329,7 +324,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Register fitted model for deployment"
|
"### Testing the Best Fitted Model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -338,63 +333,34 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"description = 'AutoML Model'\n",
|
"# Load test data.\n",
|
||||||
"tags = None\n",
|
|
||||||
"local_run.register_model(description=description, tags=tags)\n",
|
|
||||||
"local_run.model_id # Use this id to deploy the model as a web service in Azure"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### Testing the Fitted Model "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"digits = datasets.load_digits()### Testing the Fitted Model\n",
|
|
||||||
"\n",
|
|
||||||
"#### Load Test Data\n",
|
|
||||||
"import sklearn\n",
|
|
||||||
"from pandas_ml import ConfusionMatrix\n",
|
"from pandas_ml import ConfusionMatrix\n",
|
||||||
"\n",
|
"\n",
|
||||||
"remove = ('headers', 'footers', 'quotes')\n",
|
|
||||||
"categories = [\n",
|
|
||||||
" 'alt.atheism',\n",
|
|
||||||
" 'talk.religion.misc',\n",
|
|
||||||
" 'comp.graphics',\n",
|
|
||||||
" 'sci.space',\n",
|
|
||||||
"]\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"data_test = fetch_20newsgroups(subset = 'test', categories = categories,\n",
|
"data_test = fetch_20newsgroups(subset = 'test', categories = categories,\n",
|
||||||
" shuffle = True, random_state = 42,\n",
|
" shuffle = True, random_state = 42,\n",
|
||||||
" remove = remove)\n",
|
" remove = remove)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,\n",
|
|
||||||
" n_features=2**16)\n",
|
|
||||||
"\n",
|
|
||||||
"X_test = vectorizer.transform(data_test.data)\n",
|
"X_test = vectorizer.transform(data_test.data)\n",
|
||||||
"y_test = data_test.target\n",
|
"y_test = data_test.target\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Testing our best pipeline\n",
|
"# Test our best pipeline.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"ypred = fitted_model.predict(X_test)\n",
|
"y_pred = fitted_model.predict(X_test)\n",
|
||||||
"ypred_strings = [categories[i] for i in ypred]\n",
|
"y_pred_strings = [data_test.target_names[i] for i in y_pred]\n",
|
||||||
"ytest_strings = [categories[i] for i in y_test]\n",
|
"y_test_strings = [data_test.target_names[i] for i in y_test]\n",
|
||||||
"\n",
|
"\n",
|
||||||
"cm = ConfusionMatrix(ytest_strings, ypred_strings)\n",
|
"cm = ConfusionMatrix(y_test_strings, y_pred_strings)\n",
|
||||||
"print(cm)\n",
|
"print(cm)\n",
|
||||||
"cm.plot()"
|
"cm.plot()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -13,17 +13,17 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# AutoML 07: Exploring previous runs\n",
|
"# AutoML 07: Exploring Previous Runs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this example we present some examples on navigating previously executed runs. We also show how you can download a fitted model for any previous run.\n",
|
"In this example we present some examples on navigating previously executed runs. We also show how you can download a fitted model for any previous run.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"In this notebook you will learn how to:\n",
|
||||||
"1. List all Experiments for the workspace\n",
|
"1. List all experiments in a workspace.\n",
|
||||||
"2. List AutoML runs for an Experiment\n",
|
"2. List all AutoML runs in an experiment.\n",
|
||||||
"3. Get details for a AutoML Run. (Automl settings, run widget & all metrics)\n",
|
"3. Get details for an AutoML run, including settings, run widget, and all metrics.\n",
|
||||||
"4. Download fitted pipeline for any iteration\n"
|
"4. Download a fitted pipeline for any iteration.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -87,7 +87,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Diagnostics\n",
|
"## Diagnostics\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases"
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -104,8 +104,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# List AutoML runs for an Experiment\n",
|
"# List AutoML runs for an experiment\n",
|
||||||
"You can set <i>Experiment</i> name with any experiment name from the result of the Experiment.list cell to load the AutoML runs."
|
"Set `experiment_name` to any experiment name from the result of the Experiment.list cell to load the AutoML runs."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -114,7 +114,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"experiment_name = 'automl-local-classification' # Replace this with any project name from previous cell\n",
|
"experiment_name = 'automl-local-classification' # Replace this with any project name from previous cell.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"proj = ws.experiments()[experiment_name]\n",
|
"proj = ws.experiments()[experiment_name]\n",
|
||||||
"summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])\n",
|
"summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])\n",
|
||||||
@@ -143,7 +143,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Get Details for a Auto ML Run\n",
|
"# Get details for an AutoML run\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Copy the project name and run id from the previous cell output to find more details on a particular run."
|
"Copy the project name and run id from the previous cell output to find more details on a particular run."
|
||||||
]
|
]
|
||||||
@@ -154,7 +154,8 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run_id = '' # Filling your own run_id\n",
|
"run_id = '' # Filling your own run_id from above run ids\n",
|
||||||
|
"assert (run_id in summary_df.keys()),\"Run id not found! Please set run id to a value from above run ids\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from azureml.train.widgets import RunDetails\n",
|
"from azureml.train.widgets import RunDetails\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -210,7 +211,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Download best model for any given metric"
|
"## Download the Best Model for Any Given Metric"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -219,7 +220,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"metric = 'AUC_weighted' # Replace with a metric name\n",
|
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
||||||
"best_run, fitted_model = ml_run.get_output(metric = metric)\n",
|
"best_run, fitted_model = ml_run.get_output(metric = metric)\n",
|
||||||
"fitted_model"
|
"fitted_model"
|
||||||
]
|
]
|
||||||
@@ -228,7 +229,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Download model for any given iteration"
|
"## Download the Model for Any Given Iteration"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -237,7 +238,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 4 # Replace with an interation number\n",
|
"iteration = 4 # Replace with an iteration number.\n",
|
||||||
"best_run, fitted_model = ml_run.get_output(iteration = iteration)\n",
|
"best_run, fitted_model = ml_run.get_output(iteration = iteration)\n",
|
||||||
"fitted_model"
|
"fitted_model"
|
||||||
]
|
]
|
||||||
@@ -246,7 +247,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Register fitted model for deployment"
|
"# Register fitted model for deployment\n",
|
||||||
|
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -258,14 +260,14 @@
|
|||||||
"description = 'AutoML Model'\n",
|
"description = 'AutoML Model'\n",
|
||||||
"tags = None\n",
|
"tags = None\n",
|
||||||
"ml_run.register_model(description = description, tags = tags)\n",
|
"ml_run.register_model(description = description, tags = tags)\n",
|
||||||
"ml_run.model_id # Use this id to deploy the model as a web service in Azure"
|
"ml_run.model_id # Use this id to deploy the model as a web service in Azure."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Register best model for any given metric"
|
"## Register the Best Model for Any Given Metric"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -274,18 +276,18 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"metric = 'AUC_weighted' # Replace with a metric name\n",
|
"metric = 'AUC_weighted' # Replace with a metric name.\n",
|
||||||
"description = 'AutoML Model'\n",
|
"description = 'AutoML Model'\n",
|
||||||
"tags = None\n",
|
"tags = None\n",
|
||||||
"ml_run.register_model(description = description, tags = tags, metric = metric)\n",
|
"ml_run.register_model(description = description, tags = tags, metric = metric)\n",
|
||||||
"ml_run.model_id # Use this id to deploy the model as a web service in Azure"
|
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Register model for any given iteration"
|
"## Register the Model for Any Given Iteration"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -294,15 +296,20 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"iteration = 4 # Replace with an interation number\n",
|
"iteration = 4 # Replace with an iteration number.\n",
|
||||||
"description = 'AutoML Model'\n",
|
"description = 'AutoML Model'\n",
|
||||||
"tags = None\n",
|
"tags = None\n",
|
||||||
"ml_run.register_model(description = description, tags = tags, iteration = iteration)\n",
|
"ml_run.register_model(description = description, tags = tags, iteration = iteration)\n",
|
||||||
"ml_run.model_id # Use this id to deploy the model as a web service in Azure"
|
"print(ml_run.model_id) # Use this id to deploy the model as a web service in Azure."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -13,15 +13,18 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# AutoML 08: Remote Execution with Text file\n",
|
"# AutoML 08: Remote Execution with DataStore\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this sample accesses a data file on a remote DSVM. This is more efficient than reading the file from Blob storage in the get_data method.\n",
|
"This sample accesses a data file on a remote DSVM through DataStore. Advantages of using data store are:\n",
|
||||||
|
"1. DataStore secures the access details.\n",
|
||||||
|
"2. DataStore supports read, write to blob and file store\n",
|
||||||
|
"3. AutoML natively supports copying data from DataStore to DSVM\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"In this notebook you would see\n",
|
||||||
"1. Configuring the DSVM to allow files to be access directly by the get_data method.\n",
|
"1. Storing data in DataStore.\n",
|
||||||
"2. get_data returning data from a local file.\n",
|
"2. get_data returning data from DataStore.\n",
|
||||||
"\n"
|
"\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -66,7 +69,7 @@
|
|||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for experiment\n",
|
"# choose a name for experiment\n",
|
||||||
"experiment_name = 'automl-remote-dsvm-file'\n",
|
"experiment_name = 'automl-remote-datastore-file'\n",
|
||||||
"# project folder\n",
|
"# project folder\n",
|
||||||
"project_folder = './sample_projects/automl-remote-dsvm-file'\n",
|
"project_folder = './sample_projects/automl-remote-dsvm-file'\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -120,15 +123,16 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.compute import DsvmCompute\n",
|
"from azureml.core.compute import DsvmCompute\n",
|
||||||
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
|
"\n",
|
||||||
|
"compute_target_name = 'mydsvm'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"dsvm_name = 'mydsvm'\n",
|
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
|
" dsvm_compute = DsvmCompute(workspace=ws, name=compute_target_name)\n",
|
||||||
" print('found existing dsvm.')\n",
|
" print('found existing:', dsvm_compute.name)\n",
|
||||||
"except:\n",
|
"except ComputeTargetException:\n",
|
||||||
" print('creating new dsvm.')\n",
|
|
||||||
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size=\"Standard_D2_v2\")\n",
|
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size=\"Standard_D2_v2\")\n",
|
||||||
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
|
" dsvm_compute = DsvmCompute.create(ws, name=compute_target_name, provisioning_configuration=dsvm_config)\n",
|
||||||
" dsvm_compute.wait_for_completion(show_output=True)"
|
" dsvm_compute.wait_for_completion(show_output=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -136,9 +140,18 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Copy data file to the DSVM\n",
|
"## Copy data file to local\n",
|
||||||
"Download the data file.\n",
|
"\n",
|
||||||
"Copy the data file to the DSVM under the folder: /tmp/data"
|
"Download the data file.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"mkdir data"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -149,9 +162,84 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n",
|
"df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n",
|
||||||
" delimiter=\"\\t\", quotechar='\"')\n",
|
" delimiter=\"\\t\", quotechar='\"')\n",
|
||||||
"df.to_csv(\"data.tsv\", sep=\"\\t\", quotechar='\"', index=False)\n",
|
"df.to_csv(\"data/data.tsv\", sep=\"\\t\", quotechar='\"', index=False)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Upload data to the cloud"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now make the data accessible remotely by uploading that data from your local machine into Azure so it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload/download data, and interact with it from your remote compute targets. It is backed by Azure blob storage account.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Now copy the file data.tsv to the folder /tmp/data on the DSVM"
|
"The data.tsv files are uploaded into a directory named data at the root of the datastore."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Workspace, Datastore\n",
|
||||||
|
"#blob_datastore = Datastore(ws, blob_datastore_name)\n",
|
||||||
|
"ds = ws.get_default_datastore()\n",
|
||||||
|
"print(ds.datastore_type, ds.account_name, ds.container_name)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# ds.upload_files(\"data.tsv\")\n",
|
||||||
|
"ds.upload(src_dir='./data', target_path='data', overwrite=True, show_progress=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Configure & Run\n",
|
||||||
|
"\n",
|
||||||
|
"First let's create a DataReferenceConfigruation object to inform the system what data folder to download to the copmute target."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import DataReferenceConfiguration\n",
|
||||||
|
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
|
||||||
|
" path_on_datastore='data', \n",
|
||||||
|
" mode='download', # download files from datastore to compute target\n",
|
||||||
|
" overwrite=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
|
"\n",
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to the Linux DSVM\n",
|
||||||
|
"conda_run_config.target = dsvm_compute.name\n",
|
||||||
|
"# set the data reference of the run coonfiguration\n",
|
||||||
|
"conda_run_config.data_references = {ds.name: dr}"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -186,20 +274,22 @@
|
|||||||
"from sklearn.model_selection import train_test_split\n",
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
"from sklearn.preprocessing import LabelEncoder\n",
|
"from sklearn.preprocessing import LabelEncoder\n",
|
||||||
"import os\n",
|
"import os\n",
|
||||||
|
"from os.path import expanduser, join, dirname\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def get_data():\n",
|
"def get_data():\n",
|
||||||
" # Burning man 2016 data\n",
|
" # Burning man 2016 data\n",
|
||||||
" df = pd.read_csv('/tmp/data/data.tsv',\n",
|
" df = pd.read_csv(join(dirname(os.path.realpath(__file__)),\n",
|
||||||
" delimiter=\"\\t\", quotechar='\"')\n",
|
" os.environ[\"AZUREML_DATAREFERENCE_workspacefilestore\"],\n",
|
||||||
|
" \"data.tsv\"), delimiter=\"\\t\", quotechar='\"')\n",
|
||||||
" # get integer labels\n",
|
" # get integer labels\n",
|
||||||
" le = LabelEncoder()\n",
|
" le = LabelEncoder()\n",
|
||||||
" le.fit(df[\"Label\"].values)\n",
|
" le.fit(df[\"Label\"].values)\n",
|
||||||
" y = le.transform(df[\"Label\"].values)\n",
|
" y = le.transform(df[\"Label\"].values)\n",
|
||||||
" df = df.drop([\"Label\"], axis=1)\n",
|
" X = df.drop([\"Label\"], axis=1)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" df_train, _, y_train, _ = train_test_split(df, y, test_size=0.1, random_state=42)\n",
|
" X_train, _, y_train, _ = train_test_split(X, y, test_size=0.1, random_state=42)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" return { \"X\" : df.values, \"y\" : y }"
|
" return { \"X\" : X_train.values, \"y\" : y_train }"
|
||||||
]
|
]
|
||||||
},
|
},
|
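The path construction in `get_data` above resolves the folder the data reference was downloaded to via an environment variable; a runnable sketch of just that mechanic (the variable name comes from the notebook, but here we set it manually for illustration instead of relying on the compute target):

```python
import os
from os.path import join, dirname

# On the compute target, AutoML sets this variable to the local folder
# the data reference was downloaded to; we fake it for illustration.
os.environ["AZUREML_DATAREFERENCE_workspacefilestore"] = "downloaded_data"

# Build the path relative to the script's own directory, as get_data does.
script_dir = dirname(os.path.realpath("get_data.py"))
data_path = join(script_dir,
                 os.environ["AZUREML_DATAREFERENCE_workspacefilestore"],
                 "data.tsv")
print(data_path.endswith(join("downloaded_data", "data.tsv")))  # True
```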
||||||
{
|
{
|
||||||
@@ -210,7 +300,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n",
|
"You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() symantic for local excutions too. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.</i>\n",
|
"<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to AutoMLConfig.</i>\n",
|
||||||
"\n",
|
"\n",
|
||||||
"|Property|Description|\n",
|
"|Property|Description|\n",
|
||||||
"|-|-|\n",
|
"|-|-|\n",
|
||||||
@@ -241,7 +331,8 @@
|
|||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
" path=project_folder,\n",
|
" path=project_folder,\n",
|
||||||
" compute_target = dsvm_compute,\n",
|
" run_configuration=conda_run_config,\n",
|
||||||
|
" #compute_target = dsvm_compute,\n",
|
||||||
" data_script = project_folder + \"/get_data.py\",\n",
|
" data_script = project_folder + \"/get_data.py\",\n",
|
||||||
" **automl_settings\n",
|
" **automl_settings\n",
|
||||||
" )"
|
" )"
|
||||||
@@ -251,7 +342,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Training the Model <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
+"## Training the Models <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
 "\n",
 "For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets/models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model you can cancel a particular iteration or the whole run."
 ]
@@ -319,7 +410,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Canceling runs\n",
+"## Canceling Runs\n",
 "You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions"
 ]
 },
@@ -330,7 +421,7 @@
 "outputs": [],
 "source": [
 "# Cancel the ongoing experiment and stop scheduling new iterations\n",
-"# remote_run.cancel()\n",
+"remote_run.cancel()\n",
 "\n",
 "# Cancel iteration 1 and move onto iteration 2\n",
 "# remote_run.cancel_iteration(1)"
@@ -342,7 +433,7 @@
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
-"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
+"Below we select the best pipeline from our iterations. The *get_output* method returns the best run and the fitted model. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
 ]
 },
 {
@@ -392,26 +483,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Register fitted model for deployment"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"description = 'AutoML Model'\n",
-"tags = None\n",
-"remote_run.register_model(description=description, tags=tags)\n",
-"remote_run.model_id # Use this id to deploy the model as a web service in Azure"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Testing the Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n"
+"### Testing the Best Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n"
 ]
 },
 {
@@ -432,11 +504,11 @@
 "le = LabelEncoder()\n",
 "le.fit(df[\"Label\"].values)\n",
 "y = le.transform(df[\"Label\"].values)\n",
-"df = df.drop([\"Label\"], axis=1)\n",
+"X = df.drop([\"Label\"], axis=1)\n",
 "\n",
-"_, df_test, _, y_test = train_test_split(df, y, test_size=0.1, random_state=42)\n",
+"_, X_test, _, y_test = train_test_split(X, y, test_size=0.1, random_state=42)\n",
 "\n",
-"ypred = fitted_model.predict(df_test.values)\n",
+"ypred = fitted_model.predict(X_test.values)\n",
 "\n",
 "ypred_strings = le.inverse_transform(ypred)\n",
 "ytest_strings = le.inverse_transform(y_test)\n",
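The test cell above works because the labels were integer-encoded before training, so predictions come back as integers and must be passed through `inverse_transform` before they can be compared as strings. A hand-rolled illustration of that round trip (the label values are invented for the sketch; the notebook uses `sklearn.preprocessing.LabelEncoder`):

```python
# Hand-rolled equivalent of the LabelEncoder round trip used above.
labels = ["DDoS", "Benign", "PortScan", "Benign", "DDoS"]

# fit: map each distinct label to an integer (sorted, as sklearn does).
classes = sorted(set(labels))
to_int = {c: i for i, c in enumerate(classes)}

# transform: strings -> integers (what the model trains and predicts on).
y = [to_int[s] for s in labels]

# inverse_transform: integers -> strings (how predictions are reported).
decoded = [classes[i] for i in y]
```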
@@ -447,16 +519,14 @@
 "\n",
 "cm.plot()"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "savitam"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",

@@ -13,29 +13,30 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# AutoML 09: Classification with deployment\n",
+"# AutoML 09: Classification with Deployment\n",
 "\n",
-"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem.\n",
+"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem and deploy it to an Azure Container Instance (ACI).\n",
 "\n",
 "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
 "\n",
-"In this notebook you would see\n",
-"1. Creating an Experiment using an existing Workspace\n",
-"2. Instantiating AutoMLConfig\n",
-"3. Training the Model using local compute\n",
-"4. Exploring the results\n",
-"5. Registering the model\n",
-"6. Creating Image and creating aci service\n",
-"7. Testing the aci service\n"
+"In this notebook you will learn how to:\n",
+"1. Create an experiment using an existing workspace.\n",
+"2. Configure AutoML using `AutoMLConfig`.\n",
+"3. Train the model using local compute.\n",
+"4. Explore the results.\n",
+"5. Register the model.\n",
+"6. Create a container image.\n",
+"7. Create an Azure Container Instance (ACI) service.\n",
+"8. Test the ACI service.\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Create Experiment\n",
+"## Create an Experiment\n",
 "\n",
-"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
+"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
 ]
 },
 {
@@ -95,7 +96,7 @@
 "source": [
 "## Diagnostics\n",
 "\n",
-"Opt-in diagnostics for better experience, quality, and security of future releases"
+"Opt-in diagnostics for better experience, quality, and security of future releases."
 ]
 },
 {
@@ -112,17 +113,17 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Instantiate Auto ML Config\n",
+"## Configure AutoML\n",
 "\n",
 "Instantiate an AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
 "\n",
 "|Property|Description|\n",
 "|-|-|\n",
 "|**task**|classification or regression|\n",
-"|**primary_metric**|This is the metric that you want to optimize.<br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
-"|**max_time_sec**|Time limit in seconds for each iteration|\n",
-"|**iterations**|Number of iterations. In each iteration Auto ML trains a specific pipeline with the data|\n",
-"|**n_cross_validations**|Number of cross validation splits|\n",
+"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
+"|**max_time_sec**|Time limit in seconds for each iteration.|\n",
+"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
+"|**n_cross_validations**|Number of cross validation splits.|\n",
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
@@ -135,8 +136,8 @@
 "outputs": [],
 "source": [
 "digits = datasets.load_digits()\n",
-"X_digits = digits.data[10:,:]\n",
-"y_digits = digits.target[10:]\n",
+"X_train = digits.data[10:,:]\n",
+"y_train = digits.target[10:]\n",
 "\n",
 "automl_config = AutoMLConfig(task = 'classification',\n",
 " name = experiment_name,\n",
@@ -146,8 +147,8 @@
 " iterations = 10,\n",
 " n_cross_validations = 2,\n",
 " verbosity = logging.INFO,\n",
-" X = X_digits, \n",
-" y = y_digits,\n",
+" X = X_train, \n",
+" y = y_train,\n",
 " path = project_folder)"
 ]
 },
@@ -155,10 +156,10 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Training the Model\n",
+"## Train the Models\n",
 "\n",
-"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
-"You will see the currently running iterations printing to the console."
+"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
+"In this example, we specify `show_output = True` to print currently running iterations to the console."
 ]
 },
 {
@@ -176,7 +177,7 @@
 "source": [
 "### Retrieve the Best Model\n",
 "\n",
-"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
+"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
 ]
 },
 {
@@ -192,7 +193,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Register fitted model for deployment"
+"### Register the Fitted Model for Deployment\n",
+"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
 ]
 },
 {
@@ -203,7 +205,7 @@
 "source": [
 "description = 'AutoML Model'\n",
 "tags = None\n",
-"model = local_run.register_model(description=description, tags=tags, iteration=8)\n",
+"model = local_run.register_model(description = description, tags = tags)\n",
 "local_run.model_id # This will be written to the script file later in the notebook."
 ]
 },
@@ -211,7 +213,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Create Scoring script ###"
+"### Create Scoring Script"
 ]
 },
 {
@@ -249,14 +251,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Create yml file for env"
+"### Create a YAML File for the Environment"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"To ensure the consistence the fit results with the training results, the sdk dependence versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook 12.auto-ml-retrieve-the-training-sdk-versions.ipynb."
+"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
 ]
 },
 {
@@ -338,7 +340,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Create Image ###"
+"### Create a Container Image"
 ]
 },
 {
@@ -368,7 +370,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Deploy Image as web service on Azure Container Instance ###"
+"### Deploy the Image as a Web Service on Azure Container Instance"
 ]
 },
 {
@@ -407,7 +409,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### To delete a service ##"
+"### Delete a Web Service"
 ]
 },
 {
@@ -423,7 +425,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### To get logs from deployed service ###"
+"### Get Logs from a Deployed Web Service"
 ]
 },
 {
@@ -439,7 +441,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Test Web Service ###"
+"### Test a Web Service"
 ]
 },
 {
@@ -450,15 +452,15 @@
 "source": [
 "#Randomly select digits and test\n",
 "digits = datasets.load_digits()\n",
-"X_digits = digits.data[:10, :]\n",
-"y_digits = digits.target[:10]\n",
+"X_test = digits.data[:10, :]\n",
+"y_test = digits.target[:10]\n",
 "images = digits.images[:10]\n",
 "\n",
-"for index in np.random.choice(len(y_digits), 3):\n",
+"for index in np.random.choice(len(y_test), 3, replace = False):\n",
 " print(index)\n",
-" test_sample = json.dumps({'data':X_digits[index:index + 1].tolist()})\n",
+" test_sample = json.dumps({'data':X_test[index:index + 1].tolist()})\n",
 " predicted = aci_service.run(input_data = test_sample)\n",
-" label = y_digits[index]\n",
+" label = y_test[index]\n",
 " predictedDict = json.loads(predicted)\n",
 " title = \"Label value = %d Predicted value = %s \" % ( label,predictedDict['result'][0])\n",
 " fig = plt.figure(1, figsize = (3,3))\n",
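The service-test loop above serializes each sample as a JSON object with a `data` key and reads the prediction back from the `result` field of the response. A stdlib-only sketch of that request/response shape, with `aci_service.run` replaced by a stub since no real endpoint exists here:

```python
import json

# Build the request payload the same way the notebook cell does.
sample = [[0.0, 1.0, 2.0]]
test_sample = json.dumps({'data': sample})

# Stand-in for aci_service.run(); a real service would score the payload.
def fake_service_run(input_data):
    parsed = json.loads(input_data)
    # Pretend every sample is classified as the digit 4.
    return json.dumps({'result': [4] * len(parsed['data'])})

predicted = fake_service_run(input_data=test_sample)
predictedDict = json.loads(predicted)
```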
@@ -467,16 +469,14 @@
 " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
 " plt.show()"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "savitam"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",

@@ -13,14 +13,11 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# AutoML 10: Multi output Example for AutoML"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"This notebook shows an example to use AutoML to train the multi output problems by leveraging the correlation between the outputs using indicator vectors."
+"# AutoML 10: Multi-output\n",
+"\n",
+"This notebook shows how to use AutoML to train multi-output problems by leveraging the correlation between the outputs using indicator vectors.\n",
+"\n",
+"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook."
 ]
 },
 {
@@ -52,7 +49,7 @@
 "source": [
 "## Diagnostics\n",
 "\n",
-"Opt-in diagnostics for better experience, quality, and security of future releases"
+"Opt-in diagnostics for better experience, quality, and security of future releases."
 ]
 },
 {
@@ -69,18 +66,18 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Transformer functions\n",
-"The transformation of the input are happening for input X and Y as following, e.g. Y = {y_1, y_2}, then X becomes\n",
+"## Transformer Functions\n",
+"The inputs `X` and `y` are transformed as follows: e.g. if `y = {y_1, y_2}`, then `X` becomes\n",
 " \n",
-"X 1 0\n",
+"`X 1 0`\n",
 " \n",
-"X 0 1\n",
+"`X 0 1`\n",
 "\n",
-"and Y becomes,\n",
+"and `y` becomes,\n",
 "\n",
-"y_1\n",
+"`y_1`\n",
 "\n",
-"y_2"
+"`y_2`"
 ]
 },
 {
@@ -93,24 +90,24 @@
 "from scipy import linalg\n",
 "\n",
 "#Transformer functions\n",
-"def multi_output_transform_x_y(X, Y):\n",
-" X_new = multi_output_transformer_x(X, Y.shape[1])\n",
-" y_new = multi_output_transform_y(Y)\n",
+"def multi_output_transform_x_y(X, y):\n",
+" X_new = multi_output_transformer_x(X, y.shape[1])\n",
+" y_new = multi_output_transform_y(y)\n",
 " return X_new, y_new\n",
 "\n",
-"def multi_output_transformer_x(X, number_of_columns_Y):\n",
-" indicator_vecs = linalg.block_diag(*([np.ones((X.shape[0], 1))] * number_of_columns_Y))\n",
+"def multi_output_transformer_x(X, number_of_columns_y):\n",
+" indicator_vecs = linalg.block_diag(*([np.ones((X.shape[0], 1))] * number_of_columns_y))\n",
 " if sparse.issparse(X):\n",
-" X_new = sparse.vstack(np.tile(X, number_of_columns_Y))\n",
+" X_new = sparse.vstack(np.tile(X, number_of_columns_y))\n",
 " indicator_vecs = sparse.coo_matrix(indicator_vecs)\n",
 " X_new = sparse.hstack((X_new, indicator_vecs))\n",
 " else:\n",
-" X_new = np.tile(X, (number_of_columns_Y, 1))\n",
+" X_new = np.tile(X, (number_of_columns_y, 1))\n",
 " X_new = np.hstack((X_new, indicator_vecs))\n",
 " return X_new\n",
 "\n",
-"def multi_output_transform_y(Y):\n",
-" return Y.reshape(-1, order=\"F\")\n",
+"def multi_output_transform_y(y):\n",
+" return y.reshape(-1, order=\"F\")\n",
 "\n",
 "def multi_output_inverse_transform_y(y, number_of_columns_y):\n",
 " return y.reshape((-1, number_of_columns_y), order = \"F\")"
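To make the indicator-vector construction concrete, here is the dense path of the transformer traced on a tiny example. This sketch rebuilds the block-diagonal indicator columns with `np.kron` instead of `scipy.linalg.block_diag` (equivalent for this case), so it depends only on NumPy:

```python
import numpy as np

def transform_x(X, n_outputs):
    # Block-diagonal indicator columns; equivalent to the notebook's
    # linalg.block_diag(*([np.ones((X.shape[0], 1))] * n_outputs)).
    indicator_vecs = np.kron(np.eye(n_outputs), np.ones((X.shape[0], 1)))
    X_new = np.tile(X, (n_outputs, 1))  # stack X once per output
    return np.hstack((X_new, indicator_vecs))

def transform_y(y):
    return y.reshape(-1, order="F")     # column-major flatten: all of y_1, then y_2

def inverse_transform_y(y, n_outputs):
    return y.reshape((-1, n_outputs), order="F")

X = np.array([[1.0], [2.0]])            # 2 samples, 1 feature
Y = np.array([[10.0, 100.0],
              [20.0, 200.0]])           # 2 outputs per sample

X_new = transform_x(X, Y.shape[1])      # (4, 3): X twice, plus indicator columns
y_new = transform_y(Y)                  # outputs stacked column by column
```

The indicator column tells the single-output model which of the two targets a stacked row belongs to, which is how correlation between outputs can be exploited by one model.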
@@ -120,7 +117,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## AutoML experiment set up"
+"## AutoML Experiment Setup"
 ]
 },
 {
@@ -131,9 +128,8 @@
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
-"# choose a name for experiment\n",
+"# Choose a name for the experiment and specify the project folder.\n",
 "experiment_name = 'automl-local-multi-output'\n",
-"# project folder\n",
 "project_folder = './sample_projects/automl-local-multi-output'\n",
 "\n",
 "experiment = Experiment(ws, experiment_name)\n",
@@ -154,7 +150,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Create a random dataset for the test purpose "
+"## Create a Random Dataset for Test Purposes"
 ]
 },
 {
@@ -165,15 +161,15 @@
 "source": [
 "rng = np.random.RandomState(1)\n",
 "X_train = np.sort(200 * rng.rand(600, 1) - 100, axis = 0)\n",
-"Y_train = np.array([np.pi * np.sin(X_train).ravel(), np.pi * np.cos(X_train).ravel()]).T\n",
-"Y_train += (0.5 - rng.rand(*Y_train.shape))"
+"y_train = np.array([np.pi * np.sin(X_train).ravel(), np.pi * np.cos(X_train).ravel()]).T\n",
+"y_train += (0.5 - rng.rand(*y_train.shape))"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Perform X and Y transformation using transformer function"
+"Perform X and y transformation using the transformer function."
 ]
 },
 {
@@ -182,7 +178,14 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"X_train_transformed, y_train_transformed = multi_output_transform_x_y(X_train, Y_train)"
+"X_train_transformed, y_train_transformed = multi_output_transform_x_y(X_train, y_train)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Configure AutoML using the transformed results."
 ]
 },
 {
@@ -206,7 +209,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Fit the transformed data "
+"## Fit the Transformed Data"
 ]
 },
 {
@@ -224,7 +227,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Get the best fit model\n",
+"# Get the best fit model.\n",
 "best_run, fitted_model = local_run.get_output()"
 ]
 },
@@ -234,8 +237,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Generate random data set for predicting\n",
-"X_predict = np.sort(200 * rng.rand(200, 1) - 100, axis=0)"
+"# Generate random data set for predicting.\n",
+"X_test = np.sort(200 * rng.rand(200, 1) - 100, axis = 0)"
 ]
 },
 {
@@ -244,11 +247,12 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Transform predict data\n",
-"X_predict_transformed = multi_output_transformer_x(X_predict, Y_train.shape[1])\n",
-"# Predict and inverse transform the prediction\n",
-"y_predict = fitted_model.predict(X_predict_transformed)\n",
-"Y_predict = multi_output_inverse_transform_y(y_predict, Y_train.shape[1])"
+"# Transform predict data.\n",
+"X_test_transformed = multi_output_transformer_x(X_test, y_train.shape[1])\n",
+"\n",
+"# Predict and inverse transform the prediction.\n",
+"y_predict = fitted_model.predict(X_test_transformed)\n",
+"y_predict = multi_output_inverse_transform_y(y_predict, y_train.shape[1])"
 ]
 },
 {
@@ -257,18 +261,16 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"print(Y_predict)"
+"print(y_predict)"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "savitam"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",

@@ -13,26 +13,22 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# AutoML 11: Sample weight\n",
+"# AutoML 11: Sample Weight\n",
 "\n",
-"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use sample weight with the AutoML Classifier.\n",
-"Sample weight is used where some sample values are more important than others.\n",
+"In this example we use scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use sample weight with AutoML. Sample weight is used where some sample values are more important than others.\n",
 "\n",
 "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
 "\n",
-"In this notebook you would see\n",
-"1. How to specifying sample_weight\n",
-"2. The difference that it makes to test results\n",
-"\n"
+"In this notebook you will learn how to configure AutoML to use `sample_weight` and you will see the difference sample weight makes to the test results.\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Create Experiment\n",
+"## Create an Experiment\n",
 "\n",
-"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
+"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
 ]
 },
 {
@@ -66,11 +62,10 @@
 "source": [
 "ws = Workspace.from_config()\n",
 "\n",
-"# choose a name for experiment\n",
+"# Choose names for the regular and the sample weight experiments.\n",
 "experiment_name = 'non_sample_weight_experiment'\n",
 "sample_weight_experiment_name = 'sample_weight_experiment'\n",
 "\n",
-"# project folder\n",
 "project_folder = './sample_projects/automl-local-classification'\n",
 "\n",
 "experiment = Experiment(ws, experiment_name)\n",
@@ -94,7 +89,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Diagnostics\n",
|
"## Diagnostics\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases"
|
"Opt in to diagnostics for a better experience, quality, and security of future releases."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -111,9 +106,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Instantiate Auto ML Config\n",
|
"## Configure AutoML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Instantiate two AutoMLConfig Objects. One will be used with sample_weight and one without."
|
"Instantiate two `AutoMLConfig` objects. One will be used with `sample_weight` and one without."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -123,12 +118,12 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_digits = digits.data[100:,:]\n",
|
"X_train = digits.data[100:,:]\n",
|
||||||
"y_digits = digits.target[100:]\n",
|
"y_train = digits.target[100:]\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# The example makes the sample weight 0.9 for the digit 4 and 0.1 for all other digits.\n",
|
"# The example makes the sample weight 0.9 for the digit 4 and 0.1 for all other digits.\n",
|
||||||
"# This makes the model more likely to classify as 4 if the image is not clear.\n",
|
"# This makes the model more likely to classify as 4 if the image is not clear.\n",
|
||||||
"sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_digits])\n",
|
"sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_train])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"automl_classifier = AutoMLConfig(task = 'classification',\n",
|
"automl_classifier = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
@@ -137,8 +132,8 @@
|
|||||||
" iterations = 10,\n",
|
" iterations = 10,\n",
|
||||||
" n_cross_validations = 2,\n",
|
" n_cross_validations = 2,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" X = X_digits, \n",
|
" X = X_train, \n",
|
||||||
" y = y_digits,\n",
|
" y = y_train,\n",
|
||||||
" path = project_folder)\n",
|
" path = project_folder)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"automl_sample_weight = AutoMLConfig(task = 'classification',\n",
|
"automl_sample_weight = AutoMLConfig(task = 'classification',\n",
|
||||||
@@ -148,8 +143,8 @@
|
|||||||
" iterations = 10,\n",
|
" iterations = 10,\n",
|
||||||
" n_cross_validations = 2,\n",
|
" n_cross_validations = 2,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" X = X_digits, \n",
|
" X = X_train, \n",
|
||||||
" y = y_digits,\n",
|
" y = y_train,\n",
|
||||||
" sample_weight = sample_weight,\n",
|
" sample_weight = sample_weight,\n",
|
||||||
" path = project_folder)"
|
" path = project_folder)"
|
||||||
]
|
]
|
||||||
@@ -158,10 +153,10 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Training the Models\n",
|
"## Train the Models\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Call the submit method on the experiment and pass the configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
|
"Call the `submit` method on the experiment objects and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
"You will see the currently running iterations printing to the console."
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -181,7 +176,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Testing the Fitted Models\n",
|
"### Test the Best Fitted Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data"
|
"#### Load Test Data"
|
||||||
]
|
]
|
||||||
@@ -193,8 +188,8 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_digits = digits.data[:100, :]\n",
|
"X_test = digits.data[:100, :]\n",
|
||||||
"y_digits = digits.target[:100]\n",
|
"y_test = digits.target[:100]\n",
|
||||||
"images = digits.images[:100]"
|
"images = digits.images[:100]"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -202,7 +197,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Compare the pipelines\n",
|
"#### Compare the Models\n",
|
||||||
"The prediction from the sample weight model is more likely to correctly predict 4's. However, it is also more likely to predict 4 for some images that are not labelled as 4."
|
"The prediction from the sample weight model is more likely to correctly predict 4's. However, it is also more likely to predict 4 for some images that are not labelled as 4."
|
||||||
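This trade-off can be checked with a standalone scikit-learn sketch (an illustration under the notebook's weighting scheme, not the AutoML-fitted models themselves; `LogisticRegression` is an assumed stand-in):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

digits = datasets.load_digits()
X_train, y_train = digits.data[100:], digits.target[100:]
X_test, y_test = digits.data[:100], digits.target[:100]

# Same weighting as the notebook: 0.9 for 4's, 0.01 for everything else.
weights = np.array([0.9 if label == 4 else 0.01 for label in y_train])

plain = LogisticRegression(max_iter=5000).fit(X_train, y_train)
weighted = LogisticRegression(max_iter=5000).fit(X_train, y_train,
                                                sample_weight=weights)

# Count how often each model predicts 4 on the held-out rows; the weighted
# model tends to predict 4 more often, including on some non-4 images.
pred_4_plain = int(np.sum(plain.predict(X_test) == 4))
pred_4_weighted = int(np.sum(weighted.predict(X_test) == 4))
print(pred_4_plain, pred_4_weighted)
```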
]
|
]
|
||||||
},
|
},
|
||||||
@@ -212,11 +207,11 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"#Randomly select digits and test\n",
|
"# Randomly select digits and test.\n",
|
||||||
"for index in range(0,len(y_digits)):\n",
|
"for index in range(0,len(y_test)):\n",
|
||||||
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
|
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
||||||
" predicted_sample_weight = fitted_model_sample_weight.predict(X_digits[index:index + 1])[0]\n",
|
" predicted_sample_weight = fitted_model_sample_weight.predict(X_test[index:index + 1])[0]\n",
|
||||||
" label = y_digits[index]\n",
|
" label = y_test[index]\n",
|
||||||
" if predicted == 4 or predicted_sample_weight == 4 or label == 4:\n",
|
" if predicted == 4 or predicted_sample_weight == 4 or label == 4:\n",
|
||||||
" title = \"Label value = %d Predicted value = %d Predicted with sample weight = %d\" % (label, predicted, predicted_sample_weight)\n",
|
" title = \"Label value = %d Predicted value = %d Predicted with sample weight = %d\" % (label, predicted, predicted_sample_weight)\n",
|
||||||
" fig = plt.figure(1, figsize=(3,3))\n",
|
" fig = plt.figure(1, figsize=(3,3))\n",
|
||||||
@@ -228,6 +223,11 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -13,7 +13,11 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# AutoML 12: Retrieving Training SDK Versions"
|
"# AutoML 12: Retrieving Training SDK Versions\n",
|
||||||
|
"\n",
|
||||||
|
"This example shows how to find the SDK versions used for an experiment.\n",
|
||||||
|
"\n",
|
||||||
|
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -46,7 +50,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Diagnostics\n",
|
"## Diagnostics\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Opt-in diagnostics for better experience, quality, and security of future releases"
|
"Opt in to diagnostics for a better experience, quality, and security of future releases."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -63,14 +67,14 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# 1. Retrieve the SDK versions in the current env"
|
"# Retrieve the SDK versions in the current environment"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"To retrieve the SDK versions in the current env, simple running get_sdk_dependencies()"
|
"To retrieve the SDK versions in the current environment, run `get_sdk_dependencies`."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -86,7 +90,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# 2. Training Model Using AutoML"
|
"# Train models using AutoML"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -97,9 +101,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"ws = Workspace.from_config()\n",
|
"ws = Workspace.from_config()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for experiment\n",
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
"experiment_name = 'automl-local-classification'\n",
|
"experiment_name = 'automl-local-classification'\n",
|
||||||
"# project folder\n",
|
|
||||||
"project_folder = './sample_projects/automl-local-classification'\n",
|
"project_folder = './sample_projects/automl-local-classification'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
@@ -123,8 +126,8 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_digits = digits.data[10:,:]\n",
|
"X_train = digits.data[10:,:]\n",
|
||||||
"y_digits = digits.target[10:]\n",
|
"y_train = digits.target[10:]\n",
|
||||||
"\n",
|
"\n",
|
||||||
"automl_config = AutoMLConfig(task = 'classification',\n",
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
" debug_log = 'automl_errors.log',\n",
|
" debug_log = 'automl_errors.log',\n",
|
||||||
@@ -132,8 +135,8 @@
|
|||||||
" iterations = 3,\n",
|
" iterations = 3,\n",
|
||||||
" n_cross_validations = 2,\n",
|
" n_cross_validations = 2,\n",
|
||||||
" verbosity = logging.INFO,\n",
|
" verbosity = logging.INFO,\n",
|
||||||
" X = X_digits, \n",
|
" X = X_train, \n",
|
||||||
" y = y_digits,\n",
|
" y = y_train,\n",
|
||||||
" path = project_folder)\n",
|
" path = project_folder)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"local_run = experiment.submit(automl_config, show_output = True)"
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
||||||
@@ -143,14 +146,14 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# 3. Retrieve the SDK versions from RunHistory"
|
"# Retrieve the SDK versions from RunHistory"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"To get the SDK versions from RunHistory, first the RunId need to be recorded. This can either be done by copy it from the output message or retieve if after each run."
|
"To get the SDK versions from RunHistory, first the run id needs to be recorded. This can either be done by copying it from the output message or by retrieving it after each run."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -159,6 +162,10 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
|
"# Use a run id copied from an output message.\n",
|
||||||
|
"#run_id = 'AutoML_c0585b1f-a0e6-490b-84c7-3a099468b28e'\n",
|
||||||
|
"\n",
|
||||||
|
"# Retrieve the run id from a run.\n",
|
||||||
"run_id = local_run.id\n",
|
"run_id = local_run.id\n",
|
||||||
"print(run_id)"
|
"print(run_id)"
|
||||||
]
|
]
|
||||||
@@ -167,7 +174,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Initialize a new AutoMLRunClass."
|
"Initialize a new `AutoMLRun` object."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -177,7 +184,6 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"experiment_name = 'automl-local-classification'\n",
|
"experiment_name = 'automl-local-classification'\n",
|
||||||
"#run_id = 'AutoML_c0585b1f-a0e6-490b-84c7-3a099468b28e'\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"experiment = Experiment(ws, experiment_name)\n",
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
"ml_run = AutoMLRun(experiment = experiment, run_id = run_id)"
|
"ml_run = AutoMLRun(experiment = experiment, run_id = run_id)"
|
||||||
@@ -217,6 +223,11 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -14,14 +14,14 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# AutoML 13: Prepare Data using `azureml.dataprep`\n",
|
"# AutoML 13: Prepare Data using `azureml.dataprep`\n",
|
||||||
"In this example we showcase how you can use `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone - full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n",
|
"In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Make sure you have executed the [setup](00.configuration.ipynb) before running this notebook.\n",
|
"Make sure you have executed the [setup](00.configuration.ipynb) before running this notebook.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook you would see\n",
|
"In this notebook you will learn how to:\n",
|
||||||
"1. Defining data loading and preparation steps in a `Dataflow` using `azureml.dataprep`\n",
|
"1. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n",
|
||||||
"2. Passing the `Dataflow` to AutoML for local run\n",
|
"2. Pass the `Dataflow` to AutoML for a local run.\n",
|
||||||
"3. Passing the `Dataflow` to AutoML for remote run"
|
"3. Pass the `Dataflow` to AutoML for a remote run."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -31,23 +31,13 @@
|
|||||||
"## Install the `azureml.dataprep` SDK"
|
"## Install the `azureml.dataprep` SDK"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"Please restart your kernel after the below installs.\n",
|
|
||||||
"\n",
|
|
||||||
"Tornado must be downgraded to a pre-5 version due to a known Tornado x Jupyter event loop bug."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"!pip install azureml-dataprep\n",
|
"!pip install azureml-dataprep"
|
||||||
"!pip install tornado==4.5.1"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -73,9 +63,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Create Experiment\n",
|
"## Create an Experiment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -139,12 +129,12 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# You can use `smart_read_file` which intelligently figures out delimiters and datatypes of a file\n",
|
"# You can use `smart_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
|
||||||
"# data pulled from sklearn.datasets.load_digits()\n",
|
"# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
|
||||||
"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
|
"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
|
||||||
"X = dprep.smart_read_file(simple_example_data_root + 'X.csv').skip(1) # remove header\n",
|
"X = dprep.smart_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter).\n",
|
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
|
||||||
"# and convert column types manually.\n",
|
"# and convert column types manually.\n",
|
||||||
"# Here we read a comma delimited file and convert all columns to integers.\n",
|
"# Here we read a comma delimited file and convert all columns to integers.\n",
|
||||||
"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
|
"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
|
||||||
@@ -156,7 +146,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Review the Data Preparation Result\n",
|
"## Review the Data Preparation Result\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large dataset."
|
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
|
||||||
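This lazy-evaluation behavior is analogous to slicing a generator in the standard library (a plain-Python analogy, not the `azureml.dataprep` API itself):

```python
from itertools import islice

def records():
    # Stands in for an expensive, effectively unbounded data source.
    n = 0
    while True:
        yield n
        n += 1

# Like a Dataflow's skip(i)/head(j), islice pulls only the records it needs,
# so previewing stays fast even when the source is large.
preview = list(islice(records(), 5, 10))  # skip 5, take 5
print(preview)
```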
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -172,9 +162,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Instantiate AutoML Settings\n",
|
"## Configure AutoML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This creates a general Auto ML Settings applicable for both Local and Remote runs."
|
"This creates a general AutoML settings object applicable for both local and remote runs."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -204,9 +194,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Pass data with Dataflows\n",
|
"### Pass Data with `Dataflow` Objects\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The `Dataflow` objects captured above can be passed to `submit` method for local run. AutoML will retrieve the results from the `Dataflow` for model training."
|
"The `Dataflow` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `Dataflow` for model training."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -235,8 +225,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Remote Run\n",
|
"## Remote Run"
|
||||||
"*Note: This feature might not work properly in your workspace region before the October update. You may jump to the \"Exploring the results\" section below to explore other features AutoML and DataPrep has to offer.*"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -255,9 +244,9 @@
|
|||||||
"dsvm_name = 'mydsvm'\n",
|
"dsvm_name = 'mydsvm'\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
|
" dsvm_compute = DsvmCompute(ws, dsvm_name)\n",
|
||||||
" print('found existing dsvm.')\n",
|
" print('Found existing DVSM.')\n",
|
||||||
"except:\n",
|
"except:\n",
|
||||||
" print('creating new dsvm.')\n",
|
" print('Creating a new DSVM.')\n",
|
||||||
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n",
|
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n",
|
||||||
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
|
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
|
||||||
" dsvm_compute.wait_for_completion(show_output = True)"
|
" dsvm_compute.wait_for_completion(show_output = True)"
|
||||||
@@ -269,7 +258,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Update Conda Dependency file to have AutoML and DataPrep SDK\n",
|
"### Update Conda Dependency file to have AutoML and DataPrep SDK\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Currently AutoML and DataPrep SDK is not installed with Azure ML SDK by default. Due to this we update the conda dependency file to add such dependencies."
|
"Currently the AutoML and DataPrep SDKs are not installed with the Azure ML SDK by default. To circumvent this limitation, we update the conda dependency file to add these dependencies."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -279,15 +268,14 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"cd = CondaDependencies()\n",
|
"cd = CondaDependencies()\n",
|
||||||
"cd.add_pip_package(pip_package='azureml-dataprep')\n",
|
"cd.add_pip_package(pip_package='azureml-dataprep')"
|
||||||
"cd.add_pip_package(pip_package='tornado==4.5.1')"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create a RunConfiguration with DSVM name"
|
"### Create a `RunConfiguration` with DSVM name"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -305,9 +293,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Pass data with Dataflows\n",
|
"### Pass Data with `Dataflow` Objects\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The `Dataflow` objects captured above can also be passed to `submit` method for remote run. AutoML will serialize the `Dataflow` and send to remote compute target. The `Dataflow` will not be evaluated locally."
|
"The `Dataflow` objects captured above can also be passed to the `submit` method for a remote run. AutoML will serialize the `Dataflow` object and send it to the remote compute target. The `Dataflow` will not be evaluated locally."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -323,27 +311,25 @@
|
|||||||
" X = X,\n",
|
" X = X,\n",
|
||||||
" y = y,\n",
|
" y = y,\n",
|
||||||
" **automl_settings)\n",
|
" **automl_settings)\n",
|
||||||
"# Please uncomment the line below to try out remote run with dataprep. \n",
|
"remote_run = experiment.submit(automl_config, show_output = True)"
|
||||||
"# This feature might not work properly in your workspace region before the October update.\n",
|
|
||||||
"# remote_run = experiment.submit(automl_config, show_output = True)"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Exploring the results"
|
"## Explore the Results"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Widget for monitoring runs\n",
|
"#### Widget for Monitoring Runs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -360,7 +346,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Retrieve all child runs\n",
|
"#### Retrieve All Child Runs\n",
|
||||||
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -388,7 +374,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"### Retrieve the Best Model\n",
|
"### Retrieve the Best Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -406,8 +392,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model based on any other metric\n",
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
"Give me the run and the model that has the smallest `log_loss`:"
|
"Show the run and the model that has the smallest `log_loss` value:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -426,8 +412,8 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Best Model based on any iteration\n",
|
"#### Model from a Specific Iteration\n",
|
||||||
"Give me the run and the model from the 1st iteration:"
|
"Show the run and the model from the first iteration:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -446,7 +432,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Testing the Fitted Model \n",
|
"### Test the Best Fitted Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#### Load Test Data"
|
"#### Load Test Data"
|
||||||
]
|
]
|
||||||
@@ -460,8 +446,8 @@
|
|||||||
"from sklearn import datasets\n",
|
"from sklearn import datasets\n",
|
||||||
"\n",
|
"\n",
|
||||||
"digits = datasets.load_digits()\n",
|
"digits = datasets.load_digits()\n",
|
||||||
"X_digits = digits.data[:10, :]\n",
|
"X_test = digits.data[:10, :]\n",
|
||||||
"y_digits = digits.target[:10]\n",
|
"y_test = digits.target[:10]\n",
|
||||||
"images = digits.images[:10]"
|
"images = digits.images[:10]"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -469,7 +455,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"#### Testing our best pipeline\n",
|
"#### Testing Our Best Fitted Model\n",
|
||||||
"We will try to predict 2 digits and see how our model works."
|
"We will try to predict 2 digits and see how our model works."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -485,10 +471,10 @@
|
|||||||
"import random\n",
|
"import random\n",
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"\n",
|
"\n",
|
||||||
"for index in np.random.choice(len(y_digits), 2):\n",
|
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
|
||||||
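The `replace = False` argument added above is what guarantees the sampled indices are distinct, so the loop never shows the same test image twice. A tiny standalone check (using NumPy's newer `default_rng` API rather than the legacy `np.random.choice` in the cell):

```python
import numpy as np

rng = np.random.default_rng()
# With replace=False, the two sampled indices are guaranteed distinct.
picks = rng.choice(10, size=2, replace=False)
print(picks)
```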
" print(index)\n",
|
" print(index)\n",
|
||||||
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
|
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
|
||||||
" label = y_digits[index]\n",
|
" label = y_test[index]\n",
|
||||||
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
|
||||||
" fig = plt.figure(1, figsize=(3,3))\n",
|
" fig = plt.figure(1, figsize=(3,3))\n",
|
||||||
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
||||||
@@ -508,9 +494,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Capture the Dataflows to use for AutoML later\n",
|
"### Capture the `Dataflow` Objects for Later Use in AutoML\n",
|
||||||
"\n",
|
"\n",
|
||||||
"`Dataflow` objects are immutable. Each of them is composed of a list of data preparation steps. A `Dataflow` can be branched at any point for further usage."
|
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -544,6 +530,11 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "savitam"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
108
automl/README.md
@@ -1,54 +1,24 @@
 # Table of Contents
-1. [Automated ML Introduction](#introduction)
-1. [Running samples in Azure Notebooks](#jupyter)
-1. [Running samples in a Local Conda environment](#localconda)
-1. [Automated ML SDK Sample Notebooks](#samples)
-1. [Documentation](#documentation)
-1. [Running using python command](#pythoncommand)
-1. [Troubleshooting](#troubleshooting)
+1. [Auto ML Introduction](#introduction)
+2. [Running samples in a Local Conda environment](#localconda)
+3. [Auto ML SDK Sample Notebooks](#samples)
+4. [Documentation](#documentation)
+5. [Running using python command](#pythoncommand)
+6. [Troubleshooting](#troubleshooting)
 
-<a name="introduction"></a>
-# Automated ML introduction
-Automated machine learning (automated ML) builds high quality machine learning models for you by automating model and hyperparameter selection. Bring a labelled dataset that you want to build a model for, automated ML will give you a high quality machine learning model that you can use for predictions.
+# Auto ML Introduction <a name="introduction"></a>
+AutoML builds high quality Machine Learning models for you by automating model and hyperparameter selection. Bring a labelled dataset that you want to build a model for, AutoML will give you a high quality machine learning model that you can use for predictions.
 
 If you are new to Data Science, AutoML will help you get jumpstarted by simplifying machine learning model building. It abstracts you from needing to perform model selection, hyperparameter selection and in one step creates a high quality trained model for you to use.
 
 If you are an experienced data scientist, AutoML will help increase your productivity by intelligently performing the model and hyperparameter selection for your training and generates high quality models much quicker than manually specifying several combinations of the parameters and running training jobs. AutoML provides visibility and access to all the training jobs and the performance characteristics of the models to help you further tune the pipeline if you desire.
 
-<a name="jupyter"></a>
-## Running samples in Azure Notebooks - Jupyter based notebooks in the Azure cloud
-
-1. [Import sample notebooks](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks if they are not already there.
-1. Create a workspace and its configuration file (**config.json**) using [these instructions](https://aka.ms/aml-how-to-configure-environment).
-1. Select `+New` in the Azure Notebook toolbar to add your **config.json** file to the imported folder.
-1. Open the notebook.
-
-**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook.
-
-<a name="localconda"></a>
-## Running samples in a Local Conda environment
-
-To run these notebooks on your own notebook server, use these installation instructions.
-
-The instructions below will install everything you need and then start a Jupyter notebook. To start your Jupyter notebook manually, use:
-
-```
-conda activate azure_automl
-jupyter notebook
-```
-
-or on Mac:
-
-```
-source activate azure_automl
-jupyter notebook
-```
-
+# Running samples in a Local Conda environment <a name="localconda"></a>
+You can run these notebooks in Azure Notebooks without any extra installation. To run these notebooks on your own notebook server, use these installation instructions.
+It is best if you create a new conda environment locally to try this SDK, so it doesn't interfere with your existing Python environment.
 
 ### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose Python 3.7 or higher.
 - **Note**: if you already have conda installed, you can keep using it but it should be version 4.4.10 or later (as shown by: conda -V). If you have a previous version installed, you can update it using the command: conda update conda.
@@ -78,19 +48,19 @@ bash automl_setup_mac.sh
 cd to the **automl** folder where the sample notebooks were extracted and then run:
 
 ```
-bash automl_setup_linux.sh
+automl_setup_linux.sh
 ```
 
 ### 4. Running configuration.ipynb
 - Before running any samples you next need to run the configuration notebook. Click on 00.configuration.ipynb notebook
+- Please make sure you use the Python [conda env:azure_automl] kernel when running this notebook.
 - Execute the cells in the notebook to Register Machine Learning Services Resource Provider and create a workspace. (*instructions in notebook*)
 
 ### 5. Running Samples
 - Please make sure you use the Python [conda env:azure_automl] kernel when trying the sample Notebooks.
 - Follow the instructions in the individual notebooks to explore various features in AutoML
 
-<a name="samples"></a>
-# Automated ML SDK Sample Notebooks
+# Auto ML SDK Sample Notebooks <a name="samples"></a>
 - [00.configuration.ipynb](00.configuration.ipynb)
   - Register Machine Learning Services Resource Provider
   - Create new Azure ML Workspace
@@ -117,7 +87,7 @@ bash automl_setup_linux.sh
 
 - [03b.auto-ml-remote-batchai.ipynb](03b.auto-ml-remote-batchai.ipynb)
   - Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
-  - Example of using automated ML for classification using a remote Batch AI compute for training
+  - Example of using Auto ML for classification using a remote Batch AI compute for training
   - Parallel execution of iterations
   - Async tracking of progress
   - Cancelling individual iterations or entire run
@@ -173,17 +143,20 @@ bash automl_setup_linux.sh
 - [13.auto-ml-dataprep.ipynb](13.auto-ml-dataprep.ipynb)
   - Using DataPrep for reading data
 
-<a name="documentation"></a>
-# Documentation
+- [14a.auto-ml-classification-ensemble.ipynb](14a.auto-ml-classification-ensemble.ipynb)
+  - Classification with ensembling
+
+- [14b.auto-ml-regression-ensemble.ipynb](14b.auto-ml-regression-ensemble.ipynb)
+  - Regression with ensembling
+
+# Documentation <a name="documentation"></a>
 ## Table of Contents
-1. [Automated ML Settings ](#automlsettings)
-1. [Cross validation split options](#cvsplits)
-1. [Get Data Syntax](#getdata)
-1. [Data pre-processing and featurization](#preprocessing)
+1. [Auto ML Settings ](#automlsettings)
+2. [Cross validation split options](#cvsplits)
+3. [Get Data Syntax](#getdata)
+4. [Data pre-processing and featurization](#preprocessing)
 
-<a name="automlsettings"></a>
-## Automated ML Settings
-
+## Auto ML Settings <a name="automlsettings"></a>
 |Property|Description|Default|
 |-|-|-|
 |**primary_metric**|This is the metric that you want to optimize.<br><br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i><br><br> Regression supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i><br><i>normalized_root_mean_squared_log_error</i>| Classification: accuracy <br><br> Regression: spearman_correlation
@@ -197,8 +170,7 @@ bash automl_setup_linux.sh
 |**exit_score**|*double* value indicating the target for *primary_metric*. <br> Once the target is surpassed the run terminates|None|
 |**blacklist_algos**|*Array* of *strings* indicating pipelines to ignore for Auto ML.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGDClassifierWrapper</i><br><i>NBWrapper</i><br><i>BernoulliNB</i><br><i>SVCWrapper</i><br><i>LinearSVMWrapper</i><br><i>KNeighborsClassifier</i><br><i>DecisionTreeClassifier</i><br><i>RandomForestClassifier</i><br><i>ExtraTreesClassifier</i><br><i>gradient boosting</i><br><i>LightGBMClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoostingRegressor</i><br><i>DecisionTreeRegressor</i><br><i>KNeighborsRegressor</i><br><i>LassoLars</i><br><i>SGDRegressor</i><br><i>RandomForestRegressor</i><br><i>ExtraTreesRegressor</i>|None|
 
-<a name="cvsplits"></a>
-## Cross validation split options
+## Cross validation split options <a name="cvsplits"></a>
 ### K-Folds Cross Validation
 Use the *n_cross_validations* setting to specify the number of cross validations. The training data set will be randomly split into *n_cross_validations* folds of equal size. During each cross validation round, one of the folds will be used for validation of the model trained on the remaining folds. This process repeats for *n_cross_validations* rounds until each fold is used once as a validation set. Finally, the average scores across all *n_cross_validations* rounds will be reported, and the corresponding model will be retrained on the whole training data set.
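The K-fold scheme described above can be sketched in plain Python. This is illustrative only; AutoML performs the splitting internally:

```python
import random

def k_fold_indices(n_samples, n_cross_validations, seed=0):
    """Randomly split sample indices into folds of (near-)equal size.
    Each fold serves exactly once as the validation set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_cross_validations] for i in range(n_cross_validations)]
    # One round per fold: that fold validates, the remaining folds train.
    return [(sorted(set(indices) - set(fold)), sorted(fold)) for fold in folds]

rounds = k_fold_indices(10, 5)
```

Each element of `rounds` is a `(train_indices, validation_indices)` pair; averaging a metric over the five rounds gives the reported cross-validation score.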
 
@@ -208,8 +180,7 @@ Use *validation_size* to specify the percentage of the training data set that sh
 ### Custom train and validation set
 You can specify separate train and validation sets either through get_data() or directly to the fit method.
 
-<a name="getdata"></a>
-## get_data() syntax
+## get_data() syntax <a name="getdata"></a>
 The *get_data()* function can be used to return a dictionary with these values:
 
 |Key|Type|Dependency|Mutually Exclusive with|Description|
@@ -225,23 +196,21 @@ The *get_data()* function can be used to return a dictionary with these values:
 |columns|Array of strings|data_train||*Optional* Whitelist of columns to use for features|
 |cv_splits_indices|Array of integers|data_train||*Optional* List of indexes to split the data for cross validation|
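A minimal get_data() following the table above might look like the sketch below. The tiny in-memory values are stand-ins; a real get_data() would load your own dataset:

```python
def get_data():
    """Return training data in the dictionary shape described above,
    using the X (features) and y (labels) keys."""
    X = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]  # feature rows
    y = [0, 1, 0, 1]                                      # one label per row
    return {"X": X, "y": y}

data = get_data()
```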
 
-<a name="preprocessing"></a>
-## Data pre-processing and featurization
-If you use `preprocess=True`, the following data preprocessing steps are performed automatically for you:
-
-1. Dropping high cardinality or no variance features
+## Data pre-processing and featurization <a name="preprocessing"></a>
+If you use "preprocess=True", the following data preprocessing steps are performed automatically for you:
+### 1. Dropping high cardinality or no variance features
   - Features with no useful information are dropped from training and validation sets. These include features with all values missing, same value across all rows or with extremely high cardinality (e.g., hashes, IDs or GUIDs).
-2. Missing value imputation
+### 2. Missing value imputation
   - For numerical features, missing values are imputed with average of values in the column.
   - For categorical features, missing values are imputed with most frequent value.
-3. Generating additional features
+### 3. Generating additional features
   - For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of the year, Hour, Minute, Second.
   - For Text features: Term frequency based on bi-grams and tri-grams, Count vectorizer.
-4. Transformations and encodings
+### 4. Transformations and encodings
   - Numeric features with very few unique values are transformed into categorical features.
+  - Depending on cardinality of categorical features label encoding or (hashing) one-hot encoding is performed.
 
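The imputation rules above can be illustrated in plain Python. This is a sketch of the idea, not the AutoML implementation:

```python
from collections import Counter

def impute_numeric(column):
    """Replace missing (None) entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def impute_categorical(column):
    """Replace missing (None) entries with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]
```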
<a name="pythoncommand"></a>
|
# Running using python command <a name="pythoncommand"></a>
|
||||||
# Running using python command
|
|
||||||
Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file.
|
Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file.
|
||||||
You can then run this file using the python command.
|
You can then run this file using the python command.
|
||||||
However, on Windows the file needs to be modified before it can be run.
|
However, on Windows the file needs to be modified before it can be run.
|
||||||
@@ -251,8 +220,7 @@ The following condition must be added to the main code in the file:
 
 The main code of the file must be indented so that it is under this condition.
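The condition referred to above is the standard Python entry-point guard; it is needed on Windows because process spawning re-imports the main module:

```python
def main():
    # Stand-in for the notebook's main code (e.g. submitting the AutoML run).
    print("running experiment")

# All top-level work must sit under this guard so that Windows can
# re-import the file for child processes without re-executing it.
if __name__ == '__main__':
    main()
```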
 
-<a name="troubleshooting"></a>
-# Troubleshooting
+# Troubleshooting <a name="troubleshooting"></a>
 ## Iterations fail and the log contains "MemoryError"
 This can be caused by insufficient memory on the DSVM. AutoML loads all training data into memory. So, the available memory should be more than the training data size.
 If you are using a remote DSVM, memory is needed for each concurrent iteration. The concurrent_iterations setting specifies the maximum concurrent iterations. For example, if the training data size is 8 GB and concurrent_iterations is set to 10, the minimum memory required is at least 80 GB.
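The memory rule of thumb above is simple arithmetic; a quick helper makes it explicit (illustrative only):

```python
def min_memory_gb(training_data_gb, concurrent_iterations):
    """Each concurrent iteration loads the full training data into memory,
    so the required memory scales linearly with concurrency."""
    return training_data_gb * concurrent_iterations

print(min_memory_gb(8, 10))  # 80
```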
@@ -8,13 +8,12 @@ dependencies:
 - numpy>=1.11.0,<1.16.0
 - scipy>=0.19.0,<0.20.0
 - scikit-learn>=0.18.0,<=0.19.1
-- pandas>=0.19.0,<0.23.0
+- pandas>=0.22.0,<0.23.0
 
 - pip:
   # Required packages for AzureML execution, history, and data preparation.
   - --extra-index-url https://pypi.python.org/simple
   - azureml-sdk[automl]
   - azureml-train-widgets
-  - azure-cli
   - pandas_ml
 
@@ -6,7 +6,8 @@ IF "%conda_env_name%"=="" SET conda_env_name="azure_automl"
 call conda activate %conda_env_name% 2>nul:
 
 if not errorlevel 1 (
-  call conda env update --file automl_env.yml -n %conda_env_name%
+  echo Upgrading azureml-sdk[automl] in existing conda environment %conda_env_name%
+  call pip install --upgrade azureml-sdk[automl]
   if errorlevel 1 goto ErrorExit
 ) else (
   call conda env create -f automl_env.yml -n %conda_env_name%
@@ -9,7 +9,8 @@ fi
 
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
-   conda env update -file automl_env.yml -n $CONDA_ENV_NAME
+   echo "Upgrading azureml-sdk[automl] in existing conda environment" $CONDA_ENV_NAME
+   pip install --upgrade azureml-sdk[automl]
 else
    conda env create -f automl_env.yml -n $CONDA_ENV_NAME &&
    source activate $CONDA_ENV_NAME &&
@@ -9,7 +9,8 @@ fi
 
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
-   conda env update -file automl_env.yml -n $CONDA_ENV_NAME
+   echo "Upgrading azureml-sdk[automl] in existing conda environment" $CONDA_ENV_NAME
+   pip install --upgrade azureml-sdk[automl]
 else
    conda env create -f automl_env.yml -n $CONDA_ENV_NAME &&
    source activate $CONDA_ENV_NAME &&

Binary file not shown.
@@ -1,9 +1,9 @@
-# Azure Databricks - Azure ML SDK Sample Notebooks
+# Azure Databricks - Azure Machine Learning SDK Sample Notebooks
 
-**NOTE**: With the latest version of our AML SDK, there are some API changes due to which previous version of notebooks will not work.
-Kindly use this v4 notebooks (updated Sep 18) – if you had installed the AML SDK in your Databricks cluster please update to latest SDK version by installing azureml-sdk[databricks] as a library from GUI.
+**NOTE**: With the latest version of Azure Machine Learning SDK, there are some API changes due to which previous version of notebooks will not work.
+Please remove the previous SDK version and install the latest SDK by installing **azureml-sdk[databricks]** as a PyPi library in Azure Databricks workspace.
 
-**NOTE**: Please create your Azure Databricks cluster as v4.x (high concurrency preferred) with **Python 3** (dropdown). We are extending it to more runtimes asap.
+**NOTE**: Please create your Azure Databricks cluster as v4.x (high concurrency preferred) with **Python 3** (dropdown).
 
 **NOTE**: Some packages like psutil upgrade libs that can cause a conflict, please install such packages by freezing lib version. Eg. "pstuil **cryptography==1.5 pyopenssl==16.0.0 ipython=2.2.0**" to avoid install error. This issue is related to Databricks and not related to AML SDK.
 
@@ -11,9 +11,9 @@ Kindly use this v4 notebooks (updated Sep 18)– if you had installed the AML SD
 
 The iPython Notebooks have to be run sequentially after making changes based on your subscription. The corresponding DBC archive contains all the notebooks and can be imported into your Databricks workspace. You can then run the notebooks after importing .dbc instead of downloading individually.
 
-This set of notebooks are related to Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to data prep, train and operationalize a Spark ML model with Azure ML Python SDK from within Azure Databricks. For details on SDK concepts, please refer to [Private preview notebooks](https://github.com/Azure/ViennaDocs/tree/master/PrivatePreview/notebooks)
+This set of notebooks are related to Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to data prep, train and operationalize a Spark ML model with Azure ML Python SDK from within Azure Databricks. For details on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks)
 
-(Recommended) [Azure Databricks AML SDK notebooks](Databricks_AMLSDK_github.dbc) A single DBC package to import all notebooks in your Databricks workspace.
+(Recommended) [Azure Databricks AML SDK notebooks](Databricks_AMLSDK_github.dbc) A single DBC package to import all notebooks in your Azure Databricks workspace.
 
 01. [Installation and Configuration](01.Installation_and_Configuration.ipynb): Install the Azure ML Python SDK and Initialize an Azure ML Workspace and save the Workspace configuration file.
 02. [Ingest data](02.Ingest_data.ipynb): Download the Adult Census Income dataset and split it into train and test sets.
@@ -23,4 +23,7 @@ This set of notebooks are related to Income prediction experiment based on this
 06. [Deploy to AKS](04.Deploy_to_AKS_existingImage.ipynb): Deploy model to Azure Kubernetes Service (AKS) with Azure ML Python SDK from an existing Image with model, conda and score file.
 
 Copyright (c) Microsoft Corporation. All rights reserved.
 
 All notebooks in this folder are licensed under the MIT License.
 
+Apache®, Apache Spark, and Spark® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
174  ignore/doc-qa/how-to-deploy-to-aci/how-to-deploy-to-aci.py  (new file)
@@ -0,0 +1,174 @@
+#!/usr/bin/env python
+# coding: utf-8
+
+import azureml.core
+print('SDK version' + azureml.core.VERSION)
+
+# PREREQ: load workspace info
+# import azureml.core
+
+# <loadWorkspace>
+from azureml.core import Workspace
+ws = Workspace.from_config()
+# </loadWorkspace>
+
+scorepy_content = "import json\nimport numpy as np\nimport os\nimport pickle\nfrom sklearn.externals import joblib\nfrom sklearn.linear_model import LogisticRegression\n\nfrom azureml.core.model import Model\n\ndef init():\n    global model\n    # retrieve the path to the model file using the model name\n    model_path = Model.get_model_path('sklearn_mnist')\n    model = joblib.load(model_path)\n\ndef run(raw_data):\n    data = np.array(json.loads(raw_data)['data'])\n    # make prediction\n    y_hat = model.predict(data)\n    return json.dumps(y_hat.tolist())"
+print(scorepy_content)
+with open("score.py", "w") as f:
+    f.write(scorepy_content)
+
+# PREREQ: create environment file
+from azureml.core.conda_dependencies import CondaDependencies
+
+myenv = CondaDependencies()
+myenv.add_conda_package("scikit-learn")
+
+with open("myenv.yml", "w") as f:
+    f.write(myenv.serialize_to_string())
+
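The score.py written above exposes an init()/run() contract: run() receives a JSON payload with a "data" key and returns predictions as JSON. The round-trip can be sketched independently of the SDK; the identity "prediction" below is a stand-in for the real model:

```python
import json

def run(raw_data):
    """Mirror of the run() contract in score.py above, minus the model:
    parse the JSON payload, 'predict' (stand-in), return JSON."""
    data = json.loads(raw_data)['data']
    y_hat = [row[0] for row in data]  # stand-in for model.predict(data)
    return json.dumps(y_hat)

payload = json.dumps({'data': [[3], [7]]})
print(run(payload))  # [3, 7]
```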
+# <configImage>
+from azureml.core.image import ContainerImage
+
+image_config = ContainerImage.image_configuration(execution_script = "score.py",
+                                                  runtime = "python",
+                                                  conda_file = "myenv.yml",
+                                                  description = "Image with mnist model",
+                                                  tags = {"data": "mnist", "type": "classification"}
+                                                  )
+# </configImage>
+
+# <configAci>
+from azureml.core.webservice import AciWebservice
+
+aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1,
+                                               memory_gb = 1,
+                                               tags = {"data": "mnist", "type": "classification"},
+                                               description = 'Handwriting recognition')
+# </configAci>
+
+# <registerModel>
+from azureml.core.model import Model
+
+model_name = "sklearn_mnist"
+model = Model.register(model_path = "sklearn_mnist_model.pkl",
+                       model_name = model_name,
+                       tags = {"data": "mnist", "type": "classification"},
+                       description = "Mnist handwriting recognition",
+                       workspace = ws)
+# </registerModel>
+
+# <retrieveModel>
+from azureml.core.model import Model
+
+model_name = "sklearn_mnist"
+model = Model(ws, model_name)
+# </retrieveModel>
+
+# ## DEPLOY FROM REGISTERED MODEL
+
+# <option2Deploy>
+from azureml.core.webservice import Webservice
+
+service_name = 'aci-mnist-2'
+service = Webservice.deploy_from_model(deployment_config = aciconfig,
+                                       image_config = image_config,
+                                       models = [model],  # this is the registered model object
+                                       name = service_name,
+                                       workspace = ws)
+service.wait_for_deployment(show_output = True)
+print(service.state)
+# </option2Deploy>
+
+service.delete()
+
+# ## DEPLOY FROM IMAGE
+
+# <option3CreateImage>
+from azureml.core.image import ContainerImage
+
+image = ContainerImage.create(name = "myimage1",
+                              models = [model],  # this is the registered model object
+                              image_config = image_config,
+                              workspace = ws)
+
+image.wait_for_creation(show_output = True)
+# </option3CreateImage>
+
+# <option3Deploy>
+from azureml.core.webservice import Webservice
+
+service_name = 'aci-mnist-13'
+service = Webservice.deploy_from_image(deployment_config = aciconfig,
+                                       image = image,
+                                       name = service_name,
+                                       workspace = ws)
+service.wait_for_deployment(show_output = True)
+print(service.state)
+# </option3Deploy>
+
+service.delete()
+
+# ## DEPLOY FROM MODEL FILE
+# First change score.py!
+
+scorepy_content = "import json\nimport numpy as np\nimport os\nimport pickle\nfrom sklearn.externals import joblib\nfrom sklearn.linear_model import LogisticRegression\n\nfrom azureml.core.model import Model\n\ndef init():\n    global model\n    # retrieve the path to the model file using the model name\n    model_path = Model.get_model_path('sklearn_mnist_model.pkl')\n    model = joblib.load(model_path)\n\ndef run(raw_data):\n    data = np.array(json.loads(raw_data)['data'])\n    # make prediction\n    y_hat = model.predict(data)\n    return json.dumps(y_hat.tolist())"
+with open("score.py", "w") as f:
+    f.write(scorepy_content)
+
+# <option1Deploy>
+from azureml.core.webservice import Webservice
+
+service_name = 'aci-mnist-1'
+service = Webservice.deploy(deployment_config = aciconfig,
+                            image_config = image_config,
+                            model_paths = ['sklearn_mnist_model.pkl'],
+                            name = service_name,
+                            workspace = ws)
+
+service.wait_for_deployment(show_output = True)
+print(service.state)
+# </option1Deploy>
+
+# <testService>
+# Load Data
+import os
+import urllib
+
+os.makedirs('./data', exist_ok = True)
+
+urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/test-images.gz')
|
||||||
|
|
||||||
|
from utils import load_data
|
||||||
|
X_test = load_data('./data/test-images.gz', False) / 255.0
|
||||||
|
|
||||||
|
from sklearn import datasets
|
||||||
|
import numpy as np
|
||||||
|
import json
|
||||||
|
|
||||||
|
# find 5 random samples from test set
|
||||||
|
n = 5
|
||||||
|
sample_indices = np.random.permutation(X_test.shape[0])[0:n]
|
||||||
|
|
||||||
|
test_samples = json.dumps({"data": X_test[sample_indices].tolist()})
|
||||||
|
test_samples = bytes(test_samples, encoding = 'utf8')
|
||||||
|
|
||||||
|
# predict using the deployed model
|
||||||
|
prediction = service.run(input_data = test_samples)
|
||||||
|
print(prediction)
|
||||||
|
# </testService>
|
||||||
|
|
||||||
|
# <deleteService>
|
||||||
|
service.delete()
|
||||||
|
# </deleteService>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
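The `run` function written into score.py above parses a JSON body of the form `{"data": [...]}`. A minimal sketch of building such a payload and round-tripping it the way the scoring script does (no deployed service involved; the sample values are illustrative):

```python
import json

import numpy as np

# Build the request body the same way the <testService> snippet does:
# a JSON object whose "data" field holds a list of flattened images.
samples = np.zeros((2, 784))  # two fake 28x28 images, flattened
test_samples = json.dumps({"data": samples.tolist()})
test_samples = bytes(test_samples, encoding='utf8')

# Inside score.py, run() recovers the array like this:
data = np.array(json.loads(test_samples)['data'])
assert data.shape == (2, 784)
```

The same bytes object is what gets passed to `service.run(input_data=...)` in the test snippet.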
BIN
ignore/doc-qa/how-to-deploy-to-aci/sklearn_mnist_model.pkl
Normal file
Binary file not shown.
27
ignore/doc-qa/how-to-deploy-to-aci/utils.py
Normal file
@@ -0,0 +1,27 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import gzip
import numpy as np
import struct


# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
    with gzip.open(filename) as gz:
        struct.unpack('I', gz.read(4))
        n_items = struct.unpack('>I', gz.read(4))
        if not label:
            n_rows = struct.unpack('>I', gz.read(4))[0]
            n_cols = struct.unpack('>I', gz.read(4))[0]
            res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
            res = res.reshape(n_items[0], n_rows * n_cols)
        else:
            res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
            res = res.reshape(n_items[0], 1)
    return res


# one-hot encode a 1-D array
def one_hot_encode(array, num_of_classes):
    return np.eye(num_of_classes)[array.reshape(-1)]
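The `one_hot_encode` helper in utils.py is just fancy indexing into an identity matrix; a small self-contained illustration of the same trick:

```python
import numpy as np

def one_hot_encode(array, num_of_classes):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing np.eye with the labels encodes the whole batch at once.
    return np.eye(num_of_classes)[array.reshape(-1)]

labels = np.array([0, 2, 1])
encoded = one_hot_encode(labels, num_of_classes=3)
# encoded has shape (3, 3); encoded[i, labels[i]] == 1 for every i
```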
39
ignore/doc-qa/how-to-set-up-training-targets/Local.py
Normal file
@@ -0,0 +1,39 @@
# Code for Local computer and Submit training run sections

# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

#<run_local>
from azureml.core.runconfig import RunConfiguration

# Edit a run configuration property on the fly.
run_local = RunConfiguration()

run_local.environment.python.user_managed_dependencies = True
#</run_local>

from azureml.core import Workspace
ws = Workspace.from_config()


# Set up an experiment
# <experiment>
from azureml.core import Experiment
experiment_name = 'my_experiment'

exp = Experiment(workspace=ws, name=experiment_name)
# </experiment>

# Submit the experiment using the run configuration
#<local_submit>
from azureml.core import ScriptRunConfig
import os

script_folder = os.getcwd()
src = ScriptRunConfig(source_directory = script_folder, script = 'train.py', run_config = run_local)
run = exp.submit(src)
run.wait_for_completion(show_output = True)
#</local_submit>
48
ignore/doc-qa/how-to-set-up-training-targets/amlcompute.py
Normal file
@@ -0,0 +1,48 @@
# Code for Azure Machine Learning Compute - Run-based creation

# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)


from azureml.core import Workspace
ws = Workspace.from_config()


# Set up an experiment
from azureml.core import Experiment
experiment_name = 'my-experiment'
script_folder = "./"

exp = Experiment(workspace=ws, name=experiment_name)


#<run_temp_compute>
from azureml.core.compute import ComputeTarget, AmlCompute

# First, list the supported VM families for Azure Machine Learning Compute
print(AmlCompute.supported_vmsizes(workspace=ws))

from azureml.core.runconfig import RunConfiguration
# Create a new runconfig object
run_temp_compute = RunConfiguration()

# Signal that you want to use AmlCompute to execute the script
run_temp_compute.target = "amlcompute"

# AmlCompute is created in the same region as your workspace
# Set the VM size for AmlCompute from the list of supported_vmsizes
run_temp_compute.amlcompute.vm_size = 'STANDARD_D2_V2'
#</run_temp_compute>


# Submit the experiment using the run configuration
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory = script_folder, script = 'train.py', run_config = run_temp_compute)
run = exp.submit(src)
run.wait_for_completion(show_output = True)
70
ignore/doc-qa/how-to-set-up-training-targets/amlcompute2.py
Normal file
@@ -0,0 +1,70 @@
# Code for Azure Machine Learning Compute - Persistent compute

# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

from azureml.core import Workspace
ws = Workspace.from_config()


# Set up an experiment
from azureml.core import Experiment
experiment_name = 'my-experiment'
script_folder = "./"

exp = Experiment(workspace=ws, name=experiment_name)

#<cpu_cluster>
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpucluster"

# Verify that the cluster does not already exist
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # To use a different region for the compute, add a location='<region>' parameter
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)
#</cpu_cluster>

#<run_amlcompute>
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

# Create a new runconfig object
run_amlcompute = RunConfiguration()

# Use the cpu_cluster you created above.
run_amlcompute.target = cpu_cluster

# Enable Docker
run_amlcompute.environment.docker.enabled = True

# Set Docker base image to the default CPU-based image
run_amlcompute.environment.docker.base_image = DEFAULT_CPU_IMAGE

# Use conda_dependencies.yml to create a conda environment in the Docker image for execution
run_amlcompute.environment.python.user_managed_dependencies = False

# Specify CondaDependencies obj, add necessary packages
run_amlcompute.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])
#</run_amlcompute>

# Submit the experiment using the run configuration
#<amlcompute_submit>
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory = script_folder, script = 'train.py', run_config = run_amlcompute)
run = exp.submit(src)
run.wait_for_completion(show_output = True)
#</amlcompute_submit>
26
ignore/doc-qa/how-to-set-up-training-targets/dsvm.py
Normal file
@@ -0,0 +1,26 @@
# Code for Remote virtual machines

compute_target_name = "sheri-linuxvm"

#<run_dsvm>
import azureml.core
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_dsvm = RunConfiguration(framework = "python")

# Set the compute target to the Linux DSVM
run_dsvm.target = compute_target_name

# Use Docker in the remote VM
run_dsvm.environment.docker.enabled = True

# Use the CPU base image
# To use GPU in DSVM, you must also use the GPU base Docker image "azureml.core.runconfig.DEFAULT_GPU_IMAGE"
run_dsvm.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE
print('Base Docker image is:', run_dsvm.environment.docker.base_image)

# Specify the CondaDependencies object
run_dsvm.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])
#</run_dsvm>
print(run_dsvm)
27
ignore/doc-qa/how-to-set-up-training-targets/hdi.py
Normal file
@@ -0,0 +1,27 @@
from azureml.core import Workspace
ws = Workspace.from_config()

from azureml.core.compute import ComputeTarget

# refers to an existing compute resource attached to the workspace!
hdi_compute = ComputeTarget(workspace=ws, name='sherihdi')


#<run_hdi>
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies


# use pyspark framework
run_hdi = RunConfiguration(framework="pyspark")

# Set compute target to the HDI cluster
run_hdi.target = hdi_compute.name

# specify the CondaDependencies object to ask the system to install numpy
cd = CondaDependencies()
cd.add_conda_package('numpy')
run_hdi.environment.python.conda_dependencies = cd
#</run_hdi>
print(run_hdi)
9
ignore/doc-qa/how-to-set-up-training-targets/mylib.py
Normal file
@@ -0,0 +1,9 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.

import numpy as np


def get_alphas():
    # numbers from 0.0 up to (but not including) 1.0 with a 0.05 interval
    return np.arange(0.0, 1.0, 0.05)
52
ignore/doc-qa/how-to-set-up-training-targets/remote.py
Normal file
@@ -0,0 +1,52 @@
# Code for Remote virtual machines

compute_target_name = "attach-dsvm"

#<run_dsvm>
import azureml.core
from azureml.core.runconfig import RunConfiguration, DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies

run_dsvm = RunConfiguration(framework = "python")

# Set the compute target to the Linux DSVM
run_dsvm.target = compute_target_name

# Use Docker in the remote VM
run_dsvm.environment.docker.enabled = True

# Use the CPU base image
# To use GPU in DSVM, you must also use the GPU base Docker image "azureml.core.runconfig.DEFAULT_GPU_IMAGE"
run_dsvm.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE
print('Base Docker image is:', run_dsvm.environment.docker.base_image)

# Prepare the Docker and conda environment automatically when they're used for the first time
run_dsvm.prepare_environment = True

# Specify the CondaDependencies object
run_dsvm.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])
#</run_dsvm>
hdi_compute.name = "blah"
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies


# use pyspark framework
hdi_run_config = RunConfiguration(framework="pyspark")

# Set compute target to the HDI cluster
hdi_run_config.target = hdi_compute.name

# specify the CondaDependencies object to ask the system to install numpy
cd = CondaDependencies()
cd.add_conda_package('numpy')
hdi_run_config.environment.python.conda_dependencies = cd

#<run_hdi>
from azureml.core.runconfig import RunConfiguration
# Configure the HDInsight run
# Load the runconfig object from the myhdi.runconfig file generated in the previous attach operation
run_hdi = RunConfiguration.load(project_object = project, run_name = 'myhdi')

# Ask the system to prepare the conda environment automatically when it's used for the first time
run_hdi.auto_prepare_environment = True
25
ignore/doc-qa/how-to-set-up-training-targets/runconfig.py
Normal file
@@ -0,0 +1,25 @@
# Code for What's a run configuration

# <run_system_managed>
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_system_managed = RunConfiguration()

# Specify the conda dependencies with scikit-learn
run_system_managed.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])
# </run_system_managed>
print(run_system_managed)


# <run_user_managed>
from azureml.core.runconfig import RunConfiguration

run_user_managed = RunConfiguration()
run_user_managed.environment.python.user_managed_dependencies = True

# Choose a specific Python environment by pointing to a Python path. For example:
# run_config.environment.python.interpreter_path = '/home/ninghai/miniconda3/envs/sdk2/bin/python'
# </run_user_managed>
print(run_user_managed)
45
ignore/doc-qa/how-to-set-up-training-targets/train.py
Normal file
@@ -0,0 +1,45 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from azureml.core.run import Run
from sklearn.externals import joblib
import os
import numpy as np
import mylib

os.makedirs('./outputs', exist_ok=True)

X, y = load_diabetes(return_X_y=True)

run = Run.get_context()

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=0)
data = {"train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test}}

# numbers from 0.0 up to (but not including) 1.0 with a 0.05 interval
alphas = mylib.get_alphas()

for alpha in alphas:
    # Use the Ridge algorithm to create a regression model
    reg = Ridge(alpha=alpha)
    reg.fit(data["train"]["X"], data["train"]["y"])

    preds = reg.predict(data["test"]["X"])
    mse = mean_squared_error(preds, data["test"]["y"])
    run.log('alpha', alpha)
    run.log('mse', mse)

    model_file_name = 'ridge_{0:.2f}.pkl'.format(alpha)
    # save the model in the outputs folder so it automatically gets uploaded
    joblib.dump(value=reg, filename=os.path.join('./outputs/',
                                                 model_file_name))

    print('alpha is {0:.2f}, and mse is {1:0.2f}'.format(alpha, mse))
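The sweep in train.py fits one Ridge model per alpha and logs its test MSE. The effect of the alpha penalty can be sketched with NumPy alone using ridge regression's closed form, w = (XᵀX + αI)⁻¹Xᵀy. This is a minimal sketch on synthetic data; the variable names here are illustrative and not from the repo:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.randn(100)

# Same grid as mylib.get_alphas(): 0.0, 0.05, ..., 0.95
alphas = np.arange(0.0, 1.0, 0.05)

mses = []
for alpha in alphas:
    # Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    mses.append(np.mean((X @ w - y) ** 2))

# alpha = 0 is ordinary least squares, the best fit to the data used here;
# larger alpha shrinks the weights and increases this in-sample MSE.
```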
@@ -0,0 +1,55 @@
# code snippets for the quickstart-create-workspace-with-python article
# <import>
import azureml.core
print(azureml.core.VERSION)
# </import>

# this is NOT a snippet. If this code changes, go fix it in the article!
from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
                      subscription_id='<subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2'  # or other supported Azure region
                      )

# <getDetails>
ws.get_details()
# </getDetails>

# <writeConfig>
# Create the configuration file.
ws.write_config()

# Use this code to load the workspace from
# other scripts and notebooks in this directory.
# ws = Workspace.from_config()
# </writeConfig>

# <useWs>
from azureml.core import Experiment

# Create a new experiment in your workspace.
exp = Experiment(workspace=ws, name='myexp')

# Start a run and start the logging service.
run = exp.start_logging()

# Log a single number.
run.log('my magic number', 42)

# Log a list (Fibonacci numbers).
run.log_list('my list', [1, 1, 2, 3, 5, 8, 13, 21, 34, 55])

# Finish the run.
run.complete()
# </useWs>

# <viewLog>
print(run.get_portal_url())
# </viewLog>


# <delete>
ws.delete(delete_dependent_resources=True)
# </delete>
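The hard-coded list logged with `run.log_list` above is the first ten Fibonacci numbers; it can be generated rather than typed out (a trivial sketch, independent of the SDK):

```python
def fibonacci(n):
    # First n Fibonacci numbers, starting from 1, 1.
    seq = []
    a, b = 1, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq

my_list = fibonacci(10)
# my_list == [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```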
67
ignore/doc-qa/testnotebook.ipynb
Normal file
@@ -0,0 +1,67 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Testing notebook include"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "name": "import"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Azure ML SDK Version: 1.0.83\n"
     ]
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import azureml.core\n",
    "from azureml.core import Workspace\n",
    "\n",
    "# check core SDK version number\n",
    "print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Edit Metadata",
  "kernelspec": {
   "display_name": "Python 3.6 - AzureML",
   "language": "python",
   "name": "python3-azureml"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
Binary file not shown.
Before Width: | Height: | Size: 19 KiB |
27
onnx/README.md
Normal file
@@ -0,0 +1,27 @@
# ONNX on Azure Machine Learning

These tutorials show how to create and deploy [ONNX](http://onnx.ai) models using Azure Machine Learning and the [ONNX Runtime](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-build-deploy-onnx).
Once deployed as web services, you can send the models your own images to be analyzed!

## Tutorials
- [Obtain ONNX model from ONNX Model Zoo and deploy - ResNet50](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb)
- [Convert ONNX model from CoreML and deploy - TinyYOLO](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb)
- [Train ONNX model in PyTorch and deploy - MNIST](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb)
- [Handwritten Digit Classification (MNIST) using ONNX Runtime on AzureML](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-inference-mnist.ipynb)
- [Facial Expression Recognition using ONNX Runtime on AzureML](https://github.com/Azure/MachineLearningNotebooks/blob/master/onnx/onnx-inference-emotion-recognition.ipynb)

## Documentation
- [ONNX Runtime Python API Documentation](http://aka.ms/onnxruntime-python)
- [Azure Machine Learning API Documentation](http://aka.ms/aml-docs)

## Related Articles
- [Building and Deploying ONNX Runtime Models](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-build-deploy-onnx)
- [Azure AI – Making AI Real for Business](https://aka.ms/aml-blog-overview)
- [What's new in Azure Machine Learning](https://aka.ms/aml-blog-whats-new)


## License

Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
onnx/mnist.py
Normal file
124
onnx/mnist.py
Normal file
@@ -0,0 +1,124 @@
# This is a modified version of https://github.com/pytorch/examples/blob/master/mnist/main.py which is
# licensed under BSD 3-Clause (https://github.com/pytorch/examples/blob/master/LICENSE)

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import os


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(args, model, device, train_loader, optimizer, epoch, output_dir):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test(args, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, size_average=False, reduce=True).item()  # sum up batch loss
            pred = output.max(1, keepdim=True)[1]  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


def main():
    # Training settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                        help='input batch size for training (default: 64)')
    parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs', type=int, default=10, metavar='N',
                        help='number of epochs to train (default: 10)')
    parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                        help='learning rate (default: 0.01)')
    parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                        help='SGD momentum (default: 0.5)')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                        help='how many batches to wait before logging training status')
    parser.add_argument('--output-dir', type=str, default='outputs')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()

    torch.manual_seed(args.seed)

    device = torch.device("cuda" if use_cuda else "cpu")

    output_dir = args.output_dir
    os.makedirs(output_dir, exist_ok=True)

    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=True, download=True,
                       transform=transforms.Compose([transforms.ToTensor(),
                                                     transforms.Normalize((0.1307,), (0.3081,))])
                       ),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=False,
                       transform=transforms.Compose([transforms.ToTensor(),
                                                     transforms.Normalize((0.1307,), (0.3081,))])
                       ),
        batch_size=args.test_batch_size, shuffle=True, **kwargs)

    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch, output_dir)
        test(args, model, device, test_loader)

    # save model
    dummy_input = torch.randn(1, 1, 28, 28, device=device)
    model_path = os.path.join(output_dir, 'mnist.onnx')
    torch.onnx.export(model, dummy_input, model_path)


if __name__ == '__main__':
    main()
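The hard-coded 320 in `x.view(-1, 320)` and `nn.Linear(320, 50)` above comes from tracking the spatial size through the two conv/pool stages. The arithmetic can be checked without PyTorch; this is a sketch of the size bookkeeping only:

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a square conv or pooling layer.
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                    # MNIST input is 28x28
size = conv2d_out(size, kernel=5)            # conv1 (5x5):      28 -> 24
size = conv2d_out(size, kernel=2, stride=2)  # max_pool2d(2):    24 -> 12
size = conv2d_out(size, kernel=5)            # conv2 (5x5):      12 -> 8
size = conv2d_out(size, kernel=2, stride=2)  # max_pool2d(2):     8 -> 4
flattened = 20 * size * size                 # 20 channels from conv2
# flattened == 320, matching x.view(-1, 320)
```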
431
onnx/onnx-convert-aml-deploy-tinyyolo.ipynb
Normal file
@@ -0,0 +1,431 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# YOLO Real-time Object Detection using ONNX on AzureML\n",
"\n",
"This example shows how to convert the TinyYOLO model from CoreML to ONNX and operationalize it as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
"\n",
"## What is ONNX\n",
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by letting data scientists and developers use the tools of their choice without worrying about lock-in, and by giving them the flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
"\n",
"## YOLO Details\n",
"You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. For more information about YOLO, please visit the [YOLO website](https://pjreddie.com/darknet/yolo/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"To make the best use of your time, make sure you have done the following:\n",
"\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [00.configuration.ipynb](../00.configuration.ipynb) notebook to:\n",
"    * install the AML SDK\n",
"    * create a workspace and its configuration file (config.json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Install necessary packages\n",
"\n",
"You'll need to run the following commands to use this tutorial:\n",
"\n",
"```sh\n",
"pip install coremltools\n",
"pip install onnxmltools\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convert model to ONNX\n",
"\n",
"First we download the CoreML model. We use the CoreML model listed at https://coreml.store/tinyyolo. This may take a few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!wget https://s3-us-west-2.amazonaws.com/coreml-models/TinyYOLO.mlmodel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we use ONNXMLTools to convert the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import onnxmltools\n",
"import coremltools\n",
"\n",
"# Load a CoreML model\n",
"coreml_model = coremltools.utils.load_spec('TinyYOLO.mlmodel')\n",
"\n",
"# Convert from CoreML into ONNX\n",
"onnx_model = onnxmltools.convert_coreml(coreml_model, 'TinyYOLOv2')\n",
"\n",
"# Save ONNX model\n",
"onnxmltools.utils.save_model(onnx_model, 'tinyyolov2.onnx')\n",
"\n",
"import os\n",
"print(os.path.getsize('tinyyolov2.onnx'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploying as a web service with Azure ML\n",
"\n",
"### Load Azure ML workspace\n",
"\n",
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Registering your model with Azure ML\n",
"\n",
"Now we upload the model and register it in the workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"model = Model.register(model_path = \"tinyyolov2.onnx\",\n",
"                       model_name = \"tinyyolov2\",\n",
"                       tags = {\"onnx\": \"demo\"},\n",
"                       description = \"TinyYOLO\",\n",
"                       workspace = ws)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Displaying your registered models\n",
"\n",
"You can optionally list out all the models that you have registered in this workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"models = ws.models()\n",
"for m in models:\n",
"    print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Write scoring file\n",
"\n",
"We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started, so we load the model using the ONNX Runtime into a global session object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import json\n",
"import time\n",
"import sys\n",
"import os\n",
"from azureml.core.model import Model\n",
"import numpy as np    # we're going to use numpy to process input and output data\n",
"import onnxruntime    # to inference ONNX models, we use the ONNX Runtime\n",
"\n",
"def init():\n",
"    global session\n",
"    model = Model.get_model_path(model_name = 'tinyyolov2')\n",
"    session = onnxruntime.InferenceSession(model)\n",
"\n",
"def preprocess(input_data_json):\n",
"    # convert the JSON data into the tensor input\n",
"    return np.array(json.loads(input_data_json)['data']).astype('float32')\n",
"\n",
"def postprocess(result):\n",
"    return np.array(result).tolist()\n",
"\n",
"def run(input_data_json):\n",
"    try:\n",
"        start = time.time()   # start timer\n",
"        input_data = preprocess(input_data_json)\n",
"        input_name = session.get_inputs()[0].name  # get the id of the first input of the model\n",
"        result = session.run([], {input_name: input_data})\n",
"        end = time.time()     # stop timer\n",
"        return {\"result\": postprocess(result),\n",
"                \"time\": end - start}\n",
"    except Exception as e:\n",
"        result = str(e)\n",
"        return {\"error\": result}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create container image\n",
"First we create a YAML file that specifies which dependencies we would like to see in our container."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\"])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
"    f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we have Azure ML create the container. This step will likely take a few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
"                                                  runtime = \"python\",\n",
"                                                  conda_file = \"myenv.yml\",\n",
"                                                  description = \"TinyYOLO ONNX Demo\",\n",
"                                                  tags = {\"demo\": \"onnx\"}\n",
"                                                  )\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxyolo\",\n",
"                              models = [model],\n",
"                              image_config = image_config,\n",
"                              workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case you need to debug your code, the next line of code accesses the log file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
"                                               memory_gb = 1, \n",
"                                               tags = {'demo': 'onnx'}, \n",
"                                               description = 'web service for TinyYOLO ONNX model')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will likely take a few minutes to run as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"from random import randint\n",
"\n",
"aci_service_name = 'onnx-tinyyolo'+str(randint(0,100))\n",
"print(\"Service\", aci_service_name)\n",
"\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
"                                           image = image,\n",
"                                           name = aci_service_name,\n",
"                                           workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case the deployment fails, you can check the logs. Make sure to delete your aci_service before trying again."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if aci_service.state != 'Healthy':\n",
"    # run this command for debugging.\n",
"    print(aci_service.get_logs())\n",
"    aci_service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Success!\n",
"\n",
"If you've made it this far, you've deployed a working web service that does object detection using an ONNX model. You can get the URL for the webservice with the code below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(aci_service.scoring_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you are eventually done using the web service, remember to delete it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "onnx"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
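The `score.py` file in the notebook above converts the JSON request body into a float32 numpy tensor before inference and converts the result back to plain lists for the JSON response. A self-contained sketch of that round trip, using a tiny hypothetical payload in place of a real image tensor:

```python
import json

import numpy as np

def preprocess(input_data_json):
    # mirrors score.py: JSON body -> float32 numpy tensor
    return np.array(json.loads(input_data_json)['data']).astype('float32')

def postprocess(result):
    # mirrors score.py: model output -> JSON-serializable nested lists
    return np.array(result).tolist()

# hypothetical 1x1x2x2 payload standing in for a real image tensor
payload = json.dumps({'data': [[[[0.0, 0.5], [1.0, 0.25]]]]})
tensor = preprocess(payload)
round_trip = postprocess(tensor)
```

This is exactly the shape of data a caller would POST to the deployed scoring URI: a JSON object with a `data` key holding the nested-list form of the input tensor.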
@@ -12,7 +12,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Facial Expression Recognition using ONNX Runtime on AzureML\n",
+"# Facial Expression Recognition (Emotion FER+) using ONNX Runtime on Azure ML\n",
 "\n",
 "This example shows how to deploy an image classification neural network using the Facial Expression Recognition ([FER](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)) dataset and Open Neural Network eXchange format ([ONNX](http://aka.ms/onnxdocarticle)) on the Azure Machine Learning platform. This tutorial will show you how to deploy a FER+ model from the [ONNX model zoo](https://github.com/onnx/models), use it to make predictions using ONNX Runtime Inference, and deploy it as a web service in Azure.\n",
 "\n",
@@ -34,32 +34,54 @@
 "## Prerequisites\n",
 "\n",
 "### 1. Install Azure ML SDK and create a new workspace\n",
-"Please follow [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook.\n",
-"\n",
+"Please follow the [Azure ML configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) to set up your environment.\n",
 "\n",
 "### 2. Install additional packages needed for this Notebook\n",
-"You need to install the popular plotting library `matplotlib`, the image manipulation library `PIL`, and the `onnx` library in the conda environment where Azure Maching Learning SDK is installed.\n",
+"You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where the Azure Machine Learning SDK is installed.\n",
 "\n",
 "```sh\n",
-"(myenv) $ pip install matplotlib onnx Pillow\n",
+"(myenv) $ pip install matplotlib onnx opencv-python\n",
 "```\n",
 "\n",
+"**Debugging tip**: Make sure to activate your virtual environment (myenv) before you re-launch this notebook using the `jupyter notebook` command. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n",
+"\n",
 "### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n",
 "\n",
-"[Download the ONNX Emotion FER+ model and corresponding test data](https://www.cntk.ai/OnnxModels/emotion_ferplus/opset_7/emotion_ferplus.tar.gz) and place them in the same folder as this tutorial notebook. You can unzip the file through the following line of code.\n",
-"\n",
-"```sh\n",
-"(myenv) $ tar xvzf emotion_ferplus.tar.gz\n",
-"```\n",
-"\n",
-"More information can be found about the ONNX FER+ model on [github](https://github.com/onnx/models/tree/master/emotion_ferplus). For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)."
+"In the following lines of code, we download [the trained ONNX Emotion FER+ model and corresponding test data](https://github.com/onnx/models/tree/master/emotion_ferplus) and place them in the same folder as this tutorial notebook. For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"# urllib is a built-in Python library to download files from URLs\n",
+"\n",
+"# Objective: retrieve the latest version of the ONNX Emotion FER+ model files from the\n",
+"# ONNX Model Zoo and save it in the same folder as this tutorial\n",
+"\n",
+"import urllib.request\n",
+"\n",
+"onnx_model_url = \"https://www.cntk.ai/OnnxModels/emotion_ferplus/opset_7/emotion_ferplus.tar.gz\"\n",
+"\n",
+"urllib.request.urlretrieve(onnx_model_url, filename=\"emotion_ferplus.tar.gz\")\n",
+"\n",
+"# the ! magic command tells our jupyter notebook kernel to run the following line of\n",
+"# code from the command line instead of the notebook kernel\n",
+"\n",
+"# We use tar xvzf to unzip the files we just retrieved from the ONNX model zoo\n",
+"\n",
+"!tar xvzf emotion_ferplus.tar.gz"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Load Azure ML workspace\n",
+"## Deploy a VM with your ONNX model in the Cloud\n",
+"\n",
+"### Load Azure ML workspace\n",
 "\n",
 "We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
 ]
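The download cell in the hunk above still shells out to `tar` to unpack the model archive. The same extraction can be done portably with Python's stdlib `tarfile` module; this sketch builds a tiny in-memory archive instead of downloading the real model tarball, so the archive contents and file names here are invented for illustration:

```python
import io
import os
import tarfile
import tempfile

# Build a tiny .tar.gz in memory as a stand-in for emotion_ferplus.tar.gz,
# so the example does not depend on a network download.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w:gz') as tar:
    data = b'fake model bytes'
    info = tarfile.TarInfo(name='model/Model.onnx')  # hypothetical member name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)

# Pure-Python equivalent of `!tar xvzf emotion_ferplus.tar.gz`
dest = tempfile.mkdtemp()
with tarfile.open(fileobj=buf, mode='r:gz') as tar:
    tar.extractall(path=dest)

extracted = os.path.join(dest, 'model', 'Model.onnx')
```

Using `tarfile` avoids relying on a `tar` binary being available on the notebook host, which matters on Windows-based environments.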
@@ -137,8 +159,8 @@
 "outputs": [],
 "source": [
 "models = ws.models()\n",
-"for m in models:\n",
-"    print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
+"for name, m in models.items():\n",
+"    print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
 ]
 },
 {
@@ -147,9 +169,9 @@
 "source": [
 "### ONNX FER+ Model Methodology\n",
 "\n",
-"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n",
+"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the well-known FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n",
 "\n",
-"The original Facial Emotion Recognition (FER) Dataset was released in 2013, but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 croud sourced reviewers, creating a better basis for ground truth. \n",
+"The original Facial Emotion Recognition (FER) Dataset was released in 2013 by Pierre-Luc Carrier and Aaron Courville as part of a [Kaggle Competition](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data), but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 crowd-sourced reviewers, creating a more accurate basis for ground truth. \n",
 "\n",
 "You can see the difference of label quality in the sample model input below. The FER labels are the first word below each image, and the FER+ labels are the second word below each image.\n",
 "\n",
@@ -202,20 +224,18 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Deploy our model on Azure ML"
+"### Specify our Score and Environment Files"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file.\n",
-"\n",
-"You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating out model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n",
+"We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n",
 "\n",
 "### Write Score File\n",
 "\n",
-"A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime GPU inference session to evaluate the data passed in on our function calls."
+"A score file is what tells our Azure cloud service what to do. After initializing our model using azureml.core.model, we start an ONNX Runtime inference session to evaluate the data passed in on our function calls."
 ]
 },
 {
@@ -248,10 +268,13 @@
 "    try:\n",
 "        # load in our data, convert to readable format\n",
 "        data = np.array(json.loads(input_data)['data']).astype('float32')\n",
+"        \n",
 "        start = time.time()\n",
 "        r = session.run([output_name], {input_name : data})\n",
 "        end = time.time()\n",
+"        \n",
 "        result = emotion_map(postprocess(r[0]))\n",
+"        \n",
 "        result_dict = {\"result\": result,\n",
 "                       \"time_in_sec\": [end - start]}\n",
 "    except Exception as e:\n",
@@ -260,9 +283,12 @@
 "    return json.dumps(result_dict)\n",
 "\n",
 "def emotion_map(classes, N=1):\n",
-"    \"\"\"Take the most probable labels (output of postprocess) and returns the top N emotional labels that fit the picture.\"\"\"\n",
+"    \"\"\"Take the most probable labels (output of postprocess) and returns the \n",
+"    top N emotional labels that fit the picture.\"\"\"\n",
+"    \n",
+"    emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n",
+"                     'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n",
 "    \n",
-"    emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n",
 "    emotion_keys = list(emotion_table.keys())\n",
 "    emotions = []\n",
 "    for i in range(N):\n",
@@ -276,8 +302,8 @@
 "    return e_x / e_x.sum(axis=0)\n",
 "\n",
 "def postprocess(scores):\n",
-"    \"\"\"This function takes the scores generated by the network and returns the class IDs in decreasing \n",
-"    order of probability.\"\"\"\n",
+"    \"\"\"This function takes the scores generated by the network and \n",
+"    returns the class IDs in decreasing order of probability.\"\"\"\n",
 "    prob = softmax(scores)\n",
 "    prob = np.squeeze(prob)\n",
 "    classes = np.argsort(prob)[::-1]\n",
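The `softmax` and `postprocess` helpers in the hunk above turn raw network scores into probabilities and then into class IDs ranked by decreasing probability. A self-contained numpy check of that behavior (the score values here are synthetic, not real FER+ outputs):

```python
import numpy as np

def softmax(x):
    # shift by the max for numerical stability, then normalize to probabilities
    x = x.reshape(-1)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def postprocess(scores):
    # class IDs sorted by decreasing probability, as in the score file
    prob = np.squeeze(softmax(scores))
    return np.argsort(prob)[::-1]

scores = np.array([[1.0, 3.0, 0.5, 2.0]])  # hypothetical raw network output
classes = postprocess(scores)
# softmax is monotonic, so the ranking follows the raw scores:
# index 1 (score 3.0) ranks first, then 3, 0, 2
```

Because `softmax` is order-preserving, `postprocess` could rank on the raw scores directly; the probabilities are still useful when the caller wants calibrated confidences alongside the ranking.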
@@ -329,7 +355,7 @@
 "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
 "                                                  runtime = \"python\",\n",
 "                                                  conda_file = \"myenv.yml\",\n",
-"                                                  description = \"test\",\n",
+"                                                  description = \"Emotion ONNX Runtime container\",\n",
 "                                                  tags = {\"demo\": \"onnx\"})\n",
 "\n",
 "\n",
@@ -346,8 +372,6 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Debugging\n",
-"\n",
 "In case you need to debug your code, the next line of code accesses the log file."
 ]
 },
@@ -364,9 +388,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We're all set! Let's get our model chugging.\n",
+"We're all done specifying what we want our virtual machine to do. Let's configure and deploy our container image.\n",
 "\n",
-"## Deploy the container image"
+"### Deploy the container image"
 ]
 },
 {
@@ -439,23 +463,57 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Testing and Evaluation"
|
"## Testing and Evaluation\n",
|
||||||
]
|
"\n",
|
||||||
},
|
"### Useful Helper Functions\n",
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"#### Useful Helper Functions\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/emotion_ferplus)."
|
"We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/emotion_ferplus)."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"def emotion_map(classes, N=1):\n",
|
||||||
|
" \"\"\"Takes the most probable labels (output of postprocess) and returns the \n",
|
||||||
|
" top N emotional labels that fit the picture.\"\"\"\n",
|
||||||
|
" \n",
|
||||||
|
" emotion_table = {'neutral':0, 'happiness':1, 'surprise':2, 'sadness':3, \n",
|
||||||
|
" 'anger':4, 'disgust':5, 'fear':6, 'contempt':7}\n",
|
||||||
|
" \n",
|
||||||
|
" emotion_keys = list(emotion_table.keys())\n",
|
||||||
|
" emotions = []\n",
|
||||||
|
" for i in range(N):\n",
|
||||||
|
" emotions.append(emotion_keys[classes[i]])\n",
|
||||||
|
" \n",
|
||||||
|
" return emotions\n",
|
||||||
|
"\n",
|
||||||
|
"def softmax(x):\n",
|
||||||
|
" \"\"\"Compute softmax values (probabilities from 0 to 1) for each possible label.\"\"\"\n",
|
||||||
|
" x = x.reshape(-1)\n",
|
||||||
|
" e_x = np.exp(x - np.max(x))\n",
|
||||||
|
" return e_x / e_x.sum(axis=0)\n",
|
||||||
|
"\n",
|
||||||
|
"def postprocess(scores):\n",
|
||||||
|
" \"\"\"This function takes the scores generated by the network and \n",
|
||||||
|
" returns the class IDs in decreasing order of probability.\"\"\"\n",
|
||||||
|
" prob = softmax(scores)\n",
|
||||||
|
" prob = np.squeeze(prob)\n",
|
||||||
|
" classes = np.argsort(prob)[::-1]\n",
|
||||||
|
" return classes"
|
||||||
|
]
|
||||||
|
},
|
||||||
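As a quick sanity check, the helper functions added in the cell above can be exercised on a made-up score vector (the scores and expected ordering here are illustrative, not real FER+ model output):

```python
import numpy as np

# Helpers reproduced from the notebook cell above (FER+ post-processing).
emotion_table = {'neutral': 0, 'happiness': 1, 'surprise': 2, 'sadness': 3,
                 'anger': 4, 'disgust': 5, 'fear': 6, 'contempt': 7}

def softmax(x):
    # numerically stable softmax over the flattened score vector
    x = x.reshape(-1)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def postprocess(scores):
    # class IDs in decreasing order of probability
    prob = np.squeeze(softmax(scores))
    return np.argsort(prob)[::-1]

def emotion_map(classes, N=1):
    # map the top-N class IDs back to emotion names
    keys = list(emotion_table.keys())
    return [keys[classes[i]] for i in range(N)]

# made-up raw network scores: index 1 ('happiness') is largest, index 2 next
scores = np.array([0.1, 3.2, 0.5, 0.2, 0.1, 0.0, 0.4, 0.3])
top2 = emotion_map(postprocess(scores), N=2)
```

Since softmax is monotonic, `postprocess` just orders the raw scores, so `top2` comes out as `['happiness', 'surprise']` for this toy input.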
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Load Test Data"
|
"### Load Test Data\n",
|
||||||
|
"\n",
|
||||||
|
"These are already in your directory from your ONNX model download (from the model zoo).\n",
|
||||||
|
"\n",
|
||||||
|
"Notice that our Model Zoo files have a .pb extension. This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -475,8 +533,6 @@
|
|||||||
"import json\n",
|
"import json\n",
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from score import emotion_map, softmax, postprocess\n",
|
|
||||||
"\n",
|
|
||||||
"test_inputs = []\n",
|
"test_inputs = []\n",
|
||||||
"test_outputs = []\n",
|
"test_outputs = []\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -512,7 +568,7 @@
|
|||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"### Show some sample images\n",
|
"### Show some sample images\n",
|
||||||
"We use `matplotlib` to plot 3 test images from the model zoo with their labels over them."
|
"We use `matplotlib` to plot 3 test images from the dataset."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -532,7 +588,7 @@
|
|||||||
" plt.axhline('')\n",
|
" plt.axhline('')\n",
|
||||||
" plt.axvline('')\n",
|
" plt.axvline('')\n",
|
||||||
" plt.text(x = 10, y = -10, s = test_outputs[test_image], fontsize = 18)\n",
|
" plt.text(x = 10, y = -10, s = test_outputs[test_image], fontsize = 18)\n",
|
||||||
" plt.imshow(test_inputs[test_image].reshape(64, 64), cmap = plt.cm.Greys)\n",
|
" plt.imshow(test_inputs[test_image].reshape(64, 64), cmap = plt.cm.gray)\n",
|
||||||
"plt.show()"
|
"plt.show()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -571,7 +627,7 @@
|
|||||||
" print(r['error'])\n",
|
" print(r['error'])\n",
|
||||||
" break\n",
|
" break\n",
|
||||||
" \n",
|
" \n",
|
||||||
" result = r['result'][0][0]\n",
|
" result = r['result'][0]\n",
|
||||||
" time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n",
|
" time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n",
|
||||||
" \n",
|
" \n",
|
||||||
" ground_truth = test_outputs[i]\n",
|
" ground_truth = test_outputs[i]\n",
|
||||||
@@ -583,7 +639,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
" # use different color for misclassified sample\n",
|
" # use different color for misclassified sample\n",
|
||||||
" font_color = 'red' if ground_truth != result else 'black'\n",
|
" font_color = 'red' if ground_truth != result else 'black'\n",
|
||||||
" clr_map = plt.cm.gray if ground_truth != result else plt.cm.Greys\n",
|
" clr_map = plt.cm.Greys if ground_truth != result else plt.cm.gray\n",
|
||||||
"\n",
|
"\n",
|
||||||
" # ground truth labels are in blue\n",
|
" # ground truth labels are in blue\n",
|
||||||
" plt.text(x = 10, y = -70, s = ground_truth, fontsize = 18, color = 'blue')\n",
|
" plt.text(x = 10, y = -70, s = ground_truth, fontsize = 18, color = 'blue')\n",
|
||||||
@@ -611,15 +667,30 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from PIL import Image\n",
|
"# Preprocessing functions take your image and format it so it can be passed\n",
|
||||||
|
"# as input into our ONNX model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def preprocess(image_path):\n",
|
"import cv2\n",
|
||||||
" input_shape = (1, 1, 64, 64)\n",
|
"\n",
|
||||||
" img = Image.open(image_path)\n",
|
"def rgb2gray(rgb):\n",
|
||||||
" img = img.resize((64, 64), Image.ANTIALIAS)\n",
|
" \"\"\"Convert the input image into grayscale\"\"\"\n",
|
||||||
" img_data = np.array(img)\n",
|
" return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n",
|
||||||
" img_data = np.resize(img_data, input_shape)\n",
|
"\n",
|
||||||
" return img_data"
|
"def resize_img(img):\n",
|
||||||
|
" \"\"\"Resize image to the FER+ model input dimensions\"\"\"\n",
|
||||||
|
" img = cv2.resize(img, dsize=(64, 64), interpolation=cv2.INTER_AREA)\n",
|
||||||
|
" img.resize((1, 1, 64, 64))\n",
|
||||||
|
" return img\n",
|
||||||
|
"\n",
|
||||||
|
"def preprocess(img):\n",
|
||||||
|
" \"\"\"Resize input images and convert them to grayscale.\"\"\"\n",
|
||||||
|
" if img.shape == (64, 64):\n",
|
||||||
|
" img.resize((1, 1, 64, 64))\n",
|
||||||
|
" return img\n",
|
||||||
|
" \n",
|
||||||
|
" grayscale = rgb2gray(img)\n",
|
||||||
|
" processed_img = resize_img(grayscale)\n",
|
||||||
|
" return processed_img"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
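The grayscale conversion above uses the standard BT.601 luma weights, which sum to 1.0; a quick check on a synthetic image (shape chosen to match the 64 x 64 FER+ input) makes that visible:

```python
import numpy as np

def rgb2gray(rgb):
    # ITU-R BT.601 luma weights, as in the preprocessing cell above
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])

# A pure-white 64 x 64 RGB image should map to an all-ones grayscale
# image, because the three weights sum to exactly 1.0.
img = np.ones((64, 64, 3))
gray = rgb2gray(img)
```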
{
|
{
|
||||||
@@ -636,12 +707,15 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# e.g. your_test_image = \"C://Users//vinitra.swamy//Pictures//emotion_test_images//img_1.png\"\n",
|
"# e.g. your_test_image = \"C://Users//vinitra.swamy//Pictures//emotion_test_images//img_1.png\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"your_test_image = \"<path to file>\"\n",
|
"import matplotlib.image as mpimg\n",
|
||||||
"\n",
|
"\n",
|
||||||
"if your_test_image != \"<path to file>\":\n",
|
"if your_test_image != \"<path to file>\":\n",
|
||||||
" img = preprocess(your_test_image)\n",
|
" img = mpimg.imread(your_test_image)\n",
|
||||||
" plt.subplot(1,3,1)\n",
|
" plt.subplot(1,3,1)\n",
|
||||||
" plt.imshow(img.reshape((64,64)), cmap = plt.cm.gray)\n",
|
" plt.imshow(img, cmap = plt.cm.Greys)\n",
|
||||||
|
" print(\"Old Dimensions: \", img.shape)\n",
|
||||||
|
" img = preprocess(img)\n",
|
||||||
|
" print(\"New Dimensions: \", img.shape)\n",
|
||||||
"else:\n",
|
"else:\n",
|
||||||
" img = None"
|
" img = None"
|
||||||
]
|
]
|
||||||
@@ -659,7 +733,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
" try:\n",
|
" try:\n",
|
||||||
" r = json.loads(aci_service.run(input_data))\n",
|
" r = json.loads(aci_service.run(input_data))\n",
|
||||||
" result = r['result'][0][0]\n",
|
" result = r['result'][0]\n",
|
||||||
" time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n",
|
" time_ms = np.round(r['time_in_sec'][0] * 1000, 2)\n",
|
||||||
" except Exception as e:\n",
|
" except Exception as e:\n",
|
||||||
" print(str(e))\n",
|
" print(str(e))\n",
|
||||||
@@ -668,12 +742,13 @@
|
|||||||
" plt.subplot(1,8,1)\n",
|
" plt.subplot(1,8,1)\n",
|
||||||
" plt.axhline('')\n",
|
" plt.axhline('')\n",
|
||||||
" plt.axvline('')\n",
|
" plt.axvline('')\n",
|
||||||
" plt.text(x = -10, y = -35, s = \"Model prediction: \", fontsize = 14)\n",
|
" plt.text(x = -10, y = -40, s = \"Model prediction: \", fontsize = 14)\n",
|
||||||
" plt.text(x = -10, y = -20, s = \"Inference time: \", fontsize = 14)\n",
|
" plt.text(x = -10, y = -25, s = \"Inference time: \", fontsize = 14)\n",
|
||||||
" plt.text(x = 100, y = -35, s = str(result), fontsize = 14)\n",
|
" plt.text(x = 100, y = -40, s = str(result), fontsize = 14)\n",
|
||||||
" plt.text(x = 100, y = -20, s = str(time_ms) + \" ms\", fontsize = 14)\n",
|
" plt.text(x = 100, y = -25, s = str(time_ms) + \" ms\", fontsize = 14)\n",
|
||||||
" plt.text(x = -10, y = -8, s = \"Input image: \", fontsize = 14)\n",
|
" plt.text(x = -10, y = -10, s = \"Model Input image: \", fontsize = 14)\n",
|
||||||
" plt.imshow(img.reshape(64, 64), cmap = plt.cm.gray) "
|
" plt.imshow(img.reshape((64, 64)), cmap = plt.cm.gray) \n",
|
||||||
|
" "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -684,7 +759,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"# remember to delete your service after you are done using it!\n",
|
"# remember to delete your service after you are done using it!\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# aci_service.delete()"
|
"aci_service.delete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -708,10 +783,15 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "viswamy"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
"name": "python3"
|
"name": "python36"
|
||||||
},
|
},
|
||||||
"language_info": {
|
"language_info": {
|
||||||
"codemirror_mode": {
|
"codemirror_mode": {
|
||||||
@@ -723,7 +803,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.5"
|
"version": "3.6.6"
|
||||||
},
|
},
|
||||||
"msauthor": "vinitra.swamy"
|
"msauthor": "vinitra.swamy"
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -22,9 +22,9 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"#### Tutorial Objectives:\n",
|
"#### Tutorial Objectives:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. Describe the MNIST dataset and pretrained Convolutional Neural Net ONNX model, stored in the ONNX model zoo.\n",
|
"- Describe the MNIST dataset and pretrained Convolutional Neural Net ONNX model, stored in the ONNX model zoo.\n",
|
||||||
"2. Deploy and run the pretrained MNIST ONNX model on an Azure Machine Learning instance\n",
|
"- Deploy and run the pretrained MNIST ONNX model on an Azure Machine Learning instance\n",
|
||||||
"3. Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML"
|
"- Predict labels for test set data points in the cloud using ONNX Runtime and Azure ML"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -34,31 +34,61 @@
|
|||||||
"## Prerequisites\n",
|
"## Prerequisites\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### 1. Install Azure ML SDK and create a new workspace\n",
|
"### 1. Install Azure ML SDK and create a new workspace\n",
|
||||||
"Please follow [00.configuration.ipynb](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) notebook.\n",
|
"Please follow [Azure ML configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/00.configuration.ipynb) to set up your environment.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### 2. Install additional packages needed for this Notebook\n",
|
"### 2. Install additional packages needed for this tutorial notebook\n",
|
||||||
"You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where the Azure Machine Learning SDK is installed. \n",
|
"You need to install the popular plotting library `matplotlib`, the image manipulation library `opencv`, and the `onnx` library in the conda environment where the Azure Machine Learning SDK is installed. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"```sh\n",
|
"```sh\n",
|
||||||
"(myenv) $ pip install matplotlib onnx opencv-python\n",
|
"(myenv) $ pip install matplotlib onnx opencv-python\n",
|
||||||
"```\n",
|
"```\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"**Debugging tip**: Make sure that you run the \"jupyter notebook\" command to launch this notebook after activating your virtual environment. Choose the respective Python kernel for your new virtual environment using the `Kernel > Change Kernel` menu above. If you have completed the steps correctly, the upper right corner of your screen should state `Python [conda env:myenv]` instead of `Python [default]`.\n",
|
||||||
|
"\n",
|
||||||
"### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n",
|
"### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"[Download the ONNX MNIST model and corresponding test data](https://www.cntk.ai/OnnxModels/mnist/opset_7/mnist.tar.gz) and place them in the same folder as this tutorial notebook. You can unzip the file through the following line of code.\n",
|
"In the following lines of code, we download [the trained ONNX MNIST model and corresponding test data](https://github.com/onnx/models/tree/master/mnist) and place them in the same folder as this tutorial notebook. For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# urllib is a built-in Python library to download files from URLs\n",
|
||||||
"\n",
|
"\n",
|
||||||
"```sh\n",
|
"# Objective: retrieve the latest version of the ONNX MNIST model files from the\n",
|
||||||
"(myenv) $ tar xvzf mnist.tar.gz\n",
|
"# ONNX Model Zoo and save it in the same folder as this tutorial\n",
|
||||||
"```\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"More information can be found about the ONNX MNIST model on [github](https://github.com/onnx/models/tree/master/mnist). For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/)."
|
"import urllib.request\n",
|
||||||
|
"\n",
|
||||||
|
"onnx_model_url = \"https://www.cntk.ai/OnnxModels/mnist/opset_7/mnist.tar.gz\"\n",
|
||||||
|
"\n",
|
||||||
|
"urllib.request.urlretrieve(onnx_model_url, filename=\"mnist.tar.gz\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# the ! magic command tells our Jupyter notebook to run the following line of \n",
|
||||||
|
"# code in the shell instead of the notebook's Python kernel\n",
|
||||||
|
"\n",
|
||||||
|
"# We use tar with the xvzf flags to unzip the files we just retrieved from the ONNX model zoo\n",
|
||||||
|
"\n",
|
||||||
|
"!tar xvzf mnist.tar.gz"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
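As a portable alternative to shelling out with `!tar xvzf`, the same extraction can be done with Python's standard library (a sketch; the `mnist.tar.gz` filename matches the download above):

```python
import os
import tarfile

def extract_tarball(path, dest="."):
    """Extract a .tar.gz archive (e.g. mnist.tar.gz) into dest and
    return the resulting directory listing."""
    with tarfile.open(path, "r:gz") as tar:
        tar.extractall(dest)
    return os.listdir(dest)
```

For example, `extract_tarball("mnist.tar.gz")` unpacks the model and its `test_data_set_*` folders into the notebook's directory.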
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Load Azure ML workspace\n",
|
"## Deploy a VM with your ONNX model in the Cloud\n",
|
||||||
|
"\n",
|
||||||
|
"### Load Azure ML workspace\n",
|
||||||
"\n",
|
"\n",
|
||||||
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
|
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
|
||||||
]
|
]
|
||||||
@@ -113,11 +143,11 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.model import Model\n",
|
"from azureml.core.model import Model\n",
|
||||||
"\n",
|
"\n",
|
||||||
"model = Model.register(model_path = model_dir + \"//model.onnx\",\n",
|
"model = Model.register(workspace = ws,\n",
|
||||||
|
" model_path = model_dir + \"/\" + \"model.onnx\",\n",
|
||||||
" model_name = \"mnist_1\",\n",
|
" model_name = \"mnist_1\",\n",
|
||||||
" tags = {\"onnx\": \"demo\"},\n",
|
" tags = {\"onnx\": \"demo\"},\n",
|
||||||
" description = \"MNIST image classification CNN from ONNX Model Zoo\",\n",
|
" description = \"MNIST image classification CNN from ONNX Model Zoo\",)"
|
||||||
" workspace = ws)"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -136,8 +166,8 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"models = ws.models()\n",
|
"models = ws.models()\n",
|
||||||
"for m in models:\n",
|
"for name, m in models.items():\n",
|
||||||
" print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
|
" print(\"Name:\", name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -188,16 +218,14 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Deploy our model on Azure ML"
|
"### Specify our Score and Environment Files"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file.\n",
|
"We are now going to deploy our ONNX Model on AML with inference in ONNX Runtime. We begin by writing a score.py file, which will help us run the model in our Azure ML virtual machine (VM), and then specify our environment by writing a yml file. You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating our model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n",
|
||||||
"\n",
|
|
||||||
"You will also notice that we import the onnxruntime library to do runtime inference on our ONNX models (passing in input and evaluating out model's predicted output). More information on the API and commands can be found in the [ONNX Runtime documentation](https://aka.ms/onnxruntime).\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"### Write Score File\n",
|
"### Write Score File\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -248,7 +276,7 @@
|
|||||||
" return json.dumps(result_dict)\n",
|
" return json.dumps(result_dict)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def choose_class(result_prob):\n",
|
"def choose_class(result_prob):\n",
|
||||||
" \"\"\"We use argmax to determine the right label to choose from our output, after calling softmax on the 10 numbers we receive\"\"\"\n",
|
" \"\"\"We use argmax to determine the right label to choose from our output\"\"\"\n",
|
||||||
" return int(np.argmax(result_prob, axis=0))"
|
" return int(np.argmax(result_prob, axis=0))"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
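`choose_class` is just an argmax over the ten class probabilities; on a toy probability vector (illustrative values, not real model output) it behaves like this:

```python
import numpy as np

def choose_class(result_prob):
    # pick the index of the highest probability, as in score.py above
    return int(np.argmax(result_prob, axis=0))

# toy post-softmax probabilities for the ten MNIST digits: class 2 dominates
digit = choose_class([0.01, 0.02, 0.90, 0.01, 0.01,
                      0.01, 0.01, 0.01, 0.01, 0.01])
```

Here `digit` is `2`, the index of the largest entry.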
@@ -256,14 +284,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Write Environment File"
|
"### Write Environment File\n",
|
||||||
]
|
"\n",
|
||||||
},
|
"This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine."
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"This step creates a YAML file that specifies which dependencies we would like to see in our Linux Virtual Machine."
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -289,10 +312,19 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Create the Container Image\n",
|
"### Create the Container Image\n",
|
||||||
"\n",
|
|
||||||
"This step will likely take a few minutes."
|
"This step will likely take a few minutes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import ContainerImage\n",
|
||||||
|
"help(ContainerImage.image_configuration)"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -304,8 +336,8 @@
|
|||||||
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
|
||||||
" runtime = \"python\",\n",
|
" runtime = \"python\",\n",
|
||||||
" conda_file = \"myenv.yml\",\n",
|
" conda_file = \"myenv.yml\",\n",
|
||||||
" description = \"test\",\n",
|
" description = \"MNIST ONNX Runtime container\",\n",
|
||||||
" tags = {\"demo\": \"onnx\"}) )\n",
|
" tags = {\"demo\": \"onnx\"}) \n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"image = ContainerImage.create(name = \"onnxtest\",\n",
|
"image = ContainerImage.create(name = \"onnxtest\",\n",
|
||||||
@@ -321,8 +353,6 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Debugging\n",
|
|
||||||
"\n",
|
|
||||||
"In case you need to debug your code, the next line of code accesses the log file."
|
"In case you need to debug your code, the next line of code accesses the log file."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -339,9 +369,9 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We're all set! Let's get our model chugging.\n",
|
"We're all done specifying what we want our virtual machine to do. Let's configure and deploy our container image.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## Deploy the container image"
|
"### Deploy the container image"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -373,7 +403,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service_name = 'onnx-demo-mnist'\n",
|
"aci_service_name = 'onnx-demo-mnist20'\n",
|
||||||
"print(\"Service\", aci_service_name)\n",
|
"print(\"Service\", aci_service_name)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
@@ -414,16 +444,13 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Testing and Evaluation"
|
"## Testing and Evaluation\n",
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Load Test Data\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"These are already in your directory from your ONNX model download (from the model zoo). If you didn't place your model and test data in the same directory as this notebook, edit the \"model_dir\" filename below."
|
"### Load Test Data\n",
|
||||||
|
"\n",
|
||||||
|
"These are already in your directory from your ONNX model download (from the model zoo).\n",
|
||||||
|
"\n",
|
||||||
|
"Notice that our Model Zoo files have a .pb extension. This is because they are [protobuf files (Protocol Buffers)](https://developers.google.com/protocol-buffers/docs/pythontutorial), so we need to read in our data through our ONNX TensorProto reader into a format we can work with, like numerical arrays."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -579,7 +606,9 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Preprocessing functions\n",
|
"# Preprocessing functions take your image and format it so it can be passed\n",
|
||||||
|
"# as input into our ONNX model\n",
|
||||||
|
"\n",
|
||||||
"import cv2\n",
|
"import cv2\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def rgb2gray(rgb):\n",
|
"def rgb2gray(rgb):\n",
|
||||||
@@ -587,12 +616,17 @@
|
|||||||
" return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n",
|
" return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def resize_img(img):\n",
|
"def resize_img(img):\n",
|
||||||
|
" \"\"\"Resize image to MNIST model input dimensions\"\"\"\n",
|
||||||
" img = cv2.resize(img, dsize=(28, 28), interpolation=cv2.INTER_AREA)\n",
|
" img = cv2.resize(img, dsize=(28, 28), interpolation=cv2.INTER_AREA)\n",
|
||||||
" img.resize((1, 1, 28, 28))\n",
|
" img.resize((1, 1, 28, 28))\n",
|
||||||
" return img\n",
|
" return img\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def preprocess(img):\n",
|
"def preprocess(img):\n",
|
||||||
" \"\"\"Resize input images and convert them to grayscale.\"\"\"\n",
|
" \"\"\"Resize input images and convert them to grayscale.\"\"\"\n",
|
||||||
|
" if img.shape == (28, 28):\n",
|
||||||
|
" img.resize((1, 1, 28, 28))\n",
|
||||||
|
" return img\n",
|
||||||
|
" \n",
|
||||||
" grayscale = rgb2gray(img)\n",
|
" grayscale = rgb2gray(img)\n",
|
||||||
" processed_img = resize_img(grayscale)\n",
|
" processed_img = resize_img(grayscale)\n",
|
||||||
" return processed_img"
|
" return processed_img"
|
||||||
@@ -608,11 +642,8 @@
|
|||||||
"# Make sure your image is square, i.e. the width and height are equal (e.g. 100 x 100 pixels or 28 x 28 pixels)\n",
|
"# Make sure your image is square, i.e. the width and height are equal (e.g. 100 x 100 pixels or 28 x 28 pixels)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Any PNG or JPG image file should work\n",
|
"# Any PNG or JPG image file should work\n",
|
||||||
"# Make sure to include the entire path with // instead of /\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"# e.g. your_test_image = \"C://Users//vinitra.swamy//Pictures//digit.png\"\n",
|
"# e.g. your_test_image = \"C:/Users/vinitra.swamy/Pictures/handwritten_digit.png\"\n",
|
||||||
"\n",
|
|
||||||
"your_test_image = \"<path to file>\"\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"import matplotlib.image as mpimg\n",
|
"import matplotlib.image as mpimg\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -721,7 +752,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"# remember to delete your service after you are done using it!\n",
|
"# remember to delete your service after you are done using it!\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# aci_service.delete()"
|
"aci_service.delete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -738,16 +769,21 @@
|
|||||||
"- ensured that your deep learning model is working perfectly (in the cloud) on test data, and checked it against some of your own!\n",
|
"- ensured that your deep learning model is working perfectly (in the cloud) on test data, and checked it against some of your own!\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Next steps:\n",
|
"Next steps:\n",
|
||||||
"- Check out another interesting application based on a Microsoft Research computer vision paper that lets you set up a [facial emotion recognition model](https://github.com/Azure/MachineLearningNotebooks/tree/master/onnx/onnx-inference-emotion-recognition.ipynb) in the cloud! This tutorial deploys a pre-trained ONNX Computer Vision model in an Azure ML virtual machine with GPU support.\n",
|
"- Check out another interesting application based on a Microsoft Research computer vision paper that lets you set up a [facial emotion recognition model](https://github.com/Azure/MachineLearningNotebooks/tree/master/onnx/onnx-inference-emotion-recognition.ipynb) in the cloud! This tutorial deploys a pre-trained ONNX Computer Vision model in an Azure ML virtual machine.\n",
|
||||||
"- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)"
|
"- Contribute to our [open source ONNX repository on github](http://github.com/onnx/onnx) and/or add to our [ONNX model zoo](http://github.com/onnx/models)"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "viswamy"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
"name": "python3"
|
"name": "python36"
|
||||||
},
|
},
|
||||||
"language_info": {
|
"language_info": {
|
||||||
"codemirror_mode": {
|
"codemirror_mode": {
|
||||||
@@ -759,7 +795,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.5"
|
"version": "3.6.6"
|
||||||
},
|
},
|
||||||
"msauthor": "vinitra.swamy"
|
"msauthor": "vinitra.swamy"
|
||||||
},
|
},
|
||||||
|
|||||||
409
onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb
Normal file
@@ -0,0 +1,409 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# ResNet50 Image Classification using ONNX and AzureML\n",
|
||||||
|
"\n",
|
||||||
|
"This example shows how to deploy the ResNet50 ONNX model as a web service using Azure Machine Learning services and the ONNX Runtime.\n",
|
||||||
|
"\n",
|
||||||
|
"## What is ONNX\n",
|
||||||
|
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by allowing data scientists and developers to use the tools of their choice, without worrying about lock-in, and with the flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
|
||||||
|
"\n",
|
||||||
|
"## ResNet50 Details\n",
|
||||||
|
"ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. More information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/models/image_classification/resnet). "
|
||||||
|
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"To make the best use of your time, make sure you have done the following:\n",
"\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* Go through the [00.configuration.ipynb](../00.configuration.ipynb) notebook to:\n",
" * install the AML SDK\n",
" * create a workspace and its configuration file (config.json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download pre-trained ONNX model from ONNX Model Zoo.\n",
"\n",
"Download the [ResNet50v2 model and test data](https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz) and place it in the same folder as this tutorial notebook. You can extract the archive with the following command.\n",
"\n",
"```sh\n",
"(myenv) $ tar xvzf resnet50v2.tar.gz\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploying as a web service with Azure ML"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load your Azure ML workspace\n",
"\n",
"We begin by instantiating a workspace object from the existing workspace created earlier in the configuration notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register your model with Azure ML\n",
"\n",
"Now we upload the model and register it in the workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"model = Model.register(model_path = \"resnet50v2/resnet50v2.onnx\",\n",
"                       model_name = \"resnet50v2\",\n",
"                       tags = {\"onnx\": \"demo\"},\n",
"                       description = \"ResNet50v2 from ONNX Model Zoo\",\n",
"                       workspace = ws)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Displaying your registered models\n",
"\n",
"You can optionally list out all the models that you have registered in this workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"models = ws.models()\n",
"for m in models:\n",
"    print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Write scoring file\n",
"\n",
"We are now going to deploy our ONNX model on Azure ML using the ONNX Runtime. We begin by writing a score.py file that will be invoked by the web service call. The `init()` function is called once when the container is started, so we load the model into a global ONNX Runtime session object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import json\n",
"import time\n",
"import sys\n",
"import os\n",
"from azureml.core.model import Model\n",
"import numpy as np    # we're going to use numpy to process input and output data\n",
"import onnxruntime    # to run inference on ONNX models, we use the ONNX Runtime\n",
"\n",
"def softmax(x):\n",
"    x = x.reshape(-1)\n",
"    e_x = np.exp(x - np.max(x))\n",
"    return e_x / e_x.sum(axis=0)\n",
"\n",
"def init():\n",
"    global session\n",
"    model = Model.get_model_path(model_name = 'resnet50v2')\n",
"    session = onnxruntime.InferenceSession(model, None)\n",
"\n",
"def preprocess(input_data_json):\n",
"    # convert the JSON data into the tensor input\n",
"    img_data = np.array(json.loads(input_data_json)['data']).astype('float32')\n",
"    \n",
"    # normalize\n",
"    mean_vec = np.array([0.485, 0.456, 0.406])\n",
"    stddev_vec = np.array([0.229, 0.224, 0.225])\n",
"    norm_img_data = np.zeros(img_data.shape).astype('float32')\n",
"    for i in range(img_data.shape[0]):\n",
"        norm_img_data[i,:,:] = (img_data[i,:,:]/255 - mean_vec[i]) / stddev_vec[i]\n",
"\n",
"    return norm_img_data\n",
"\n",
"def postprocess(result):\n",
"    return softmax(np.array(result)).tolist()\n",
"\n",
"def run(input_data_json):\n",
"    try:\n",
"        start = time.time()\n",
"        # load in our data, which is expected to be an NCHW 224x224 image\n",
"        input_data = preprocess(input_data_json)\n",
"        input_name = session.get_inputs()[0].name  # get the id of the first input of the model\n",
"        result = session.run([], {input_name: input_data})\n",
"        end = time.time()     # stop timer\n",
"        return {\"result\": postprocess(result),\n",
"                \"time\": end - start}\n",
"    except Exception as e:\n",
"        result = str(e)\n",
"        return {\"error\": result}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create container image"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we create a YAML file that specifies the dependencies we need in our container."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\"])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
"    f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we have Azure ML create the container. This step will likely take a few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
"                                                  runtime = \"python\",\n",
"                                                  conda_file = \"myenv.yml\",\n",
"                                                  description = \"ONNX ResNet50 Demo\",\n",
"                                                  tags = {\"demo\": \"onnx\"}\n",
"                                                  )\n",
"\n",
"image = ContainerImage.create(name = \"onnxresnet50v2\",\n",
"                              models = [model],\n",
"                              image_config = image_config,\n",
"                              workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need to debug your code, the following line prints the URI of the image build log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1,\n",
"                                               memory_gb = 1,\n",
"                                               tags = {'demo': 'onnx'},\n",
"                                               description = 'web service for ResNet50 ONNX model')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will likely take a few minutes to run as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"from random import randint\n",
"\n",
"aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n",
"print(\"Service\", aci_service_name)\n",
"\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
"                                           image = image,\n",
"                                           name = aci_service_name,\n",
"                                           workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the deployment fails, you can check the logs. Make sure to delete the aci_service before trying again."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if aci_service.state != 'Healthy':\n",
"    # run this command for debugging.\n",
"    print(aci_service.get_logs())\n",
"    aci_service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Success!\n",
"\n",
"If you've made it this far, you've deployed a working web service that does image classification using an ONNX model. You can get the URL for the webservice with the code below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(aci_service.scoring_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you are eventually done using the web service, remember to delete it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "onnx"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
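The score.py written in the notebook above does two pieces of local numeric work on every request: the `preprocess` step turns the JSON payload into a normalized CHW float32 tensor, and `softmax` turns the raw model scores into a probability distribution. As a quick sanity check outside the container, the same logic can be exercised on a random 3x224x224 payload — a sketch that reuses the functions from score.py verbatim, with the random payload standing in for a real image:

```python
import json
import numpy as np

def softmax(x):
    # numerically stable softmax over the flattened score vector
    x = x.reshape(-1)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def preprocess(input_data_json):
    # same logic as score.py: JSON payload -> CHW float32 tensor,
    # normalized with the ImageNet per-channel mean and stddev
    img_data = np.array(json.loads(input_data_json)['data']).astype('float32')
    mean_vec = np.array([0.485, 0.456, 0.406])
    stddev_vec = np.array([0.229, 0.224, 0.225])
    norm_img_data = np.zeros(img_data.shape).astype('float32')
    for i in range(img_data.shape[0]):
        norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - mean_vec[i]) / stddev_vec[i]
    return norm_img_data

# simulate a request body: a random 3x224x224 image serialized as JSON
payload = json.dumps({'data': np.random.randint(0, 256, (3, 224, 224)).tolist()})
tensor = preprocess(payload)
print(tensor.shape)  # (3, 224, 224)

# postprocess: raw scores become a probability distribution
probs = softmax(np.random.randn(1000).astype('float32'))
print(abs(float(probs.sum()) - 1.0) < 1e-5)  # True
```

Against the deployed service, the same JSON body would simply be POSTed to `aci_service.scoring_uri`.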
651 onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb Normal file
File diff suppressed because one or more lines are too long
@@ -53,6 +53,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "hichando"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -16,6 +16,14 @@
 "This notebook demonstrates how to run batch scoring job. __[Inception-V3 model](https://arxiv.org/abs/1512.00567)__ and unlabeled images from __[ImageNet](http://image-net.org/)__ dataset will be used. It registers a pretrained inception model in model registry then uses the model to do batch scoring on images in a blob container."
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Prerequisites\n",
+"Make sure you go through the [00. Installation and Configuration](./00.configuration.ipynb) Notebook first if you haven't.\n"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -32,11 +40,10 @@
 " 'Resource group: ' + ws.resource_group, sep = '\\n')\n",
 "\n",
 "# Also create a Project and attach to Workspace\n",
-"project_folder = \"sample_projects\"\n",
-"run_history_name = project_folder\n",
+"scripts_folder = \"scripts\"\n",
 "\n",
-"if not os.path.isdir(project_folder):\n",
-"    os.mkdir(project_folder)"
+"if not os.path.isdir(scripts_folder):\n",
+"    os.mkdir(scripts_folder)"
 ]
 },
 {
@@ -68,7 +75,7 @@
 "outputs": [],
 "source": [
 "# Batch AI compute\n",
-"cluster_name = \"gpu_cluster\"\n",
+"cluster_name = \"gpu-cluster\"\n",
 "try:\n",
 "    cluster = BatchAiCompute(ws, cluster_name)\n",
 "    print(\"found existing cluster.\")\n",
@@ -104,7 +111,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"%%writefile $project_folder/batchai_score.py\n",
+"%%writefile $scripts_folder/batchai_score.py\n",
 "import os\n",
 "import argparse\n",
 "import datetime,time\n",
@@ -225,6 +232,15 @@
 "## Prepare Model and Input data"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Download Model\n",
+"\n",
+"Download and extract model from http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz to `\"models\"`"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -238,27 +254,29 @@
 ]
 },
 {
-"cell_type": "markdown",
+"cell_type": "code",
+"execution_count": null,
 "metadata": {},
+"outputs": [],
 "source": [
-"### Download Model\n",
-"<font color=red>This manual step is required to register the model to the workspace</font>\n",
+"import tarfile\n",
+"import urllib.request\n",
 "\n",
-"Download and extract model from http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz to model_dir"
+"url=\"http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz\"\n",
+"response = urllib.request.urlretrieve(url, \"model.tar.gz\")\n",
+"tar = tarfile.open(\"model.tar.gz\", \"r:gz\")\n",
+"tar.extractall(model_dir)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Get samples images and upload to Datastore\n",
-"<font color=red>This manual step is required to run batchai_score.py</font>\n",
+"### Create a datastore that points to blob container containing sample images\n",
 "\n",
-"Download and extract sample images from ImageNet evaluation set and **upload** to a blob that will be registered as a Datastore in the next step\n",
+"We have created a public blob container `sampledata` on an account named `pipelinedata` containing images from ImageNet evaluation set. In the next step, we create a datastore with name `images_datastore` that points to this container. The `overwrite=True` step overwrites any datastore that was created previously with that name. \n",
 "\n",
-"A copy of sample images from ImageNet evaluation set can be found at __[BatchAI Samples Blob](https://batchaisamples.blob.core.windows.net/samples/imagenet_samples.zip?st=2017-09-29T18%3A29%3A00Z&se=2099-12-31T08%3A00%3A00Z&sp=rl&sv=2016-05-31&sr=c&sig=PmhL%2BYnYAyNTZr1DM2JySvrI12e%2F4wZNIwCtf7TRI%2BM%3D)__ \n",
-"\n",
-"There are multiple ways to create folders and upload files into Azure Blob Container - you can use __[Azure Portal](https://ms.portal.azure.com/)__, __[Storage Explorer](http://storageexplorer.com/)__, __[Azure CLI2](https://render.githubusercontent.com/azure-cli-extension)__ or Azure SDK for your preferable programming language. "
+"This step can be changed to point to your blob container by providing an additional `account_key` parameter with `account_name`. "
 ]
 },
 {
@@ -267,8 +285,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"account_name = \"batchscoringdata\"\n",
-"sample_data = Datastore.register_azure_blob_container(ws, \"sampledata\", \"sampledata\", \n",
+"account_name = \"pipelinedata\"\n",
+"sample_data = Datastore.register_azure_blob_container(ws, datastore_name=\"images_datastore\", container_name=\"sampledata\", \n",
 "                                                      account_name=account_name, \n",
 "                                                      overwrite=True)"
 ]
@@ -293,7 +311,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"default_ds = \"workspaceblobstore\""
+"default_ds = ws.get_default_datastore()"
 ]
 },
 {
@@ -338,7 +356,7 @@
 "                          mode=\"download\" \n",
 "                          )\n",
 "output_dir = PipelineData(name=\"scores\", \n",
-"                          datastore_name=default_ds, \n",
+"                          datastore_name=default_ds.name, \n",
 "                          output_path_on_compute=\"batchscoring/results\")"
 ]
 },
@@ -435,7 +453,7 @@
 "    inputs=[input_images, label_dir],\n",
 "    outputs=[output_dir],\n",
 "    runconfig=batchai_run_config,\n",
-"    source_directory=project_folder\n",
+"    source_directory=scripts_folder\n",
 ")"
 ]
 },
@@ -599,6 +617,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "hichando"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -614,7 +637,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.5"
+"version": "3.6.6"
 }
 },
 "nbformat": 4,
39 pr.md Normal file
@@ -0,0 +1,39 @@
+# Azure Machine Learning Resources & Links
+## Product Documentation
+- [Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/)
+- [Azure Machine Learning Studio](https://docs.microsoft.com/en-us/azure/machine-learning/studio/)
+
+## Product Team Blogs
+- [What’s new in Azure Machine Learning service](https://aka.ms/aml-blog-whats-new)
+- [Announcing automated ML capability in Azure Machine Learning](https://aka.ms/aml-blog-automl)
+- [Experimentation using Azure Machine Learning](https://aka.ms/aml-blog-experimentation)
+- [Azure AI – Making AI real for business](https://aka.ms/aml-blog-overview)
+
+## Community Blogs
+- [Power Bat – How Spektacom is Powering the Game of Cricket with Microsoft AI](https://blogs.technet.microsoft.com/machinelearning/2018/10/11/power-bat-how-spektacom-is-powering-the-game-of-cricket-with-microsoft-ai/)
+
+## Third Party Articles
+- [Azure’s new machine learning features embrace Python](https://www.infoworld.com/article/3306840/azure/azures-new-machine-learning-features-embrace-python.html) (InfoWorld)
+- [How to use Azure ML in Windows 10](https://www.infoworld.com/article/3308381/azure/how-to-use-azure-ml-in-windows-10.html) (InfoWorld)
+- [How Azure ML Streamlines Cloud-based Machine Learning](https://thenewstack.io/how-the-azure-ml-streamlines-cloud-based-machine-learning/) (The New Stack)
+- [Facebook launches PyTorch 1.0 with integrations for Google Cloud, AWS, and Azure Machine Learning](https://venturebeat.com/2018/10/02/facebook-launches-pytorch-1-0-integrations-for-google-cloud-aws-and-azure-machine-learning/) (VentureBeat)
+- [How Microsoft Uses Machine Learning to Help You Build Machine Learning Pipelines](https://towardsdatascience.com/how-microsoft-uses-machine-learning-to-help-you-build-machine-learning-pipelines-be75f710613b) (Towards Data Science)
+- [Microsoft's Machine Learning Tools for Developers Get Smarter](https://techcrunch.com/2018/09/24/microsofts-machine-learning-tools-for-developers-get-smarter/) (TechCrunch)
+- [Microsoft introduces Azure service to automatically build AI models](https://venturebeat.com/2018/09/24/microsoft-introduces-azure-service-to-automatically-build-ai-models/) (VentureBeat)
+
+## Community Projects
+- [Fashion MNIST](https://github.com/amynic/azureml-sdk-fashion)
+- Keras on Databricks
+- Samples from CSS
+
+
+## Azure Machine Learning Studio Resources
+- [A-Z Machine Learning using Azure Machine Learning (AzureML)](https://www.udemy.com/machine-learning-using-azureml/)
+- [Machine Learning In The Cloud With Azure Machine Learning](https://www.udemy.com/machine-learning-in-the-cloud-with-azure-machine-learning/)
+- [How to Become A Data Scientist Using Azure Machine Learning](https://www.udemy.com/azure-machine-learning-introduction/)
+- [Learn Azure Machine Learning from scratch](https://www.udemy.com/learn-azure-machine-learning-from-scratch/)
+- [Azure Machine Learning Studio PowerShell Module](https://aka.ms/amlps)
+
+## Forum Help
+- [Azure Machine Learning service](https://social.msdn.microsoft.com/Forums/en-US/home?forum=AzureMachineLearningService)
+- [Azure Machine Learning Studio](https://social.msdn.microsoft.com/forums/azure/en-US/home?forum=MachineLearning)
@@ -594,6 +594,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "coverste"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -103,7 +103,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"classifier_input, classifier_output = Resnet50.get_default_classifier(feature_tensor, model_path)"
+"classifier_output = model.get_default_classifier(feature_tensor)"
 ]
 },
 {
@@ -131,7 +131,7 @@
 "with tf.Session() as sess:\n",
 "    model_def.pipeline.append(TensorflowStage(sess, in_images, image_tensors))\n",
 "    model_def.pipeline.append(BrainWaveStage(sess, model))\n",
-"    model_def.pipeline.append(TensorflowStage(sess, classifier_input, classifier_output))\n",
+"    model_def.pipeline.append(TensorflowStage(sess, feature_tensor, classifier_output))\n",
 "    model_def.save(model_def_path)\n",
 "    print(model_def_path)"
 ]
@@ -286,6 +286,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "coverste"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -511,7 +511,7 @@
 "\n",
 "New BSD License\n",
 "\n",
-"Copyright (c) 2007–2018 The scikit-learn developers.\n",
+"Copyright (c) 2007\u00e2\u20ac\u201c2018 The scikit-learn developers.\n",
 "All rights reserved.\n",
 "\n",
 "\n",
@@ -544,6 +544,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "coverste"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -43,6 +43,28 @@
 "print(\"SDK version:\", azureml.core.VERSION)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Diagnostics\n",
+"Opt-in diagnostics for better experience, quality, and security of future releases."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"tags": [
+"Diagnostics"
+]
+},
+"outputs": [],
+"source": [
+"from azureml.telemetry import set_diagnostics_collection\n",
+"set_diagnostics_collection(send_diagnostics = True)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -244,7 +266,7 @@
 "In `pytorch_train.py`, we will log some metrics to our AML run. To do so, we will access the AML run object within the script:\n",
 "```Python\n",
 "from azureml.core.run import Run\n",
-"run = Run.get_submitted_run()\n",
+"run = Run.get_context()\n",
 "```\n",
 "Further within `pytorch_train.py`, we log the learning rate and momentum parameters, and the best validation accuracy the model achieves:\n",
 "```Python\n",
@@ -735,6 +757,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "minxia"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -17,7 +17,7 @@ import argparse
 
 from azureml.core.run import Run
 # get the Azure ML run object
-run = Run.get_submitted_run()
+run = Run.get_context()
 
 
 def load_data(data_dir):
@@ -162,8 +162,8 @@ def main():
     parser.add_argument('--data_dir', type=str, help='directory of training data')
     parser.add_argument('--num_epochs', type=int, default=25, help='number of epochs to train')
     parser.add_argument('--output_dir', type=str, help='output directory')
-    parser.add_argument('--learning_rate', type=float, help='learning rate')
-    parser.add_argument('--momentum', type=float, help='momentum')
+    parser.add_argument('--learning_rate', type=float, default=0.001, help='learning rate')
+    parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
     args = parser.parse_args()
 
     print("data directory is: " + args.data_dir)
@@ -18,7 +18,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
@@ -42,6 +41,28 @@
 "print(\"SDK version:\", azureml.core.VERSION)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Diagnostics\n",
+"Opt-in diagnostics for better experience, quality, and security of future releases."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"tags": [
+"Diagnostics"
+]
+},
+"outputs": [],
+"source": [
+"from azureml.telemetry import set_diagnostics_collection\n",
+"set_diagnostics_collection(send_diagnostics = True)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -265,6 +286,11 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "minxia"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -17,7 +17,7 @@
 }
 },
 "source": [
-"# 03. Training MNIST dataset with hyperparameter tuning & deploy to ACI\n",
+"# 03. Training, hyperparameter tune, and deploy with TensorFlow\n",
 "\n",
 "## Introduction\n",
 "This tutorial shows how to train a simple deep neural network using the MNIST dataset and TensorFlow on Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of `28x28` pixels, representing number from 0 to 9. The goal is to create a multi-class classifier to identify the digit each image represents, and deploy it as a web service in Azure.\n",
@@ -72,6 +72,28 @@
 "print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Diagnostics\n",
+"Opt-in diagnostics for better experience, quality, and security of future releases."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"tags": [
+"Diagnostics"
+]
+},
+"outputs": [],
+"source": [
+"from azureml.telemetry import set_diagnostics_collection\n",
+"set_diagnostics_collection(send_diagnostics=True)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -246,17 +268,17 @@
 "from azureml.core.compute_target import ComputeTargetException\n",
 "\n",
 "# choose a name for your cluster\n",
-"batchai_cluster_name = \"gpucluster\"\n",
+"cluster_name = \"gpucluster\"\n",
 "\n",
 "try:\n",
 " # look for the existing cluster by name\n",
-" compute_target = ComputeTarget(workspace=ws, name=batchai_cluster_name)\n",
+" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
 " if type(compute_target) is BatchAiCompute:\n",
-" print('found compute target {}, just use it.'.format(batchai_cluster_name))\n",
+" print('Found existing compute target {}.'.format(cluster_name))\n",
 " else:\n",
-" print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))\n",
+" print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(cluster_name))\n",
 "except ComputeTargetException:\n",
-" print('creating a new compute target...')\n",
+" print('Creating a new compute target...')\n",
 " compute_config = BatchAiCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\", # GPU-based VM\n",
 " #vm_priority='lowpriority', # optional\n",
 " autoscale_enabled=True,\n",
@@ -264,7 +286,7 @@
 " cluster_max_nodes=4)\n",
 "\n",
 " # create the cluster\n",
-" compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)\n",
+" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
 " \n",
 " # can poll for a minimum number of nodes and for a specific timeout. \n",
 " # if no min node count is provided it uses the scale settings for the cluster\n",
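The cell changed above follows a get-or-create pattern: reuse the cluster if the name resolves, otherwise provision a new one when the lookup raises. A generic sketch of that control flow, using a plain dict in place of the workspace and `KeyError` in place of `ComputeTargetException` (the registry, names, and fields here are illustrative, not the Azure ML API):

```python
# Pretend registry of existing compute targets, keyed by name.
existing = {"gpucluster": {"type": "BatchAI", "nodes": 4}}

def get_or_create(registry, name, config):
    try:
        # look for the existing target by name
        target = registry[name]
        print("Found existing compute target {}.".format(name))
    except KeyError:
        # name not found: "provision" a new one
        print("Creating a new compute target...")
        registry[name] = config
        target = config
    return target

ct = get_or_create(existing, "gpucluster", {"type": "BatchAI", "nodes": 2})
print(ct["nodes"])  # -> 4: the existing cluster is reused, not replaced
```

The key property, as in the notebook, is that a second run of the cell is a no-op: the existing target wins over the candidate configuration.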
@@ -278,7 +300,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now that you have created the compute target, let's see what the workspace's `compute_targets()` function returns. You should now see one entry named 'cpucluster' of type BatchAI."
+"Now that you have created the compute target, let's see what the workspace's `compute_targets()` function returns. You should now see one entry named 'gpucluster' of type BatchAI."
 ]
 },
 {
@@ -287,8 +309,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"for ct in ws.compute_targets():\n",
-" print(ct.name, ct.type, ct.provisioning_state)"
+"compute_targets = ws.compute_targets()\n",
+"for name, ct in compute_targets.items():\n",
+" print(name, ct.type, ct.provisioning_state)"
 ]
 },
 {
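The rewritten cell reflects an SDK change: `compute_targets()` now returns a dict keyed by target name, so the name comes from the key rather than a `.name` attribute. A hedged sketch of the two access patterns with made-up data (no real workspace query):

```python
from types import SimpleNamespace

# Stand-in for the dict the updated SDK call returns: name -> target.
# The cluster below is illustrative data, not a real compute target.
compute_targets = {
    "gpucluster": SimpleNamespace(type="BatchAI", provisioning_state="Succeeded"),
}

# Dict-style iteration: the key carries the target's name.
for name, ct in compute_targets.items():
    print(name, ct.type, ct.provisioning_state)
# -> gpucluster BatchAI Succeeded
```

The old list-style loop (`for ct in ...: print(ct.name, ...)`) would iterate over the dict's keys (strings) and fail on `.name`, which is why the cell had to change shape.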
@@ -338,7 +361,7 @@
 " parser = argparse.ArgumentParser()\n",
 " parser.add_argument('--data_folder')\n",
 "```\n",
-"2. The script is accessing the Azure ML `Run` object by executing `run = Run.get_submitted_run()`. Further down the script is using the `run` to report the training accuracy and the validation accuracy as training progresses.\n",
+"2. The script is accessing the Azure ML `Run` object by executing `run = Run.get_context()`. Further down the script is using the `run` to report the training accuracy and the validation accuracy as training progresses.\n",
 "```\n",
 " run.log('training_acc', np.float(acc_train))\n",
 " run.log('validation_acc', np.float(acc_val))\n",
@@ -1056,14 +1079,17 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"for model in ws.models():\n",
-" print(\"Model:\", model.name, model.id)\n",
+"models = ws.models()\n",
+"for name, model in models.items():\n",
+" print(\"Model: {}, ID: {}\".format(name, model.id))\n",
 " \n",
-"for image in ws.images():\n",
-" print(\"Image:\", image.name, image.image_location)\n",
+"images = ws.images()\n",
+"for name, image in images.items():\n",
+" print(\"Image: {}, location: {}\".format(name, image.image_location))\n",
 " \n",
-"for webservice in ws.webservices():\n",
-" print(\"Webservice:\", webservice.name, webservice.scoring_uri)"
+"webservices = ws.webservices()\n",
+"for name, webservice in webservices.items():\n",
+" print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))"
 ]
 },
 {
@@ -1102,6 +1128,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "minxia"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -64,7 +64,7 @@ init = tf.global_variables_initializer()
 saver = tf.train.Saver()
 
 # start an Azure ML run
-run = Run.get_submitted_run()
+run = Run.get_context()
 
 with tf.Session() as sess:
     init.run()
@@ -18,7 +18,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
@@ -42,6 +41,28 @@
 "print(\"SDK version:\", azureml.core.VERSION)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Diagnostics\n",
+"Opt-in diagnostics for better experience, quality, and security of future releases."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"tags": [
+"Diagnostics"
+]
+},
+"outputs": [],
+"source": [
+"from azureml.telemetry import set_diagnostics_collection\n",
+"set_diagnostics_collection(send_diagnostics = True)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -336,6 +357,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "roastala"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -222,7 +222,7 @@ with tf.Session(graph=graph, config=config) as session:
     init.run()
     bcast.run()
     print('Initialized')
-    run = Run.get_submitted_run()
+    run = Run.get_context()
     average_loss = 0
     for step in xrange(num_steps):
         # simulate various sentence length by randomization
@@ -41,6 +41,28 @@
 "print(\"SDK version:\", azureml.core.VERSION)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Diagnostics\n",
+"Opt-in diagnostics for better experience, quality, and security of future releases."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"tags": [
+"Diagnostics"
+]
+},
+"outputs": [],
+"source": [
+"from azureml.telemetry import set_diagnostics_collection\n",
+"set_diagnostics_collection(send_diagnostics = True)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -262,6 +284,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "minxia"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -263,7 +263,7 @@ def main(unused_argv):
 print("After %d training step(s), validation cross entropy = %g" %
       (FLAGS.train_steps, val_xent))
 if job_name == "worker" and task_index == 0:
-    run = Run.get_submitted_run()
+    run = Run.get_context()
     run.log("CrossEntropy", val_xent)
 
 
@@ -40,6 +40,28 @@
 "print(\"SDK version:\", azureml.core.VERSION)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Diagnostics\n",
+"Opt-in diagnostics for better experience, quality, and security of future releases."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"tags": [
+"Diagnostics"
+]
+},
+"outputs": [],
+"source": [
+"from azureml.telemetry import set_diagnostics_collection\n",
+"set_diagnostics_collection(send_diagnostics = True)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -341,6 +363,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "minxia"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -41,6 +41,28 @@
 "print(\"SDK version:\", azureml.core.VERSION)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Diagnostics\n",
+"Opt-in diagnostics for better experience, quality, and security of future releases."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"tags": [
+"Diagnostics"
+]
+},
+"outputs": [],
+"source": [
+"from azureml.telemetry import set_diagnostics_collection\n",
+"set_diagnostics_collection(send_diagnostics = True)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -481,6 +503,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "roastala"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
@@ -220,6 +220,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "roastala"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
52	training/readme.md	Normal file
@@ -0,0 +1,52 @@
+# Training ML models with Azure ML SDK
+These notebook tutorials cover the various scenarios for training machine learning and deep learning models with Azure Machine Learning.
+
+## Sample notebooks
+- [01.train-hyperparameter-tune-deploy-with-pytorch](./01.train-hyperparameter-tune-deploy-with-pytorch/01.train-hyperparameter-tune-deploy-with-pytorch.ipynb)
+  Train, hyperparameter tune, and deploy a PyTorch image classification model that distinguishes bees vs. ants using transfer learning. Azure ML concepts covered:
+  - Create a remote compute target (Batch AI cluster)
+  - Upload training data using `Datastore`
+  - Run a single-node `PyTorch` training job
+  - Hyperparameter tune model with HyperDrive
+  - Find and register the best model
+  - Deploy model to ACI
+- [02.distributed-pytorch-with-horovod](./02.distributed-pytorch-with-horovod/02.distributed-pytorch-with-horovod.ipynb)
+  Train a PyTorch model on the MNIST dataset using distributed training with Horovod. Azure ML concepts covered:
+  - Create a remote compute target (Batch AI cluster)
+  - Run a two-node distributed `PyTorch` training job using Horovod
+- [03.train-hyperparameter-tune-deploy-with-tensorflow](./03.train-hyperparameter-tune-deploy-with-tensorflow/03.train-hyperparameter-tune-deploy-with-tensorflow.ipynb)
+  Train, hyperparameter tune, and deploy a TensorFlow model on the MNIST dataset. Azure ML concepts covered:
+  - Create a remote compute target (Batch AI cluster)
+  - Upload training data using `Datastore`
+  - Run a single-node `TensorFlow` training job
+  - Leverage features of the `Run` object
+  - Download the trained model
+  - Hyperparameter tune model with HyperDrive
+  - Find and register the best model
+  - Deploy model to ACI
+- [04.distributed-tensorflow-with-horovod](./04.distributed-tensorflow-with-horovod/04.distributed-tensorflow-with-horovod.ipynb)
+  Train a TensorFlow word2vec model using distributed training with Horovod. Azure ML concepts covered:
+  - Create a remote compute target (Batch AI cluster)
+  - Upload training data using `Datastore`
+  - Run a two-node distributed `TensorFlow` training job using Horovod
+- [05.distributed-tensorflow-with-parameter-server](./05.distributed-tensorflow-with-parameter-server/05.distributed-tensorflow-with-parameter-server.ipynb)
+  Train a TensorFlow model on the MNIST dataset using native distributed TensorFlow (parameter server). Azure ML concepts covered:
+  - Create a remote compute target (Batch AI cluster)
+  - Run a distributed `TensorFlow` training job with two workers and one parameter server
+- [06.distributed-cntk-with-custom-docker](./06.distributed-cntk-with-custom-docker/06.distributed-cntk-with-custom-docker.ipynb)
+  Train a CNTK model on the MNIST dataset using the Azure ML base `Estimator` with a custom Docker image and distributed training. Azure ML concepts covered:
+  - Create a remote compute target (Batch AI cluster)
+  - Upload training data using `Datastore`
+  - Run a base `Estimator` training job using a custom Docker image from Docker Hub
+  - Run a distributed two-node CNTK training job via MPI using the base `Estimator`
+
+- [07.tensorboard](./07.tensorboard/07.tensorboard.ipynb)
+  Train a TensorFlow MNIST model locally, on a DSVM, and on Batch AI and view the logs live on TensorBoard. Azure ML concepts covered:
+  - Run the training job locally with Azure ML and run TensorBoard locally. Start (and stop) an Azure ML `TensorBoard` object to stream and view the logs
+  - Run the training job on a remote DSVM and stream the logs to TensorBoard
+  - Run the training job on a remote Batch AI cluster and stream the logs to TensorBoard
+  - Start a `Tensorboard` instance that displays the logs from all three above runs in one
+- [08.export-run-history-to-tensorboard](./08.export-run-history-to-tensorboard/08.export-run-history-to-tensorboard.ipynb)
+  - Start an Azure ML `Experiment` and log metrics to `Run` history
+  - Export the `Run` history logs to TensorBoard logs
+  - View the logs in TensorBoard
@@ -57,7 +57,12 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"name": "import",
+"tags": [
+"check version"
+]
+},
 "outputs": [],
 "source": [
 "%matplotlib inline\n",
@@ -84,7 +89,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"load workspace"
+]
+},
 "outputs": [],
 "source": [
 "# load workspace configuration from the config.json file in the current folder.\n",
@@ -104,7 +113,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"create experiment"
+]
+},
 "outputs": [],
 "source": [
 "experiment_name = 'sklearn-mnist'\n",
@@ -135,35 +148,38 @@
 },
 "outputs": [],
 "source": [
-"from azureml.core.compute import ComputeTarget, BatchAiCompute\n",
+"from azureml.core.compute import BatchAiCompute\n",
-"from azureml.core.compute_target import ComputeTargetException\n",
+"from azureml.core.compute import ComputeTarget\n",
+"import os\n",
 "\n",
 "# choose a name for your cluster\n",
-"batchai_cluster_name = \"traincluster\"\n",
+"batchai_cluster_name = os.environ.get(\"BATCHAI_CLUSTER_NAME\", ws.name + \"gpu\")\n",
+"cluster_min_nodes = os.environ.get(\"BATCHAI_CLUSTER_MIN_NODES\", 1)\n",
+"cluster_max_nodes = os.environ.get(\"BATCHAI_CLUSTER_MAX_NODES\", 3)\n",
+"vm_size = os.environ.get(\"BATCHAI_CLUSTER_SKU\", \"STANDARD_NC6\")\n",
+"autoscale_enabled = os.environ.get(\"BATCHAI_CLUSTER_AUTOSCALE_ENABLED\", True)\n",
 "\n",
-"try:\n",
+"\n",
-" # look for the existing cluster by name\n",
+"if batchai_cluster_name in ws.compute_targets():\n",
-" compute_target = ComputeTarget(workspace=ws, name=batchai_cluster_name)\n",
+" compute_target = ws.compute_targets()[batchai_cluster_name]\n",
-" if type(compute_target) is BatchAiCompute:\n",
+" if compute_target and type(compute_target) is BatchAiCompute:\n",
-" print('found compute target {}, just use it.'.format(batchai_cluster_name))\n",
+" print('found compute target. just use it. ' + batchai_cluster_name)\n",
 "else:\n",
-" print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))\n",
-"except ComputeTargetException:\n",
 " print('creating a new compute target...')\n",
-" compute_config = BatchAiCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\", # small CPU-based VM\n",
+" provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n",
-" #vm_priority='lowpriority', # optional\n",
+" vm_priority = 'lowpriority', # optional\n",
-" autoscale_enabled=True,\n",
+" autoscale_enabled = autoscale_enabled,\n",
-" cluster_min_nodes=0, \n",
+" cluster_min_nodes = cluster_min_nodes, \n",
-" cluster_max_nodes=4)\n",
+" cluster_max_nodes = cluster_max_nodes)\n",
 "\n",
 " # create the cluster\n",
-" compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)\n",
+" compute_target = ComputeTarget.create(ws, batchai_cluster_name, provisioning_config)\n",
 " \n",
 " # can poll for a minimum number of nodes and for a specific timeout. \n",
-" # if no min node count is provided it uses the scale settings for the cluster\n",
+" # if no min node count is provided it will use the scale settings for the cluster\n",
 " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
 " \n",
-" # Use the 'status' property to get a detailed status for the current cluster. \n",
+" # For a more detailed view of current BatchAI cluster status, use the 'status' property \n",
 " print(compute_target.status.serialize())"
 ]
 },
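The rewritten cluster cell reads its settings from environment variables with inline fallbacks via `os.environ.get`. One caveat worth noting: environment values always arrive as strings, so a numeric setting like the node count should be cast explicitly when the variable is set (the variable name below matches the notebook; the values are illustrative, not a real cluster configuration):

```python
import os

# Sketch of the notebook's env-var override pattern.
os.environ.pop("BATCHAI_CLUSTER_MIN_NODES", None)   # unset -> default applies
cluster_min_nodes = int(os.environ.get("BATCHAI_CLUSTER_MIN_NODES", 1))
print(cluster_min_nodes)  # -> 1

os.environ["BATCHAI_CLUSTER_MIN_NODES"] = "2"       # set -> override wins
# Cast explicitly: os.environ.get returns the string "2" here.
cluster_min_nodes = int(os.environ.get("BATCHAI_CLUSTER_MIN_NODES", 1))
print(cluster_min_nodes)  # -> 2
```

This lets a CI pipeline resize or rename the cluster without editing the notebook, while an interactive run falls back to the defaults.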
@@ -265,7 +281,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"use datastore"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"ds = ws.get_default_datastore()\n",
|
"ds = ws.get_default_datastore()\n",
|
||||||
@@ -394,7 +414,7 @@
|
|||||||
"print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\\n')\n",
|
"print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\\n')\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# get hold of the current run\n",
|
"# get hold of the current run\n",
|
||||||
"run = Run.get_submitted_run()\n",
|
"run = Run.get_context()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print('Train a logistic regression model with regularizaion rate of', args.reg)\n",
|
"print('Train a logistic regression model with regularizaion rate of', args.reg)\n",
|
||||||
"clf = LogisticRegression(C=1.0/args.reg, random_state=42)\n",
|
"clf = LogisticRegression(C=1.0/args.reg, random_state=42)\n",
|
||||||
@@ -473,7 +493,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"configure estimator"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.train.estimator import Estimator\n",
|
"from azureml.train.estimator import Estimator\n",
|
||||||
@@ -502,7 +526,13 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"remote run",
|
||||||
|
"batchai",
|
||||||
|
"scikit-learn"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run = exp.submit(config=est)\n",
|
"run = exp.submit(config=est)\n",
|
||||||
@@ -565,7 +595,13 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"remote run",
|
||||||
|
"batchai",
|
||||||
|
"scikit-learn"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run.wait_for_completion(show_output=False) # specify True for a verbose log"
|
"run.wait_for_completion(show_output=False) # specify True for a verbose log"
|
||||||
@@ -609,7 +645,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"query history"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"print(run.get_file_names())"
|
"print(run.get_file_names())"
|
||||||
@@ -625,7 +665,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"register model from history"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# register model \n",
|
"# register model \n",
|
||||||
@@ -633,27 +677,6 @@
|
|||||||
"print(model.name, model.id, model.version, sep = '\\t')"
|
"print(model.name, model.id, model.version, sep = '\\t')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Clean up resources\n",
|
|
||||||
"\n",
|
|
||||||
"If you're not going to use what you've created here, delete the resources you just created with this quickstart so you don't incur any charges. In the Azure portal, select and delete your resource group. You can also keep the resource group, but delete a single workspace by displaying the workspace properties and selecting the Delete button.\n",
|
|
||||||
"\n",
|
|
||||||
"You can also just delete the Azure Managed Compute cluster. But even if you don't delete it, since `autoscale_enabled` is set to `True`, and `cluster_min_nodes` is set to `0`, when the jobs are done, all cluster nodes will be shut down and you will not incur any additional compute charges. "
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# optionally, delete the Azure Managed Compute cluster\n",
|
|
||||||
"compute_target.delete()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
@@ -675,6 +698,11 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "roastala"
|
||||||
|
}
|
||||||
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
|
|||||||
@@ -39,7 +39,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"register model from file"
+]
+},
 "outputs": [],
 "source": [
 "# If you did NOT complete the tutorial, you can instead run this cell \n",
@@ -86,7 +90,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"check version"
+]
+},
 "outputs": [],
 "source": [
 "%matplotlib inline\n",
@@ -113,7 +121,12 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"load workspace",
+"download model"
+]
+},
 "outputs": [],
 "source": [
 "from azureml.core import Workspace\n",
@@ -298,7 +311,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"set conda dependencies"
+]
+},
 "outputs": [],
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies \n",
@@ -339,7 +356,12 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"configure web service",
+"aci"
+]
+},
 "outputs": [],
 "source": [
 "from azureml.core.webservice import AciWebservice\n",
@@ -372,7 +394,14 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"configure image",
+"create image",
+"deploy web service",
+"aci"
+]
+},
 "outputs": [],
 "source": [
 "%%time\n",
@@ -403,7 +432,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"get scoring uri"
+]
+},
 "outputs": [],
 "source": [
 "print(service.scoring_uri)"
@@ -430,7 +463,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"score web service"
+]
+},
 "outputs": [],
 "source": [
 "import json\n",
@@ -475,7 +512,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"score web service"
+]
+},
 "outputs": [],
 "source": [
 "import requests\n",
@@ -511,7 +552,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"delete web service"
+]
+},
 "outputs": [],
 "source": [
 "service.delete()"
@@ -540,6 +585,11 @@
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "roastala"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
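The cells tagged "score web service" in the hunks above POST a JSON payload to the deployed service's scoring URI. Below is a minimal sketch of that request shape; `scoring_uri` is a placeholder (the notebooks read it from `service.scoring_uri`), and the single zero-filled digit payload is an assumption for illustration, not data from the notebooks.

```python
import json

# Placeholder endpoint; in the notebooks this value comes from service.scoring_uri.
scoring_uri = "http://example.invalid/score"

# One 8x8 digit image flattened to 64 features, matching the sklearn digits data.
payload = json.dumps({"data": [[0.0] * 64]})
headers = {"Content-Type": "application/json"}

# Against a live service the cells then do, roughly:
#   import requests
#   resp = requests.post(scoring_uri, data=payload, headers=headers)
#   print(resp.json())

parsed = json.loads(payload)
print(len(parsed["data"][0]))
```

The actual POST is left commented out since it requires a deployed ACI web service.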
@@ -15,7 +15,7 @@
 "source": [
 "# Tutorial: Train a classification model with automated machine learning\n",
 "\n",
-"In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform data preprocessing, algorithm selection and hyperparameter selection in an automated way for you. The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n",
+"In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform algorithm selection and hyperparameter selection in an automated way for you. The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n",
 "\n",
 "[flow diagram](./imgs/flow2.png)\n",
 "\n",
@@ -132,13 +132,9 @@
 "\n",
 "digits = datasets.load_digits()\n",
 "\n",
-"# only take the first 100 rows if you want the training steps to run faster\n",
-"X_digits = digits.data[100:,:]\n",
-"y_digits = digits.target[100:]\n",
-"\n",
-"# use full dataset\n",
-"#X_digits = digits.data\n",
-"#y_digits = digits.target"
+"# Exclude the first 100 rows from training so that they can be used for test.\n",
+"X_train = digits.data[100:,:]\n",
+"y_train = digits.target[100:]"
 ]
 },
 {
@@ -159,13 +155,13 @@
 "count = 0\n",
 "sample_size = 30\n",
 "plt.figure(figsize = (16, 6))\n",
-"for i in np.random.permutation(X_digits.shape[0])[:sample_size]:\n",
+"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
 " count = count + 1\n",
 " plt.subplot(1, sample_size, count)\n",
 " plt.axhline('')\n",
 " plt.axvline('')\n",
-" plt.text(x = 2, y = -2, s = y_digits[i], fontsize = 18)\n",
-" plt.imshow(X_digits[i].reshape(8, 8), cmap = plt.cm.Greys)\n",
+" plt.text(x = 2, y = -2, s = y_train[i], fontsize = 18)\n",
+" plt.imshow(X_train[i].reshape(8, 8), cmap = plt.cm.Greys)\n",
 "plt.show()"
 ]
 },
@@ -191,15 +187,18 @@
 "|**max_time_sec**|12,000|Time limit in seconds for each iteration|\n",
 "|**iterations**|20|Number of iterations. In each iteration, the model trains with the data with a specific pipeline|\n",
 "|**n_cross_validations**|3|Number of cross validation splits|\n",
-"|**preprocess**|False| *True/False* Enables experiment to perform preprocessing on the input. Preprocessing handles *missing data*, and performs some common *feature extraction*|\n",
-"|**exit_score**|0.995|*double* value indicating the target for *primary_metric*. Once the target is surpassed the run terminates|\n",
+"|**exit_score**|0.9985|*double* value indicating the target for *primary_metric*. Once the target is surpassed the run terminates|\n",
 "|**blacklist_algos**|['kNN','LinearSVM']|*Array* of *strings* indicating algorithms to ignore.\n"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"configure automl"
+]
+},
 "outputs": [],
 "source": [
 "from azureml.train.automl import AutoMLConfig\n",
@@ -210,11 +209,10 @@
 " max_time_sec = 12000,\n",
 " iterations = 20,\n",
 " n_cross_validations = 3,\n",
-" preprocess = False,\n",
-" exit_score = 0.995,\n",
+" exit_score = 0.9985,\n",
 " blacklist_algos = ['kNN','LinearSVM'],\n",
-" X = X_digits,\n",
-" y = y_digits,\n",
+" X = X_train,\n",
+" y = y_train,\n",
 " path=project_folder)"
 ]
 },
@@ -230,7 +228,12 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"local submitted run",
+"automl"
+]
+},
 "outputs": [],
 "source": [
 "from azureml.core.experiment import Experiment\n",
@@ -254,7 +257,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"use notebook widget"
+]
+},
 "outputs": [],
 "source": [
 "from azureml.train.widgets import RunDetails\n",
@@ -273,7 +280,12 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"get metrics",
+"query history"
+]
+},
 "outputs": [],
 "source": [
 "children = list(local_run.get_children())\n",
@@ -300,7 +312,12 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"tags": [
+"query history",
+"register model from history"
+]
+},
 "outputs": [],
 "source": [
 "# find the run with the highest accuracy value.\n",
@@ -332,8 +349,10 @@
 "source": [
 "# find 30 random samples from test set\n",
 "n = 30\n",
-"sample_indices = np.random.permutation(X_digits.shape[0])[0:n]\n",
-"test_samples = X_digits[sample_indices]\n",
+"X_test = digits.data[:100, :]\n",
+"y_test = digits.target[:100]\n",
+"sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
+"test_samples = X_test[sample_indices]\n",
 "\n",
 "\n",
 "# predict using the model\n",
@@ -349,11 +368,11 @@
 " plt.axvline('')\n",
 " \n",
 " # use different color for misclassified sample\n",
-" font_color = 'red' if y_digits[s] != result[i] else 'black'\n",
-" clr_map = plt.cm.gray if y_digits[s] != result[i] else plt.cm.Greys\n",
+" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
+" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
 " \n",
 " plt.text(x = 2, y = -2, s = result[i], fontsize = 18, color = font_color)\n",
-" plt.imshow(X_digits[s].reshape(8, 8), cmap = clr_map)\n",
+" plt.imshow(X_test[s].reshape(8, 8), cmap = clr_map)\n",
 " \n",
 " i = i + 1\n",
 "plt.show()"
@@ -374,11 +393,16 @@
 "> * Review training results\n",
 "> * Register the best model\n",
 "\n",
-"Learn more about [how to configure settings for automatic training]() or [how to use automatic training on a remote resource]()."
+"Learn more about [how to configure settings for automatic training](https://aka.ms/aml-how-configure-auto) or [how to use automatic training on a remote resource](https://aka.ms/aml-how-to-auto-remote)."
 ]
 }
 ],
 "metadata": {
+"authors": [
+{
+"name": "jeffshep"
+}
+],
 "kernelspec": {
 "display_name": "Python 3.6",
 "language": "python",
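The `X_digits`/`y_digits` → `X_train`/`y_train` renames in the hunks above implement a simple positional split: rows 100 and onward are used for AutoML training, and rows 0-99 are held out for scoring. A sketch of the split logic, using a stand-in dataset of 1797 rows (the size of sklearn's digits set) instead of loading scikit-learn here:

```python
# Stand-in for digits.data / digits.target: 1797 rows of 64 features.
data = [[float(i)] * 64 for i in range(1797)]
target = list(range(1797))

# Exclude the first 100 rows from training so that they can be used for test.
X_train, y_train = data[100:], target[100:]
X_test, y_test = data[:100], target[:100]

print(len(X_train), len(X_test))  # → 1697 100
```

With the real digits dataset the same slicing applies to `digits.data` (a NumPy array), so the notebooks write `digits.data[100:,:]` and `digits.data[:100, :]`.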