394 lines
13 KiB
Plaintext
394 lines
13 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Tutorial: Automatically train a classification model with Azure Automated Machine Learning\n",
|
|
"\n",
|
|
"In this tutorial, you'll learn how to automatically generate a machine learning model. This model can then be deployed following the workflow in the [Deploy a model](tutorial-deploy-models-with-aml.md) tutorial.\n",
|
|
"\n",
|
|
"[flow diagram](imgs/nn.png)\n",
|
|
"\n",
|
|
"Similar to the [train models tutorial](tutorial-train-models-with-aml.md), this tutorial classifies handwritten images of digits (0-9) from the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset.\n",
|
|
"\n",
|
|
"You'll learn how to:\n",
|
|
"\n",
|
|
"> * Set up your development environment\n",
|
|
"> * Access and examine the data\n",
|
|
"> * Train using an automated classifier locally with custom parameters\n",
|
|
"> * Explore the results\n",
|
|
"> * Review training results\n",
|
|
"> * Register the best model\n",
|
|
"\n",
|
|
"## Prerequisites\n",
|
|
"\n",
|
|
"Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to: \n",
|
|
"* Create a workspace and its configuration file (**config.json**) \n",
|
|
"* Upload your **config.json** to the same folder as this notebook"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Start a notebook\n",
|
|
"\n",
|
|
"To follow along, start a new notebook from the same directory as **config.json** and copy the code from the sections below.\n",
|
|
"\n",
|
|
"\n",
|
|
"## Set up your development environment\n",
|
|
"\n",
|
|
"All the setup for your development work can be accomplished in the Python notebook. Setup includes:\n",
|
|
"\n",
|
|
"* Import Python packages\n",
|
|
"* Configure a workspace to enable communication between your local computer and remote resources\n",
|
|
"* Create a directory to store training scripts\n",
|
|
"\n",
|
|
"### Import packages\n",
|
|
"Import Python packages you need in this tutorial."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import azureml.core\n",
|
|
"import pandas as pd\n",
|
|
"from azureml.core.workspace import Workspace\n",
|
|
"from azureml.train.automl.run import AutoMLRun\n",
|
|
"import time\n",
|
|
"import logging\n",
|
|
"from sklearn import datasets\n",
|
|
"from matplotlib import pyplot as plt\n",
|
|
"from matplotlib.pyplot import imshow\n",
|
|
"import random\n",
|
|
"import numpy as np"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Configure workspace\n",
|
|
"\n",
|
|
"Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **aml_config/config.json** and loads the details into an object named `ws`. `ws` is used throughout the rest of the code in this tutorial.\n",
|
|
"\n",
|
|
"Once you have a workspace object, specify a name for the experiment and create and register a local directory with the workspace. The history of all runs is recorded under the specified experiment."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"ws = Workspace.from_config()\n",
|
|
"# choose a name for the run history container in the workspace\n",
|
|
"experiment_name = 'automl-classifier'\n",
|
|
"# project folder\n",
|
|
"project_folder = './automl-classifier'\n",
|
|
"\n",
|
|
"import os\n",
|
|
"\n",
|
|
"output = {}\n",
|
|
"output['SDK version'] = azureml.core.VERSION\n",
|
|
"output['Subscription ID'] = ws.subscription_id\n",
|
|
"output['Workspace'] = ws.name\n",
|
|
"output['Resource Group'] = ws.resource_group\n",
|
|
"output['Location'] = ws.location\n",
|
|
"output['Project Directory'] = project_folder\n",
|
|
"pd.set_option('display.max_colwidth', -1)\n",
|
|
"pd.DataFrame(data=output, index=['']).T"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Explore data\n",
|
|
"\n",
|
|
"The initial training tutorial used a high-resolution version of the MNIST dataset (28x28 pixels). Since auto training requires many iterations, this tutorial uses a smaller resolution version of the images (8x8 pixels) to demonstrate the concepts while speeding up the time needed for each iteration."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from sklearn import datasets\n",
|
|
"\n",
|
|
"digits = datasets.load_digits()\n",
|
|
"\n",
|
|
"# only take the first 100 rows if you want the training steps to run faster\n",
|
|
"X_digits = digits.data[100:,:]\n",
|
|
"y_digits = digits.target[100:]\n",
|
|
"\n",
|
|
"# use full dataset\n",
|
|
"#X_digits = digits.data\n",
|
|
"#y_digits = digits.target"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Display some sample images\n",
|
|
"\n",
|
|
"Load the data into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"count = 0\n",
|
|
"sample_size = 30\n",
|
|
"plt.figure(figsize = (16, 6))\n",
|
|
"for i in np.random.permutation(X_digits.shape[0])[:sample_size]:\n",
|
|
" count = count + 1\n",
|
|
" plt.subplot(1, sample_size, count)\n",
|
|
" plt.axhline('')\n",
|
|
" plt.axvline('')\n",
|
|
" plt.text(x = 2, y = -2, s = y_digits[i], fontsize = 18)\n",
|
|
" plt.imshow(X_digits[i].reshape(8, 8), cmap = plt.cm.Greys)\n",
|
|
"plt.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"You now have the necessary packages and data ready for auto training for your model. \n",
|
|
"\n",
|
|
"## Auto train a model \n",
|
|
"\n",
|
|
"To auto train a model, first define settings for autogeneration and tuning and then run the automatic classifier.\n",
|
|
"\n",
|
|
"\n",
|
|
"### Define settings for autogeneration and tuning\n",
|
|
"\n",
|
|
"Define the experiment parameters and models settings for autogeneration and tuning. \n",
|
|
"\n",
|
|
"\n",
|
|
"|Property| Value in this tutorial |Description|\n",
|
|
"|----|----|---|\n",
|
|
"|**primary_metric**|AUC Weighted | Metric that you want to optimize.|\n",
|
|
"|**max_time_sec**|12,000|Time limit in seconds for each iteration|\n",
|
|
"|**iterations**|20|Number of iterations. In each iteration, the model trains with the data with a specific pipeline|\n",
|
|
"|**n_cross_validations**|5|Number of cross validation splits|\n",
|
|
"|**preprocess**|True| *True/False* Enables experiment to perform preprocessing on the input. Preprocessing handles *missing data*, and performs some common *feature extraction*|\n",
|
|
"|**exit_score**|0.994|*double* value indicating the target for *primary_metric*. Once the target is surpassed the run terminates|\n",
|
|
"|**blacklist_algos**|['kNN','LinearSVM']|*Array* of *strings* indicating algorithms to ignore.\n",
|
|
"|**concurrent_iterations**|5|Max number of iterations that would be executed in parallel. This number should be less than the number of cores on the DSVM. Used in remote training.|"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.train.automl import AutoMLConfig\n",
|
|
"\n",
|
|
"##Local compute \n",
|
|
"Automl_config = AutoMLConfig(task = 'classification',\n",
|
|
" primary_metric = 'AUC_weighted',\n",
|
|
" max_time_sec = 12000,\n",
|
|
" iterations = 20,\n",
|
|
" n_cross_validations = 3,\n",
|
|
" blacklist_algos = ['kNN','LinearSVM'],\n",
|
|
" X = X_digits,\n",
|
|
" y = y_digits,\n",
|
|
" path=project_folder)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Run the automatic classifier\n",
|
|
"\n",
|
|
"Start the experiment to run locally. Define the compute target as local and set the output to true to view progress on the experiment."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.experiment import Experiment\n",
|
|
"experiment=Experiment(ws, experiment_name)\n",
|
|
"local_run = experiment.submit(Automl_config, show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Explore the results\n",
|
|
"\n",
|
|
"Explore the results of automatic training with a Jupyter widget or by examining the experiment history.\n",
|
|
"\n",
|
|
"### Jupyter widget\n",
|
|
"\n",
|
|
"Use the Jupyter notebook widget to see a graph and a table of all results."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.train.widgets import RunDetails\n",
|
|
"RunDetails(local_run).show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Retrieve all iterations\n",
|
|
"\n",
|
|
"View the experiment history and see individual metrics for each iteration run."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"children = list(local_run.get_children())\n",
|
|
"metricslist = {}\n",
|
|
"for run in children:\n",
|
|
" properties = run.get_properties()\n",
|
|
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n",
|
|
" metricslist[int(properties['iteration'])] = metrics\n",
|
|
"\n",
|
|
"import pandas as pd\n",
|
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
|
"rundata"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# find the run with the highest accuracy value.\n",
|
|
"best_run, fitted_model = local_run.get_output()\n",
|
|
"\n",
|
|
"# register model in workspace\n",
|
|
"description = 'Automated Machine Learning Model'\n",
|
|
"tags = None\n",
|
|
"local_run.register_model(description=description, tags=tags)\n",
|
|
"local_run.model_id # Use this id to deploy the model as a web service in Azure"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Test the best model\n",
|
|
"\n",
|
|
"Use the model to predict a few random digits. Display the predicted and the image. Red font and inverse image (white on black) is used to highlight the misclassified samples.\n",
|
|
"\n",
|
|
"Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# find 30 random samples from test set\n",
|
|
"n = 30\n",
|
|
"sample_indices = np.random.permutation(X_digits.shape[0])[0:n]\n",
|
|
"test_samples = X_digits[sample_indices]\n",
|
|
"\n",
|
|
"\n",
|
|
"# predict using the model\n",
|
|
"result = fitted_model.predict(test_samples)\n",
|
|
"\n",
|
|
"# compare actual value vs. the predicted values:\n",
|
|
"i = 0\n",
|
|
"plt.figure(figsize = (20, 1))\n",
|
|
"\n",
|
|
"for s in sample_indices:\n",
|
|
" plt.subplot(1, n, i + 1)\n",
|
|
" plt.axhline('')\n",
|
|
" plt.axvline('')\n",
|
|
" \n",
|
|
" # use different color for misclassified sample\n",
|
|
" font_color = 'red' if y_digits[s] != result[i] else 'black'\n",
|
|
" clr_map = plt.cm.gray if y_digits[s] != result[i] else plt.cm.Greys\n",
|
|
" \n",
|
|
" plt.text(x = 2, y = -2, s = result[i], fontsize = 18, color = font_color)\n",
|
|
" plt.imshow(X_digits[s].reshape(8, 8), cmap = clr_map)\n",
|
|
" \n",
|
|
" i = i + 1\n",
|
|
"plt.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Next steps\n",
|
|
"\n",
|
|
"In this Azure Machine Learning tutorial, you used Python to:\n",
|
|
"\n",
|
|
"> * Set up your development environment\n",
|
|
"> * Access and examine the data\n",
|
|
"> * Train using an automated classifier locally with custom parameters\n",
|
|
"> * Explore the results\n",
|
|
"> * Review training results\n",
|
|
"> * Register the best model\n",
|
|
"\n",
|
|
"Learn more about [how to configure settings for automatic training]() or [how to use automatic training on a remote resource]()."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python [default]",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.6"
|
|
},
|
|
"msauthor": "sgilley"
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|