mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-20 09:37:04 -05:00
375 lines
13 KiB
Plaintext
375 lines
13 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Automated Machine Learning\n",
|
|
"_**Classification with Local Compute**_\n",
|
|
"\n",
|
|
"## Contents\n",
|
|
"1. [Introduction](#Introduction)\n",
|
|
"1. [Setup](#Setup)\n",
|
|
"1. [Data](#Data)\n",
|
|
"1. [Train](#Train)\n",
|
|
"1. [Results](#Results)\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Introduction\n",
|
|
"\n",
|
|
"In this example we use the scikit-learn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use AutoML for a simple classification problem.\n",
|
|
"\n",
|
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
|
"\n",
|
|
"Please find the ONNX related documentations [here](https://github.com/onnx/onnx).\n",
|
|
"\n",
|
|
"In this notebook you will learn how to:\n",
|
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
|
"3. Train the model using local compute with ONNX compatible config on.\n",
|
|
"4. Explore the results and save the ONNX model.\n",
|
|
"5. Inference with the ONNX model."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Setup\n",
|
|
"\n",
|
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import logging\n",
|
|
"\n",
|
|
"from matplotlib import pyplot as plt\n",
|
|
"import numpy as np\n",
|
|
"import pandas as pd\n",
|
|
"from sklearn import datasets\n",
|
|
"from sklearn.model_selection import train_test_split\n",
|
|
"\n",
|
|
"import azureml.core\n",
|
|
"from azureml.core.experiment import Experiment\n",
|
|
"from azureml.core.workspace import Workspace\n",
|
|
"from azureml.train.automl import AutoMLConfig, constants"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"ws = Workspace.from_config()\n",
|
|
"\n",
|
|
"# Choose a name for the experiment.\n",
|
|
"experiment_name = 'automl-classification-onnx'\n",
|
|
"\n",
|
|
"experiment = Experiment(ws, experiment_name)\n",
|
|
"\n",
|
|
"output = {}\n",
|
|
"output['SDK version'] = azureml.core.VERSION\n",
|
|
"output['Subscription ID'] = ws.subscription_id\n",
|
|
"output['Workspace Name'] = ws.name\n",
|
|
"output['Resource Group'] = ws.resource_group\n",
|
|
"output['Location'] = ws.location\n",
|
|
"output['Experiment Name'] = experiment.name\n",
|
|
"pd.set_option('display.max_colwidth', -1)\n",
|
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
|
"outputDf.T"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Data\n",
|
|
"\n",
|
|
"This uses scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) method."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"iris = datasets.load_iris()\n",
|
|
"X_train, X_test, y_train, y_test = train_test_split(iris.data, \n",
|
|
" iris.target, \n",
|
|
" test_size=0.2, \n",
|
|
" random_state=0)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Ensure the x_train and x_test are pandas DataFrame."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Convert the X_train and X_test to pandas DataFrame and set column names,\n",
|
|
"# This is needed for initializing the input variable names of ONNX model, \n",
|
|
"# and the prediction with the ONNX model using the inference helper.\n",
|
|
"X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n",
|
|
"X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Train\n",
|
|
"\n",
|
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
|
"\n",
|
|
"**Note:** Set the parameter enable_onnx_compatible_models=True, if you also want to generate the ONNX compatible models. Please note, the forecasting task and TensorFlow models are not ONNX compatible yet.\n",
|
|
"\n",
|
|
"|Property|Description|\n",
|
|
"|-|-|\n",
|
|
"|**task**|classification or regression|\n",
|
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
|
"|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Set the preprocess=True, currently the InferenceHelper only supports this mode."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
|
" debug_log = 'automl_errors.log',\n",
|
|
" primary_metric = 'AUC_weighted',\n",
|
|
" iteration_timeout_minutes = 60,\n",
|
|
" iterations = 10,\n",
|
|
" verbosity = logging.INFO, \n",
|
|
" X = X_train, \n",
|
|
" y = y_train,\n",
|
|
" preprocess=True,\n",
|
|
" enable_onnx_compatible_models=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"local_run"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Results"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Widget for Monitoring Runs\n",
|
|
"\n",
|
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
|
"\n",
|
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.widgets import RunDetails\n",
|
|
"RunDetails(local_run).show() "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Retrieve the Best ONNX Model\n",
|
|
"\n",
|
|
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n",
|
|
"\n",
|
|
"Set the parameter return_onnx_model=True to retrieve the best ONNX model, instead of the Python model."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"best_run, onnx_mdl = local_run.get_output(return_onnx_model=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Save the best ONNX model"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.automl.core.onnx_convert import OnnxConverter\n",
|
|
"onnx_fl_path = \"./best_model.onnx\"\n",
|
|
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Predict with the ONNX model, using onnxruntime package"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import sys\n",
|
|
"import json\n",
|
|
"from azureml.automl.core.onnx_convert import OnnxConvertConstants\n",
|
|
"\n",
|
|
"if sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion:\n",
|
|
" python_version_compatible = True\n",
|
|
"else:\n",
|
|
" python_version_compatible = False\n",
|
|
"\n",
|
|
"try:\n",
|
|
" import onnxruntime\n",
|
|
" from azureml.automl.core.onnx_convert import OnnxInferenceHelper \n",
|
|
" onnxrt_present = True\n",
|
|
"except ImportError:\n",
|
|
" onnxrt_present = False\n",
|
|
"\n",
|
|
"def get_onnx_res(run):\n",
|
|
" res_path = 'onnx_resource.json'\n",
|
|
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
|
|
" with open(res_path) as f:\n",
|
|
" onnx_res = json.load(f)\n",
|
|
" return onnx_res\n",
|
|
"\n",
|
|
"if onnxrt_present and python_version_compatible: \n",
|
|
" mdl_bytes = onnx_mdl.SerializeToString()\n",
|
|
" onnx_res = get_onnx_res(best_run)\n",
|
|
"\n",
|
|
" onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)\n",
|
|
" pred_onnx, pred_prob_onnx = onnxrt_helper.predict(X_test)\n",
|
|
"\n",
|
|
" print(pred_onnx)\n",
|
|
" print(pred_prob_onnx)\n",
|
|
"else:\n",
|
|
" if not python_version_compatible:\n",
|
|
" print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n",
|
|
" if not onnxrt_present:\n",
|
|
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "savitam"
|
|
}
|
|
],
|
|
"kernelspec": {
|
|
"display_name": "Python 3.6",
|
|
"language": "python",
|
|
"name": "python36"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
} |