{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-with-onnx/auto-ml-classification-with-onnx.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Automated Machine Learning\n", "_**Classification with Local Compute**_\n", "\n", "## Contents\n", "1. [Introduction](#Introduction)\n", "1. [Setup](#Setup)\n", "1. [Data](#Data)\n", "1. [Train](#Train)\n", "1. [Results](#Results)\n", "1. [Test](#Test)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "In this example we use scikit-learn's [iris dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use AutoML for a simple classification problem.\n", "\n", "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n", "\n", "See the [ONNX documentation](https://github.com/onnx/onnx) for more about the model format.\n", "\n", "In this notebook you will learn how to:\n", "1. Create an `Experiment` in an existing `Workspace`.\n", "2. Configure AutoML using `AutoMLConfig`.\n", "3. Train the model on local compute with ONNX-compatible models enabled.\n", "4. Explore the results and save the ONNX model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import logging\n", "\n", "from matplotlib import pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "from sklearn import datasets\n", "from sklearn.model_selection import train_test_split\n", "\n", "import azureml.core\n", "from azureml.core.experiment import Experiment\n", "from azureml.core.workspace import Workspace\n", "from azureml.train.automl import AutoMLConfig, constants" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ws = Workspace.from_config()\n", "\n", "# Choose a name for the experiment and specify the project folder.\n", "experiment_name = 'automl-classification-onnx'\n", "project_folder = './sample_projects/automl-classification-onnx'\n", "\n", "experiment = Experiment(ws, experiment_name)\n", "\n", "# Summarize the workspace and experiment settings in a small table.\n", "output = {}\n", "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace Name'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", "output['Location'] = ws.location\n", "output['Project Directory'] = project_folder\n", "output['Experiment Name'] = experiment.name\n", "pd.set_option('display.max_colwidth', -1)\n", "outputDf = pd.DataFrame(data = output, index = [''])\n", "outputDf.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "\n", "This notebook loads the iris dataset with scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) function."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "iris = datasets.load_iris()\n", "X_train, X_test, y_train, y_test = train_test_split(\n", "    iris.data, iris.target, test_size=0.2, random_state=0)\n", "\n", "# Convert X_train and X_test to pandas DataFrames with named columns.\n", "# The column names initialize the input variable names of the ONNX model\n", "# and are needed for prediction with the ONNX inference helper.\n", "X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n", "X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train with ONNX-compatible models enabled\n", "\n", "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n", "\n", "Set `enable_onnx_compatible_models=True` if you also want to generate ONNX-compatible models. Note that the forecasting task and TensorFlow models are not ONNX compatible yet.\n", "\n", "|Property|Description|\n", "|-|-|\n", "|**task**|classification or regression|\n", "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: `accuracy`, `AUC_weighted`, `average_precision_score_weighted`, `norm_macro_recall`, `precision_score_weighted`.|\n", "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n", "|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|\n", "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "automl_config = AutoMLConfig(\n", "    task='classification',\n", "    debug_log='automl_errors.log',\n", "    primary_metric='AUC_weighted',\n", "    iteration_timeout_minutes=60,\n", "    iterations=10,\n", "    verbosity=logging.INFO,\n", "    X=X_train,\n", "    y=y_train,\n", "    preprocess=True,\n", "    enable_onnx_compatible_models=True,\n", "    path=project_folder)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n", "In this example, we specify `show_output = True` to print currently running iterations to the console." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "local_run = experiment.submit(automl_config, show_output = True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "local_run" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Widget for Monitoring Runs\n", "\n", "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", "\n", "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.widgets import RunDetails\n", "RunDetails(local_run).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieve the Best ONNX Model\n", "\n", "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n", "\n", "Set `return_onnx_model=True` to retrieve the best ONNX model instead of the Python model."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "best_run, onnx_mdl = local_run.get_output(return_onnx_model=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save the best ONNX model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.automl.core.onnx_convert import OnnxConverter\n", "onnx_fl_path = \"./best_model.onnx\"\n", "OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predict with the ONNX model using the onnxruntime package" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import json\n", "from azureml.automl.core.onnx_convert import OnnxConvertConstants\n", "\n", "# The inference helper requires a Python version below OnnxIncompatiblePythonVersion.\n", "if sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion:\n", "    python_version_compatible = True\n", "else:\n", "    python_version_compatible = False\n", "\n", "try:\n", "    import onnxruntime\n", "    from azureml.automl.core.onnx_convert import OnnxInferenceHelper\n", "    onnxrt_present = True\n", "except ImportError:\n", "    onnxrt_present = False\n", "\n", "def get_onnx_res(run):\n", "    # Download the ONNX resource file from the run and parse it as JSON.\n", "    res_path = '_debug_y_trans_converter.json'\n", "    run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n", "    with open(res_path) as f:\n", "        onnx_res = json.load(f)\n", "    return onnx_res\n", "\n", "if onnxrt_present and python_version_compatible:\n", "    mdl_bytes = onnx_mdl.SerializeToString()\n", "    onnx_res = get_onnx_res(best_run)\n", "\n", "    onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)\n", "    pred_onnx, pred_prob_onnx = onnxrt_helper.predict(X_test)\n", "\n", "    print(pred_onnx)\n", "    print(pred_prob_onnx)\n", "else:\n", "    if not python_version_compatible:\n", "        print('Please use Python version 3.6 to run the inference helper.')\n", "    if not onnxrt_present:\n", "        print('Please install the onnxruntime package to do the prediction with ONNX model.')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "authors": [ { "name": "savitam" } ], "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }