MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-with-onnx/auto-ml-classification-with-onnx.ipynb

{
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3.6",
      "name": "python36",
      "language": "python"
    },
    "authors": [
      {
        "name": "savitam"
      }
    ],
    "language_info": {
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "name": "python",
      "file_extension": ".py",
      "nbconvert_exporter": "python",
      "version": "3.6.6"
    }
  },
  "nbformat": 4,
  "cells": [
    {
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Classification with Local Compute**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Data](#Data)\n",
        "1. [Train](#Train)\n",
        "1. [Results](#Results)\n",
        "1. [Test](#Test)\n",
        "\n"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Introduction\n",
        "\n",
        "In this example we use scikit-learn's [iris dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use AutoML for a simple classification problem.\n",
        "\n",
        "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
        "\n",
        "For ONNX documentation, see the [ONNX repository](https://github.com/onnx/onnx).\n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. Create an `Experiment` in an existing `Workspace`.\n",
        "2. Configure AutoML using `AutoMLConfig`.\n",
        "3. Train the model on local compute with ONNX-compatible models enabled.\n",
        "4. Explore the results and save the ONNX model."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "import logging\n",
        "\n",
        "from matplotlib import pyplot as plt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "from sklearn import datasets\n",
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.train.automl import AutoMLConfig, constants"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# Choose a name for the experiment and specify the project folder.\n",
        "experiment_name = 'automl-classification-onnx'\n",
        "project_folder = './sample_projects/automl-classification-onnx'\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output['SDK version'] = azureml.core.VERSION\n",
        "output['Subscription ID'] = ws.subscription_id\n",
        "output['Workspace Name'] = ws.name\n",
        "output['Resource Group'] = ws.resource_group\n",
        "output['Location'] = ws.location\n",
        "output['Project Directory'] = project_folder\n",
        "output['Experiment Name'] = experiment.name\n",
        "pd.set_option('display.max_colwidth', -1)\n",
        "outputDf = pd.DataFrame(data = output, index = [''])\n",
        "outputDf.T"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Data\n",
        "\n",
        "This notebook loads the iris dataset with scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) function and holds out 20% of the rows for testing."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "iris = datasets.load_iris()\n",
        "X_train, X_test, y_train, y_test = train_test_split(iris.data, \n",
        "                                                    iris.target, \n",
        "                                                    test_size=0.2, \n",
        "                                                    random_state=0)\n",
        "\n",
        "\n"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "### Ensure that X_train and X_test are pandas DataFrames"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "# Convert X_train and X_test to pandas DataFrames and set the column names.\n",
        "# The column names initialize the input variable names of the ONNX model\n",
        "# and are needed for prediction with the ONNX inference helper.\n",
        "X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n",
        "X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Train with ONNX-compatible models enabled\n",
        "\n",
        "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
        "\n",
        "Set the parameter `enable_onnx_compatible_models=True` if you also want to generate ONNX-compatible models. Note that the forecasting task and TensorFlow models are not yet ONNX compatible.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|classification or regression|\n",
        "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
        "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
        "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
        "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
        "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
        "|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|\n",
        "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "### Set `preprocess=True`; the `OnnxInferenceHelper` currently supports only this mode"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "automl_config = AutoMLConfig(task = 'classification',\n",
        "                             debug_log = 'automl_errors.log',\n",
        "                             primary_metric = 'AUC_weighted',\n",
        "                             iteration_timeout_minutes = 60,\n",
        "                             iterations = 10,\n",
        "                             verbosity = logging.INFO,                             \n",
        "                             X = X_train, \n",
        "                             y = y_train,\n",
        "                             preprocess=True,\n",
        "                             enable_onnx_compatible_models=True,\n",
        "                             path = project_folder)"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
        "In this example, we specify `show_output = True` to print currently running iterations to the console."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "local_run = experiment.submit(automl_config, show_output = True)"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "local_run"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "## Results"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "#### Widget for Monitoring Runs\n",
        "\n",
        "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
        "\n",
        "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "from azureml.widgets import RunDetails\n",
        "RunDetails(local_run).show() "
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "### Retrieve the Best ONNX Model\n",
        "\n",
        "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing.  Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n",
        "\n",
        "Set the parameter `return_onnx_model=True` to retrieve the best ONNX model instead of the Python model."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "best_run, onnx_mdl = local_run.get_output(return_onnx_model=True)"
      ],
      "cell_type": "code"
    },
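    {
      "metadata": {},
      "source": [
        "The `get_output` overloads described above can be sketched as follows. This is a minimal illustration rather than part of the original walkthrough: `get_alternative_outputs` is a hypothetical helper name, and the `metric` and `iteration` values are placeholders that should match a metric logged by your run and an iteration that actually completed.\n",
        "\n",
        "```python\n",
        "def get_alternative_outputs(automl_run, metric='accuracy', iteration=0):\n",
        "    # Hypothetical helper: automl_run is the object returned by experiment.submit().\n",
        "    # Best run and fitted Python model according to one specific logged metric.\n",
        "    by_metric = automl_run.get_output(metric=metric)\n",
        "    # Run and fitted Python model produced by one particular iteration.\n",
        "    by_iteration = automl_run.get_output(iteration=iteration)\n",
        "    # Same iteration, but returning the ONNX model instead of the Python model\n",
        "    # (assumes the run was configured with enable_onnx_compatible_models=True).\n",
        "    onnx_by_iteration = automl_run.get_output(iteration=iteration, return_onnx_model=True)\n",
        "    return {'by_metric': by_metric,\n",
        "            'by_iteration': by_iteration,\n",
        "            'onnx_by_iteration': onnx_by_iteration}\n",
        "```\n",
        "\n",
        "For example, `get_alternative_outputs(local_run, metric='accuracy', iteration=3)['by_iteration']` would give the `(run, model)` pair for iteration 3, provided that iteration exists."
      ],
      "cell_type": "markdown"
    },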
    {
      "metadata": {},
      "source": [
        "### Save the best ONNX model"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "from azureml.automl.core.onnx_convert import OnnxConverter\n",
        "onnx_fl_path = \"./best_model.onnx\"\n",
        "OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
      ],
      "cell_type": "code"
    },
    {
      "metadata": {},
      "source": [
        "### Predict with the ONNX model using the onnxruntime package"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [
        "import sys\n",
        "import json\n",
        "from azureml.automl.core.onnx_convert import OnnxConvertConstants\n",
        "\n",
        "# The inference helper requires a Python version below OnnxIncompatiblePythonVersion.\n",
        "python_version_compatible = sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion\n",
        "\n",
        "try:\n",
        "    import onnxruntime\n",
        "    from azureml.automl.core.onnx_convert import OnnxInferenceHelper    \n",
        "    onnxrt_present = True\n",
        "except ImportError:\n",
        "    onnxrt_present = False\n",
        "\n",
        "def get_onnx_res(run):\n",
        "    res_path = 'onnx_resource.json'\n",
        "    run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
        "    with open(res_path) as f:\n",
        "        onnx_res = json.load(f)\n",
        "    return onnx_res\n",
        "\n",
        "if onnxrt_present and python_version_compatible:    \n",
        "    mdl_bytes = onnx_mdl.SerializeToString()\n",
        "    onnx_res = get_onnx_res(best_run)\n",
        "\n",
        "    onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)\n",
        "    pred_onnx, pred_prob_onnx = onnxrt_helper.predict(X_test)\n",
        "\n",
        "    print(pred_onnx)\n",
        "    print(pred_prob_onnx)\n",
        "else:\n",
        "    if not python_version_compatible:\n",
        "        print('Please use Python version 3.6 or 3.7 to run the inference helper.')    \n",
        "    if not onnxrt_present:\n",
        "        print('Please install the onnxruntime package to do the prediction with ONNX model.')"
      ],
      "cell_type": "code"
    },
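    {
      "metadata": {},
      "source": [
        "The predictions printed above can also be compared against the held-out labels. The sketch below is an illustration under assumptions, not part of the SDK: `prediction_accuracy` is a hypothetical helper, and it assumes `pred_onnx` contains one predicted class label per row of `X_test`, in the same order as `y_test`.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def prediction_accuracy(y_true, y_pred):\n",
        "    # Hypothetical helper: fraction of predicted labels that equal the true labels.\n",
        "    y_true = np.asarray(y_true).ravel()\n",
        "    y_pred = np.asarray(y_pred).ravel()\n",
        "    if y_true.shape != y_pred.shape:\n",
        "        raise ValueError('y_true and y_pred must contain the same number of labels')\n",
        "    return float(np.mean(y_true == y_pred))\n",
        "```\n",
        "\n",
        "If the prediction cell above ran successfully, calling `prediction_accuracy(y_test, pred_onnx)` would report the share of test rows that the ONNX model classified correctly."
      ],
      "cell_type": "markdown"
    },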
    {
      "metadata": {},
      "outputs": [],
      "execution_count": null,
      "source": [],
      "cell_type": "code"
    }
  ],
  "nbformat_minor": 2
}