{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Automated Machine Learning\n", "_**Remote Execution using AmlCompute**_\n", "\n", "## Contents\n", "1. [Introduction](#Introduction)\n", "1. [Setup](#Setup)\n", "1. [Data](#Data)\n", "1. [Train](#Train)\n", "1. [Results](#Results)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "In this example we use the scikit-learn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use AutoML for a simple classification problem.\n", "\n", "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n", "\n", "In this notebook you would see\n", "1. Create an `Experiment` in an existing `Workspace`.\n", "2. Create or Attach existing AmlCompute to a workspace.\n", "3. Configure AutoML using `AutoMLConfig`.\n", "4. Train the model using AmlCompute with ONNX compatible config on.\n", "5. Explore the results and save the ONNX model.\n", "6. Inference with the ONNX model.\n", "\n", "In addition this notebook showcases the following features\n", "- **Parallel** executions for iterations\n", "- **Asynchronous** tracking of progress\n", "- **Cancellation** of individual iterations or the entire run\n", "- Retrieving models for any iteration or logged metric\n", "- Specifying AutoML settings as `**kwargs`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import logging\n", "import os\n", "\n", "import pandas as pd\n", "from sklearn import datasets\n", "from sklearn.model_selection import train_test_split\n", "\n", "import azureml.core\n", "from azureml.core.experiment import Experiment\n", "from azureml.core.workspace import Workspace\n", "from azureml.core.dataset import Dataset\n", "from azureml.train.automl import AutoMLConfig" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ws = Workspace.from_config()\n", "\n", "# Choose a name for the run history container in the workspace.\n", "experiment_name = 'automl-remote-amlcompute-with-onnx'\n", "project_folder = './project'\n", "\n", "experiment = Experiment(ws, experiment_name)\n", "\n", "output = {}\n", "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace Name'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", "output['Location'] = ws.location\n", "output['Project Directory'] = project_folder\n", "output['Experiment Name'] = experiment.name\n", "pd.set_option('display.max_colwidth', -1)\n", "outputDf = pd.DataFrame(data = output, index = [''])\n", "outputDf.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create or Attach existing AmlCompute\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n", "\n", "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", "\n", "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import ComputeTarget\n", "\n", "# Choose a name for your cluster.\n", "amlcompute_cluster_name = \"automlc2\"\n", "\n", "found = False\n", "# Check if this compute target already exists in the workspace.\n", "cts = ws.compute_targets\n", "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " found = True\n", " print('Found existing compute target.')\n", " compute_target = cts[amlcompute_cluster_name]\n", "\n", "if not found:\n", " print('Creating a new compute target...')\n", " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", " #vm_priority = 'lowpriority', # optional\n", " max_nodes = 6)\n", "\n", " # Create the cluster.\\n\",\n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", "\n", "print('Checking cluster status...')\n", "# Can poll for a minimum number of nodes and for a specific timeout.\n", "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", "\n", "# For a more detailed view of current AmlCompute status, use get_status()." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "For remote executions, you need to make the data accessible from the remote compute.\n", "This can be done by uploading the data to DataStore.\n", "In this example, we upload scikit-learn's [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "iris = datasets.load_iris()\n", "\n", "if not os.path.isdir('data'):\n", " os.mkdir('data')\n", "\n", "if not os.path.exists(project_folder):\n", " os.makedirs(project_folder)\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(iris.data, \n", " iris.target, \n", " test_size=0.2, \n", " random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ensure the x_train and x_test are pandas DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Convert the X_train and X_test to pandas DataFrame and set column names,\n", "# This is needed for initializing the input variable names of ONNX model, \n", "# and the prediction with the ONNX model using the inference helper.\n", "X_train = pd.DataFrame(X_train, columns=['c1', 'c2', 'c3', 'c4'])\n", "X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])\n", "y_train = pd.DataFrame(y_train, columns=['label'])\n", "\n", "X_train.to_csv(\"data/X_train.csv\", index=False)\n", "y_train.to_csv(\"data/y_train.csv\", index=False)\n", "\n", "ds = ws.get_default_datastore()\n", "ds.upload(src_dir='./data', target_path='irisdata', overwrite=True, show_progress=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.runconfig import RunConfiguration\n", "from azureml.core.conda_dependencies import CondaDependencies\n", "import pkg_resources\n", "\n", "# create a new RunConfig object\n", "conda_run_config = RunConfiguration(framework=\"python\")\n", "\n", "# Set compute target to AmlCompute\n", "conda_run_config.target = compute_target\n", "conda_run_config.environment.docker.enabled = True\n", "\n", "cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n", "conda_run_config.environment.python.conda_dependencies = cd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a TabularDataset\n", "\n", "Defined X and y as `TabularDataset`s, which are passed to automated machine learning in the AutoMLConfig." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X = Dataset.Tabular.from_delimited_files(path=ds.path('irisdata/X_train.csv'))\n", "y = Dataset.Tabular.from_delimited_files(path=ds.path('irisdata/y_train.csv'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train\n", "\n", "You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local excutions too.\n", "\n", "**Note:** Set the parameter enable_onnx_compatible_models=True, if you also want to generate the ONNX compatible models. Please note, the forecasting task and TensorFlow models are not ONNX compatible yet.\n", "\n", "**Note:** When using AmlCompute, you can't pass Numpy arrays directly to the fit method.\n", "\n", "|Property|Description|\n", "|-|-|\n", "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:
accuracy
AUC_weighted
average_precision_score_weighted
norm_macro_recall
precision_score_weighted|\n", "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n", "|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|\n", "|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set the preprocess=True, currently the InferenceHelper only supports this mode." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "automl_settings = {\n", " \"iteration_timeout_minutes\": 10,\n", " \"iterations\": 10,\n", " \"n_cross_validations\": 5,\n", " \"primary_metric\": 'AUC_weighted',\n", " \"preprocess\": True,\n", " \"max_concurrent_iterations\": 5,\n", " \"verbosity\": logging.INFO\n", "}\n", "\n", "automl_config = AutoMLConfig(task = 'classification',\n", " debug_log = 'automl_errors.log',\n", " path = project_folder,\n", " run_configuration=conda_run_config,\n", " X = X,\n", " y = y,\n", " enable_onnx_compatible_models=True, # This will generate ONNX compatible models.\n", " **automl_settings\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n", "In this example, we specify `show_output = False` to suppress console output while the run is in progress." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "remote_run = experiment.submit(automl_config, show_output = False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "remote_run" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results\n", "\n", "#### Loading executed runs\n", "In case you need to load a previously executed run, enable the cell below and replace the `run_id` value." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "remote_run = AutoMLRun(experiment = experiment, run_id = 'AutoML_5db13491-c92a-4f1d-b622-8ab8d973a058')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Widget for Monitoring Runs\n", "\n", "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", "\n", "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`\n", "\n", "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "remote_run" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.widgets import RunDetails\n", "RunDetails(remote_run).show() " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Wait until the run finishes.\n", "remote_run.wait_for_completion(show_output = True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cancelling Runs\n", "\n", "You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Cancel the ongoing experiment and stop scheduling new iterations.\n", "# remote_run.cancel()\n", "\n", "# Cancel iteration 1 and move onto iteration 2.\n", "# remote_run.cancel_iteration(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieve the Best ONNX Model\n", "\n", "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.\n", "\n", "Set the parameter return_onnx_model=True to retrieve the best ONNX model, instead of the Python model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "best_run, onnx_mdl = remote_run.get_output(return_onnx_model=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save the best ONNX model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.automl.core.onnx_convert import OnnxConverter\n", "onnx_fl_path = \"./best_model.onnx\"\n", "OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predict with the ONNX model, using onnxruntime package" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import json\n", "from azureml.automl.core.onnx_convert import OnnxConvertConstants\n", "from azureml.train.automl import constants\n", "\n", "if sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion:\n", " python_version_compatible = True\n", "else:\n", " python_version_compatible = False\n", "\n", "try:\n", " import onnxruntime\n", " from azureml.automl.core.onnx_convert import OnnxInferenceHelper \n", " onnxrt_present = True\n", "except ImportError:\n", " onnxrt_present = False\n", "\n", "def get_onnx_res(run):\n", " res_path = 'onnx_resource.json'\n", " run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n", " with open(res_path) as f:\n", " return json.load(f)\n", "\n", "if onnxrt_present and python_version_compatible: \n", " mdl_bytes = onnx_mdl.SerializeToString()\n", " onnx_res = get_onnx_res(best_run)\n", "\n", " onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)\n", " pred_onnx, pred_prob_onnx = onnxrt_helper.predict(X_test)\n", "\n", " print(pred_onnx)\n", " print(pred_prob_onnx)\n", "else:\n", " if not python_version_compatible:\n", " print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n", " if not onnxrt_present:\n", " print('Please install the onnxruntime package to do the prediction with ONNX model.')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "authors": [ { "name": "savitam" } ], "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }