Update 12/2

2025-12-22 02:25:12 -05:00 · 2018-12-02 14:46:52 -05:00
parent 74309f91f7
commit 13a5d0baac
26 changed files with 87295 additions and 42 deletions
--- a/how-to-use-azureml/automated-machine-learning/classification_with_tensorflow/auto-ml-classification_with_tensorflow.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/classification_with_tensorflow/auto-ml-classification_with_tensorflow.ipynb
@@ -0,0 +1,390 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+    "\n",
+    "Licensed under the MIT License."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Automated Machine Learning: Classification on Local Compute with TensorFlow DNNClassifier and LinearClassifier Using Whitelisted Models\n",
+    "\n",
+    "In this example we use scikit-learn's [digits dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
+    "\n",
+    "Make sure you have executed the [configuration](../configuration.ipynb) before running this notebook.\n",
+    "This notebook shows how AutoML can be restricted to a selected list of models (a whitelist); see the readme.md for the supported models.\n",
+    "Here, training runs exclusively on TensorFlow-based models.\n",
+    "\n",
+    "In this notebook you will learn how to:\n",
+    "1. Create an `Experiment` in an existing `Workspace`.\n",
+    "2. Configure AutoML using `AutoMLConfig`.\n",
+    "3. Train on the whitelisted models using local compute.\n",
+    "4. Explore the results.\n",
+    "5. Test the best fitted model.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Create an Experiment\n",
+    "\n",
+    "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import logging\n",
+    "import os\n",
+    "import random\n",
+    "\n",
+    "from matplotlib import pyplot as plt\n",
+    "from matplotlib.pyplot import imshow\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from sklearn import datasets\n",
+    "\n",
+    "import azureml.core\n",
+    "from azureml.core.experiment import Experiment\n",
+    "from azureml.core.workspace import Workspace\n",
+    "from azureml.train.automl import AutoMLConfig\n",
+    "from azureml.train.automl.run import AutoMLRun"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ws = Workspace.from_config()\n",
+    "\n",
+    "# Choose a name for the experiment and specify the project folder.\n",
+    "experiment_name = 'automl-local-classification'\n",
+    "project_folder = './sample_projects/automl-local-classification'\n",
+    "\n",
+    "experiment = Experiment(ws, experiment_name)\n",
+    "\n",
+    "output = {}\n",
+    "output['SDK version'] = azureml.core.VERSION\n",
+    "output['Subscription ID'] = ws.subscription_id\n",
+    "output['Workspace Name'] = ws.name\n",
+    "output['Resource Group'] = ws.resource_group\n",
+    "output['Location'] = ws.location\n",
+    "output['Project Directory'] = project_folder\n",
+    "output['Experiment Name'] = experiment.name\n",
+    "pd.set_option('display.max_colwidth', -1)\n",
+    "pd.DataFrame(data = output, index = ['']).T"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Diagnostics\n",
+    "\n",
+    "Opt in to diagnostics collection to help improve the experience, quality, and security of future releases."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.telemetry import set_diagnostics_collection\n",
+    "set_diagnostics_collection(send_diagnostics = True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load Training Data\n",
+    "\n",
+    "This uses scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn import datasets\n",
+    "\n",
+    "digits = datasets.load_digits()\n",
+    "\n",
+    "# Exclude the first 100 rows from training so that they can be used for test.\n",
+    "X_train = digits.data[100:,:]\n",
+    "y_train = digits.target[100:]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Configure AutoML\n",
+    "\n",
+    "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
+    "\n",
+    "|Property|Description|\n",
+    "|-|-|\n",
+    "|**task**|classification or regression|\n",
+    "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i>|\n",
+    "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
+    "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
+    "|**n_cross_validations**|Number of cross validation splits.|\n",
+    "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
+    "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n",
+    "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "automl_config = AutoMLConfig(task = 'classification',\n",
+    "                             debug_log = 'automl_errors.log',\n",
+    "                             primary_metric = 'AUC_weighted',\n",
+    "                             iteration_timeout_minutes = 60,\n",
+    "                             iterations = 10,\n",
+    "                             n_cross_validations = 3,\n",
+    "                             verbosity = logging.INFO,\n",
+    "                             X = X_train, \n",
+    "                             y = y_train,\n",
+    "                             enable_tf = True,\n",
+    "                             whitelist_models = [\"TensorFlowLinearClassifier\", \"TensorFlowDNN\"],\n",
+    "                             path = project_folder)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Train the Models\n",
+    "\n",
+    "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
+    "In this example, we specify `show_output = True` to print currently running iterations to the console."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "local_run = experiment.submit(automl_config, show_output = True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "local_run\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Explore the Results"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Widget for Monitoring Runs\n",
+    "\n",
+    "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
+    "\n",
+    "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.widgets import RunDetails\n",
+    "RunDetails(local_run).show() "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "#### Retrieve All Child Runs\n",
+    "You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "children = list(local_run.get_children())\n",
+    "metricslist = {}\n",
+    "for run in children:\n",
+    "    properties = run.get_properties()\n",
+    "    metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
+    "    metricslist[int(properties['iteration'])] = metrics\n",
+    "\n",
+    "rundata = pd.DataFrame(metricslist).sort_index(axis = 1)\n",
+    "rundata"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Retrieve the Best Model\n",
+    "\n",
+    "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "best_run, fitted_model = local_run.get_output()\n",
+    "print(best_run)\n",
+    "print(fitted_model)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Best Model Based on Any Other Metric\n",
+    "Show the run and the model that has the smallest `log_loss` value:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "lookup_metric = \"log_loss\"\n",
+    "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
+    "print(best_run)\n",
+    "print(fitted_model)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Model from a Specific Iteration\n",
+    "Show the run and the model from the third iteration:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "iteration = 3\n",
+    "third_run, third_model = local_run.get_output(iteration = iteration)\n",
+    "print(third_run)\n",
+    "print(third_model)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Test the Best Fitted Model\n",
+    "\n",
+    "#### Load Test Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "digits = datasets.load_digits()\n",
+    "X_test = digits.data[:10, :]\n",
+    "y_test = digits.target[:10]\n",
+    "images = digits.images[:10]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Testing Our Best Fitted Model\n",
+    "We will predict two randomly selected digits and compare each prediction with its label."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Randomly select digits and test.\n",
+    "for index in np.random.choice(len(y_test), 2, replace = False):\n",
+    "    print(index)\n",
+    "    predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
+    "    label = y_test[index]\n",
+    "    title = \"Label value = %d  Predicted value = %d \" % (label, predicted)\n",
+    "    fig = plt.figure(1, figsize = (3,3))\n",
+    "    ax1 = fig.add_axes((0,0,.8,.8))\n",
+    "    ax1.set_title(title)\n",
+    "    plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
+    "    plt.show()"
+   ]
+  }
+ ],
+ "metadata": {
+  "authors": [
+   {
+    "name": "savitam"
+   }
+  ],
+  "kernelspec": {
+   "display_name": "Python 3.6",
+   "language": "python",
+   "name": "python36"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}