Files
MachineLearningNotebooks/automl/11.auto-ml-sample-weight.ipynb
2018-09-23 21:56:44 -04:00

252 lines
8.3 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoML 11: Sample weight\n",
"\n",
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use sample weight with the AutoML Classifier.\n",
"Sample weight is used where some sample values are more important than others.\n",
"\n",
"Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. How to specifying sample_weight\n",
"2. The difference that it makes to test results\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import random\n",
"\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib.pyplot import imshow\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import datasets\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.train.automl.run import AutoMLRun"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'non_sample_weight_experiment'\n",
"sample_weight_experiment_name = 'sample_weight_experiment'\n",
"\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-classification'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"sample_weight_experiment=Experiment(ws, sample_weight_experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"pd.DataFrame(data = output, index = ['']).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"\n",
"Opt-in diagnostics for better experience, quality, and security of future releases"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiate Auto ML Config\n",
"\n",
"Instantiate two AutoMLConfig Objects. One will be used with sample_weight and one without."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[100:,:]\n",
"y_digits = digits.target[100:]\n",
"\n",
"# The example makes the sample weight 0.9 for the digit 4 and 0.1 for all other digits.\n",
"# This makes the model more likely to classify as 4 if the image it not clear.\n",
"sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_digits])\n",
"\n",
"automl_classifier = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" max_time_sec = 3600,\n",
" iterations = 10,\n",
" n_cross_validations = 2,\n",
" verbosity = logging.INFO,\n",
" X = X_digits, \n",
" y = y_digits,\n",
" path=project_folder)\n",
"\n",
"automl_sample_weight = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" max_time_sec = 3600,\n",
" iterations = 10,\n",
" n_cross_validations = 2,\n",
" verbosity = logging.INFO,\n",
" X = X_digits, \n",
" y = y_digits,\n",
" sample_weight = sample_weight,\n",
" path=project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Models\n",
"\n",
"Call the submit method on the experiment and pass the configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
"You will see the currently running iterations printing to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_classifier, show_output=True)\n",
"sample_weight_run = sample_weight_experiment.submit(automl_sample_weight, show_output=True)\n",
"\n",
"best_run, fitted_model = local_run.get_output()\n",
"best_run_sample_weight, fitted_model_sample_weight = sample_weight_run.get_output()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Models\n",
"\n",
"#### Load Test Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n",
"X_digits = digits.data[:100, :]\n",
"y_digits = digits.target[:100]\n",
"images = digits.images[:100]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Compare the pipelines\n",
"The prediction from the sample weight model is more likely to correctly predict 4's. However, it is also more likely to predict 4 for some images that are not labelled as 4."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Randomly select digits and test\n",
"for index in range(0,len(y_digits)):\n",
" predicted = fitted_model.predict(X_digits[index:index + 1])[0]\n",
" predicted_sample_weight = fitted_model_sample_weight.predict(X_digits[index:index + 1])[0]\n",
" label = y_digits[index]\n",
" if predicted == 4 or predicted_sample_weight == 4 or label == 4:\n",
" title = \"Label value = %d Predicted value = %d Prediced with sample weight = %d\" % ( label,predicted,predicted_sample_weight)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap=plt.cm.gray_r, interpolation='nearest')\n",
" plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}