Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Automated Machine Learning
_**Sample Weight**_

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Train](#Train)
1. [Test](#Test)


## Introduction
In this example we use scikit-learn's [digits dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to show how to use sample weights with AutoML. Sample weights are useful when some training samples are more important than others: each sample is assigned a weight, and samples with larger weights have more influence on training.

Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.

In this notebook you will learn how to configure AutoML to use `sample_weight` and you will see the difference sample weight makes to the test results.
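
To make the idea of "more important samples" concrete, the sketch below computes a weighted error rate by hand with NumPy. It is only an illustration of how weights can change the cost of a mistake; it is not how AutoML applies `sample_weight` internally. The function name `weighted_error_rate` and the toy labels are assumptions made for this example.

```python
import numpy as np

def weighted_error_rate(y_true, y_pred, sample_weight):
    """Hypothetical helper: share of the total weight that sits on misclassified samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    w = np.asarray(sample_weight, dtype=float)
    return float(np.sum(w * (y_true != y_pred)) / np.sum(w))

# Toy labels (not taken from the digits dataset): two 4s and four 7s.
y_true = np.array([4, 4, 7, 7, 7, 7])

# Two candidate predictions, each with exactly one mistake.
pred_misses_a_4 = np.array([4, 7, 7, 7, 7, 7])   # a true 4 is predicted as 7
pred_extra_4 = np.array([4, 4, 4, 7, 7, 7])      # a true 7 is predicted as 4

uniform = np.ones(len(y_true))
favor_4 = np.array([0.9, 0.9, 0.01, 0.01, 0.01, 0.01])

# With uniform weights both mistakes cost the same.
err_uniform_a = weighted_error_rate(y_true, pred_misses_a_4, uniform)
err_uniform_b = weighted_error_rate(y_true, pred_extra_4, uniform)

# With a large weight on the 4s, missing a 4 is far more costly
# than wrongly predicting a 4.
err_weighted_a = weighted_error_rate(y_true, pred_misses_a_4, favor_4)
err_weighted_b = weighted_error_rate(y_true, pred_extra_4, favor_4)

print(err_uniform_a, err_uniform_b)
print(err_weighted_a, err_weighted_b)
```

A model trained to reduce a weighted error of this kind would rather predict 4 too often than miss a real 4, which is the trade-off explored in the rest of this notebook.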

## Setup

As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments.

In [None]:
import logging

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig

In [None]:
ws = Workspace.from_config()

# Choose names for the regular and the sample weight experiments.
experiment_name = 'non_sample_weight_experiment'
sample_weight_experiment_name = 'sample_weight_experiment'

project_folder = './sample_projects/sample_weight'

experiment = Experiment(ws, experiment_name)
sample_weight_experiment = Experiment(ws, sample_weight_experiment_name)

output = {}
output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace Name'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Project Directory'] = project_folder
output['Experiment Name'] = experiment.name
# None disables column-width truncation (pandas >= 1.0; older versions used -1).
pd.set_option('display.max_colwidth', None)
outputDf = pd.DataFrame(data = output, index = [''])
outputDf.T

## Train

Instantiate two `AutoMLConfig` objects. One will be used with `sample_weight` and one without.

In [None]:
digits = datasets.load_digits()
X_train = digits.data[100:,:]
y_train = digits.target[100:]

# This example sets the sample weight to 0.9 for the digit 4 and 0.01 for all other digits.
# This makes the model more likely to classify an image as 4 if the image is not clear.
sample_weight = np.array([(0.9 if x == 4 else 0.01) for x in y_train])

automl_classifier = AutoMLConfig(task = 'classification',
                                 debug_log = 'automl_errors.log',
                                 primary_metric = 'AUC_weighted',
                                 iteration_timeout_minutes = 60,
                                 iterations = 10,
                                 n_cross_validations = 2,
                                 verbosity = logging.INFO,
                                 X = X_train, 
                                 y = y_train,
                                 path = project_folder)

automl_sample_weight = AutoMLConfig(task = 'classification',
                                    debug_log = 'automl_errors.log',
                                    primary_metric = 'AUC_weighted',
                                    iteration_timeout_minutes = 60,
                                    iterations = 10,
                                    n_cross_validations = 2,
                                    verbosity = logging.INFO,
                                    X = X_train, 
                                    y = y_train,
                                    sample_weight = sample_weight,
                                    path = project_folder)
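
The weight array in the cell above can also be built without a Python loop. The sketch below uses `np.where` and checks that it matches the list comprehension. The helper `make_class_weights` and the stand-in labels are hypothetical and exist only for this illustration; in the notebook the labels would be `y_train`.

```python
import numpy as np

def make_class_weights(labels, favored_class, favored_weight=0.9, other_weight=0.01):
    """Hypothetical helper: one weight per label, larger for the favored class."""
    labels = np.asarray(labels)
    return np.where(labels == favored_class, favored_weight, other_weight)

# Stand-in labels; in the notebook this would be y_train.
example_labels = np.array([0, 4, 7, 4, 1, 9])
example_weights = make_class_weights(example_labels, favored_class=4)

# Same values as the list comprehension used in the cell above.
comprehension = np.array([(0.9 if x == 4 else 0.01) for x in example_labels])
assert np.array_equal(example_weights, comprehension)

# There must be exactly one weight per training sample.
assert example_weights.shape == example_labels.shape

# Share of the total weight carried by the favored class.
favored_share = example_weights[example_labels == 4].sum() / example_weights.sum()
print(example_weights)
print(round(float(favored_share), 3))
```

Checking the share of total weight held by the favored class is a quick way to see how strongly the weighting tilts training toward that class before submitting a run.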

Call the `submit` method on each experiment object and pass the corresponding `AutoMLConfig` object. Local runs execute synchronously, so depending on the data and the number of iterations this can take a while.
In this example, we specify `show_output = True` to print the currently running iterations to the console.

In [None]:
local_run = experiment.submit(automl_classifier, show_output = True)
sample_weight_run = sample_weight_experiment.submit(automl_sample_weight, show_output = True)

best_run, fitted_model = local_run.get_output()
best_run_sample_weight, fitted_model_sample_weight = sample_weight_run.get_output()

## Test

#### Load Test Data

In [None]:
digits = datasets.load_digits()
X_test = digits.data[:100, :]
y_test = digits.target[:100]
images = digits.images[:100]

#### Compare the Models
The sample weight model is more likely to predict 4's correctly. However, it is also more likely to predict 4 for some images that are not labelled as 4.

In [None]:
# Go through all test digits and plot those where the label or either prediction is 4.
for index in range(len(y_test)):
    predicted = fitted_model.predict(X_test[index:index + 1])[0]
    predicted_sample_weight = fitted_model_sample_weight.predict(X_test[index:index + 1])[0]
    label = y_test[index]
    if predicted == 4 or predicted_sample_weight == 4 or label == 4:
        title = "Label value = %d  Predicted value = %d Predicted with sample weight = %d" % (label, predicted, predicted_sample_weight)
        fig = plt.figure(1, figsize=(3,3))
        ax1 = fig.add_axes((0,0,.8,.8))
        ax1.set_title(title)
        plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')
        plt.show()
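
The plots above compare the two models one image at a time. A numeric summary can complement them. The sketch below computes precision and recall for the digit 4 with plain NumPy. The helper `digit_precision_recall` is hypothetical, and the label and prediction arrays are made-up placeholders, not results from the runs above.

```python
import numpy as np

def digit_precision_recall(y_true, y_pred, digit=4):
    """Hypothetical helper: precision and recall for a single digit class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_pos = int(np.sum((y_true == digit) & (y_pred == digit)))
    predicted = int(np.sum(y_pred == digit))
    actual = int(np.sum(y_true == digit))
    # Return NaN when a ratio is undefined (no predictions or no true samples).
    precision = true_pos / predicted if predicted else float('nan')
    recall = true_pos / actual if actual else float('nan')
    return precision, recall

# Made-up placeholder arrays, for illustration only.
labels = np.array([4, 4, 4, 1, 9, 7, 4, 0])
pred_plain = np.array([4, 9, 4, 1, 9, 7, 1, 0])      # misses two of the 4s
pred_weighted = np.array([4, 4, 4, 1, 4, 7, 4, 0])   # finds every 4, plus one false 4

plain_precision, plain_recall = digit_precision_recall(labels, pred_plain)
weighted_precision, weighted_recall = digit_precision_recall(labels, pred_weighted)

print("plain:    precision=%.2f recall=%.2f" % (plain_precision, plain_recall))
print("weighted: precision=%.2f recall=%.2f" % (weighted_precision, weighted_recall))
```

With the notebook's own objects, the placeholders would be replaced by `y_test`, `fitted_model.predict(X_test)` and `fitted_model_sample_weight.predict(X_test)`. Higher recall combined with lower precision for the sample weight model would match the behavior described in this section.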