Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.


# Upload a Fairness Dashboard to Azure Machine Learning Studio
**This notebook shows how to generate and upload a fairness assessment dashboard from Fairlearn to AzureML Studio**

## Table of Contents

1. [Introduction](#Introduction)
1. [Loading the Data](#LoadingData)
1. [Processing the Data](#ProcessingData)
1. [Training Models](#TrainingModels)
1. [Logging in to AzureML](#LoginAzureML)
1. [Registering the Models](#RegisterModels)
1. [Using the Fairlearn Dashboard](#LocalDashboard)
1. [Uploading a Fairness Dashboard to Azure](#AzureUpload)
    1. Computing Fairness Metrics
    1. Uploading to Azure
1. [Conclusion](#Conclusion)
    

<a id="Introduction"></a>
## Introduction

In this notebook, we walk through a simple example of using the `azureml-contrib-fairness` package to upload a collection of fairness statistics for a fairness dashboard. It is an example of integrating the [open source Fairlearn package](https://www.github.com/fairlearn/fairlearn) with Azure Machine Learning. This is not an example of fairness analysis or mitigation; this notebook simply shows how to get a fairness dashboard into the Azure Machine Learning portal. We will load the data and train a couple of simple models. We will then use Fairlearn to generate data for a fairness dashboard, which we can upload to the Azure Machine Learning portal and view there.

### Setup

To use this notebook, an Azure Machine Learning workspace is required.
Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.
This notebook also requires the following packages:
* `azureml-contrib-fairness`
* `fairlearn==0.4.6`
* `joblib`
* `shap`

Fairlearn relies on features introduced in v0.22.1 of `scikit-learn`. If you have an older version already installed, please uncomment and run the following cell:

In [None]:
# Quote the requirement: unquoted, the shell treats '>' as an output redirect
# !pip install --upgrade "scikit-learn>=0.22.1"
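To check whether the installed `scikit-learn` is recent enough before deciding to upgrade, something like the following sketch could be used. It relies only on the standard library (`importlib.metadata`, Python 3.8+). The helper names `version_tuple`, `meets_minimum` and `installed_sklearn_version` are hypothetical, and the comparison is deliberately simple (it ignores pre-release tags), so it is not a substitute for a full version parser such as `packaging.version`.

```python
from importlib import metadata


def version_tuple(version):
    """Turn a version string such as '0.22.1' into a tuple of ints.

    Only the leading digits of each dot-separated part are kept, so a
    pre-release suffix like '1.0rc1' is treated as '1.0'.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, required="0.22.1"):
    """Return True if `installed` is at least `required`."""
    return version_tuple(installed) >= version_tuple(required)


def installed_sklearn_version():
    """Return the installed scikit-learn version, or None if it is missing."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sklearn_version()
    if found is None or not meets_minimum(found):
        print("Please install or upgrade scikit-learn (found: {0})".format(found))
```

Tuple comparison does the ordering work here: `(0, 22)` sorts before `(0, 22, 1)`, so `0.22` is correctly reported as too old.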

<a id="LoadingData"></a>
## Loading the Data
We use the well-known `adult` census dataset, which we load using `shap` (for convenience). We start with a fairly unremarkable set of imports:

In [None]:
from sklearn import svm
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
import pandas as pd
import shap

Now we can load the data:

In [None]:
X_raw, Y = shap.datasets.adult()

We can take a look at some of the data. For example, the next cell shows the counts of the different races identified in the dataset (stored as integer codes):

In [None]:
print(X_raw["Race"].value_counts().to_dict())

<a id="ProcessingData"></a>
## Processing the Data

With the data loaded, we process it for our needs. First, we extract the sensitive features of interest into `A` (the name conventionally used for sensitive features in the literature) and put the rest of the feature data into `X`:

In [None]:
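# Sensitive features: kept out of the model inputs, used later for evaluation
A = X_raw[['Sex', 'Race']]
X = X_raw.drop(labels=['Sex', 'Race'], axis=1)
# One-hot encode any object/category columns; numeric columns are left unchanged
X = pd.get_dummies(X)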
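The split above can be pictured on plain records. The sketch below uses hypothetical dictionaries in place of DataFrame rows, and the helper name `split_sensitive` is made up for illustration. The point is that the models are trained only on `X`, while `A` is kept aside so that predictions can later be broken down by group.

```python
def split_sensitive(rows, sensitive=("Sex", "Race")):
    """Split each record into (sensitive part, remaining features).

    `rows` is a list of dicts mapping column name to value. This mirrors
    selecting A = X_raw[['Sex', 'Race']] and dropping those columns from X.
    """
    a_rows, x_rows = [], []
    for row in rows:
        # Sensitive columns go to A ...
        a_rows.append({k: row[k] for k in sensitive})
        # ... and everything else becomes model input
        x_rows.append({k: v for k, v in row.items() if k not in sensitive})
    return a_rows, x_rows


a_rows, x_rows = split_sensitive([{"Age": 39, "Sex": 1, "Race": 4}])
print(a_rows, x_rows)
```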

Next, we standardize the features (zero mean and unit variance for each column) and encode the labels as integers:

In [None]:
sc = StandardScaler()
X_scaled = sc.fit_transform(X)
X_scaled = pd.DataFrame(X_scaled, columns=X.columns)

le = LabelEncoder()
Y = le.fit_transform(Y)
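For readers who want to see what these two transformations compute, here is a dependency-free sketch of the same arithmetic. `standardize` computes `(x - mean) / std` using the population standard deviation, which matches `StandardScaler`'s default behaviour, and `label_encode` maps each label to its index among the sorted distinct labels, as `LabelEncoder` does. Both function names are hypothetical; the notebook itself should keep using the scikit-learn classes.

```python
import math


def standardize(values):
    """Scale a column to zero mean and unit variance.

    Uses the population standard deviation (divide by n). A constant
    column has zero spread, so it is simply centred at 0.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def label_encode(labels):
    """Map each label to the index of its class in sorted order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes


print(standardize([1.0, 2.0, 3.0]))
print(label_encode([True, False, True]))
```

In this dataset the labels are booleans, so `False` becomes 0 and `True` (income above the threshold) becomes 1.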

Finally, we split our data into training and test sets, and replace the integer codes in the test portion of `A` with human-readable labels:

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(
    X_scaled, Y, A, test_size=0.2, random_state=0, stratify=Y)

# train_test_split keeps the original row labels; reset them so that
# X, A and Y can be matched up by position
X_train = X_train.reset_index(drop=True)
A_train = A_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
A_test = A_test.reset_index(drop=True)

# Replace the integer codes with human-readable labels.
# Mapping whole columns avoids chained assignment (A_test.Sex.loc[...] = ...),
# which pandas may apply to a temporary copy instead of A_test itself
A_test['Sex'] = A_test['Sex'].map({0: 'female', 1: 'male'})
A_test['Race'] = A_test['Race'].map({0: 'Amer-Indian-Eskimo',
                                     1: 'Asian-Pac-Islander',
                                     2: 'Black',
                                     3: 'Other',
                                     4: 'White'})
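Passing `X_scaled`, `Y` and `A` to a single `train_test_split` call matters: all three are split with the same row indices, so each sensitive-feature row stays paired with its features and label. The sketch below illustrates stratified splitting on a list of labels using only the standard library. It is not scikit-learn's algorithm (which may allocate test rows per class slightly differently), and the name `stratified_split` is hypothetical; it only shows the idea that each class contributes about `test_size` of its rows to the test set.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved.

    Indices are grouped by label, each group is shuffled with a seeded RNG,
    and round(len(group) * test_size) indices per group go to the test set.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        test_idx.extend(group[:n_test])
        train_idx.extend(group[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Applying the same returned index lists to every array (features, labels and sensitive features alike) is what keeps them aligned.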

<a id="TrainingModels"></a>
## Training Models

We now train a couple of different models on our data. The `adult` census dataset poses a classification problem: the goal is to predict whether a particular individual's income exceeds a threshold. For the purpose of generating a dashboard to upload, it is sufficient to train two basic classifiers. First, a logistic regression classifier:

In [None]:
lr_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)

lr_predictor.fit(X_train, Y_train)
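At prediction time, a fitted binary logistic regression computes a weighted sum of the features plus an intercept and passes it through the logistic (sigmoid) function. The sketch below shows that computation with hypothetical weights; after fitting, scikit-learn stores the learned values in `lr_predictor.coef_` and `lr_predictor.intercept_`. The function names are made up for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_logistic(weights, bias, x, threshold=0.5):
    """Predict 1 if sigmoid(w . x + b) is strictly above the threshold."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) > threshold else 0


print(predict_logistic([1.0, -1.0], 0.0, [2.0, 1.0]))
```

With the default threshold of 0.5, predicting 1 is equivalent to the raw score being positive.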

And for comparison, a support vector classifier:

In [None]:
svm_predictor = svm.SVC()

svm_predictor.fit(X_train, Y_train)

<a id="LoginAzureML"></a>
## Logging in to AzureML

With our two classifiers trained, we can log into our AzureML workspace:

In [None]:
from azureml.core import Workspace, Experiment, Model

ws = Workspace.from_config()
ws.get_details()
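`Workspace.from_config()` reads the workspace details from a `config.json` file, which can be downloaded from the Azure portal or written with `ws.write_config()`. The sketch below is a hypothetical, simplified illustration of such a lookup, not the SDK's code. It assumes the file sits in the starting directory, one of its parents, or an `.azureml` subfolder of one of them, and that it must contain the `subscription_id`, `resource_group` and `workspace_name` keys. The name `find_config` is made up.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_config(start=".", file_name="config.json"):
    """Walk up from `start` looking for a workspace config file.

    At each level, checks <dir>/config.json and <dir>/.azureml/config.json.
    Returns the parsed dict, or raises FileNotFoundError.
    """
    directory = Path(start).resolve()
    for candidate_dir in [directory, *directory.parents]:
        for candidate in (candidate_dir / file_name,
                          candidate_dir / ".azureml" / file_name):
            if candidate.is_file():
                config = json.loads(candidate.read_text())
                missing = [k for k in REQUIRED_KEYS if k not in config]
                if missing:
                    raise KeyError("config is missing: {0}".format(missing))
                return config
    raise FileNotFoundError("No {0} found above {1}".format(file_name, directory))
```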

<a id="RegisterModels"></a>
## Registering the Models

Next, we register our models. By default, the routine which uploads the dashboard checks that the model ids it refers to correspond to models registered in the workspace. We define a utility routine to do the registering:

In [None]:
import joblib
import os

os.makedirs('models', exist_ok=True)
def register_model(name, model):
    print("Registering ", name)
    model_path = "models/{0}.pkl".format(name)
    joblib.dump(value=model, filename=model_path)
    registered_model = Model.register(model_path=model_path,
                                      model_name=name,
                                      workspace=ws)
    print("Registered ", registered_model.id)
    return registered_model.id
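`register_model` first serializes each model to a local `.pkl` file with `joblib.dump`, and `Model.register` then uploads that file. `joblib` builds on Python's `pickle`, so the save-and-load round trip can be sketched with the standard library alone; the helper `save_and_reload` is hypothetical. Because unpickling rebuilds the original classes, whoever later downloads a registered model needs a compatible version of `scikit-learn` installed.

```python
import os
import pickle


def save_and_reload(obj, name, directory):
    """Pickle `obj` to <directory>/<name>.pkl and load it back.

    Plain pickle stands in for joblib.dump/joblib.load here only to keep
    the sketch dependency-free.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "{0}.pkl".format(name))
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    with open(path, "rb") as f:
        return pickle.load(f), path
```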

Now, we register the models. For convenience in subsequent method calls, we store the results in a dictionary, which maps the `id` of the registered model (a string in `name:version` format) to the predictor itself:

In [None]:
model_dict = {}

lr_reg_id = register_model("fairness_logistic_regression", lr_predictor)
model_dict[lr_reg_id] = lr_predictor
svm_reg_id = register_model("fairness_svm", svm_predictor)
model_dict[svm_reg_id] = svm_predictor
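The ids returned by `register_model` have the form `name:version`. Registering the same name again creates a new version rather than overwriting the old one, so repeated runs of this notebook produce ids such as `fairness_svm:1`, `fairness_svm:2`, and so on. A small hypothetical helper for splitting such an id could look like this:

```python
def parse_model_id(model_id):
    """Split a registered-model id of the form 'name:version'.

    Splits on the last colon and requires a numeric version.
    """
    name, sep, version = model_id.rpartition(":")
    if not sep or not version.isdigit():
        raise ValueError("expected 'name:version', got {0!r}".format(model_id))
    return name, int(version)


print(parse_model_id("fairness_svm:3"))
```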

<a id="LocalDashboard"></a>
## Using the Fairlearn Dashboard

We can now examine the fairness of the two models we have trained, as a function of both race and (binary) sex. Before uploading the dashboard to the AzureML portal, we will first instantiate a local instance of the Fairlearn dashboard.

Regardless of where it is viewed, the dashboard is based on three things: the true values, the model predictions and the sensitive feature values. The dashboard can use predictions from multiple models and multiple sensitive features if desired (as we are doing here).

Our first step is to generate a dictionary mapping the `id` of the registered model to the corresponding array of predictions:

In [None]:
ys_pred = {}
for n, p in model_dict.items():
    ys_pred[n] = p.predict(X_test)
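The `ys_pred` dictionary, together with `Y_test` and `A_test`, is all the dashboard needs. As a rough, hypothetical illustration of the kind of per-group numbers that can be derived from these inputs, the sketch below computes the selection rate (the fraction of rows predicted as 1) within each group for every model in a `{model_id: predictions}` dictionary. The function names are made up, and plain lists stand in for the NumPy arrays and pandas Series used in the notebook.

```python
from collections import Counter


def selection_rate_by_group(y_pred, groups):
    """Fraction of rows predicted as 1 within each group."""
    totals = Counter(groups)
    positives = Counter(g for p, g in zip(y_pred, groups) if p == 1)
    return {g: positives[g] / n for g, n in totals.items()}


def selection_rates_for_models(ys_pred, groups):
    """Apply selection_rate_by_group to every model in a {model_id: predictions} dict."""
    return {model_id: selection_rate_by_group(preds, groups)
            for model_id, preds in ys_pred.items()}


print(selection_rates_for_models({"lr:1": [1, 0, 1, 0], "svm:1": [0, 0, 1, 1]},
                                 ["female", "female", "male", "male"]))
```

Under these assumptions, `selection_rates_for_models(ys_pred, A_test['Sex'].tolist())` would give one `{group: rate}` dictionary per registered model id.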

We can examine these predictions in a locally invoked Fairlearn dashboard. This can be compared to the dashboard uploaded to the portal (in the next section):

In [None]:
from fairlearn.widget import FairlearnDashboard

FairlearnDashboard(sensitive_features=A_test, 
                   sensitive_feature_names=['Sex', 'Race'],
                   y_true=Y_test.tolist(),
                   y_pred=ys_pred)

<a id="AzureUpload"></a>
## Uploading a Fairness Dashboard to Azure

Uploading a fairness dashboard to Azure is a two-stage process. The `FairlearnDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. No Python kernel is available when the fairness dashboard is rendered in AzureML Studio, so the required stages are:
1. Precompute all the required metrics
1. Upload to Azure
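Precomputing means evaluating every combination of model, sensitive feature and metric ahead of time and storing the results as plain data that can be rendered without Python. The sketch below does this for a single metric (accuracy). Its nested layout and the names `group_accuracy` and `precompute` are hypothetical and much simpler than the dictionary Fairlearn actually produces; the point is only that the result contains nothing but strings, numbers and dictionaries, so it serializes cleanly to JSON.

```python
import json


def group_accuracy(y_true, y_pred, groups):
    """Accuracy within each distinct value of `groups`."""
    hits, counts = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        counts[g] = counts.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + int(t == p)
    return {g: hits[g] / counts[g] for g in counts}


def precompute(y_true, predictions, sensitive_features):
    """Evaluate every (model, sensitive feature) pair up front.

    `predictions` maps model id -> list of predictions, and
    `sensitive_features` maps feature name -> list of group labels.
    """
    return {
        model_id: {
            feature: group_accuracy(y_true, y_pred, groups)
            for feature, groups in sensitive_features.items()
        }
        for model_id, y_pred in predictions.items()
    }


result = precompute([1, 0], {"m:1": [1, 1]}, {"Sex": ["female", "male"]})
print(json.dumps(result))
```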


### Computing Fairness Metrics
We use Fairlearn to create a dictionary which contains all the data required to display a dashboard. This includes both the raw data (true values, predicted values and sensitive features) and the fairness metrics. The API is similar to that used to invoke the dashboard locally. However, there are a few minor differences, and the type of problem being examined (binary classification, regression, etc.) must be specified explicitly:

In [None]:
from fairlearn.metrics._group_metric_set import _create_group_metric_set

sf = {'Race': A_test.Race, 'Sex': A_test.Sex}

dash_dict = _create_group_metric_set(y_true=Y_test,
                                     predictions=ys_pred,
                                     sensitive_features=sf,
                                     prediction_type='binary_classification')

The `_create_group_metric_set()` method is currently prefixed with an underscore because its design is not yet final in Fairlearn.
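Per-group metrics are often summarized by how far apart the groups are. A common summary is the difference between the largest and smallest group values, or the ratio of the smallest to the largest. The helper below (a hypothetical name, not part of Fairlearn's API as used here) computes both from a `{group: value}` dictionary, assuming non-negative metrics such as accuracy or selection rate.

```python
def disparity(group_values):
    """Return (difference, ratio) for a {group: metric value} dict.

    difference = max - min and ratio = min / max, so (0.0, 1.0) means
    perfect parity. Assumes non-negative metric values.
    """
    values = list(group_values.values())
    high, low = max(values), min(values)
    # If the largest value is 0, every value is 0: treat that as parity
    ratio = low / high if high else 1.0
    return high - low, ratio


print(disparity({"female": 0.25, "male": 0.5}))
```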

### Uploading to Azure

We can now import the `azureml.contrib.fairness` package itself. We will round-trip the data (upload it and then download it again), so we need two functions:

In [None]:
from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id

Next, we upload the generated dictionary to AzureML. The upload method requires a run, so we first create an experiment and a run. The uploaded dashboard can be seen on the corresponding Run Details page in AzureML Studio. For completeness, we also download the dashboard dictionary which we uploaded.

In [None]:
exp = Experiment(ws, "notebook-01")
print(exp)

run = exp.start_logging()
try:
    dashboard_title = "Sample notebook upload"
    upload_id = upload_dashboard_dictionary(run,
                                            dash_dict,
                                            dashboard_name=dashboard_title)
    print("\nUploaded to id: {0}\n".format(upload_id))

    downloaded_dict = download_dashboard_by_upload_id(run, upload_id)
finally:
    run.complete()
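The `try`/`finally` above guarantees that `run.complete()` is called even if the upload or download raises an exception, so the run is not left open. The same pattern can be packaged as a reusable context manager, sketched below with the standard library's `contextlib.contextmanager`. The name `completed_on_exit` is hypothetical, and it works with any object that has a `complete()` method.

```python
from contextlib import contextmanager


@contextmanager
def completed_on_exit(run):
    """Yield `run` and always call run.complete() afterwards,
    whether the body finishes normally or raises."""
    try:
        yield run
    finally:
        run.complete()
```

It would be used as `with completed_on_exit(exp.start_logging()) as run:`, followed by the upload and download calls in the indented body.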

Finally, we can verify that the dashboard dictionary which we downloaded matches our upload:

In [None]:
print(dash_dict == downloaded_dict)
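Whether `dash_dict == downloaded_dict` holds depends on how faithfully the stored form preserves Python types. The notebook does not say how the dictionary is stored; if it were stored as JSON (an assumption), tuples would come back as lists and non-string dictionary keys as strings, so such a comparison would fail even though the values themselves are intact. The hypothetical helper below checks whether an object survives a JSON round trip unchanged.

```python
import json


def survives_json_round_trip(obj):
    """True if serializing `obj` to JSON and parsing it back gives an equal object."""
    return json.loads(json.dumps(obj)) == obj


print(survives_json_round_trip({"a": [1, 2]}))
print(survives_json_round_trip({"a": (1, 2)}))
print(survives_json_round_trip({1: "x"}))
```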

<a id="Conclusion"></a>
## Conclusion

In this notebook we have demonstrated how to generate and upload a fairness dashboard to AzureML Studio. We have not discussed how to analyse the results and apply mitigations. Those topics will be covered elsewhere.