{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Upload a Fairness Dashboard to Azure Machine Learning Studio\n",
"**This notebook shows how to generate and upload a fairness assessment dashboard from Fairlearn to AzureML Studio**\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Loading the Data](#LoadingData)\n",
"1. [Processing the Data](#ProcessingData)\n",
"1. [Training Models](#TrainingModels)\n",
"1. [Logging in to AzureML](#LoginAzureML)\n",
"1. [Registering the Models](#RegisterModels)\n",
"1. [Using the Fairlearn Dashboard](#LocalDashboard)\n",
"1. [Uploading a Fairness Dashboard to Azure](#AzureUpload)\n",
" 1. Computing Fairness Metrics\n",
" 1. Uploading to Azure\n",
"1. [Conclusion](#Conclusion)\n",
" \n",
"\n",
"<a id=\"Introduction\"></a>\n",
"## Introduction\n",
"\n",
"In this notebook, we walk through a simple example of using the `azureml-contrib-fairness` package to upload a collection of fairness statistics for a fairness dashboard. It is an example of integrating the [open source Fairlearn package](https://www.github.com/fairlearn/fairlearn) with Azure Machine Learning. This is not an example of fairness analysis or mitigation - this notebook simply shows how to get a fairness dashboard into the Azure Machine Learning portal. We will load the data and train a couple of simple models. We will then use Fairlearn to generate data for a Fairness dashboard, which we can upload to Azure Machine Learning portal and view there.\n",
|
|
"\n",
|
|
"### Setup\n",
|
|
"\n",
|
|
"To use this notebook, an Azure Machine Learning workspace is required.\n",
|
|
"Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
|
|
"This notebook also requires the following packages:\n",
|
|
"* `azureml-contrib-fairness`\n",
|
|
"* `fairlearn==0.4.6`\n",
|
|
"* `joblib`\n",
|
|
"* `shap`\n",
|
|
"\n",
|
|
"Fairlearn relies on features introduced in v0.22.1 of `scikit-learn`. If you have an older version already installed, please uncomment and run the following cell:"
|
|
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# !pip install --upgrade scikit-learn>=0.22.1"
]
},
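{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, if any of the packages listed above are missing from your environment, you can uncomment and run the following cell to install them. This is a convenience sketch; install the packages in whichever way is appropriate for your environment:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# !pip install azureml-contrib-fairness fairlearn==0.4.6 joblib shap"
]
},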
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LoadingData\"></a>\n",
"## Loading the Data\n",
"We use the well-known `adult` census dataset, which we load using `shap` (for convenience). We start with a fairly unremarkable set of imports:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import svm\n",
"from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
"from sklearn.linear_model import LogisticRegression\n",
"import pandas as pd\n",
"import shap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can load the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_raw, Y = shap.datasets.adult()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can take a look at some of the data. For example, the next cells shows the counts of the different races identified in the dataset:"
|
|
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(X_raw[\"Race\"].value_counts().to_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ProcessingData\"></a>\n",
"## Processing the Data\n",
"\n",
"With the data loaded, we process it for our needs. First, we extract the sensitive features of interest into `A` (conventionally used in the literature) and put the rest of the feature data into `X`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A = X_raw[['Sex','Race']]\n",
"X = X_raw.drop(labels=['Sex', 'Race'],axis = 1)\n",
"X = pd.get_dummies(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we apply a standard set of scalings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sc = StandardScaler()\n",
"X_scaled = sc.fit_transform(X)\n",
"X_scaled = pd.DataFrame(X_scaled, columns=X.columns)\n",
"\n",
"le = LabelEncoder()\n",
"Y = le.fit_transform(Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can then split our data into training and test sets, and also make the labels on our test portion of `A` human-readable:"
|
|
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(\n",
"    X_scaled, Y, A,\n",
"    test_size=0.2, random_state=0, stratify=Y)\n",
"\n",
"# Work around indexing issue\n",
|
|
"X_train = X_train.reset_index(drop=True)\n",
|
|
"A_train = A_train.reset_index(drop=True)\n",
|
|
"X_test = X_test.reset_index(drop=True)\n",
|
|
"A_test = A_test.reset_index(drop=True)\n",
|
|
"\n",
|
|
"# Improve labels\n",
|
|
"A_test.Sex.loc[(A_test['Sex'] == 0)] = 'female'\n",
|
|
"A_test.Sex.loc[(A_test['Sex'] == 1)] = 'male'\n",
|
|
"\n",
|
|
"\n",
|
|
"A_test.Race.loc[(A_test['Race'] == 0)] = 'Amer-Indian-Eskimo'\n",
|
|
"A_test.Race.loc[(A_test['Race'] == 1)] = 'Asian-Pac-Islander'\n",
|
|
"A_test.Race.loc[(A_test['Race'] == 2)] = 'Black'\n",
|
|
"A_test.Race.loc[(A_test['Race'] == 3)] = 'Other'\n",
|
|
"A_test.Race.loc[(A_test['Race'] == 4)] = 'White'"
|
|
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"TrainingModels\"></a>\n",
"## Training Models\n",
"\n",
"We now train a couple of different models on our data. The `adult` census dataset is a classification problem - the goal is to predict whether a particular individual exceeds an income threshold. For the purpose of generating a dashboard to upload, it is sufficient to train two basic classifiers. First, a logistic regression classifier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lr_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)\n",
"\n",
"lr_predictor.fit(X_train, Y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And for comparison, a support vector classifier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"svm_predictor = svm.SVC()\n",
"\n",
"svm_predictor.fit(X_train, Y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LoginAzureML\"></a>\n",
"## Logging in to AzureML\n",
"\n",
"With our two classifiers trained, we can log into our AzureML workspace:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Experiment, Model\n",
"\n",
"ws = Workspace.from_config()\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"RegisterModels\"></a>\n",
"## Registering the Models\n",
"\n",
"Next, we register our models. By default, the subroutine which uploads the models checks that the names provided correspond to registered models in the workspace. We define a utility routine to do the registering:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
|
|
"import os\n",
|
|
"\n",
|
|
"os.makedirs('models', exist_ok=True)\n",
|
|
"def register_model(name, model):\n",
|
|
" print(\"Registering \", name)\n",
|
|
" model_path = \"models/{0}.pkl\".format(name)\n",
|
|
" joblib.dump(value=model, filename=model_path)\n",
|
|
" registered_model = Model.register(model_path=model_path,\n",
|
|
" model_name=name,\n",
|
|
" workspace=ws)\n",
|
|
" print(\"Registered \", registered_model.id)\n",
|
|
" return registered_model.id"
|
|
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we register the models. For convenience in subsequent method calls, we store the results in a dictionary, which maps the `id` of the registered model (a string in `name:version` format) to the predictor itself:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_dict = {}\n",
"\n",
"lr_reg_id = register_model(\"fairness_linear_regression\", lr_predictor)\n",
"model_dict[lr_reg_id] = lr_predictor\n",
"svm_reg_id = register_model(\"fairness_svm\", svm_predictor)\n",
"model_dict[svm_reg_id] = svm_predictor"
]
},
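{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check (not required for the upload), we can confirm that the models are now present in the workspace. This is a minimal sketch using `Model.list()` from `azureml.core`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List registered versions of the two models we just registered\n",
"for registered_name in [\"fairness_linear_regression\", \"fairness_svm\"]:\n",
"    for m in Model.list(ws, name=registered_name):\n",
"        print(m.id)"
]
},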
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LocalDashboard\"></a>\n",
"## Using the Fairlearn Dashboard\n",
"\n",
"We can now examine the fairness of the two models we have training, both as a function of race and (binary) sex. Before uploading the dashboard to the AzureML portal, we will first instantiate a local instance of the Fairlearn dashboard.\n",
|
|
"\n",
|
|
"Regardless of the viewing location, the dashboard is based on three things - the true values, the model predictions and the sensitive feature values. The dashboard can use predictions from multiple models and multiple sensitive features if desired (as we are doing here).\n",
|
|
"\n",
|
|
"Our first step is to generate a dictionary mapping the `id` of the registered model to the corresponding array of predictions:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"ys_pred = {}\n",
|
|
"for n, p in model_dict.items():\n",
|
|
" ys_pred[n] = p.predict(X_test)"
|
|
]
},
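{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before looking at fairness, a quick (and optional) check on overall accuracy can be helpful. This is a minimal sketch using `accuracy_score` from `scikit-learn`, which is already installed as a dependency:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import accuracy_score\n",
"\n",
"# Overall accuracy of each registered model on the test set\n",
"for model_id, preds in ys_pred.items():\n",
"    print(model_id, accuracy_score(Y_test, preds))"
]
},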
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can examine these predictions in a locally invoked Fairlearn dashboard. This can be compared to the dashboard uploaded to the portal (in the next section):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairlearn.widget import FairlearnDashboard\n",
"\n",
"FairlearnDashboard(sensitive_features=A_test, \n",
|
|
" sensitive_feature_names=['Sex', 'Race'],\n",
|
|
" y_true=Y_test.tolist(),\n",
|
|
" y_pred=ys_pred)"
|
|
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"AzureUpload\"></a>\n",
"## Uploading a Fairness Dashboard to Azure\n",
"\n",
"Uploading a fairness dashboard to Azure is a two stage process. The `FairlearnDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. The required stages are therefore:\n",
|
|
"1. Precompute all the required metrics\n",
|
|
"1. Upload to Azure\n",
|
|
"\n",
|
|
"\n",
|
|
"### Computing Fairness Metrics\n",
|
|
"We use Fairlearn to create a dictionary which contains all the data required to display a dashboard. This includes both the raw data (true values, predicted values and sensitive features), and also the fairness metrics. The API is similar to that used to invoke the Dashboard locally. However, there are a few minor changes to the API, and the type of problem being examined (binary classification, regression etc.) needs to be specified explicitly:"
|
|
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf = { 'Race': A_test.Race, 'Sex': A_test.Sex }\n",
"\n",
"from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
"\n",
"dash_dict = _create_group_metric_set(y_true=Y_test,\n",
|
|
" predictions=ys_pred,\n",
|
|
" sensitive_features=sf,\n",
|
|
" prediction_type='binary_classification')"
|
|
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `_create_group_metric_set()` method is currently underscored since its exact design is not yet final in Fairlearn."
]
},
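{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are curious about what was generated, the top-level keys of the dictionary can be inspected. This is purely informational; the exact schema is an internal detail of Fairlearn and may change between versions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Peek at the structure of the precomputed metric set (schema may vary by Fairlearn version)\n",
"print(sorted(dash_dict.keys()))"
]
},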
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Uploading to Azure\n",
"\n",
"We can now import the `azureml.contrib.fairness` package itself. We will round-trip the data, so there are two required subroutines:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can upload the generated dictionary to AzureML. The upload method requires a run, so we first create an experiment and a run. The uploaded dashboard can be seen on the corresponding Run Details page in AzureML Studio. For completeness, we also download the dashboard dictionary which we uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, \"notebook-01\")\n",
"print(exp)\n",
"\n",
"run = exp.start_logging()\n",
"try:\n",
" dashboard_title = \"Sample notebook upload\"\n",
|
|
" upload_id = upload_dashboard_dictionary(run,\n",
|
|
" dash_dict,\n",
|
|
" dashboard_name=dashboard_title)\n",
|
|
" print(\"\\nUploaded to id: {0}\\n\".format(upload_id))\n",
|
|
"\n",
|
|
" downloaded_dict = download_dashboard_by_upload_id(run, upload_id)\n",
|
|
"finally:\n",
|
|
" run.complete()"
|
|
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can verify that the dashboard dictionary which we downloaded matches our upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(dash_dict == downloaded_dict)"
]
},
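{
"cell_type": "markdown",
"metadata": {},
"source": [
"To view the uploaded dashboard, navigate to the Run Details page for this run in AzureML Studio. As a convenience, the following sketch prints a direct link to the run using the `get_portal_url()` method of the run object:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print a link to the run in AzureML Studio, where the uploaded dashboard is displayed\n",
"print(run.get_portal_url())"
]
},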
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"Conclusion\"></a>\n",
"## Conclusion\n",
"\n",
"In this notebook we have demonstrated how to generate and upload a fairness dashboard to AzureML Studio. We have not discussed how to analyse the results and apply mitigations. Those topics will be covered elsewhere."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "riedgar"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}