MachineLearningNotebooks/contrib/fairness/upload-fairness-dashboard.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.  \n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/fairness/upload-fairness-dashboard.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Upload a Fairness Dashboard to Azure Machine Learning Studio\n",
        "**This notebook shows how to generate and upload a fairness assessment dashboard from Fairlearn to AzureML Studio**\n",
        "\n",
        "## Table of Contents\n",
        "\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Loading the Data](#LoadingData)\n",
        "1. [Processing the Data](#ProcessingData)\n",
        "1. [Training Models](#TrainingModels)\n",
        "1. [Logging in to AzureML](#LoginAzureML)\n",
        "1. [Registering the Models](#RegisterModels)\n",
        "1. [Using the Fairness Dashboard](#LocalDashboard)\n",
        "1. [Uploading a Fairness Dashboard to Azure](#AzureUpload)\n",
        "    1. Computing Fairness Metrics\n",
        "    1. Uploading to Azure\n",
        "1. [Conclusion](#Conclusion)\n",
        "    \n",
        "\n",
        "<a id=\"Introduction\"></a>\n",
        "## Introduction\n",
        "\n",
        "In this notebook, we walk through a simple example of using the `azureml-contrib-fairness` package to upload a collection of fairness statistics for a fairness dashboard. It is an example of integrating the [open source Fairlearn package](https://www.github.com/fairlearn/fairlearn) with Azure Machine Learning. This is not an example of fairness analysis or mitigation - this notebook simply shows how to get a fairness dashboard into the Azure Machine Learning portal. We will load the data and train a couple of simple models. We will then use Fairlearn to generate data for a Fairness dashboard, which we can upload to Azure Machine Learning portal and view there.\n",
        "\n",
        "### Setup\n",
        "\n",
        "To use this notebook, an Azure Machine Learning workspace is required.\n",
        "Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
        "This notebook also requires the following packages:\n",
        "* `azureml-contrib-fairness`\n",
        "* `fairlearn>=0.6.2` (also works for pre-v0.5.0 with slight modifications)\n",
        "* `joblib`\n",
        "* `liac-arff`\n",
        "* `raiwidgets`\n",
        "\n",
        "Fairlearn relies on features introduced in v0.22.1 of `scikit-learn`. If you have an older version already installed, please uncomment and run the following cell:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# !pip install --upgrade scikit-learn>=0.22.1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, please ensure that when you downloaded this notebook, you also downloaded the `fairness_nb_utils.py` file from the same location, and placed it in the same directory as this notebook."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id=\"LoadingData\"></a>\n",
        "## Loading the Data\n",
        "We use the well-known `adult` census dataset, which we fetch from the OpenML website. We start with a fairly unremarkable set of imports:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from sklearn import svm\n",
        "from sklearn.compose import ColumnTransformer\n",
        "from sklearn.impute import SimpleImputer\n",
        "from sklearn.linear_model import LogisticRegression\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
        "from sklearn.compose import make_column_selector as selector\n",
        "from sklearn.pipeline import Pipeline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we can load the data:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from fairness_nb_utils import fetch_census_dataset\n",
        "\n",
        "data = fetch_census_dataset()\n",
        "    \n",
        "# Extract the items we want\n",
        "X_raw = data.data\n",
        "y = (data.target == '>50K') * 1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can take a look at some of the data. For example, the next cells shows the counts of the different races identified in the dataset:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(X_raw[\"race\"].value_counts().to_dict())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id=\"ProcessingData\"></a>\n",
        "## Processing the Data\n",
        "\n",
        "With the data loaded, we process it for our needs. First, we extract the sensitive features of interest into `A` (conventionally used in the literature) and leave the rest of the feature data in `X_raw`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "A = X_raw[['sex','race']]\n",
        "X_raw = X_raw.drop(labels=['sex', 'race'],axis = 1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now preprocess our data. To avoid the problem of data leakage, we split our data into training and test sets before performing any other transformations. Subsequent transformations (such as scalings) will be fit to the training data set, and then applied to the test dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "(X_train, X_test, y_train, y_test, A_train, A_test) = train_test_split(\n",
        "    X_raw, y, A, test_size=0.3, random_state=12345, stratify=y\n",
        ")\n",
        "\n",
        "# Ensure indices are aligned between X, y and A,\n",
        "# after all the slicing and splitting of DataFrames\n",
        "# and Series\n",
        "\n",
        "X_train = X_train.reset_index(drop=True)\n",
        "X_test = X_test.reset_index(drop=True)\n",
        "y_train = y_train.reset_index(drop=True)\n",
        "y_test = y_test.reset_index(drop=True)\n",
        "A_train = A_train.reset_index(drop=True)\n",
        "A_test = A_test.reset_index(drop=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We have two types of column in the dataset - categorical columns which will need to be one-hot encoded, and numeric ones which will need to be rescaled. We also need to take care of missing values. We use a simple approach here, but please bear in mind that this is another way that bias could be introduced (especially if one subgroup tends to have more missing values).\n",
        "\n",
        "For this preprocessing, we make use of `Pipeline` objects from `sklearn`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "numeric_transformer = Pipeline(\n",
        "    steps=[\n",
        "        (\"impute\", SimpleImputer()),\n",
        "        (\"scaler\", StandardScaler()),\n",
        "    ]\n",
        ")\n",
        "\n",
        "categorical_transformer = Pipeline(\n",
        "    [\n",
        "        (\"impute\", SimpleImputer(strategy=\"most_frequent\")),\n",
        "        (\"ohe\", OneHotEncoder(handle_unknown=\"ignore\", sparse_output=False)),\n",
        "    ]\n",
        ")\n",
        "\n",
        "preprocessor = ColumnTransformer(\n",
        "    transformers=[\n",
        "        (\"num\", numeric_transformer, selector(dtype_exclude=\"category\")),\n",
        "        (\"cat\", categorical_transformer, selector(dtype_include=\"category\")),\n",
        "    ]\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, the preprocessing pipeline is defined, we can run it on our training data, and apply the generated transform to our test data:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "X_train = preprocessor.fit_transform(X_train)\n",
        "X_test = preprocessor.transform(X_test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id=\"TrainingModels\"></a>\n",
        "## Training Models\n",
        "\n",
        "We now train a couple of different models on our data. The `adult` census dataset is a classification problem - the goal is to predict whether a particular individual exceeds an income threshold. For the purpose of generating a dashboard to upload, it is sufficient to train two basic classifiers. First, a logistic regression classifier:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "lr_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)\n",
        "\n",
        "lr_predictor.fit(X_train, y_train)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "And for comparison, a support vector classifier:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "svm_predictor = svm.SVC()\n",
        "\n",
        "svm_predictor.fit(X_train, y_train)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id=\"LoginAzureML\"></a>\n",
        "## Logging in to AzureML\n",
        "\n",
        "With our two classifiers trained, we can log into our AzureML workspace:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Workspace, Experiment, Model\n",
        "\n",
        "ws = Workspace.from_config()\n",
        "ws.get_details()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id=\"RegisterModels\"></a>\n",
        "## Registering the Models\n",
        "\n",
        "Next, we register our models. By default, the subroutine which uploads the models checks that the names provided correspond to registered models in the workspace. We define a utility routine to do the registering:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import joblib\n",
        "import os\n",
        "\n",
        "os.makedirs('models', exist_ok=True)\n",
        "def register_model(name, model):\n",
        "    print(\"Registering \", name)\n",
        "    model_path = \"models/{0}.pkl\".format(name)\n",
        "    joblib.dump(value=model, filename=model_path)\n",
        "    registered_model = Model.register(model_path=model_path,\n",
        "                                      model_name=name,\n",
        "                                      workspace=ws)\n",
        "    print(\"Registered \", registered_model.id)\n",
        "    return registered_model.id"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, we register the models. For convenience in subsequent method calls, we store the results in a dictionary, which maps the `id` of the registered model (a string in `name:version` format) to the predictor itself:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "model_dict = {}\n",
        "\n",
        "lr_reg_id = register_model(\"fairness_linear_regression\", lr_predictor)\n",
        "model_dict[lr_reg_id] = lr_predictor\n",
        "svm_reg_id = register_model(\"fairness_svm\", svm_predictor)\n",
        "model_dict[svm_reg_id] = svm_predictor"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id=\"LocalDashboard\"></a>\n",
        "## Using the Fairlearn Dashboard\n",
        "\n",
        "We can now examine the fairness of the two models we have training, both as a function of race and (binary) sex. Before uploading the dashboard to the AzureML portal, we will first instantiate a local instance of the Fairlearn dashboard.\n",
        "\n",
        "Regardless of the viewing location, the dashboard is based on three things - the true values, the model predictions and the sensitive feature values. The dashboard can use predictions from multiple models and multiple sensitive features if desired (as we are doing here).\n",
        "\n",
        "Our first step is to generate a dictionary mapping the `id` of the registered model to the corresponding array of predictions:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ys_pred = {}\n",
        "for n, p in model_dict.items():\n",
        "    ys_pred[n] = p.predict(X_test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can examine these predictions in a locally invoked Fairlearn dashboard. This can be compared to the dashboard uploaded to the portal (in the next section):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from raiwidgets import FairnessDashboard\n",
        "\n",
        "FairnessDashboard(sensitive_features=A_test, \n",
        "                  y_true=y_test.tolist(),\n",
        "                  y_pred=ys_pred)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id=\"AzureUpload\"></a>\n",
        "## Uploading a Fairness Dashboard to Azure\n",
        "\n",
        "Uploading a fairness dashboard to Azure is a two stage process. The `FairnessDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. The required stages are therefore:\n",
        "1. Precompute all the required metrics\n",
        "1. Upload to Azure\n",
        "\n",
        "\n",
        "### Computing Fairness Metrics\n",
        "We use Fairlearn to create a dictionary which contains all the data required to display a dashboard. This includes both the raw data (true values, predicted values and sensitive features), and also the fairness metrics. The API is similar to that used to invoke the Dashboard locally. However, there are a few minor changes to the API, and the type of problem being examined (binary classification, regression etc.) needs to be specified explicitly:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "sf = { 'Race': A_test.race, 'Sex': A_test.sex }\n",
        "\n",
        "from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
        "\n",
        "dash_dict = _create_group_metric_set(y_true=y_test,\n",
        "                                     predictions=ys_pred,\n",
        "                                     sensitive_features=sf,\n",
        "                                     prediction_type='binary_classification')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `_create_group_metric_set()` method is currently underscored since its exact design is not yet final in Fairlearn."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Uploading to Azure\n",
        "\n",
        "We can now import the `azureml.contrib.fairness` package itself. We will round-trip the data, so there are two required subroutines:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we can upload the generated dictionary to AzureML. The upload method requires a run, so we first create an experiment and a run. The uploaded dashboard can be seen on the corresponding Run Details page in AzureML Studio. For completeness, we also download the dashboard dictionary which we uploaded."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "exp = Experiment(ws, \"notebook-01\")\n",
        "print(exp)\n",
        "\n",
        "run = exp.start_logging()\n",
        "try:\n",
        "    dashboard_title = \"Sample notebook upload\"\n",
        "    upload_id = upload_dashboard_dictionary(run,\n",
        "                                            dash_dict,\n",
        "                                            dashboard_name=dashboard_title)\n",
        "    print(\"\\nUploaded to id: {0}\\n\".format(upload_id))\n",
        "\n",
        "    downloaded_dict = download_dashboard_by_upload_id(run, upload_id)\n",
        "finally:\n",
        "    run.complete()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we can verify that the dashboard dictionary which we downloaded matches our upload:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(dash_dict == downloaded_dict)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id=\"Conclusion\"></a>\n",
        "## Conclusion\n",
        "\n",
        "In this notebook we have demonstrated how to generate and upload a fairness dashboard to AzureML Studio. We have not discussed how to analyse the results and apply mitigations. Those topics will be covered elsewhere."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "riedgar"
      }
    ],
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.10"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}