Files
MachineLearningNotebooks/how-to-use-azureml/responsible-ai/visualize-upload-loan-decision/rai-loan-decision.ipynb

718 lines
27 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/responsible-ai/visualize-upload-loan-decision/rai-loan-decision.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Assess Fairness, Explore Interpretability, and Mitigate Fairness Issues \n",
"\n",
"This notebook demonstrates how to use [InterpretML](interpret.ml), [Fairlearn](fairlearn.org), and the [Responsible AI Widget's](https://github.com/microsoft/responsible-ai-widgets/) Fairness and Interpretability dashboards to understand a model trained on the Census dataset. This dataset is a classification problem - given a range of data about 32,000 individuals, predict whether their annual income is above or below fifty thousand dollars per year.\n",
"\n",
"For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan.\n",
"\n",
"We will first train a fairness-unaware predictor, load its global and local explanations, and use the interpretability and fairness dashboards to demonstrate how this model leads to unfair decisions (under a specific notion of fairness called *demographic parity*). We then mitigate unfairness by applying the `GridSearch` algorithm from `Fairlearn` package.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install required packages\n",
"\n",
"This notebook works with Fairlearn v0.7.0, but not with versions pre-v0.5.0. If needed, please uncomment and run the following cell:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %pip install --upgrade fairlearn>=0.6.2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After installing packages, you must close and reopen the notebook as well as restarting the kernel."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load and preprocess the dataset\n",
"\n",
"For simplicity, we import the dataset from the `shap` package, which contains the data in a cleaned format. We start by importing the various modules we're going to use:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairlearn.reductions import GridSearch\n",
"from fairlearn.reductions import DemographicParity\n",
"\n",
"from sklearn.compose import ColumnTransformer, make_column_selector\n",
"from sklearn.preprocessing import LabelEncoder, StandardScaler, OneHotEncoder\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"import pandas as pd\n",
"\n",
"# SHAP Tabular Explainer\n",
"from interpret.ext.blackbox import MimicExplainer\n",
"from interpret.ext.glassbox import LGBMExplainableModel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now load and inspect the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from utilities import fetch_census_dataset\n",
"\n",
"dataset = fetch_census_dataset()\n",
"X_raw, y = dataset['data'], dataset['target']\n",
"X_raw[\"race\"].value_counts().to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to treat the sex of each individual as a protected attribute (where 0 indicates female and 1 indicates male), and in this particular case we are going separate this attribute out and drop it from the main data. We then perform some standard data preprocessing steps to convert the data into a format suitable for the ML algorithms"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sensitive_features = X_raw[['sex','race']]\n",
"\n",
"le = LabelEncoder()\n",
"y = le.fit_transform(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we split the data into training and test sets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test, sensitive_features_train, sensitive_features_test = \\\n",
" train_test_split(X_raw, y, sensitive_features,\n",
" test_size = 0.2, random_state=0, stratify=y)\n",
"\n",
"# Work around indexing bug\n",
"X_train = X_train.reset_index(drop=True)\n",
"sensitive_features_train = sensitive_features_train.reset_index(drop=True)\n",
"X_test = X_test.reset_index(drop=True)\n",
"sensitive_features_test = sensitive_features_test.reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training a fairness-unaware predictor\n",
"\n",
"To show the effect of `Fairlearn` we will first train a standard ML predictor that does not incorporate fairness. For speed of demonstration, we use a simple logistic regression estimator from `sklearn`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numeric_transformer = Pipeline(\n",
" steps=[\n",
" (\"impute\", SimpleImputer()),\n",
" (\"scaler\", StandardScaler()),\n",
" ]\n",
")\n",
"categorical_transformer = Pipeline(\n",
" [\n",
" (\"impute\", SimpleImputer(strategy=\"most_frequent\")),\n",
" (\"ohe\", OneHotEncoder(handle_unknown=\"ignore\")),\n",
" ]\n",
")\n",
"preprocessor = ColumnTransformer(\n",
" transformers=[\n",
" (\"num\", numeric_transformer, make_column_selector(dtype_exclude=\"category\")),\n",
" (\"cat\", categorical_transformer, make_column_selector(dtype_include=\"category\")),\n",
" ]\n",
")\n",
"\n",
"model = Pipeline(\n",
" steps=[\n",
" (\"preprocessor\", preprocessor),\n",
" (\n",
" \"classifier\",\n",
" LogisticRegression(solver=\"liblinear\", fit_intercept=True),\n",
" ),\n",
" ]\n",
")\n",
"\n",
"model.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate model explanations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using SHAP KernelExplainer\n",
"# clf.steps[-1][1] returns the trained classification model\n",
"explainer = MimicExplainer(model.steps[-1][1], \n",
" X_train,\n",
" LGBMExplainableModel,\n",
" features=X_raw.columns, \n",
" classes=['Rejected', 'Approved'],\n",
" transformations=preprocessor)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate global explanations\n",
"Explain overall model predictions (global explanation)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Explain the model based on a subset of 1000 rows\n",
"global_explanation = explainer.explain_global(X_test[:1000])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"global_explanation.get_feature_importance_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate local explanations\n",
"Explain local data points (individual instances)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You can pass a specific data point or a group of data points to the explain_local function\n",
"# E.g., Explain the first data point in the test set\n",
"instance_num = 1\n",
"local_explanation = explainer.explain_local(X_test[:instance_num])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
"prediction_value = model.predict(X_test)[instance_num]\n",
"\n",
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
"print('local importance names: {}'.format(sorted_local_importance_names))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize model explanations\n",
"Load the interpretability visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, model, dataset=X_test[:1000], true_y=y_test[:1000])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can load this predictor into the Fairness dashboard, and examine how it is unfair:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assess model fairness \n",
"Load the fairness visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import FairnessDashboard\n",
"\n",
"y_pred = model.predict(X_test)\n",
"\n",
"FairnessDashboard(sensitive_features=sensitive_features_test,\n",
" y_true=y_test,\n",
" y_pred=y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the disparity in accuracy, we see that males have an error rate about three times greater than the females. More interesting is the disparity in opportunitiy - males are offered loans at three times the rate of females.\n",
"\n",
"Despite the fact that we removed the feature from the training data, our predictor still discriminates based on sex. This demonstrates that simply ignoring a protected attribute when fitting a predictor rarely eliminates unfairness. There will generally be enough other features correlated with the removed attribute to lead to disparate impact."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mitigation with Fairlearn (GridSearch)\n",
"\n",
"The `GridSearch` class in `Fairlearn` implements a simplified version of the exponentiated gradient reduction of [Agarwal et al. 2018](https://arxiv.org/abs/1803.02453). The user supplies a standard ML estimator, which is treated as a blackbox. `GridSearch` works by generating a sequence of relabellings and reweightings, and trains a predictor for each.\n",
"\n",
"For this example, we specify demographic parity (on the protected attribute of sex) as the fairness metric. Demographic parity requires that individuals are offered the opportunity (are approved for a loan in this example) independent of membership in the protected class (i.e., females and males should be offered loans at the same rate). We are using this metric for the sake of simplicity; in general, the appropriate fairness metric will not be obvious."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fairlearn is not yet fully compatible with Pipelines, so we have to pass the estimator only\n",
"X_train_prep = preprocessor.transform(X_train).toarray()\n",
"X_test_prep = preprocessor.transform(X_test).toarray()\n",
"\n",
"sweep = GridSearch(LogisticRegression(solver=\"liblinear\", fit_intercept=True),\n",
" constraints=DemographicParity(),\n",
" grid_size=70)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our algorithms provide `fit()` and `predict()` methods, so they behave in a similar manner to other ML packages in Python. We do however have to specify two extra arguments to `fit()` - the column of protected attribute labels, and also the number of predictors to generate in our sweep.\n",
"\n",
"After `fit()` completes, we extract the full set of predictors from the `GridSearch` object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sweep.fit(X_train_prep, y_train,\n",
" sensitive_features=sensitive_features_train.sex)\n",
"\n",
"predictors = sweep.predictors_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could load these predictors into the Fairness dashboard now. However, the plot would be somewhat confusing due to their number. In this case, we are going to remove the predictors which are dominated in the error-disparity space by others from the sweep (note that the disparity will only be calculated for the sensitive feature). In general, one might not want to do this, since there may be other considerations beyond the strict optimization of error and disparity (of the given protected attribute)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairlearn.metrics import demographic_parity_difference\n",
"\n",
"accuracies, disparities = [], []\n",
"\n",
"for predictor in predictors:\n",
" y_pred = predictor.predict(X_train_prep)\n",
" # accuracy_metric_frame = MetricFrame(accuracy_score, y_train, predictor.predict(X_train_prep), sensitive_features=sensitive_features_train.sex)\n",
" # selection_rate_metric_frame = MetricFrame(selection_rate, y_train, predictor.predict(X_train_prep), sensitive_features=sensitive_features_train.sex)\n",
" accuracies.append(accuracy_score(y_train, y_pred))\n",
" disparities.append(demographic_parity_difference(y_train,\n",
" y_pred,\n",
" sensitive_features=sensitive_features_train.sex))\n",
" \n",
"all_results = pd.DataFrame({\"predictor\": predictors, \"accuracy\": accuracies, \"disparity\": disparities})\n",
"\n",
"all_models_dict = {\"unmitigated\": model.steps[-1][1]}\n",
"dominant_models_dict = {\"unmitigated\": model.steps[-1][1]}\n",
"base_name_format = \"grid_{0}\"\n",
"row_id = 0\n",
"for row in all_results.itertuples():\n",
" model_name = base_name_format.format(row_id)\n",
" all_models_dict[model_name] = row.predictor\n",
" accuracy_for_lower_or_eq_disparity = all_results[\"accuracy\"][all_results[\"disparity\"] <= row.disparity]\n",
" if row.accuracy >= accuracy_for_lower_or_eq_disparity.max():\n",
" dominant_models_dict[model_name] = row.predictor\n",
" row_id = row_id + 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can construct predictions for all the models, and also for the dominant models:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dashboard_all = {}\n",
"for name, predictor in all_models_dict.items():\n",
" value = predictor.predict(X_test_prep)\n",
" dashboard_all[name] = value\n",
" \n",
"dominant_all = {}\n",
"for name, predictor in dominant_models_dict.items():\n",
" dominant_all[name] = predictor.predict(X_test_prep)\n",
"\n",
"FairnessDashboard(sensitive_features=sensitive_features_test, \n",
" y_true=y_test,\n",
" y_pred=dominant_all)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can look at just the dominant models in the dashboard:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see a Pareto front forming - the set of predictors which represent optimal tradeoffs between accuracy and disparity in predictions. In the ideal case, we would have a predictor at (1,0) - perfectly accurate and without any unfairness under demographic parity (with respect to the protected attribute \"sex\"). The Pareto front represents the closest we can come to this ideal based on our data and choice of estimator. Note the range of the axes - the disparity axis covers more values than the accuracy, so we can reduce disparity substantially for a small loss in accuracy.\n",
"\n",
"By clicking on individual models on the plot, we can inspect their metrics for disparity and accuracy in greater detail. In a real example, we would then pick the model which represented the best trade-off between accuracy and disparity given the relevant business constraints."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AzureML integration\n",
"\n",
"We will now go through a brief example of the AzureML integration.\n",
"\n",
"The required package can be installed via:\n",
"\n",
"```\n",
"pip install azureml-contrib-fairness\n",
"pip install azureml-interpret\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect to workspace\n",
"\n",
"Just like in the previous tutorials, we will need to connect to a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py).\n",
"\n",
"The following code will allow you to create a workspace if you don't already have one created. You must have an Azure subscription to create a workspace:\n",
"\n",
"```python\n",
"from azureml.core import Workspace\n",
"ws = Workspace.create(name='myworkspace',\n",
" subscription_id='<azure-subscription-id>',\n",
" resource_group='myresourcegroup',\n",
" create_resource_group=True,\n",
" location='eastus2')\n",
"```\n",
"\n",
"**If you are running this on a Notebook VM, you can import the existing workspace.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Registering models\n",
"\n",
"The fairness dashboard is designed to integrate with registered models, so we need to do this for the models we want in the Studio portal. The assumption is that the names of the models specified in the dashboard dictionary correspond to the `id`s (i.e. `<name>:<version>` pairs) of registered models in the workspace."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we register each of the models in the `dominant_all` dictionary into the workspace. For this, we have to save each model to a file, and then register that file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
"import os\n",
"from azureml.core import Model, Experiment\n",
"\n",
"os.makedirs('models', exist_ok=True)\n",
"def register_model(name, model):\n",
" print(\"Registering \", name)\n",
" model_path = \"models/{0}.pkl\".format(name)\n",
" joblib.dump(value=model, filename=model_path)\n",
" registered_model = Model.register(model_path=model_path,\n",
" model_name=name,\n",
" workspace=ws)\n",
" print(\"Registered \", registered_model.id)\n",
" return registered_model.id\n",
"\n",
"model_name_id_mapping = dict()\n",
"for name, model in dominant_all.items():\n",
" m_id = register_model(name, model)\n",
" model_name_id_mapping[name] = m_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, produce new predictions dictionaries, with the updated names:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dominant_all_ids = dict()\n",
"for name, y_pred in dominant_all.items():\n",
" dominant_all_ids[model_name_id_mapping[name]] = y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uploading a dashboard\n",
"\n",
"We create a _dashboard dictionary_ using Fairlearn's `metrics` package. The `_create_group_metric_set` method has arguments similar to the Dashboard constructor, except that the sensitive features are passed as a dictionary (to ensure that names are available), and we must specify the type of prediction. Note that we use the `dashboard_registered` dictionary we just created:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf = { 'sex': sensitive_features_test.sex, 'race': sensitive_features_test.race }\n",
"\n",
"from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
"\n",
"dash_dict_all = _create_group_metric_set(y_true=y_test,\n",
" predictions=dominant_all_ids,\n",
" sensitive_features=sf,\n",
" prediction_type='binary_classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we import our `contrib` package which contains the routine to perform the upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can create an Experiment, then a Run, and upload our dashboard to it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, 'responsible-ai-loan-decision')\n",
"print(exp)\n",
"\n",
"run = exp.start_logging()\n",
"try:\n",
" dashboard_title = \"Upload MultiAsset from Grid Search with Census Data Notebook\"\n",
" upload_id = upload_dashboard_dictionary(run,\n",
" dash_dict_all,\n",
" dashboard_name=dashboard_title)\n",
" print(\"\\nUploaded to id: {0}\\n\".format(upload_id))\n",
"\n",
" downloaded_dict = download_dashboard_by_upload_id(run, upload_id)\n",
"finally:\n",
" run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uploading explanations\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.interpret import ExplanationClient\n",
"\n",
"client = ExplanationClient.from_run(run)\n",
"client.upload_model_explanation(global_explanation, comment = \"census data global explanation\")"
]
}
],
"metadata": {
"authors": [
{
"name": "chgrego"
}
],
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}