MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-regression.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Boston Housing Price Prediction with scikit-learn (run model explainer locally)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-local/explain-local-sklearn-regression.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "\n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explain a model with the AML explain-model package\n",
    "\n",
    "1. Train a GradientBoosting regression model using Scikit-learn\n",
    "2. Run 'explain_model' with full dataset in local mode, which doesn't contact any Azure services.\n",
    "3. Run 'explain_model' with summarized dataset in local mode, which doesn't contact any Azure services.\n",
    "4. Visualize the global and local explanations with the visualization dashboard."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import datasets\n",
    "from sklearn.ensemble import GradientBoostingRegressor\n",
    "from azureml.explain.model.tabular_explainer import TabularExplainer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Run model explainer locally with full data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the Boston house price data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "boston_data = datasets.load_boston()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split data into train and test\n",
    "from sklearn.model_selection import train_test_split\n",
    "x_train, x_test, y_train, y_test = train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train a GradientBoosting Regression model, which you want to explain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reg = GradientBoostingRegressor(n_estimators=100, max_depth=4,\n",
    "                                learning_rate=0.1, loss='huber',\n",
    "                                random_state=1)\n",
    "model = reg.fit(x_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explain predictions on your local machine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tabular_explainer = TabularExplainer(model, x_train, features = boston_data.feature_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explain overall model predictions (global explanation)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
    "# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
    "global_explanation = tabular_explainer.explain_global(x_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sorted SHAP values \n",
    "print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))\n",
    "# Corresponding feature names\n",
    "print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))\n",
    "# feature ranks (based on original order of features)\n",
    "print('global importance rank: {}'.format(global_explanation.global_importance_rank))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explain overall model predictions as a collection of local (instance-level) explanations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# feature shap values for all features and all data points in the training data\n",
    "print('local importance values: {}'.format(global_explanation.local_importance_values))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explain local data points (individual instances)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "local_explanation = tabular_explainer.explain_local(x_test[0,:])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sorted local feature importance information; reflects the original feature order\n",
    "sorted_local_importance_names = local_explanation.get_ranked_local_names()\n",
    "sorted_local_importance_values = local_explanation.get_ranked_local_values()\n",
    "\n",
    "print('sorted local importance names: {}'.format(sorted_local_importance_names))\n",
    "print('sorted local importance values: {}'.format(sorted_local_importance_values))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load visualization dashboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
    "!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
    "!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
    "# Or, in Jupyter Labs, uncomment below\n",
    "# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
    "# jupyter labextension install microsoft-mli-widget"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.contrib.explain.model.visualize import ExplanationDashboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ExplanationDashboard(global_explanation, model, x_test)"
   ]
  }
 ],
 "metadata": {
  "authors": [
   {
    "name": "mesameki"
   }
  ],
  "kernelspec": {
   "display_name": "Python 3.6",
   "language": "python",
   "name": "python36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}