MachineLearningNotebooks/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Train and explain models locally and deploy model and scoring explainer\n",
        "\n",
        "\n",
        "_**This notebook illustrates how to use the Azure Machine Learning Interpretability SDK to deploy a locally-trained model and its corresponding scoring explainer to Azure Container Instances (ACI) as a web service.**_\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "Problem: IBM employee attrition classification with scikit-learn (train and explain a model locally and use Azure Container Instances (ACI) for deploying your model and its corresponding scoring explainer as a web service.)\n",
        "\n",
        "---\n",
        "\n",
        "## Table of Contents\n",
        "\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Run model explainer locally at training time](#Explain)\n",
        "    1. Apply feature transformations\n",
        "    1. Train a binary classification model\n",
        "    1. Explain the model on raw features\n",
        "        1. Generate global explanations\n",
        "        1. Generate local explanations\n",
        "1. [Visualize explanations](#Visualize)\n",
        "1. [Deploy model and scoring explainer](#Deploy)\n",
        "1. [Next steps](#Next)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "\n",
        "\n",
        "This notebook showcases how to train and explain a classification model locally, and deploy the trained model and its corresponding explainer to Azure Container Instances (ACI).\n",
        "It demonstrates the API calls that you need to make to submit a run for training and explaining a model to AMLCompute, download the compute explanations remotely, and visualizing the global and local explanations via a visualization dashboard that provides an interactive way of discovering patterns in model predictions and downloaded explanations. It also demonstrates how to use Azure Machine Learning MLOps capabilities to deploy your model and its corresponding explainer.\n",
        "\n",
        "We will showcase one of the tabular data explainers: TabularExplainer (SHAP) and follow these steps:\n",
        "1.\tDevelop a machine learning script in Python which involves the training script and the explanation script.\n",
        "2.\tRun the script locally.\n",
        "3.\tUse the interpretability toolkit\u00e2\u20ac\u2122s visualization dashboard to visualize predictions and their explanation. If the metrics and explanations don't indicate a desired outcome, loop back to step 1 and iterate on your scripts.\n",
        "5.\tAfter a satisfactory run is found, create a scoring explainer and register the persisted model and its corresponding explainer in the model registry.\n",
        "6.\tDevelop a scoring script.\n",
        "7.\tCreate an image and register it in the image registry.\n",
        "8.\tDeploy the image as a web service in Azure.\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n",
        "Make sure you go through the [configuration notebook](../../../../configuration.ipynb) first if you haven't."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Check core SDK version number\n",
        "import azureml.core\n",
        "\n",
        "print(\"SDK version:\", azureml.core.VERSION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Initialize a Workspace\n",
        "\n",
        "Initialize a workspace object from persisted configuration"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "create workspace"
        ]
      },
      "outputs": [],
      "source": [
        "from azureml.core import Workspace\n",
        "\n",
        "ws = Workspace.from_config()\n",
        "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Explain\n",
        "Create An Experiment: **Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Experiment\n",
        "experiment_name = 'explain_model_at_scoring_time'\n",
        "experiment = Experiment(workspace=ws, name=experiment_name)\n",
        "run = experiment.start_logging()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# get IBM attrition data\n",
        "import os\n",
        "import pandas as pd\n",
        "\n",
        "outdirname = 'dataset.6.21.19'\n",
        "try:\n",
        "    from urllib import urlretrieve\n",
        "except ImportError:\n",
        "    from urllib.request import urlretrieve\n",
        "import zipfile\n",
        "zipfilename = outdirname + '.zip'\n",
        "urlretrieve('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)\n",
        "with zipfile.ZipFile(zipfilename, 'r') as unzip:\n",
        "    unzip.extractall('.')\n",
        "attritionData = pd.read_csv('./WA_Fn-UseC_-HR-Employee-Attrition.csv')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.externals import joblib\n",
        "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
        "from sklearn.impute import SimpleImputer\n",
        "from sklearn.pipeline import Pipeline\n",
        "from sklearn.linear_model import LogisticRegression\n",
        "from sklearn.ensemble import RandomForestClassifier\n",
        "from sklearn_pandas import DataFrameMapper\n",
        "\n",
        "from interpret.ext.blackbox import TabularExplainer\n",
        "\n",
        "os.makedirs('./outputs', exist_ok=True)\n",
        "\n",
        "# Dropping Employee count as all values are 1 and hence attrition is independent of this feature\n",
        "attritionData = attritionData.drop(['EmployeeCount'], axis=1)\n",
        "# Dropping Employee Number since it is merely an identifier\n",
        "attritionData = attritionData.drop(['EmployeeNumber'], axis=1)\n",
        "attritionData = attritionData.drop(['Over18'], axis=1)\n",
        "# Since all values are 80\n",
        "attritionData = attritionData.drop(['StandardHours'], axis=1)\n",
        "\n",
        "# Converting target variables from string to numerical values\n",
        "target_map = {'Yes': 1, 'No': 0}\n",
        "attritionData[\"Attrition_numerical\"] = attritionData[\"Attrition\"].apply(lambda x: target_map[x])\n",
        "target = attritionData[\"Attrition_numerical\"]\n",
        "\n",
        "attritionXData = attritionData.drop(['Attrition_numerical', 'Attrition'], axis=1)\n",
        "\n",
        "# Creating dummy columns for each categorical feature\n",
        "categorical = []\n",
        "for col, value in attritionXData.iteritems():\n",
        "    if value.dtype == 'object':\n",
        "        categorical.append(col)\n",
        "\n",
        "# Store the numerical columns in a list numerical\n",
        "numerical = attritionXData.columns.difference(categorical)\n",
        "\n",
        "numeric_transformations = [([f], Pipeline(steps=[\n",
        "    ('imputer', SimpleImputer(strategy='median')),\n",
        "    ('scaler', StandardScaler())])) for f in numerical]\n",
        "\n",
        "categorical_transformations = [([f], OneHotEncoder(handle_unknown='ignore', sparse=False)) for f in categorical]\n",
        "\n",
        "transformations = numeric_transformations + categorical_transformations\n",
        "\n",
        "# Append classifier to preprocessing pipeline.\n",
        "# Now we have a full prediction pipeline.\n",
        "clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
        "                      ('classifier', RandomForestClassifier())])\n",
        "\n",
        "# Split data into train and test\n",
        "from sklearn.model_selection import train_test_split\n",
        "x_train, x_test, y_train, y_test = train_test_split(attritionXData,\n",
        "                                                    target,\n",
        "                                                    test_size = 0.2,\n",
        "                                                    random_state=0,\n",
        "                                                    stratify=target)\n",
        "\n",
        "# preprocess the data and fit the classification model\n",
        "clf.fit(x_train, y_train)\n",
        "model = clf.steps[-1][1]\n",
        "\n",
        "model_file_name = 'log_reg.pkl'\n",
        "\n",
        "# save model in the outputs folder so it automatically get uploaded\n",
        "with open(model_file_name, 'wb') as file:\n",
        "    joblib.dump(value=clf, filename=os.path.join('./outputs/',\n",
        "                                                 model_file_name))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Explain predictions on your local machine\n",
        "tabular_explainer = TabularExplainer(model, \n",
        "                                     initialization_examples=x_train, \n",
        "                                     features=attritionXData.columns, \n",
        "                                     classes=[\"Not leaving\", \"leaving\"], \n",
        "                                     transformations=transformations)\n",
        "\n",
        "# Explain overall model predictions (global explanation)\n",
        "# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
        "# x_train can be passed as well, but with more examples explanations it will\n",
        "# take longer although they may be more accurate\n",
        "global_explanation = tabular_explainer.explain_global(x_test)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.interpret.scoring.scoring_explainer import TreeScoringExplainer, save\n",
        "# ScoringExplainer\n",
        "scoring_explainer = TreeScoringExplainer(tabular_explainer)\n",
        "# Pickle scoring explainer locally\n",
        "save(scoring_explainer, exist_ok=True)\n",
        "\n",
        "# Register original model\n",
        "run.upload_file('original_model.pkl', os.path.join('./outputs/', model_file_name))\n",
        "original_model = run.register_model(model_name='local_deploy_model', \n",
        "                                    model_path='original_model.pkl')\n",
        "\n",
        "# Register scoring explainer\n",
        "run.upload_file('IBM_attrition_explainer.pkl', 'scoring_explainer.pkl')\n",
        "scoring_explainer_model = run.register_model(model_name='IBM_attrition_explainer', model_path='IBM_attrition_explainer.pkl')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualize\n",
        "Visualize the explanations"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from interpret_community.widget import ExplanationDashboard"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ExplanationDashboard(global_explanation, clf, datasetX=x_test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Deploy \n",
        "\n",
        "Deploy Model and ScoringExplainer.\n",
        "\n",
        "Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.conda_dependencies import CondaDependencies \n",
        "\n",
        "# azureml-defaults is required to host the model as a web service.\n",
        "azureml_pip_packages = [\n",
        "    'azureml-defaults', 'azureml-contrib-interpret', 'azureml-core', 'azureml-telemetry',\n",
        "    'azureml-interpret'\n",
        "]\n",
        " \n",
        "\n",
        "# specify CondaDependencies obj\n",
        "myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas'],\n",
        "                                 pip_packages=['sklearn-pandas', 'pyyaml'] + azureml_pip_packages,\n",
        "                                 pin_sdk_version=False)\n",
        "\n",
        "with open(\"myenv.yml\",\"w\") as f:\n",
        "    f.write(myenv.serialize_to_string())\n",
        "\n",
        "with open(\"myenv.yml\",\"r\") as f:\n",
        "    print(f.read())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.model import Model\n",
        "# retrieve scoring explainer for deployment\n",
        "scoring_explainer_model = Model(ws, 'IBM_attrition_explainer')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.webservice import Webservice\n",
        "from azureml.core.model import InferenceConfig\n",
        "from azureml.core.webservice import AciWebservice\n",
        "from azureml.core.model import Model\n",
        "from azureml.core.environment import Environment\n",
        "\n",
        "\n",
        "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
        "                                               memory_gb=1, \n",
        "                                               tags={\"data\": \"IBM_Attrition\",  \n",
        "                                                     \"method\" : \"local_explanation\"}, \n",
        "                                               description='Get local explanations for IBM Employee Attrition data')\n",
        "\n",
        "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
        "inference_config = InferenceConfig(entry_script=\"score_local_explain.py\", environment=myenv)\n",
        "\n",
        "# Use configs and models generated above\n",
        "service = Model.deploy(ws, 'model-scoring-deploy-local', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
        "service.wait_for_deployment(show_output=True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import requests\n",
        "import json\n",
        "\n",
        "\n",
        "# Create data to test service with\n",
        "sample_data = '{\"Age\":{\"899\":49},\"BusinessTravel\":{\"899\":\"Travel_Rarely\"},\"DailyRate\":{\"899\":1098},\"Department\":{\"899\":\"Research & Development\"},\"DistanceFromHome\":{\"899\":4},\"Education\":{\"899\":2},\"EducationField\":{\"899\":\"Medical\"},\"EnvironmentSatisfaction\":{\"899\":1},\"Gender\":{\"899\":\"Male\"},\"HourlyRate\":{\"899\":85},\"JobInvolvement\":{\"899\":2},\"JobLevel\":{\"899\":5},\"JobRole\":{\"899\":\"Manager\"},\"JobSatisfaction\":{\"899\":3},\"MaritalStatus\":{\"899\":\"Married\"},\"MonthlyIncome\":{\"899\":18711},\"MonthlyRate\":{\"899\":12124},\"NumCompaniesWorked\":{\"899\":2},\"OverTime\":{\"899\":\"No\"},\"PercentSalaryHike\":{\"899\":13},\"PerformanceRating\":{\"899\":3},\"RelationshipSatisfaction\":{\"899\":3},\"StockOptionLevel\":{\"899\":1},\"TotalWorkingYears\":{\"899\":23},\"TrainingTimesLastYear\":{\"899\":2},\"WorkLifeBalance\":{\"899\":4},\"YearsAtCompany\":{\"899\":1},\"YearsInCurrentRole\":{\"899\":0},\"YearsSinceLastPromotion\":{\"899\":0},\"YearsWithCurrManager\":{\"899\":0}}'\n",
        "\n",
        "\n",
        "\n",
        "headers = {'Content-Type':'application/json'}\n",
        "\n",
        "# send request to service\n",
        "resp = requests.post(service.scoring_uri, sample_data, headers=headers)\n",
        "\n",
        "print(\"POST to url\", service.scoring_uri)\n",
        "# can covert back to Python objects from json string if desired\n",
        "print(\"prediction:\", resp.text)\n",
        "result = json.loads(resp.text)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#plot the feature importance for the prediction\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt; plt.rcdefaults()\n",
        "\n",
        "labels = json.loads(sample_data)\n",
        "labels = labels.keys()\n",
        "objects = labels\n",
        "y_pos = np.arange(len(objects))\n",
        "performance = result[\"local_importance_values\"][0][0]\n",
        "\n",
        "plt.bar(y_pos, performance, align='center', alpha=0.5)\n",
        "plt.xticks(y_pos, objects)\n",
        "locs, labels = plt.xticks()\n",
        "plt.setp(labels, rotation=90)\n",
        "plt.ylabel('Feature impact - leaving vs not leaving')\n",
        "plt.title('Local feature importance for prediction')\n",
        "\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "service.delete()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Next\n",
        "Learn about other use cases of the explain package on a:\n",
        "1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb)       \n",
        "1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n",
        "1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n",
        "1. Explain models with engineered features:\n",
        "    1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n",
        "    1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n",
        "1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
        "1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n",
        "1. [Inferencing time: deploy a remotely-trained model and explainer](./train-explain-model-on-amlcompute-and-deploy.ipynb)\n",
        "1. [Inferencing time: deploy a locally-trained keras model and explainer](./train-explain-model-keras-locally-and-deploy.ipynb)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "mesameki"
      }
    ],
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",
      "name": "python36"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.8"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}