MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/regression-car-price-model-explaination-and-featurization/auto-ml-regression.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Regression with Aml Compute**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Data](#Data)\n",
        "1. [Train](#Train)\n",
        "1. [Results](#Results)\n",
        "1. [Test](#Test)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The Regression goal is to predict the performance of certain combinations of hardware parts.\n",
        "After training AutoML models for this regression data set, we show how you can compute model explanations on your remote compute using a sample explainer script.\n",
        "\n",
        "If you are using an Azure Machine Learning Compute Instance, you are all set.  Otherwise, go through the [configuration](../../../configuration.ipynb)  notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. Create an `Experiment` in an existing `Workspace`.\n",
        "2. Instantiating AutoMLConfig with FeaturizationConfig for customization\n",
        "3. Train the model using remote compute.\n",
        "4. Explore the results and featurization transparency options\n",
        "5. Setup remote compute for computing the model explanations for a given AutoML model.\n",
        "6. Start an AzureML experiment on your remote compute to compute explanations for an AutoML model.\n",
        "7. Download the feature importance for engineered features and visualize the explanations for engineered features on azure portal. \n",
        "8. Download the feature importance for raw features and visualize the explanations for raw features on azure portal. \n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import logging\n",
        "\n",
        "from matplotlib import pyplot as plt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "import azureml.dataprep as dprep\n",
        "from azureml.automl.core.featurization import FeaturizationConfig\n",
        "from azureml.train.automl import AutoMLConfig\n",
        "from azureml.core.dataset import Dataset"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"This notebook was created using version 1.21.0 of the Azure ML SDK\")\n",
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# Choose a name for the experiment.\n",
        "experiment_name = 'automl-regression-hardware-explain'\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output['Subscription ID'] = ws.subscription_id\n",
        "output['Workspace Name'] = ws.name\n",
        "output['Resource Group'] = ws.resource_group\n",
        "output['Location'] = ws.location\n",
        "output['Experiment Name'] = experiment.name\n",
        "pd.set_option('display.max_colwidth', -1)\n",
        "outputDf = pd.DataFrame(data = output, index = [''])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create or Attach existing AmlCompute\n",
        "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
        "\n",
        "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
        "\n",
        "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your cluster.\n",
        "amlcompute_cluster_name = \"hardware-cluster\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
        "    print('Found existing cluster, use it.')\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
        "                                                           max_nodes=4)\n",
        "    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
        "\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Setup Training and Test Data for AutoML experiment\n",
        "\n",
        "Load the hardware dataset from a csv file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we'll split the data using random_split and extract the training data for the model.  We also register the datasets in your workspace using a name so that these datasets may be accessed from the remote compute."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data = 'https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv'\n",
        "\n",
        "dataset = Dataset.Tabular.from_delimited_files(data)\n",
        "\n",
        "# Split the dataset into train and test datasets\n",
        "train_data, test_data = dataset.random_split(percentage=0.8, seed=223)\n",
        "\n",
        "\n",
        "# Register the train dataset with your workspace\n",
        "train_data.register(workspace = ws, name = 'machineData_train_dataset',\n",
        "                       description = 'hardware performance training data',\n",
        "                      create_new_version=True)\n",
        "\n",
        "# Register the test dataset with your workspace\n",
        "test_data.register(workspace = ws, name = 'machineData_test_dataset', description = 'hardware performance test data', create_new_version=True)\n",
        "\n",
        "label =\"ERP\"\n",
        "\n",
        "train_data.to_pandas_dataframe().head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Train\n",
        "\n",
        "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|classification, regression or forecasting|\n",
        "|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
        "|**experiment_timeout_hours**| Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|\n",
        "|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n",
        "|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Setting this enables AutoML to perform featurization on the input to handle *missing data*, and to perform some common *feature extraction*. Note: If the input data is sparse, featurization cannot be turned on.|\n",
        "|**n_cross_validations**|Number of cross validation splits.|\n",
        "|**training_data**|(sparse) array-like, shape = [n_samples, n_features]|\n",
        "|**label_column_name**|(sparse) array-like, shape = [n_samples, ], targets values.|"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Customization\n",
        "\n",
        "Supported customization includes:\n",
        "\n",
        "1. Column purpose update: Override feature type for the specified column.\n",
        "2. Transformer parameter update: Update parameters for the specified transformer. Currently supports Imputer and HashOneHotEncoder.\n",
        "3. Drop columns: Columns to drop from being featurized.\n",
        "4. Block transformers: Allow/Block transformers to be used on featurization process."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create FeaturizationConfig object using API calls"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "sample-featurizationconfig-remarks2"
        ]
      },
      "outputs": [],
      "source": [
        "featurization_config = FeaturizationConfig()\n",
        "featurization_config.blocked_transformers = ['LabelEncoder']\n",
        "#featurization_config.drop_columns = ['MMIN']\n",
        "featurization_config.add_column_purpose('MYCT', 'Numeric')\n",
        "featurization_config.add_column_purpose('VendorName', 'CategoricalHash')\n",
        "#default strategy mean, add transformer param for for 3 columns\n",
        "featurization_config.add_transformer_params('Imputer', ['CACH'], {\"strategy\": \"median\"})\n",
        "featurization_config.add_transformer_params('Imputer', ['CHMIN'], {\"strategy\": \"median\"})\n",
        "featurization_config.add_transformer_params('Imputer', ['PRP'], {\"strategy\": \"most_frequent\"})\n",
        "#featurization_config.add_transformer_params('HashOneHotEncoder', [], {\"number_of_bits\": 3})"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "sample-featurizationconfig-remarks3"
        ]
      },
      "outputs": [],
      "source": [
        "automl_settings = {\n",
        "    \"enable_early_stopping\": True, \n",
        "    \"experiment_timeout_hours\" : 0.25,\n",
        "    \"max_concurrent_iterations\": 4,\n",
        "    \"max_cores_per_iteration\": -1,\n",
        "    \"n_cross_validations\": 5,\n",
        "    \"primary_metric\": 'normalized_root_mean_squared_error',\n",
        "    \"verbosity\": logging.INFO\n",
        "}\n",
        "\n",
        "automl_config = AutoMLConfig(task = 'regression',\n",
        "                             debug_log = 'automl_errors.log',\n",
        "                             compute_target=compute_target,\n",
        "                             featurization=featurization_config,\n",
        "                             training_data = train_data,\n",
        "                             label_column_name = label,\n",
        "                             **automl_settings\n",
        "                            )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
        "In this example, we specify `show_output = True` to print currently running iterations to the console."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run = experiment.submit(automl_config, show_output = False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run the following cell to access previous runs. Uncomment the cell below and update the run_id."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#from azureml.train.automl.run import AutoMLRun\n",
        "#remote_run = AutoMLRun(experiment=experiment, run_id='<run_ID_goes_here')\n",
        "#remote_run"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run, fitted_model = remote_run.get_output()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run_customized, fitted_model_customized = remote_run.get_output()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Transparency\n",
        "\n",
        "View updated featurization summary"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "custom_featurizer = fitted_model_customized.named_steps['datatransformer']"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "custom_featurizer.get_featurization_summary()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "is_user_friendly=False allows for more detailed summary for transforms being applied"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "custom_featurizer.get_featurization_summary(is_user_friendly=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "custom_featurizer.get_stats_feature_type_summary()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Results"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Widget for Monitoring Runs\n",
        "\n",
        "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
        "\n",
        "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.widgets import RunDetails\n",
        "RunDetails(remote_run).show() "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Explanations\n",
        "This section will walk you through the workflow to compute model explanations for an AutoML model on your remote compute.\n",
        "\n",
        "### Retrieve any AutoML Model for explanations\n",
        "\n",
        "Below we select the some AutoML pipeline from our iterations. The `get_output` method returns the a AutoML run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "automl_run, fitted_model = remote_run.get_output(metric='r2_score')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Setup model explanation run on the remote compute\n",
        "The following section provides details on how to setup an AzureML experiment to run model explanations for an AutoML model on your remote compute."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Sample script used for computing explanations\n",
        "View the sample script for computing the model explanations for your AutoML model on remote compute."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "with open('train_explainer.py', 'r') as cefr:\n",
        "    print(cefr.read())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Substitute values in your sample script\n",
        "The following cell shows how you change the values in the sample script so that you can change the sample script according to your experiment and dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import shutil\n",
        "import os\n",
        "\n",
        "# create script folder\n",
        "script_folder = './sample_projects/automl-regression-hardware'\n",
        "if not os.path.exists(script_folder):\n",
        "    os.makedirs(script_folder)\n",
        "\n",
        "# Copy the sample script to script folder.\n",
        "shutil.copy('train_explainer.py', script_folder)\n",
        "\n",
        "# Create the explainer script that will run on the remote compute.\n",
        "script_file_name = script_folder + '/train_explainer.py'\n",
        "\n",
        "# Open the sample script for modification\n",
        "with open(script_file_name, 'r') as cefr:\n",
        "    content = cefr.read()\n",
        "\n",
        "# Replace the values in train_explainer.py file with the appropriate values\n",
        "content = content.replace('<<experiment_name>>', automl_run.experiment.name) # your experiment name.\n",
        "content = content.replace('<<run_id>>', automl_run.id) # Run-id of the AutoML run for which you want to explain the model.\n",
        "content = content.replace('<<target_column_name>>', 'ERP') # Your target column name\n",
        "content = content.replace('<<task>>', 'regression') # Training task type\n",
        "# Name of your training dataset register with your workspace\n",
        "content = content.replace('<<train_dataset_name>>', 'machineData_train_dataset') \n",
        "# Name of your test dataset register with your workspace\n",
        "content = content.replace('<<test_dataset_name>>', 'machineData_test_dataset')\n",
        "\n",
        "# Write sample file into your script folder.\n",
        "with open(script_file_name, 'w') as cefw:\n",
        "    cefw.write(content)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Create conda configuration for model explanations experiment from automl_run object"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.runconfig import RunConfiguration\n",
        "from azureml.core.conda_dependencies import CondaDependencies\n",
        "import pkg_resources\n",
        "\n",
        "# create a new RunConfig object\n",
        "conda_run_config = RunConfiguration(framework=\"python\")\n",
        "\n",
        "# Set compute target to AmlCompute\n",
        "conda_run_config.target = compute_target\n",
        "conda_run_config.environment.docker.enabled = True\n",
        "\n",
        "# specify CondaDependencies obj\n",
        "conda_run_config.environment.python.conda_dependencies = automl_run.get_environment().python.conda_dependencies"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Submit the experiment for model explanations\n",
        "Submit the experiment with the above `run_config` and the sample script for computing explanations."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Now submit a run on AmlCompute for model explanations\n",
        "from azureml.core.script_run_config import ScriptRunConfig\n",
        "\n",
        "script_run_config = ScriptRunConfig(source_directory=script_folder,\n",
        "                                    script='train_explainer.py',\n",
        "                                    run_config=conda_run_config)\n",
        "\n",
        "run = experiment.submit(script_run_config)\n",
        "\n",
        "# Show run details\n",
        "run"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "%%time\n",
        "# Shows output of the run on stdout.\n",
        "run.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Feature importance  and  visualizing explanation dashboard\n",
        "In this section we describe how you can download the explanation results from the explanations experiment and visualize the feature importance for your AutoML model on the azure portal."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Download engineered feature importance from artifact store\n",
        "You can use *ExplanationClient* to download the engineered feature explanations from the artifact store of the *automl_run*. You can also use azure portal url to view the dash board visualization of the feature importance values of the engineered features."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.interpret import ExplanationClient\n",
        "client = ExplanationClient.from_run(automl_run)\n",
        "engineered_explanations = client.download_model_explanation(raw=False, comment='engineered explanations')\n",
        "print(engineered_explanations.get_feature_importance_dict())\n",
        "print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Download raw feature importance from artifact store\n",
        "You can use *ExplanationClient* to download the raw feature explanations from the artifact store of the *automl_run*. You can also use azure portal url to view the dash board visualization of the feature importance values of the raw features."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "raw_explanations = client.download_model_explanation(raw=True, comment='raw explanations')\n",
        "print(raw_explanations.get_feature_importance_dict())\n",
        "print(\"You can visualize the raw explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Operationalize\n",
        "In this section we will show how you can operationalize an AutoML model and the explainer which was used to compute the explanations in the previous section.\n",
        "\n",
        "### Register the AutoML model and the scoring explainer\n",
        "We use the *TreeScoringExplainer* from *azureml-interpret* package to create the scoring explainer which will be used to compute the raw and engineered feature importances at the inference time. \n",
        "In the cell below, we register the AutoML model and the scoring explainer with the Model Management Service."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Register trained automl model present in the 'outputs' folder in the artifacts\n",
        "original_model = automl_run.register_model(model_name='automl_model', \n",
        "                                           model_path='outputs/model.pkl')\n",
        "scoring_explainer_model = automl_run.register_model(model_name='scoring_explainer',\n",
        "                                                    model_path='outputs/scoring_explainer.pkl')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create the conda dependencies for setting up the service\n",
        "We need to create the conda dependencies comprising of the *azureml* packages using the training environment from the *automl_run*."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "conda_dep = automl_run.get_environment().python.conda_dependencies\n",
        "\n",
        "with open(\"myenv.yml\",\"w\") as f:\n",
        "    f.write(conda_dep.serialize_to_string())\n",
        "\n",
        "with open(\"myenv.yml\",\"r\") as f:\n",
        "    print(f.read())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### View your scoring file"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "with open(\"score_explain.py\",\"r\") as f:\n",
        "    print(f.read())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Deploy the service\n",
        "In the cell below, we deploy the service using the conda file and the scoring file from the previous steps. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.webservice import Webservice\n",
        "from azureml.core.model import InferenceConfig\n",
        "from azureml.core.webservice import AciWebservice\n",
        "from azureml.core.model import Model\n",
        "from azureml.core.environment import Environment\n",
        "\n",
        "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
        "                                               memory_gb=1, \n",
        "                                               tags={\"data\": \"Machine Data\",  \n",
        "                                                     \"method\" : \"local_explanation\"}, \n",
        "                                               description='Get local explanations for Machine test data')\n",
        "\n",
        "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n",
        "inference_config = InferenceConfig(entry_script=\"score_explain.py\", environment=myenv)\n",
        "\n",
        "# Use configs and models generated above\n",
        "service = Model.deploy(ws, 'model-scoring', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
        "service.wait_for_deployment(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### View the service logs"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "service.get_logs()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Inference using some test data\n",
        "Inference using some test data to see the predicted value from autml model, view the engineered feature importance for the predicted value and raw feature importance for the predicted value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "if service.state == 'Healthy':\n",
        "    X_test = test_data.drop_columns([label]).to_pandas_dataframe()\n",
        "    # Serialize the first row of the test data into json\n",
        "    X_test_json = X_test[:1].to_json(orient='records')\n",
        "    print(X_test_json)\n",
        "    # Call the service to get the predictions and the engineered and raw explanations\n",
        "    output = service.run(X_test_json)\n",
        "    # Print the predicted value\n",
        "    print(output['predictions'])\n",
        "    # Print the engineered feature importances for the predicted value\n",
        "    print(output['engineered_local_importance_values'])\n",
        "    # Print the raw feature importances for the predicted value\n",
        "    print(output['raw_local_importance_values'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Delete the service\n",
        "Delete the service once you have finished inferencing."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "service.delete()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Test"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# preview the first 3 rows of the dataset\n",
        "\n",
        "test_data = test_data.to_pandas_dataframe()\n",
        "y_test = test_data['ERP'].fillna(0)\n",
        "test_data = test_data.drop('ERP', 1)\n",
        "test_data = test_data.fillna(0)\n",
        "\n",
        "\n",
        "train_data = train_data.to_pandas_dataframe()\n",
        "y_train = train_data['ERP'].fillna(0)\n",
        "train_data = train_data.drop('ERP', 1)\n",
        "train_data = train_data.fillna(0)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "y_pred_train = fitted_model.predict(train_data)\n",
        "y_residual_train = y_train - y_pred_train\n",
        "\n",
        "y_pred_test = fitted_model.predict(test_data)\n",
        "y_residual_test = y_test - y_pred_test"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "%matplotlib inline\n",
        "from sklearn.metrics import mean_squared_error, r2_score\n",
        "\n",
        "# Set up a multi-plot chart.\n",
        "f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
        "f.suptitle('Regression Residual Values', fontsize = 18)\n",
        "f.set_figheight(6)\n",
        "f.set_figwidth(16)\n",
        "\n",
        "# Plot residual values of training set.\n",
        "a0.axis([0, 360, -100, 100])\n",
        "a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
        "a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
        "a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
        "a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)),fontsize = 12)\n",
        "a0.set_xlabel('Training samples', fontsize = 12)\n",
        "a0.set_ylabel('Residual Values', fontsize = 12)\n",
        "\n",
        "# Plot residual values of test set.\n",
        "a1.axis([0, 90, -100, 100])\n",
        "a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
        "a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
        "a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
        "a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)),fontsize = 12)\n",
        "a1.set_xlabel('Test samples', fontsize = 12)\n",
        "a1.set_yticklabels([])\n",
        "\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "%matplotlib inline\n",
        "test_pred = plt.scatter(y_test, y_pred_test, color='')\n",
        "test_test = plt.scatter(y_test, y_test, color='g')\n",
        "plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
        "plt.show()"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "anshirga"
      }
    ],
    "categories": [
      "how-to-use-azureml",
      "automated-machine-learning"
    ],
    "category": "tutorial",
    "compute": [
      "AML"
    ],
    "datasets": [
      "MachineData"
    ],
    "deployment": [
      "ACI"
    ],
    "exclude_from_index": false,
    "framework": [
      "None"
    ],
    "friendly_name": "Automated ML run with featurization and model explainability.",
    "index_order": 5,
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",
      "name": "python36"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.7"
    },
    "tags": [
      "featurization",
      "explainability",
      "remote_run",
      "AutomatedML"
    ],
    "task": "Regression"
  },
  "nbformat": 4,
  "nbformat_minor": 2
}