Files
MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb

632 lines
24 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Explain classification model, visualize the explanation and operationalize the explainer along with AutoML model**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Explanations](#Explanations)\n",
"1. [Operationailze](#Operationailze)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use the sklearn's [iris dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to showcase how you can use the AutoML Classifier for a simple classification problem.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you would see\n",
"1. Creating an Experiment in an existing Workspace\n",
"2. Instantiating AutoMLConfig\n",
"3. Training the Model using local compute and explain the model\n",
"4. Visualization model's feature importance in widget\n",
"5. Explore any model's explanation\n",
"6. Operationalize the AutoML model and the explaination model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created a <b>Workspace</b>. For AutoML you would need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b>, which is used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"import pandas as pd\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.explain.model._internal.explanation_client import ExplanationClient"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-model-explanation'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv\"\n",
"train_dataset = Dataset.Tabular.from_delimited_files(train_data)\n",
"X_train = train_dataset.drop_columns(columns=['y']).to_pandas_dataframe()\n",
"y_train = train_dataset.keep_columns(columns=['y'], validate=True).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_test.csv\"\n",
"test_dataset = Dataset.Tabular.from_delimited_files(test_data)\n",
"X_test = test_dataset.drop_columns(columns=['y']).to_pandas_dataframe()\n",
"y_test = test_dataset.keep_columns(columns=['y'], validate=True).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**max_time_sec**|Time limit in minutes for each iterations|\n",
"|**iterations**|Number of iterations. In each iteration Auto ML trains the data with a specific pipeline|\n",
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
"|**model_explainability**|Indicate to explain each trained pipeline or not |"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" primary_metric = 'AUC_weighted',\n",
" iteration_timeout_minutes = 200,\n",
" iterations = 10,\n",
" verbosity = logging.INFO,\n",
" preprocess = True,\n",
" X = X_train, \n",
" y = y_train,\n",
" n_cross_validations = 5,\n",
" model_explainability=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can call the submit method on the experiment object and pass the run configuration. For Local runs the execution is synchronous. Depending on the data and number of iterations this can run for while.\n",
"You will see the currently running iterations printing to the console."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Widget for monitoring runs\n",
"\n",
"The widget will sit on \"loading\" until the first iteration completed, then you will see an auto-updating graph and table show up. It refreshed once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"NOTE: The widget displays a link at the bottom. This links to a web-ui to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Best Model 's explanation\n",
"\n",
"Retrieve the explanation from the *best_run* which includes explanations for engineered features and raw features."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download engineered feature importance from artifact store\n",
"You can use *ExplanationClient* to download the engineered feature explanations from the artifact store of the *best_run*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = ExplanationClient.from_run(best_run)\n",
"engineered_explanations = client.download_model_explanation(raw=False)\n",
"print(engineered_explanations.get_feature_importance_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download raw feature importance from artifact store\n",
"You can use *ExplanationClient* to download the raw feature explanations from the artifact store of the *best_run*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = ExplanationClient.from_run(best_run)\n",
"raw_explanations = client.download_model_explanation(raw=True)\n",
"print(raw_explanations.get_feature_importance_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explanations\n",
"In this section, we will show how to compute model explanations and visualize the explanations using azureml-explain-model package. Besides retrieving an existing model explanation for an AutoML model, you can also explain your AutoML model with different test data. The following steps will allow you to compute and visualize engineered feature importance and raw feature importance based on your test data. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve any other AutoML model from training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_run, fitted_model = local_run.get_output(iteration=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Setup the model explanations for AutoML models\n",
"The *fitted_model* can generate the following which will be used for getting the engineered and raw feature explanations using *automl_setup_model_explanations*:-\n",
"1. Featurized data from train samples/test samples \n",
"2. Gather engineered and raw feature name lists\n",
"3. Find the classes in your labeled column in classification scenarios\n",
"\n",
"The *automl_explainer_setup_obj* contains all the structures from above list. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations\n",
"\n",
"automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train, \n",
" X_test=X_test, y=y_train, \n",
" task='classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize the Mimic Explainer for feature importance\n",
"For explaining the AutoML models, use the *MimicWrapper* from *azureml.explain.model* package. The *MimicWrapper* can be initialized with fields in *automl_explainer_setup_obj*, your workspace and a LightGBM model which acts as a surrogate model to explain the AutoML model (*fitted_model* here). The *MimicWrapper* also takes the *automl_run* object where the raw and engineered explanations will be uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
"from azureml.explain.model.mimic_wrapper import MimicWrapper\n",
"explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, \n",
" init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,\n",
" features=automl_explainer_setup_obj.engineered_feature_names, \n",
" feature_maps=[automl_explainer_setup_obj.feature_map],\n",
" classes=automl_explainer_setup_obj.classes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Mimic Explainer for computing and visualizing engineered feature importance\n",
"The *explain()* method in *MimicWrapper* can be called with the transformed test samples to get the feature importance for the generated engineered features. You can also use *ExplanationDashboard* to view the dash board visualization of the feature importance values of the generated engineered features by AutoML featurizers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"engineered_explanations = explainer.explain(['local', 'global'], eval_dataset=automl_explainer_setup_obj.X_test_transform)\n",
"print(engineered_explanations.get_feature_importance_dict())\n",
"from azureml.contrib.explain.model.visualize import ExplanationDashboard\n",
"ExplanationDashboard(engineered_explanations, automl_explainer_setup_obj.automl_estimator, automl_explainer_setup_obj.X_test_transform)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Mimic Explainer for computing and visualizing raw feature importance\n",
"The *explain()* method in *MimicWrapper* can be again called with the transformed test samples and setting *get_raw* to *True* to get the feature importance for the raw features. You can also use *ExplanationDashboard* to view the dash board visualization of the feature importance values of the raw features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"raw_explanations = explainer.explain(['local', 'global'], get_raw=True, \n",
" raw_feature_names=automl_explainer_setup_obj.raw_feature_names,\n",
" eval_dataset=automl_explainer_setup_obj.X_test_transform)\n",
"print(raw_explanations.get_feature_importance_dict())\n",
"from azureml.contrib.explain.model.visualize import ExplanationDashboard\n",
"ExplanationDashboard(raw_explanations, automl_explainer_setup_obj.automl_pipeline, automl_explainer_setup_obj.X_test_raw)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Operationailze\n",
"In this section we will show how you can operationalize an AutoML model and the explainer which was used to compute the explanations in the previous section.\n",
"\n",
"#### Register the AutoML model and the scoring explainer\n",
"We use the *TreeScoringExplainer* from *azureml.explain.model* package to create the scoring explainer which will be used to compute the raw and engineered feature importances at the inference time. Note that, we initialize the scoring explainer with the *feature_map* that was computed previously. The *feature_map* will be used by the scoring explainer to return the raw feature importance.\n",
"\n",
"In the cell below, we pickle the scoring explainer and register the AutoML model and the scoring explainer with the Model Management Service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer, save\n",
"\n",
"# Initialize the ScoringExplainer\n",
"scoring_explainer = TreeScoringExplainer(explainer._internal_explainer, feature_maps=[automl_explainer_setup_obj.feature_map])\n",
"\n",
"# Pickle scoring explainer locally\n",
"save(scoring_explainer, exist_ok=True)\n",
"\n",
"# Register trained automl model present in the 'outputs' folder in the artifacts\n",
"original_model = automl_run.register_model(model_name='automl_model', \n",
" model_path='outputs/model.pkl')\n",
"\n",
"# Register scoring explainer\n",
"automl_run.upload_file('scoring_explainer.pkl', 'scoring_explainer.pkl')\n",
"scoring_explainer_model = automl_run.register_model(model_name='scoring_explainer', model_path='scoring_explainer.pkl')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create the conda dependencies for setting up the service\n",
"We need to create the conda dependencies comprising of the *azureml-explain-model*, *azureml-train-automl* and *azureml-defaults* packages. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-explain-model', 'azureml-train-automl', 'azureml-defaults'\n",
"]\n",
" \n",
"\n",
"# specify CondaDependencies obj\n",
"myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas', 'numpy', 'py-xgboost<=0.80'],\n",
" pip_packages=azureml_pip_packages,\n",
" pin_sdk_version=True)\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())\n",
"\n",
"with open(\"myenv.yml\",\"r\") as f:\n",
" print(f.read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### View your scoring file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"score_local_explain.py\",\"r\") as f:\n",
" print(f.read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deploy the service\n",
"In the cell below, we deploy the service using the conda file and the scoring file from the previous steps. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.webservice import AciWebservice\n",
"from azureml.core.model import Model\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
" memory_gb=1, \n",
" tags={\"data\": \"Bank Marketing\", \n",
" \"method\" : \"local_explanation\"}, \n",
" description='Get local explanations for Bank marketing test data')\n",
"\n",
"inference_config = InferenceConfig(runtime= \"python\", \n",
" entry_script=\"score_local_explain.py\",\n",
" conda_file=\"myenv.yml\")\n",
"\n",
"# Use configs and models generated above\n",
"service = Model.deploy(ws, 'model-scoring', [scoring_explainer_model, original_model], inference_config, aciconfig)\n",
"service.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### View the service logs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"service.get_logs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Inference using some test data\n",
"Inference using some test data to see the predicted value from autml model, view the engineered feature importance for the predicted value and raw feature importance for the predicted value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if service.state == 'Healthy':\n",
" # Serialize the first row of the test data into json\n",
" X_test_json = X_test[:1].to_json(orient='records')\n",
" print(X_test_json)\n",
" # Call the service to get the predictions and the engineered and raw explanations\n",
" output = service.run(X_test_json)\n",
" # Print the predicted value\n",
" print(output['predictions'])\n",
" # Print the engineered feature importances for the predicted value\n",
" print(output['engineered_local_importance_values'])\n",
" # Print the raw feature importances for the predicted value\n",
" print(output['raw_local_importance_values'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Delete the service\n",
"Delete the service once you have finished inferencing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"service.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "xif"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}