Merge pull request #1022 from Azure/release_update/Release-56

update samples from Release-56 as a part of  SDK release
This commit is contained in:
Harneet Virk
2020-06-22 11:12:31 -07:00
committed by GitHub
32 changed files with 1551 additions and 27 deletions

View File

@@ -58,7 +58,7 @@ Visit this [community repository](https://github.com/microsoft/MLOps/tree/master
## Projects using Azure Machine Learning
Visit following repos to see projects contributed by Azure ML users:
- [AMLSamples](https://github.com/microsoft/MLOps) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis.
- [AMLSamples](https://github.com/Azure/AMLSamples) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis.
- [Learn about Natural Language Processing best practices using Azure Machine Learning service](https://github.com/microsoft/nlp)
- [Pre-Train BERT models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)

View File

@@ -103,7 +103,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -0,0 +1,549 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/fairness/fairlearn-azureml-mitigation.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unfairness Mitigation with Fairlearn and Azure Machine Learning\n",
"**This notebook shows how to upload results from Fairlearn's GridSearch mitigation algorithm into a dashboard in Azure Machine Learning Studio**\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Loading the Data](#LoadingData)\n",
"1. [Training an Unmitigated Model](#UnmitigatedModel)\n",
"1. [Mitigation with GridSearch](#Mitigation)\n",
"1. [Uploading a Fairness Dashboard to Azure](#AzureUpload)\n",
" 1. Registering models\n",
" 1. Computing Fairness Metrics\n",
" 1. Uploading to Azure\n",
"1. [Conclusion](#Conclusion)\n",
"\n",
"<a id=\"Introduction\"></a>\n",
"## Introduction\n",
"This notebook shows how to use [Fairlearn (an open source fairness assessment and unfairness mitigation package)](http://fairlearn.github.io) and Azure Machine Learning Studio for a binary classification problem. This example uses the well-known adult census dataset. For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan. Its purpose is purely illustrative of a workflow including a fairness dashboard - in particular, we do **not** include a full discussion of the detailed issues which arise when considering fairness in machine learning. For such discussions, please [refer to the Fairlearn website](http://fairlearn.github.io/).\n",
"\n",
"We will apply the [grid search algorithm](https://fairlearn.github.io/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
"\n",
"### Setup\n",
"\n",
"To use this notebook, an Azure Machine Learning workspace is required.\n",
"Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
"This notebook also requires the following packages:\n",
"* `azureml-contrib-fairness`\n",
"* `fairlearn==0.4.6`\n",
"* `joblib`\n",
"* `shap`\n",
"\n",
"\n",
"<a id=\"LoadingData\"></a>\n",
"## Loading the Data\n",
"We use the well-known `adult` census dataset, which we load using `shap` (for convenience). We start with a fairly unremarkable set of imports:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairlearn.reductions import GridSearch, DemographicParity, ErrorRate\n",
"from fairlearn.widget import FairlearnDashboard\n",
"from sklearn import svm\n",
"from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
"from sklearn.linear_model import LogisticRegression\n",
"import pandas as pd\n",
"import shap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now load and inspect the data from the `shap` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_raw, Y = shap.datasets.adult()\n",
"X_raw[\"Race\"].value_counts().to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to treat the sex of each individual as a protected attribute (where 0 indicates female and 1 indicates male), and in this particular case we are going separate this attribute out and drop it from the main data (this is not always the best option - see the [Fairlearn website](http://fairlearn.github.io/) for further discussion). We also separate out the Race column, but we will not perform any mitigation based on it. Finally, we perform some standard data preprocessing steps to convert the data into a format suitable for the ML algorithms"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A = X_raw[['Sex','Race']]\n",
"X = X_raw.drop(labels=['Sex', 'Race'],axis = 1)\n",
"X = pd.get_dummies(X)\n",
"\n",
"\n",
"le = LabelEncoder()\n",
"Y = le.fit_transform(Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With our data prepared, we can make the conventional split in to 'test' and 'train' subsets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(X_raw, \n",
" Y, \n",
" A,\n",
" test_size = 0.2,\n",
" random_state=0,\n",
" stratify=Y)\n",
"\n",
"# Work around indexing issue\n",
"X_train = X_train.reset_index(drop=True)\n",
"A_train = A_train.reset_index(drop=True)\n",
"X_test = X_test.reset_index(drop=True)\n",
"A_test = A_test.reset_index(drop=True)\n",
"\n",
"# Improve labels\n",
"A_test.Sex.loc[(A_test['Sex'] == 0)] = 'female'\n",
"A_test.Sex.loc[(A_test['Sex'] == 1)] = 'male'\n",
"\n",
"\n",
"A_test.Race.loc[(A_test['Race'] == 0)] = 'Amer-Indian-Eskimo'\n",
"A_test.Race.loc[(A_test['Race'] == 1)] = 'Asian-Pac-Islander'\n",
"A_test.Race.loc[(A_test['Race'] == 2)] = 'Black'\n",
"A_test.Race.loc[(A_test['Race'] == 3)] = 'Other'\n",
"A_test.Race.loc[(A_test['Race'] == 4)] = 'White'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"UnmitigatedModel\"></a>\n",
"## Training an Unmitigated Model\n",
"\n",
"So we have a point of comparison, we first train a model (specifically, logistic regression from scikit-learn) on the raw data, without applying any mitigation algorithm:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"unmitigated_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)\n",
"\n",
"unmitigated_predictor.fit(X_train, Y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can view this model in the fairness dashboard, and see the disparities which appear:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"FairlearnDashboard(sensitive_features=A_test, sensitive_feature_names=['Sex', 'Race'],\n",
" y_true=Y_test,\n",
" y_pred={\"unmitigated\": unmitigated_predictor.predict(X_test)})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the disparity in accuracy when we select 'Sex' as the sensitive feature, we see that males have an error rate about three times greater than the females. More interesting is the disparity in opportunitiy - males are offered loans at three times the rate of females.\n",
"\n",
"Despite the fact that we removed the feature from the training data, our predictor still discriminates based on sex. This demonstrates that simply ignoring a protected attribute when fitting a predictor rarely eliminates unfairness. There will generally be enough other features correlated with the removed attribute to lead to disparate impact."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"Mitigation\"></a>\n",
"## Mitigation with GridSearch\n",
"\n",
"The `GridSearch` class in `Fairlearn` implements a simplified version of the exponentiated gradient reduction of [Agarwal et al. 2018](https://arxiv.org/abs/1803.02453). The user supplies a standard ML estimator, which is treated as a blackbox - for this simple example, we shall use the logistic regression estimator from scikit-learn. `GridSearch` works by generating a sequence of relabellings and reweightings, and trains a predictor for each.\n",
"\n",
"For this example, we specify demographic parity (on the protected attribute of sex) as the fairness metric. Demographic parity requires that individuals are offered the opportunity (a loan in this example) independent of membership in the protected class (i.e., females and males should be offered loans at the same rate). *We are using this metric for the sake of simplicity* in this example; the appropriate fairness metric can only be selected after *careful examination of the broader context* in which the model is to be used."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sweep = GridSearch(LogisticRegression(solver='liblinear', fit_intercept=True),\n",
" constraints=DemographicParity(),\n",
" grid_size=71)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With our estimator created, we can fit it to the data. After `fit()` completes, we extract the full set of predictors from the `GridSearch` object.\n",
"\n",
"The following cell trains a many copies of the underlying estimator, and may take a minute or two to run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sweep.fit(X_train, Y_train,\n",
" sensitive_features=A_train.Sex)\n",
"\n",
"predictors = sweep._predictors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could load these predictors into the Fairness dashboard now. However, the plot would be somewhat confusing due to their number. In this case, we are going to remove the predictors which are dominated in the error-disparity space by others from the sweep (note that the disparity will only be calculated for the protected attribute; other potentially protected attributes will *not* be mitigated). In general, one might not want to do this, since there may be other considerations beyond the strict optimisation of error and disparity (of the given protected attribute)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"errors, disparities = [], []\n",
"for m in predictors:\n",
" classifier = lambda X: m.predict(X)\n",
" \n",
" error = ErrorRate()\n",
" error.load_data(X_train, pd.Series(Y_train), sensitive_features=A_train.Sex)\n",
" disparity = DemographicParity()\n",
" disparity.load_data(X_train, pd.Series(Y_train), sensitive_features=A_train.Sex)\n",
" \n",
" errors.append(error.gamma(classifier)[0])\n",
" disparities.append(disparity.gamma(classifier).max())\n",
" \n",
"all_results = pd.DataFrame( {\"predictor\": predictors, \"error\": errors, \"disparity\": disparities})\n",
"\n",
"dominant_models_dict = dict()\n",
"base_name_format = \"census_gs_model_{0}\"\n",
"row_id = 0\n",
"for row in all_results.itertuples():\n",
" model_name = base_name_format.format(row_id)\n",
" errors_for_lower_or_eq_disparity = all_results[\"error\"][all_results[\"disparity\"]<=row.disparity]\n",
" if row.error <= errors_for_lower_or_eq_disparity.min():\n",
" dominant_models_dict[model_name] = row.predictor\n",
" row_id = row_id + 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can construct predictions for the dominant models (we include the unmitigated predictor as well, for comparison):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictions_dominant = {\"census_unmitigated\": unmitigated_predictor.predict(X_test)}\n",
"models_dominant = {\"census_unmitigated\": unmitigated_predictor}\n",
"for name, predictor in dominant_models_dict.items():\n",
" value = predictor.predict(X_test)\n",
" predictions_dominant[name] = value\n",
" models_dominant[name] = predictor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These predictions may then be viewed in the fairness dashboard. We include the race column from the dataset, as an alternative basis for assessing the models. However, since we have not based our mitigation on it, the variation in the models with respect to race can be large."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"FairlearnDashboard(sensitive_features=A_test, \n",
" sensitive_feature_names=['Sex', 'Race'],\n",
" y_true=Y_test.tolist(),\n",
" y_pred=predictions_dominant)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using sex as the sensitive feature, we see a Pareto front forming - the set of predictors which represent optimal tradeoffs between accuracy and disparity in predictions. In the ideal case, we would have a predictor at (1,0) - perfectly accurate and without any unfairness under demographic parity (with respect to the protected attribute \"sex\"). The Pareto front represents the closest we can come to this ideal based on our data and choice of estimator. Note the range of the axes - the disparity axis covers more values than the accuracy, so we can reduce disparity substantially for a small loss in accuracy. Finally, we also see that the unmitigated model is towards the top right of the plot, with high accuracy, but worst disparity.\n",
"\n",
"By clicking on individual models on the plot, we can inspect their metrics for disparity and accuracy in greater detail. In a real example, we would then pick the model which represented the best trade-off between accuracy and disparity given the relevant business constraints."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"AzureUpload\"></a>\n",
"## Uploading a Fairness Dashboard to Azure\n",
"\n",
"Uploading a fairness dashboard to Azure is a two stage process. The `FairlearnDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. By default, the dashboard in Azure Machine Learning Studio also requires the models to be registered. The required stages are therefore:\n",
"1. Register the dominant models\n",
"1. Precompute all the required metrics\n",
"1. Upload to Azure\n",
"\n",
"Before that, we need to connect to Azure Machine Learning Studio:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Experiment, Model\n",
"\n",
"ws = Workspace.from_config()\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"RegisterModels\"></a>\n",
"### Registering Models\n",
"\n",
"The fairness dashboard is designed to integrate with registered models, so we need to do this for the models we want in the Studio portal. The assumption is that the names of the models specified in the dashboard dictionary correspond to the `id`s (i.e. `<name>:<version>` pairs) of registered models in the workspace. We register each of the models in the `models_dominant` dictionary into the workspace. For this, we have to save each model to a file, and then register that file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
"import os\n",
"\n",
"os.makedirs('models', exist_ok=True)\n",
"def register_model(name, model):\n",
" print(\"Registering \", name)\n",
" model_path = \"models/{0}.pkl\".format(name)\n",
" joblib.dump(value=model, filename=model_path)\n",
" registered_model = Model.register(model_path=model_path,\n",
" model_name=name,\n",
" workspace=ws)\n",
" print(\"Registered \", registered_model.id)\n",
" return registered_model.id\n",
"\n",
"model_name_id_mapping = dict()\n",
"for name, model in models_dominant.items():\n",
" m_id = register_model(name, model)\n",
" model_name_id_mapping[name] = m_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, produce new predictions dictionaries, with the updated names:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictions_dominant_ids = dict()\n",
"for name, y_pred in predictions_dominant.items():\n",
" predictions_dominant_ids[model_name_id_mapping[name]] = y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"PrecomputeMetrics\"></a>\n",
"### Precomputing Metrics\n",
"\n",
"We create a _dashboard dictionary_ using Fairlearn's `metrics` package. The `_create_group_metric_set` method has arguments similar to the Dashboard constructor, except that the sensitive features are passed as a dictionary (to ensure that names are available), and we must specify the type of prediction. Note that we use the `predictions_dominant_ids` dictionary we just created:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf = { 'sex': A_test.Sex, 'race': A_test.Race }\n",
"\n",
"from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
"\n",
"\n",
"dash_dict = _create_group_metric_set(y_true=Y_test,\n",
" predictions=predictions_dominant_ids,\n",
" sensitive_features=sf,\n",
" prediction_type='binary_classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"DashboardUpload\"></a>\n",
"### Uploading the Dashboard\n",
"\n",
"Now, we import our `contrib` package which contains the routine to perform the upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can create an Experiment, then a Run, and upload our dashboard to it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, \"Test_Fairlearn_GridSearch_Census_Demo\")\n",
"print(exp)\n",
"\n",
"run = exp.start_logging()\n",
"try:\n",
" dashboard_title = \"Dominant Models from GridSearch\"\n",
" upload_id = upload_dashboard_dictionary(run,\n",
" dash_dict,\n",
" dashboard_name=dashboard_title)\n",
" print(\"\\nUploaded to id: {0}\\n\".format(upload_id))\n",
"\n",
" downloaded_dict = download_dashboard_by_upload_id(run, upload_id)\n",
"finally:\n",
" run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dashboard can be viewed in the Run Details page.\n",
"\n",
"Finally, we can verify that the dashboard dictionary which we downloaded matches our upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(dash_dict == downloaded_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"Conclusion\"></a>\n",
"## Conclusion\n",
"\n",
"In this notebook we have demonstrated how to use the `GridSearch` algorithm from Fairlearn to generate a collection of models, and then present them in the fairness dashboard in Azure Machine Learning Studio. Please remember that this notebook has not attempted to discuss the many considerations which should be part of any approach to unfairness mitigation. The [Fairlearn website](http://fairlearn.github.io/) provides that discussion"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "riedgar"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,8 @@
name: fairlearn-azureml-mitigation
dependencies:
- pip:
- azureml-sdk
- azureml-contrib-fairness
- fairlearn==0.4.6
- joblib
- shap

View File

@@ -0,0 +1,494 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/fairness/upload-fairness-dashboard.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Upload a Fairness Dashboard to Azure Machine Learning Studio\n",
"**This notebook shows how to generate and upload a fairness assessment dashboard from Fairlearn to AzureML Studio**\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Loading the Data](#LoadingData)\n",
"1. [Processing the Data](#ProcessingData)\n",
"1. [Training Models](#TrainingModels)\n",
"1. [Logging in to AzureML](#LoginAzureML)\n",
"1. [Registering the Models](#RegisterModels)\n",
"1. [Using the Fairlearn Dashboard](#LocalDashboard)\n",
"1. [Uploading a Fairness Dashboard to Azure](#AzureUpload)\n",
" 1. Computing Fairness Metrics\n",
" 1. Uploading to Azure\n",
"1. [Conclusion](#Conclusion)\n",
" \n",
"\n",
"<a id=\"Introduction\"></a>\n",
"## Introduction\n",
"\n",
"In this notebook, we walk through a simple example of using the `azureml-contrib-fairness` package to upload a collection of fairness statistics for a fairness dashboard. It is an example of integrating the [open source Fairlearn package](https://www.github.com/fairlearn/fairlearn) with Azure Machine Learning. This is not an example of fairness analysis or mitigation - this notebook simply shows how to get a fairness dashboard into the Azure Machine Learning portal. We will load the data and train a couple of simple models. We will then use Fairlearn to generate data for a Fairness dashboard, which we can upload to Azure Machine Learning portal and view there.\n",
"\n",
"### Setup\n",
"\n",
"To use this notebook, an Azure Machine Learning workspace is required.\n",
"Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
"This notebook also requires the following packages:\n",
"* `azureml-contrib-fairness`\n",
"* `fairlearn==0.4.6`\n",
"* `joblib`\n",
"* `shap`\n",
"\n",
"\n",
"\n",
"\n",
"<a id=\"LoadingData\"></a>\n",
"## Loading the Data\n",
"We use the well-known `adult` census dataset, which we load using `shap` (for convenience). We start with a fairly unremarkable set of imports:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import svm\n",
"from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
"from sklearn.linear_model import LogisticRegression\n",
"import pandas as pd\n",
"import shap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can load the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_raw, Y = shap.datasets.adult()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can take a look at some of the data. For example, the next cells shows the counts of the different races identified in the dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(X_raw[\"Race\"].value_counts().to_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ProcessingData\"></a>\n",
"## Processing the Data\n",
"\n",
"With the data loaded, we process it for our needs. First, we extract the sensitive features of interest into `A` (conventionally used in the literature) and put the rest of the feature data into `X`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A = X_raw[['Sex','Race']]\n",
"X = X_raw.drop(labels=['Sex', 'Race'],axis = 1)\n",
"X = pd.get_dummies(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we apply a standard set of scalings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sc = StandardScaler()\n",
"X_scaled = sc.fit_transform(X)\n",
"X_scaled = pd.DataFrame(X_scaled, columns=X.columns)\n",
"\n",
"le = LabelEncoder()\n",
"Y = le.fit_transform(Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can then split our data into training and test sets, and also make the labels on our test portion of `A` human-readable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(X_scaled, \n",
" Y, \n",
" A,\n",
" test_size = 0.2,\n",
" random_state=0,\n",
" stratify=Y)\n",
"\n",
"# Work around indexing issue\n",
"X_train = X_train.reset_index(drop=True)\n",
"A_train = A_train.reset_index(drop=True)\n",
"X_test = X_test.reset_index(drop=True)\n",
"A_test = A_test.reset_index(drop=True)\n",
"\n",
"# Improve labels\n",
"A_test.Sex.loc[(A_test['Sex'] == 0)] = 'female'\n",
"A_test.Sex.loc[(A_test['Sex'] == 1)] = 'male'\n",
"\n",
"\n",
"A_test.Race.loc[(A_test['Race'] == 0)] = 'Amer-Indian-Eskimo'\n",
"A_test.Race.loc[(A_test['Race'] == 1)] = 'Asian-Pac-Islander'\n",
"A_test.Race.loc[(A_test['Race'] == 2)] = 'Black'\n",
"A_test.Race.loc[(A_test['Race'] == 3)] = 'Other'\n",
"A_test.Race.loc[(A_test['Race'] == 4)] = 'White'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"TrainingModels\"></a>\n",
"## Training Models\n",
"\n",
"We now train a couple of different models on our data. The `adult` census dataset is a classification problem - the goal is to predict whether a particular individual exceeds an income threshold. For the purpose of generating a dashboard to upload, it is sufficient to train two basic classifiers. First, a logistic regression classifier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lr_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)\n",
"\n",
"lr_predictor.fit(X_train, Y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And for comparison, a support vector classifier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"svm_predictor = svm.SVC()\n",
"\n",
"svm_predictor.fit(X_train, Y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LoginAzureML\"></a>\n",
"## Logging in to AzureML\n",
"\n",
"With our two classifiers trained, we can log into our AzureML workspace:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Experiment, Model\n",
"\n",
"ws = Workspace.from_config()\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"RegisterModels\"></a>\n",
"## Registering the Models\n",
"\n",
"Next, we register our models. By default, the subroutine which uploads the models checks that the names provided correspond to registered models in the workspace. We define a utility routine to do the registering:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
"import os\n",
"\n",
"os.makedirs('models', exist_ok=True)\n",
"def register_model(name, model):\n",
" print(\"Registering \", name)\n",
" model_path = \"models/{0}.pkl\".format(name)\n",
" joblib.dump(value=model, filename=model_path)\n",
" registered_model = Model.register(model_path=model_path,\n",
" model_name=name,\n",
" workspace=ws)\n",
" print(\"Registered \", registered_model.id)\n",
" return registered_model.id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we register the models. For convenience in subsequent method calls, we store the results in a dictionary, which maps the `id` of the registered model (a string in `name:version` format) to the predictor itself:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_dict = {}\n",
"\n",
"lr_reg_id = register_model(\"fairness_linear_regression\", lr_predictor)\n",
"model_dict[lr_reg_id] = lr_predictor\n",
"svm_reg_id = register_model(\"fairness_svm\", svm_predictor)\n",
"model_dict[svm_reg_id] = svm_predictor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LocalDashboard\"></a>\n",
"## Using the Fairlearn Dashboard\n",
"\n",
"We can now examine the fairness of the two models we have training, both as a function of race and (binary) sex. Before uploading the dashboard to the AzureML portal, we will first instantiate a local instance of the Fairlearn dashboard.\n",
"\n",
"Regardless of the viewing location, the dashboard is based on three things - the true values, the model predictions and the sensitive feature values. The dashboard can use predictions from multiple models and multiple sensitive features if desired (as we are doing here).\n",
"\n",
"Our first step is to generate a dictionary mapping the `id` of the registered model to the corresponding array of predictions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ys_pred = {}\n",
"for n, p in model_dict.items():\n",
" ys_pred[n] = p.predict(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can examine these predictions in a locally invoked Fairlearn dashboard. This can be compared to the dashboard uploaded to the portal (in the next section):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairlearn.widget import FairlearnDashboard\n",
"\n",
"FairlearnDashboard(sensitive_features=A_test, \n",
" sensitive_feature_names=['Sex', 'Race'],\n",
" y_true=Y_test.tolist(),\n",
" y_pred=ys_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"AzureUpload\"></a>\n",
"## Uploading a Fairness Dashboard to Azure\n",
"\n",
"Uploading a fairness dashboard to Azure is a two stage process. The `FairlearnDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. The required stages are therefore:\n",
"1. Precompute all the required metrics\n",
"1. Upload to Azure\n",
"\n",
"\n",
"### Computing Fairness Metrics\n",
"We use Fairlearn to create a dictionary which contains all the data required to display a dashboard. This includes both the raw data (true values, predicted values and sensitive features), and also the fairness metrics. The API is similar to that used to invoke the Dashboard locally. However, there are a few minor changes to the API, and the type of problem being examined (binary classification, regression etc.) needs to be specified explicitly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf = { 'Race': A_test.Race, 'Sex': A_test.Sex }\n",
"\n",
"from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
"\n",
"dash_dict = _create_group_metric_set(y_true=Y_test,\n",
" predictions=ys_pred,\n",
" sensitive_features=sf,\n",
" prediction_type='binary_classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `_create_group_metric_set()` method is currently underscored since its exact design is not yet final in Fairlearn."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Uploading to Azure\n",
"\n",
"We can now import the `azureml.contrib.fairness` package itself. We will round-trip the data, so there are two required subroutines:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can upload the generated dictionary to AzureML. The upload method requires a run, so we first create an experiment and a run. The uploaded dashboard can be seen on the corresponding Run Details page in AzureML Studio. For completeness, we also download the dashboard dictionary which we uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, \"notebook-01\")\n",
"print(exp)\n",
"\n",
"run = exp.start_logging()\n",
"try:\n",
" dashboard_title = \"Sample notebook upload\"\n",
" upload_id = upload_dashboard_dictionary(run,\n",
" dash_dict,\n",
" dashboard_name=dashboard_title)\n",
" print(\"\\nUploaded to id: {0}\\n\".format(upload_id))\n",
"\n",
" downloaded_dict = download_dashboard_by_upload_id(run, upload_id)\n",
"finally:\n",
" run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can verify that the dashboard dictionary which we downloaded matches our upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(dash_dict == downloaded_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"Conclusion\"></a>\n",
"## Conclusion\n",
"\n",
"In this notebook we have demonstrated how to generate and upload a fairness dashboard to AzureML Studio. We have not discussed how to analyse the results and apply mitigations. Those topics will be covered elsewhere."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "riedgar"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,8 @@
name: upload-fairness-dashboard
dependencies:
- pip:
- azureml-sdk
- azureml-contrib-fairness
- fairlearn==0.4.6
- joblib
- shap

View File

@@ -105,7 +105,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -93,7 +93,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -97,7 +97,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -88,7 +88,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -114,7 +114,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -87,7 +87,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -97,7 +97,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -94,7 +94,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -82,7 +82,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -96,7 +96,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -98,7 +98,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -92,7 +92,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -716,7 +716,6 @@
"\n",
"trainWithAutomlStep = AutoMLStep(name='AutoML_Regression',\n",
" automl_config=automl_config,\n",
" passthru_automl_config=False,\n",
" allow_reuse=True)\n",
"print(\"trainWithAutomlStep created.\")"
]

View File

@@ -13,7 +13,7 @@ def init():
global g_tf_sess
# pull down model from workspace
model_path = Model.get_model_path("mnist")
model_path = Model.get_model_path("mnist-prs")
# contruct graph to execute
tf.reset_default_graph()

View File

@@ -274,7 +274,7 @@
"\n",
"# register downloaded model \n",
"model = Model.register(model_path = \"models/\",\n",
" model_name = \"mnist\", # this is the name the model is registered as\n",
" model_name = \"mnist-prs\", # this is the name the model is registered as\n",
" tags = {'pretrained': \"mnist\"},\n",
" description = \"Mnist trained tensorflow model\",\n",
" workspace = ws)"
@@ -474,8 +474,7 @@
"outputs": [],
"source": [
"path_on_datastore = mnist_data.path('mnist/0.png')\n",
"single_image_ds = Dataset.File.from_files(path=path_on_datastore, validate=False)\n",
"single_image_ds._ensure_saved(ws)"
"single_image_ds = Dataset.File.from_files(path=path_on_datastore, validate=False)"
]
},
{

View File

@@ -227,7 +227,7 @@
"\n",
"# register downloaded model\n",
"model = Model.register(model_path = \"iris_model.pkl/iris_model.pkl\",\n",
" model_name = \"iris\", # this is the name the model is registered as\n",
" model_name = \"iris-prs\", # this is the name the model is registered as\n",
" tags = {'pretrained': \"iris\"},\n",
" workspace = ws)"
]
@@ -356,7 +356,7 @@
" inputs=[named_iris_ds],\n",
" output=output_folder,\n",
" parallel_run_config=parallel_run_config,\n",
" arguments=['--model_name', 'iris'],\n",
" arguments=['--model_name', 'iris-prs'],\n",
" allow_reuse=True\n",
")"
]
@@ -380,7 +380,7 @@
"\n",
"pipeline = Pipeline(workspace=ws, steps=[distributed_csv_iris_step])\n",
"\n",
"pipeline_run = Experiment(ws, 'iris').submit(pipeline)"
"pipeline_run = Experiment(ws, 'iris-prs').submit(pipeline)"
]
},
{

View File

@@ -456,6 +456,24 @@
"monitor.enable_schedule()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete the DataDriftDetector\n",
"\n",
"Invoking the `delete()` method on the object deletes the the drift monitor permanently and cannot be undone. You will no longer be able to find it in the UI and the `list()` or `get()` methods. The object on which delete() was called will have its state set to deleted and name suffixed with deleted. The baseline and target datasets and model data that was collected, if any, are not deleted. The compute is not deleted. The DataDrift schedule pipeline is disabled and archived."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"monitor.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -100,7 +100,7 @@
"\n",
"# Check core SDK version number\n",
"\n",
"print(\"This notebook was created using SDK version 1.7.0, you are currently running version\", azureml.core.VERSION)"
"print(\"This notebook was created using SDK version 1.8.0, you are currently running version\", azureml.core.VERSION)"
]
},
{

View File

@@ -0,0 +1,374 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-on-amlcompute/train-on-computeinstance.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train using Azure Machine Learning Compute Instance\n",
"\n",
"* Initialize Workspace\n",
"* Introduction to ComputeInstance\n",
"* Create an Experiment\n",
"* Submit ComputeInstance run\n",
"* Additional operations to perform on ComputeInstance"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning ComputeInstance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to ComputeInstance\n",
"\n",
"\n",
"Azure Machine Learning compute instance is a fully-managed cloud-based workstation optimized for your machine learning development environment. It is created **within your workspace region**.\n",
"\n",
"For more information on ComputeInstance, please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-instance)\n",
"\n",
"**Note**: As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create ComputeInstance\n",
"First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. Since ComputeInstance is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D3_V2') is supported.\n",
"\n",
"You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "check_region"
},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, ComputeInstance\n",
"\n",
"ComputeInstance.supported_vmsizes(workspace = ws)\n",
"# ComputeInstance.supported_vmsizes(workspace = ws, location='eastus')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "create_instance"
},
"outputs": [],
"source": [
"import datetime\n",
"import time\n",
"\n",
"from azureml.core.compute import ComputeTarget, ComputeInstance\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your instance\n",
"compute_name = \"compute-instance\"\n",
"\n",
"# Verify that instance does not exist already\n",
"try:\n",
" instance = ComputeInstance(workspace=ws, name=compute_name)\n",
" print('Found existing instance, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = ComputeInstance.provisioning_configuration(\n",
" vm_size='STANDARD_D3_V2',\n",
" ssh_public_access=False,\n",
" # vnet_resourcegroup_name='<my-resource-group>',\n",
" # vnet_name='<my-vnet-name>',\n",
" # subnet_name='default',\n",
" # admin_user_ssh_public_key='<my-sshkey>'\n",
" )\n",
" instance = ComputeInstance.create(ws, compute_name, compute_config)\n",
" instance.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create An Experiment\n",
"\n",
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"experiment_name = 'train-on-computeinstance'\n",
"experiment = Experiment(workspace = ws, name = experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit ComputeInstance run\n",
"The training script `train.py` is already created for you"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create environment\n",
"\n",
"Create Docker based environment with scikit-learn installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"myenv = Environment(\"myenv\")\n",
"\n",
"myenv.docker.enabled = True\n",
"myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure & Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
"\n",
"src = ScriptRunConfig(source_directory='', script='train.py')\n",
"\n",
"# Set compute target to the one created in previous step\n",
"src.run_config.target = instance\n",
"\n",
"# Set environment\n",
"src.run_config.environment = myenv\n",
" \n",
"run = experiment.submit(config=src)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(run.get_metrics())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional operations to perform on ComputeInstance\n",
"\n",
"You can perform more operations on ComputeInstance such as get status, change the state or deleting the compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "get_status"
},
"outputs": [],
"source": [
"# get_status() gets the latest status of the ComputeInstance target\n",
"instance.get_status()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "stop"
},
"outputs": [],
"source": [
"# stop() is used to stop the ComputeInstance\n",
"# Stopping ComputeInstance will stop the billing meter and persist the state on the disk.\n",
"# Available Quota will not be changed with this operation.\n",
"instance.stop(wait_for_completion=True, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "start"
},
"outputs": [],
"source": [
"# start() is used to start the ComputeInstance if it is in stopped state\n",
"instance.start(wait_for_completion=True, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# restart() is used to restart the ComputeInstance\n",
"instance.restart(wait_for_completion=True, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# delete() is used to delete the ComputeInstance target. Useful if you want to re-use the compute name \n",
"# instance.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "ramagott"
}
],
"category": "training",
"compute": [
"Compute Instance"
],
"datasets": [
"Diabetes"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"None"
],
"friendly_name": "Train on Azure Machine Learning Compute Instance",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"tags": [
"None"
],
"task": "Submit a run on Azure Machine Learning Compute Instance."
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,6 @@
name: train-on-computeinstance
dependencies:
- scikit-learn
- pip:
- azureml-sdk
- azureml-widgets

View File

@@ -0,0 +1,44 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from azureml.core.run import Run
from sklearn.externals import joblib
import os
import numpy as np
os.makedirs('./outputs', exist_ok=True)
X, y = load_diabetes(return_X_y=True)
run = Run.get_context()
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=0)
data = {"train": {"X": X_train, "y": y_train},
"test": {"X": X_test, "y": y_test}}
# list of numbers from 0.0 to 1.0 with a 0.05 interval
alphas = np.arange(0.0, 1.0, 0.05)
for alpha in alphas:
# Use Ridge algorithm to create a regression model
reg = Ridge(alpha=alpha)
reg.fit(data["train"]["X"], data["train"]["y"])
preds = reg.predict(data["test"]["X"])
mse = mean_squared_error(preds, data["test"]["y"])
run.log('alpha', alpha)
run.log('mse', mse)
model_file_name = 'ridge_{0:.2f}.pkl'.format(alpha)
# save model in the outputs folder so it automatically get uploaded
with open(model_file_name, "wb") as file:
joblib.dump(value=reg, filename=os.path.join('./outputs/',
model_file_name))
print('alpha is {0:.2f}, and mse is {1:0.2f}'.format(alpha, mse))

View File

@@ -380,6 +380,24 @@
"#compute_target.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete the DataDriftDetector\n",
"\n",
"Invoking the `delete()` method on the object deletes the the drift monitor permanently and cannot be undone. You will no longer be able to find it in the UI and the `list()` or `get()` methods. The object on which delete() was called will have its state set to deleted and name suffixed with deleted. The baseline and target datasets and model data that was collected, if any, are not deleted. The compute is not deleted. The DataDrift schedule pipeline is disabled and archived."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"monitor.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -65,6 +65,7 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an
| [Resuming a model](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb) | Resume a model in TensorFlow from a previously submitted run | MNIST | AML Compute | None | TensorFlow | None |
| [Training in Spark](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb) | Submiting a run on a spark cluster | None | HDI cluster | None | PySpark | None |
| [Train on Azure Machine Learning Compute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb) | Submit a run on Azure Machine Learning Compute. | Diabetes | AML Compute | None | None | None |
| [Train on Azure Machine Learning Compute Instance](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-computeinstance/train-on-computeinstance.ipynb) | Submit a run on Azure Machine Learning Compute Instance. | Diabetes | Compute Instance | None | None | None |
| [Train on local compute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-local/train-on-local.ipynb) | Train a model locally | Diabetes | Local | None | None | None |
| [Train in a remote Linux virtual machine](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) | Configure and execute a run | Diabetes | Data Science Virtual Machine | None | None | None |
| [Using Tensorboard](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb) | Export the run history as Tensorboard logs | None | None | None | TensorFlow | None |
@@ -95,6 +96,8 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:|
| [DNN Text Featurization](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.ipynb) | Text featurization using DNNs for classification | None | AML Compute | None | None | None |
| [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) | | | | | | |
| [fairlearn-azureml-mitigation](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/fairness/fairlearn-azureml-mitigation.ipynb) | | | | | | |
| [upload-fairness-dashboard](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/fairness/upload-fairness-dashboard.ipynb) | | | | | | |
| [lightgbm-example](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/gbdt/lightgbm/lightgbm-example.ipynb) | | | | | | |
| [azure-ml-with-nvidia-rapids](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb) | | | | | | |
| [auto-ml-continuous-retraining](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/continuous-retraining/auto-ml-continuous-retraining.ipynb) | | | | | | |

View File

@@ -102,7 +102,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.7.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.8.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -217,6 +217,7 @@
"outputs": [],
"source": [
"%%time\n",
"import uuid\n",
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.environment import Environment\n",
@@ -230,8 +231,9 @@
"myenv = Environment.get(workspace=ws, name=\"tutorial-env\", version=\"1\")\n",
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n",
"\n",
"service_name = 'sklearn-mnist-svc-' + str(uuid.uuid4())[:4]\n",
"service = Model.deploy(workspace=ws, \n",
" name='sklearn-mnist-svc', \n",
" name=service_name, \n",
" models=[model], \n",
" inference_config=inference_config, \n",
" deployment_config=aciconfig)\n",

View File

@@ -272,6 +272,7 @@
"outputs": [],
"source": [
"%%time\n",
"import uuid\n",
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.environment import Environment\n",
@@ -284,8 +285,9 @@
"myenv = Environment.get(workspace=ws, name=\"tutorial-env\")\n",
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n",
"\n",
"service_name = 'sklearn-mnist-svc-' + str(uuid.uuid4())[:4]\n",
"service = Model.deploy(workspace=ws, \n",
" name='sklearn-mnist-svc', \n",
" name=service_name, \n",
" models=[model], \n",
" inference_config=inference_config, \n",
" deployment_config=aciconfig)\n",
@@ -489,7 +491,7 @@
"import json\n",
"from azureml.core import Webservice\n",
"\n",
"service = Webservice(ws, 'sklearn-mnist-svc')\n",
"service = Webservice(ws, service_name)\n",
"\n",
"#pass the connection string for blob storage to give the server access to the uploaded public keys \n",
"conn_str_template = 'DefaultEndpointsProtocol={};AccountName={};AccountKey={};EndpointSuffix=core.windows.net'\n",