Compare commits

...

13 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| jeff-shepherd | f1aff553c4 | Merge pull request #1980 from Man-MSFT/mafong/fairness-dep (Remove fairness notebooks) | 2025-03-14 09:42:02 -07:00 |
| Man Fong | d195a673e2 | Remove fairness notebooks | 2025-03-13 14:25:59 -07:00 |
| jeff-shepherd | 8dce0fa6fe | Merge pull request #1977 from Azure/jeffshep/windowsonnx (Pin onnx on Windows) | 2024-12-16 08:44:42 -08:00 |
| Jeff Shepherd | 4e8a240a71 | Pin onnx on Windows | 2024-12-13 15:51:10 -08:00 |
| jeff-shepherd | 5b019e28de | Merge pull request #1976 from Azure/release_update_stablev2/Release-247 (update samples from Release-247 as a part of 1.59.0 SDK stable release) | 2024-12-13 08:50:52 -08:00 |
| amlrelsa-ms | bf4cb1e86c | update samples from Release-247 as a part of 1.59.0 SDK stable release | 2024-12-10 17:34:41 +00:00 |
| jeff-shepherd | eaa7c56590 | Merge pull request #1974 from Azure/jeffshep/post158sync (Remove deprecated sample notebooks) | 2024-11-04 09:20:56 -08:00 |
| Jeff Shepherd | 8fc0fa040d | Remove deprecated sample notebooks | 2024-11-01 11:49:20 -07:00 |
| jeff-shepherd | 56e13b0b9a | Merge pull request #1972 from Azure/release_update_stablev2/Release-243 (update samples from Release-243 as a part of 1.58.0 SDK stable release) | 2024-10-21 09:03:36 -07:00 |
| amlrelsa-ms | 785fe3c962 | update samples from Release-243 as a part of 1.58.0 SDK stable release | 2024-10-16 17:50:12 +00:00 |
| jeff-shepherd | 3c341f6e9a | Merge pull request #1968 from Azure/release_update_stablev2/Release-240 (update samples from Release-240 as a part of 1.57.0 SDK stable release) | 2024-08-08 08:36:05 -07:00 |
| amlrelsa-ms | aae88e87ea | update samples from Release-240 as a part of 1.57.0 SDK stable release | 2024-08-05 21:57:46 +00:00 |
| jeff-shepherd | 2352e458c7 | Merge pull request #1963 from Azure/release_update_stablev2/Release-209 (update samples from Release-209 as a part of 1.56.0 SDK stable release) | 2024-05-16 09:15:57 -07:00 |

187 changed files with 224 additions and 8854 deletions

View File

@@ -103,7 +103,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.56.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.59.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -1,621 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/fairness/fairlearn-azureml-mitigation.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unfairness Mitigation with Fairlearn and Azure Machine Learning\n",
"**This notebook shows how to upload results from Fairlearn's GridSearch mitigation algorithm into a dashboard in Azure Machine Learning Studio**\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Loading the Data](#LoadingData)\n",
"1. [Training an Unmitigated Model](#UnmitigatedModel)\n",
"1. [Mitigation with GridSearch](#Mitigation)\n",
"1. [Uploading a Fairness Dashboard to Azure](#AzureUpload)\n",
" 1. Registering models\n",
" 1. Computing Fairness Metrics\n",
" 1. Uploading to Azure\n",
"1. [Conclusion](#Conclusion)\n",
"\n",
"<a id=\"Introduction\"></a>\n",
"## Introduction\n",
"This notebook shows how to use [Fairlearn (an open source fairness assessment and unfairness mitigation package)](http://fairlearn.org) and Azure Machine Learning Studio for a binary classification problem. This example uses the well-known adult census dataset. For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan. Its purpose is purely illustrative of a workflow including a fairness dashboard - in particular, we do **not** include a full discussion of the detailed issues which arise when considering fairness in machine learning. For such discussions, please [refer to the Fairlearn website](http://fairlearn.org/).\n",
"\n",
"We will apply the [grid search algorithm](https://fairlearn.org/v0.4.6/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
"\n",
"### Setup\n",
"\n",
"To use this notebook, an Azure Machine Learning workspace is required.\n",
"Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
"This notebook also requires the following packages:\n",
"* `azureml-contrib-fairness`\n",
"* `fairlearn>=0.6.2` (pre-v0.5.0 will work with minor modifications)\n",
"* `joblib`\n",
"* `liac-arff`\n",
"* `raiwidgets`\n",
"\n",
"Fairlearn relies on features introduced in v0.22.1 of `scikit-learn`. If you have an older version already installed, please uncomment and run the following cell:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# !pip install --upgrade scikit-learn>=0.22.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, please ensure that when you downloaded this notebook, you also downloaded the `fairness_nb_utils.py` file from the same location, and placed it in the same directory as this notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LoadingData\"></a>\n",
"## Loading the Data\n",
"We use the well-known `adult` census dataset, which we will fetch from the OpenML website. We start with a fairly unremarkable set of imports:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairlearn.reductions import GridSearch, DemographicParity, ErrorRate\n",
"from raiwidgets import FairnessDashboard\n",
"\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn.compose import make_column_selector as selector\n",
"from sklearn.pipeline import Pipeline\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now load and inspect the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairness_nb_utils import fetch_census_dataset\n",
"\n",
"data = fetch_census_dataset()\n",
" \n",
"# Extract the items we want\n",
"X_raw = data.data\n",
"y = (data.target == '>50K') * 1\n",
"\n",
"X_raw[\"race\"].value_counts().to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to treat the sex and race of each individual as protected attributes, and in this particular case we are going to remove these attributes from the main data (this is not always the best option - see the [Fairlearn website](http://fairlearn.github.io/) for further discussion). Protected attributes are often denoted by 'A' in the literature, and we follow that convention here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A = X_raw[['sex','race']]\n",
"X_raw = X_raw.drop(labels=['sex', 'race'], axis = 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now preprocess our data. To avoid the problem of data leakage, we split our data into training and test sets before performing any other transformations. Subsequent transformations (such as scalings) will be fit to the training data set, and then applied to the test dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"(X_train, X_test, y_train, y_test, A_train, A_test) = train_test_split(\n",
" X_raw, y, A, test_size=0.3, random_state=12345, stratify=y\n",
")\n",
"\n",
"# Ensure indices are aligned between X, y and A,\n",
"# after all the slicing and splitting of DataFrames\n",
"# and Series\n",
"\n",
"X_train = X_train.reset_index(drop=True)\n",
"X_test = X_test.reset_index(drop=True)\n",
"y_train = y_train.reset_index(drop=True)\n",
"y_test = y_test.reset_index(drop=True)\n",
"A_train = A_train.reset_index(drop=True)\n",
"A_test = A_test.reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have two types of column in the dataset - categorical columns which will need to be one-hot encoded, and numeric ones which will need to be rescaled. We also need to take care of missing values. We use a simple approach here, but please bear in mind that this is another way that bias could be introduced (especially if one subgroup tends to have more missing values).\n",
"\n",
"For this preprocessing, we make use of `Pipeline` objects from `sklearn`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numeric_transformer = Pipeline(\n",
" steps=[\n",
" (\"impute\", SimpleImputer()),\n",
" (\"scaler\", StandardScaler()),\n",
" ]\n",
")\n",
"\n",
"categorical_transformer = Pipeline(\n",
" [\n",
" (\"impute\", SimpleImputer(strategy=\"most_frequent\")),\n",
" (\"ohe\", OneHotEncoder(handle_unknown=\"ignore\", sparse=False)),\n",
" ]\n",
")\n",
"\n",
"preprocessor = ColumnTransformer(\n",
" transformers=[\n",
" (\"num\", numeric_transformer, selector(dtype_exclude=\"category\")),\n",
" (\"cat\", categorical_transformer, selector(dtype_include=\"category\")),\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, the preprocessing pipeline is defined, we can run it on our training data, and apply the generated transform to our test data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train = preprocessor.fit_transform(X_train)\n",
"X_test = preprocessor.transform(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"UnmitigatedModel\"></a>\n",
"## Training an Unmitigated Model\n",
"\n",
"So we have a point of comparison, we first train a model (specifically, logistic regression from scikit-learn) on the raw data, without applying any mitigation algorithm:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"unmitigated_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)\n",
"\n",
"unmitigated_predictor.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can view this model in the fairness dashboard, and see the disparities which appear:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"FairnessDashboard(sensitive_features=A_test,\n",
" y_true=y_test,\n",
" y_pred={\"unmitigated\": unmitigated_predictor.predict(X_test)})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the disparity in accuracy when we select 'Sex' as the sensitive feature, we see that males have an error rate about three times greater than the females. More interesting is the disparity in opportunitiy - males are offered loans at three times the rate of females.\n",
"\n",
"Despite the fact that we removed the feature from the training data, our predictor still discriminates based on sex. This demonstrates that simply ignoring a protected attribute when fitting a predictor rarely eliminates unfairness. There will generally be enough other features correlated with the removed attribute to lead to disparate impact."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"Mitigation\"></a>\n",
"## Mitigation with GridSearch\n",
"\n",
"The `GridSearch` class in `Fairlearn` implements a simplified version of the exponentiated gradient reduction of [Agarwal et al. 2018](https://arxiv.org/abs/1803.02453). The user supplies a standard ML estimator, which is treated as a blackbox - for this simple example, we shall use the logistic regression estimator from scikit-learn. `GridSearch` works by generating a sequence of relabellings and reweightings, and trains a predictor for each.\n",
"\n",
"For this example, we specify demographic parity (on the protected attribute of sex) as the fairness metric. Demographic parity requires that individuals are offered the opportunity (a loan in this example) independent of membership in the protected class (i.e., females and males should be offered loans at the same rate). *We are using this metric for the sake of simplicity* in this example; the appropriate fairness metric can only be selected after *careful examination of the broader context* in which the model is to be used."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sweep = GridSearch(LogisticRegression(solver='liblinear', fit_intercept=True),\n",
" constraints=DemographicParity(),\n",
" grid_size=71)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With our estimator created, we can fit it to the data. After `fit()` completes, we extract the full set of predictors from the `GridSearch` object.\n",
"\n",
"The following cell trains a many copies of the underlying estimator, and may take a minute or two to run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sweep.fit(X_train, y_train,\n",
" sensitive_features=A_train.sex)\n",
"\n",
"# For Fairlearn pre-v0.5.0, need sweep._predictors\n",
"predictors = sweep.predictors_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could load these predictors into the Fairness dashboard now. However, the plot would be somewhat confusing due to their number. In this case, we are going to remove the predictors which are dominated in the error-disparity space by others from the sweep (note that the disparity will only be calculated for the protected attribute; other potentially protected attributes will *not* be mitigated). In general, one might not want to do this, since there may be other considerations beyond the strict optimisation of error and disparity (of the given protected attribute)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"errors, disparities = [], []\n",
"for predictor in predictors:\n",
" error = ErrorRate()\n",
" error.load_data(X_train, pd.Series(y_train), sensitive_features=A_train.sex)\n",
" disparity = DemographicParity()\n",
" disparity.load_data(X_train, pd.Series(y_train), sensitive_features=A_train.sex)\n",
" \n",
" errors.append(error.gamma(predictor.predict)[0])\n",
" disparities.append(disparity.gamma(predictor.predict).max())\n",
" \n",
"all_results = pd.DataFrame( {\"predictor\": predictors, \"error\": errors, \"disparity\": disparities})\n",
"\n",
"dominant_models_dict = dict()\n",
"base_name_format = \"census_gs_model_{0}\"\n",
"row_id = 0\n",
"for row in all_results.itertuples():\n",
" model_name = base_name_format.format(row_id)\n",
" errors_for_lower_or_eq_disparity = all_results[\"error\"][all_results[\"disparity\"]<=row.disparity]\n",
" if row.error <= errors_for_lower_or_eq_disparity.min():\n",
" dominant_models_dict[model_name] = row.predictor\n",
" row_id = row_id + 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can construct predictions for the dominant models (we include the unmitigated predictor as well, for comparison):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictions_dominant = {\"census_unmitigated\": unmitigated_predictor.predict(X_test)}\n",
"models_dominant = {\"census_unmitigated\": unmitigated_predictor}\n",
"for name, predictor in dominant_models_dict.items():\n",
" value = predictor.predict(X_test)\n",
" predictions_dominant[name] = value\n",
" models_dominant[name] = predictor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These predictions may then be viewed in the fairness dashboard. We include the race column from the dataset, as an alternative basis for assessing the models. However, since we have not based our mitigation on it, the variation in the models with respect to race can be large."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"FairnessDashboard(sensitive_features=A_test, \n",
" y_true=y_test.tolist(),\n",
" y_pred=predictions_dominant)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using sex as the sensitive feature and accuracy as the metric, we see a Pareto front forming - the set of predictors which represent optimal tradeoffs between accuracy and disparity in predictions. In the ideal case, we would have a predictor at (1,0) - perfectly accurate and without any unfairness under demographic parity (with respect to the protected attribute \"sex\"). The Pareto front represents the closest we can come to this ideal based on our data and choice of estimator. Note the range of the axes - the disparity axis covers more values than the accuracy, so we can reduce disparity substantially for a small loss in accuracy. Finally, we also see that the unmitigated model is towards the top right of the plot, with high accuracy, but worst disparity.\n",
"\n",
"By clicking on individual models on the plot, we can inspect their metrics for disparity and accuracy in greater detail. In a real example, we would then pick the model which represented the best trade-off between accuracy and disparity given the relevant business constraints."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"AzureUpload\"></a>\n",
"## Uploading a Fairness Dashboard to Azure\n",
"\n",
"Uploading a fairness dashboard to Azure is a two stage process. The `FairnessDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. By default, the dashboard in Azure Machine Learning Studio also requires the models to be registered. The required stages are therefore:\n",
"1. Register the dominant models\n",
"1. Precompute all the required metrics\n",
"1. Upload to Azure\n",
"\n",
"Before that, we need to connect to Azure Machine Learning Studio:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Experiment, Model\n",
"\n",
"ws = Workspace.from_config()\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"RegisterModels\"></a>\n",
"### Registering Models\n",
"\n",
"The fairness dashboard is designed to integrate with registered models, so we need to do this for the models we want in the Studio portal. The assumption is that the names of the models specified in the dashboard dictionary correspond to the `id`s (i.e. `<name>:<version>` pairs) of registered models in the workspace. We register each of the models in the `models_dominant` dictionary into the workspace. For this, we have to save each model to a file, and then register that file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
"import os\n",
"\n",
"os.makedirs('models', exist_ok=True)\n",
"def register_model(name, model):\n",
" print(\"Registering \", name)\n",
" model_path = \"models/{0}.pkl\".format(name)\n",
" joblib.dump(value=model, filename=model_path)\n",
" registered_model = Model.register(model_path=model_path,\n",
" model_name=name,\n",
" workspace=ws)\n",
" print(\"Registered \", registered_model.id)\n",
" return registered_model.id\n",
"\n",
"model_name_id_mapping = dict()\n",
"for name, model in models_dominant.items():\n",
" m_id = register_model(name, model)\n",
" model_name_id_mapping[name] = m_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, produce new predictions dictionaries, with the updated names:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictions_dominant_ids = dict()\n",
"for name, y_pred in predictions_dominant.items():\n",
" predictions_dominant_ids[model_name_id_mapping[name]] = y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"PrecomputeMetrics\"></a>\n",
"### Precomputing Metrics\n",
"\n",
"We create a _dashboard dictionary_ using Fairlearn's `metrics` package. The `_create_group_metric_set` method has arguments similar to the Dashboard constructor, except that the sensitive features are passed as a dictionary (to ensure that names are available), and we must specify the type of prediction. Note that we use the `predictions_dominant_ids` dictionary we just created:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf = { 'sex': A_test.sex, 'race': A_test.race }\n",
"\n",
"from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
"\n",
"\n",
"dash_dict = _create_group_metric_set(y_true=y_test,\n",
" predictions=predictions_dominant_ids,\n",
" sensitive_features=sf,\n",
" prediction_type='binary_classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"DashboardUpload\"></a>\n",
"### Uploading the Dashboard\n",
"\n",
"Now, we import our `contrib` package which contains the routine to perform the upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can create an Experiment, then a Run, and upload our dashboard to it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, \"Test_Fairlearn_GridSearch_Census_Demo\")\n",
"print(exp)\n",
"\n",
"run = exp.start_logging()\n",
"try:\n",
" dashboard_title = \"Dominant Models from GridSearch\"\n",
" upload_id = upload_dashboard_dictionary(run,\n",
" dash_dict,\n",
" dashboard_name=dashboard_title)\n",
" print(\"\\nUploaded to id: {0}\\n\".format(upload_id))\n",
"\n",
" downloaded_dict = download_dashboard_by_upload_id(run, upload_id)\n",
"finally:\n",
" run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dashboard can be viewed in the Run Details page.\n",
"\n",
"Finally, we can verify that the dashboard dictionary which we downloaded matches our upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(dash_dict == downloaded_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"Conclusion\"></a>\n",
"## Conclusion\n",
"\n",
"In this notebook we have demonstrated how to use the `GridSearch` algorithm from Fairlearn to generate a collection of models, and then present them in the fairness dashboard in Azure Machine Learning Studio. Please remember that this notebook has not attempted to discuss the many considerations which should be part of any approach to unfairness mitigation. The [Fairlearn website](http://fairlearn.org/) provides that discussion"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "riedgar"
}
],
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,13 +0,0 @@
name: fairlearn-azureml-mitigation
dependencies:
- pip:
- azureml-sdk
- azureml-contrib-fairness
- fairlearn>=0.6.2,<=0.7.0
- joblib
- liac-arff
- raiwidgets~=0.33.0
- itsdangerous==2.0.1
- markupsafe<2.1.0
- protobuf==3.20.0
- numpy<1.24.0

View File

@@ -1,111 +0,0 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
"""Utilities for azureml-contrib-fairness notebooks."""
import arff
from collections import OrderedDict
from contextlib import closing
import gzip
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.utils import Bunch
import time
def fetch_openml_with_retries(data_id, max_retries=4, retry_delay=60):
"""Fetch a given dataset from OpenML with retries as specified."""
for i in range(max_retries):
try:
print("Download attempt {0} of {1}".format(i + 1, max_retries))
data = fetch_openml(data_id=data_id, as_frame=True)
break
except Exception as e: # noqa: B902
print("Download attempt failed with exception:")
print(e)
if i + 1 != max_retries:
print("Will retry after {0} seconds".format(retry_delay))
time.sleep(retry_delay)
retry_delay = retry_delay * 2
else:
raise RuntimeError("Unable to download dataset from OpenML")
return data
_categorical_columns = [
'workclass',
'education',
'marital-status',
'occupation',
'relationship',
'race',
'sex',
'native-country'
]
def fetch_census_dataset():
"""Fetch the Adult Census Dataset.
This uses a particular URL for the Adult Census dataset. The code
is a simplified version of fetch_openml() in sklearn.
The data are copied from:
https://openml.org/data/v1/download/1595261.gz
(as of 2021-03-31)
"""
try:
from urllib import urlretrieve
except ImportError:
from urllib.request import urlretrieve
filename = "1595261.gz"
data_url = "https://rainotebookscdn.blob.core.windows.net/datasets/"
remaining_attempts = 5
sleep_duration = 10
while remaining_attempts > 0:
try:
urlretrieve(data_url + filename, filename)
http_stream = gzip.GzipFile(filename=filename, mode='rb')
with closing(http_stream):
def _stream_generator(response):
for line in response:
yield line.decode('utf-8')
stream = _stream_generator(http_stream)
data = arff.load(stream)
except Exception as exc: # noqa: B902
remaining_attempts -= 1
print("Error downloading dataset from {} ({} attempt(s) remaining)"
.format(data_url, remaining_attempts))
print(exc)
time.sleep(sleep_duration)
sleep_duration *= 2
continue
else:
# dataset successfully downloaded
break
else:
raise Exception("Could not retrieve dataset from {}.".format(data_url))
attributes = OrderedDict(data['attributes'])
arff_columns = list(attributes)
raw_df = pd.DataFrame(data=data['data'], columns=arff_columns)
target_column_name = 'class'
target = raw_df.pop(target_column_name)
for col_name in _categorical_columns:
dtype = pd.api.types.CategoricalDtype(attributes[col_name])
raw_df[col_name] = raw_df[col_name].astype(dtype, copy=False)
result = Bunch()
result.data = raw_df
result.target = target
return result

View File

@@ -1,545 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/fairness/upload-fairness-dashboard.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Upload a Fairness Dashboard to Azure Machine Learning Studio\n",
"**This notebook shows how to generate and upload a fairness assessment dashboard from Fairlearn to AzureML Studio**\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Loading the Data](#LoadingData)\n",
"1. [Processing the Data](#ProcessingData)\n",
"1. [Training Models](#TrainingModels)\n",
"1. [Logging in to AzureML](#LoginAzureML)\n",
"1. [Registering the Models](#RegisterModels)\n",
"1. [Using the Fairness Dashboard](#LocalDashboard)\n",
"1. [Uploading a Fairness Dashboard to Azure](#AzureUpload)\n",
" 1. Computing Fairness Metrics\n",
" 1. Uploading to Azure\n",
"1. [Conclusion](#Conclusion)\n",
" \n",
"\n",
"<a id=\"Introduction\"></a>\n",
"## Introduction\n",
"\n",
"In this notebook, we walk through a simple example of using the `azureml-contrib-fairness` package to upload a collection of fairness statistics for a fairness dashboard. It is an example of integrating the [open source Fairlearn package](https://www.github.com/fairlearn/fairlearn) with Azure Machine Learning. This is not an example of fairness analysis or mitigation - this notebook simply shows how to get a fairness dashboard into the Azure Machine Learning portal. We will load the data and train a couple of simple models. We will then use Fairlearn to generate data for a Fairness dashboard, which we can upload to Azure Machine Learning portal and view there.\n",
"\n",
"### Setup\n",
"\n",
"To use this notebook, an Azure Machine Learning workspace is required.\n",
"Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
"This notebook also requires the following packages:\n",
"* `azureml-contrib-fairness`\n",
"* `fairlearn>=0.6.2` (also works for pre-v0.5.0 with slight modifications)\n",
"* `joblib`\n",
"* `liac-arff`\n",
"* `raiwidgets`\n",
"\n",
"Fairlearn relies on features introduced in v0.22.1 of `scikit-learn`. If you have an older version already installed, please uncomment and run the following cell:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# !pip install --upgrade scikit-learn>=0.22.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, please ensure that when you downloaded this notebook, you also downloaded the `fairness_nb_utils.py` file from the same location, and placed it in the same directory as this notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LoadingData\"></a>\n",
"## Loading the Data\n",
"We use the well-known `adult` census dataset, which we fetch from the OpenML website. We start with a fairly unremarkable set of imports:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import svm\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn.compose import make_column_selector as selector\n",
"from sklearn.pipeline import Pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can load the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairness_nb_utils import fetch_census_dataset\n",
"\n",
"data = fetch_census_dataset()\n",
" \n",
"# Extract the items we want\n",
"X_raw = data.data\n",
"y = (data.target == '>50K') * 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can take a look at some of the data. For example, the next cells shows the counts of the different races identified in the dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(X_raw[\"race\"].value_counts().to_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ProcessingData\"></a>\n",
"## Processing the Data\n",
"\n",
"With the data loaded, we process it for our needs. First, we extract the sensitive features of interest into `A` (conventionally used in the literature) and leave the rest of the feature data in `X_raw`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A = X_raw[['sex','race']]\n",
"X_raw = X_raw.drop(labels=['sex', 'race'],axis = 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now preprocess our data. To avoid the problem of data leakage, we split our data into training and test sets before performing any other transformations. Subsequent transformations (such as scalings) will be fit to the training data set, and then applied to the test dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"(X_train, X_test, y_train, y_test, A_train, A_test) = train_test_split(\n",
" X_raw, y, A, test_size=0.3, random_state=12345, stratify=y\n",
")\n",
"\n",
"# Ensure indices are aligned between X, y and A,\n",
"# after all the slicing and splitting of DataFrames\n",
"# and Series\n",
"\n",
"X_train = X_train.reset_index(drop=True)\n",
"X_test = X_test.reset_index(drop=True)\n",
"y_train = y_train.reset_index(drop=True)\n",
"y_test = y_test.reset_index(drop=True)\n",
"A_train = A_train.reset_index(drop=True)\n",
"A_test = A_test.reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have two types of column in the dataset - categorical columns which will need to be one-hot encoded, and numeric ones which will need to be rescaled. We also need to take care of missing values. We use a simple approach here, but please bear in mind that this is another way that bias could be introduced (especially if one subgroup tends to have more missing values).\n",
"\n",
"For this preprocessing, we make use of `Pipeline` objects from `sklearn`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numeric_transformer = Pipeline(\n",
" steps=[\n",
" (\"impute\", SimpleImputer()),\n",
" (\"scaler\", StandardScaler()),\n",
" ]\n",
")\n",
"\n",
"categorical_transformer = Pipeline(\n",
" [\n",
" (\"impute\", SimpleImputer(strategy=\"most_frequent\")),\n",
" (\"ohe\", OneHotEncoder(handle_unknown=\"ignore\", sparse=False)),\n",
" ]\n",
")\n",
"\n",
"preprocessor = ColumnTransformer(\n",
" transformers=[\n",
" (\"num\", numeric_transformer, selector(dtype_exclude=\"category\")),\n",
" (\"cat\", categorical_transformer, selector(dtype_include=\"category\")),\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, the preprocessing pipeline is defined, we can run it on our training data, and apply the generated transform to our test data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train = preprocessor.fit_transform(X_train)\n",
"X_test = preprocessor.transform(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"TrainingModels\"></a>\n",
"## Training Models\n",
"\n",
"We now train a couple of different models on our data. The `adult` census dataset is a classification problem - the goal is to predict whether a particular individual exceeds an income threshold. For the purpose of generating a dashboard to upload, it is sufficient to train two basic classifiers. First, a logistic regression classifier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lr_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)\n",
"\n",
"lr_predictor.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And for comparison, a support vector classifier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"svm_predictor = svm.SVC()\n",
"\n",
"svm_predictor.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LoginAzureML\"></a>\n",
"## Logging in to AzureML\n",
"\n",
"With our two classifiers trained, we can log into our AzureML workspace:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Experiment, Model\n",
"\n",
"ws = Workspace.from_config()\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"RegisterModels\"></a>\n",
"## Registering the Models\n",
"\n",
"Next, we register our models. By default, the subroutine which uploads the models checks that the names provided correspond to registered models in the workspace. We define a utility routine to do the registering:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
"import os\n",
"\n",
"os.makedirs('models', exist_ok=True)\n",
"def register_model(name, model):\n",
" print(\"Registering \", name)\n",
" model_path = \"models/{0}.pkl\".format(name)\n",
" joblib.dump(value=model, filename=model_path)\n",
" registered_model = Model.register(model_path=model_path,\n",
" model_name=name,\n",
" workspace=ws)\n",
" print(\"Registered \", registered_model.id)\n",
" return registered_model.id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we register the models. For convenience in subsequent method calls, we store the results in a dictionary, which maps the `id` of the registered model (a string in `name:version` format) to the predictor itself:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_dict = {}\n",
"\n",
"lr_reg_id = register_model(\"fairness_linear_regression\", lr_predictor)\n",
"model_dict[lr_reg_id] = lr_predictor\n",
"svm_reg_id = register_model(\"fairness_svm\", svm_predictor)\n",
"model_dict[svm_reg_id] = svm_predictor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LocalDashboard\"></a>\n",
"## Using the Fairlearn Dashboard\n",
"\n",
"We can now examine the fairness of the two models we have training, both as a function of race and (binary) sex. Before uploading the dashboard to the AzureML portal, we will first instantiate a local instance of the Fairlearn dashboard.\n",
"\n",
"Regardless of the viewing location, the dashboard is based on three things - the true values, the model predictions and the sensitive feature values. The dashboard can use predictions from multiple models and multiple sensitive features if desired (as we are doing here).\n",
"\n",
"Our first step is to generate a dictionary mapping the `id` of the registered model to the corresponding array of predictions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ys_pred = {}\n",
"for n, p in model_dict.items():\n",
" ys_pred[n] = p.predict(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can examine these predictions in a locally invoked Fairlearn dashboard. This can be compared to the dashboard uploaded to the portal (in the next section):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import FairnessDashboard\n",
"\n",
"FairnessDashboard(sensitive_features=A_test, \n",
" y_true=y_test.tolist(),\n",
" y_pred=ys_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"AzureUpload\"></a>\n",
"## Uploading a Fairness Dashboard to Azure\n",
"\n",
"Uploading a fairness dashboard to Azure is a two stage process. The `FairnessDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. The required stages are therefore:\n",
"1. Precompute all the required metrics\n",
"1. Upload to Azure\n",
"\n",
"\n",
"### Computing Fairness Metrics\n",
"We use Fairlearn to create a dictionary which contains all the data required to display a dashboard. This includes both the raw data (true values, predicted values and sensitive features), and also the fairness metrics. The API is similar to that used to invoke the Dashboard locally. However, there are a few minor changes to the API, and the type of problem being examined (binary classification, regression etc.) needs to be specified explicitly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf = { 'Race': A_test.race, 'Sex': A_test.sex }\n",
"\n",
"from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
"\n",
"dash_dict = _create_group_metric_set(y_true=y_test,\n",
" predictions=ys_pred,\n",
" sensitive_features=sf,\n",
" prediction_type='binary_classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `_create_group_metric_set()` method is currently underscored since its exact design is not yet final in Fairlearn."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Uploading to Azure\n",
"\n",
"We can now import the `azureml.contrib.fairness` package itself. We will round-trip the data, so there are two required subroutines:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can upload the generated dictionary to AzureML. The upload method requires a run, so we first create an experiment and a run. The uploaded dashboard can be seen on the corresponding Run Details page in AzureML Studio. For completeness, we also download the dashboard dictionary which we uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, \"notebook-01\")\n",
"print(exp)\n",
"\n",
"run = exp.start_logging()\n",
"try:\n",
" dashboard_title = \"Sample notebook upload\"\n",
" upload_id = upload_dashboard_dictionary(run,\n",
" dash_dict,\n",
" dashboard_name=dashboard_title)\n",
" print(\"\\nUploaded to id: {0}\\n\".format(upload_id))\n",
"\n",
" downloaded_dict = download_dashboard_by_upload_id(run, upload_id)\n",
"finally:\n",
" run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can verify that the dashboard dictionary which we downloaded matches our upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(dash_dict == downloaded_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"Conclusion\"></a>\n",
"## Conclusion\n",
"\n",
"In this notebook we have demonstrated how to generate and upload a fairness dashboard to AzureML Studio. We have not discussed how to analyse the results and apply mitigations. Those topics will be covered elsewhere."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "riedgar"
}
],
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -1,13 +0,0 @@
name: upload-fairness-dashboard
dependencies:
- pip:
- azureml-sdk
- azureml-contrib-fairness
- fairlearn>=0.6.2,<=0.7.0
- joblib
- liac-arff
- raiwidgets~=0.33.0
- itsdangerous==2.0.1
- markupsafe<2.1.0
- protobuf==3.20.0
- numpy<1.24.0

View File

@@ -14,14 +14,13 @@ dependencies:
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.56.0
- azureml-defaults~=1.56.0
- -r https://automlsdkdataresources.blob.core.windows.net/validated-requirements/1.56.0/validated_win32_requirements.txt [--no-deps]
- azureml-widgets~=1.59.0
- azureml-defaults~=1.59.0
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.59.0/validated_win32_requirements.txt [--no-deps]
- matplotlib==3.7.1
- xgboost==1.5.2
- prophet==1.1.4
- pandas==1.3.5
- cmdstanpy==1.1.0
- onnx==1.16.1
- setuptools-git==1.2
- spacy==3.4.4
- https://aka.ms/automl-resources/packages/en_core_web_sm-3.4.1.tar.gz
- spacy==3.7.4
- https://aka.ms/automl-resources/packages/en_core_web_sm-3.7.1.tar.gz

View File

@@ -12,7 +12,7 @@ dependencies:
- numpy>=1.21.6,<=1.23.5
- urllib3==1.26.7
- scipy==1.10.1
- scikit-learn==1.1.3
- scikit-learn==1.5.1
- holidays==0.29
- pytorch::pytorch=1.11.0
- cudatoolkit=10.1.243
@@ -20,11 +20,11 @@ dependencies:
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.56.0
- azureml-defaults~=1.56.0
- azureml-widgets~=1.59.0
- azureml-defaults~=1.59.0
- pytorch-transformers==1.0.0
- spacy==3.4.4
- spacy==3.7.4
- xgboost==1.5.2
- prophet==1.1.4
- https://aka.ms/automl-resources/packages/en_core_web_sm-3.4.1.tar.gz
- -r https://automlsdkdataresources.blob.core.windows.net/validated-requirements/1.56.0/validated_linux_requirements.txt [--no-deps]
- https://aka.ms/automl-resources/packages/en_core_web_sm-3.7.1.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.59.0/validated_linux_requirements.txt [--no-deps]

View File

@@ -10,17 +10,17 @@ dependencies:
- python>=3.10,<3.11
- numpy>=1.21.6,<=1.23.5
- scipy==1.10.1
- scikit-learn==1.1.3
- scikit-learn==1.5.1
- holidays==0.29
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.56.0
- azureml-defaults~=1.56.0
- azureml-widgets~=1.59.0
- azureml-defaults~=1.59.0
- pytorch-transformers==1.0.0
- prophet==1.1.4
- xgboost==1.5.2
- spacy==3.4.4
- spacy==3.7.4
- matplotlib==3.7.1
- https://aka.ms/automl-resources/packages/en_core_web_sm-3.4.1.tar.gz
- -r https://automlsdkdataresources.blob.core.windows.net/validated-requirements/1.56.0/validated_darwin_requirements.txt [--no-deps]
- https://aka.ms/automl-resources/packages/en_core_web_sm-3.7.1.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.59.0/validated_darwin_requirements.txt [--no-deps]

View File

@@ -93,7 +93,8 @@
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.interpret import ExplanationClient"
"from azureml.interpret import ExplanationClient\n",
"from azureml.data.datapath import DataPath"
]
},
{
@@ -266,10 +267,12 @@
"pd.DataFrame(data).to_csv(\"data/train_data.csv\", index=False)\n",
"\n",
"ds = ws.get_default_datastore()\n",
"ds.upload(\n",
" src_dir=\"./data\", target_path=\"bankmarketing\", overwrite=True, show_progress=True\n",
"target = DataPath(\n",
" datastore=ds, path_on_datastore=\"bankmarketing/train_data.csv\", name=\"bankmarketing\"\n",
")\n",
"Dataset.File.upload_directory(\n",
" src_dir=\"./data\", target=target, overwrite=True, show_progress=True\n",
")\n",
"\n",
"\n",
"# Upload the training data as a tabular dataset for access during training on remote compute\n",
"train_data = Dataset.Tabular.from_delimited_files(\n",
@@ -1090,7 +1093,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
"version": "3.10.14"
},
"nteract": {
"version": "nteract-front-end@1.0.0"
@@ -1104,5 +1107,5 @@
"task": "Classification"
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}

View File

@@ -97,7 +97,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.56.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.59.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -97,7 +97,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.56.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.59.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -1,420 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/experimental/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Classification of credit card fraudulent transactions on local managed compute **_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n",
"1. [Acknowledgements](#Acknowledgements)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"In this example we use the associated credit card dataset to showcase how you can use AutoML for a simple classification problem. The goal is to predict if a credit card transaction is considered a fraudulent charge.\n",
"\n",
"This notebook is using local managed compute to train the model.\n",
"\n",
"If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an experiment using an existing workspace.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local managed compute.\n",
"4. Explore the results.\n",
"5. Test the fitted model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.compute_target import LocalTarget\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.56.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-local-managed'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', None)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Determine if local docker is configured for Linux images\n",
"\n",
"Local managed runs will leverage a Linux docker container to submit the run to. Due to this, the docker needs to be configured to use Linux containers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check if Docker is installed and Linux containers are enabled\n",
"import subprocess\n",
"from subprocess import CalledProcessError\n",
"try:\n",
" assert subprocess.run(\"docker -v\", shell=True).returncode == 0, 'Local Managed runs require docker to be installed.'\n",
" out = subprocess.check_output(\"docker system info\", shell=True).decode('ascii')\n",
" assert \"OSType: linux\" in out, 'Docker engine needs to be configured to use Linux containers.' \\\n",
" 'https://docs.docker.com/docker-for-windows/#switch-between-windows-and-linux-containers'\n",
"except CalledProcessError as ex:\n",
" raise Exception('Local Managed runs require docker to be installed.') from ex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Data\n",
"\n",
"Load the credit card dataset from a csv file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we'll split the data using random_split and extract the training data for the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"training_data, validation_data = dataset.random_split(percentage=0.8, seed=223)\n",
"label_column_name = 'Class'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**enable_early_stopping**|Stop the run if the metric score is not showing improvement.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**training_data**|Input dataset, containing both features and label column.|\n",
"|**label_column_name**|The name of the label column.|\n",
"|**enable_local_managed**|Enable the experimental local-managed scenario.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'average_precision_score_weighted',\n",
" \"enable_early_stopping\": True,\n",
" \"experiment_timeout_hours\": 0.3, #for real scenarios we recommend a timeout of at least one hour \n",
" \"verbosity\": logging.INFO,\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" compute_target = LocalTarget(),\n",
" enable_local_managed = True,\n",
" training_data = training_data,\n",
" label_column_name = label_column_name,\n",
" **automl_settings\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations this can run for a while. Validation errors and current status will be shown when setting `show_output=True` and the execution will be synchronous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"parent_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you need to retrieve a run that already started, use the following code\n",
"#from azureml.train.automl.run import AutoMLRun\n",
"#parent_run = AutoMLRun(experiment = experiment, run_id = '<replace with your run id>')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"parent_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Explain model\n",
"\n",
"Automated ML models can be explained and visualized using the SDK Explainability library. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyze results\n",
"\n",
"### Retrieve the Best Child Run\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_best_child` method returns the best run. Overloads on `get_best_child` allow you to retrieve the best run for *any* logged metric."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run = parent_run.get_best_child()\n"
]
},
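{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted under \"Explain model\" above, AutoML models can be explained with the SDK Explainability library. The cell below is a minimal sketch, assuming model explainability was enabled for this experiment and a global explanation was uploaded for the best run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: download the global explanation attached to the best run\n",
"# (assumes explainability was enabled and an explanation has been uploaded).\n",
"from azureml.interpret import ExplanationClient\n",
"\n",
"client = ExplanationClient.from_run(best_run)\n",
"global_explanation = client.download_model_explanation()\n",
"print(global_explanation.get_feature_importance_dict())"
]
},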
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test the fitted model\n",
"\n",
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_test_df = validation_data.drop_columns(columns=[label_column_name])\n",
"y_test_df = validation_data.keep_columns(columns=[label_column_name], validate=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Creating ModelProxy for submitting prediction runs to the training environment.\n",
"We will create a ModelProxy for the best child run, which will allow us to submit a run that does the prediction in the training environment. Unlike the local client, which can have different versions of some libraries, the training environment will have all the compatible libraries for the model already."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.model_proxy import ModelProxy\n",
"best_model_proxy = ModelProxy(best_run)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# call the predict functions on the model proxy\n",
"y_pred = best_model_proxy.predict(X_test_df).to_pandas_dataframe()\n",
"y_pred"
]
},
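{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small evaluation sketch: compare the proxy predictions against the held-out labels with scikit-learn, assuming the prediction frame and the label frame align row-for-row. Both are flattened to 1-D arrays before scoring."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Evaluation sketch: confusion matrix and per-class metrics for the proxy predictions.\n",
"from sklearn.metrics import confusion_matrix, classification_report\n",
"\n",
"y_true = y_test_df.to_pandas_dataframe().values.ravel()\n",
"y_hat = y_pred.values.ravel()\n",
"\n",
"print(confusion_matrix(y_true, y_hat))\n",
"print(classification_report(y_true, y_hat))"
]
},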
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Acknowledgements"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This Credit Card fraud Detection dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/ and is available at: https://www.kaggle.com/mlg-ulb/creditcardfraud\n",
"\n",
"\n",
"The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Universit\u00c3\u0192\u00c2\u00a9 Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on https://www.researchgate.net and the page of the DefeatFraud project\n",
"Please cite the following works: \n",
"\u00c3\u00a2\u00e2\u201a\u00ac\u00c2\u00a2\tAndrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015\n",
"\u00c3\u00a2\u00e2\u201a\u00ac\u00c2\u00a2\tDal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon\n",
"\u00c3\u00a2\u00e2\u201a\u00ac\u00c2\u00a2\tDal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE\n",
"o\tDal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)\n",
"\u00c3\u00a2\u00e2\u201a\u00ac\u00c2\u00a2\tCarcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-A\u00c3\u0192\u00c2\u00abl; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier\n",
"\u00c3\u00a2\u00e2\u201a\u00ac\u00c2\u00a2\tCarcillo, Fabrizio; Le Borgne, Yann-A\u00c3\u0192\u00c2\u00abl; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing"
]
}
],
"metadata": {
"authors": [
{
"name": "sekrupa"
}
],
"category": "tutorial",
"compute": [
"AML Compute"
],
"datasets": [
"Creditcard"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"file_extension": ".py",
"framework": [
"None"
],
"friendly_name": "Classification of credit card fraudulent transactions using Automated ML",
"index_order": 5,
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"tags": [
"AutomatedML"
],
"task": "Classification",
"version": "3.6.7"
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -91,7 +91,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.56.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.59.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -366,7 +366,7 @@
"USE_CURATED_ENV = True\n",
"if USE_CURATED_ENV:\n",
" curated_environment = Environment.get(\n",
" workspace=ws, name=\"AzureML-sklearn-0.24-ubuntu18.04-py37-cpu\"\n",
" workspace=ws, name=\"AzureML-sklearn-1.5\"\n",
" )\n",
" aml_run_config.environment = curated_environment\n",
"else:\n",

View File

@@ -53,7 +53,7 @@
"\n",
"We will showcase one of the tabular data explainers: TabularExplainer (SHAP).\n",
"\n",
"Problem: Boston Housing Price Prediction with scikit-learn (train a model and run an explainer remotely via AMLCompute, and download and visualize the remotely-calculated explanations.)\n",
"Problem: Housing Price Prediction with scikit-learn (train a model and run an explainer remotely via AMLCompute, and download and visualize the remotely-calculated explanations.)\n",
"\n",
"| ![explanations-run-history](./img/explanations-run-history.png) |\n",
"|:--:|\n"
@@ -429,8 +429,8 @@
"outputs": [],
"source": [
"# Retrieve x_test for visualization\n",
"x_test_path = './x_test_boston_housing.pkl'\n",
"run.download_file('x_test_boston_housing.pkl', output_file_path=x_test_path)"
"x_test_path = './x_test_california_housing.pkl'\n",
"run.download_file('x_test_california_housing.pkl', output_file_path=x_test_path)"
]
},
{
@@ -439,7 +439,7 @@
"metadata": {},
"outputs": [],
"source": [
"x_test = joblib.load('x_test_boston_housing.pkl')"
"x_test = joblib.load('x_test_california_housing.pkl')"
]
},
{

View File

@@ -1,7 +1,7 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
from sklearn import datasets
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge
from interpret.ext.blackbox import TabularExplainer
from azureml.interpret import ExplanationClient
@@ -14,20 +14,20 @@ import numpy as np
OUTPUT_DIR = './outputs/'
os.makedirs(OUTPUT_DIR, exist_ok=True)
boston_data = datasets.load_boston()
california_data = fetch_california_housing()
run = Run.get_context()
client = ExplanationClient.from_run(run)
X_train, X_test, y_train, y_test = train_test_split(boston_data.data,
boston_data.target,
X_train, X_test, y_train, y_test = train_test_split(california_data.data,
california_data.target,
test_size=0.2,
random_state=0)
# write x_test out as a pickle file for later visualization
x_test_pkl = 'x_test.pkl'
with open(x_test_pkl, 'wb') as file:
joblib.dump(value=X_test, filename=os.path.join(OUTPUT_DIR, x_test_pkl))
run.upload_file('x_test_boston_housing.pkl', os.path.join(OUTPUT_DIR, x_test_pkl))
run.upload_file('x_test_california_housing.pkl', os.path.join(OUTPUT_DIR, x_test_pkl))
alpha = 0.5
@@ -50,7 +50,7 @@ original_model = run.register_model(model_name='model_explain_model_on_amlcomp',
model_path='original_model.pkl')
# Explain predictions on your local machine
tabular_explainer = TabularExplainer(model, X_train, features=boston_data.feature_names)
tabular_explainer = TabularExplainer(model, X_train, features=california_data.feature_names)
# Explain overall model predictions (global explanation)
# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data
@@ -60,5 +60,5 @@ global_explanation = tabular_explainer.explain_global(X_test)
# Uploading model explanation data for storage or visualization in webUX
# The explanation can then be downloaded on any compute
comment = 'Global explanation on regression model trained on boston dataset'
comment = 'Global explanation on regression model trained on california dataset'
client.upload_model_explanation(global_explanation, comment=comment, model_id=original_model.id)

View File

@@ -125,29 +125,29 @@
},
"outputs": [],
"source": [
"from azureml.exceptions import UserErrorException\n",
"\n",
"blob_datastore_name='MyBlobDatastore'\n",
"account_name=os.getenv(\"BLOB_ACCOUNTNAME_62\", \"<my-account-name>\") # Storage account name\n",
"container_name=os.getenv(\"BLOB_CONTAINER_62\", \"<my-container-name>\") # Name of Azure blob container\n",
"account_key=os.getenv(\"BLOB_ACCOUNT_KEY_62\", \"<my-account-key>\") # Storage account key\n",
"\n",
"try:\n",
" blob_datastore = Datastore.get(ws, blob_datastore_name)\n",
" print(\"Found Blob Datastore with name: %s\" % blob_datastore_name)\n",
"except UserErrorException:\n",
" blob_datastore = Datastore.register_azure_blob_container(\n",
" workspace=ws,\n",
" datastore_name=blob_datastore_name,\n",
" account_name=account_name, # Storage account name\n",
" container_name=container_name, # Name of Azure blob container\n",
" account_key=account_key) # Storage account key\n",
" print(\"Registered blob datastore with name: %s\" % blob_datastore_name)\n",
"\n",
"blob_data_ref = DataReference(\n",
" datastore=blob_datastore,\n",
" data_reference_name=\"blob_test_data\",\n",
" path_on_datastore=\"testdata\")"
"# from azureml.exceptions import UserErrorException\n",
"#\n",
"# blob_datastore_name='MyBlobDatastore'\n",
"# account_name=os.getenv(\"BLOB_ACCOUNTNAME_62\", \"<my-account-name>\") # Storage account name\n",
"# container_name=os.getenv(\"BLOB_CONTAINER_62\", \"<my-container-name>\") # Name of Azure blob container\n",
"# account_key=os.getenv(\"BLOB_ACCOUNT_KEY_62\", \"<my-account-key>\") # Storage account key\n",
"#\n",
"# try:\n",
"# blob_datastore = Datastore.get(ws, blob_datastore_name)\n",
"# print(\"Found Blob Datastore with name: %s\" % blob_datastore_name)\n",
"# except UserErrorException:\n",
"# blob_datastore = Datastore.register_azure_blob_container(\n",
"# workspace=ws,\n",
"# datastore_name=blob_datastore_name,\n",
"# account_name=account_name, # Storage account name\n",
"# container_name=container_name, # Name of Azure blob container\n",
"# account_key=account_key) # Storage account key\n",
"# print(\"Registered blob datastore with name: %s\" % blob_datastore_name)\n",
"#\n",
"# blob_data_ref = DataReference(\n",
"# datastore=blob_datastore,\n",
"# data_reference_name=\"blob_test_data\",\n",
"# path_on_datastore=\"testdata\")"
]
},
{
@@ -341,24 +341,24 @@
"metadata": {},
"outputs": [],
"source": [
"data_factory_name = 'adftest'\n",
"\n",
"def get_or_create_data_factory(workspace, factory_name):\n",
" try:\n",
" return DataFactoryCompute(workspace, factory_name)\n",
" except ComputeTargetException as e:\n",
" if 'ComputeTargetNotFound' in e.message:\n",
" print('Data factory not found, creating...')\n",
" provisioning_config = DataFactoryCompute.provisioning_configuration()\n",
" data_factory = ComputeTarget.create(workspace, factory_name, provisioning_config)\n",
" data_factory.wait_for_completion()\n",
" return data_factory\n",
" else:\n",
" raise e\n",
" \n",
"data_factory_compute = get_or_create_data_factory(ws, data_factory_name)\n",
"\n",
"print(\"Setup Azure Data Factory account complete\")"
"# data_factory_name = 'adftest'\n",
"#\n",
"# def get_or_create_data_factory(workspace, factory_name):\n",
"# try:\n",
"# return DataFactoryCompute(workspace, factory_name)\n",
"# except ComputeTargetException as e:\n",
"# if 'ComputeTargetNotFound' in e.message:\n",
"# print('Data factory not found, creating...')\n",
"# provisioning_config = DataFactoryCompute.provisioning_configuration()\n",
"# data_factory = ComputeTarget.create(workspace, factory_name, provisioning_config)\n",
"# data_factory.wait_for_completion()\n",
"# return data_factory\n",
"# else:\n",
"# raise e\n",
"#\n",
"# data_factory_compute = get_or_create_data_factory(ws, data_factory_name)\n",
"#\n",
"# print(\"Setup Azure Data Factory account complete\")"
]
},
{
@@ -392,19 +392,21 @@
"metadata": {},
"outputs": [],
"source": [
"# TODO: 3012801 - Use ADLS Gen2 datastore.\n",
"blob_data_ref2 = DataReference(\n",
" datastore=blob_datastore,\n",
" data_reference_name=\"blob_test_data2\",\n",
" path_on_datastore=\"testdata2\")\n",
"\n",
"transfer_adls_to_blob = DataTransferStep(\n",
" name=\"transfer_adls_to_blob\",\n",
" source_data_reference=blob_data_ref,\n",
" destination_data_reference=blob_data_ref2,\n",
" compute_target=data_factory_compute)\n",
"\n",
"print(\"Data transfer step created\")"
"# # TODO: 3012801 - Use ADLS Gen2 datastore.\n",
"# blob_data_ref2 = DataReference(\n",
"# datastore=blob_datastore,\n",
"# data_reference_name=\"blob_test_data2\",\n",
"# path_on_datastore=\"testdata2\")\n",
"#\n",
"# transfer_adls_to_blob = DataTransferStep(\n",
"# name=\"transfer_adls_to_blob\",\n",
"# source_data_reference=blob_data_ref,\n",
"# destination_data_reference=blob_data_ref2,\n",
"# compute_target=data_factory_compute,\n",
"# source_reference_type='file',\n",
"# destination_reference_type=\"file\")\n",
"#\n",
"# print(\"Data transfer step created\")"
]
},
{
@@ -455,13 +457,13 @@
"metadata": {},
"outputs": [],
"source": [
"pipeline_01 = Pipeline(\n",
" description=\"data_transfer_01\",\n",
" workspace=ws,\n",
" steps=[transfer_adls_to_blob])\n",
"\n",
"pipeline_run_01 = Experiment(ws, \"Data_Transfer_example_01\").submit(pipeline_01)\n",
"pipeline_run_01.wait_for_completion()"
"# pipeline_01 = Pipeline(\n",
"# description=\"data_transfer_01\",\n",
"# workspace=ws,\n",
"# steps=[transfer_adls_to_blob])\n",
"#\n",
"# pipeline_run_01 = Experiment(ws, \"Data_Transfer_example_01\").submit(pipeline_01)\n",
"# pipeline_run_01.wait_for_completion()"
]
},
{
@@ -492,8 +494,8 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run_01).show()"
"# from azureml.widgets import RunDetails\n",
"# RunDetails(pipeline_run_01).show()"
]
},
{

View File

@@ -292,7 +292,7 @@
"metadata": {},
"outputs": [],
"source": [
"tf_env = Environment.get(ws, name='AzureML-tensorflow-2.12-cuda11')"
"tf_env = Environment.get(ws, name='AzureML-tensorflow-2.16-cuda12')"
]
},
{

View File

@@ -178,7 +178,7 @@ os.makedirs('./outputs/model', exist_ok=True)
# files saved in the "./outputs" folder are automatically uploaded into run history
# this is workaround for https://github.com/tensorflow/tensorflow/issues/33913 and will be fixed once we move to >tf2.1
neural_net._set_inputs(X_train)
# neural_net._set_inputs(X_train)
tf.saved_model.save(neural_net, './outputs/model/')
stop_time = time.perf_counter()

View File

@@ -1,753 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer-parallel-run.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Neural style transfer on video\n",
"Using modified code from `pytorch`'s neural style [example](https://pytorch.org/tutorials/advanced/neural_style_tutorial.html), we show how to setup a pipeline for doing style transfer on video. The pipeline has following steps:\n",
"1. Split a video into images\n",
"2. Run neural style on each image using one of the provided models (from `pytorch` pretrained models for this example).\n",
"3. Stitch the image back into a video.\n",
"\n",
"> **Tip**\n",
"If your system requires low-latency processing (to process a single document or small set of documents quickly), use [real-time scoring](https://docs.microsoft.com/en-us/azure/machine-learning/v1/how-to-consume-web-service) instead of batch prediction."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Experiment\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"from azureml.core import Datastore, Dataset\n",
"from azureml.pipeline.core import Pipeline\n",
"from azureml.pipeline.steps import PythonScriptStep\n",
"from azureml.core.runconfig import CondaDependencies, RunConfiguration\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"from azureml.data import OutputFileDatasetConfig"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Download models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# create directory for model\n",
"model_dir = 'models'\n",
"if not os.path.isdir(model_dir):\n",
" os.mkdir(model_dir)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import urllib.request\n",
"\n",
"def download_model(model_name):\n",
" # downloaded models from https://pytorch.org/tutorials/advanced/neural_style_tutorial.html are kept here\n",
" url = \"https://pipelinedata.blob.core.windows.net/styletransfer/saved_models/\" + model_name\n",
" local_path = os.path.join(model_dir, model_name)\n",
" urllib.request.urlretrieve(url, local_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Register all Models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"mosaic_model = None\n",
"candy_model = None\n",
"\n",
"models = Model.list(workspace=ws, tags=['scenario'])\n",
"for m in models:\n",
" print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)\n",
" if m.name == 'mosaic' and mosaic_model is None:\n",
" mosaic_model = m\n",
" elif m.name == 'candy' and candy_model is None:\n",
" candy_model = m\n",
"\n",
"if mosaic_model is None:\n",
" print('Mosaic model does not exist, registering it')\n",
" download_model('mosaic.pth')\n",
" mosaic_model = Model.register(model_path = os.path.join(model_dir, \"mosaic.pth\"),\n",
" model_name = \"mosaic\",\n",
" tags = {'type': \"mosaic\", 'scenario': \"Style transfer using batch inference\"},\n",
" description = \"Style transfer - Mosaic\",\n",
" workspace = ws)\n",
"else:\n",
" print('Reusing existing mosaic model')\n",
" \n",
"\n",
"if candy_model is None:\n",
" print('Candy model does not exist, registering it')\n",
" download_model('candy.pth')\n",
" candy_model = Model.register(model_path = os.path.join(model_dir, \"candy.pth\"),\n",
" model_name = \"candy\",\n",
" tags = {'type': \"candy\", 'scenario': \"Style transfer using batch inference\"},\n",
" description = \"Style transfer - Candy\",\n",
" workspace = ws)\n",
"else:\n",
" print('Reusing existing candy model')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create or use existing compute\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# AmlCompute\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"try:\n",
" cpu_cluster = AmlCompute(ws, cpu_cluster_name)\n",
" print(\"found existing cluster.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new cluster\")\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_v2\",\n",
" max_nodes = 1)\n",
"\n",
" # create the cluster\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, provisioning_config)\n",
" cpu_cluster.wait_for_completion(show_output=True)\n",
" \n",
"# AmlCompute\n",
"gpu_cluster_name = \"gpu-cluster\"\n",
"try:\n",
" gpu_cluster = AmlCompute(ws, gpu_cluster_name)\n",
" print(\"found existing cluster.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new cluster\")\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"Standard_NC6s_v3\",\n",
" max_nodes = 3)\n",
"\n",
" # create the cluster\n",
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n",
" gpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python Scripts\n",
"We use an edited version of `neural_style_mpi.py` (original is [here](https://github.com/pytorch/examples/blob/master/fast_neural_style/neural_style/neural_style.py)). Scripts to split and stitch the video are thin wrappers to calls to `ffmpeg`. \n",
"\n",
"We install `ffmpeg` through conda dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"scripts_folder = \"scripts\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"process_video_script_file = \"process_video.py\"\n",
"\n",
"# peek at contents\n",
"with open(os.path.join(scripts_folder, process_video_script_file)) as process_video_file:\n",
" print(process_video_file.read())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"stitch_video_script_file = \"stitch_video.py\"\n",
"\n",
"# peek at contents\n",
"with open(os.path.join(scripts_folder, stitch_video_script_file)) as stitch_video_file:\n",
" print(stitch_video_file.read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sample video **organutan.mp4** is stored at a publicly shared datastore. We are registering the datastore below. If you want to take a look at the original video, click here. (https://pipelinedata.blob.core.windows.net/sample-videos/orangutan.mp4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# datastore for input video\n",
"account_name = \"pipelinedata\"\n",
"video_ds = Datastore.register_azure_blob_container(ws, \"videos\", \"sample-videos\",\n",
" account_name=account_name, overwrite=True)\n",
"\n",
"# the default blob store attached to a workspace\n",
"default_datastore = ws.get_default_datastore()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sample video"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"video_name=os.getenv(\"STYLE_TRANSFER_VIDEO_NAME\", \"orangutan.mp4\") \n",
"orangutan_video = Dataset.File.from_files((video_ds,video_name))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cd = CondaDependencies.create(python_version=\"3.8\", conda_packages=['pip==20.2.4'])\n",
"\n",
"cd.add_channel(\"conda-forge\")\n",
"cd.add_conda_package(\"ffmpeg==4.0.2\")\n",
"\n",
"# Runconfig\n",
"amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n",
"amlcompute_run_config.environment.docker.base_image = \"pytorch/pytorch\"\n",
"amlcompute_run_config.environment.spark.precache_packages = False"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ffmpeg_audio = OutputFileDatasetConfig(name=\"ffmpeg_audio\")\n",
"processed_images = OutputFileDatasetConfig(name=\"processed_images\")\n",
"output_video = OutputFileDatasetConfig(name=\"output_video\")\n",
"\n",
"ffmpeg_images = OutputFileDatasetConfig(name=\"ffmpeg_images\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Define tweakable parameters to pipeline\n",
"These parameters can be changed when the pipeline is published and rerun from a REST call.\n",
"As part of ParallelRunStep following 2 pipeline parameters will be created which can be used to override values.\n",
" node_count\n",
" process_count_per_node"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core.graph import PipelineParameter\n",
"# create a parameter for style (one of \"candy\", \"mosaic\") to transfer the images to\n",
"style_param = PipelineParameter(name=\"style\", default_value=\"mosaic\")\n",
"# create a parameter for the number of nodes to use in step no. 2 (style transfer)\n",
"nodecount_param = PipelineParameter(name=\"nodecount\", default_value=2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"split_video_step = PythonScriptStep(\n",
" name=\"split video\",\n",
" script_name=\"process_video.py\",\n",
" arguments=[\"--input_video\", orangutan_video.as_mount(),\n",
" \"--output_audio\", ffmpeg_audio,\n",
" \"--output_images\", ffmpeg_images],\n",
" compute_target=cpu_cluster,\n",
" runconfig=amlcompute_run_config,\n",
" source_directory=scripts_folder\n",
")\n",
"\n",
"stitch_video_step = PythonScriptStep(\n",
" name=\"stitch\",\n",
" script_name=\"stitch_video.py\",\n",
" arguments=[\"--images_dir\", processed_images.as_input(), \n",
" \"--input_audio\", ffmpeg_audio.as_input(), \n",
" \"--output_dir\", output_video],\n",
" compute_target=cpu_cluster,\n",
" runconfig=amlcompute_run_config,\n",
" source_directory=scripts_folder\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create environment, parallel step run config and parallel run step"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.runconfig import DEFAULT_GPU_IMAGE\n",
"\n",
"parallel_cd = CondaDependencies.create(python_version=\"3.8\", conda_packages=['pip==20.2.4', 'numpy==1.19'])\n",
"\n",
"parallel_cd.add_channel(\"pytorch\")\n",
"parallel_cd.add_conda_package(\"pytorch\")\n",
"parallel_cd.add_conda_package(\"torchvision\")\n",
"parallel_cd.add_conda_package(\"pillow<7\") # needed for torchvision==0.4.0\n",
"\n",
"styleenvironment = Environment(name=\"styleenvironment\")\n",
"styleenvironment.python.conda_dependencies=parallel_cd\n",
"styleenvironment.docker.base_image = DEFAULT_GPU_IMAGE"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import PipelineParameter\n",
"from azureml.pipeline.steps import ParallelRunConfig\n",
"\n",
"parallel_run_config = ParallelRunConfig(\n",
" environment=styleenvironment,\n",
" entry_script='transform.py',\n",
" output_action='summary_only',\n",
" mini_batch_size=\"1\",\n",
" error_threshold=1,\n",
" source_directory=scripts_folder,\n",
" compute_target=gpu_cluster, \n",
" node_count=nodecount_param,\n",
" process_count_per_node=2\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.steps import ParallelRunStep\n",
"from datetime import datetime\n",
"\n",
"parallel_step_name = 'styletransfer-' + datetime.now().strftime('%Y%m%d%H%M')\n",
"\n",
"distributed_style_transfer_step = ParallelRunStep(\n",
" name=parallel_step_name,\n",
" inputs=[ffmpeg_images], # Input file share/blob container/file dataset\n",
" output=processed_images, # Output file share/blob container\n",
" arguments=[\"--style\", style_param],\n",
" parallel_run_config=parallel_run_config,\n",
" allow_reuse=False #[optional - default value True]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Run the pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline = Pipeline(workspace=ws, steps=[stitch_video_step])\n",
"\n",
"pipeline.validate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# submit the pipeline and provide values for the PipelineParameters used in the pipeline\n",
"pipeline_run = Experiment(ws, 'styletransfer_parallel_mosaic').submit(pipeline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Monitor pipeline run\n",
"\n",
"The pipeline run status could be checked in Azure Machine Learning portal (https://ml.azure.com). The link to the pipeline run could be retrieved by inspecting the `pipeline_run` object.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This will output information of the pipeline run, including the link to the details page of portal.\n",
"pipeline_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optional: View detailed logs (streaming) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait the run for completion and show output log to console\n",
"pipeline_run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Download output video"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Downloads the video in `output_video` folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def download_video(run, target_dir=None):\n",
" stitch_run = run.find_step_run(stitch_video_step.name)[0]\n",
" port_data = stitch_run.get_details()['outputDatasets'][0]['dataset']\n",
" port_data.download(target_dir)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run.wait_for_completion()\n",
"download_video(pipeline_run, \"output_video_mosaic\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Publish pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_name = \"style-transfer-batch-inference\"\n",
"print(pipeline_name)\n",
"\n",
"published_pipeline = pipeline.publish(\n",
" name=pipeline_name, \n",
" description=pipeline_name)\n",
"print(\"Newly published pipeline id: {}\".format(published_pipeline.id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get published pipeline\n",
"This is another way to get the published pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import PublishedPipeline\n",
"\n",
"# You could retrieve all pipelines that are published, or \n",
"# just get the published pipeline object that you have the ID for.\n",
"\n",
"# Get all published pipeline objects in the workspace\n",
"all_pub_pipelines = PublishedPipeline.list(ws)\n",
"\n",
"# We will iterate through the list of published pipelines and \n",
"# use the last ID in the list for Schelue operations: \n",
"print(\"Published pipelines found in the workspace:\")\n",
"for pub_pipeline in all_pub_pipelines:\n",
" print(\"Name:\", pub_pipeline.name,\"\\tDescription:\", pub_pipeline.description, \"\\tId:\", pub_pipeline.id, \"\\tStatus:\", pub_pipeline.status)\n",
" if(pub_pipeline.name == pipeline_name):\n",
" published_pipeline = pub_pipeline\n",
"\n",
"print(\"Published pipeline id: {}\".format(published_pipeline.id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Run pipeline through REST calls for other styles\n",
"\n",
"# Get AAD token"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"import requests\n",
"\n",
"auth = InteractiveLoginAuthentication()\n",
"aad_token = auth.get_authentication_header()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get endpoint URL"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rest_endpoint = published_pipeline.endpoint\n",
"print(\"Pipeline REST endpoing: {}\".format(rest_endpoint))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Send request and monitor"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = 'styletransfer_parallel_candy'\n",
"response = requests.post(rest_endpoint, \n",
" headers=aad_token,\n",
" json={\"ExperimentName\": experiment_name,\n",
" \"ParameterAssignments\": {\"style\": \"candy\", \"NodeCount\": 3}})\n",
"\n",
"run_id = response.json()[\"Id\"]\n",
"\n",
"from azureml.pipeline.core.run import PipelineRun\n",
"published_pipeline_run_candy = PipelineRun(ws.experiments[experiment_name], run_id)\n",
"\n",
"# Show detail information of run\n",
"published_pipeline_run_candy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Download output from re-run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"published_pipeline_run_candy.wait_for_completion()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"download_video(published_pipeline_run_candy, target_dir=\"output_video_candy\")"
]
}
],
"metadata": {
"authors": [
{
"name": "sanpil joringer asraniwa pansav tracych"
}
],
"category": "Other notebooks",
"compute": [
"AML Compute"
],
"datasets": [],
"deployment": [
"None"
],
"exclude_from_index": true,
"framework": [
"None"
],
"friendly_name": "Style transfer using ParallelRunStep",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
},
"tags": [
"Batch Inferencing",
"Pipeline"
],
"task": "Style transfer"
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,22 +0,0 @@
import argparse
import glob
import os
import subprocess
parser = argparse.ArgumentParser(description="Process input video")
parser.add_argument('--input_video', required=True)
parser.add_argument('--output_audio', required=True)
parser.add_argument('--output_images', required=True)
args = parser.parse_args()
os.makedirs(args.output_audio, exist_ok=True)
os.makedirs(args.output_images, exist_ok=True)
subprocess.run("ffmpeg -i {} {}/video.aac".format(args.input_video, args.output_audio),
shell=True,
check=True)
subprocess.run("ffmpeg -i {} {}/%05d_video.jpg -hide_banner".format(args.input_video, args.output_images),
shell=True,
check=True)

View File

@@ -1,22 +0,0 @@
import argparse
import os
import subprocess
parser = argparse.ArgumentParser(description="Process input video")
parser.add_argument('--images_dir', required=True)
parser.add_argument('--input_audio', required=True)
parser.add_argument('--output_dir', required=True)
args = parser.parse_args()
os.makedirs(args.output_dir, exist_ok=True)
subprocess.run("ffmpeg -framerate 30 -i {}/%05d_video.jpg -c:v libx264 -profile:v high -crf 20 -pix_fmt yuv420p "
"-y {}/video_without_audio.mp4"
.format(args.images_dir, args.output_dir),
shell=True, check=True)
subprocess.run("ffmpeg -i {}/video_without_audio.mp4 -i {}/video.aac -map 0:0 -map 1:0 -vcodec "
"copy -acodec copy -y {}/video_with_audio.mp4"
.format(args.output_dir, args.input_audio, args.output_dir),
shell=True, check=True)

View File

@@ -1,172 +0,0 @@
import argparse
import os
import sys
import re
import json
import traceback
from PIL import Image
import torch
from torchvision import transforms
from azureml.core.model import Model
style_model = None
class TransformerNet(torch.nn.Module):
def __init__(self):
super(TransformerNet, self).__init__()
# Initial convolution layers
self.conv1 = ConvLayer(3, 32, kernel_size=9, stride=1)
self.in1 = torch.nn.InstanceNorm2d(32, affine=True)
self.conv2 = ConvLayer(32, 64, kernel_size=3, stride=2)
self.in2 = torch.nn.InstanceNorm2d(64, affine=True)
self.conv3 = ConvLayer(64, 128, kernel_size=3, stride=2)
self.in3 = torch.nn.InstanceNorm2d(128, affine=True)
# Residual layers
self.res1 = ResidualBlock(128)
self.res2 = ResidualBlock(128)
self.res3 = ResidualBlock(128)
self.res4 = ResidualBlock(128)
self.res5 = ResidualBlock(128)
# Upsampling Layers
self.deconv1 = UpsampleConvLayer(128, 64, kernel_size=3, stride=1, upsample=2)
self.in4 = torch.nn.InstanceNorm2d(64, affine=True)
self.deconv2 = UpsampleConvLayer(64, 32, kernel_size=3, stride=1, upsample=2)
self.in5 = torch.nn.InstanceNorm2d(32, affine=True)
self.deconv3 = ConvLayer(32, 3, kernel_size=9, stride=1)
# Non-linearities
self.relu = torch.nn.ReLU()
def forward(self, X):
y = self.relu(self.in1(self.conv1(X)))
y = self.relu(self.in2(self.conv2(y)))
y = self.relu(self.in3(self.conv3(y)))
y = self.res1(y)
y = self.res2(y)
y = self.res3(y)
y = self.res4(y)
y = self.res5(y)
y = self.relu(self.in4(self.deconv1(y)))
y = self.relu(self.in5(self.deconv2(y)))
y = self.deconv3(y)
return y
class ConvLayer(torch.nn.Module):
def __init__(self, in_channels, out_channels, kernel_size, stride):
super(ConvLayer, self).__init__()
reflection_padding = kernel_size // 2
self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding)
self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride)
def forward(self, x):
out = self.reflection_pad(x)
out = self.conv2d(out)
return out
class ResidualBlock(torch.nn.Module):
"""ResidualBlock
introduced in: https://arxiv.org/abs/1512.03385
recommended architecture: http://torch.ch/blog/2016/02/04/resnets.html
"""
def __init__(self, channels):
super(ResidualBlock, self).__init__()
self.conv1 = ConvLayer(channels, channels, kernel_size=3, stride=1)
self.in1 = torch.nn.InstanceNorm2d(channels, affine=True)
self.conv2 = ConvLayer(channels, channels, kernel_size=3, stride=1)
self.in2 = torch.nn.InstanceNorm2d(channels, affine=True)
self.relu = torch.nn.ReLU()
def forward(self, x):
residual = x
out = self.relu(self.in1(self.conv1(x)))
out = self.in2(self.conv2(out))
out = out + residual
return out
class UpsampleConvLayer(torch.nn.Module):
"""UpsampleConvLayer
Upsamples the input and then does a convolution. This method gives better results
compared to ConvTranspose2d.
ref: http://distill.pub/2016/deconv-checkerboard/
"""
def __init__(self, in_channels, out_channels, kernel_size, stride, upsample=None):
super(UpsampleConvLayer, self).__init__()
self.upsample = upsample
if upsample:
self.upsample_layer = torch.nn.Upsample(mode='nearest', scale_factor=upsample)
reflection_padding = kernel_size // 2
self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding)
self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride)
def forward(self, x):
x_in = x
if self.upsample:
x_in = self.upsample_layer(x_in)
out = self.reflection_pad(x_in)
out = self.conv2d(out)
return out
def load_image(filename):
img = Image.open(filename)
return img
def save_image(filename, data):
img = data.clone().clamp(0, 255).numpy()
img = img.transpose(1, 2, 0).astype("uint8")
img = Image.fromarray(img)
img.save(filename)
def init():
global output_path, args
global style_model, device
output_path = os.environ['AZUREML_BI_OUTPUT_PATH']
print(f'output path: {output_path}')
print(f'Cuda available? {torch.cuda.is_available()}')
arg_parser = argparse.ArgumentParser(description="parser for fast-neural-style")
arg_parser.add_argument("--style", type=str, help="style name")
args, unknown_args = arg_parser.parse_known_args()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
with torch.no_grad():
style_model = TransformerNet()
model_path = Model.get_model_path(args.style)
state_dict = torch.load(os.path.join(model_path))
# remove saved deprecated running_* keys in InstanceNorm from the checkpoint
for k in list(state_dict.keys()):
if re.search(r'in\d+\.running_(mean|var)$', k):
del state_dict[k]
style_model.load_state_dict(state_dict)
style_model.to(device)
print(f'Model loaded successfully. Path: {model_path}')
def run(mini_batch):
result = []
for image_file_path in mini_batch:
img = load_image(image_file_path)
with torch.no_grad():
content_transform = transforms.Compose([
transforms.ToTensor(),
transforms.Lambda(lambda x: x.mul(255))
])
content_image = content_transform(img)
content_image = content_image.unsqueeze(0).to(device)
output = style_model(content_image).cpu()
output_file_path = os.path.join(output_path, os.path.basename(image_file_path))
save_image(output_file_path, output[0])
result.append(output_file_path)
return result

View File

@@ -293,7 +293,7 @@
"source": [
"from azureml.core import Environment\n",
"\n",
"pytorch_env = Environment.get(ws, name='azureml-acpt-pytorch-1.13-cuda11.7')"
"pytorch_env = Environment.get(ws, name='azureml-acpt-pytorch-2.2-cuda12.1')"
]
},
{

View File

@@ -1,378 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/ml-frameworks/pytorch/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Distributed PyTorch with Horovod\n",
"In this tutorial, you will train a PyTorch model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using distributed training via [Horovod](https://github.com/uber/horovod) across a GPU cluster."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`\n",
"* Review the [tutorial](../train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) on single-node PyTorch training using Azure Machine Learning"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"\n",
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `Standard_NC6s_v3` GPU cluster that autoscales from `0` to `4` nodes.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6s_v3',\n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n",
"# use get_status() to get a detailed status for the current AmlCompute. \n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model on the remote compute\n",
"Now that we have the AmlCompute ready to go, let's run our distributed training job."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a project directory\n",
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"project_folder = './pytorch-distr-hvd'\n",
"os.makedirs(project_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare training script\n",
"Now you will need to create your training script. In this tutorial, the script for distributed training of MNIST is already provided for you at `pytorch_horovod_mnist.py`. In practice, you should be able to take any custom PyTorch training script as is and run it with Azure ML without having to modify your code.\n",
"\n",
"However, if you would like to use Azure ML's [metric logging](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#logging) capabilities, you will have to add a small amount of Azure ML logic inside your training script. In this example, at each logging interval, we will log the loss for that minibatch to our Azure ML run.\n",
"\n",
"To do so, in `pytorch_horovod_mnist.py`, we will first access the Azure ML `Run` object within the script:\n",
"```Python\n",
"from azureml.core.run import Run\n",
"run = Run.get_context()\n",
"```\n",
"Later within the script, we log the loss metric to our run:\n",
"```Python\n",
"run.log('loss', loss.item())\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once your script is ready, copy the training script `pytorch_horovod_mnist.py` into the project directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"shutil.copy('pytorch_horovod_mnist.py', project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an experiment\n",
"Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed PyTorch tutorial. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment_name = 'pytorch-distr-hvd'\n",
"experiment = Experiment(ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an environment\n",
"\n",
"In this tutorial, we will use one of Azure ML's curated PyTorch environments for training. [Curated environments](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments#use-a-curated-environment) are available in your workspace by default. Specifically, we will use the PyTorch 1.6 GPU curated environment. The curated environment includes the `torch`, `torchvision` and `horovod` packages required by the training script."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"\n",
"pytorch_env = Environment.get(ws, name='AzureML-acpt-pytorch-1.13-cuda11.7')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure the training job\n",
"\n",
"Create a ScriptRunConfig object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on.\n",
"\n",
"In order to execute a distributed run using MPI/Horovod, you must create an `MpiConfiguration` object and pass it to the `distributed_job_config` parameter of the ScriptRunConfig constructor. The below code will configure a 2-node distributed job running one process per node. If you would also like to run multiple processes per node (i.e. if your cluster SKU has multiple GPUs), additionally specify the `process_count_per_node` parameter in `MpiConfiguration` (the default is `1`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"from azureml.core.runconfig import MpiConfiguration\n",
"\n",
"src = ScriptRunConfig(source_directory=project_folder,\n",
" script='pytorch_horovod_mnist.py',\n",
" compute_target=compute_target,\n",
" environment=pytorch_env,\n",
" distributed_job_config=MpiConfiguration(node_count=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit job\n",
"Run your experiment by submitting your ScriptRunConfig object. Note that this call is asynchronous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = experiment.submit(src)\n",
"print(run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor your run\n",
"You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. You can see that the widget automatically plots and visualizes the loss metric that we logged to the Azure ML run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can block until the script has completed training before running more code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True) # this provides a verbose log"
]
}
],
"metadata": {
"authors": [
{
"name": "ninhu"
}
],
"category": "training",
"compute": [
"AML Compute"
],
"datasets": [
"MNIST"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"PyTorch"
],
"friendly_name": "Distributed PyTorch",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
},
"tags": [
"None"
],
"task": "Train a model using the distributed training via Horovod"
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -1,181 +0,0 @@
# Copyright (c) 2017, PyTorch contributors
# Modifications copyright (C) Microsoft Corporation
# Licensed under the BSD license
# Adapted from https://github.com/uber/horovod/blob/master/examples/pytorch_mnist.py
from __future__ import print_function
import argparse
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import torch.utils.data.distributed
import horovod.torch as hvd
from azureml.core.run import Run
# get the Azure ML run object
run = Run.get_context()
print("Torch version:", torch.__version__)
# Training settings
parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, metavar='N',
help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
help='number of epochs to train (default: 10)')
parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
help='learning rate (default: 0.01)')
parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
help='SGD momentum (default: 0.5)')
parser.add_argument('--no-cuda', action='store_true', default=False,
help='disables CUDA training')
parser.add_argument('--seed', type=int, default=42, metavar='S',
help='random seed (default: 42)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
help='how many batches to wait before logging training status')
parser.add_argument('--fp16-allreduce', action='store_true', default=False,
help='use fp16 compression during allreduce')
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()
hvd.init()
torch.manual_seed(args.seed)
if args.cuda:
# Horovod: pin GPU to local rank.
torch.cuda.set_device(hvd.local_rank())
torch.cuda.manual_seed(args.seed)
kwargs = {}
# MNIST dataset
datasets.MNIST.resources = [
("train-images-idx3-ubyte.gz",
"f68b3c2dcbeaaa9fbdd348bbdeb94873"),
("train-labels-idx1-ubyte.gz",
"d53e105ee54ea40749a09fcbcd1e9432"),
("t10k-images-idx3-ubyte.gz",
"9fb629c4189551a2d022fa330f9573f3"),
("t10k-labels-idx1-ubyte.gz",
"ec29112dd5afa0611ce80d1b7f02629c")
]
train_dataset = \
datasets.MNIST('data-%d' % hvd.rank(), train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
]))
train_sampler = torch.utils.data.distributed.DistributedSampler(
train_dataset, num_replicas=hvd.size(), rank=hvd.rank())
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=args.batch_size, sampler=train_sampler, **kwargs)
test_dataset = \
datasets.MNIST('data-%d' % hvd.rank(), train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
]))
test_sampler = torch.utils.data.distributed.DistributedSampler(
test_dataset, num_replicas=hvd.size(), rank=hvd.rank())
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=args.test_batch_size,
sampler=test_sampler, **kwargs)
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
self.conv2_drop = nn.Dropout2d()
self.fc1 = nn.Linear(320, 50)
self.fc2 = nn.Linear(50, 10)
def forward(self, x):
x = F.relu(F.max_pool2d(self.conv1(x), 2))
x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
x = x.view(-1, 320)
x = F.relu(self.fc1(x))
x = F.dropout(x, training=self.training)
x = self.fc2(x)
return F.log_softmax(x, dim=1)
model = Net()
if args.cuda:
# Move model to GPU.
model.cuda()
# Horovod: broadcast parameters.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
# Horovod: scale learning rate by the number of GPUs.
optimizer = optim.SGD(model.parameters(), lr=args.lr * hvd.size(),
momentum=args.momentum)
# Horovod: (optional) compression algorithm.
compression = hvd.Compression.fp16 if args.fp16_allreduce else hvd.Compression.none
# Horovod: wrap optimizer with DistributedOptimizer.
optimizer = hvd.DistributedOptimizer(optimizer,
named_parameters=model.named_parameters(),
compression=compression)
def train(epoch):
model.train()
train_sampler.set_epoch(epoch)
for batch_idx, (data, target) in enumerate(train_loader):
if args.cuda:
data, target = data.cuda(), target.cuda()
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_sampler),
100. * batch_idx / len(train_loader), loss.item()))
# log the loss to the Azure ML run
run.log('loss', loss.item())
def metric_average(val, name):
tensor = torch.tensor(val)
avg_tensor = hvd.allreduce(tensor, name=name)
return avg_tensor.item()
def test():
model.eval()
test_loss = 0.
test_accuracy = 0.
for data, target in test_loader:
if args.cuda:
data, target = data.cuda(), target.cuda()
output = model(data)
# sum up batch loss
test_loss += F.nll_loss(output, target, reduction='sum').item()
# get the index of the max log-probability
pred = output.data.max(1, keepdim=True)[1]
test_accuracy += pred.eq(target.data.view_as(pred)).cpu().float().sum()
test_loss /= len(test_sampler)
test_accuracy /= len(test_sampler)
test_loss = metric_average(test_loss, 'avg_loss')
test_accuracy = metric_average(test_accuracy, 'avg_accuracy')
if hvd.rank() == 0:
print('\nTest set: Average loss: {:.4f}, Accuracy: {:.2f}%\n'.format(
test_loss, 100. * test_accuracy))
for epoch in range(1, args.epochs + 1):
train(epoch)
test()

View File

@@ -273,7 +273,7 @@
"source": [
"from azureml.core import Environment\n",
"\n",
"pytorch_env = Environment.get(ws, name='azureml-acpt-pytorch-1.13-cuda11.7')"
"pytorch_env = Environment.get(ws, name='azureml-acpt-pytorch-2.2-cuda12.1')"
]
},
{

View File

@@ -322,7 +322,7 @@
"source": [
"from azureml.core import Environment\n",
"\n",
"sklearn_env = Environment.get(ws, name='azureml-sklearn-1.0')"
"sklearn_env = Environment.get(ws, name='azureml-sklearn-1.5')"
]
},
{

View File

@@ -1,344 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/ml-frameworks/tensorflow/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.png)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Distributed TensorFlow with Horovod\n",
"In this tutorial, you will train a model in TensorFlow using distributed training via [Horovod](https://github.com/uber/horovod)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n",
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../../configuration.ipynb) to:\n",
" * install the AML SDK\n",
" * create a workspace and its configuration file (`config.json`)\n",
"* Review the [tutorial](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) on single-node TensorFlow training using the SDK"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep='\\n')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6s_v3', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
]
},
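{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, a minimal sketch of provisioning a CPU cluster instead (the cluster name `cpu-cluster` is a placeholder, not part of this sample):\n",
"\n",
"```python\n",
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"\n",
"# Provision a small CPU cluster; vm_size and max_nodes are illustrative values\n",
"cpu_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)\n",
"cpu_target = ComputeTarget.create(ws, 'cpu-cluster', cpu_config)\n",
"cpu_target.wait_for_completion(show_output=True)\n",
"```"
]
},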
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You may want to register datasets using the register() method to your workspace so that the dataset can be shared with others, reused across various experiments, and referred to by name in your training script."
]
},
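{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of registering a file dataset, assuming the MNIST files have already been uploaded to the default datastore under an illustrative `mnist/` path (the dataset name `mnist-files` is also illustrative):\n",
"\n",
"```python\n",
"from azureml.core import Dataset\n",
"\n",
"# Create a FileDataset pointing at the uploaded files and register it by name\n",
"datastore = ws.get_default_datastore()\n",
"mnist_ds = Dataset.File.from_files(path=(datastore, 'mnist/'))\n",
"mnist_ds = mnist_ds.register(workspace=ws, name='mnist-files', create_new_version=True)\n",
"```"
]
},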
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model on the remote compute"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an experiment\n",
"Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment_name = 'tf-distr-hvd'\n",
"experiment = Experiment(ws, name=experiment_name)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an environment\n",
"\n",
"In this tutorial, we will use one of Azure ML's curated TensorFlow environments for training. [Curated environments](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments#use-a-curated-environment) are available in your workspace by default. Specifically, we will use the TensorFlow 1.13 GPU curated environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"\n",
"tf_env = Environment.get(ws, name='azureml-tensorflow-2.11-cuda11')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure the training job\n",
"\n",
"Create a ScriptRunConfig object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on.\n",
"\n",
"In order to execute a distributed run using MPI/Horovod, you must create an `MpiConfiguration` object and pass it to the `distributed_job_config` parameter of the ScriptRunConfig constructor. The below code will configure a 2-node distributed job running one process per node. If you would also like to run multiple processes per node (i.e. if your cluster SKU has multiple GPUs), additionally specify the `process_count_per_node` parameter in `MpiConfiguration` (the default is `1`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"from azureml.core.runconfig import MpiConfiguration\n",
"\n",
"src = ScriptRunConfig(source_directory=\"src\",\n",
" script='train.py',\n",
" compute_target=compute_target,\n",
" environment=tf_env,\n",
" distributed_job_config=MpiConfiguration(node_count=2))"
]
},
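{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the multi-process variant mentioned above, assuming a VM SKU with 4 GPUs per node (the node and process counts are illustrative):\n",
"\n",
"```python\n",
"from azureml.core import ScriptRunConfig\n",
"from azureml.core.runconfig import MpiConfiguration\n",
"\n",
"# One MPI process per GPU: 2 nodes x 4 processes per node\n",
"distr_config = MpiConfiguration(node_count=2, process_count_per_node=4)\n",
"src = ScriptRunConfig(source_directory='src',\n",
"                      script='train.py',\n",
"                      compute_target=compute_target,\n",
"                      environment=tf_env,\n",
"                      distributed_job_config=distr_config)\n",
"```"
]
},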
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit job\n",
"Run your experiment by submitting your ScriptRunConfig object. Note that this call is asynchronous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = experiment.submit(src)\n",
"print(run)\n",
"run.get_details()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor your run\n",
"You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can block until the script has completed training before running more code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"category": "training",
"compute": [
"AML Compute"
],
"datasets": [
"None"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"TensorFlow"
],
"friendly_name": "Distributed training using TensorFlow with Horovod",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
},
"tags": [
"None"
],
"task": "Use the TensorFlow estimator to train a word2vec model"
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,120 +0,0 @@
# Copyright 2019 Uber Technologies, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Script adapted from: https://github.com/horovod/horovod/blob/master/examples/tensorflow2_keras_mnist.py
# ==============================================================================
import tensorflow as tf
import horovod.tensorflow.keras as hvd
import os
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--learning-rate", "-lr", type=float, default=0.001)
parser.add_argument("--epochs", type=int, default=24)
parser.add_argument("--steps-per-epoch", type=int, default=500)
args = parser.parse_args()
# Horovod: initialize Horovod.
hvd.init()
# Horovod: pin GPU to be used to process local rank (one GPU per process)
gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")
(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data(
path="mnist-%d.npz" % hvd.rank()
)
dataset = tf.data.Dataset.from_tensor_slices(
(
tf.cast(mnist_images[..., tf.newaxis] / 255.0, tf.float32),
tf.cast(mnist_labels, tf.int64),
)
)
dataset = dataset.repeat().shuffle(10000).batch(128)
mnist_model = tf.keras.Sequential(
[
tf.keras.layers.Conv2D(32, [3, 3], activation="relu"),
tf.keras.layers.Conv2D(64, [3, 3], activation="relu"),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation="relu"),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(10, activation="softmax"),
]
)
# Horovod: adjust learning rate based on number of GPUs.
scaled_lr = args.learning_rate * hvd.size()
opt = tf.optimizers.Adam(scaled_lr)
# Horovod: add Horovod DistributedOptimizer.
opt = hvd.DistributedOptimizer(opt)
# Horovod: Specify `experimental_run_tf_function=False` to ensure TensorFlow
# uses hvd.DistributedOptimizer() to compute gradients.
mnist_model.compile(
loss=tf.losses.SparseCategoricalCrossentropy(),
optimizer=opt,
metrics=["accuracy"],
experimental_run_tf_function=False,
)
callbacks = [
# Horovod: broadcast initial variable states from rank 0 to all other processes.
# This is necessary to ensure consistent initialization of all workers when
# training is started with random weights or restored from a checkpoint.
hvd.callbacks.BroadcastGlobalVariablesCallback(0),
# Horovod: average metrics among workers at the end of every epoch.
#
# Note: This callback must be in the list before the ReduceLROnPlateau,
# TensorBoard or other metrics-based callbacks.
hvd.callbacks.MetricAverageCallback(),
# Horovod: using `lr = 1.0 * hvd.size()` from the very beginning leads to worse final
# accuracy. Scale the learning rate `lr = 1.0` ---> `lr = 1.0 * hvd.size()` during
# the first three epochs. See https://arxiv.org/abs/1706.02677 for details.
hvd.callbacks.LearningRateWarmupCallback(
warmup_epochs=3, initial_lr=scaled_lr, verbose=1
),
]
# Horovod: save checkpoints only on worker 0 to prevent other workers from corrupting them.
if hvd.rank() == 0:
output_dir = "./outputs"
os.makedirs(output_dir, exist_ok=True)
callbacks.append(
tf.keras.callbacks.ModelCheckpoint(
os.path.join(output_dir, "checkpoint-{epoch}.h5")
)
)
# Horovod: write logs on worker 0.
verbose = 1 if hvd.rank() == 0 else 0
# Train the model.
# Horovod: adjust number of steps based on number of GPUs.
mnist_model.fit(
dataset,
steps_per_epoch=args.steps_per_epoch // hvd.size(),
callbacks=callbacks,
epochs=args.epochs,
verbose=verbose,
)

View File

@@ -1,190 +0,0 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import numpy as np
import argparse
import os
import re
import tensorflow as tf
import time
import glob
from azureml.core import Run
from utils import load_data
from tensorflow.keras import Model, layers
# Create TF Model.
class NeuralNet(Model):
# Set layers.
def __init__(self):
super(NeuralNet, self).__init__()
# First hidden layer.
self.h1 = layers.Dense(n_h1, activation=tf.nn.relu)
# Second hidden layer.
self.h2 = layers.Dense(n_h2, activation=tf.nn.relu)
self.out = layers.Dense(n_outputs)
# Set forward pass.
def call(self, x, is_training=False):
x = self.h1(x)
x = self.h2(x)
x = self.out(x)
if not is_training:
# Apply softmax when not training.
x = tf.nn.softmax(x)
return x
def cross_entropy_loss(y, logits):
# Convert labels to int 64 for tf cross-entropy function.
y = tf.cast(y, tf.int64)
# Apply softmax to logits and compute cross-entropy.
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
# Average loss across the batch.
return tf.reduce_mean(loss)
# Accuracy metric.
def accuracy(y_pred, y_true):
# Predicted class is the index of highest score in prediction vector (i.e. argmax).
correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))
return tf.reduce_mean(tf.cast(correct_prediction, tf.float32), axis=-1)
# Optimization process.
def run_optimization(x, y):
# Wrap computation inside a GradientTape for automatic differentiation.
with tf.GradientTape() as g:
# Forward pass.
logits = neural_net(x, is_training=True)
# Compute loss.
loss = cross_entropy_loss(y, logits)
# Variables to update, i.e. trainable variables.
trainable_variables = neural_net.trainable_variables
# Compute gradients.
gradients = g.gradient(loss, trainable_variables)
# Update W and b following gradients.
optimizer.apply_gradients(zip(gradients, trainable_variables))
print("TensorFlow version:", tf.__version__)
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder', default='data', help='data folder mounting point')
parser.add_argument('--batch-size', type=int, dest='batch_size', default=128, help='mini batch size for training')
parser.add_argument('--first-layer-neurons', type=int, dest='n_hidden_1', default=128,
help='# of neurons in the first layer')
parser.add_argument('--second-layer-neurons', type=int, dest='n_hidden_2', default=128,
help='# of neurons in the second layer')
parser.add_argument('--learning-rate', type=float, dest='learning_rate', default=0.01, help='learning rate')
parser.add_argument('--resume-from', type=str, default=None,
help='location of the model or checkpoint files from where to resume the training')
args = parser.parse_args()
previous_model_location = args.resume_from
# You can also use environment variable to get the model/checkpoint files location
# previous_model_location = os.path.expandvars(os.getenv("AZUREML_DATAREFERENCE_MODEL_LOCATION", None))
data_folder = args.data_folder
print('Data folder:', data_folder)
# load train and test set into numpy arrays
# note we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can converge faster.
X_train = load_data(glob.glob(os.path.join(data_folder, '**/train-images-idx3-ubyte.gz'),
recursive=True)[0], False) / np.float32(255.0)
X_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-images-idx3-ubyte.gz'),
recursive=True)[0], False) / np.float32(255.0)
y_train = load_data(glob.glob(os.path.join(data_folder, '**/train-labels-idx1-ubyte.gz'),
recursive=True)[0], True).reshape(-1)
y_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-labels-idx1-ubyte.gz'),
recursive=True)[0], True).reshape(-1)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n')
training_set_size = X_train.shape[0]
n_inputs = 28 * 28
n_h1 = args.n_hidden_1
n_h2 = args.n_hidden_2
n_outputs = 10
learning_rate = args.learning_rate
n_epochs = 20
batch_size = args.batch_size
# Build neural network model.
neural_net = NeuralNet()
# Stochastic gradient descent optimizer.
optimizer = tf.optimizers.SGD(learning_rate)
# start an Azure ML run
run = Run.get_context()
if previous_model_location:
# Restore variables from latest checkpoint.
checkpoint = tf.train.Checkpoint(model=neural_net, optimizer=optimizer)
checkpoint_file_path = tf.train.latest_checkpoint(previous_model_location)
checkpoint.restore(checkpoint_file_path)
checkpoint_filename = os.path.basename(checkpoint_file_path)
num_found = re.search(r'\d+', checkpoint_filename)
if num_found:
start_epoch = int(num_found.group(0))
print("Resuming from epoch {}".format(str(start_epoch)))
start_time = time.perf_counter()
for epoch in range(0, n_epochs):
# randomly shuffle training set
indices = np.random.permutation(training_set_size)
X_train = X_train[indices]
y_train = y_train[indices]
# batch index
b_start = 0
b_end = b_start + batch_size
for _ in range(training_set_size // batch_size):
# get a batch
X_batch, y_batch = X_train[b_start: b_end], y_train[b_start: b_end]
# update batch index for the next batch
b_start = b_start + batch_size
b_end = min(b_start + batch_size, training_set_size)
# train
run_optimization(X_batch, y_batch)
# evaluate training set
pred = neural_net(X_batch, is_training=False)
acc_train = accuracy(pred, y_batch)
# evaluate validation set
pred = neural_net(X_test, is_training=False)
acc_val = accuracy(pred, y_test)
# log accuracies
run.log('training_acc', float(acc_train))
run.log('validation_acc', float(acc_val))
print(epoch, '-- Training accuracy:', float(acc_train), 'Validation accuracy:', float(acc_val))
# Save checkpoints in the "./outputs" folder so that they are automatically uploaded into run history.
checkpoint_dir = './outputs/'
checkpoint = tf.train.Checkpoint(model=neural_net, optimizer=optimizer)
if epoch % 2 == 0:
checkpoint.save(checkpoint_dir)
run.log('final_acc', float(acc_val))
os.makedirs('./outputs/model', exist_ok=True)
# files saved in the "./outputs" folder are automatically uploaded into run history
# this is workaround for https://github.com/tensorflow/tensorflow/issues/33913 and will be fixed once we move to >tf2.1
neural_net._set_inputs(X_train)
tf.saved_model.save(neural_net, './outputs/model/')
stop_time = time.perf_counter()
training_time = (stop_time - start_time) * 1000
print("Total time in milliseconds for training: {}".format(str(training_time)))

View File

@@ -1,27 +0,0 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import gzip
import numpy as np
import struct
# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
with gzip.open(filename) as gz:
struct.unpack('I', gz.read(4))
n_items = struct.unpack('>I', gz.read(4))
if not label:
n_rows = struct.unpack('>I', gz.read(4))[0]
n_cols = struct.unpack('>I', gz.read(4))[0]
res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
res = res.reshape(n_items[0], n_rows * n_cols)
else:
res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
res = res.reshape(n_items[0], 1)
return res
# one-hot encode a 1-D array
def one_hot_encode(array, num_of_classes):
return np.eye(num_of_classes)[array.reshape(-1)]

View File

@@ -33,8 +33,6 @@ Using these samples, you will learn how to do the following.
| File/folder | Description |
|-------------------|--------------------------------------------|
| [cartpole_ci.ipynb](cartpole-on-compute-instance/cartpole_ci.ipynb) | Notebook to train a Cartpole playing agent on an Azure Machine Learning Compute Instance |
| [cartpole_sc.ipynb](cartpole-on-single-compute/cartpole_sc.ipynb) | Notebook to train a Cartpole playing agent on an Azure Machine Learning Compute Cluster (single node) |
| [pong_rllib.ipynb](atari-on-distributed-compute/pong_rllib.ipynb) | Notebook for distributed training of Pong agent using RLlib on multiple compute targets |
## Prerequisites

View File

@@ -1,768 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/reinforcement-learning/cartpole-on-compute-instance/cartpole_ci.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reinforcement Learning in Azure Machine Learning - Cartpole Problem on Compute Instance\n",
"\n",
"Reinforcement Learning in Azure Machine Learning is a managed service for running reinforcement learning training and simulation. With Reinforcement Learning in Azure Machine Learning, data scientists can start developing reinforcement learning systems on one machine, and scale to compute targets with 100s of nodes if needed.\n",
"\n",
"This example shows how to use Reinforcement Learning in Azure Machine Learning to train a Cartpole playing agent on a compute instance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cartpole problem\n",
"\n",
"Cartpole, also known as [Inverted Pendulum](https://en.wikipedia.org/wiki/Inverted_pendulum), is a pendulum with a center of mass above its pivot point. This formation is essentially unstable and will easily fall over but can be kept balanced by applying appropriate horizontal forces to the pivot point.\n",
"\n",
"<table style=\"width:50%\">\n",
" <tr>\n",
" <th>\n",
" <img src=\"./images/cartpole.png\" alt=\"Cartpole image\" /> \n",
" </th>\n",
" </tr>\n",
" <tr>\n",
" <th><p>Fig 1. Cartpole problem schematic description (from <a href=\"https://towardsdatascience.com/cartpole-introduction-to-reinforcement-learning-ed0eb5b58288\">towardsdatascience.com</a>).</p></th>\n",
" </tr>\n",
"</table>\n",
"\n",
"The goal here is to train an agent to keep the cartpole balanced by applying appropriate forces to the pivot point.\n",
"\n",
"See [this video](https://www.youtube.com/watch?v=XiigTGKZfks) for a real-world demonstration of cartpole problem."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisite\n",
"The user should have completed the Azure Machine Learning Tutorial: [Get started creating your first ML experiment with the Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup). You will need to make sure that you have a valid subscription ID, a resource group, and an Azure Machine Learning workspace. All datastores and datasets you use should be associated with your workspace."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up Development Environment\n",
"The following subsections show typical steps to setup your development environment. Setup includes:\n",
"\n",
"* Connecting to a workspace to enable communication between your local machine and remote resources\n",
"* Creating an experiment to track all your runs\n",
"* Using a Compute Instance as compute target"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure Machine Learning SDK \n",
"Display the Azure Machine Learning SDK version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062935076
}
},
"outputs": [],
"source": [
"import azureml.core\n",
"print(\"Azure Machine Learning SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Azure Machine Learning workspace\n",
"Get a reference to an existing Azure Machine Learning workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062936280
}
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, sep = ' | ')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use Compute Instance as compute target\n",
"\n",
"A compute target is a designated compute resource where you run your training and simulation scripts. This location may be your local machine or a cloud-based compute resource. For more information see [What are compute targets in Azure Machine Learning?](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target)\n",
"\n",
"The code below shows how to use current compute instance as a compute target. First some helper functions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062936485
}
},
"outputs": [],
"source": [
"import os.path\n",
"\n",
"# Get information about the currently running compute instance (notebook VM), like its name and prefix.\n",
"def load_nbvm():\n",
" if not os.path.isfile(\"/mnt/azmnt/.nbvm\"):\n",
" return None\n",
" with open(\"/mnt/azmnt/.nbvm\", 'r') as nbvm_file:\n",
" return { key:value for (key, value) in [ line.strip().split('=') for line in nbvm_file if '=' in line ] }\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we use these helper functions to get a handle to current compute instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062937126
}
},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeInstance\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"import random\n",
"import string\n",
"\n",
"# Load current compute instance info\n",
"current_compute_instance = load_nbvm()\n",
"\n",
"# For this demo, let's use the current compute instance as the compute target, if available\n",
"if current_compute_instance:\n",
" print(\"Current compute instance:\", current_compute_instance)\n",
" instance_name = current_compute_instance['instance']\n",
"else:\n",
" # Compute instance name needs to be unique across all existing compute instances within an Azure region\n",
" instance_name = \"cartpole-ci-\" + \"\".join(random.choice(string.ascii_lowercase) for _ in range(5))\n",
" try:\n",
" instance = ComputeInstance(workspace=ws, name=instance_name)\n",
" print('Found existing instance, use it.')\n",
" except ComputeTargetException:\n",
" print(\"Creating new compute instance...\")\n",
" compute_config = ComputeInstance.provisioning_configuration(\n",
" vm_size='STANDARD_D2_V2'\n",
" )\n",
" instance = ComputeInstance.create(ws, instance_name, compute_config)\n",
" instance.wait_for_completion(show_output=True)\n",
" print(\"Instance name:\", instance_name)\n",
"\n",
"compute_target = ws.compute_targets[instance_name]\n",
"\n",
"print(\"Compute target status:\")\n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Azure Machine Learning experiment\n",
"Create an experiment to track the runs in your workspace. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062937499
}
},
"outputs": [],
"source": [
"from azureml.core.experiment import Experiment\n",
"\n",
"experiment_name = 'CartPole-v1-CI'\n",
"experiment = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064044718
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"import os\n",
"import time\n",
"\n",
"ray_environment_name = 'cartpole-ray-ci'\n",
"ray_environment_dockerfile_path = os.path.join(os.getcwd(), 'files', 'docker', 'Dockerfile')\n",
"\n",
"# Build environment image\n",
"ray_environment = Environment. \\\n",
" from_dockerfile(name=ray_environment_name, dockerfile=ray_environment_dockerfile_path). \\\n",
" register(workspace=ws)\n",
"ray_env_build_details = ray_environment.build(workspace=ws)\n",
"\n",
"ray_env_build_details.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train Cartpole Agent\n",
"In this section, we show how to use Azure Machine Learning jobs and Ray/RLlib framework to train a cartpole playing agent. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create reinforcement learning training run\n",
"\n",
"The code below submits the training run using a `ScriptRunConfig`. By providing the\n",
"command to run the training, and a `RunConfig` object configured with your\n",
"compute target, number of nodes, and environment image to use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064046594
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core import RunConfiguration, ScriptRunConfig, Experiment\n",
"from azureml.core.runconfig import DockerConfiguration, RunConfiguration\n",
"\n",
"config_name = 'cartpole-ppo.yaml'\n",
"script_name = 'cartpole_training.py'\n",
"script_arguments = [\n",
" '--config', config_name\n",
"]\n",
"\n",
"aml_run_config_ml = RunConfiguration(communicator='OpenMpi')\n",
"aml_run_config_ml.target = compute_target\n",
"aml_run_config_ml.node_count = 1\n",
"aml_run_config_ml.environment = ray_environment\n",
"\n",
"training_config = ScriptRunConfig(source_directory='./files',\n",
" script=script_name,\n",
" arguments=script_arguments,\n",
" run_config = aml_run_config_ml\n",
" )\n",
"training_run = experiment.submit(training_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training configuration\n",
"\n",
"This is the training configuration (in yaml) that we use to train an agent to solve the CartPole problem using\n",
"the PPO algorithm.\n",
"\n",
"```yaml\n",
"cartpole-ppo:\n",
" env: CartPole-v1\n",
" run: PPO\n",
" stop:\n",
" episode_reward_mean: 475\n",
" time_total_s: 300\n",
" checkpoint_config:\n",
" checkpoint_frequency: 2\n",
" checkpoint_at_end: true\n",
" config:\n",
" # Works for both torch and tf.\n",
" framework: torch\n",
" gamma: 0.99\n",
" lr: 0.0003\n",
" num_workers: 1\n",
" observation_filter: MeanStdFilter\n",
" num_sgd_iter: 6\n",
" vf_loss_coeff: 0.01\n",
" model:\n",
" fcnet_hiddens: [32]\n",
" fcnet_activation: linear\n",
" vf_share_layers: true\n",
" enable_connectors: true\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor experiment\n",
"Azure Machine Learning provides a Jupyter widget to show the status of an experiment run. You could use this widget to monitor the status of the runs.\n",
"\n",
"You can click on the link under **Status** to see the details of a child run. It will also show the metrics being logged."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064049813
}
},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(training_run).show()"
]
},
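{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of retrieving the logged metrics programmatically from the child run(s); the metric names follow the training callback (e.g. `episode_reward_mean`):\n",
"\n",
"```python\n",
"# Iterate over child runs of the training run and print what they have logged so far\n",
"for child_run in training_run.get_children():\n",
"    print(child_run.id, child_run.get_metrics())\n",
"```"
]
},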
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stop the run\n",
"\n",
"To stop the run, call `training_run.cancel()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064050024
}
},
"outputs": [],
"source": [
"# Uncomment line below to cancel the run\n",
"# training_run.cancel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wait for completion\n",
"Wait for the run to complete before proceeding.\n",
"\n",
"**Note: The run may take a few minutes to complete.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064304728
}
},
"outputs": [],
"source": [
"training_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate Trained Agent and See Results\n",
"\n",
"We can evaluate a previously trained policy using the `cartpole_rollout.py` helper script provided by RLlib (see [Evaluating Trained Policies](https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies) for more details). Here we use an adaptation of this script to reconstruct a policy from a checkpoint taken and saved during training. We took these checkpoints by setting `checkpoint-freq` and `checkpoint-at-end` parameters above.\n",
"\n",
"In this section we show how to get access to these checkpoints data, and then how to use them to evaluate the trained policy."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a dataset of training artifacts\n",
"To evaluate a trained policy (a checkpoint) we need to make the checkpoint accessible to the rollout script.\n",
"We can use the Run API to download policy training artifacts (saved model and checkpoints) to local compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064305251
}
},
"outputs": [],
"source": [
"from os import path\n",
"from distutils import dir_util\n",
"\n",
"training_artifacts_path = path.join(\"logs\", \"cartpole-ppo\")\n",
"print(\"Training artifacts path:\", training_artifacts_path)\n",
"\n",
"if path.exists(training_artifacts_path):\n",
" dir_util.remove_tree(training_artifacts_path)\n",
"\n",
"# Download run artifacts to local compute\n",
"training_run.download_files(training_artifacts_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's find the checkpoints and the last checkpoint number."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064305283
}
},
"outputs": [],
"source": [
"# A helper function to find all of the checkpoint directories located within a larger directory tree\n",
"def find_checkpoints(file_path):\n",
" print(\"Looking in path:\", file_path)\n",
" checkpoints = []\n",
" for root, dirs, files in os.walk(file_path):\n",
" trimmed_root = root[len(file_path)+1:]\n",
" for name in dirs:\n",
" if name.startswith('checkpoint_'):\n",
" checkpoints.append(path.join(trimmed_root, name))\n",
" return checkpoints"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064305305
}
},
"outputs": [],
"source": [
"# Find checkpoints and last checkpoint number\n",
"checkpoint_files = find_checkpoints(training_artifacts_path)\n",
"\n",
"last_checkpoint_path = None\n",
"last_checkpoint_number = -1\n",
"for checkpoint_file in checkpoint_files:\n",
" checkpoint_number = int(os.path.basename(checkpoint_file).split('_')[1])\n",
" if checkpoint_number > last_checkpoint_number:\n",
" last_checkpoint_path = checkpoint_file\n",
" last_checkpoint_number = checkpoint_number\n",
"\n",
"print(\"Last checkpoint number:\", last_checkpoint_number)\n",
"print(\"Last checkpoint path:\", last_checkpoint_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we upload checkpoints to default datastore and create a file dataset. This dataset will be used to pass in the checkpoints to the rollout script."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064305331
}
},
"outputs": [],
"source": [
"# Upload the checkpoint files and create a DataSet\n",
"from azureml.data.dataset_factory import FileDatasetFactory\n",
"\n",
"datastore = ws.get_default_datastore()\n",
"checkpoint_ds = FileDatasetFactory.upload_directory(training_artifacts_path, (datastore, 'cartpole_checkpoints_' + training_run.id), overwrite=False, show_progress=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To verify, we can print out the number (and paths) of all the files in the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064305353
}
},
"outputs": [],
"source": [
"artifacts_paths = checkpoint_ds.to_path()\n",
"print(\"Number of files in dataset:\", len(artifacts_paths))\n",
"\n",
"# Uncomment line below to print all file paths\n",
"#print(\"Artifacts dataset file paths: \", artifacts_paths)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate Trained Agent and See Results\n",
"\n",
"We can evaluate a previously trained policy using the `cartpole_rollout.py` helper script provided by RLlib (see [Evaluating Trained Policies](https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies) for more details). Here we use an adaptation of this script to reconstruct a policy from a checkpoint taken and saved during training. We took these checkpoints by setting `checkpoint-freq` and `checkpoint-at-end` parameters above.\n",
"In this section we show how to use these checkpoints to evaluate the trained policy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064305371
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"ray_environment_name = 'cartpole-ray-ci'\n",
"\n",
"experiment_name = 'CartPole-v1-CI'\n",
"\n",
"experiment = Experiment(workspace=ws, name=experiment_name)\n",
"ray_environment = Environment.get(workspace=ws, name=ray_environment_name)\n",
"\n",
"script_name = 'cartpole_rollout.py'\n",
"script_arguments = [\n",
" '--steps', '2000',\n",
" '--checkpoint', last_checkpoint_path,\n",
" '--algo', 'PPO',\n",
" '--render', 'false',\n",
" '--dataset_path', checkpoint_ds.as_named_input('dataset_path').as_mount()\n",
"]\n",
"\n",
"aml_run_config_ml = RunConfiguration(communicator='OpenMpi')\n",
"aml_run_config_ml.target = compute_target\n",
"aml_run_config_ml.node_count = 1\n",
"aml_run_config_ml.environment = ray_environment\n",
"aml_run_config_ml.data\n",
"\n",
"rollout_config = ScriptRunConfig(\n",
" source_directory='./files',\n",
" script=script_name,\n",
" arguments=script_arguments,\n",
" run_config = aml_run_config_ml\n",
" )\n",
" \n",
"rollout_run = experiment.submit(rollout_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And then, similar to the training section, we can monitor the real-time progress of the rollout run and its chid as follows. If you browse logs of the child run you can see the evaluation results recorded in std_log_process_0.txt file. Note that you may need to wait several minutes before these results become available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064305399
}
},
"outputs": [],
"source": [
"RunDetails(rollout_run).show()"
]
},
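{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of listing the log files recorded by the rollout run's child run(s); the exact file names depend on the run:\n",
"\n",
"```python\n",
"# For each child run, list the names of its recorded log files\n",
"for child_run in rollout_run.get_children():\n",
"    log_files = child_run.get_details().get('logFiles', {})\n",
"    for log_name in log_files:\n",
"        print(child_run.id, log_name)\n",
"```"
]
},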
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait for completion of the rollout run, or you may cancel the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064305419
}
},
"outputs": [],
"source": [
"# Uncomment line below to cancel the run\n",
"#rollout_run.cancel()\n",
"rollout_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning up\n",
"For your convenience, below you can find code snippets to clean up any resources created as part of this tutorial that you don't wish to retain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683064305437
}
},
"outputs": [],
"source": [
"# To archive the created experiment:\n",
"#exp.archive()\n",
"\n",
"# To delete created compute instance\n",
"if not current_compute_instance:\n",
" compute_target.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"This example was about running Reinforcement Learning in Azure Machine Learning (Ray/RLlib Framework) on a compute instance. Please see [Cartpole Problem on Single Compute](../cartpole-on-single-compute/cartpole_sc.ipynb)\n",
"example which uses Ray RLlib to train a Cartpole playing agent on a single node remote compute.\n"
]
}
],
"metadata": {
"authors": [
{
"name": "adrosa"
},
{
"name": "hoazari"
}
],
"categories": [
"how-to-use-azureml",
"reinforcement-learning"
],
"kernel_info": {
"name": "python38-azureml"
},
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"microsoft": {
"host": {
"AzureML": {
"notebookHasBeenCompleted": true
}
},
"ms_spell_check": {
"ms_spell_check_language": "en"
}
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
"nteract": {
"version": "nteract-front-end@1.0.0"
},
"vscode": {
"interpreter": {
"hash": "00c28698cbad9eaca051e9759b1181630e646922505b47b4c6352eb5aa72ddfc"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -1,23 +0,0 @@
cartpole-ppo:
env: CartPole-v1
run: PPO
stop:
episode_reward_mean: 475
time_total_s: 300
checkpoint_config:
checkpoint_frequency: 2
checkpoint_at_end: true
config:
# Works for both torch and tf.
framework: torch
gamma: 0.99
lr: 0.0003
num_workers: 1
observation_filter: MeanStdFilter
num_sgd_iter: 6
vf_loss_coeff: 0.01
model:
fcnet_hiddens: [32]
fcnet_activation: linear
vf_share_layers: true
enable_connectors: true

View File

@@ -1,108 +0,0 @@
import os
import sys
import argparse
from ray.rllib.evaluate import RolloutSaver, rollout
from ray_on_aml.core import Ray_On_AML
import ray.cloudpickle as cloudpickle
from ray.tune.utils import merge_dicts
from ray.tune.registry import get_trainable_cls, _global_registry, ENV_CREATOR
from azureml.core import Run
from utils import callbacks
import collections
import copy
import gymnasium as gym
import json
from pathlib import Path
def run_rollout(checkpoint, algo, render, steps, episodes):
config_dir = os.path.dirname(checkpoint)
config_path = os.path.join(config_dir, "params.pkl")
config = None
# Try parent directory.
if not os.path.exists(config_path):
config_path = os.path.join(config_dir, "../params.pkl")
# Load the config from pickled.
if os.path.exists(config_path):
with open(config_path, "rb") as f:
config = cloudpickle.load(f)
# If no pkl file found, require command line `--config`.
else:
raise ValueError("Could not find params.pkl in either the checkpoint dir or its parent directory")
# Make sure worker 0 has an Env.
config["create_env_on_driver"] = True
# Merge with `evaluation_config` (first try from command line, then from
# pkl file).
evaluation_config = copy.deepcopy(config.get("evaluation_config", {}))
config = merge_dicts(config, evaluation_config)
env = config.get("env")
# Make sure we have evaluation workers.
if not config.get("evaluation_num_workers"):
config["evaluation_num_workers"] = config.get("num_workers", 0)
if not config.get("evaluation_duration"):
config["evaluation_duration"] = 1
# Hard-override this as it raises a warning by Algorithm otherwise.
# Makes no sense anyways, to have it set to None as we don't call
# `Algorithm.train()` here.
config["evaluation_interval"] = 1
# Rendering settings.
config["render_env"] = render
# Create the Algorithm from config.
cls = get_trainable_cls(algo)
algorithm = cls(env=env, config=config)
# Load state from checkpoint, if provided.
if checkpoint:
algorithm.restore(checkpoint)
# Do the actual rollout.
with RolloutSaver(
outfile=None,
use_shelve=False,
write_update_file=False,
target_steps=steps,
target_episodes=episodes,
save_info=False,
) as saver:
rollout(algorithm, env, steps, episodes, saver, not render)
algorithm.stop()
if __name__ == "__main__":
# Start ray head (single node)
ray_on_aml = Ray_On_AML()
ray = ray_on_aml.getRay()
if ray:
parser = argparse.ArgumentParser()
parser.add_argument('--dataset_path', required=True, help='Path to artifacts dataset')
parser.add_argument('--checkpoint', required=True, help='Name of checkpoint file directory')
parser.add_argument('--algo', required=True, help='Name of RL algorithm')
parser.add_argument('--render', default=False, required=False, help='True to render')
parser.add_argument('--steps', required=False, type=int, help='Number of steps to run')
parser.add_argument('--episodes', required=False, type=int, help='Number of episodes to run')
args = parser.parse_args()
# Get a handle to run
run = Run.get_context()
# Get a handle to the training artifacts dataset mount path
dataset_path = run.input_datasets['dataset_path']
# Find checkpoint file to be evaluated
checkpoint = os.path.join(dataset_path, args.checkpoint)
print('Checkpoint:', checkpoint)
# Start rollout
ray.init(address='auto')
run_rollout(checkpoint, args.algo, args.render, args.steps, args.episodes)

View File

@@ -1,34 +0,0 @@
from ray_on_aml.core import Ray_On_AML
import yaml
from ray.tune.tune import run_experiments
from utils import callbacks
import argparse
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--config', help='Path to yaml configuration file')
args = parser.parse_args()
ray_on_aml = Ray_On_AML()
ray = ray_on_aml.getRay()
if ray: # in the headnode
ray.init(address="auto")
print("Configuring run from file: ", args.config)
experiment_config = None
with open(args.config, "r") as file:
experiment_config = yaml.safe_load(file)
# Set storage_path in each experiment configuration to ensure generated logs get picked up
# Also set monitor to ensure videos are captured
for experiment_name, experiment in experiment_config.items():
experiment["storage_path"] = "./logs"
experiment['config']['monitor'] = True
print(f'Config: {experiment_config}')
trials = run_experiments(
experiment_config,
callbacks=[callbacks.TrialCallback()],
verbose=2
)
else:
print("in worker node")

View File

@@ -1,27 +0,0 @@
FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
RUN pip install ray-on-aml==0.2.4 \
ray==2.4.0 \
ray[rllib]==2.4.0 \
mlflow==2.3.1 \
azureml-defaults==1.50.0 \
azureml-dataset-runtime[fuse,pandas]==1.50.0 \
azureml-contrib-reinforcementlearning==1.50.0 \
gputil==1.4.0 \
scipy==1.9.1 \
pyglet==2.0.6 \
cloudpickle==2.2.1 \
tensorflow==2.11.0 \
tensorflow-probability==0.19.0 \
torch \
tabulate==0.9.0 \
dm_tree==0.1.8 \
lz4==4.3.2 \
psutil==5.9.4 \
setproctitle==1.3.2 \
pygame==2.1.0 \
gymnasium[classic_control]==0.26.3 \
gym[classic_control]==0.26.2
# Display the exact versions we have installed
RUN pip freeze

View File

@@ -1,22 +0,0 @@
'''RLlib callbacks module:
Common callback methods to be passed to RLlib trainer.
'''
from azureml.core import Run
from ray.tune import Callback
class TrialCallback(Callback):
def on_trial_result(self, iteration, trials, trial, result, **info):
'''Callback on train result to record metrics returned by trainer.
'''
run = Run.get_context()
run.log(
name='episode_reward_mean',
value=result["episode_reward_mean"])
run.log(
name='episodes_total',
value=result["episodes_total"])

View File

@@ -1,13 +0,0 @@
'''Misc module:
Miscellaneous helper functions and utilities.
'''
import os
import glob
# Helper function to find a file or folder path
def find_path(name, path_prefix):
for root, _, _ in os.walk(path_prefix):
if glob.glob(os.path.join(root, name)):
return root

Binary file not shown.


View File

@@ -1,917 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/reinforcement-learning/cartpole_on_single_compute/cartpole_sc.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reinforcement Learning in Azure Machine Learning - Cartpole Problem on Single Compute\n",
"\n",
"Reinforcement Learning in Azure Machine Learning is a managed service for running reinforcement learning training and simulation. With Reinforcement Learning in Azure Machine Learning, data scientists can start developing reinforcement learning systems on one machine, and scale to compute targets with 100s of nodes if needed.\n",
"\n",
"This example shows how to use Reinforcement Learning in Azure Machine Learning to train a Cartpole playing agent on a single compute. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cartpole problem\n",
"\n",
"Cartpole, also known as [Inverted Pendulum](https://en.wikipedia.org/wiki/Inverted_pendulum), is a pendulum with a center of mass above its pivot point. This formation is essentially unstable and will easily fall over but can be kept balanced by applying appropriate horizontal forces to the pivot point.\n",
"\n",
"<table style=\"width:50%\">\n",
" <tr>\n",
" <th>\n",
" <img src=\"./images/cartpole.png\" alt=\"Cartpole image\" /> \n",
" </th>\n",
" </tr>\n",
" <tr>\n",
" <th><p>Fig 1. Cartpole problem schematic description (from <a href=\"https://towardsdatascience.com/cartpole-introduction-to-reinforcement-learning-ed0eb5b58288\">towardsdatascience.com</a>).</p></th>\n",
" </tr>\n",
"</table>\n",
"\n",
"The goal here is to train an agent to keep the cartpole balanced by applying appropriate forces to the pivot point.\n",
"\n",
"See [this video](https://www.youtube.com/watch?v=XiigTGKZfks) for a real-world demonstration of cartpole problem."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisite\n",
"The user should have completed the Azure Machine Learning Tutorial: [Get started creating your first ML experiment with the Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup). You will need to make sure that you have a valid subscription ID, a resource group, and an Azure Machine Learning workspace. All datastores and datasets you use should be associated with your workspace."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up Development Environment\n",
"The following subsections show typical steps to setup your development environment. Setup includes:\n",
"\n",
"* Connecting to a workspace to enable communication between your local machine and remote resources\n",
"* Creating an experiment to track all your runs\n",
"* Creating a remote compute target to use for training"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure Machine Learning SDK \n",
"Display the Azure Machine Learning SDK version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683056824182
}
},
"outputs": [],
"source": [
"import azureml.core\n",
"\n",
"print(\"Azure Machine Learning SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Azure Machine Learning workspace\n",
"Get a reference to an existing Azure Machine Learning workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683056825821
}
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, sep = ' | ')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a new compute resource or attach an existing one\n",
"\n",
"A compute target is a designated compute resource where you run your training and simulation scripts. This location may be your local machine or a cloud-based compute resource. The code below shows how to create a cloud-based compute target. For more information see [What are compute targets in Azure Machine Learning?](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target)\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"**Note: Creation of a compute resource can take several minutes**. Please make sure to change `STANDARD_D2_V2` to a [size available in your region](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=virtual-machines)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683056826903
}
},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"import os\n",
"\n",
"# Choose a name and maximum size for your cluster\n",
"compute_name = \"cpu-cluster-d2\"\n",
"compute_min_nodes = 0\n",
"compute_max_nodes = 4\n",
"vm_size = \"STANDARD_D2_V2\"\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" print(\"Found an existing compute target of name: \" + compute_name)\n",
" compute_target = ws.compute_targets[compute_name]\n",
" # Note: you may want to make sure compute_target is of type AmlCompute \n",
"else:\n",
" print(\"Creating new compute target...\")\n",
" provisioning_config = AmlCompute.provisioning_configuration(\n",
" vm_size=vm_size,\n",
" min_nodes=compute_min_nodes, \n",
" max_nodes=compute_max_nodes)\n",
" \n",
" # Create the cluster\n",
" compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Azure Machine Learning experiment\n",
"Create an experiment to track the runs in your workspace. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683056827252
}
},
"outputs": [],
"source": [
"from azureml.core.experiment import Experiment\n",
"\n",
"experiment_name = 'CartPole-v1-SC'\n",
"experiment = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646417962898
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"import os\n",
"\n",
"ray_environment_name = 'cartpole-ray-sc'\n",
"ray_environment_dockerfile_path = os.path.join(os.getcwd(), 'files', 'docker', 'Dockerfile')\n",
"\n",
"# Build environment image\n",
"ray_environment = Environment. \\\n",
" from_dockerfile(name=ray_environment_name, dockerfile=ray_environment_dockerfile_path). \\\n",
" register(workspace=ws)\n",
"ray_env_build_details = ray_environment.build(workspace=ws)\n",
"\n",
"ray_env_build_details.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train Cartpole Agent\n",
"In this section, we show how to use Azure Machine Learning jobs and Ray/RLlib framework to train a cartpole playing agent. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create reinforcement learning training run\n",
"\n",
"The code below submits the training run using a `ScriptRunConfig`. By providing the\n",
"command to run the training, and a `RunConfig` object configured with your\n",
"compute target, number of nodes, and environment image to use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683059658819
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core import RunConfiguration, ScriptRunConfig, Experiment\n",
"from azureml.core.runconfig import DockerConfiguration, RunConfiguration\n",
"\n",
"config_name = 'cartpole-ppo.yaml'\n",
"script_name = 'cartpole_training.py'\n",
"video_capture = True\n",
"script_arguments = [\n",
" '--config', config_name\n",
"]\n",
"command=[\"python\", script_name, *script_arguments]\n",
"\n",
"aml_run_config_ml = RunConfiguration(communicator='OpenMpi')\n",
"aml_run_config_ml.target = compute_target\n",
"aml_run_config_ml.node_count = 1\n",
"aml_run_config_ml.environment = ray_environment\n",
"\n",
"if video_capture:\n",
" command = [\"xvfb-run -s '-screen 0 640x480x16 -ac +extension GLX +render' \"] + command\n",
" aml_run_config_ml.environment_variables[\"SDL_VIDEODRIVER\"] = \"dummy\"\n",
"\n",
"training_config = ScriptRunConfig(source_directory='./files',\n",
" command=command,\n",
" run_config = aml_run_config_ml\n",
" )\n",
"training_run = experiment.submit(training_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training configuration\n",
"\n",
"This is the training configuration (in yaml) that we use to train an agent to solve the CartPole problem using\n",
"the PPO algorithm.\n",
"\n",
"```yaml\n",
"cartpole-ppo:\n",
" env: CartPole-v1\n",
" run: PPO\n",
" stop:\n",
" episode_reward_mean: 475\n",
" time_total_s: 300\n",
" checkpoint_config:\n",
" checkpoint_frequency: 2\n",
" checkpoint_at_end: true\n",
" config:\n",
" # Works for both torch and tf.\n",
" framework: torch\n",
" gamma: 0.99\n",
" lr: 0.0003\n",
" num_workers: 1\n",
" observation_filter: MeanStdFilter\n",
" num_sgd_iter: 6\n",
" vf_loss_coeff: 0.01\n",
" model:\n",
" fcnet_hiddens: [32]\n",
" fcnet_activation: linear\n",
" vf_share_layers: true\n",
" enable_connectors: true\n",
"```"
]
},
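{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, you can parse this configuration locally before submitting a run. This is only a sketch; it assumes the file lives at `files/cartpole-ppo.yaml` relative to the notebook and that `pyyaml` is installed locally.\n",
"\n",
"```python\n",
"import yaml\n",
"\n",
"# Assumed location of the training configuration used by the submitted job\n",
"with open('files/cartpole-ppo.yaml') as f:\n",
"    experiment_config = yaml.safe_load(f)\n",
"\n",
"# Print the stopping criteria so we know when the training job will end\n",
"print(experiment_config['cartpole-ppo']['stop'])\n",
"```"
]
},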
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor experiment\n",
"\n",
"Azure Machine Learning provides a Jupyter widget to show the status of an experiment run. You could use this widget to monitor the status of the runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683060289002
}
},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(training_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stop the run\n",
"To stop the run, call `training_run.cancel()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment line below to cancel the run\n",
"# training_run.cancel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wait for completion\n",
"Wait for the run to complete before proceeding.\n",
"\n",
"**Note: The length of the run depends on the provisioning time of the compute target and it may take several minutes to complete.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683060297005
}
},
"outputs": [],
"source": [
"training_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get access to training artifacts\n",
"We can simply use run id to get a handle to an in-progress or a previously concluded run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683060517858
}
},
"outputs": [],
"source": [
"from azureml.core import Run\n",
"\n",
"run_id = training_run.id # Or set to run id of a completed run (e.g. 'rl-cartpole-v0_1587572312_06e04ace_head')\n",
"run = Run(experiment, run_id=run_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use the Run API to download policy training artifacts (saved model and checkpoints) to local compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683060521847
}
},
"outputs": [],
"source": [
"from os import path\n",
"from distutils import dir_util\n",
"\n",
"training_artifacts_path = path.join(\"logs\", \"cartpole-ppo\")\n",
"print(\"Training artifacts path:\", training_artifacts_path)\n",
"\n",
"if path.exists(training_artifacts_path):\n",
" dir_util.remove_tree(training_artifacts_path)\n",
"\n",
"# Download run artifacts to local compute\n",
"training_run.download_files(training_artifacts_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Display movies of selected training episodes\n",
"\n",
"Ray creates video output of selected training episodes in mp4 format. Here we will display two of these, i.e. the first and the last recorded videos, so you could see the improvement of the agent after training.\n",
"\n",
"First we introduce a few helper functions: a function to download the movies from our dataset, another one to find mp4 movies in a local directory, and one more to display a downloaded movie."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683060867182
}
},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"# A helper function to find movies in a directory\n",
"def find_movies(movie_path):\n",
" print(\"Looking in path:\", movie_path)\n",
" mp4_movies = []\n",
" for root, _, files in os.walk(movie_path):\n",
" for name in files:\n",
" if name.endswith('.mp4'):\n",
" mp4_movies.append(path.join(root, name))\n",
" print('Found {} movies'.format(len(mp4_movies)))\n",
"\n",
" return mp4_movies\n",
"\n",
"\n",
"# A helper function to display a movie\n",
"from IPython.core.display import Video\n",
"from IPython.display import display\n",
"def display_movie(movie_file):\n",
" display(Video(movie_file, embed=True, html_attributes='controls'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look for the downloaded movies in the local directory and sort them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683060871682
}
},
"outputs": [],
"source": [
"mp4_files = find_movies(training_artifacts_path)\n",
"mp4_files.sort()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Display a movie of the first training episode. This is how the agent performs with no training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683060900828
}
},
"outputs": [],
"source": [
"first_movie = mp4_files[0] if len(mp4_files) > 0 else None\n",
"print(\"First movie:\", first_movie)\n",
"\n",
"if first_movie:\n",
" display_movie(first_movie)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Display a movie of the last training episode. This is how a fully-trained agent performs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683060914790
}
},
"outputs": [],
"source": [
"last_movie = mp4_files[-1] if len(mp4_files) > 0 else None\n",
"print(\"Last movie:\", last_movie)\n",
"\n",
"if last_movie:\n",
" display_movie(last_movie)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate Trained Agent and See Results\n",
"\n",
"We can evaluate a previously trained policy using the `rollout.py` helper script provided by RLlib (see [Evaluating Trained Policies](https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies) for more details). Here we use an adaptation of this script to reconstruct a policy from a checkpoint taken and saved during training. We took these checkpoints by setting `checkpoint-freq` and `checkpoint-at-end` parameters above.\n",
"In this section we show how to use these checkpoints to evaluate the trained policy."
]
},
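{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have Ray RLlib 2.4, gymnasium, and PyTorch installed locally (the training above uses `framework: torch`), a downloaded checkpoint can also be restored directly for a quick local rollout. The snippet below is a minimal sketch and not part of the original workflow; `local_checkpoint_dir` is an assumed path to one of the `checkpoint_...` directories downloaded above.\n",
"\n",
"```python\n",
"import gymnasium as gym\n",
"from ray.rllib.algorithms.algorithm import Algorithm\n",
"\n",
"local_checkpoint_dir = 'logs/cartpole-ppo/<trial-name>/checkpoint_000002'  # assumption: adjust to a real path\n",
"\n",
"# Rebuild the trained algorithm from the checkpoint and run one greedy episode\n",
"algo = Algorithm.from_checkpoint(local_checkpoint_dir)\n",
"env = gym.make('CartPole-v1')\n",
"obs, _ = env.reset()\n",
"done, total_reward = False, 0.0\n",
"while not done:\n",
"    action = algo.compute_single_action(obs, explore=False)\n",
"    obs, reward, terminated, truncated, _ = env.step(action)\n",
"    done = terminated or truncated\n",
"    total_reward += reward\n",
"print('Episode reward:', total_reward)\n",
"```"
]
},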
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluate a trained policy\n",
"In this section, we submit another job, to evalute a trained policy. The entrypoint for this job is\n",
"`cartpole-rollout.py` script, and we we pass the checkpoints dataset to this script as a dataset refrence.\n",
"\n",
"We are using script parameters to pass in the same algorithm and the same environment used during training. We also specify the checkpoint number of the checkpoint we wish to evaluate, `checkpoint-number`, and number of the steps we shall run the rollout, `steps`.\n",
"\n",
"The training artifacts dataset will be accessible to the rollout script as a mounted folder. The mounted folder and the checkpoint number, passed in via `checkpoint-number`, will be used to create a path to the checkpoint we are going to evaluate. The created checkpoint path then will be passed into RLlib rollout script for evaluation.\n",
"\n",
"Let's find the checkpoints and the last checkpoint number first."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683061167899
}
},
"outputs": [],
"source": [
"# A helper function to find all of the checkpoint directories located within a larger directory tree\n",
"def find_checkpoints(file_path):\n",
" print(\"Looking in path:\", file_path)\n",
" checkpoints = []\n",
" for root, dirs, files in os.walk(file_path):\n",
" trimmed_root = root[len(file_path)+1:]\n",
" for name in dirs:\n",
" if name.startswith('checkpoint_'):\n",
" checkpoints.append(path.join(trimmed_root, name))\n",
" return checkpoints"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683061170184
}
},
"outputs": [],
"source": [
"# Find checkpoints and last checkpoint number\n",
"checkpoint_files = find_checkpoints(training_artifacts_path)\n",
"\n",
"last_checkpoint_path = None\n",
"last_checkpoint_number = -1\n",
"for checkpoint_file in checkpoint_files:\n",
" checkpoint_number = int(os.path.basename(checkpoint_file).split('_')[1])\n",
" if checkpoint_number > last_checkpoint_number:\n",
" last_checkpoint_path = checkpoint_file\n",
" last_checkpoint_number = checkpoint_number\n",
"\n",
"print(\"Last checkpoint number:\", last_checkpoint_number)\n",
"print(\"Last checkpoint path:\", last_checkpoint_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683061176740
}
},
"outputs": [],
"source": [
"# Upload the checkpoint files and create a DataSet\n",
"from azureml.data.dataset_factory import FileDatasetFactory\n",
"\n",
"datastore = ws.get_default_datastore()\n",
"checkpoint_ds = FileDatasetFactory.upload_directory(training_artifacts_path, (datastore, 'cartpole_checkpoints_' + training_run.id), overwrite=False, show_progress=True)"
]
},
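{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, inside `cartpole_rollout.py` the mounted dataset folder and the `--checkpoint` argument are combined to locate the checkpoint directory to restore. The sketch below mirrors that logic; here `dataset_path` stands for the mounted input dataset and `args.checkpoint` for the relative checkpoint directory passed on the command line.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"# dataset_path: the mounted folder, i.e. run.input_datasets['dataset_path'] inside the job\n",
"# args.checkpoint: a relative path such as 'cartpole-ppo/PPO_.../checkpoint_000002'\n",
"checkpoint = os.path.join(dataset_path, args.checkpoint)\n",
"print('Checkpoint:', checkpoint)\n",
"```"
]
},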
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can submit the training run using a `ScriptRunConfig`. By providing the\n",
"command to run the training, and a `RunConfig` object configured w"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062377151
}
},
"outputs": [],
"source": [
"ray_environment_name = 'cartpole-ray-sc'\n",
"\n",
"experiment_name = 'CartPole-v1-SC'\n",
"training_algorithm = 'PPO'\n",
"rl_environment = 'CartPole-v1'\n",
"\n",
"experiment = Experiment(workspace=ws, name=experiment_name)\n",
"ray_environment = Environment.get(workspace=ws, name=ray_environment_name)\n",
"\n",
"script_name = 'cartpole_rollout.py'\n",
"script_arguments = [\n",
" '--steps', '2000',\n",
" '--checkpoint', last_checkpoint_path,\n",
" '--algo', 'PPO',\n",
" '--render', 'true',\n",
" '--dataset_path', checkpoint_ds.as_named_input('dataset_path').as_mount()\n",
"]\n",
"\n",
"aml_run_config_ml = RunConfiguration(communicator='OpenMpi')\n",
"aml_run_config_ml.target = compute_target\n",
"aml_run_config_ml.node_count = 1\n",
"aml_run_config_ml.environment = ray_environment\n",
"aml_run_config_ml.data\n",
"\n",
"rollout_config = ScriptRunConfig(\n",
" source_directory='./files',\n",
" script=script_name,\n",
" arguments=script_arguments,\n",
" run_config = aml_run_config_ml\n",
" )\n",
" \n",
"rollout_run = experiment.submit(rollout_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And then, similar to the training section, we can monitor the real-time progress of the rollout run and its chid as follows. If you browse logs of the child run you can see the evaluation results recorded in driver_log.txt file. Note that you may need to wait several minutes before these results become available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062379999
}
},
"outputs": [],
"source": [
"RunDetails(rollout_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait for completion of the rollout run before moving to the next section, or you may cancel the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062451723
}
},
"outputs": [],
"source": [
"# Uncomment line below to cancel the run\n",
"#rollout_run.cancel()\n",
"rollout_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Display movies of selected rollout episodes\n",
"\n",
"To display recorded movies first we download recorded videos to local machine. Here again we create a dataset of rollout artifacts and use the helper functions introduced above to download and displays rollout videos."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062747822
}
},
"outputs": [],
"source": [
"# Download rollout artifacts\n",
"rollout_artifacts_path = path.join(\"logs\", \"rollout\")\n",
"print(\"Rollout artifacts path:\", rollout_artifacts_path)\n",
"\n",
"if path.exists(rollout_artifacts_path):\n",
" dir_util.remove_tree(rollout_artifacts_path)\n",
"\n",
"# Download videos to local compute\n",
"rollout_run.download_files(\"logs/video\", output_directory = rollout_artifacts_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, similar to the training section, we look for the last video."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062752847
}
},
"outputs": [],
"source": [
"# Look for the downloaded movie in local directory\n",
"mp4_files = find_movies(rollout_artifacts_path)\n",
"mp4_files.sort()\n",
"last_movie = mp4_files[-1] if len(mp4_files) > 1 else None\n",
"print(\"Last movie:\", last_movie)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Display last video recorded during the rollout."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1683062763275
}
},
"outputs": [],
"source": [
"last_movie = mp4_files[-1] if len(mp4_files) > 0 else None\n",
"print(\"Last movie:\", last_movie)\n",
"\n",
"if last_movie:\n",
" display_movie(last_movie)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning up\n",
"For your convenience, below you can find code snippets to clean up any resources created as part of this tutorial that you don't wish to retain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# To archive the created experiment:\n",
"#exp.archive()\n",
"\n",
"# To delete the compute target:\n",
"#compute_target.delete()\n",
"\n",
"# To delete downloaded training artifacts\n",
"#if os.path.exists(training_artifacts_path):\n",
"# dir_util.remove_tree(training_artifacts_path)\n",
"\n",
"# To delete downloaded rollout videos\n",
"#if path.exists(rollout_artifacts_path):\n",
"# dir_util.remove_tree(rollout_artifacts_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"This example was about running Reinforcement Learning in Azure Machine Learning (Ray/RLlib Framework) on a single compute. Please see [Pong Problem](../atari-on-distributed-compute/pong_rllib.ipynb)\n",
"example which uses Ray RLlib to train a Pong playing agent on a multi-node cluster."
]
}
],
"metadata": {
"authors": [
{
"name": "hoazari"
},
{
"name": "dasommer"
}
],
"categories": [
"how-to-use-azureml",
"reinforcement-learning"
],
"kernel_info": {
"name": "python38-azureml"
},
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"microsoft": {
"host": {
"AzureML": {
"notebookHasBeenCompleted": true
}
},
"ms_spell_check": {
"ms_spell_check_language": "en"
}
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
"nteract": {
"version": "nteract-front-end@1.0.0"
},
"vscode": {
"interpreter": {
"hash": "00c28698cbad9eaca051e9759b1181630e646922505b47b4c6352eb5aa72ddfc"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -1,24 +0,0 @@
cartpole-ppo:
  env: CartPole-v1
  run: PPO
  stop:
    episode_reward_mean: 475
    time_total_s: 300
  checkpoint_config:
    checkpoint_frequency: 2
    checkpoint_at_end: true
  config:
    # Works for both torch and tf.
    framework: torch
    gamma: 0.99
    lr: 0.0003
    num_workers: 1
    observation_filter: MeanStdFilter
    num_sgd_iter: 6
    vf_loss_coeff: 0.01
    model:
      fcnet_hiddens: [32]
      fcnet_activation: linear
      vf_share_layers: true
    enable_connectors: true
    render_env: true

View File

@@ -1,108 +0,0 @@
import os
import sys
import argparse
from ray.rllib.evaluate import RolloutSaver, rollout
from ray_on_aml.core import Ray_On_AML
import ray.cloudpickle as cloudpickle
from ray.tune.utils import merge_dicts
from ray.tune.registry import get_trainable_cls, _global_registry, ENV_CREATOR
from azureml.core import Run
from utils import callbacks
import collections
import copy
import gymnasium as gym
import json
from pathlib import Path
def run_rollout(checkpoint, algo, render, steps, episodes):
    config_dir = os.path.dirname(checkpoint)
    config_path = os.path.join(config_dir, "params.pkl")
    config = None

    # Try parent directory.
    if not os.path.exists(config_path):
        config_path = os.path.join(config_dir, "../params.pkl")

    # Load the config from pickled.
    if os.path.exists(config_path):
        with open(config_path, "rb") as f:
            config = cloudpickle.load(f)
    # If no pkl file found, require command line `--config`.
    else:
        raise ValueError("Could not find params.pkl in either the checkpoint dir or its parent directory")

    # Make sure worker 0 has an Env.
    config["create_env_on_driver"] = True

    # Merge with `evaluation_config` (first try from command line, then from
    # pkl file).
    evaluation_config = copy.deepcopy(config.get("evaluation_config", {}))
    config = merge_dicts(config, evaluation_config)
    env = config.get("env")

    # Make sure we have evaluation workers.
    if not config.get("evaluation_num_workers"):
        config["evaluation_num_workers"] = config.get("num_workers", 0)
    if not config.get("evaluation_duration"):
        config["evaluation_duration"] = 1
    # Hard-override this as it raises a warning by Algorithm otherwise.
    # Makes no sense anyways, to have it set to None as we don't call
    # `Algorithm.train()` here.
    config["evaluation_interval"] = 1

    # Rendering settings.
    config["render_env"] = render

    # Create the Algorithm from config.
    cls = get_trainable_cls(algo)
    algorithm = cls(env=env, config=config)

    # Load state from checkpoint, if provided.
    if checkpoint:
        algorithm.restore(checkpoint)

    # Do the actual rollout.
    with RolloutSaver(
        outfile=None,
        use_shelve=False,
        write_update_file=False,
        target_steps=steps,
        target_episodes=episodes,
        save_info=False,
    ) as saver:
        rollout(algorithm, env, steps, episodes, saver, not render)
    algorithm.stop()


if __name__ == "__main__":
    # Start ray head (single node)
    ray_on_aml = Ray_On_AML()
    ray = ray_on_aml.getRay()

    if ray:
        parser = argparse.ArgumentParser()
        parser.add_argument('--dataset_path', required=True, help='Path to artifacts dataset')
        parser.add_argument('--checkpoint', required=True, help='Name of checkpoint file directory')
        parser.add_argument('--algo', required=True, help='Name of RL algorithm')
        parser.add_argument('--render', default=False, required=False, help='True to render')
        parser.add_argument('--steps', required=False, type=int, help='Number of steps to run')
        parser.add_argument('--episodes', required=False, type=int, help='Number of episodes to run')
        args = parser.parse_args()

        # Get a handle to run
        run = Run.get_context()

        # Get handles to the training artifacts dataset and mount path
        dataset_path = run.input_datasets['dataset_path']

        # Find checkpoint file to be evaluated
        checkpoint = os.path.join(dataset_path, args.checkpoint)
        print('Checkpoint:', checkpoint)

        # Start rollout
        ray.init(address='auto')
        run_rollout(checkpoint, args.algo, args.render, args.steps, args.episodes)

View File

@@ -1,34 +0,0 @@
from ray_on_aml.core import Ray_On_AML
import yaml
from ray.tune.tune import run_experiments
from utils import callbacks
import argparse
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', help='Path to yaml configuration file')
    args = parser.parse_args()

    ray_on_aml = Ray_On_AML()
    ray = ray_on_aml.getRay()

    if ray:  # in the headnode
        ray.init(address="auto")
        print("Configuring run from file: ", args.config)

        experiment_config = None
        with open(args.config, "r") as file:
            experiment_config = yaml.safe_load(file)

        # Set local_dir in each experiment configuration to ensure generated logs get picked up
        # Also set monitor to ensure videos are captured
        for experiment_name, experiment in experiment_config.items():
            experiment["storage_path"] = "./logs"
            experiment['config']['monitor'] = True

        print(f'Config: {experiment_config}')

        trials = run_experiments(
            experiment_config,
            callbacks=[callbacks.TrialCallback()],
            verbose=2
        )
    else:
        print("in worker node")

View File

@@ -1,35 +0,0 @@
FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
RUN apt-get update && apt-get install -y --no-install-recommends \
python-opengl \
rsync \
xvfb && \
apt-get clean -y && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /usr/share/man/*
RUN pip install ray-on-aml==0.2.4 \
ray==2.4.0 \
ray[rllib]==2.4.0 \
mlflow==2.3.1 \
azureml-defaults==1.50.0 \
azureml-dataset-runtime[fuse,pandas]==1.50.0 \
azureml-contrib-reinforcementlearning==1.50.0 \
gputil==1.4.0 \
scipy==1.9.1 \
pyglet==2.0.6 \
cloudpickle==2.2.1 \
tensorflow==2.11.0 \
tensorflow-probability==0.19.0 \
torch \
tabulate==0.9.0 \
dm_tree==0.1.8 \
lz4==4.3.2 \
psutil==5.9.4 \
setproctitle==1.3.2 \
pygame==2.1.0 \
gymnasium[classic_control]==0.26.3 \
gym[classic_control]==0.26.2
# Display the exact versions we have installed
RUN pip freeze

View File

@@ -1,22 +0,0 @@
'''RLlib callbacks module:
Common callback methods to be passed to RLlib trainer.
'''
from azureml.core import Run
from ray import tune
from ray.tune import Callback
from ray.air import session
class TrialCallback(Callback):
    def on_trial_result(self, iteration, trials, trial, result, **info):
        '''Callback on train result to record metrics returned by trainer.
        '''
        run = Run.get_context()
        run.log(
            name='episode_reward_mean',
            value=result["episode_reward_mean"])
        run.log(
            name='episodes_total',
            value=result["episodes_total"])

View File

@@ -1,13 +0,0 @@
'''Misc module:
Miscellaneous helper functions and utilities.
'''
import os
import glob
# Helper function to find a file or folder path
def find_path(name, path_prefix):
    for root, _, _ in os.walk(path_prefix):
        if glob.glob(os.path.join(root, name)):
            return root

Binary file not shown.


View File

@@ -1,17 +0,0 @@
# AzureML Responsible AI
AzureML Responsible AI empowers data scientists and developers to innovate responsibly with a growing set of tools including model interpretability and fairness.
Follow these sample notebooks to learn about the model interpretability and fairness integration in Azure:
<a name="samples"></a>
# Responsible AI Sample Notebooks
- **Visualize fairness metrics and model explanations**
- Dataset: [UCI Adult](https://archive.ics.uci.edu/ml/datasets/Adult)
- **[Jupyter Notebook](visualize-upload-loan-decision/rai-loan-decision.ipynb)**
- Train a model to predict annual income
- Generate fairness and interpretability explanations for the trained model
- Visualize the explanations in the notebook widget dashboard
- Upload the explanations to Azure to be viewed in AzureML studio

View File

@@ -1,718 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/responsible-ai/visualize-upload-loan-decision/rai-loan-decision.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Assess Fairness, Explore Interpretability, and Mitigate Fairness Issues \n",
"\n",
"This notebook demonstrates how to use [InterpretML](interpret.ml), [Fairlearn](fairlearn.org), and the [Responsible AI Widget's](https://github.com/microsoft/responsible-ai-widgets/) Fairness and Interpretability dashboards to understand a model trained on the Census dataset. This dataset is a classification problem - given a range of data about 32,000 individuals, predict whether their annual income is above or below fifty thousand dollars per year.\n",
"\n",
"For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan.\n",
"\n",
"We will first train a fairness-unaware predictor, load its global and local explanations, and use the interpretability and fairness dashboards to demonstrate how this model leads to unfair decisions (under a specific notion of fairness called *demographic parity*). We then mitigate unfairness by applying the `GridSearch` algorithm from `Fairlearn` package.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install required packages\n",
"\n",
"This notebook works with Fairlearn v0.7.0, but not with versions pre-v0.5.0. If needed, please uncomment and run the following cell:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %pip install --upgrade fairlearn>=0.6.2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After installing packages, you must close and reopen the notebook as well as restarting the kernel."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load and preprocess the dataset\n",
"\n",
"For simplicity, we import the dataset from the `shap` package, which contains the data in a cleaned format. We start by importing the various modules we're going to use:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairlearn.reductions import GridSearch\n",
"from fairlearn.reductions import DemographicParity\n",
"\n",
"from sklearn.compose import ColumnTransformer, make_column_selector\n",
"from sklearn.preprocessing import LabelEncoder, StandardScaler, OneHotEncoder\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"import pandas as pd\n",
"\n",
"# SHAP Tabular Explainer\n",
"from interpret.ext.blackbox import MimicExplainer\n",
"from interpret.ext.glassbox import LGBMExplainableModel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now load and inspect the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from utilities import fetch_census_dataset\n",
"\n",
"dataset = fetch_census_dataset()\n",
"X_raw, y = dataset['data'], dataset['target']\n",
"X_raw[\"race\"].value_counts().to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to treat the sex of each individual as a protected attribute (where 0 indicates female and 1 indicates male), and in this particular case we are going separate this attribute out and drop it from the main data. We then perform some standard data preprocessing steps to convert the data into a format suitable for the ML algorithms"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sensitive_features = X_raw[['sex','race']]\n",
"\n",
"le = LabelEncoder()\n",
"y = le.fit_transform(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we split the data into training and test sets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test, sensitive_features_train, sensitive_features_test = \\\n",
" train_test_split(X_raw, y, sensitive_features,\n",
" test_size = 0.2, random_state=0, stratify=y)\n",
"\n",
"# Work around indexing bug\n",
"X_train = X_train.reset_index(drop=True)\n",
"sensitive_features_train = sensitive_features_train.reset_index(drop=True)\n",
"X_test = X_test.reset_index(drop=True)\n",
"sensitive_features_test = sensitive_features_test.reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training a fairness-unaware predictor\n",
"\n",
"To show the effect of `Fairlearn` we will first train a standard ML predictor that does not incorporate fairness. For speed of demonstration, we use a simple logistic regression estimator from `sklearn`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numeric_transformer = Pipeline(\n",
" steps=[\n",
" (\"impute\", SimpleImputer()),\n",
" (\"scaler\", StandardScaler()),\n",
" ]\n",
")\n",
"categorical_transformer = Pipeline(\n",
" [\n",
" (\"impute\", SimpleImputer(strategy=\"most_frequent\")),\n",
" (\"ohe\", OneHotEncoder(handle_unknown=\"ignore\")),\n",
" ]\n",
")\n",
"preprocessor = ColumnTransformer(\n",
" transformers=[\n",
" (\"num\", numeric_transformer, make_column_selector(dtype_exclude=\"category\")),\n",
" (\"cat\", categorical_transformer, make_column_selector(dtype_include=\"category\")),\n",
" ]\n",
")\n",
"\n",
"model = Pipeline(\n",
" steps=[\n",
" (\"preprocessor\", preprocessor),\n",
" (\n",
" \"classifier\",\n",
" LogisticRegression(solver=\"liblinear\", fit_intercept=True),\n",
" ),\n",
" ]\n",
")\n",
"\n",
"model.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate model explanations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using SHAP KernelExplainer\n",
"# clf.steps[-1][1] returns the trained classification model\n",
"explainer = MimicExplainer(model.steps[-1][1], \n",
" X_train,\n",
" LGBMExplainableModel,\n",
" features=X_raw.columns, \n",
" classes=['Rejected', 'Approved'],\n",
" transformations=preprocessor)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate global explanations\n",
"Explain overall model predictions (global explanation)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Explain the model based on a subset of 1000 rows\n",
"global_explanation = explainer.explain_global(X_test[:1000])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"global_explanation.get_feature_importance_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate local explanations\n",
"Explain local data points (individual instances)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You can pass a specific data point or a group of data points to the explain_local function\n",
"# E.g., Explain the first data point in the test set\n",
"instance_num = 1\n",
"local_explanation = explainer.explain_local(X_test[:instance_num])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
"prediction_value = model.predict(X_test)[instance_num]\n",
"\n",
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
"print('local importance names: {}'.format(sorted_local_importance_names))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize model explanations\n",
"Load the interpretability visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, model, dataset=X_test[:1000], true_y=y_test[:1000])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can load this predictor into the Fairness dashboard, and examine how it is unfair:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assess model fairness \n",
"Load the fairness visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import FairnessDashboard\n",
"\n",
"y_pred = model.predict(X_test)\n",
"\n",
"FairnessDashboard(sensitive_features=sensitive_features_test,\n",
" y_true=y_test,\n",
" y_pred=y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the disparity in accuracy, we see that males have an error rate about three times greater than the females. More interesting is the disparity in opportunitiy - males are offered loans at three times the rate of females.\n",
"\n",
"Despite the fact that we removed the feature from the training data, our predictor still discriminates based on sex. This demonstrates that simply ignoring a protected attribute when fitting a predictor rarely eliminates unfairness. There will generally be enough other features correlated with the removed attribute to lead to disparate impact."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mitigation with Fairlearn (GridSearch)\n",
"\n",
"The `GridSearch` class in `Fairlearn` implements a simplified version of the exponentiated gradient reduction of [Agarwal et al. 2018](https://arxiv.org/abs/1803.02453). The user supplies a standard ML estimator, which is treated as a blackbox. `GridSearch` works by generating a sequence of relabellings and reweightings, and trains a predictor for each.\n",
"\n",
"For this example, we specify demographic parity (on the protected attribute of sex) as the fairness metric. Demographic parity requires that individuals are offered the opportunity (are approved for a loan in this example) independent of membership in the protected class (i.e., females and males should be offered loans at the same rate). We are using this metric for the sake of simplicity; in general, the appropriate fairness metric will not be obvious."
]
},
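{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a point of reference before running the sweep, you can compute the demographic parity difference of the unmitigated model directly - it is the largest gap in selection rates between the groups. The snippet below is a small sketch that assumes the cells above (the train/test split and the unmitigated `model`) have already been run.\n",
"\n",
"```python\n",
"from fairlearn.metrics import demographic_parity_difference\n",
"\n",
"# Largest difference in loan-approval (selection) rates across the 'sex' groups\n",
"dpd = demographic_parity_difference(\n",
"    y_test,\n",
"    model.predict(X_test),\n",
"    sensitive_features=sensitive_features_test.sex)\n",
"print('Demographic parity difference (sex):', dpd)\n",
"```"
]
},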
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fairlearn is not yet fully compatible with Pipelines, so we have to pass the estimator only\n",
"X_train_prep = preprocessor.transform(X_train).toarray()\n",
"X_test_prep = preprocessor.transform(X_test).toarray()\n",
"\n",
"sweep = GridSearch(LogisticRegression(solver=\"liblinear\", fit_intercept=True),\n",
" constraints=DemographicParity(),\n",
" grid_size=70)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our algorithms provide `fit()` and `predict()` methods, so they behave in a similar manner to other ML packages in Python. We do however have to specify two extra arguments to `fit()` - the column of protected attribute labels, and also the number of predictors to generate in our sweep.\n",
"\n",
"After `fit()` completes, we extract the full set of predictors from the `GridSearch` object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sweep.fit(X_train_prep, y_train,\n",
" sensitive_features=sensitive_features_train.sex)\n",
"\n",
"predictors = sweep.predictors_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could load these predictors into the Fairness dashboard now. However, the plot would be somewhat confusing due to their number. In this case, we are going to remove the predictors which are dominated in the error-disparity space by others from the sweep (note that the disparity will only be calculated for the sensitive feature). In general, one might not want to do this, since there may be other considerations beyond the strict optimization of error and disparity (of the given protected attribute)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fairlearn.metrics import demographic_parity_difference\n",
"\n",
"accuracies, disparities = [], []\n",
"\n",
"for predictor in predictors:\n",
" y_pred = predictor.predict(X_train_prep)\n",
" # accuracy_metric_frame = MetricFrame(accuracy_score, y_train, predictor.predict(X_train_prep), sensitive_features=sensitive_features_train.sex)\n",
" # selection_rate_metric_frame = MetricFrame(selection_rate, y_train, predictor.predict(X_train_prep), sensitive_features=sensitive_features_train.sex)\n",
" accuracies.append(accuracy_score(y_train, y_pred))\n",
" disparities.append(demographic_parity_difference(y_train,\n",
" y_pred,\n",
" sensitive_features=sensitive_features_train.sex))\n",
" \n",
"all_results = pd.DataFrame({\"predictor\": predictors, \"accuracy\": accuracies, \"disparity\": disparities})\n",
"\n",
"all_models_dict = {\"unmitigated\": model.steps[-1][1]}\n",
"dominant_models_dict = {\"unmitigated\": model.steps[-1][1]}\n",
"base_name_format = \"grid_{0}\"\n",
"row_id = 0\n",
"for row in all_results.itertuples():\n",
" model_name = base_name_format.format(row_id)\n",
" all_models_dict[model_name] = row.predictor\n",
" accuracy_for_lower_or_eq_disparity = all_results[\"accuracy\"][all_results[\"disparity\"] <= row.disparity]\n",
" if row.accuracy >= accuracy_for_lower_or_eq_disparity.max():\n",
" dominant_models_dict[model_name] = row.predictor\n",
" row_id = row_id + 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can construct predictions for all the models, and also for the dominant models:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dashboard_all = {}\n",
"for name, predictor in all_models_dict.items():\n",
" value = predictor.predict(X_test_prep)\n",
" dashboard_all[name] = value\n",
" \n",
"dominant_all = {}\n",
"for name, predictor in dominant_models_dict.items():\n",
" dominant_all[name] = predictor.predict(X_test_prep)\n",
"\n",
"FairnessDashboard(sensitive_features=sensitive_features_test, \n",
" y_true=y_test,\n",
" y_pred=dominant_all)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can look at just the dominant models in the dashboard:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see a Pareto front forming - the set of predictors which represent optimal tradeoffs between accuracy and disparity in predictions. In the ideal case, we would have a predictor at (1,0) - perfectly accurate and without any unfairness under demographic parity (with respect to the protected attribute \"sex\"). The Pareto front represents the closest we can come to this ideal based on our data and choice of estimator. Note the range of the axes - the disparity axis covers more values than the accuracy, so we can reduce disparity substantially for a small loss in accuracy.\n",
"\n",
"By clicking on individual models on the plot, we can inspect their metrics for disparity and accuracy in greater detail. In a real example, we would then pick the model which represented the best trade-off between accuracy and disparity given the relevant business constraints."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AzureML integration\n",
"\n",
"We will now go through a brief example of the AzureML integration.\n",
"\n",
"The required package can be installed via:\n",
"\n",
"```\n",
"pip install azureml-contrib-fairness\n",
"pip install azureml-interpret\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect to workspace\n",
"\n",
"Just like in the previous tutorials, we will need to connect to a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py).\n",
"\n",
"The following code will allow you to create a workspace if you don't already have one created. You must have an Azure subscription to create a workspace:\n",
"\n",
"```python\n",
"from azureml.core import Workspace\n",
"ws = Workspace.create(name='myworkspace',\n",
" subscription_id='<azure-subscription-id>',\n",
" resource_group='myresourcegroup',\n",
" create_resource_group=True,\n",
" location='eastus2')\n",
"```\n",
"\n",
"**If you are running this on a Notebook VM, you can import the existing workspace.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Registering models\n",
"\n",
"The fairness dashboard is designed to integrate with registered models, so we need to do this for the models we want in the Studio portal. The assumption is that the names of the models specified in the dashboard dictionary correspond to the `id`s (i.e. `<name>:<version>` pairs) of registered models in the workspace."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we register each of the models in the `dominant_all` dictionary into the workspace. For this, we have to save each model to a file, and then register that file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
"import os\n",
"from azureml.core import Model, Experiment\n",
"\n",
"os.makedirs('models', exist_ok=True)\n",
"def register_model(name, model):\n",
" print(\"Registering \", name)\n",
" model_path = \"models/{0}.pkl\".format(name)\n",
" joblib.dump(value=model, filename=model_path)\n",
" registered_model = Model.register(model_path=model_path,\n",
" model_name=name,\n",
" workspace=ws)\n",
" print(\"Registered \", registered_model.id)\n",
" return registered_model.id\n",
"\n",
"model_name_id_mapping = dict()\n",
"for name, model in dominant_all.items():\n",
" m_id = register_model(name, model)\n",
" model_name_id_mapping[name] = m_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, produce new predictions dictionaries, with the updated names:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dominant_all_ids = dict()\n",
"for name, y_pred in dominant_all.items():\n",
" dominant_all_ids[model_name_id_mapping[name]] = y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uploading a dashboard\n",
"\n",
"We create a _dashboard dictionary_ using Fairlearn's `metrics` package. The `_create_group_metric_set` method has arguments similar to the Dashboard constructor, except that the sensitive features are passed as a dictionary (to ensure that names are available), and we must specify the type of prediction. Note that we use the `dashboard_registered` dictionary we just created:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf = { 'sex': sensitive_features_test.sex, 'race': sensitive_features_test.race }\n",
"\n",
"from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
"\n",
"dash_dict_all = _create_group_metric_set(y_true=y_test,\n",
" predictions=dominant_all_ids,\n",
" sensitive_features=sf,\n",
" prediction_type='binary_classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we import our `contrib` package which contains the routine to perform the upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can create an Experiment, then a Run, and upload our dashboard to it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, 'responsible-ai-loan-decision')\n",
"print(exp)\n",
"\n",
"run = exp.start_logging()\n",
"try:\n",
" dashboard_title = \"Upload MultiAsset from Grid Search with Census Data Notebook\"\n",
" upload_id = upload_dashboard_dictionary(run,\n",
" dash_dict_all,\n",
" dashboard_name=dashboard_title)\n",
" print(\"\\nUploaded to id: {0}\\n\".format(upload_id))\n",
"\n",
" downloaded_dict = download_dashboard_by_upload_id(run, upload_id)\n",
"finally:\n",
" run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uploading explanations\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.interpret import ExplanationClient\n",
"\n",
"client = ExplanationClient.from_run(run)\n",
"client.upload_model_explanation(global_explanation, comment = \"census data global explanation\")"
]
}
],
"metadata": {
"authors": [
{
"name": "chgrego"
}
],
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,73 +0,0 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
"""Utilities for azureml-contrib-fairness notebooks."""
import arff
from collections import OrderedDict
from contextlib import closing
import gzip
import pandas as pd
from sklearn.utils import Bunch
from time import sleep
def _is_gzip_encoded(_fsrc):
    return _fsrc.info().get('Content-Encoding', '') == 'gzip'


_categorical_columns = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country'
]


def fetch_census_dataset():
    """Fetch the Adult Census Dataset

    This uses a particular URL for the Adult Census dataset. The code
    is a simplified version of fetch_openml() in sklearn.

    The data are copied from:
    https://openml.org/data/v1/download/1595261.gz
    (as of 2021-03-31)
    """
    dataset_path = "1595261.gz"
    try:
        file_stream = gzip.GzipFile(filename=dataset_path, mode='rb')
        with closing(file_stream):
            def _stream_generator(response):
                for line in response:
                    yield line.decode('utf-8')

            stream = _stream_generator(file_stream)
            data = arff.load(stream)
    except Exception as exc:
        raise Exception("Could not load dataset from {} with exception {}".format(dataset_path, exc))

    attributes = OrderedDict(data['attributes'])
    arff_columns = list(attributes)

    raw_df = pd.DataFrame(data=data['data'], columns=arff_columns)

    target_column_name = 'class'
    target = raw_df.pop(target_column_name)
    for col_name in _categorical_columns:
        dtype = pd.api.types.CategoricalDtype(attributes[col_name])
        raw_df[col_name] = raw_df[col_name].astype(dtype, copy=False)

    result = Bunch()
    result.data = raw_df
    result.target = target

    return result

View File

@@ -101,7 +101,7 @@
"\n",
"# Check core SDK version number\n",
"\n",
"print(\"This notebook was created using SDK version 1.56.0, you are currently running version\", azureml.core.VERSION)"
"print(\"This notebook was created using SDK version 1.59.0, you are currently running version\", azureml.core.VERSION)"
]
},
{

View File

@@ -186,8 +186,7 @@
"\n",
"# Specify conda dependencies with scikit-learn and temporary pointers to mlflow extensions\n",
"cd = CondaDependencies.create(\n",
" conda_packages=[\"scikit-learn\", \"matplotlib\"],\n",
" pip_packages=[\"azureml-mlflow\", \"pandas\", \"numpy\"]\n",
" pip_packages=[\"azureml-mlflow\", \"scikit-learn\", \"matplotlib\", \"pandas\", \"numpy\", \"protobuf==5.28.3\"]\n",
" )\n",
"\n",
"env.python.conda_dependencies = cd"

View File

@@ -1,466 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/datadrift-tutorial/datadrift-quickdemo.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analyze data drift in Azure Machine Learning datasets \n",
"\n",
"In this tutorial, you will setup a data drift monitor on a weather dataset to:\n",
"\n",
"&#x2611; Analyze historical data for drift\n",
"\n",
"&#x2611; Setup a monitor to recieve email alerts if data drift is detected going forward\n",
"\n",
"If your workspace is Enterprise level, view and exlpore the results in the Azure Machine Learning studio. The video below shows the results from this tutorial. \n",
"\n",
"![gif](media/video.gif)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Compute instance, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) if you haven't already established your connection to the AzureML Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print('SDK version:', azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"ws"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup target and baseline datasets\n",
"\n",
"Setup the baseline and target datasets. The baseline will be used to compare each time slice of the target dataset, which is sampled by a given frequency. For further details, see [our documentation](http://aka.ms/datadrift). \n",
"\n",
"The next few cells will:\n",
" * get the default datastore\n",
" * upload the `weather-data` to the datastore\n",
" * create the Tabular dataset from the data\n",
" * add the timeseries trait by specifying the timestamp column `datetime`\n",
" * register the dataset\n",
" * create the baseline as a time slice of the target dataset\n",
" * optionally, register the baseline dataset\n",
" \n",
"The folder `weather-data` contains weather data from the [NOAA Integrated Surface Data](https://azure.microsoft.com/services/open-datasets/catalog/noaa-integrated-surface-data/) filtered down to to station names containing the string 'FLORIDA' to reduce the size of data. See `get_data.py` to see how this data is curated and modify as desired. This script may take a long time to run, hence the data is provided in the `weather-data` folder for this demo."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# use default datastore\n",
"dstore = ws.get_default_datastore()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# upload weather data\n",
"dstore.upload('weather-data', 'datadrift-data', overwrite=True, show_progress=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# import Dataset class\n",
"from azureml.core import Dataset\n",
"\n",
"# create target dataset \n",
"target = Dataset.Tabular.from_parquet_files(dstore.path('datadrift-data/**/data.parquet'))\n",
"# set the timestamp column\n",
"target = target.with_timestamp_columns('datetime')\n",
"# register the target dataset\n",
"target = target.register(ws, 'target')\n",
"# retrieve the dataset from the workspace by name\n",
"target = Dataset.get_by_name(ws, 'target')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# import datetime \n",
"from datetime import datetime\n",
"\n",
"# set baseline dataset as January 2019 weather data\n",
"baseline = Dataset.Tabular.from_parquet_files(dstore.path('datadrift-data/2019/01/data.parquet'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optionally, register the baseline dataset. if skipped, an unregistered dataset will be used\n",
"#baseline = baseline.register(ws, 'baseline')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create compute target\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"Create an Azure Machine Learning compute cluster to run the data drift monitor and associated runs. The below cell will create a compute cluster named `'cpu-cluster'`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"\n",
"compute_name = 'cpu-cluster'\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('found compute target. just use it. ' + compute_name)\n",
"else:\n",
" print('creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D3_V2', min_nodes=0, max_nodes=2)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout.\n",
" # if no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
" # For a more detailed view of current AmlCompute status, use get_status()\n",
" print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create data drift monitor\n",
"\n",
"See [our documentation](http://aka.ms/datadrift) for a complete description for all of the parameters. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"datadrift-remarks-sample"
]
},
"outputs": [],
"source": [
"from azureml.datadrift import DataDriftDetector, AlertConfiguration\n",
"\n",
"alert_config = AlertConfiguration(['user@contoso.com']) # replace with your email to recieve alerts from the scheduled pipeline after enabling\n",
"\n",
"monitor = DataDriftDetector.create_from_datasets(ws, 'weather-monitor', baseline, target, \n",
" compute_target='cpu-cluster', # compute target for scheduled pipeline and backfills \n",
" frequency='Week', # how often to analyze target data\n",
" feature_list=None, # list of features to detect drift on\n",
" drift_threshold=None, # threshold from 0 to 1 for email alerting\n",
" latency=0, # SLA in hours for target data to arrive in the dataset\n",
" alert_config=alert_config) # email addresses to send alert"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Update data drift monitor\n",
"\n",
"Many settings of the data drift monitor can be updated after creation. In this demo, we will update the `drift_threshold` and `feature_list`. See [our documentation](http://aka.ms/datadrift) for details on which settings can be changed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get monitor by name\n",
"monitor = DataDriftDetector.get_by_name(ws, 'weather-monitor')\n",
"\n",
"# create feature list - need to exclude columns that naturally drift or increment over time, such as year, day, index\n",
"columns = list(baseline.take(1).to_pandas_dataframe())\n",
"exclude = ['year', 'day', 'version', '__index_level_0__']\n",
"features = [col for col in columns if col not in exclude]\n",
"\n",
"# update the feature list\n",
"monitor = monitor.update(feature_list=features)"
]
},
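{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of the `drift_threshold` update mentioned above, the same `update` method can be used. The 0.3 value below is illustrative only, and this assumes `update` accepts `drift_threshold` just as `create_from_datasets` does."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: update the drift threshold used for email alerting (0.3 is an illustrative value)\n",
"monitor = monitor.update(drift_threshold=0.3)"
]
},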
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyze historical data and backfill\n",
"\n",
"You can use the `backfill` method to:\n",
" * analyze historical data\n",
" * backfill metrics after updating the settings (mainly the feature list)\n",
" * backfill metrics for failed runs\n",
" \n",
"The below cells will run two backfills that will produce data drift results for 2019 weather data, with January used as the baseline in the monitor. The output can be seen from the `show` method after the runs have completed, or viewed from the Azure Machine Learning studio for Enterprise workspaces.\n",
"\n",
"![Drift results](media/drift-results.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"jupyter": {
"source_hidden": true
}
},
"source": [
">**Tip!** When starting with the data drift capability, start by backfilling on a small section of data to get initial results. Update the feature list as needed by removing columns that are causing drift, but can be ignored, and backfill this section of data until satisfied with the results. Then, backfill on a larger slice of data and/or set the alert configuration, threshold, and enable the schedule to recieve alerts to drift on your dataset. All of this can be done through the UI (Enterprise) or Python SDK."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although it depends on many factors, the below backfill should typically take less than 20 minutes to run. Results will show as soon as they become available, not when the backfill is completed, so you may begin to see some metrics in a few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# backfill for one month\n",
"backfill_start_date = datetime(2019, 9, 1)\n",
"backfill_end_date = datetime(2019, 10, 1)\n",
"backfill = monitor.backfill(backfill_start_date, backfill_end_date)\n",
"backfill"
]
},
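{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the initial results look reasonable, a larger slice can be backfilled, as mentioned in the tip above. The sketch below is left commented out because a full-year backfill can take considerably longer to run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: backfill over all of 2019 once satisfied with the initial results (may take much longer)\n",
"#backfill_2019 = monitor.backfill(datetime(2019, 1, 1), datetime(2020, 1, 1))"
]
},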
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Query metrics and show results in Python\n",
"\n",
"The below cell will plot some key data drift metrics, and can be used to query the results. Run `help(monitor.get_output)` for specifics on the object returned."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make sure the backfill has completed\n",
"backfill.wait_for_completion(wait_post_processing=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get results from Python SDK (wait for backfills or monitor runs to finish)\n",
"results, metrics = monitor.get_output(start_time=datetime(year=2019, month=9, day=1))"
]
},
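{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below simply runs the `help` call mentioned above and prints the two returned objects, which can be useful before drilling into specific metrics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# inspect the returned objects; help(monitor.get_output) documents their exact structure\n",
"help(monitor.get_output)\n",
"print('results:', results)\n",
"print('metrics:', metrics)"
]
},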
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot the results from Python SDK \n",
"monitor.show(backfill_start_date, backfill_end_date)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Enable the monitor's pipeline schedule\n",
"\n",
"Turn on a scheduled pipeline which will anlayze the target dataset for drift every `frequency`. Use the latency parameter to adjust the start time of the pipeline. For instance, if it takes 24 hours for my data processing pipelines for data to arrive in the target dataset, set latency to 24. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable the pipeline schedule and recieve email alerts\n",
"monitor.enable_schedule()\n",
"\n",
"# disable the pipeline schedule \n",
"#monitor.disable_schedule()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete compute target\n",
"\n",
"Do not delete the compute target if you intend to keep using it for the data drift monitor scheduled runs or otherwise. If the minimum nodes are set to 0, it will scale down soon after jobs are completed, and scale up the next time the cluster is needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optionally delete the compute target\n",
"#compute_target.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete the DataDriftDetector\n",
"\n",
"Invoking the `delete()` method on the object deletes the the drift monitor permanently and cannot be undone. You will no longer be able to find it in the UI and the `list()` or `get()` methods. The object on which delete() was called will have its state set to deleted and name suffixed with deleted. The baseline and target datasets and model data that was collected, if any, are not deleted. The compute is not deleted. The DataDrift schedule pipeline is disabled and archived."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"monitor.delete()"
]
},
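{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged check, assuming `DataDriftDetector.list` behaves as referenced above, the deleted monitor should no longer show up when listing the drift monitors in the workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: confirm the deleted monitor no longer appears in the workspace's drift monitors\n",
"remaining = DataDriftDetector.list(ws)\n",
"print('remaining monitors:', [m.name for m in remaining])  # assumes each detector exposes a name attribute"
]
},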
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
" * See [our documentation](https://aka.ms/datadrift) or [Python SDK reference](https://docs.microsoft.com/python/api/overview/azure/ml/intro)\n",
" * [Send requests or feedback](mailto:driftfeedback@microsoft.com) on data drift directly to the team\n",
" * Please open issues with data drift here on GitHub or on StackOverflow if others are likely to run into the same issue"
]
}
],
"metadata": {
"authors": [
{
"name": "jamgan"
}
],
"category": "tutorial",
"compute": [
"Remote"
],
"datasets": [
"NOAA"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"Azure ML"
],
"friendly_name": "Data drift quickdemo",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.8 - AzureML",
"language": "python",
"name": "python38-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
},
"star_tag": [
"featured"
],
"tags": [
"Dataset",
"Timeseries",
"Drift"
],
"task": "Filtering"
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -1,30 +0,0 @@
# import packages
import os
import pandas as pd
from calendar import monthrange
from datetime import datetime, timedelta
from azureml.core import Dataset, Datastore, Workspace
from azureml.opendatasets import NoaaIsdWeather
# get workspace and datastore
ws = Workspace.from_config()
dstore = ws.get_default_datastore()
# adjust parameters as needed
target_years = list(range(2010, 2020))
start_month = 1
# get data
for year in target_years:
for month in range(start_month, 12 + 1):
path = 'weather-data/{}/{:02d}/'.format(year, month)
try:
start = datetime(year, month, 1)
end = datetime(year, month, monthrange(year, month)[1]) + timedelta(days=1)
isd = NoaaIsdWeather(start, end).to_pandas_dataframe()
isd = isd[isd['stationName'].str.contains('FLORIDA', regex=True, na=False)]
os.makedirs(path, exist_ok=True)
isd.to_parquet(path + 'data.parquet')
except Exception as e:
print('Month {} in year {} likely has no data.\n'.format(month, year))
print('Exception: {}'.format(e))

Binary file not shown (before: 56 KiB)

Binary file not shown (before: 8.7 MiB)

Some files were not shown because too many files have changed in this diff.