update samples from Release-53 as a part of 1.19.0 SDK stable release

2025-12-22 02:25:12 -05:00 · 2020-12-07 18:55:07 +00:00
parent 41366a4af0
commit 48e3e7b510
39 changed files with 371 additions and 279 deletions
--- a/contrib/fairness/fairlearn-azureml-mitigation.ipynb
+++ b/contrib/fairness/fairlearn-azureml-mitigation.ipynb
@@ -38,7 +38,7 @@
        "## Introduction\n",
        "This notebook shows how to use [Fairlearn (an open source fairness assessment and unfairness mitigation package)](http://fairlearn.github.io) and Azure Machine Learning Studio for a binary classification problem. This example uses the well-known adult census dataset. For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan. Its purpose is purely illustrative of a workflow including a fairness dashboard - in particular, we do **not** include a full discussion of the detailed issues which arise when considering fairness in machine learning. For such discussions, please [refer to the Fairlearn website](http://fairlearn.github.io/).\n",
        "\n",
-        "We will apply the [grid search algorithm](https://fairlearn.github.io/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
+        "We will apply the [grid search algorithm](https://fairlearn.github.io/master/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
        "\n",
        "### Setup\n",
        "\n",
@@ -98,8 +98,11 @@
      "metadata": {},
      "outputs": [],
      "source": [
-        "from sklearn.datasets import fetch_openml\n",
-        "data = fetch_openml(data_id=1590, as_frame=True)\n",
+        "from utilities import fetch_openml_with_retries\n",
+        "\n",
+        "data = fetch_openml_with_retries(data_id=1590)\n",
+        "    \n",
+        "# Extract the items we want\n",
        "X_raw = data.data\n",
        "Y = (data.target == '>50K') * 1\n",
        "\n",
--- a/contrib/fairness/upload-fairness-dashboard.ipynb
+++ b/contrib/fairness/upload-fairness-dashboard.ipynb
@@ -98,8 +98,11 @@
      "metadata": {},
      "outputs": [],
      "source": [
-        "from sklearn.datasets import fetch_openml\n",
-        "data = fetch_openml(data_id=1590, as_frame=True)\n",
+        "from utilities import fetch_openml_with_retries\n",
+        "\n",
+        "data = fetch_openml_with_retries(data_id=1590)\n",
+        "    \n",
+        "# Extract the items we want\n",
        "X_raw = data.data\n",
        "Y = (data.target == '>50K') * 1"
      ]
--- a/contrib/fairness/utilities.py
+++ b/contrib/fairness/utilities.py
@@ -0,0 +1,28 @@
+# ---------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# ---------------------------------------------------------
+
+"""Utilities for azureml-contrib-fairness notebooks."""
+
+from sklearn.datasets import fetch_openml
+import time
+
+
+def fetch_openml_with_retries(data_id, max_retries=4, retry_delay=60):
+    """Fetch an OpenML dataset, doubling the delay after each failure."""
+    for i in range(max_retries):
+        try:
+            print("Download attempt {0} of {1}".format(i + 1, max_retries))
+            data = fetch_openml(data_id=data_id, as_frame=True)
+            break
+        except Exception as e:
+            print("Download attempt failed with exception:")
+            print(e)
+            if i + 1 != max_retries:
+                print("Will retry after {0} seconds".format(retry_delay))
+                time.sleep(retry_delay)
+                retry_delay = retry_delay * 2
+    else:
+        raise RuntimeError("Unable to download dataset from OpenML")
+
+    return data
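
The retry loop in `utilities.py` can be tried out without network access by swapping in a stand-in fetcher. The sketch below is not part of the commit: it re-implements the same doubling-delay loop with an injectable callable, and `fetch_with_retries`, `fetch`, `sleep`, and `flaky_fetch` are hypothetical names chosen for illustration. Unlike the committed helper, it also chains the last exception onto the final `RuntimeError`, which is a design choice of this sketch rather than the author's.

```python
import time


def fetch_with_retries(fetch, max_retries=4, retry_delay=60, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the delay after each failure.

    `fetch` and `sleep` are injectable so the loop can be exercised offline.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except Exception as e:  # broad on purpose, mirroring the committed helper
            last_error = e
            if attempt < max_retries:
                sleep(retry_delay)
                retry_delay *= 2
    # Every attempt failed: surface the last underlying error as the cause.
    raise RuntimeError(
        "Unable to fetch after {0} attempts".format(max_retries)
    ) from last_error


# Stand-in fetcher that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "dataset"


# Record the requested delays instead of actually sleeping.
delays = []
result = fetch_with_retries(flaky_fetch, retry_delay=60, sleep=delays.append)
print(result, delays)  # → dataset [60, 120]
```

With the defaults used in the commit (`max_retries=4`, `retry_delay=60`), a fully failing download would wait 60, 120, and 240 seconds between attempts before raising.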