Update notebooks

2025-12-20 17:45:10 -05:00 · 2018-10-12 14:39:33 -04:00
parent 216aa8b6a1
commit a4792d95ac
67 changed files with 6470 additions and 1610 deletions
--- a/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb
+++ b/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb
@@ -15,7 +15,7 @@
      "source": [
        "# AutoML 08: Remote Execution with DataStore\n",
        "\n",
-        "In this sample accesses a data file on a remote DSVM through DataStore. Advantagets of using data store\n",
+        "This sample accesses a data file on a remote DSVM through DataStore. Advantages of using data store are:\n",
        "1. DataStore secures the access details.\n",
        "2. DataStore supports reading from and writing to blob and file storage\n",
        "3. AutoML natively supports copying data from DataStore to DSVM\n",
@@ -23,8 +23,8 @@
        "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n",
        "\n",
        "In this notebook you will see:\n",
-        "1. Configuring the DSVM to allow files to be access directly by the get_data method.\n",
-        "2. get_data returning data from a local file.\n",
+        "1. Storing data in DataStore.\n",
+        "2. get_data returning data from DataStore.\n",
        "\n"
      ]
    },
@@ -285,11 +285,11 @@
        "    le = LabelEncoder()\n",
        "    le.fit(df[\"Label\"].values)\n",
        "    y = le.transform(df[\"Label\"].values)\n",
-        "    df = df.drop([\"Label\"], axis=1)\n",
+        "    X = df.drop([\"Label\"], axis=1)\n",
        "\n",
-        "    df_train, _, y_train, _ = train_test_split(df, y, test_size=0.1, random_state=42)\n",
+        "    X_train, _, y_train, _ = train_test_split(X, y, test_size=0.1, random_state=42)\n",
        "\n",
-        "    return { \"X\" : df.values, \"y\" : y }"
+        "    return { \"X\" : X_train.values, \"y\" : y_train }"
      ]
    },
    {
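The `get_data` change above fixes a train/test leak. The old version split the data but then returned the full frame (`df.values, y`), so the 10% held out for testing was also used for training. Below is a minimal local sketch of the corrected pattern using pandas and scikit-learn. The small synthetic DataFrame produced by `load_frame` is a stand-in for the file the notebook reads from DataStore, and `load_frame` and `get_data_sketch` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def load_frame():
    # Hypothetical stand-in for the data file the notebook reads via DataStore.
    rng = np.random.RandomState(0)
    df = pd.DataFrame(rng.rand(100, 3), columns=["f1", "f2", "f3"])
    df["Label"] = np.where(df["f1"] > 0.5, "pos", "neg")
    return df


def get_data_sketch():
    df = load_frame()
    le = LabelEncoder()
    le.fit(df["Label"].values)
    y = le.transform(df["Label"].values)
    X = df.drop(["Label"], axis=1)

    # Hold out 10% with a fixed seed so the test cell can recreate the same split.
    X_train, _, y_train, _ = train_test_split(X, y, test_size=0.1, random_state=42)

    # Return only the training portion. Returning X and y here would leak test rows.
    return {"X": X_train.values, "y": y_train}


data = get_data_sketch()
print(data["X"].shape, data["y"].shape)  # (90, 3) (90,)
```

The fixed seed is important here. It lets a separate cell regenerate the identical held-out rows without passing them around.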
@@ -300,7 +300,7 @@
        "\n",
        "You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() semantics for local executions too. \n",
        "\n",
-        "<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.</i>\n",
+        "<i>Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to AutoMLConfig.</i>\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
@@ -342,7 +342,7 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-        "## Training the Model <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
+        "## Training the Models <a class=\"anchor\" id=\"Training-the-model-Remote-DSVM\"></a>\n",
        "\n",
        "For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets/models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model you can cancel a particular iteration or the whole run."
      ]
@@ -410,7 +410,7 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-        "## Canceling runs\n",
+        "## Canceling Runs\n",
        "You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions"
      ]
    },
@@ -433,7 +433,7 @@
      "source": [
        "### Retrieve the Best Model\n",
        "\n",
-        "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
+        "Below we select the best pipeline from our iterations. The *get_output* method returns the best run and the fitted model. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*."
      ]
    },
    {
@@ -483,26 +483,7 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-        "### Register fitted model for deployment"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#description = 'AutoML Model'\n",
-        "#tags = None\n",
-        "#remote_run.register_model(description=description, tags=tags)\n",
-        "#remote_run.model_id # Use this id to deploy the model as a web service in Azure"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Testing the Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n"
+        "### Testing the Best Fitted Model <a class=\"anchor\" id=\"Testing-the-Fitted-Model-Remote-DSVM\"></a>\n"
      ]
    },
    {
@@ -523,11 +504,11 @@
        "le = LabelEncoder()\n",
        "le.fit(df[\"Label\"].values)\n",
        "y = le.transform(df[\"Label\"].values)\n",
-        "df = df.drop([\"Label\"], axis=1)\n",
+        "X = df.drop([\"Label\"], axis=1)\n",
        "\n",
-        "_, df_test, _, y_test = train_test_split(df, y, test_size=0.1, random_state=42)\n",
+        "_, X_test, _, y_test = train_test_split(X, y, test_size=0.1, random_state=42)\n",
        "\n",
-        "ypred = fitted_model.predict(df_test.values)\n",
+        "ypred = fitted_model.predict(X_test.values)\n",
        "\n",
        "ypred_strings = le.inverse_transform(ypred)\n",
        "ytest_strings = le.inverse_transform(y_test)\n",
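The test cell recreates the split with the same `test_size=0.1` and `random_state=42`. As a result, `X_test` is exactly the 10% that `get_data` held out. `le.inverse_transform` then maps the integer predictions back to the original label strings. Below is a local sketch of that round trip. A scikit-learn `DecisionTreeClassifier` trained on the training split is an assumed stand-in for the AutoML `fitted_model`, and the synthetic data is illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data standing in for the notebook's data file.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(100, 3), columns=["f1", "f2", "f3"])
df["Label"] = np.where(df["f1"] > 0.5, "pos", "neg")

le = LabelEncoder()
le.fit(df["Label"].values)
y = le.transform(df["Label"].values)
X = df.drop(["Label"], axis=1)

# Same test_size and random_state as get_data, so train and test rows are complementary.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# Assumed stand-in for the best AutoML model returned by get_output().
fitted_model = DecisionTreeClassifier(random_state=0).fit(X_train.values, y_train)

ypred = fitted_model.predict(X_test.values)

# Map integer class ids back to the original label strings.
ypred_strings = le.inverse_transform(ypred)
ytest_strings = le.inverse_transform(y_test)

accuracy = np.mean(ypred_strings == ytest_strings)
print(f"held-out rows: {len(X_test)}, accuracy: {accuracy:.2f}")
```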
@@ -541,6 +522,11 @@
    }
  ],
  "metadata": {
+    "authors": [
+      {
+        "name": "savitam"
+      }
+    ],
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",