Update notebooks

2025-12-21 10:05:09 -05:00 · 2018-10-12 14:39:33 -04:00
parent 216aa8b6a1
commit a4792d95ac
67 changed files with 6470 additions and 1610 deletions
--- a/automl/06.auto-ml-sparse-data-custom-cv-split.ipynb
+++ b/automl/06.auto-ml-sparse-data-custom-cv-split.ipynb
@@ -137,17 +137,17 @@
        "                                shuffle = True, random_state = 42,\n",
        "                                remove = remove)\n",
        "\n",
-        "X_train, X_validation, y_train, y_validation = train_test_split(data_train.data, data_train.target, test_size = 0.33, random_state = 42)\n",
+        "X_train, X_valid, y_train, y_valid = train_test_split(data_train.data, data_train.target, test_size = 0.33, random_state = 42)\n",
        "\n",
        "\n",
        "vectorizer = HashingVectorizer(stop_words = 'english', alternate_sign = False,\n",
        "                               n_features = 2**16)\n",
        "X_train = vectorizer.transform(X_train)\n",
-        "X_validation = vectorizer.transform(X_validation)\n",
+        "X_valid = vectorizer.transform(X_valid)\n",
        "\n",
        "summary_df = pd.DataFrame(index = ['No of Samples', 'No of Features'])\n",
        "summary_df['Train Set'] = [X_train.shape[0], X_train.shape[1]]\n",
-        "summary_df['Validation Set'] = [X_validation.shape[0], X_validation.shape[1]]\n",
+        "summary_df['Validation Set'] = [X_valid.shape[0], X_valid.shape[1]]\n",
        "summary_df"
      ]
    },
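The cell above transforms both splits with a single `HashingVectorizer`, and never calls `fit`. The reason is that feature hashing needs no vocabulary. Each token's column is computed from a hash of the token modulo `n_features`, so any two vectorizers built with the same parameters map text to identical columns.

The sketch below illustrates that property under stated assumptions. `TinyHashingVectorizer` is a hypothetical, simplified stand-in, not scikit-learn's class. It uses BLAKE2b instead of MurmurHash3, applies no stop-word list, skips normalisation, and returns plain `{column: count}` dicts. Like the notebook's `alternate_sign = False` setting, it does not flip signs.

```python
import hashlib
import re
from collections import Counter
from typing import Dict, List

# Same default token pattern as scikit-learn: runs of 2+ word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def token_index(token: str, n_features: int) -> int:
    """Map a token to a column with a deterministic hash (no vocabulary needed)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_features


class TinyHashingVectorizer:
    """Hypothetical stand-in for HashingVectorizer(alternate_sign=False, norm=None).

    Assumptions: BLAKE2b instead of MurmurHash3, no stop words, sparse rows
    returned as {column: count} dicts rather than a scipy sparse matrix.
    """

    def __init__(self, n_features: int = 2**16):
        self.n_features = n_features

    def transform(self, docs: List[str]) -> List[Dict[int, int]]:
        # No fit(): a token's column depends only on the hash and n_features,
        # so colliding tokens simply add their counts into the same column.
        return [
            dict(Counter(token_index(t, self.n_features)
                         for t in TOKEN_PATTERN.findall(doc.lower())))
            for doc in docs
        ]


# Two independently constructed vectorizers produce identical features.
train_vec = TinyHashingVectorizer(n_features=2**16)
test_vec = TinyHashingVectorizer(n_features=2**16)
docs = ["The shuttle reached orbit", "New graphics drivers"]
print(train_vec.transform(docs) == test_vec.transform(docs))  # → True
```

The same reasoning explains why the notebook's test cell can reuse the training `vectorizer` instead of building a second one.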
@@ -188,8 +188,8 @@
        "                             verbosity = logging.INFO,\n",
        "                             X = X_train, \n",
        "                             y = y_train,\n",
-        "                             X_valid = X_validation, \n",
-        "                             y_valid = y_validation, \n",
+        "                             X_valid = X_valid, \n",
+        "                             y_valid = y_valid, \n",
        "                             path = project_folder)"
      ]
    },
@@ -197,7 +197,7 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-        "## Train the Model\n",
+        "## Train the Models\n",
        "\n",
        "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
        "In this example, we specify `show_output = True` to print currently running iterations to the console."
@@ -266,20 +266,13 @@
        "rundata"
      ]
    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": []
-    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieve the Best Model\n",
        "\n",
-        "Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
+        "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any preprocessing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
      ]
    },
    {
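The markdown above says `get_output` can return the overall best run, the best run for another logged metric, or a specific iteration. The following is a hypothetical sketch of that selection logic over per-iteration metric dicts. It is not the azureml implementation. The metric names, the `GREATER_IS_BETTER` table, and the scores are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical: which direction counts as "better" for each metric name.
GREATER_IS_BETTER = {"accuracy": True, "log_loss": False}


def select_best(
    metrics_per_iteration: List[Dict[str, float]],
    metric: str = "accuracy",
    iteration: Optional[int] = None,
) -> Tuple[int, Dict[str, float]]:
    """Return (iteration, metrics) for a pinned iteration, or the best one by `metric`."""
    if iteration is not None:
        # Analogous to asking for a particular iteration.
        return iteration, metrics_per_iteration[iteration]
    # Only iterations that actually logged the requested metric are candidates.
    candidates = [(i, m) for i, m in enumerate(metrics_per_iteration) if metric in m]
    pick = max if GREATER_IS_BETTER[metric] else min
    return pick(candidates, key=lambda pair: pair[1][metric])


# Made-up scores for three iterations, for illustration only.
history = [
    {"accuracy": 0.81, "log_loss": 0.52},
    {"accuracy": 0.86, "log_loss": 0.47},
    {"accuracy": 0.84, "log_loss": 0.41},
]
print(select_best(history)[0])                     # → 1 (highest accuracy)
print(select_best(history, metric="log_loss")[0])  # → 2 (lowest log loss)
print(select_best(history, iteration=0)[0])        # → 0 (pinned iteration)
```

The key design point is that "best" depends on the metric's direction. Accuracy is maximised while log loss is minimised, so a different metric can select a different iteration.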
@@ -331,26 +324,7 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-        "### Register the Fitted Model for Deployment"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "description = 'AutoML Model'\n",
-        "tags = None\n",
-        "local_run.register_model(description = description, tags = tags)\n",
-        "local_run.model_id # Use this id to deploy the model as a web service in Azure."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Testing the Fitted Model"
+        "### Testing the Best Fitted Model"
      ]
    },
    {
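The test cell in the next hunk imports `ConfusionMatrix` from `pandas_ml` to evaluate predictions on the four newsgroup categories. The stdlib-only sketch below shows what such a matrix tabulates. Rows are true classes and columns are predicted classes, which is the same orientation scikit-learn's `confusion_matrix` uses. The label values are toy data, not output from the notebook's model.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     labels: Sequence[int]) -> List[List[int]]:
    """Row i counts samples whose true class is labels[i]; column j is the predicted class."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (predicted class == true class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels for four classes (0..3); not real model output.
y_true = [0, 1, 2, 3, 1, 0]
y_pred = [0, 1, 2, 1, 1, 3]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
for row in cm:
    print(row)
# → [1, 0, 0, 1]
# → [0, 2, 0, 0]
# → [0, 0, 1, 0]
# → [0, 1, 0, 0]
print(accuracy(cm))  # 4 of 6 on the diagonal
```

Off-diagonal cells show which categories get confused with each other. For example, the `[3][1]` entry counts class-3 documents predicted as class 1.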
@@ -360,25 +334,12 @@
      "outputs": [],
      "source": [
        "# Load test data.\n",
-        "import sklearn\n",
        "from pandas_ml import ConfusionMatrix\n",
        "\n",
-        "remove = ('headers', 'footers', 'quotes')\n",
-        "categories = [\n",
-        "    'alt.atheism',\n",
-        "    'talk.religion.misc',\n",
-        "    'comp.graphics',\n",
-        "    'sci.space',\n",
-        "]\n",
-        "\n",
-        "\n",
        "data_test = fetch_20newsgroups(subset = 'test', categories = categories,\n",
        "                               shuffle = True, random_state = 42,\n",
        "                               remove = remove)\n",
        "\n",
-        "vectorizer = HashingVectorizer(stop_words = 'english', alternate_sign = False,\n",
-        "                               n_features = 2**16)\n",
-        "\n",
        "X_test = vectorizer.transform(data_test.data)\n",
        "y_test = data_test.target\n",
        "\n",
@@ -395,6 +356,11 @@
    }
  ],
  "metadata": {
+    "authors": [
+      {
+        "name": "savitam"
+      }
+    ],
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",