update samples from Release-132 as a part of 1.0.48 SDK release

2025-12-21 10:05:09 -05:00 · 2019-07-09 22:02:57 +00:00
parent 9e0fc4f0e7
commit 475ea36106
195 changed files with 31305 additions and 4675 deletions
--- a/work-with-data/dataprep/how-to-guides/open-save-dataflows.ipynb
+++ b/work-with-data/dataprep/how-to-guides/open-save-dataflows.ipynb
@@ -0,0 +1,171 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/dataprep/how-to-guides/open-save-dataflows.png)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# Opening and Saving Dataflows\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Once you have built a Dataflow, you can save it to a `.dprep` file. This persists everything in the Dataflow, including the steps you've added, the examples and programs from by-example steps, and any computed aggregations.\n",
+        "\n",
+        "You can also open `.dprep` files to access any Dataflows you have previously persisted."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Open\n",
+        "\n",
+        "Use the `open()` method of the Dataflow class to load existing `.dprep` files."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import os\n",
+        "dflow_path = os.path.join(os.getcwd(), '..', 'data', 'crime.dprep')\n",
+        "print(dflow_path)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from azureml.dataprep import Dataflow"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "dflow = Dataflow.open(dflow_path)\n",
+        "head = dflow.head(5)\n",
+        "head"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Edit\n",
+        "\n",
+        "After a Dataflow is loaded, it can be edited further as needed. In this example, a filter is added that keeps only the rows whose `Description` is not `SIMPLE`."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from azureml.dataprep import col"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "dflow = dflow.filter(col('Description') != 'SIMPLE')\n",
+        "head = dflow.head(5)\n",
+        "head"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Save\n",
+        "\n",
+        "Use the `save()` method of the Dataflow class to write out the `.dprep` file."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import tempfile\n",
+        "import uuid\n",
+        "temp_dir = tempfile.gettempdir()\n",
+        "temp_dflow_path = os.path.join(temp_dir, uuid.uuid4().hex + '.dprep')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "dflow.save(temp_dflow_path)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Round-trip\n",
+        "\n",
+        "Load the saved Dataflow back in and use it, in this case to produce a pandas DataFrame."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "dflow_to_open = Dataflow.open(temp_dflow_path)\n",
+        "df = dflow_to_open.to_pandas_dataframe()\n",
+        "df"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "if os.path.isfile(temp_dflow_path):\n",
+        "    os.remove(temp_dflow_path)"
+      ]
+    }
+  ],
+  "metadata": {
+    "authors": [
+      {
+        "name": "sihhu"
+      }
+    ],
+    "kernelspec": {
+      "display_name": "Python 3.6",
+      "language": "python",
+      "name": "python36"
+    },
+    "notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
+  },
+  "nbformat": 4,
+  "nbformat_minor": 2
+}
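
Note on the Save and Round-trip cells: the temporary-file handling there can also be sketched with a temporary directory that cleans up after itself. This is a minimal sketch that assumes only the Python standard library. `round_trip_via_temp_file` is a hypothetical helper name, and the bytes written are a placeholder, not a real `.dprep` payload. In the notebook, `dflow.save(...)` and `Dataflow.open(...)` would take the place of the placeholder write and read.

```python
import os
import tempfile


def round_trip_via_temp_file(payload: bytes):
    """Write payload to a temporary .dprep-named file, read it back, and clean up.

    Hypothetical helper for illustration; it does not call azureml.dataprep.
    Returns the bytes read back and the (now removed) path that was used.
    """
    # TemporaryDirectory deletes the directory and its contents on exit,
    # so no explicit os.remove() call is needed afterwards.
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = os.path.join(temp_dir, 'example.dprep')

        # Stand-in for dflow.save(temp_path).
        with open(temp_path, 'wb') as f:
            f.write(payload)

        # Stand-in for Dataflow.open(temp_path).
        with open(temp_path, 'rb') as f:
            restored = f.read()

    return restored, temp_path


restored, used_path = round_trip_via_temp_file(b'placeholder')
print(restored == b'placeholder', os.path.exists(used_path))
```

Scoping the file to a `with` block keeps the save, reload, and cleanup steps together, at the cost of the file not outliving the block.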