MachineLearningNotebooks/how-to-use-azureml/work-with-data/dataprep/how-to-guides/open-save-dataflows.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/dataprep/how-to-guides/open-save-dataflows.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Opening and Saving Dataflows\n",
        "Copyright (c) Microsoft Corporation. All rights reserved.<br>\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Once you have built a Dataflow, you can save it to a `.dprep` file. This persists all of the information in your Dataflow including steps you've added, examples and programs from by-example steps, computed aggregations, etc.\n",
        "\n",
        "You can also open `.dprep` files to access any Dataflows you have previously persisted."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Open\n",
        "\n",
        "Use the `open()` method of the Dataflow class to load existing `.dprep` files."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "dflow_path = os.path.join(os.getcwd(), '..', 'data', 'crime.dprep')\n",
        "print(dflow_path)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.dataprep import Dataflow"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "dflow = Dataflow.open(dflow_path)\n",
        "head = dflow.head(5)\n",
        "head"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Edit\n",
        "\n",
        "After a Dataflow is loaded, it can be further edited as needed. In this example, a filter is added."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.dataprep import col"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "dflow = dflow.filter(col('Description') != 'SIMPLE')\n",
        "head = dflow.head(5)\n",
        "head"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Save\n",
        "\n",
        "Use the `save()` method of the Dataflow class to write out the `.dprep` file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import tempfile\n",
        "temp_dir = tempfile._get_default_tempdir()\n",
        "temp_file_name = next(tempfile._get_candidate_names())\n",
        "temp_dflow_path = os.path.join(temp_dir, temp_file_name + '.dprep')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "dflow.save(temp_dflow_path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Round-trip\n",
        "\n",
        "This illustrates the ability to load the edited Dataflow back in and use it, in this case to get a pandas DataFrame."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "dflow_to_open = Dataflow.open(temp_dflow_path)\n",
        "df = dflow_to_open.to_pandas_dataframe()\n",
        "df"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "if os.path.isfile(temp_dflow_path):\n",
        "    os.remove(temp_dflow_path)"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "sihhu"
      }
    ],
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",
      "name": "python36"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}