{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Opening and Saving Dataflows\n", "Copyright (c) Microsoft Corporation. All rights reserved.
\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have built a Dataflow, you can save it to a `.dprep` file. This persists all of the information in your Dataflow including steps you've added, examples and programs from by-example steps, computed aggregations, etc.\n", "\n", "You can also open `.dprep` files to access any Dataflows you have previously persisted." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Open\n", "\n", "Use the `open()` method of the Dataflow class to load existing `.dprep` files." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "dflow_path = os.path.join(os.getcwd(), '..', 'data', 'crime.dprep')\n", "print(dflow_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.dataprep import Dataflow" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflow = Dataflow.open(dflow_path)\n", "head = dflow.head(5)\n", "head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Edit\n", "\n", "After a Dataflow is loaded, it can be further edited as needed. In this example, a filter is added." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.dataprep import col" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflow = dflow.filter(col('Description') != 'SIMPLE')\n", "head = dflow.head(5)\n", "head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save\n", "\n", "Use the `save()` method of the Dataflow class to write out the `.dprep` file." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tempfile\n", "import uuid\n", "\n", "# Build a unique path in the system temp directory using public APIs only.\n", "temp_dir = tempfile.gettempdir()\n", "temp_file_name = uuid.uuid4().hex\n", "temp_dflow_path = os.path.join(temp_dir, temp_file_name + '.dprep')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflow.save(temp_dflow_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Round-trip\n", "\n", "Load the edited Dataflow back from the saved file and use it, in this case to produce a pandas DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflow_to_open = Dataflow.open(temp_dflow_path)\n", "df = dflow_to_open.to_pandas_dataframe()\n", "df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Clean up the temporary file.\n", "if os.path.isfile(temp_dflow_path):\n", "    os.remove(temp_dflow_path)" ] } ], "metadata": { "authors": [ { "name": "sihhu" } ], "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" } }, "nbformat": 4, "nbformat_minor": 2 }