update samples from Release-132 as a part of 1.0.48 SDK release

This commit is contained in:
vizhur
2019-07-09 22:02:57 +00:00
parent 9e0fc4f0e7
commit 475ea36106
195 changed files with 31305 additions and 4675 deletions

View File

@@ -0,0 +1,171 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/dataprep/how-to-guides/open-save-dataflows.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Opening and Saving Dataflows\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have built a Dataflow, you can save it to a `.dprep` file. This persists all of the information in your Dataflow including steps you've added, examples and programs from by-example steps, computed aggregations, etc.\n",
"\n",
"You can also open `.dprep` files to access any Dataflows you have previously persisted."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Open\n",
"\n",
"Use the `open()` method of the Dataflow class to load existing `.dprep` files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"dflow_path = os.path.join(os.getcwd(), '..', 'data', 'crime.dprep')\n",
"print(dflow_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.dataprep import Dataflow"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dflow = Dataflow.open(dflow_path)\n",
"head = dflow.head(5)\n",
"head"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Edit\n",
"\n",
"After a Dataflow is loaded, it can be further edited as needed. In this example, a filter is added."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.dataprep import col"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dflow = dflow.filter(col('Description') != 'SIMPLE')\n",
"head = dflow.head(5)\n",
"head"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save\n",
"\n",
"Use the `save()` method of the Dataflow class to write out the `.dprep` file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tempfile\n",
"temp_dir = tempfile._get_default_tempdir()\n",
"temp_file_name = next(tempfile._get_candidate_names())\n",
"temp_dflow_path = os.path.join(temp_dir, temp_file_name + '.dprep')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dflow.save(temp_dflow_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Round-trip\n",
"\n",
"This illustrates the ability to load the edited Dataflow back in and use it, in this case to get a pandas DataFrame."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dflow_to_open = Dataflow.open(temp_dflow_path)\n",
"df = dflow_to_open.to_pandas_dataframe()\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if os.path.isfile(temp_dflow_path):\n",
" os.remove(temp_dflow_path)"
]
}
],
"metadata": {
"authors": [
{
"name": "sihhu"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
},
"nbformat": 4,
"nbformat_minor": 2
}