Files
MachineLearningNotebooks/how-to-use-azureml/work-with-data/dataprep/how-to-guides/external-references.ipynb

118 lines
3.6 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/dataprep/how-to-guides/external-references.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# External References\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to opening existing Dataflows in code and modifying them, it is also possible to create and persist Dataflows that reference another Dataflow that has been persisted to a .dprep file. In this case, executing this Dataflow will load and execute the referenced Dataflow dynamically, and then execute the steps in the referencing Dataflow."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To demonstrate, we will create a Dataflow that loads and transforms some data. After that, we will persist this Dataflow to disk. To learn more about saving and opening .dprep files, see: [Opening and Saving Dataflows](./open-save-dataflows.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.dataprep as dprep\n",
"import tempfile\n",
"import os\n",
"\n",
"dflow = dprep.auto_read_file('../data/crime.txt')\n",
"dflow = dflow.drop_errors(['Column7', 'Column8', 'Column9'], dprep.ColumnRelationship.ANY)\n",
"dflow_path = os.path.join(tempfile.gettempdir(), 'package.dprep')\n",
"dflow.save(dflow_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have a .dprep file, we can create a new Dataflow that references it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dflow_new = dprep.Dataflow.reference(dprep.ExternalReference(dflow_path))\n",
"dflow_new.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When executed, the new Dataflow returns the same results as the one we saved to the .dprep file. Since this reference is resolved on execution, updating the referenced Dataflow results in the changes being visible when re-executing the referencing Dataflow."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dflow = dflow.take(5)\n",
"dflow.save(dflow_path)\n",
"\n",
"dflow_new.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, even though we did not modify `dflow_new`, it now returns only 5 records, as the referenced Dataflow was updated with the result from `dflow.take(5)`."
]
}
],
"metadata": {
"authors": [
{
"name": "sihhu"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
},
"nbformat": 4,
"nbformat_minor": 2
}