{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading from and Writing to Datastores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.
\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A datastore is a reference that points to an Azure storage service like a blob container for example. It belongs to a workspace and a workspace can have many datastores.\n", "\n", "A data path points to a path on the underlying Azure storage service the datastore references. For example, given a datastore named `blob` that points to an Azure blob container, a data path can point to `/test/data/titanic.csv` in the blob container." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read data from Datastore" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data Prep supports reading data from a `Datastore` or a `DataPath` or a `DataReference`. \n", "\n", "Passing in a datastore into all the `read_*` methods of Data Prep will result in reading everything in the underlying Azure storage service. To read a specific folder or file in the underlying storage, you have to pass in a data reference." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core import Workspace, Datastore\n", "from azureml.data.datapath import DataPath\n", "\n", "import azureml.dataprep as dprep" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, get or create a workspace. Feel free to replace `subscription_id`, `resource_group`, and `workspace_name` with other values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subscription_id = '35f16a99-532a-4a47-9e93-00305f6c40f2'\n", "resource_group = 'DataStoreTest'\n", "workspace_name = 'dataprep-centraleuap'\n", "\n", "workspace = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "workspace.datastores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now read a crime data set from the datastore. If you are using your own workspace, the `crime0-10.csv` will not be there by default. You will have to upload the data to the datastore yourself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datastore = Datastore(workspace=workspace, name='dataprep_blob')\n", "dflow = dprep.read_csv(path=datastore.path('crime0-10.csv'))\n", "dflow.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also read from an Azure SQL database. To do that, you will first get an Azure SQL database datastore instance and pass it to Data Prep for reading." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datastore = Datastore(workspace=workspace, name='test_sql')\n", "dflow_sql = dprep.read_sql(data_source=datastore, query='SELECT * FROM team')\n", "dflow_sql.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write data to Datastore" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also write a dataflow to a datastore. The code below will write the file you read in earlier to the folder in the datastore." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dest_datastore = Datastore(workspace, 'dataprep_blob_key')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflow.write_to_csv(directory_path=dest_datastore.path('output/crime0-10')).run_local()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can read all the files in the `dataprep_adls` datastore which references an Azure Data Lake store." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datastore = Datastore(workspace=workspace, name='dataprep_adls')\n", "dflow_adls = dprep.read_csv(path=DataPath(datastore, path_on_datastore='/input/crime0-10.csv'))\n", "dflow_adls.head(5)" ] } ], "metadata": { "authors": [ { "name": "sihhu" } ], "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }