{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using Databricks as a Compute Target from Azure Machine Learning Pipeline\n",
"To use Databricks as a compute target from [Azure Machine Learning Pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines), a [DatabricksStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py) is used. This notebook demonstrates the use of DatabricksStep in an Azure Machine Learning Pipeline.\n",
"\n",
"The notebook will show:\n",
"1. Running an arbitrary Databricks notebook that the customer has in the Databricks workspace\n",
"2. Running an arbitrary Python script that the customer has in DBFS\n",
"3. Running an arbitrary Python script that is available on the local computer (it will be uploaded to DBFS, and then run in Databricks)\n",
"4. Running a JAR job that the customer has in DBFS\n",
"\n",
"## Before you begin:\n",
"\n",
"1. **Create an Azure Databricks workspace** in the same subscription where you have your Azure Machine Learning workspace. You will need details of this workspace later on to define DatabricksStep. [Click here](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.Databricks%2Fworkspaces) for more information.\n",
"2. **Create PAT (access token)**: Manually create a Databricks access token in the Azure Databricks portal. See [this](https://docs.databricks.com/api/latest/authentication.html#generate-a-token) for more information.\n",
"3. **Add demo notebook to ADB**: This notebook has a sample you can use as is. Launch Azure Databricks attached to your Azure Machine Learning workspace and add a new notebook.\n",
"4. **Create/attach a Blob storage** for use from ADB"
]
},
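{
"cell_type": "markdown",
"metadata": {},
"source": [
"## (Optional) Provide the Databricks details as environment variables\n",
"The attach cell later in this notebook reads the Databricks compute name, resource group, workspace name, and access token from environment variables via `os.getenv`. The snippet below is a minimal, optional sketch of setting those variables from Python before running that cell; all values shown are placeholders, and you can just as well edit the attach cell directly.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"# Placeholder values -- replace with your own Azure Databricks details.\n",
"os.environ[\"DATABRICKS_COMPUTE_NAME\"] = \"<my-databricks-compute-name>\"\n",
"os.environ[\"DATABRICKS_RESOURCE_GROUP\"] = \"<my-db-resource-group>\"\n",
"os.environ[\"DATABRICKS_WORKSPACE_NAME\"] = \"<my-db-workspace-name>\"\n",
"os.environ[\"DATABRICKS_ACCESS_TOKEN\"] = \"<my-access-token>\"  # the PAT created in step 2 above\n",
"```"
]
},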
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add demo notebook to ADB Workspace\n",
"Copy and paste the code below to create a new notebook in your ADB workspace."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"# direct access\n",
"dbutils.widgets.get(\"myparam\")\n",
"p = getArgument(\"myparam\")\n",
"print(\"Param - 'myparam':\")\n",
"print(p)\n",
"\n",
"dbutils.widgets.get(\"input\")\n",
"i = getArgument(\"input\")\n",
"print(\"Param - 'input':\")\n",
"print(i)\n",
"\n",
"dbutils.widgets.get(\"output\")\n",
"o = getArgument(\"output\")\n",
"print(\"Param - 'output':\")\n",
"print(o)\n",
"\n",
"n = i + \"/testdata.txt\"\n",
"df = spark.read.csv(n)\n",
"\n",
"display(df)\n",
"\n",
"data = [('value1', 'value2')]\n",
"df2 = spark.createDataFrame(data)\n",
"\n",
"z = o + \"/output.txt\"\n",
"df2.write.csv(z)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Azure Machine Learning and Pipeline SDK-specific imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import azureml.core\n",
"from azureml.core.runconfig import JarLibrary\n",
"from azureml.core.compute import ComputeTarget, DatabricksCompute\n",
"from azureml.exceptions import ComputeTargetException\n",
"from azureml.core import Workspace, Run, Experiment\n",
"from azureml.pipeline.core import Pipeline, PipelineData\n",
"from azureml.pipeline.steps import DatabricksStep\n",
"from azureml.core.datastore import Datastore\n",
"from azureml.data.data_reference import DataReference\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Attach Databricks compute target\n",
"Next, you need to add your Databricks workspace to Azure Machine Learning as a compute target and give it a name. You will use this name to refer to your Databricks workspace compute target inside Azure Machine Learning.\n",
"\n",
"- **Resource Group** - The resource group name of your Azure Machine Learning workspace\n",
"- **Databricks Workspace Name** - The workspace name of your Azure Databricks workspace\n",
"- **Databricks Access Token** - The access token you created in ADB"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Replace with your account info before running.\n",
"\n",
"db_compute_name=os.getenv(\"DATABRICKS_COMPUTE_NAME\", \"<my-databricks-compute-name>\") # Databricks compute name\n",
"db_resource_group=os.getenv(\"DATABRICKS_RESOURCE_GROUP\", \"<my-db-resource-group>\") # Databricks resource group\n",
"db_workspace_name=os.getenv(\"DATABRICKS_WORKSPACE_NAME\", \"<my-db-workspace-name>\") # Databricks workspace name\n",
"db_access_token=os.getenv(\"DATABRICKS_ACCESS_TOKEN\", \"<my-access-token>\") # Databricks access token\n",
"\n",
"try:\n",
"    databricks_compute = ComputeTarget(workspace=ws, name=db_compute_name)\n",
"    print('Compute target {} already exists'.format(db_compute_name))\n",
"except ComputeTargetException:\n",
"    print('Compute not found, will use below parameters to attach new one')\n",
"    print('db_compute_name {}'.format(db_compute_name))\n",
"    print('db_resource_group {}'.format(db_resource_group))\n",
"    print('db_workspace_name {}'.format(db_workspace_name))\n",
"    print('db_access_token {}'.format(db_access_token))\n",
"\n",
"    config = DatabricksCompute.attach_configuration(\n",
"        resource_group = db_resource_group,\n",
"        workspace_name = db_workspace_name,\n",
"        access_token= db_access_token)\n",
"    databricks_compute=ComputeTarget.attach(ws, db_compute_name, config)\n",
"    databricks_compute.wait_for_completion(True)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Connections with Inputs and Outputs\n",
"The DatabricksStep supports Azure Blob and ADLS for inputs and outputs. You will also need to define a [Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html) scope to enable authentication to external data sources such as Blob and ADLS from Databricks.\n",
"\n",
"- Databricks documentation on [Azure Blob](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html)\n",
"- Databricks documentation on [ADLS](https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html)\n",
"\n",
"### Type of Data Access\n",
"Databricks allows you to interact with Azure Blob and ADLS in two ways.\n",
"- **Direct Access**: Databricks allows you to interact with Azure Blob or ADLS URIs directly. The input or output URIs will be mapped to Databricks widget params in the Databricks notebook.\n",
"- **Mounting**: You will be supplied with additional parameters and secrets that will enable you to mount your ADLS or Azure Blob input or output location in your Databricks notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Direct Access: Python sample code\n",
"If you have a data reference named \"input\", it represents the URI of the input and you can access it directly in the Databricks Python notebook like so:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"dbutils.widgets.get(\"input\")\n",
"y = getArgument(\"input\")\n",
"df = spark.read.csv(y)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Mounting: Python sample code for Azure Blob\n",
"Given an Azure Blob data reference named \"input\", the following widget params will be made available in the Databricks notebook:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"# This contains the input URI\n",
"dbutils.widgets.get(\"input\")\n",
"myinput_uri = getArgument(\"input\")\n",
"\n",
"# How to get the input datastore name inside the ADB notebook:\n",
"# this contains the name of a Databricks secret (in the predefined \"amlscope\" secret scope)\n",
"# that contains an access key or SAS for the Azure Blob input (this name is obtained by\n",
"# appending \"_blob_secretname\" to the name of the input).\n",
"dbutils.widgets.get(\"input_blob_secretname\")\n",
"myinput_blob_secretname = getArgument(\"input_blob_secretname\")\n",
"\n",
"# This contains the required configuration for mounting\n",
"dbutils.widgets.get(\"input_blob_config\")\n",
"myinput_blob_config = getArgument(\"input_blob_config\")\n",
"\n",
"# Usage\n",
"dbutils.fs.mount(\n",
"    source = myinput_uri,\n",
"    mount_point = \"/mnt/input\",\n",
"    extra_configs = {myinput_blob_config:dbutils.secrets.get(scope = \"amlscope\", key = myinput_blob_secretname)})\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Mounting: Python sample code for ADLS\n",
"Given an ADLS data reference named \"input\", the following widget params will be made available in the Databricks notebook:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"# This contains the input URI\n",
"dbutils.widgets.get(\"input\")\n",
"myinput_uri = getArgument(\"input\")\n",
"\n",
"# This contains the client id for the service principal\n",
"# that has access to the ADLS input\n",
"dbutils.widgets.get(\"input_adls_clientid\")\n",
"myinput_adls_clientid = getArgument(\"input_adls_clientid\")\n",
"\n",
"# This contains the name of a Databricks secret (in the predefined \"amlscope\" secret scope)\n",
"# that contains the secret for the above-mentioned service principal\n",
"dbutils.widgets.get(\"input_adls_secretname\")\n",
"myinput_adls_secretname = getArgument(\"input_adls_secretname\")\n",
"\n",
"# This contains the refresh URL for the mounting configs\n",
"dbutils.widgets.get(\"input_adls_refresh_url\")\n",
"myinput_adls_refresh_url = getArgument(\"input_adls_refresh_url\")\n",
"\n",
"# Usage\n",
"configs = {\"dfs.adls.oauth2.access.token.provider.type\": \"ClientCredential\",\n",
"           \"dfs.adls.oauth2.client.id\": myinput_adls_clientid,\n",
"           \"dfs.adls.oauth2.credential\": dbutils.secrets.get(scope = \"amlscope\", key = myinput_adls_secretname),\n",
"           \"dfs.adls.oauth2.refresh.url\": myinput_adls_refresh_url}\n",
"\n",
"dbutils.fs.mount(\n",
"    source = myinput_uri,\n",
"    mount_point = \"/mnt/input\",\n",
"    extra_configs = configs)\n",
"```"
]
},
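{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Reading from the mount point\n",
"Once either of the mounts above succeeds, the data behind the \"input\" reference can be read through DBFS like a local path. The snippet below is a minimal sketch, assuming the mount point `/mnt/input` from the examples above and a file named `testdata.txt` (the sample file uploaded later in this notebook); it also unmounts the location when you are done with it.\n",
"\n",
"```python\n",
"# Read the sample file through the mount point created above\n",
"df = spark.read.csv(\"/mnt/input/testdata.txt\")\n",
"display(df)\n",
"\n",
"# Clean up the mount when you no longer need it\n",
"dbutils.fs.unmount(\"/mnt/input\")\n",
"```"
]
},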
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use Databricks from Azure Machine Learning Pipeline\n",
"To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. Let's define a datasource (via DataReference) and intermediate data (via PipelineData) to be used in DatabricksStep."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the default blob storage\n",
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
"print('Datastore {} will be used'.format(def_blob_store.name))\n",
"\n",
"# We are uploading a sample file in the local directory to be used as a datasource\n",
"def_blob_store.upload_files([\"./testdata.txt\"], target_path=\"dbtest\", overwrite=False)\n",
"\n",
"step_1_input = DataReference(datastore=def_blob_store, path_on_datastore=\"dbtest\",\n",
"                             data_reference_name=\"input\")\n",
"\n",
"step_1_output = PipelineData(\"output\", datastore=def_blob_store)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add a DatabricksStep\n",
"Adds a Databricks notebook as a step in a Pipeline.\n",
"- ***name:** Name of the module\n",
"- **inputs:** List of input connections for data consumed by this step. Fetch this inside the notebook using dbutils.widgets.get(\"input\")\n",
"- **outputs:** List of output port definitions for outputs produced by this step. Fetch this inside the notebook using dbutils.widgets.get(\"output\")\n",
"- **spark_version:** Version of Spark for the Databricks run cluster. Default value: 4.0.x-scala2.11\n",
"- **node_type:** Azure VM node type for the Databricks run cluster. Default value: Standard_D3_v2\n",
"- **num_workers:** Number of workers for the Databricks run cluster\n",
"- **autoscale:** The autoscale configuration for the Databricks run cluster\n",
"- **spark_env_variables:** Spark environment variables for the Databricks run cluster (dictionary of {str:str}). Default value: {'PYSPARK_PYTHON': '/databricks/python3/bin/python3'}\n",
"- ***notebook_path:** Path to the notebook in the Databricks instance\n",
"- **notebook_params:** Parameters for the Databricks notebook (dictionary of {str:str}). Fetch this inside the notebook using dbutils.widgets.get(\"myparam\")\n",
"- **run_name:** Name in Databricks for this run\n",
"- **timeout_seconds:** Timeout for the Databricks run\n",
"- **maven_libraries:** Maven libraries for the Databricks run\n",
"- **pypi_libraries:** PyPI libraries for the Databricks run\n",
"- **egg_libraries:** Egg libraries for the Databricks run\n",
"- **jar_libraries:** JAR libraries for the Databricks run\n",
"- **rcran_libraries:** R CRAN libraries for the Databricks run\n",
"- **databricks_compute:** Azure Databricks compute target\n",
"- **databricks_compute_name:** Name of the Azure Databricks compute target\n",
"\n",
"\\* *denotes required fields* \n",
"*You must provide exactly one of the num_workers or autoscale parameters* \n",
"*You must provide exactly one of the databricks_compute or databricks_compute_name parameters*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='notebook_howto'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Running the demo notebook already added to the Databricks workspace\n",
"Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable notebook_path when you run the code cell below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"notebook_path=os.getenv(\"DATABRICKS_NOTEBOOK_PATH\", \"<my-databricks-notebook-path>\") # Databricks notebook path\n",
"\n",
"dbNbStep = DatabricksStep(\n",
"    name=\"DBNotebookInWS\",\n",
"    inputs=[step_1_input],\n",
"    outputs=[step_1_output],\n",
"    num_workers=1,\n",
"    notebook_path=notebook_path,\n",
"    notebook_params={'myparam': 'testparam'},\n",
"    run_name='DB_Notebook_demo',\n",
"    compute_target=databricks_compute,\n",
"    allow_reuse=False\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Build and submit the Experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"steps = [dbNbStep]\n",
"pipeline = Pipeline(workspace=ws, steps=steps)\n",
"pipeline_run = Experiment(ws, 'DB_Notebook_demo').submit(pipeline)\n",
"pipeline_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### View Run Details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Running a Python script that is already in DBFS\n",
"To run a Python script that is already uploaded to DBFS, follow the instructions below. You will first upload the Python script to DBFS using the [CLI](https://docs.azuredatabricks.net/user-guide/dbfs-databricks-file-system.html).\n",
"\n",
"The code in the cell below assumes that you have uploaded `train-db-dbfs.py` to the root folder in DBFS. You can upload `train-db-dbfs.py` to the root folder in DBFS with the following command line, so that `python_script_path = \"dbfs:/train-db-dbfs.py\"` works:\n",
"\n",
"```\n",
"dbfs cp ./train-db-dbfs.py dbfs:/train-db-dbfs.py\n",
"```"
]
},
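{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook does not include the contents of `train-db-dbfs.py`; the snippet below is only a minimal sketch of what such a script could look like, assuming the step's `python_script_params` and data reference paths arrive as command-line arguments. Printing `sys.argv` is a simple way to inspect exactly what the step passes before writing real training logic.\n",
"\n",
"```python\n",
"# Hypothetical contents of train-db-dbfs.py -- a sketch, not the script shipped with this sample.\n",
"import sys\n",
"\n",
"# Inspect the arguments handed to the script by the DatabricksStep\n",
"print(\"Arguments received:\", sys.argv[1:])\n",
"\n",
"# Once the argument carrying the \"input\" data reference URI is identified,\n",
"# the sample file could then be read with Spark, e.g.:\n",
"# df = spark.read.csv(input_uri + \"/testdata.txt\")\n",
"```"
]
},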
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"python_script_path = \"dbfs:/train-db-dbfs.py\"\n",
"\n",
"dbPythonInDbfsStep = DatabricksStep(\n",
"    name=\"DBPythonInDBFS\",\n",
"    inputs=[step_1_input],\n",
"    num_workers=1,\n",
"    python_script_path=python_script_path,\n",
"    python_script_params={'--input_data'},\n",
"    run_name='DB_Python_demo',\n",
"    compute_target=databricks_compute,\n",
"    allow_reuse=False\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Build and submit the Experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"steps = [dbPythonInDbfsStep]\n",
"pipeline = Pipeline(workspace=ws, steps=steps)\n",
"pipeline_run = Experiment(ws, 'DB_Python_demo').submit(pipeline)\n",
"pipeline_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### View Run Details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Running a Python script in Databricks that is currently on the local computer\n",
"To run a Python script that is currently on your local computer, follow the instructions below.\n",
"\n",
"The code in the cell below assumes that `train-db-local.py` is in the directory given by `source_directory` (the current working directory in this example).\n",
"\n",
"In this case, the Python script will be uploaded to DBFS first, and then the script will be run in Databricks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"python_script_name = \"train-db-local.py\"\n",
"source_directory = \".\"\n",
"\n",
"dbPythonInLocalMachineStep = DatabricksStep(\n",
"    name=\"DBPythonInLocalMachine\",\n",
"    inputs=[step_1_input],\n",
"    num_workers=1,\n",
"    python_script_name=python_script_name,\n",
"    source_directory=source_directory,\n",
"    run_name='DB_Python_Local_demo',\n",
"    compute_target=databricks_compute,\n",
"    allow_reuse=False\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Build and submit the Experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"steps = [dbPythonInLocalMachineStep]\n",
"pipeline = Pipeline(workspace=ws, steps=steps)\n",
"pipeline_run = Experiment(ws, 'DB_Python_Local_demo').submit(pipeline)\n",
"pipeline_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### View Run Details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Running a JAR job that is already in DBFS\n",
"To run a JAR job that is already uploaded to DBFS, follow the instructions below. You will first upload the JAR file to DBFS using the [CLI](https://docs.azuredatabricks.net/user-guide/dbfs-databricks-file-system.html).\n",
"\n",
"The code in the cell below assumes that you have uploaded `train-db-dbfs.jar` to the root folder in DBFS. You can upload `train-db-dbfs.jar` to the root folder in DBFS with the following command line, so that `jar_library_dbfs_path = \"dbfs:/train-db-dbfs.jar\"` works:\n",
"\n",
"```\n",
"dbfs cp ./train-db-dbfs.jar dbfs:/train-db-dbfs.jar\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"main_jar_class_name = \"com.microsoft.aeva.Main\"\n",
"jar_library_dbfs_path = \"dbfs:/train-db-dbfs.jar\"\n",
"\n",
"dbJarInDbfsStep = DatabricksStep(\n",
"    name=\"DBJarInDBFS\",\n",
"    inputs=[step_1_input],\n",
"    num_workers=1,\n",
"    main_class_name=main_jar_class_name,\n",
"    jar_params={'arg1', 'arg2'},\n",
"    run_name='DB_JAR_demo',\n",
"    jar_libraries=[JarLibrary(jar_library_dbfs_path)],\n",
"    compute_target=databricks_compute,\n",
"    allow_reuse=False\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Build and submit the Experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"steps = [dbJarInDbfsStep]\n",
"pipeline = Pipeline(workspace=ws, steps=steps)\n",
"pipeline_run = Experiment(ws, 'DB_JAR_demo').submit(pipeline)\n",
"pipeline_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### View Run Details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Next: ADLA as a Compute Target\n",
"To use ADLA as a compute target from Azure Machine Learning Pipeline, an AdlaStep is used. This [notebook](./aml-pipelines-use-adla-as-compute-target.ipynb) demonstrates the use of AdlaStep in Azure Machine Learning Pipeline."
]
}
],
"metadata": {
"authors": [
{
"name": "diray"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}