{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using Synapse Spark Pool as a Compute Target from Azure Machine Learning Remote Run\n",
"1. To use Synapse Spark Pool as a compute target from Experiment Run, [ScriptRunConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.script_run_config.scriptrunconfig?view=azure-ml-py) is used, the same as other Experiment Runs. This notebook demonstrates how to leverage ScriptRunConfig to submit an experiment run to an attached Synapse Spark cluster.\n",
"2. To use Synapse Spark Pool as a compute target from [Azure Machine Learning Pipeline](https://aka.ms/pl-concept), a [SynapseSparkStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.synapse_spark_step.synapsesparkstep?view=azure-ml-py) is used. This notebook demonstrates how to leverage SynapseSparkStep in Azure Machine Learning Pipeline.\n",
"\n",
"## Before you begin:\n",
"1. **Create an Azure Synapse workspace**, check [this] (https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-workspace) for more information.\n",
"2. **Create Spark Pool in Synapse workspace**: check [this] (https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-apache-spark-pool-portal) for more information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure Machine Learning and Pipeline SDK-specific imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import azureml.core\n",
"from azureml.core import Workspace, Experiment\n",
"from azureml.core import LinkedService, SynapseWorkspaceLinkedServiceConfiguration\n",
"from azureml.core.compute import ComputeTarget, AmlCompute, SynapseCompute\n",
"from azureml.exceptions import ComputeTargetException\n",
"from azureml.data import HDFSOutputDatasetConfig\n",
"from azureml.core.datastore import Datastore\n",
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.pipeline.core import Pipeline\n",
"from azureml.pipeline.steps import PythonScriptStep, SynapseSparkStep\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Link Synapse workspace to AML \n",
"You have to be an \"Owner\" of Synapse workspace resource to perform linking. You can check your role in the Azure resource management portal, if you don't have an \"Owner\" role, you can contact an \"Owner\" to link the workspaces for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Replace with your resource info before running.\n",
"\n",
"synapse_subscription_id=os.getenv(\"SYNAPSE_SUBSCRIPTION_ID\", \"