mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-19 17:17:04 -05:00
306 lines
12 KiB
Plaintext
306 lines
12 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# How to use CommandStep in Azure ML Pipelines\n",
|
|
"\n",
|
|
"This notebook shows how to use the CommandStep with Azure Machine Learning Pipelines for running commands in steps. The example shows running distributed TensorFlow training from within a pipeline.\n",
|
|
"\n",
|
|
"\n",
|
|
"## Prerequisite:\n",
|
|
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
|
|
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](https://aka.ms/pl-config) to:\n",
|
|
" * install the Azure ML SDK\n",
|
|
" * create a workspace and its configuration file (`config.json`)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's get started. First let's import some Python libraries."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import azureml.core\n",
|
|
"# check core SDK version number\n",
|
|
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Initialize workspace\n",
|
|
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Workspace\n",
|
|
"ws = Workspace.from_config()\n",
|
|
"print('Workspace name: ' + ws.name, \n",
|
|
" 'Azure region: ' + ws.location, \n",
|
|
" 'Subscription id: ' + ws.subscription_id, \n",
|
|
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Create or Attach existing AmlCompute\n",
|
|
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
|
|
"1. create the configuration (this step is local and only takes a second)\n",
|
|
"2. create the cluster (this step will take about **20 seconds**)\n",
|
|
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
|
"\n",
|
|
"# choose a name for your cluster\n",
|
|
"cluster_name = \"gpu-cluster\"\n",
|
|
"\n",
|
|
"try:\n",
|
|
" gpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
|
" print('Found existing compute target')\n",
|
|
"except ComputeTargetException:\n",
|
|
" print('Creating a new compute target...')\n",
|
|
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n",
|
|
"\n",
|
|
" # create the cluster\n",
|
|
" gpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
|
|
"\n",
|
|
" # can poll for a minimum number of nodes and for a specific timeout. \n",
|
|
" # if no min node count is provided it uses the scale settings for the cluster\n",
|
|
" gpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
|
"\n",
|
|
"# use get_status() to get a detailed status for the current cluster. \n",
|
|
"print(gpu_cluster.get_status().serialize())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'gpu-cluster' of type `AmlCompute`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Create a CommandStep\n",
|
|
"CommandStep adds a step to run a command in a Pipeline. For the full set of configurable options see the CommandStep [reference docs](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.commandstep?view=azure-ml-py).\n",
|
|
"\n",
|
|
"- **name:** Name of the step\n",
|
|
"- **runconfig:** ScriptRunConfig object. You can configure a ScriptRunConfig object as you would for a standalone non-pipeline run and pass it in to this parameter. If using this option, you do not have to specify the `command`, `source_directory`, `compute_target` parameters of the CommandStep constructor as they are already defined in your ScriptRunConfig.\n",
|
|
"- **runconfig_pipeline_params:** Override runconfig properties at runtime using key-value pairs each with name of the runconfig property and PipelineParameter for that property\n",
|
|
"- **command:** The command to run or path of the executable/script relative to `source_directory`. It is required unless the `runconfig` parameter is specified. It can be specified with string arguments in a single string or with input/output/PipelineParameter in a list.\n",
|
|
"- **source_directory:** A folder containing the script and other resources used in the step.\n",
|
|
"- **compute_target:** Compute target to use \n",
|
|
"- **allow_reuse:** Whether the step should reuse previous results when run with the same settings/inputs. If this is false, a new run will always be generated for this step during pipeline execution.\n",
|
|
"- **version:** Optional version tag to denote a change in functionality for the step\n",
|
|
"\n",
|
|
"> The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"First define the environment that you want to step to run in. This example users a curated TensorFlow environment, but in practice you can configure any environment you want."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Environment\n",
|
|
"\n",
|
|
"tf_env = Environment.get(ws, name='AzureML-TensorFlow-2.3-GPU')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"This example will first create a ScriptRunConfig object that configures the training job. Since we are running a distributed job, specify the `distributed_job_config` parameter. If you are just running a single-node job, omit that parameter.\n",
|
|
"\n",
|
|
"> If you have an input dataset you want to use in this step, you can specify that as part of the command. For example, if you have a FileDataset object called `dataset` and a `--data-dir` script argument, you can do the following: `command=['python train.py --epochs 30 --data-dir', dataset.as_mount()]`.\n",
|
|
"\n",
|
|
"> For detailed guidance on how to move data in pipelines for input and output data, see the documentation [Moving data into and between ML pipelines](https://docs.microsoft.com/azure/machine-learning/how-to-move-data-in-out-of-pipelines)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import ScriptRunConfig\n",
|
|
"from azureml.core.runconfig import MpiConfiguration\n",
|
|
"\n",
|
|
"src_dir = 'commandstep_train'\n",
|
|
"distr_config = MpiConfiguration(node_count=2) # you can also specify the process_count_per_node parameter for multi-process-per-node training\n",
|
|
"\n",
|
|
"src = ScriptRunConfig(source_directory=src_dir,\n",
|
|
" command=['python train.py --epochs 30'],\n",
|
|
" compute_target=gpu_cluster,\n",
|
|
" environment=tf_env,\n",
|
|
" distributed_job_config=distr_config)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now create a CommandStep and pass in the ScriptRunConfig object to the `runconfig` parameter."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": [
|
|
"estimatorstep-remarks-sample"
|
|
]
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.pipeline.steps import CommandStep\n",
|
|
"\n",
|
|
"train = CommandStep(name='train-mnist', runconfig=src)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Build and Submit the Pipeline"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.pipeline.core import Pipeline\n",
|
|
"from azureml.core import Experiment\n",
|
|
"\n",
|
|
"pipeline = Pipeline(workspace=ws, steps=[train])\n",
|
|
"pipeline_run = Experiment(ws, 'train-commandstep-pipeline').submit(pipeline)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## View Run Details"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.widgets import RunDetails\n",
|
|
"RunDetails(pipeline_run).show()"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "sanpil"
|
|
}
|
|
],
|
|
"category": "tutorial",
|
|
"compute": [
|
|
"AML Compute"
|
|
],
|
|
"datasets": [
|
|
"Custom"
|
|
],
|
|
"deployment": [
|
|
"None"
|
|
],
|
|
"exclude_from_index": false,
|
|
"framework": [
|
|
"Azure ML"
|
|
],
|
|
"friendly_name": "Azure Machine Learning Pipeline with CommandStep",
|
|
"kernelspec": {
|
|
"display_name": "Python 3.6",
|
|
"language": "python",
|
|
"name": "python36"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.7"
|
|
},
|
|
"order_index": 7,
|
|
"star_tag": [
|
|
"None"
|
|
],
|
|
"tags": [
|
|
"None"
|
|
],
|
|
"task": "Demonstrates the use of CommandStep"
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
} |