update cluster creation

This commit is contained in:
Roope Astala
2019-06-13 12:14:58 -04:00
parent daf27a76e4
commit de162316d7
29 changed files with 682 additions and 204 deletions

View File

@@ -39,6 +39,7 @@
" 1. Workspace parameters\n", " 1. Workspace parameters\n",
" 1. Access your workspace\n", " 1. Access your workspace\n",
" 1. Create a new workspace\n", " 1. Create a new workspace\n",
" 1. Create compute resources\n",
"1. [Next steps](#Next%20steps)\n", "1. [Next steps](#Next%20steps)\n",
"\n", "\n",
"---\n", "---\n",
@@ -241,6 +242,97 @@
"ws.write_config()" "ws.write_config()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create compute resources for your training experiments\n",
"\n",
"Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n",
"\n",
"To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n",
"\n",
"The cluster parameters are:\n",
"* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n",
"\n",
"```shell\n",
"az vm list-skus -o tsv\n",
"```\n",
"* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while note in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
"* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
"\n",
"\n",
"To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print(\"Found existing cpu-cluster\")\n",
"except ComputeTargetException:\n",
" print(\"Creating new cpu-cluster\")\n",
" \n",
" # Specify the configuration for the new cluster\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
" min_nodes=0,\n",
" max_nodes=4)\n",
"\n",
" # Create the cluster with the specified name and configuration\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
" \n",
" # Wait for the cluster to complete, show the output log\n",
" cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your GPU cluster\n",
"gpu_cluster_name = \"gpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
" print(\"Found existing gpu cluster\")\n",
"except ComputeTargetException:\n",
" print(\"Creating new gpu-cluster\")\n",
" \n",
" # Specify the configuration for the new cluster\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
" min_nodes=0,\n",
" max_nodes=4)\n",
" # Create the cluster with the specified name and configuration\n",
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
"\n",
" # Wait for the cluster to complete, show the output log\n",
" gpu_cluster.wait_for_completion(show_output=True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -205,7 +205,7 @@
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute import ComputeTarget\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpucluster\"\n", "amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n", "\n",
"found = False\n", "found = False\n",
"\n", "\n",

View File

@@ -119,7 +119,9 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create or Attach existing AmlCompute\n", "### Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create an AmlCompute as your training compute resource.\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"\n", "\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
] ]
@@ -134,12 +136,10 @@
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute import ComputeTarget\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpucluster\"\n", "amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n", "\n",
"found = False\n", "found = False\n",
"\n",
"# Check if this compute target already exists in the workspace.\n", "# Check if this compute target already exists in the workspace.\n",
"\n",
"cts = ws.compute_targets\n", "cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n", " found = True\n",

View File

@@ -98,7 +98,7 @@
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# choose a name for your cluster\n", "# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n", "cluster_name = \"gpu-cluster\"\n",
"\n", "\n",
"try:\n", "try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",

View File

@@ -206,8 +206,15 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Retrieve default Azure Machine Learning compute\n", "#### Retrieve or create a Azure Machine Learning compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's get the default Azure Machine Learning Compute in the current workspace. We will then run the training script on this compute target." "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
"\n",
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. We will create an Azure Machine Learning Compute containing **STANDARD_D2_V2 CPU VMs**. This process is broken down into the following steps:\n",
"\n",
"1. Create the configuration\n",
"2. Create the Azure Machine Learning compute\n",
"\n",
"**This process will take about 3 minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**"
] ]
}, },
{ {
@@ -216,7 +223,23 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"aml_compute = ws.get_default_compute_target(\"CPU\")" "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"aml_compute_target = \"cpu-cluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new compute target\")\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
"print(\"Azure Machine Learning Compute attached\")\n"
] ]
}, },
{ {

View File

@@ -113,7 +113,25 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"batch_compute = ws.get_default_compute_target(\"CPU\")" "batch_compute_name = 'mybatchcompute' # Name to associate with new compute in workspace\n",
"\n",
"# Batch account details needed to attach as compute to workspace\n",
"batch_account_name = \"<batch_account_name>\" # Name of the Batch account\n",
"batch_resource_group = \"<batch_resource_group>\" # Name of the resource group which contains this account\n",
"\n",
"try:\n",
" # check if already attached\n",
" batch_compute = BatchCompute(ws, batch_compute_name)\n",
"except ComputeTargetException:\n",
" print('Attaching Batch compute...')\n",
" provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, \n",
" account_name=batch_account_name)\n",
" batch_compute = ComputeTarget.attach(ws, batch_compute_name, provisioning_config)\n",
" batch_compute.wait_for_completion()\n",
" print(\"Provisioning state:{}\".format(batch_compute.provisioning_state))\n",
" print(\"Provisioning errors:{}\".format(batch_compute.provisioning_errors))\n",
"\n",
"print(\"Using Batch compute:{}\".format(batch_compute.cluster_resource_id))"
] ]
}, },
{ {

View File

@@ -76,8 +76,18 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or Attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource." "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
] ]
}, },
{ {
@@ -86,7 +96,25 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"cpu_cluster = ws.get_default_compute_target(\"CPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"cpu-cluster\"\n",
"\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n",
"\n",
" # create the cluster\n",
" cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current cluster. \n", "# use get_status() to get a detailed status for the current cluster. \n",
"print(cpu_cluster.get_status().serialize())" "print(cpu_cluster.get_status().serialize())"
@@ -96,7 +124,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpucluster' of type `AmlCompute`." "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpu-cluster' of type `AmlCompute`."
] ]
}, },
{ {

View File

@@ -184,37 +184,36 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Retrieve or create a Azure Machine Learning compute\n", "## Retrieve or create a Azure Machine Learning compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads.\n", "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
"Let's check the available computes first."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cts = ws.compute_targets\n",
"for name, ct in cts.items():\n",
" print(name, ct.type, ct.provisioning_state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's get the default Azure Machine Learning Compute in the current workspace. We will then run the training script on this compute target."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"compute_target = ws.get_default_compute_target(\"GPU\")\n",
"\n", "\n",
"print(compute_target.get_status().serialize())" "If we could not find the compute with the given name in the previous cell, then we will create a new compute here. This process is broken down into the following steps:\n",
"\n",
"1. Create the configuration\n",
"2. Create the Azure Machine Learning compute\n",
"\n",
"**This process will take a few minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target {}.'.format(cluster_name))\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
" max_nodes=4)\n",
"\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
" compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)\n",
"\n",
"print(\"Azure Machine Learning Compute attached\")"
] ]
}, },
{ {

View File

@@ -79,7 +79,20 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"aml_compute = ws.get_default_compute_target(\"CPU\")" "from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"aml_compute_target = \"cpu-cluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new compute target\")\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n"
] ]
}, },
{ {

View File

@@ -54,7 +54,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Compute Targets\n", "### Compute Targets\n",
"#### Retrieve the default Azure Machine Learning Compute" "#### Retrieve an already attached Azure Machine Learning Compute"
] ]
}, },
{ {
@@ -63,7 +63,31 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"aml_compute_target = ws.get_default_compute_target(\"CPU\")" "from azureml.core import Run, Experiment, Datastore\n",
"\n",
"from azureml.widgets import RunDetails\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"aml_compute_target = \"cpu-cluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"Found existing compute target: {}\".format(aml_compute_target))\n",
"except:\n",
" print(\"Creating new compute target: {}\".format(aml_compute_target))\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)"
] ]
}, },
{ {

View File

@@ -85,10 +85,24 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core import Run, Experiment, Datastore\n",
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"from azureml.pipeline.steps import PythonScriptStep\n", "from azureml.pipeline.steps import PythonScriptStep\n",
"from azureml.pipeline.core import Pipeline\n", "from azureml.pipeline.core import Pipeline\n",
"\n", "\n",
"aml_compute = ws.get_default_compute_target(\"CPU\")\n", "#Retrieve an already attached Azure Machine Learning Compute\n",
"aml_compute_target = \"cpu-cluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"Found existing compute target: {}\".format(aml_compute_target))\n",
"except:\n",
" print(\"Creating new compute target: {}\".format(aml_compute_target))\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n", "\n",
"# source_directory\n", "# source_directory\n",
"source_directory = '.'\n", "source_directory = '.'\n",

View File

@@ -139,7 +139,31 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(\"CPU\")" "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
" \n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 4)\n",
"\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = 1, timeout_in_minutes = 10)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {

View File

@@ -137,7 +137,22 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"aml_compute = ws.get_default_compute_target(\"CPU\")\n" "from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"aml_compute_target = \"cpu-cluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new compute target\")\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
"print(\"Aml Compute attached\")\n"
] ]
}, },
{ {
@@ -418,13 +433,31 @@
"RunDetails(pipeline_run1).show()" "RunDetails(pipeline_run1).show()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Wait for pipeline run to complete"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run1.wait_for_completion(show_output=True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### See Outputs\n", "### See Outputs\n",
"\n", "\n",
"See where outputs of each pipeline step are located on your datastore." "See where outputs of each pipeline step are located on your datastore.\n",
"\n",
"***Wait for pipeline run to complete, to make sure all the outputs are ready***"
] ]
}, },
{ {

View File

@@ -162,7 +162,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create and attach Compute targets\n", "### Create and attach Compute targets\n",
"Use the below code to get the default Compute target. " "Use the below code to create and attach Compute targets. "
] ]
}, },
{ {
@@ -171,9 +171,33 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"cluster_type = os.environ.get(\"AML_CLUSTER_TYPE\", \"GPU\")\n", "# choose a name for your cluster\n",
"aml_compute_name = os.environ.get(\"AML_COMPUTE_NAME\", \"gpu-cluster\")\n",
"cluster_min_nodes = os.environ.get(\"AML_COMPUTE_MIN_NODES\", 0)\n",
"cluster_max_nodes = os.environ.get(\"AML_COMPUTE_MAX_NODES\", 1)\n",
"vm_size = os.environ.get(\"AML_COMPUTE_SKU\", \"STANDARD_NC6\")\n",
"\n", "\n",
"compute_target = ws.get_default_compute_target(cluster_type)" "\n",
"if aml_compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[aml_compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('found compute target. just use it. ' + aml_compute_name)\n",
"else:\n",
" print('creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n",
" vm_priority = 'lowpriority', # optional\n",
" min_nodes = cluster_min_nodes, \n",
" max_nodes = cluster_max_nodes)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, aml_compute_name, provisioning_config)\n",
" \n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
" # For a more detailed view of current Azure Machine Learning Compute status, use get_status()\n",
" print(compute_target.get_status().serialize())"
] ]
}, },
{ {

View File

@@ -94,7 +94,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# AmlCompute\n", "# AmlCompute\n",
"cpu_cluster_name = \"cpucluster\"\n", "cpu_cluster_name = \"cpu-cluster\"\n",
"try:\n", "try:\n",
" cpu_cluster = AmlCompute(ws, cpu_cluster_name)\n", " cpu_cluster = AmlCompute(ws, cpu_cluster_name)\n",
" print(\"found existing cluster.\")\n", " print(\"found existing cluster.\")\n",
@@ -108,7 +108,7 @@
" cpu_cluster.wait_for_completion(show_output=True)\n", " cpu_cluster.wait_for_completion(show_output=True)\n",
" \n", " \n",
"# AmlCompute\n", "# AmlCompute\n",
"gpu_cluster_name = \"gpucluster\"\n", "gpu_cluster_name = \"gpu-cluster\"\n",
"try:\n", "try:\n",
" gpu_cluster = AmlCompute(ws, gpu_cluster_name)\n", " gpu_cluster = AmlCompute(ws, gpu_cluster_name)\n",
" print(\"found existing cluster.\")\n", " print(\"found existing cluster.\")\n",
@@ -351,7 +351,6 @@
" inputs=[models_dir, ffmpeg_images],\n", " inputs=[models_dir, ffmpeg_images],\n",
" outputs=[processed_images],\n", " outputs=[processed_images],\n",
" pip_packages=[\"mpi4py\", \"torch\", \"torchvision\"],\n", " pip_packages=[\"mpi4py\", \"torch\", \"torchvision\"],\n",
" runconfig=amlcompute_run_config,\n",
" use_gpu=True,\n", " use_gpu=True,\n",
" source_directory=scripts_folder\n", " source_directory=scripts_folder\n",
")\n", ")\n",

View File

@@ -95,8 +95,10 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code gets the default compute cluster.\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"\n", "\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
] ]
@@ -107,7 +109,24 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current AmlCompute. \n", "# use get_status() to get a detailed status for the current AmlCompute. \n",
"print(compute_target.get_status().serialize())" "print(compute_target.get_status().serialize())"
@@ -117,7 +136,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"." "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
] ]
}, },
{ {
@@ -223,7 +242,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_training=MpiConfiguration()`. Using this estimator with these settings, Chainer and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters." "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, Chainer and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters."
] ]
}, },
{ {

View File

@@ -98,8 +98,10 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or Attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use default `AmlCompute` as the training compute resource.\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"\n", "\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
] ]
@@ -110,7 +112,24 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current AmlCompute\n", "# use get_status() to get a detailed status for the current AmlCompute\n",
"print(compute_target.get_status().serialize())" "print(compute_target.get_status().serialize())"
@@ -270,7 +289,6 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.runconfig import MpiConfiguration\n",
"from azureml.train.estimator import Estimator\n", "from azureml.train.estimator import Estimator\n",
"\n", "\n",
"script_params = {\n", "script_params = {\n",
@@ -284,7 +302,8 @@
" entry_script='cntk_distr_mnist.py',\n", " entry_script='cntk_distr_mnist.py',\n",
" script_params=script_params,\n", " script_params=script_params,\n",
" node_count=2,\n", " node_count=2,\n",
" distributed_training=MpiConfiguration(),\n", " process_count_per_node=1,\n",
" distributed_backend='mpi',\n",
" pip_packages=['cntk-gpu==2.6'],\n", " pip_packages=['cntk-gpu==2.6'],\n",
" custom_docker_image='microsoft/mmlspark:gpu-0.12',\n", " custom_docker_image='microsoft/mmlspark:gpu-0.12',\n",
" use_gpu=True)" " use_gpu=True)"
@@ -296,7 +315,7 @@
"source": [ "source": [
"We would like to train our model using a [pre-built Docker container](https://hub.docker.com/r/microsoft/mmlspark/). To do so, specify the name of the docker image to the argument `custom_docker_image`. Finally, we provide the `cntk` package to `pip_packages` to install CNTK 2.6 on our custom image.\n", "We would like to train our model using a [pre-built Docker container](https://hub.docker.com/r/microsoft/mmlspark/). To do so, specify the name of the docker image to the argument `custom_docker_image`. Finally, we provide the `cntk` package to `pip_packages` to install CNTK 2.6 on our custom image.\n",
"\n", "\n",
"The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to run distributed CNTK, which uses MPI, you must provide the argument `distributed_training=MpiConfiguration()`." "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to run distributed CNTK, which uses MPI, you must provide the argument `distributed_backend='mpi'`."
] ]
}, },
{ {

View File

@@ -96,8 +96,10 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code uses the default compute in the workspace.\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"\n", "\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
] ]
@@ -108,7 +110,24 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current AmlCompute. \n", "# use get_status() to get a detailed status for the current AmlCompute. \n",
"print(compute_target.get_status().serialize())" "print(compute_target.get_status().serialize())"
@@ -118,7 +137,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"." "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
] ]
}, },
{ {
@@ -236,7 +255,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_training=MpiConfiguration()`. Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters." "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters."
] ]
}, },
{ {

View File

@@ -98,8 +98,10 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or Attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource.\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"\n", "\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
] ]
@@ -110,7 +112,24 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(\"GPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current cluster. \n", "# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())" "print(compute_target.get_status().serialize())"
@@ -120,7 +139,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"." "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
] ]
}, },
{ {
@@ -304,7 +323,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_training=MpiConfiguration()`. Using this estimator with these settings, TensorFlow, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n", "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, TensorFlow, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n",
"\n", "\n",
"Note that we passed our training data reference `ds_data` to our script's `--input_data` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore." "Note that we passed our training data reference `ds_data` to our script's `--input_data` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore."
] ]

View File

@@ -98,8 +98,10 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or Attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource.\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"\n", "\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
] ]
@@ -110,7 +112,24 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current cluster. \n", "# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())" "print(compute_target.get_status().serialize())"
@@ -220,7 +239,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. In order to execute a native distributed TensorFlow run, you must provide the argument `distributed_training=TensorflowConfiguration()`. Using this estimator with these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters." "The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. In order to execute a native distributed TensorFlow run, you must provide the argument `distributed_backend='ps'`. Using this estimator with these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters."
] ]
}, },
{ {

View File

@@ -9,8 +9,16 @@ print("Hello Azure ML!")
parser = argparse.ArgumentParser() parser = argparse.ArgumentParser()
parser.add_argument('--numbers-in-sequence', type=int, dest='num_in_sequence', default=10, parser.add_argument('--numbers-in-sequence', type=int, dest='num_in_sequence', default=10,
help='number of fibonacci numbers in sequence') help='number of fibonacci numbers in sequence')
# This is how you can use a bool argument in Python. If you want the 'my_bool_var' to be True, just pass it
# in Estimator's script_param as script+params:{'my_bool_var': ''}.
# And, if you want to use it as False, then do not pass it in the Estimator's script_params.
# You can reverse the behavior by setting action='store_false' in the next line.
parser.add_argument("--my_bool_var", action='store_true')
args = parser.parse_args() args = parser.parse_args()
num = args.num_in_sequence num = args.num_in_sequence
my_bool_var = args.my_bool_var
def fibo(n): def fibo(n):
@@ -23,6 +31,7 @@ def fibo(n):
try: try:
from azureml.core import Run from azureml.core import Run
run = Run.get_context() run = Run.get_context()
print("The value of boolean parameter 'my_bool_var' is {}".format(my_bool_var))
print("Log Fibonacci numbers.") print("Log Fibonacci numbers.")
for i in range(0, num - 1): for i in range(0, num - 1):
run.log('Fibonacci numbers', fibo(i)) run.log('Fibonacci numbers', fibo(i))

View File

@@ -113,8 +113,18 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or Attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource." "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
] ]
}, },
{ {
@@ -123,7 +133,25 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"cpu_cluster = ws.get_default_compute_target(\"CPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"cpu-cluster\"\n",
"\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n",
"\n",
" # create the cluster\n",
" cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current cluster. \n", "# use get_status() to get a detailed status for the current cluster. \n",
"print(cpu_cluster.get_status().serialize())" "print(cpu_cluster.get_status().serialize())"
@@ -207,8 +235,11 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# use a conda environment, don't use Docker, on local computer\n", "# use a conda environment, don't use Docker, on local computer\n",
"# Let's see how you can pass bool arguments in the script_params. Passing `'--my_bool_var': ''` will set my_bool_var as True and\n",
"# if you want it to be False, just do not pass it in the script_params.\n",
"script_params = {\n", "script_params = {\n",
" '--numbers-in-sequence': 10\n", " '--numbers-in-sequence': 10,\n",
" '--my_bool_var': ''\n",
"}\n", "}\n",
"est = Estimator(source_directory='.', script_params=script_params, compute_target='local', entry_script='dummy_train.py', use_docker=False)\n", "est = Estimator(source_directory='.', script_params=script_params, compute_target='local', entry_script='dummy_train.py', use_docker=False)\n",
"run = exp.submit(est)\n", "run = exp.submit(est)\n",

View File

@@ -95,8 +95,10 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or Attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"\n", "\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
] ]
@@ -107,7 +109,24 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current cluster. \n", "# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())" "print(compute_target.get_status().serialize())"
@@ -117,7 +136,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"." "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
] ]
}, },
{ {

View File

@@ -239,8 +239,18 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or Attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource." "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
] ]
}, },
{ {
@@ -249,7 +259,26 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current cluster. \n", "# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())" "print(compute_target.get_status().serialize())"
@@ -259,7 +288,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Now that you have retrtieved the compute target, let's see what the workspace's `compute_targets` property returns." "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named \"gpu-cluster\" of type `AmlCompute`."
] ]
}, },
{ {
@@ -351,7 +380,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Create TensorFlow estimator & add Keras\n", "## Create TensorFlow estimator & add Keras\n",
"Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the `gpucluster` as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n", "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the `gpu-cluster` as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n",
"The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed. In this case, we add `keras` package (for the Keras framework obviously), and `matplotlib` package for plotting a \"Loss vs. Accuracy\" chart and record it in run history." "The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed. In this case, we add `keras` package (for the Keras framework obviously), and `matplotlib` package for plotting a \"Loss vs. Accuracy\" chart and record it in run history."
] ]
}, },
@@ -524,7 +553,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"In the training script, the Keras model is saved into two files, `model.json` and `model.h5`, in the `outputs/models` folder on the gpucluster AmlCompute node. Azure ML automatically uploaded anything written in the `./outputs` folder into run history file store. Subsequently, we can use the `run` object to download the model files. They are under the the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`." "In the training script, the Keras model is saved into two files, `model.json` and `model.h5`, in the `outputs/models` folder on the gpu-cluster AmlCompute node. Azure ML automatically uploaded anything written in the `./outputs` folder into run history file store. Subsequently, we can use the `run` object to download the model files. They are under the the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`."
] ]
}, },
{ {

View File

@@ -261,8 +261,18 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute\n", "## Create or Attach existing AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource." "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
] ]
}, },
{ {
@@ -271,7 +281,26 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current cluster. \n", "# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())" "print(compute_target.get_status().serialize())"
@@ -281,7 +310,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Now that you have retrieved the compute target, let's see what the workspace's `compute_targets` property returns." "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'gpu-cluster' of type `AmlCompute`."
] ]
}, },
{ {

View File

@@ -108,7 +108,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Get default AmlCompute" "## Create AmlCompute"
] ]
}, },
{ {
@@ -126,7 +126,24 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"compute_target = ws.get_default_compute_target(type=\"CPU\")\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current cluster. \n", "# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())" "print(compute_target.get_status().serialize())"
@@ -136,7 +153,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The above code retrieves the default CPU compute." "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
] ]
}, },
{ {

View File

@@ -22,7 +22,7 @@ def main():
help='Penalty parameter of the error term') help='Penalty parameter of the error term')
args = parser.parse_args() args = parser.parse_args()
run.log('Kernel type', np.string(args.kernel)) run.log('Kernel type', np.str(args.kernel))
run.log('Penalty', np.float(args.penalty)) run.log('Penalty', np.float(args.penalty))
# loading the iris dataset # loading the iris dataset

View File

@@ -188,79 +188,6 @@
"myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" "myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get the default compute target\n",
"\n",
"In this case, we use the default `AmlCompute`target from the workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
"\n",
"src = ScriptRunConfig(source_directory=project_folder, script='train.py')\n",
"\n",
"# Use default compute target\n",
"src.run_config.target = ws.get_default_compute_target(type=\"CPU\").name\n",
"\n",
"# Set environment\n",
"src.run_config.environment = myenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Submit run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = experiment.submit(src)\n",
"\n",
"# Show run details\n",
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_metrics()"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -283,7 +210,7 @@
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your CPU cluster\n", "# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpucluster\"\n", "cpu_cluster_name = \"cpu-cluster\"\n",
"\n", "\n",
"# Verify that cluster does not exist already\n", "# Verify that cluster does not exist already\n",
"try:\n", "try:\n",
@@ -310,13 +237,28 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core import ScriptRunConfig\n",
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
"\n",
"src = ScriptRunConfig(source_directory=project_folder, script='train.py')\n",
"\n",
"# Set compute target to the one created in previous step\n", "# Set compute target to the one created in previous step\n",
"src.run_config.target = cpu_cluster.name\n", "src.run_config.target = cpu_cluster.name\n",
"\n",
"# Set environment\n",
"src.run_config.environment = myenv\n",
" \n", " \n",
"run = experiment.submit(config=src)\n", "run = experiment.submit(config=src)\n",
"run" "run"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -364,7 +306,7 @@
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"# Choose a name for your CPU cluster\n", "# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpucluster\"\n", "cpu_cluster_name = \"cpu-cluster\"\n",
"\n", "\n",
"# Verify that cluster does not exist already\n", "# Verify that cluster does not exist already\n",
"try:\n", "try:\n",
@@ -463,7 +405,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"#Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n", "#Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n",
"#'cpucluster' in this case but use a different VM family for instance.\n", "#'cpu-cluster' in this case but use a different VM family for instance.\n",
"\n", "\n",
"#cpu_cluster.delete()" "#cpu_cluster.delete()"
] ]

View File

@@ -126,7 +126,9 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create or Attach existing compute resource\n", "### Create or Attach existing compute resource\n",
"By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you use default Azure Machine Learning Compute as your training environment." "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. The code below creates the compute clusters for you if they don't already exist in your workspace.\n",
"\n",
"**Creation of compute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace the code will skip the creation process."
] ]
}, },
{ {
@@ -140,10 +142,38 @@
}, },
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"import os\n", "import os\n",
"\n", "\n",
"cluster_type = os.environ.get(\"AML_COMPUTE_CLUSTER_TYPE\", \"CPU\")\n", "# choose a name for your cluster\n",
"compute_target = ws.get_default_compute_target(cluster_type)" "compute_name = os.environ.get(\"AML_COMPUTE_CLUSTER_NAME\", \"cpu-cluster\")\n",
"compute_min_nodes = os.environ.get(\"AML_COMPUTE_CLUSTER_MIN_NODES\", 0)\n",
"compute_max_nodes = os.environ.get(\"AML_COMPUTE_CLUSTER_MAX_NODES\", 4)\n",
"\n",
"# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6\n",
"vm_size = os.environ.get(\"AML_COMPUTE_CLUSTER_SKU\", \"STANDARD_D2_V2\")\n",
"\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('found compute target. just use it. ' + compute_name)\n",
"else:\n",
" print('creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,\n",
" min_nodes = compute_min_nodes, \n",
" max_nodes = compute_max_nodes)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
" \n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()\n",
" print(compute_target.get_status().serialize())"
] ]
}, },
{ {
@@ -324,8 +354,8 @@
"# get hold of the current run\n", "# get hold of the current run\n",
"run = Run.get_context()\n", "run = Run.get_context()\n",
"\n", "\n",
"print('Train a logistic regression model with regularizaion rate of', args.reg)\n", "print('Train a logistic regression model with regularization rate of', args.reg)\n",
"clf = LogisticRegression(C=1.0/args.reg, random_state=42)\n", "clf = LogisticRegression(C=1.0/args.reg, solver=\"liblinear\", multi_class=\"auto\", random_state=42)\n",
"clf.fit(X_train, y_train)\n", "clf.fit(X_train, y_train)\n",
"\n", "\n",
"print('Predict the test set')\n", "print('Predict the test set')\n",
@@ -386,14 +416,13 @@
"source": [ "source": [
"### Create an estimator\n", "### Create an estimator\n",
"\n", "\n",
"An estimator object is used to submit the run. Create your estimator by running the following code to define:\n", "An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as generic Estimator. Create SKLearn estimator for scikit-learn model, by specifying\n",
"\n", "\n",
"* The name of the estimator object, `est`\n", "* The name of the estimator object, `est`\n",
"* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n", "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n",
"* The compute target. In this case you will use the AmlCompute you created\n", "* The compute target. In this case you will use the AmlCompute you created\n",
"* The training script name, train.py\n", "* The training script name, train.py\n",
"* Parameters required from the training script \n", "* Parameters required from the training script \n",
"* Python packages needed for training\n",
"\n", "\n",
"In this tutorial, this target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.path('mnist').as_mount()`)." "In this tutorial, this target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.path('mnist').as_mount()`)."
] ]
@@ -408,18 +437,17 @@
}, },
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.train.estimator import Estimator\n", "from azureml.train.sklearn import SKLearn\n",
"\n", "\n",
"script_params = {\n", "script_params = {\n",
" '--data-folder': ds.path('mnist').as_mount(),\n", " '--data-folder': ds.path('mnist').as_mount(),\n",
" '--regularization': 0.05\n", " '--regularization': 0.5\n",
"}\n", "}\n",
"\n", "\n",
"est = Estimator(source_directory=script_folder,\n", "est = SKLearn(source_directory=script_folder,\n",
" script_params=script_params,\n", " script_params=script_params,\n",
" compute_target=compute_target,\n", " compute_target=compute_target,\n",
" entry_script='train.py',\n", " entry_script='train.py')"
" conda_packages=['scikit-learn'])"
] ]
}, },
{ {
@@ -646,18 +674,6 @@
"language": "python", "language": "python",
"name": "python36" "name": "python36"
}, },
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
},
"msauthor": "roastala" "msauthor": "roastala"
}, },
"nbformat": 4, "nbformat": 4,