mirror of https://github.com/Azure/MachineLearningNotebooks.git

update cluster creation
@@ -95,8 +95,10 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-     "## Get default AmlCompute\n",
-     "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code gets the default compute cluster.\n",
+     "## Create or attach existing AmlCompute\n",
+     "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates a `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n",
+     "\n",
+     "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
     "\n",
     "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
    ]
@@ -107,7 +109,24 @@
     "metadata": {},
     "outputs": [],
     "source": [
-     "compute_target = ws.get_default_compute_target(type=\"GPU\")\n",
+     "from azureml.core.compute import ComputeTarget, AmlCompute\n",
+     "from azureml.core.compute_target import ComputeTargetException\n",
+     "\n",
+     "# choose a name for your cluster\n",
+     "cluster_name = \"gpu-cluster\"\n",
+     "\n",
+     "try:\n",
+     "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
+     "    print('Found existing compute target.')\n",
+     "except ComputeTargetException:\n",
+     "    print('Creating a new compute target...')\n",
+     "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
+     "                                                           max_nodes=4)\n",
+     "\n",
+     "    # create the cluster\n",
+     "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
+     "\n",
+     "    compute_target.wait_for_completion(show_output=True)\n",
     "\n",
     "# use get_status() to get a detailed status for the current AmlCompute.\n",
     "print(compute_target.get_status().serialize())"
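The markdown above describes a cluster that autoscales from `0` to `4` nodes, while the code only passes `max_nodes`; this works because `min_nodes` defaults to `0` in `AmlCompute.provisioning_configuration`. A minimal sketch making the autoscale bounds explicit (the idle-timeout value is an illustrative assumption, not part of this commit):

```python
from azureml.core.compute import AmlCompute

# Sketch: the same cluster as above, with the autoscale bounds spelled out.
# min_nodes=0 lets the cluster scale down to zero nodes (and zero cost)
# when no runs are queued; idle_seconds_before_scaledown is an assumed,
# illustrative value controlling how long idle nodes stay provisioned.
compute_config = AmlCompute.provisioning_configuration(
    vm_size='STANDARD_NC6',
    min_nodes=0,
    max_nodes=4,
    idle_seconds_before_scaledown=1200)
```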
@@ -117,7 +136,7 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-     "The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"."
+     "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
    ]
   },
   {
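The CPU variant mentioned above differs only in `vm_size`. A minimal sketch following the same try/except pattern as the cell above (`'cpu-cluster'` is an assumed name for illustration, not part of this commit; `ws` is the workspace from earlier in the notebook):

```python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Sketch of the CPU variant: same pattern as above, with a CPU VM size.
cpu_cluster_name = "cpu-cluster"  # assumed name for illustration
try:
    cpu_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    cpu_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                       max_nodes=4)
    cpu_target = ComputeTarget.create(ws, cpu_cluster_name, cpu_config)
    cpu_target.wait_for_completion(show_output=True)
```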
@@ -223,7 +242,7 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-     "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_training=MpiConfiguration()`. Using this estimator with these settings, Chainer and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters."
+     "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, Chainer and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters."
    ]
   },
   {
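For context, a hedged sketch of the estimator call this markdown refers to, using the `distributed_backend='mpi'` form from this commit. `source_directory`, `entry_script`, and the extra pip package are assumed placeholders; only the node count, workers per node, and MPI backend come from the text above:

```python
from azureml.train.dnn import Chainer

# Sketch matching the description: 2 nodes, one worker per node, MPI backend.
estimator = Chainer(source_directory='.',            # assumed project folder
                    compute_target=compute_target,    # the AmlCompute created earlier
                    entry_script='train.py',          # assumed script name
                    node_count=2,
                    process_count_per_node=1,
                    distributed_backend='mpi',
                    use_gpu=True,
                    pip_packages=['numpy'])           # example extra dependency
```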