MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.  \n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Azure Machine Learning Pipeline with  HyperDriveStep\n",
        "\n",
        "\n",
        "This notebook is used to demonstrate the use of HyperDriveStep in AML Pipeline.\n",
        "\n",
        "## Azure Machine Learning and Pipeline SDK-specific imports\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "import shutil\n",
        "import urllib\n",
        "from azureml.core import Experiment\n",
        "from azureml.core.datastore import Datastore\n",
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.exceptions import ComputeTargetException\n",
        "from azureml.data.data_reference import DataReference\n",
        "from azureml.pipeline.steps import HyperDriveStep\n",
        "from azureml.pipeline.core import Pipeline\n",
        "from azureml.train.dnn import TensorFlow\n",
        "from azureml.train.hyperdrive import *\n",
        "\n",
        "# Check core SDK version number\n",
        "print(\"SDK version:\", azureml.core.VERSION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Initialize workspace\n",
        "\n",
        "Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create an Azure ML experiment\n",
        "Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "script_folder = './tf-mnist'\n",
        "os.makedirs(script_folder, exist_ok=True)\n",
        "\n",
        "exp = Experiment(workspace=ws, name='tf-mnist')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Download MNIST dataset\n",
        "In order to train on the MNIST dataset we will first need to download it from Yan LeCun's web site directly and save them in a `data` folder locally."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "os.makedirs('./data/mnist', exist_ok=True)\n",
        "\n",
        "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n",
        "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n",
        "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n",
        "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Upload MNIST dataset to blob datastore \n",
        "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can either be backed by an Azure Blob Storage or and Azure File Share (ADLS will be supported in the future). In the next step, we will use Azure Blob Storage and upload the training and test set into the Azure Blob datastore, which we will then later be mount on a Batch AI cluster for training."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ds = Datastore(workspace=ws, name=\"MyBlobDatastore\")\n",
        "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Retrieve or create a Azure Machine Learning compute\n",
        "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
        "\n",
        "If we could not find the compute with the given name in the previous cell, then we will create a new compute here. We will create an Azure Machine Learning Compute containing **STANDARD_D2_V2 CPU VMs**. This process is broken down into the following steps:\n",
        "\n",
        "1. Create the configuration\n",
        "2. Create the Azure Machine Learning compute\n",
        "\n",
        "**This process will take about 3 minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "cluster_name = \"aml-compute\"\n",
        "\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
        "    print('Found existing compute target {}.'.format(cluster_name))\n",
        "except ComputeTargetException:\n",
        "    print('Creating a new compute target...')\n",
        "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
        "                                                               max_nodes=4)\n",
        "\n",
        "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
        "    compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)\n",
        "\n",
        "print(\"Azure Machine Learning Compute attached\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Copy the training files into the script folder\n",
        "The TensorFlow training script is already created for you. You can simply copy it into the script folder, together with the utility library used to load compressed data file into numpy array."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# the training logic is in the tf_mnist.py file.\n",
        "shutil.copy('./tf_mnist.py', script_folder)\n",
        "\n",
        "# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.\n",
        "shutil.copy('./utils.py', script_folder)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create TensorFlow estimator\n",
        "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the Batch AI cluster as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n",
        "The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting docker."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "est = TensorFlow(source_directory=script_folder,                 \n",
        "                 compute_target=compute_target,\n",
        "                 entry_script='tf_mnist.py', \n",
        "                 use_gpu=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Intelligent hyperparameter tuning\n",
        "We have trained the model with one set of hyperparameters, now let's how we can do hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling.\n",
        "\n",
        "In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`validation_acc`)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ps = RandomParameterSampling(\n",
        "    {\n",
        "        '--batch-size': choice(25, 50, 100),\n",
        "        '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n",
        "        '--second-layer-neurons': choice(10, 50, 200, 500),\n",
        "        '--learning-rate': loguniform(-6, -1)\n",
        "    }\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we will define an early termnination policy. The `BanditPolicy` basically states to check the job every 2 iterations. If the primary metric (defined later) falls outside of the top 10% range, Azure ML terminate the job. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric.\n",
        "\n",
        "Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we are ready to configure a run configuration object, and specify the primary metric `validation_acc` that's recorded in your training runs. If you go back to visit the training script, you will notice that this value is being logged after every epoch (a full batch set). We also want to tell the service that we are looking to maximizing this value. We also set the number of samples to 20, and maximal concurrent job to 4, which is the same as the number of nodes in our computer cluster."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "hd_config = HyperDriveRunConfig(estimator=est, \n",
        "                                hyperparameter_sampling=ps,\n",
        "                                policy=early_termination_policy,\n",
        "                                primary_metric_name='validation_acc', \n",
        "                                primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n",
        "                                max_total_runs=1,\n",
        "                                max_concurrent_runs=1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Add HyperDrive as a step of pipeline\n",
        "\n",
        "Let's setup a data reference for inputs of hyperdrive step."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data_folder = DataReference(\n",
        "    datastore=ds,\n",
        "    data_reference_name=\"mnist_data\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### HyperDriveStep\n",
        "HyperDriveStep can be used to run HyperDrive job as a step in pipeline.\n",
        "- **name:** Name of the step\n",
        "- **hyperdrive_run_config:** A HyperDriveRunConfig that defines the configuration for this HyperDrive run\n",
        "- **estimator_entry_script_arguments:** List of command-line arguments for estimator entry script\n",
        "- **inputs:** List of input port bindings\n",
        "- **outputs:** List of output port bindings\n",
        "- **metrics_output:** Optional value specifying the location to store HyperDrive run metrics as a JSON file\n",
        "- **allow_reuse:** whether to allow reuse\n",
        "- **version:** version\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "hd_step = HyperDriveStep(\n",
        "    name=\"hyperdrive_module\",\n",
        "    hyperdrive_run_config=hd_config,\n",
        "    estimator_entry_script_arguments=['--data-folder', data_folder],\n",
        "    inputs=[data_folder])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Build the experiment"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pipeline = Pipeline(workspace=ws, steps=[hd_step])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Submit the experiment "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pipeline_run = Experiment(ws, 'Hyperdrive_Test').submit(pipeline)\n",
        "pipeline_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### View Run Details"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.widgets import RunDetails\n",
        "RunDetails(pipeline_run).show()"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "sonnyp"
      }
    ],
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",
      "name": "python36"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.7"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}