{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved. \n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Use Azure Machine Learning Pipelines for batch prediction\n", "\n", "In this tutorial, you use Azure Machine Learning service pipelines to run a batch scoring image classification job. The example job uses the pre-trained [Inception-V3](https://arxiv.org/abs/1512.00567) convolutional neural network (CNN) Tensorflow model to classify unlabeled images. Machine learning pipelines optimize your workflow with speed, portability, and reuse, so you can focus on machine learning rather than on infrastructure and automation. After building and publishing a pipeline, you configure a REST endpoint that lets you trigger the pipeline from any HTTP library on any platform.\n", "\n", "In this tutorial, you learn the following tasks:\n", "\n", "> * Configure workspace and download sample data\n", "> * Create data objects to fetch and output data\n", "> * Download, prepare, and register the model to your workspace\n", "> * Provision compute targets and create a scoring script\n", "> * Build, run, and publish a pipeline\n", "> * Enable a REST endpoint for the pipeline\n", "\n", "If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service](https://aka.ms/AMLFree) today." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "* Complete the [setup tutorial](https://docs.microsoft.com/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup) if you don't already have an Azure Machine Learning service workspace or notebook virtual machine.\n", "* After you complete the setup tutorial, open the **tutorials/tutorial-pipeline-batch-scoring-classification.ipynb** notebook using the same notebook server.\n", "\n", "This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](how-to-configure-environment.md#local). Run `pip install azureml-sdk[notebooks] azureml-pipeline-core azureml-pipeline-steps pandas requests` to get the required packages." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configure workspace and create datastore" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a workspace object from the existing workspace. A [Workspace](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) is a class that accepts your Azure subscription and resource information. It also creates a cloud resource to monitor and track your model runs. `Workspace.from_config()` reads the file **config.json** and loads the authentication details into an object named `ws`. `ws` is used throughout the rest of the code in this tutorial."
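] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you're running this notebook in your own local environment and don't have a **config.json** file yet, the following cell is an optional, minimal sketch of one way to create it before running the `Workspace.from_config()` cell below. The workspace name, subscription ID, and resource group values are placeholders, not real resources; skip this cell if **config.json** already exists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sketch, assuming you already have a workspace and know its details.\n", "# The three values below are placeholders -- replace them with your own before running.\n", "from azureml.core import Workspace\n", "\n", "ws = Workspace.get(name=\"my-workspace-name\",\n", "                   subscription_id=\"my-subscription-id\",\n", "                   resource_group=\"my-resource-group\")\n", "ws.write_config()  # writes config.json so Workspace.from_config() can find it later"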
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core import Workspace\n", "ws = Workspace.from_config()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a datastore for sample images\n", "\n", "Get the public ImageNet evaluation data sample from the blob container `sampledata` on the account `pipelinedata`. Calling `register_azure_blob_container()` makes the data available to the workspace under the name `images_datastore`. Then specify the workspace default datastore as the output datastore, which you use for scoring output in the pipeline." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.datastore import Datastore\n", "\n", "batchscore_blob = Datastore.register_azure_blob_container(ws,\n", "                      datastore_name=\"images_datastore\",\n", "                      container_name=\"sampledata\",\n", "                      account_name=\"pipelinedata\",\n", "                      overwrite=True)\n", "\n", "def_data_store = ws.get_default_datastore()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create data objects\n", "\n", "When building pipelines, `DataReference` objects are used for reading data from workspace datastores, and `PipelineData` objects are used for transferring intermediate data between pipeline steps.\n", "\n", "This batch scoring example only uses one pipeline step, but in use cases with multiple steps, the typical flow includes:\n", "\n", "1. Use `DataReference` objects as **inputs** to fetch raw data, perform some transformations, and then **output** a `PipelineData` object.\n", "1. Use the previous step's `PipelineData` **output object** as an *input object*, and repeat for subsequent steps.\n", "\n", "For this scenario, you create `DataReference` objects corresponding to the datastore directories for both the input images and the classification labels (y-test values). You also create a `PipelineData` object for the batch scoring output data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.data.data_reference import DataReference\n", "from azureml.pipeline.core import PipelineData\n", "\n", "input_images = DataReference(datastore=batchscore_blob,\n", "                             data_reference_name=\"input_images\",\n", "                             path_on_datastore=\"batchscoring/images\",\n", "                             mode=\"download\"\n", "                             )\n", "\n", "label_dir = DataReference(datastore=batchscore_blob,\n", "                          data_reference_name=\"input_labels\",\n", "                          path_on_datastore=\"batchscoring/labels\",\n", "                          mode=\"download\"\n", "                          )\n", "\n", "output_dir = PipelineData(name=\"scores\",\n", "                          datastore=def_data_store,\n", "                          output_path_on_compute=\"batchscoring/results\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download and register the model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the pre-trained Tensorflow model to use for batch scoring in the pipeline. First, create a local directory to store the model, then download and extract it."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import tarfile\n", "import urllib.request\n", "\n", "if not os.path.isdir(\"models\"):\n", "    os.mkdir(\"models\")\n", "\n", "urllib.request.urlretrieve(\"http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz\", \"model.tar.gz\")\n", "\n", "with tarfile.open(\"model.tar.gz\", \"r:gz\") as tar:\n", "    tar.extractall(\"models\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you register the model to your workspace, which lets you easily retrieve it during the pipeline run. In the `register()` static method, the `model_name` parameter is the key you use to locate your model throughout the SDK." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.model import Model\n", "\n", "model = Model.register(model_path=\"models/inception_v3.ckpt\",\n", "                       model_name=\"inception\",\n", "                       tags={\"pretrained\": \"inception\"},\n", "                       description=\"Imagenet trained tensorflow inception\",\n", "                       workspace=ws)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create and attach remote compute target\n", "\n", "Azure Machine Learning service pipelines can't run locally; they run only on cloud resources. Remote compute targets are reusable virtual compute environments where you run experiments and workflows. Run the following code to create a GPU-enabled [`AmlCompute`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.compute.amlcompute.amlcompute?view=azure-ml-py) target, and attach it to your workspace. See the [conceptual article](https://docs.microsoft.com/azure/machine-learning/service/concept-compute-target) for more information on compute targets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.compute import AmlCompute, ComputeTarget\n", "from azureml.exceptions import ComputeTargetException\n", "\n", "compute_name = \"gpu-cluster\"\n", "\n", "# check whether the compute target already exists in the workspace; otherwise, create it\n", "try:\n", "    compute_target = ComputeTarget(workspace=ws, name=compute_name)\n", "except ComputeTargetException:\n", "    config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n", "                                                   vm_priority=\"lowpriority\",\n", "                                                   min_nodes=0,\n", "                                                   max_nodes=1)\n", "\n", "    compute_target = ComputeTarget.create(workspace=ws, name=compute_name, provisioning_configuration=config)\n", "    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write a scoring script" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To do the scoring, you create a batch scoring script called `batch_scoring.py` and write it to the current directory. 
The script takes input images, applies the classification model, and outputs the predictions to a results file.\n", "\n", "The script `batch_scoring.py` takes the following parameters, which get passed from the `PythonScriptStep` that you create later:\n", "\n", "- `--model_name`: the name of the registered model to use\n", "- `--label_dir`: the directory holding the `labels.txt` file\n", "- `--dataset_path`: the directory containing the input images\n", "- `--output_dir`: the directory where the script writes a `result-labels.txt` file after running the model on the data\n", "- `--batch_size`: the batch size used in running the model\n", "\n", "The pipelines infrastructure uses the `ArgumentParser` class to pass parameters into pipeline steps. For example, in the code below, the first argument `--model_name` is given the property identifier `model_name`. In the `main()` function, this property is accessed using `Model.get_model_path(args.model_name)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%writefile batch_scoring.py\n", "\n", "import os\n", "import argparse\n", "import shutil\n", "import tensorflow as tf\n", "from tensorflow.contrib.slim.python.slim.nets import inception_v3\n", "from azureml.core.model import Model\n", "\n", "slim = tf.contrib.slim\n", "\n", "parser = argparse.ArgumentParser(description=\"Batch score images with the Inception-V3 model\")\n", "parser.add_argument('--model_name', dest=\"model_name\", required=True)\n", "parser.add_argument('--label_dir', dest=\"label_dir\", required=True)\n", "parser.add_argument('--dataset_path', dest=\"dataset_path\", required=True)\n", "parser.add_argument('--output_dir', dest=\"output_dir\", required=True)\n", "parser.add_argument('--batch_size', dest=\"batch_size\", type=int, required=True)\n", "\n", "args = parser.parse_args()\n", "\n", "image_size = 299\n", "num_channel = 3\n", "\n", "# create the output directory if it does not exist\n", "os.makedirs(args.output_dir, exist_ok=True)\n", "\n", "\n", "def get_class_label_dict(label_file):\n", "    label = []\n", "    proto_as_ascii_lines = tf.gfile.GFile(label_file).readlines()\n", "    for l in proto_as_ascii_lines:\n", "        label.append(l.rstrip())\n", "    return label\n", "\n", "\n", "class DataIterator:\n", "    def __init__(self, data_dir):\n", "        self.file_paths = []\n", "        image_list = os.listdir(data_dir)\n", "        self.file_paths = [data_dir + '/' + file_name.rstrip() for file_name in image_list]\n", "\n", "        # placeholder labels; ground-truth labels aren't needed for scoring\n", "        self.labels = [1 for file_name in self.file_paths]\n", "\n", "    @property\n", "    def size(self):\n", "        return len(self.labels)\n", "\n", "    def input_pipeline(self, batch_size):\n", "        images_tensor = tf.convert_to_tensor(self.file_paths, dtype=tf.string)\n", "        labels_tensor = tf.convert_to_tensor(self.labels, dtype=tf.int64)\n", "        input_queue = tf.train.slice_input_producer([images_tensor, labels_tensor], shuffle=False)\n", "        labels = input_queue[1]\n", "        images_content = tf.read_file(input_queue[0])\n", "\n", "        image_reader = tf.image.decode_jpeg(images_content, channels=num_channel, name=\"jpeg_reader\")\n", "        float_caster = tf.cast(image_reader, tf.float32)\n", "        new_size = tf.constant([image_size, image_size], dtype=tf.int32)\n", "        images = tf.image.resize_images(float_caster, new_size)\n", "        # scale pixel values from [0, 255] to [0, 1]\n", "        images = tf.divide(tf.subtract(images, [0]), [255])\n", "\n", "        image_batch, label_batch = tf.train.batch([images, labels], batch_size=batch_size, capacity=5 * batch_size)\n", "        return 
image_batch\n", "\n", "\n", "def main(_):\n", "    label_file_name = os.path.join(args.label_dir, \"labels.txt\")\n", "    label_dict = get_class_label_dict(label_file_name)\n", "    classes_num = len(label_dict)\n", "    test_feeder = DataIterator(data_dir=args.dataset_path)\n", "    total_size = len(test_feeder.labels)\n", "    count = 0\n", "\n", "    # get the model from the model registry\n", "    model_path = Model.get_model_path(args.model_name)\n", "\n", "    with tf.Session() as sess:\n", "        test_images = test_feeder.input_pipeline(batch_size=args.batch_size)\n", "        with slim.arg_scope(inception_v3.inception_v3_arg_scope()):\n", "            input_images = tf.placeholder(tf.float32, [args.batch_size, image_size, image_size, num_channel])\n", "            logits, _ = inception_v3.inception_v3(input_images,\n", "                                                  num_classes=classes_num,\n", "                                                  is_training=False)\n", "            # argmax over the logits gives the predicted class index for each image\n", "            predictions = tf.argmax(logits, 1)\n", "\n", "        sess.run(tf.global_variables_initializer())\n", "        sess.run(tf.local_variables_initializer())\n", "        coord = tf.train.Coordinator()\n", "        threads = tf.train.start_queue_runners(sess=sess, coord=coord)\n", "        saver = tf.train.Saver()\n", "        saver.restore(sess, model_path)\n", "        out_filename = os.path.join(args.output_dir, \"result-labels.txt\")\n", "        with open(out_filename, \"w\") as result_file:\n", "            i = 0\n", "            while count < total_size and not coord.should_stop():\n", "                test_images_batch = sess.run(test_images)\n", "                file_names_batch = test_feeder.file_paths[i * args.batch_size:\n", "                                                          min(test_feeder.size, (i + 1) * args.batch_size)]\n", "                results = sess.run(predictions, feed_dict={input_images: test_images_batch})\n", "                new_add = min(args.batch_size, total_size - count)\n", "                count += new_add\n", "                i += 1\n", "                for j in range(new_add):\n", "                    result_file.write(os.path.basename(file_names_batch[j]) + \": \" + label_dict[results[j]] + \"\\n\")\n", "                result_file.flush()\n", "            coord.request_stop()\n", "            coord.join(threads)\n", "\n", "    shutil.copy(out_filename, \"./outputs/\")\n", "\n", "\n", "if __name__ == \"__main__\":\n", "    tf.app.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pipeline in this tutorial has only one step and writes the output to a file, but for multi-step pipelines, you also use `ArgumentParser` to define an output directory whose contents become input to subsequent steps. See the [notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb) for an example of passing data between multiple pipeline steps using the `ArgumentParser` design pattern." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build and run the pipeline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before running the pipeline, you create an object that defines the Python environment and dependencies needed by your script `batch_scoring.py`. The main dependency required is Tensorflow, but you also install `azureml-defaults` for background processes from the SDK. Create a `RunConfiguration` object using the dependencies, and also specify Docker and Docker-GPU support."
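] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to preview exactly what will be installed on the compute target, the following optional sketch builds the same `CondaDependencies` object as the run-configuration cell that follows it and prints the conda specification it generates (`serialize_to_string()` renders the specification as YAML text)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.runconfig import CondaDependencies\n", "\n", "# Preview the conda environment specification that will be materialized on the compute target.\n", "preview_cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.13.1\", \"azureml-defaults\"])\n", "print(preview_cd.serialize_to_string())"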
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.runconfig import DEFAULT_GPU_IMAGE, CondaDependencies, RunConfiguration\n", "\n", "cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.13.1\", \"azureml-defaults\"])\n", "\n", "amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n", "amlcompute_run_config.environment.docker.enabled = True\n", "amlcompute_run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE\n", "amlcompute_run_config.environment.spark.precache_packages = False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameterize the pipeline\n", "\n", "Define a custom parameter for the pipeline to control the batch size. After the pipeline has been published and exposed via a REST endpoint, any configured parameters are also exposed and can be specified in the JSON payload when rerunning the pipeline with an HTTP request.\n", "\n", "Create a `PipelineParameter` object to enable this behavior, and define a name and default value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.pipeline.core.graph import PipelineParameter\n", "\n", "batch_size_param = PipelineParameter(name=\"param_batch_size\", default_value=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the pipeline step\n", "\n", "A pipeline step is an object that encapsulates everything you need to run a pipeline, including:\n", "\n", "* environment and dependency settings\n", "* the compute resource to run the pipeline on\n", "* input and output data, and any custom parameters\n", "* a reference to the script or SDK logic to run during the step\n", "\n", "There are multiple classes that inherit from the parent class [`PipelineStep`](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.builder.pipelinestep?view=azure-ml-py) to assist with building a step using certain frameworks and stacks. In this example, you use the [`PythonScriptStep`](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py) class to define your step logic using a custom Python script. Note that if an argument to your script is an input or output of the step, it must be defined **both** in the `arguments` list **and** in the `inputs` or `outputs` parameter, respectively.\n", "\n", "An object reference in the `outputs` array becomes available as an **input** for a subsequent pipeline step, for scenarios where there is more than one step."
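] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To illustrate the multi-step pattern described above, the following sketch shows how a hypothetical downstream step could consume the `output_dir` `PipelineData` object as an input. The `postprocess.py` script doesn't exist in this tutorial, so treat this as illustration only and don't add it to the pipeline; the tutorial's actual scoring step is created in the cell after the sketch." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustration only -- postprocess.py is a hypothetical script that isn't part of this tutorial.\n", "from azureml.pipeline.steps import PythonScriptStep\n", "\n", "postprocess_step = PythonScriptStep(\n", "    name=\"postprocess\",\n", "    script_name=\"postprocess.py\",             # hypothetical downstream script\n", "    arguments=[\"--scores_dir\", output_dir],   # same PipelineData object produced by the scoring step\n", "    inputs=[output_dir],                      # declaring it as an input creates the step dependency\n", "    compute_target=compute_target,\n", "    runconfig=amlcompute_run_config\n", ")"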
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.pipeline.steps import PythonScriptStep\n", "\n", "batch_score_step = PythonScriptStep(\n", "    name=\"batch_scoring\",\n", "    script_name=\"batch_scoring.py\",\n", "    arguments=[\"--dataset_path\", input_images,\n", "               \"--model_name\", \"inception\",\n", "               \"--label_dir\", label_dir,\n", "               \"--output_dir\", output_dir,\n", "               \"--batch_size\", batch_size_param],\n", "    compute_target=compute_target,\n", "    inputs=[input_images, label_dir],\n", "    outputs=[output_dir],\n", "    runconfig=amlcompute_run_config\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a list of all classes for different step types, see the [steps package](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps?view=azure-ml-py)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run the pipeline\n", "\n", "Now you run the pipeline. First, create a `Pipeline` object with your workspace reference and the pipeline step you created. The `steps` parameter is an array of steps, and in this case there is only one step for batch scoring. To build pipelines with multiple steps, you place the steps in order in this array.\n", "\n", "Next, use the `Experiment.submit()` function to submit the pipeline for execution. You also specify the custom parameter `param_batch_size`. The `wait_for_completion` function outputs logs while the pipeline builds and runs, so you can follow the current progress.\n", "\n", "Note: The first pipeline run takes roughly **15 minutes**, as all dependencies must be downloaded, a Docker image is created, and the Python environment is provisioned. Running the pipeline again takes significantly less time because those resources are reused. However, total run time depends on the workload of your scripts and the processes running in each pipeline step." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core import Experiment\n", "from azureml.pipeline.core import Pipeline\n", "\n", "pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n", "pipeline_run = Experiment(ws, 'batch_scoring').submit(pipeline, pipeline_parameters={\"param_batch_size\": 20})\n", "pipeline_run.wait_for_completion(show_output=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download and review output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the following code to download the output file created by the `batch_scoring.py` script, then explore the scoring results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "step_run = list(pipeline_run.get_children())[0]\n", "step_run.download_file(\"./outputs/result-labels.txt\")\n", "\n", "df = pd.read_csv(\"result-labels.txt\", delimiter=\":\", header=None)\n", "df.columns = [\"Filename\", \"Prediction\"]\n", "df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Publish and run from REST endpoint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the following code to publish the pipeline to your workspace. In your workspace in the portal, you can see metadata for the pipeline, including run history and durations. You can also run the pipeline manually from the portal.\n", "\n", "Additionally, publishing the pipeline enables a REST endpoint to rerun the pipeline from any HTTP library on any platform."
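] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the pipeline has been published (the publishing cell follows this sketch), you can retrieve it again in a later session without rerunning the experiment. The following is a minimal sketch using the `PublishedPipeline` class; run it only after publishing, and note that the ID string in the commented line is a placeholder." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.pipeline.core import PublishedPipeline\n", "\n", "# List the pipelines published in this workspace and show their REST endpoints.\n", "for pp in PublishedPipeline.list(ws):\n", "    print(pp.name, pp.id, pp.endpoint)\n", "\n", "# Fetch a specific published pipeline by ID.\n", "# same_pipeline = PublishedPipeline.get(ws, id=\"published-pipeline-id\")"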
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "published_pipeline = pipeline_run.publish_pipeline(\n", "    name=\"Inception_v3_scoring\", description=\"Batch scoring using Inception v3 model\", version=\"1.0\")\n", "\n", "published_pipeline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To run the pipeline from the REST endpoint, you first need an OAuth2 Bearer-type authentication header. This example uses interactive authentication for illustration purposes, but for most production scenarios that require automated or headless authentication, use service principal authentication as [described in this notebook](https://aka.ms/pl-restep-auth).\n", "\n", "Service principal authentication involves creating an **App Registration** in **Azure Active Directory**, generating a client secret, and then granting your service principal **role access** to your machine learning workspace. You then use the [`ServicePrincipalAuthentication`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py) class to manage your auth flow.\n", "\n", "Both `InteractiveLoginAuthentication` and `ServicePrincipalAuthentication` inherit from `AbstractAuthentication`, and in both cases you use the `get_authentication_header()` function in the same way to fetch the header." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.core.authentication import InteractiveLoginAuthentication\n", "\n", "interactive_auth = InteractiveLoginAuthentication()\n", "auth_header = interactive_auth.get_authentication_header()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the REST URL from the `endpoint` property of the published pipeline object. You can also find the REST URL in your workspace in the portal. Build an HTTP POST request to the endpoint, specifying your authentication header. Additionally, add a JSON payload object with the experiment name and the batch size parameter. As a reminder, `param_batch_size` is passed through to your `batch_scoring.py` script because you defined it as a `PipelineParameter` object in the step configuration.\n", "\n", "Make the request to trigger the run. Access the `Id` key from the response dictionary to get the value of the run ID." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "rest_endpoint = published_pipeline.endpoint\n", "response = requests.post(rest_endpoint,\n", "                         headers=auth_header,\n", "                         json={\"ExperimentName\": \"batch_scoring\",\n", "                               \"ParameterAssignments\": {\"param_batch_size\": 50}})\n", "run_id = response.json()[\"Id\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the run ID to monitor the status of the new run. The new run takes another 10-15 minutes and looks similar to the previous pipeline run, so if you don't need to see another pipeline run, you can skip watching the full output."
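] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cell after this sketch monitors the run with the `RunDetails` Jupyter widget. If you're working somewhere the widget doesn't render, the following is a minimal sketch of plain-text monitoring with the same run ID; uncomment the last line to block until the run finishes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.pipeline.core.run import PipelineRun\n", "\n", "# Construct a PipelineRun from the experiment and the run ID returned by the REST call,\n", "# then check its status without using the widget.\n", "rest_run = PipelineRun(ws.experiments[\"batch_scoring\"], run_id)\n", "print(rest_run.get_status())\n", "# rest_run.wait_for_completion(show_output=True)"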
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azureml.pipeline.core.run import PipelineRun\n", "from azureml.widgets import RunDetails\n", "\n", "published_pipeline_run = PipelineRun(ws.experiments[\"batch_scoring\"], run_id)\n", "RunDetails(published_pipeline_run).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean up resources\n", "\n", "Do not complete this section if you plan on running other Azure Machine Learning service tutorials.\n", "\n", "### Stop the notebook VM\n", "\n", "If you used a cloud notebook server, stop the VM when you are not using it to reduce cost.\n", "\n", "1. In your workspace, select **Notebook VMs**.\n", "1. From the list, select the VM.\n", "1. Select **Stop**.\n", "1. When you're ready to use the server again, select **Start**.\n", "\n", "### Delete everything\n", "\n", "If you don't plan to use the resources you created, delete them, so you don't incur any charges.\n", "\n", "1. In the Azure portal, select **Resource groups** on the far left.\n", "1. From the list, select the resource group you created.\n", "1. Select **Delete resource group**.\n", "1. Enter the resource group name. Then select **Delete**.\n", "\n", "You can also keep the resource group but delete a single workspace. Display the workspace properties and select **Delete**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next steps\n", "\n", "In this machine learning pipelines tutorial, you did the following tasks:\n", "\n", "> * Built a pipeline with environment dependencies to run on a remote GPU compute resource\n", "> * Created a scoring script to run batch predictions with a pre-trained Tensorflow model\n", "> * Published a pipeline and enabled it to be run from a REST endpoint\n", "\n", "See the [how-to](https://docs.microsoft.com/azure/machine-learning/service/how-to-create-your-first-pipeline?view=azure-devops) for additional detail on building pipelines with the machine learning SDK." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "authors": [ { "name": "sanpil" } ], "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "msauthor": "trbye" }, "nbformat": 4, "nbformat_minor": 2 }