{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure Machine Learning Pipeline with AzureBatchStep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook is used to demonstrate the use of AzureBatchStep in Azure Machine Learning Pipeline.\n",
"An AzureBatchStep will submit a job to an AzureBatch Compute to run a simple windows executable."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Azure Machine Learning and Pipeline SDK-specific Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"from azureml.core import Workspace, Experiment\n",
"from azureml.core.compute import ComputeTarget, BatchCompute\n",
"from azureml.core.datastore import Datastore\n",
"from azureml.data.data_reference import DataReference\n",
"from azureml.exceptions import ComputeTargetException\n",
"from azureml.pipeline.core import Pipeline, PipelineData\n",
"from azureml.pipeline.steps import AzureBatchStep\n",
"\n",
"import os\n",
"from os import path\n",
"from tempfile import mkdtemp\n",
"\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, If you don't have a config.json file, please go through the configuration Notebook located [here](https://github.com/Azure/MachineLearningNotebooks). \n",
"\n",
"This sets you up with a working config file that has information on your workspace, subscription id, etc. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"print('Workspace Name: ' + ws.name, \n",
" 'Azure Region: ' + ws.location, \n",
" 'Subscription Id: ' + ws.subscription_id, \n",
" 'Resource Group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Attach Batch Compute to Workspace"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To submit jobs to Azure Batch service, you must attach your Azure Batch account to the workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"batch_compute_name = 'mybatchcompute' # Name to associate with new compute in workspace\n",
"\n",
"# Batch account details needed to attach as compute to workspace\n",
"batch_account_name = \"<batch_account_name>\" # Name of the Batch account\n",
"batch_resource_group = \"<batch_resource_group>\" # Name of the resource group which contains this account\n",
"\n",
"try:\n",
" # check if already attached\n",
" batch_compute = BatchCompute(ws, batch_compute_name)\n",
"except ComputeTargetException:\n",
" print('Attaching Batch compute...')\n",
" provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, \n",
" account_name=batch_account_name)\n",
" batch_compute = ComputeTarget.attach(ws, batch_compute_name, provisioning_config)\n",
" batch_compute.wait_for_completion()\n",
" print(\"Provisioning state:{}\".format(batch_compute.provisioning_state))\n",
" print(\"Provisioning errors:{}\".format(batch_compute.provisioning_errors))\n",
"\n",
"print(\"Using Batch compute:{}\".format(batch_compute.cluster_resource_id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup Datastore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Setting up the Blob storage associated with the workspace. \n",
"The following call retrieves the Azure Blob Store associated with your workspace. \n",
"Note that workspaceblobstore is **the name of this store and CANNOT BE CHANGED and must be used as is**. \n",
" \n",
"If you want to register another Datastore, please follow the instructions from here:\n",
"https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data#register-a-datastore"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datastore = Datastore(ws, \"workspaceblobstore\")\n",
"\n",
"print('Datastore details:')\n",
"print('Datastore Account Name: ' + datastore.account_name)\n",
"print('Datastore Workspace Name: ' + datastore.workspace.name)\n",
"print('Datastore Container Name: ' + datastore.container_name)"
]
},
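{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need data that lives outside the default Blob store, you can register an additional datastore. The following cell is a minimal sketch and is not required for this notebook; the datastore, container and account names and the key are placeholders to replace with your own values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch: register an additional Azure Blob container as a datastore.\n",
"# All names and the key below are placeholders -- substitute your own values.\n",
"custom_datastore = Datastore.register_azure_blob_container(\n",
"    workspace=ws,\n",
"    datastore_name='mycustomdatastore',\n",
"    container_name='mycontainer',\n",
"    account_name='mystorageaccount',\n",
"    account_key='<account_key>',\n",
"    overwrite=True)\n",
"\n",
"print('Registered datastore:', custom_datastore.name)"
]
},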
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup Input and Output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this example we will upload a file in the provided Datastore. These are some helper methods to achieve that."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def create_local_file(content, file_name):\n",
" # create a file in a local temporary directory\n",
" temp_dir = mkdtemp()\n",
" with open(path.join(temp_dir, file_name), 'w') as f:\n",
" f.write(content)\n",
" return temp_dir\n",
"\n",
"\n",
"def upload_file_to_datastore(datastore, file_name, content):\n",
" src_dir = create_local_file(content=content, file_name=file_name)\n",
" datastore.upload(src_dir=src_dir, overwrite=True, show_progress=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we associate the input DataReference with an existing file in the provided Datastore. Feel free to upload the file of your choice manually or use the *upload_file_to_datastore* method. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"file_name=\"input.txt\"\n",
"\n",
"upload_file_to_datastore(datastore=datastore, \n",
" file_name=file_name, \n",
" content=\"this is the content of the file\")\n",
"\n",
"testdata = DataReference(datastore=datastore, \n",
" path_on_datastore=file_name, \n",
" data_reference_name=\"input\")\n",
"\n",
"outputdata = PipelineData(name=\"output\", datastore=datastore)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup AzureBatch Job Binaries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"AzureBatch can run a task within the job and here we put a simple .cmd file to be executed. Feel free to put any binaries in the folder, or modify the .cmd file as needed, they will be uploaded once we create the AzureBatch Step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"binaries_folder = \"azurebatch/job_binaries\"\n",
"if not os.path.isdir(binaries_folder):\n",
" os.mkdir(binaries_folder)\n",
"\n",
"file_name=\"azurebatch.cmd\"\n",
"with open(path.join(binaries_folder, file_name), 'w') as f:\n",
" f.write(\"copy \\\"%1\\\" \\\"%2\\\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an AzureBatchStep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"AzureBatchStep is used to submit a job to the attached Azure Batch compute.\n",
"- **name:** Name of the step\n",
"- **pool_id:** Name of the pool, it can be an existing pool, or one that will be created when the job is submitted\n",
"- **inputs:** List of inputs that will be processed by the job\n",
"- **outputs:** List of outputs the job will create\n",
"- **executable:** The executable that will run as part of the job\n",
"- **arguments:** Arguments for the executable. They can be plain string format, inputs, outputs or parameters\n",
"- **compute_target:** The compute target where the job will run.\n",
"- **source_directory:** The local directory with binaries to be executed by the job\n",
"\n",
"Optional parameters:\n",
"\n",
"- **create_pool:** Boolean flag to indicate whether create the pool before running the jobs\n",
"- **delete_batch_job_after_finish:** Boolean flag to indicate whether to delete the job from Batch account after it's finished\n",
"- **delete_batch_pool_after_finish:** Boolean flag to indicate whether to delete the pool after the job finishes\n",
"- **is_positive_exit_code_failure:** Boolean flag to indicate if the job fails if the task exists with a positive code\n",
"- **vm_image_urn:** If create_pool is true and VM uses VirtualMachineConfiguration. \n",
" Value format: 'urn:publisher:offer:sku'. \n",
" Example: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter \n",
" For more details: \n",
" https://docs.microsoft.com/en-us/azure/virtual-machines/windows/cli-ps-findimage#table-of-commonly-used-windows-images and \n",
" https://docs.microsoft.com/en-us/azure/virtual-machines/linux/cli-ps-findimage#find-specific-images\n",
"- **run_task_as_admin:** Boolean flag to indicate if the task should run with Admin privileges\n",
"- **target_compute_nodes:** Assumes create_pool is true, indicates how many compute nodes will be added to the pool\n",
"- **source_directory:** Local folder that contains the module binaries, executable, assemblies etc.\n",
"- **executable:** Name of the command/executable that will be executed as part of the job\n",
"- **arguments:** Arguments for the command/executable\n",
"- **inputs:** List of input port bindings\n",
"- **outputs:** List of output port bindings\n",
"- **vm_size:** If create_pool is true, indicating Virtual machine size of the compute nodes\n",
"- **compute_target:** BatchCompute compute\n",
"- **allow_reuse:** Whether the module should reuse previous results when run with the same settings/inputs\n",
"- **version:** A version tag to denote a change in functionality for the module"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"step = AzureBatchStep(\n",
" name=\"Azure Batch Job\",\n",
" pool_id=\"MyPoolName\", # Replace this with the pool name of your choice\n",
" inputs=[testdata],\n",
" outputs=[outputdata],\n",
" executable=\"azurebatch.cmd\",\n",
" arguments=[testdata, outputdata],\n",
" compute_target=batch_compute,\n",
" source_directory=binaries_folder,\n",
")"
]
},
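{
"cell_type": "markdown",
"metadata": {},
"source": [
"As promised above, here is a hedged sketch of the same step exercising some of the optional pool-creation parameters. The pool name, VM size and image urn are illustrative values only, and this step is *not* added to the pipeline built next."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch: the same step, but asking the service to create (and later\n",
"# delete) a small Windows pool for the job. Values here are illustrative.\n",
"outputdata_pool = PipelineData(name=\"output_pool\", datastore=datastore)\n",
"\n",
"step_with_pool = AzureBatchStep(\n",
"    name=\"Azure Batch Job With Pool\",\n",
"    pool_id=\"MyNewPoolName\",  # pool that will be created on submission\n",
"    create_pool=True,\n",
"    delete_batch_pool_after_finish=True,\n",
"    target_compute_nodes=1,\n",
"    vm_size=\"standard_d1_v2\",\n",
"    vm_image_urn=\"urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter\",\n",
"    inputs=[testdata],\n",
"    outputs=[outputdata_pool],\n",
"    executable=\"azurebatch.cmd\",\n",
"    arguments=[testdata, outputdata_pool],\n",
"    compute_target=batch_compute,\n",
"    source_directory=binaries_folder,\n",
")"
]
},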
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build and Submit the Pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline = Pipeline(workspace=ws, steps=[step])\n",
"pipeline_run = Experiment(ws, 'azurebatch_experiment').submit(pipeline)"
]
},
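{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally (this is not part of the original walkthrough), the pipeline can be published so it can be resubmitted later without rebuilding it. A minimal sketch follows; the name and description are placeholders."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch: publish the pipeline for later reuse.\n",
"# Name and description below are placeholder values.\n",
"published_pipeline = pipeline.publish(name=\"AzureBatch_sample_pipeline\",\n",
"                                      description=\"Runs a Windows executable on Azure Batch\")\n",
"print(\"Published pipeline id:\", published_pipeline.id)"
]
},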
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize the Running Pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show()"
]
}
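,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Wait for Completion and Retrieve the Output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can block until the run finishes and then download the step's output. This is a hedged sketch: it assumes the step name *Azure Batch Job* used above and downloads the *output* PipelineData to the current directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch: wait for the run to finish, then fetch the step's output.\n",
"pipeline_run.wait_for_completion(show_output=True)\n",
"\n",
"# find_step_run looks the step up by the name given to AzureBatchStep above\n",
"step_run = pipeline_run.find_step_run(\"Azure Batch Job\")[0]\n",
"\n",
"# download the \"output\" PipelineData produced by the step\n",
"port_data = step_run.get_output_data(\"output\")\n",
"port_data.download(local_path=\".\")"
]
}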
],
"metadata": {
"authors": [
{
"name": "diray"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}