MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Use MLflow with Azure Machine Learning for Remote Training Run\n",
        "\n",
        "This example shows you how to use MLflow tracking APIs together with Azure Machine Learning services for storing your metrics and artifacts, from local Notebook run. You'll learn how to:\n",
        "\n",
        " 1. Set up MLflow tracking URI so as to use Azure ML\n",
        " 2. Create experiment\n",
        " 3. Train a model on Machine Learning Compute while logging metrics and artifacts\n",
        " 4. View your experiment within your Azure ML Workspace in Azure Portal."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Prerequisites\n",
        "\n",
        "Make sure you have completed the [Configuration](../../../configuration.ipnyb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Set-up\n",
        "\n",
        "Check Azure ML SDK version installed on your computer, and then connect to your Workspace."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Check core SDK version number\n",
        "import azureml.core\n",
        "from azureml.core import Workspace, Experiment\n",
        "\n",
        "print(\"SDK version:\", azureml.core.VERSION)\n",
        "\n",
        "ws = Workspace.from_config()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's also create a Machine Learning Compute cluster for submitting the remote run. \n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your CPU cluster\n",
        "cluster_name = \"cpu-cluster\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
        "    print(\"Found existing cpu-cluster\")\n",
        "except ComputeTargetException:\n",
        "    print(\"Creating new cpu-cluster\")\n",
        "    \n",
        "    # Specify the configuration for the new cluster\n",
        "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
        "                                                           min_nodes=0,\n",
        "                                                           max_nodes=2)\n",
        "\n",
        "    # Create the cluster with the specified name and configuration\n",
        "    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
        "    \n",
        "    # Wait for the cluster to complete, show the output log\n",
        "    cpu_cluster.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create Azure ML Experiment"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The following steps show how to submit a training Python script to a cluster as an Azure ML run, while logging happens through MLflow APIs to your Azure ML Workspace. Let's first create an experiment to hold the training runs."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Experiment\n",
        "\n",
        "experiment_name = \"RemoteTrain-with-mlflow-sample\"\n",
        "exp = Experiment(workspace=ws, name=experiment_name)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Instrument remote training script using MLflow"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's use [*train_diabetes.py*](train_diabetes.py) to train a regression model against diabetes dataset as the example. Note that the training script uses mlflow.start_run() to start logging, and then logs metrics, saves the trained scikit-learn model, and saves a plot as an artifact.\n",
        "\n",
        "Run following command to view the script file. Notice the mlflow logging statements, and also notice that the script doesn't have explicit dependencies on azureml library."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_script = 'train_diabetes.py'\n",
        "with open(training_script, 'r') as f:\n",
        "    print(f.read())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Submit Run to Cluster \n",
        "\n",
        "Let's submit the run to cluster. When running on the remote cluster as submitted run, Azure ML sets the MLflow tracking URI to point to your Azure ML Workspace, so that the metrics and artifacts are automatically logged there.\n",
        "\n",
        "Note that you have to specify the packages your script depends on, including *azureml-mlflow* that implicitly enables the MLflow logging to Azure ML. \n",
        "\n",
        "First, create a environment with Docker enable and required package dependencies specified."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "mlflow"
        ]
      },
      "outputs": [],
      "source": [
        "from azureml.core import Environment\n",
        "from azureml.core.conda_dependencies import CondaDependencies\n",
        "\n",
        "env = Environment(name=\"mlflow-env\")\n",
        "\n",
        "# Specify conda dependencies with scikit-learn and temporary pointers to mlflow extensions\n",
        "cd = CondaDependencies.create(\n",
        "    pip_packages=[\"azureml-mlflow\", \"scikit-learn\", \"matplotlib\", \"pandas\", \"numpy\", \"protobuf==5.28.3\"]\n",
        "    )\n",
        "\n",
        "env.python.conda_dependencies = cd"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Next, specify a script run configuration that includes the training script, environment and CPU cluster created earlier."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import ScriptRunConfig\n",
        "\n",
        "src = ScriptRunConfig(source_directory=\".\",\n",
        "                      script=training_script,\n",
        "                      compute_target=cpu_cluster,\n",
        "                      environment=env)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, submit the run. Note that the first instance of the run typically takes longer as the Docker-based environment is created, several minutes. Subsequent runs reuse the image and are faster."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run = exp.submit(src)\n",
        "run.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can navigate to your Azure ML Workspace at Azure Portal to view the run metrics and artifacts. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can also get the metrics and bring them to your local notebook, and view the details of the run."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run.get_metrics()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run.get_details()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Next steps\n",
        "\n",
        " * [Deploy the model as a web service](../deploy-model/deploy-model.ipynb)\n",
        " * [Learn more about Azure Machine Learning compute options](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "sanpil"
      }
    ],
    "category": "training",
    "compute": [
      "AML Compute"
    ],
    "datasets": [
      "Diabetes"
    ],
    "deployment": [
      "None"
    ],
    "exclude_from_index": false,
    "framework": [
      "None"
    ],
    "friendly_name": "Use MLflow with AML for a remote training run",
    "index_order": 8,
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.7"
    },
    "tags": [
      "None"
    ],
    "task": "Use MLflow tracking APIs together with AML for storing your metrics and artifacts"
  },
  "nbformat": 4,
  "nbformat_minor": 2
}