MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-local/train-local.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/using-mlflow/train-local/train-local.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Use MLflow with Azure Machine Learning for Local Training Run\n",
        "\n",
        "This example shows you how to use mlflow tracking APIs together with Azure Machine Learning services for storing your metrics and artifacts, from local Notebook run. You'll learn how to:\n",
        "\n",
        " 1. Set up MLflow tracking URI so as to use Azure ML\n",
        " 2. Create experiment\n",
        " 3. Train a model on your local computer while logging metrics and artifacts\n",
        " 4. View your experiment within your Azure ML Workspace in Azure Portal.\n",
        "\n",
        "## Prerequisites and Set-up\n",
        "\n",
        "Make sure you have completed the [Configuration](../../../configuration.ipnyb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.\n",
        "\n",
        "Install azureml-mlflow package before running this notebook. Note that mlflow itself gets installed as dependency if you haven't installed it yet.\n",
        "\n",
        "```\n",
        "pip install azureml-mlflow\n",
        "```\n",
        "\n",
        "This example also uses scikit-learn and matplotlib packages. Install them:\n",
        "```\n",
        "pip install scikit-learn matplotlib\n",
        "```\n",
        "\n",
        "Then, import necessary packages"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import mlflow\n",
        "import mlflow.sklearn\n",
        "import azureml.core\n",
        "from azureml.core import Workspace\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# Check core SDK version number\n",
        "print(\"SDK version:\", azureml.core.VERSION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Set tracking URI\n",
        "\n",
        "Set the MLflow tracking URI to point to your Azure ML Workspace. The subsequent logging calls from MLflow APIs will go to Azure ML services and will be tracked under your Workspace."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create Experiment\n",
        "\n",
        "In both MLflow and Azure ML, training runs are grouped into experiments. Let's create one for our experimentation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "experiment_name = \"LocalTrain-with-mlflow-sample\"\n",
        "mlflow.set_experiment(experiment_name)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create training and test data set\n",
        "\n",
        "This example uses diabetes dataset to build a simple regression model. Let's load the dataset and split it into training and test sets."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "from sklearn.datasets import load_diabetes\n",
        "from sklearn.linear_model import Ridge\n",
        "from sklearn.metrics import mean_squared_error\n",
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "X, y = load_diabetes(return_X_y = True)\n",
        "columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
        "data = {\n",
        "    \"train\":{\"X\": X_train, \"y\": y_train},        \n",
        "    \"test\":{\"X\": X_test, \"y\": y_test}\n",
        "}\n",
        "\n",
        "print (\"Data contains\", len(data['train']['X']), \"training samples and\",len(data['test']['X']), \"test samples\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Train while logging metrics and artifacts\n",
        "\n",
        "Next, start a mlflow run to train a scikit-learn regression model. Note that the training script has been instrumented using MLflow to:\n",
        " * Log model hyperparameter alpha value\n",
        " * Log mean squared error against test set\n",
        " * Save the scikit-learn based regression model produced by training\n",
        " * Save an image that shows actuals vs predictions against test set.\n",
        " \n",
        "These metrics and artifacts have been recorded to your Azure ML Workspace; in the next step you'll learn how to view them."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Create a run object in the experiment\n",
        "model_save_path = \"model\"\n",
        "\n",
        "with mlflow.start_run() as run:\n",
        "    # Log the algorithm parameter alpha to the run\n",
        "    mlflow.log_metric('alpha', 0.03)\n",
        "    # Create, fit, and test the scikit-learn Ridge regression model\n",
        "    regression_model = Ridge(alpha=0.03)\n",
        "    regression_model.fit(data['train']['X'], data['train']['y'])\n",
        "    preds = regression_model.predict(data['test']['X'])\n",
        "\n",
        "    # Log mean squared error\n",
        "    print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))\n",
        "    mlflow.log_metric('mse', mean_squared_error(data['test']['y'], preds))\n",
        "    \n",
        "    # Save the model to the outputs directory for capture\n",
        "    mlflow.sklearn.log_model(regression_model,model_save_path)\n",
        "    \n",
        "    # Plot actuals vs predictions and save the plot within the run\n",
        "    fig = plt.figure(1)\n",
        "    idx = np.argsort(data['test']['y'])\n",
        "    plt.plot(data['test']['y'][idx],preds[idx])\n",
        "    fig.savefig(\"actuals_vs_predictions.png\")\n",
        "    mlflow.log_artifact(\"actuals_vs_predictions.png\") "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can open the report page for your experiment and runs within it from Azure Portal.\n",
        "\n",
        "Select one of the runs to view the metrics, and the plot you saved. The saved scikit-learn model appears under **outputs** tab."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws.experiments[experiment_name]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Next steps\n",
        "\n",
        "Try out these notebooks to learn more about MLflow-Azure Machine Learning integration:\n",
        "\n",
        " * [Train a model using remote compute on Azure Cloud](../train-on-remote/train-on-remote.ipynb)\n",
        " * [Deploy the model as a web service](../deploy-model/deploy-model.ipynb)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "sanpil"
      }
    ],
    "category": "training",
    "compute": [
      "Local"
    ],
    "datasets": [
      "Diabetes"
    ],
    "deployment": [
      "None"
    ],
    "exclude_from_index": false,
    "framework": [
      "None"
    ],
    "friendly_name": "Use MLflow with AML for a local training run",
    "index_order": 7,
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.4"
    },
    "tags": [
      "None"
    ],
    "task": "Use MLflow tracking APIs together with Azure Machine Learning for storing your metrics and artifacts"
  },
  "nbformat": 4,
  "nbformat_minor": 2
}