mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-20 01:27:06 -05:00
502 lines
17 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "source": [
    "# Quickstart: Train and deploy a model in Azure Machine Learning in 10 minutes\n",
    "\n",
    "In this quickstart, learn how to get started with Azure Machine Learning. You'll train an image classification model using the [MNIST](https://docs.microsoft.com/azure/open-datasets/dataset-mnist) dataset.\n",
    "\n",
    "You'll learn how to:\n",
    "\n",
    "* Download a dataset and look at the data\n",
    "* Train an image classification model and log metrics using MLflow\n",
    "* Deploy the model to do real-time inference"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "source": [
    "## Import Data\n",
    "\n",
    "Before you train a model, you need to understand the data you're using to train it. In this section, learn how to:\n",
    "\n",
    "* Download the MNIST dataset\n",
    "* Display some sample images\n",
    "\n",
    "You'll use Azure Open Datasets to get the raw MNIST data files. [Azure Open Datasets](https://docs.microsoft.com/azure/open-datasets/overview-what-are-open-datasets) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for better models. Each dataset has a corresponding class, `MNIST` in this case, to retrieve the data in different ways."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from azureml.opendatasets import MNIST\n",
    "\n",
    "# note: joining os.getcwd() with an absolute path would simply discard the\n",
    "# current directory, so the data folder is given as an absolute path directly\n",
    "data_folder = \"/tmp/qs_data\"\n",
    "os.makedirs(data_folder, exist_ok=True)\n",
    "\n",
    "mnist_file_dataset = MNIST.get_file_dataset()\n",
    "mnist_file_dataset.download(data_folder, overwrite=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "source": [
    "### Take a look at the data\n",
    "\n",
    "Load the compressed files into `numpy` arrays, then use `matplotlib` to plot 30 random images from the dataset with their labels above them.\n",
    "\n",
    "Note that this step requires a `load_data` function that's included in a `utils.py` file placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "from utils import load_data\n",
    "\n",
    "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n",
    "X_train = (\n",
    "    load_data(\n",
    "        glob.glob(\n",
    "            os.path.join(data_folder, \"**/train-images-idx3-ubyte.gz\"), recursive=True\n",
    "        )[0],\n",
    "        False,\n",
    "    )\n",
    "    / 255.0\n",
    ")\n",
    "X_test = (\n",
    "    load_data(\n",
    "        glob.glob(\n",
    "            os.path.join(data_folder, \"**/t10k-images-idx3-ubyte.gz\"), recursive=True\n",
    "        )[0],\n",
    "        False,\n",
    "    )\n",
    "    / 255.0\n",
    ")\n",
    "y_train = load_data(\n",
    "    glob.glob(\n",
    "        os.path.join(data_folder, \"**/train-labels-idx1-ubyte.gz\"), recursive=True\n",
    "    )[0],\n",
    "    True,\n",
    ").reshape(-1)\n",
    "y_test = load_data(\n",
    "    glob.glob(\n",
    "        os.path.join(data_folder, \"**/t10k-labels-idx1-ubyte.gz\"), recursive=True\n",
    "    )[0],\n",
    "    True,\n",
    ").reshape(-1)\n",
    "\n",
    "# now let's show some randomly chosen images from the training set.\n",
    "count = 0\n",
    "sample_size = 30\n",
    "plt.figure(figsize=(16, 6))\n",
    "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
    "    count = count + 1\n",
    "    plt.subplot(1, sample_size, count)\n",
    "    plt.axis(\"off\")\n",
    "    plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n",
    "    plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "source": [
    "## Train model and log metrics with MLflow\n",
    "\n",
    "You'll train the model using the code below. Note that you're using MLflow autologging to track metrics and log model artifacts.\n",
    "\n",
    "You'll use the [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) classifier from [scikit-learn](https://scikit-learn.org/) to classify the data.\n",
    "\n",
    "**Note: The model training takes approximately 2 minutes to complete.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "gather": {
     "logged": 1612966046970
    },
    "jupyter": {
     "outputs_hidden": false,
     "source_hidden": false
    },
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "outputs": [],
   "source": [
    "# create the model\n",
    "import mlflow\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from azureml.core import Workspace\n",
    "\n",
    "# connect to your workspace\n",
    "ws = Workspace.from_config()\n",
    "\n",
    "# create experiment and start logging to a new run in the experiment\n",
    "experiment_name = \"azure-ml-in10-mins-tutorial\"\n",
    "\n",
    "# set up MLflow to track the metrics\n",
    "mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())\n",
    "mlflow.set_experiment(experiment_name)\n",
    "mlflow.autolog()\n",
    "\n",
    "# set up the logistic regression model\n",
    "reg = 0.5\n",
    "clf = LogisticRegression(\n",
    "    C=1.0 / reg, solver=\"liblinear\", multi_class=\"auto\", random_state=42\n",
    ")\n",
    "\n",
    "# train the model\n",
    "with mlflow.start_run() as run:\n",
    "    clf.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## View Experiment\n",
    "In the left-hand menu in Azure Machine Learning Studio, select __Jobs__ and then select your experiment (azure-ml-in10-mins-tutorial). An experiment is a grouping of many runs from a specified script or piece of code; information for each run is stored under that experiment. If the name doesn't exist when you submit an experiment, a new experiment is automatically created. If you select your run, you'll see various tabs containing metrics, logs, explanations, etc.\n",
    "\n",
    "## Version control your models with the model registry\n",
    "\n",
    "You can use model registration to store and version your models in your workspace. Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. The code below registers and versions the model you trained above. Once you've executed the code cell below, you'll be able to see the model in the registry by selecting __Models__ in the left-hand menu in Azure Machine Learning Studio."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "gather": {
     "logged": 1612881042710
    },
    "jupyter": {
     "outputs_hidden": false,
     "source_hidden": false
    },
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "outputs": [],
   "source": [
    "# register the model\n",
    "model_uri = \"runs:/{}/model\".format(run.info.run_id)\n",
    "model = mlflow.register_model(model_uri, \"sklearn_mnist_model\")"
   ]
  },
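  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an optional sanity check (a sketch, not part of the original tutorial), you can load the registered model back through MLflow's `models:/` URI and compare its predictions on a few held-out images against the true labels. This assumes the MLflow tracking URI is still pointed at your workspace, as set up in the training cell above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# optional sanity check: load the registered model back from the registry\n",
    "loaded = mlflow.sklearn.load_model(f\"models:/sklearn_mnist_model/{model.version}\")\n",
    "print(\"predictions:\", loaded.predict(X_test[:5]))\n",
    "print(\"labels:     \", y_test[:5])"
   ]
  },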
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploy the model for real-time inference\n",
    "In this section, you learn how to deploy a model so that an application can consume (inference) the model over REST.\n",
    "\n",
    "### Create deployment configuration\n",
    "The next code cell gets a _curated environment_, which specifies all the dependencies required to host the model (for example, packages like scikit-learn). You also create a _deployment configuration_, which specifies the amount of compute required to host the model. In this case, the compute has 1 CPU core and 1 GB of memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "gather": {
     "logged": 1612881061728
    },
    "jupyter": {
     "outputs_hidden": false,
     "source_hidden": false
    },
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "outputs": [],
   "source": [
    "# create environment for the deployment\n",
    "from azureml.core.environment import Environment\n",
    "from azureml.core.conda_dependencies import CondaDependencies\n",
    "from azureml.core.webservice import AciWebservice\n",
    "\n",
    "# get a curated environment\n",
    "env = Environment.get(\n",
    "    workspace=ws,\n",
    "    name=\"AzureML-sklearn-1.0-ubuntu20.04-py38-cpu\",\n",
    "    version=1,\n",
    ")\n",
    "env.inferencing_stack_version = \"latest\"\n",
    "\n",
    "# create deployment config, i.e. compute resources\n",
    "aciconfig = AciWebservice.deploy_configuration(\n",
    "    cpu_cores=1,\n",
    "    memory_gb=1,\n",
    "    tags={\"data\": \"MNIST\", \"method\": \"sklearn\"},\n",
    "    description=\"Predict MNIST with sklearn\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "source": [
    "### Deploy model\n",
    "\n",
    "The next code cell deploys the model to Azure Container Instances (ACI).\n",
    "\n",
    "**Note: The deployment takes approximately 3 minutes to complete.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false,
     "source_hidden": false
    },
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "import uuid\n",
    "from azureml.core.model import InferenceConfig\n",
    "from azureml.core.model import Model\n",
    "\n",
    "# get the registered model\n",
    "model = Model(ws, \"sklearn_mnist_model\")\n",
    "\n",
    "# create an inference config, i.e. the scoring script and environment\n",
    "inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)\n",
    "\n",
    "# deploy the service\n",
    "service_name = \"sklearn-mnist-svc-\" + str(uuid.uuid4())[:4]\n",
    "service = Model.deploy(\n",
    "    workspace=ws,\n",
    "    name=service_name,\n",
    "    models=[model],\n",
    "    inference_config=inference_config,\n",
    "    deployment_config=aciconfig,\n",
    ")\n",
    "\n",
    "service.wait_for_deployment(show_output=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The [*scoring script*](score.py) referenced in the code above can be found in the same folder as this notebook. It has two functions:\n",
    "\n",
    "1. an `init` function that executes once when the service starts - in this function you normally get the model from the registry and set global variables\n",
    "1. a `run(data)` function that executes each time a call is made to the service - in this function you normally format the input data, run a prediction, and output the predicted result\n",
    "\n",
    "### View Endpoint\n",
    "Once the model has been successfully deployed, you can view the endpoint by navigating to __Endpoints__ in the left-hand menu in Azure Machine Learning Studio. You'll be able to see the state of the endpoint (healthy/unhealthy), its logs, and the Consume tab, which shows how applications can consume the model."
   ]
  },
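  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For orientation, a minimal scoring script might look like the sketch below. This is illustrative only, not the actual `score.py` shipped with this notebook; in particular, the `model/model.pkl` path inside `AZUREML_MODEL_DIR` is an assumption about how the saved scikit-learn model is laid out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# illustrative sketch of a scoring script (hypothetical paths; see score.py for the real one)\n",
    "import json\n",
    "import os\n",
    "\n",
    "import joblib\n",
    "\n",
    "\n",
    "def init():\n",
    "    # runs once when the service starts: load the model into a global\n",
    "    global model\n",
    "    model_path = os.path.join(os.environ[\"AZUREML_MODEL_DIR\"], \"model\", \"model.pkl\")\n",
    "    model = joblib.load(model_path)\n",
    "\n",
    "\n",
    "def run(raw_data):\n",
    "    # runs on every request: parse the JSON payload, predict, return a JSON-serializable result\n",
    "    data = json.loads(raw_data)[\"data\"]\n",
    "    return model.predict(data).tolist()"
   ]
  },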
  {
   "cell_type": "markdown",
   "metadata": {
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "source": [
    "## Test the model service\n",
    "\n",
    "You can test the model by sending a raw HTTP request to the web service."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "gather": {
     "logged": 1612881538381
    },
    "jupyter": {
     "outputs_hidden": false,
     "source_hidden": false
    },
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "outputs": [],
   "source": [
    "# send a raw HTTP request to test the web service\n",
    "import requests\n",
    "\n",
    "# send a random row from the test set to score\n",
    "random_index = np.random.randint(0, len(X_test))  # the high bound is exclusive\n",
    "input_data = '{\"data\": [' + str(list(X_test[random_index])) + \"]}\"\n",
    "\n",
    "headers = {\"Content-Type\": \"application/json\"}\n",
    "\n",
    "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
    "\n",
    "print(\"POST to url\", service.scoring_uri)\n",
    "print(\"label:\", y_test[random_index])\n",
    "print(\"prediction:\", resp.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up resources\n",
    "\n",
    "If you're not going to continue to use this model, delete the model service using:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "gather": {
     "logged": 1612881556520
    },
    "jupyter": {
     "outputs_hidden": false,
     "source_hidden": false
    },
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "outputs": [],
   "source": [
    "# if you want to keep the workspace and only delete the endpoint (it incurs cost while running)\n",
    "service.delete()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to control cost further, stop the compute instance by selecting the \"Stop compute\" button next to the **Compute** dropdown. Then start the compute instance again the next time you need it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Steps\n",
    "\n",
    "In this quickstart, you learned how to run machine learning code in Azure Machine Learning.\n",
    "\n",
    "Now that you have working code in a development environment, learn how to submit a **_job_** - ideally on a schedule or trigger (for example, on arrival of new data).\n",
    "\n",
    "[**Learn how to get started with Azure ML Job Submission**](../quickstart-azureml-python-sdk/quickstart-azureml-python-sdk.ipynb)"
   ]
  }
 ],
 "metadata": {
  "authors": [
   {
    "name": "cewidste"
   }
  ],
  "kernelspec": {
   "display_name": "Python 3.8 - AzureML",
   "language": "python",
   "name": "python38-azureml"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
  "nteract": {
   "version": "nteract-front-end@1.0.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}