{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use MLflow with Azure Machine Learning to Train and Deploy PyTorch Image Classifier\n",
"\n",
"This example shows you how to use MLflow together with Azure Machine Learning services for tracking the metrics and artifacts while training a PyTorch model to classify MNIST digit images, and then deploy the model as a web service. You'll learn how to:\n",
"\n",
" 1. Set up the MLflow tracking URI to use Azure ML\n",
" 2. Create an experiment\n",
" 3. Instrument your model with MLflow tracking\n",
" 4. Train a PyTorch model locally\n",
" 5. Train a model on GPU compute on Azure\n",
" 6. View your experiment within your Azure ML Workspace in the Azure Portal\n",
" 7. Create a Docker image from the trained model\n",
" 8. Deploy the model as a web service on Azure Container Instance\n",
" 9. Call the model to make predictions\n",
" \n",
"### Pre-requisites\n",
" \n",
"Make sure you have completed the [Configuration](../../../configuration.ipynb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.\n",
"\n",
"Also, install the azureml-mlflow package using ```pip install azureml-mlflow```. Note that azureml-mlflow installs the mlflow package itself as a dependency if you haven't installed it previously.\n",
"\n",
"### Set-up\n",
"\n",
"Import packages and check the versions of the Azure ML SDK and MLflow installed on your computer. Then connect to your Workspace."
]
},
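{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the package is not already present in your environment, you can install it from within the notebook. The cell below is a minimal convenience step, assuming a kernel where the ```%pip``` magic is available; skip it if ```azureml-mlflow``` is already installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install the Azure ML MLflow integration into the current kernel, if needed.\n",
"%pip install azureml-mlflow"
]
},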
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys, os\n",
"import mlflow\n",
"import mlflow.azureml\n",
"\n",
"import azureml.core\n",
"import azureml.mlflow  # needed below for azureml.mlflow.get_portal_url\n",
"from azureml.core import Workspace\n",
"\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)\n",
"print(\"MLflow version:\", mlflow.version.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set tracking URI\n",
"\n",
"Set the MLflow tracking URI to point to your Azure ML Workspace. Subsequent logging calls from the MLflow APIs will go to Azure ML services and will be tracked under your Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Experiment\n",
"\n",
"In both MLflow and Azure ML, training runs are grouped into experiments. Let's create one for our experimentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = \"pytorch-with-mlflow\"\n",
"mlflow.set_experiment(experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train model locally while logging metrics and artifacts\n",
"\n",
"The ```scripts/train.py``` program contains the code to load the image dataset, and to train and test the model. Within this program, the ```train.driver``` function wraps the end-to-end workflow.\n",
"\n",
"Within the driver, ```mlflow.start_run``` starts MLflow tracking. Then, the ```mlflow.log_metric``` functions are used to track the convergence of the neural network training iterations. Finally, ```mlflow.pytorch.save_model``` is used to save the trained model in a framework-aware manner."
]
},
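{
"cell_type": "markdown",
"metadata": {},
"source": [
"The driver's instrumentation follows this general pattern. The sketch below is illustrative rather than the actual ```scripts/train.py```: the tiny linear model, the synthetic batch, and the epoch count are placeholders chosen only to keep the snippet self-contained.\n",
"\n",
"```python\n",
"import mlflow\n",
"import mlflow.pytorch\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"\n",
"def driver():\n",
"    model = nn.Linear(784, 10)  # placeholder for the MNIST network in train.py\n",
"    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n",
"    x = torch.randn(32, 784)         # synthetic batch of flattened images\n",
"    y = torch.randint(0, 10, (32,))  # synthetic digit labels\n",
"    with mlflow.start_run() as run:  # start MLflow tracking\n",
"        for epoch in range(1, 4):\n",
"            optimizer.zero_grad()\n",
"            loss = F.cross_entropy(model(x), y)\n",
"            loss.backward()\n",
"            optimizer.step()\n",
"            mlflow.log_metric(\"loss\", loss.item())  # track convergence per epoch\n",
"        mlflow.pytorch.save_model(model, \"model\")    # framework-aware save\n",
"    return run\n",
"```\n",
"\n",
"Let's add the program to the search path, import it as a module, and then invoke the driver function. Note that the training can take a few minutes."
]
},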
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lib_path = os.path.abspath(\"scripts\")\n",
"sys.path.append(lib_path)\n",
"\n",
"import train"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = train.driver()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can view the metrics of the run in the Azure Portal:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(azureml.mlflow.get_portal_url(run))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train model on GPU compute on Azure\n",
"\n",
"Next, let's run the same script on GPU-enabled compute for faster training. If you've completed the [Configuration](../../../configuration.ipynb) notebook, you should have a GPU cluster named \"gpu-cluster\" available in your workspace. Otherwise, follow the instructions in the notebook to create one. For simplicity, this example uses a single process on a single VM to train the model.\n",
"\n",
"Create a PyTorch estimator to specify the training configuration: the script, the compute target, and the additional packages needed. To enable MLflow tracking, include ```azureml-mlflow``` as a pip package. The low-level specifications for the training run are encapsulated in the estimator instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.dnn import PyTorch\n",
"\n",
"pt = PyTorch(source_directory=\"./scripts\",\n",
"             entry_script=\"train.py\",\n",
"             compute_target=\"gpu-cluster\",\n",
"             node_count=1,\n",
"             process_count_per_node=1,\n",
"             use_gpu=True,\n",
"             pip_packages=[\"azureml-mlflow\", \"Pillow==6.0.0\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get a reference to the experiment you created previously, but this time as an Azure Machine Learning Experiment object.\n",
"\n",
"Then, use the ```Experiment.submit``` method to start the remote training run. Note that the first training run often takes longer, as the Azure Machine Learning service builds the Docker image for executing the script. Subsequent runs will be faster, as the cached image is used."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"exp = Experiment(ws, experiment_name)\n",
"run = exp.submit(pt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can monitor the run and its metrics in the Azure Portal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, you can wait for the run to complete."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy model as web service\n",
"\n",
"To deploy a web service, first create a Docker image, and then deploy that Docker image on inferencing compute.\n",
"\n",
"The ```mlflow.azureml.build_image``` function builds a Docker image from the saved PyTorch model in a framework-aware manner. It automatically creates the PyTorch-specific inferencing wrapper code and specifies package dependencies for you.\n",
"\n",
"First, list the files logged by the run to locate the saved model artifacts:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_file_names()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then build a Docker image using *runs:/<run.id>/model* as the model_uri path.\n",
"\n",
"Note that the image building can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_path = \"model\"\n",
"\n",
"\n",
"azure_image, azure_model = mlflow.azureml.build_image(model_uri='runs:/{}/{}'.format(run.id, model_path),\n",
"                                                      workspace=ws,\n",
"                                                      model_name='pytorch_mnist',\n",
"                                                      image_name='pytorch-mnist-img',\n",
"                                                      synchronous=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, deploy the Docker image to Azure Container Instance: a serverless compute capable of running a single container. You can tag and add descriptions to help keep track of your web service.\n",
"\n",
"[Other inferencing compute choices](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where) include Azure Kubernetes Service, which provides a scalable endpoint suitable for production use.\n",
"\n",
"Note that the service deployment can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice, Webservice\n",
"\n",
"aci_config = AciWebservice.deploy_configuration(cpu_cores=2,\n",
"                                                memory_gb=5,\n",
"                                                tags={\"data\": \"MNIST\", \"method\": \"pytorch\"},\n",
"                                                description=\"Predict using webservice\")\n",
"\n",
"\n",
"# Deploy the image to Azure Container Instances (ACI) for real-time serving\n",
"webservice = Webservice.deploy_from_image(\n",
"    image=azure_image, workspace=ws, name=\"pytorch-mnist-1\", deployment_config=aci_config)\n",
"\n",
"\n",
"webservice.wait_for_deployment()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the deployment has completed, you can check the scoring URI of the web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Scoring URI is: {}\".format(webservice.scoring_uri))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case of a service creation issue, you can use ```webservice.get_logs()``` to retrieve logs for debugging, for example:"
]
},
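{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment to fetch the container logs of the deployed service for debugging.\n",
"# print(webservice.get_logs())"
]
},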
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make predictions using web service\n",
"\n",
"To call the web service, create a test data set as normalized PyTorch tensors. \n",
"\n",
"Then, let's define a utility function that takes a random image and converts it into a format and shape suitable as input to the PyTorch inferencing endpoint. The conversion is done as follows: \n",
"\n",
" 1. Select a random (image, label) tuple\n",
" 2. Take the image and convert the tensor to a NumPy array \n",
" 3. Reshape the array into a 1 x 1 x N array\n",
"    * 1 image in batch, 1 color channel, N = 784 pixels for MNIST images\n",
"    * Note also ```x = x.view(-1, 1, 28, 28)``` in the net definition in the ```train.py``` program, which shapes incoming scoring requests.\n",
" 4. Convert the NumPy array to a list to make it into a built-in type.\n",
" 5. Create a dictionary {\"data\": <list>} that can be converted to a JSON string for web service requests."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torchvision import datasets, transforms\n",
"import random\n",
"import numpy as np\n",
"\n",
"test_data = datasets.MNIST('../data', train=False, transform=transforms.Compose([\n",
"    transforms.ToTensor(),\n",
"    transforms.Normalize((0.1307,), (0.3081,))]))\n",
"\n",
"\n",
"def get_random_image():\n",
"    # randint is inclusive at both ends, so cap the index at len(test_data) - 1\n",
"    image_idx = random.randint(0, len(test_data) - 1)\n",
"    image_as_tensor = test_data[image_idx][0]\n",
"    # 1 x 1 x 784: one image in the batch, one color channel, 784 pixels\n",
"    return {\"data\": image_as_tensor.numpy().reshape(1, 1, -1).tolist()}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, invoke the web service using a random test image. Convert the dictionary containing the image to a JSON string before passing it to the web service.\n",
"\n",
"The response contains the raw scores for each label, with a greater value indicating a higher probability. Sort the labels and select the one with the greatest score to get the prediction. Let's also plot the image sent to the web service for comparison purposes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import json\n",
"import matplotlib.pyplot as plt\n",
"\n",
"test_image = get_random_image()\n",
"\n",
"response = webservice.run(json.dumps(test_image))\n",
"\n",
"# Sort the (label, score) pairs by score, highest first\n",
"response = sorted(response[0].items(), key=lambda x: x[1], reverse=True)\n",
"\n",
"\n",
"print(\"Predicted label:\", response[0][0])\n",
"plt.imshow(np.array(test_image[\"data\"]).reshape(28, 28), cmap=\"gray\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also call the web service directly using a raw HTTP POST request."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"response = requests.post(url=webservice.scoring_uri,\n",
"                         data=json.dumps(test_image),\n",
"                         headers={\"Content-type\": \"application/json\"})\n",
"print(response.text)"
]
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"celltoolbar": "Edit Metadata",
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"name": "mlflow-sparksummit-pytorch",
"notebookId": 2495374963457641
},
"nbformat": 4,
"nbformat_minor": 1
}