MachineLearningNotebooks/tutorials/compute-instance-quickstarts/quickstart-azureml-in-10mins/quickstart-azureml-in-10mins.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/quickstart-ci/AzureMLin10mins.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "# Quickstart: Train and deploy a model in Azure Machine Learning in 10 minutes\n",
        "\n",
        "In this quickstart, learn how to get started with Azure Machine Learning. You'll train an image classification model using the [MNIST](https://azure.microsoft.com/services/open-datasets/catalog/mnist/) dataset.\n",
        "\n",
        "You'll learn how to:\n",
        "\n",
        "> * Download a dataset and look at the data\n",
        "> * Train an image classification model and log metrics\n",
        "> * Deploy the model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "## Connect to your workspace and create an experiment\n",
        "\n",
        "Import some libraries and create an experiment to track the runs in your workspace. A workspace can have multiple experiments, and all  users that have access to the workspace can collaborate on them."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1612965916889
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core import Workspace\n",
        "from azureml.core import Experiment\n",
        "\n",
        "# connect to your workspace\n",
        "ws = Workspace.from_config()\n",
        "\n",
        "# create experiment and start logging to a new run in the experiment\n",
        "experiment_name = \"azure-ml-in10-mins-tutorial\"\n",
        "exp = Experiment(workspace=ws, name=experiment_name)\n",
        "run = exp.start_logging(snapshot_directory=None)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "## Import Data\n",
        "\n",
        "Before you train a model, you need to understand the data you're using to train it. In this section, learn how to:\n",
        "\n",
        "* Download the MNIST dataset\n",
        "* Display some sample images\n",
        "\n",
        "### Download the MNIST dataset\n",
        "\n",
        "You'll use Azure Open Datasets to get the raw MNIST data files. [Azure Open Datasets](https://docs.microsoft.com/azure/open-datasets/overview-what-are-open-datasets) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for better models. Each dataset has a corresponding class, `MNIST` in this case, to retrieve the data in different ways."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1612965922274
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from azureml.core import Dataset\n",
        "from azureml.opendatasets import MNIST\n",
        "\n",
        "data_folder = os.path.join(os.getcwd(), \"data\")\n",
        "os.makedirs(data_folder, exist_ok=True)\n",
        "\n",
        "mnist_file_dataset = MNIST.get_file_dataset()\n",
        "mnist_file_dataset.download(data_folder, overwrite=True)\n",
        "\n",
        "mnist_file_dataset = mnist_file_dataset.register(\n",
        "    workspace=ws,\n",
        "    name=\"mnist_opendataset\",\n",
        "    description=\"training and test dataset\",\n",
        "    create_new_version=True,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "### Take a look at the data\n",
        "\n",
        "Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. \n",
        "\n",
        "Note this step requires a `load_data` function that's included in an `utils.py` file. This file is placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1612965929041
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "from utils import load_data\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import glob\n",
        "\n",
        "\n",
        "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n",
        "X_train = (\n",
        "    load_data(\n",
        "        glob.glob(\n",
        "            os.path.join(data_folder, \"**/train-images-idx3-ubyte.gz\"), recursive=True\n",
        "        )[0],\n",
        "        False,\n",
        "    )\n",
        "    / 255.0\n",
        ")\n",
        "X_test = (\n",
        "    load_data(\n",
        "        glob.glob(\n",
        "            os.path.join(data_folder, \"**/t10k-images-idx3-ubyte.gz\"), recursive=True\n",
        "        )[0],\n",
        "        False,\n",
        "    )\n",
        "    / 255.0\n",
        ")\n",
        "y_train = load_data(\n",
        "    glob.glob(\n",
        "        os.path.join(data_folder, \"**/train-labels-idx1-ubyte.gz\"), recursive=True\n",
        "    )[0],\n",
        "    True,\n",
        ").reshape(-1)\n",
        "y_test = load_data(\n",
        "    glob.glob(\n",
        "        os.path.join(data_folder, \"**/t10k-labels-idx1-ubyte.gz\"), recursive=True\n",
        "    )[0],\n",
        "    True,\n",
        ").reshape(-1)\n",
        "\n",
        "\n",
        "# now let's show some randomly chosen images from the traininng set.\n",
        "count = 0\n",
        "sample_size = 30\n",
        "plt.figure(figsize=(16, 6))\n",
        "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
        "    count = count + 1\n",
        "    plt.subplot(1, sample_size, count)\n",
        "    plt.axhline(\"\")\n",
        "    plt.axvline(\"\")\n",
        "    plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n",
        "    plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "## Train model and log metrics\n",
        "\n",
        "You'll train the model using the code below. Your training runs and metrics will be registered in the experiment you created, so that this information is available after you've finished.\n",
        "\n",
        "You'll be using the [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) classifier from the [SciKit Learn framework](https://scikit-learn.org/) to classify the data.\n",
        "\n",
        "> **Note: The model training takes around 1 minute to complete.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1612966046970
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "# create the model\n",
        "import numpy as np\n",
        "from sklearn.linear_model import LogisticRegression\n",
        "\n",
        "reg = 0.5\n",
        "clf = LogisticRegression(\n",
        "    C=1.0 / reg, solver=\"liblinear\", multi_class=\"auto\", random_state=42\n",
        ")\n",
        "clf.fit(X_train, y_train)\n",
        "\n",
        "# make predictions using the test set and calculate the accuracy\n",
        "y_hat = clf.predict(X_test)\n",
        "\n",
        "# calculate accuracy on the prediction\n",
        "acc = np.average(y_hat == y_test)\n",
        "print(\"Accuracy is\", acc)\n",
        "\n",
        "run.log(\"regularization rate\", np.float(reg))\n",
        "run.log(\"accuracy\", np.float(acc))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n",
        "## Version control your models with the model registry\n",
        "\n",
        "You can use model registration to store and version your models in your workspace. Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. Azure Machine Learning supports any model that can be loaded through Python 3.\n",
        "\n",
        "The code below:\n",
        "\n",
        "1. Saves the model to disk\n",
        "1. Uploads the model file to the run \n",
        "1. Registers the uploaded model file\n",
        "1. Transitions the run to a completed state"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1612881042710
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "import joblib\n",
        "from azureml.core.model import Model\n",
        "\n",
        "path = \"sklearn_mnist_model.pkl\"\n",
        "joblib.dump(value=clf, filename=path)\n",
        "\n",
        "run.upload_file(name=path, path_or_stream=path)\n",
        "\n",
        "model = run.register_model(\n",
        "    model_name=\"sklearn_mnist_model\",\n",
        "    model_path=path,\n",
        "    description=\"Mnist handwriting recognition\",\n",
        ")\n",
        "\n",
        "run.complete()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Deploy the model\n",
        "\n",
        "The next cell deploys the model to an Azure Container Instance so that you can score data in real-time (Azure Machine Learning also provides mechanisms to do batch scoring). A real-time endpoint allows application developers to integrate machine learning into their apps."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1612881061728
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "# create environment for the deploy\n",
        "from azureml.core.environment import Environment\n",
        "from azureml.core.conda_dependencies import CondaDependencies\n",
        "\n",
        "# to install required packages\n",
        "env = Environment(\"quickstart-env\")\n",
        "cd = CondaDependencies.create(\n",
        "    pip_packages=[\"azureml-dataset-runtime[pandas,fuse]\", \"azureml-defaults\"],\n",
        "    conda_packages=[\"scikit-learn==0.22.1\"],\n",
        ")\n",
        "\n",
        "env.python.conda_dependencies = cd\n",
        "\n",
        "# Register environment to re-use later\n",
        "env.register(workspace=ws)\n",
        "\n",
        "# create config file\n",
        "from azureml.core.webservice import AciWebservice\n",
        "\n",
        "aciconfig = AciWebservice.deploy_configuration(\n",
        "    cpu_cores=1,\n",
        "    memory_gb=1,\n",
        "    tags={\"data\": \"MNIST\", \"method\": \"sklearn\"},\n",
        "    description=\"Predict MNIST with sklearn\",\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "> **Note: The deployment takes around 3 minutes to complete.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "%%time\n",
        "import uuid\n",
        "from azureml.core.webservice import Webservice\n",
        "from azureml.core.model import InferenceConfig\n",
        "from azureml.core.environment import Environment\n",
        "from azureml.core import Workspace\n",
        "from azureml.core.model import Model\n",
        "\n",
        "ws = Workspace.from_config()\n",
        "model = Model(ws, \"sklearn_mnist_model\")\n",
        "\n",
        "\n",
        "myenv = Environment.get(workspace=ws, name=\"quickstart-env\", version=\"1\")\n",
        "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n",
        "\n",
        "service_name = \"sklearn-mnist-svc-\" + str(uuid.uuid4())[:4]\n",
        "service = Model.deploy(\n",
        "    workspace=ws,\n",
        "    name=service_name,\n",
        "    models=[model],\n",
        "    inference_config=inference_config,\n",
        "    deployment_config=aciconfig,\n",
        ")\n",
        "\n",
        "service.wait_for_deployment(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The [*scoring script*](score.py) file referenced in the code above can be found in the same folder as this notebook, and has two functions:\n",
        "\n",
        "1. an `init` function that executes once when the service starts - in this function you normally get the model from the registry and set global variables\n",
        "1. a `run(data)` function that executes each time a call is made to the service. In this function, you normally format the input data, run a prediction, and output the predicted result."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "## Test the model service\n",
        "\n",
        "You can test the model by sending a raw HTTP request to test the web service. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1612881527399
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "# scoring web service HTTP endpoint\n",
        "print(service.scoring_uri)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1612881538381
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "# send raw HTTP request to test the web service.\n",
        "import requests\n",
        "\n",
        "# send a random row from the test set to score\n",
        "random_index = np.random.randint(0, len(X_test) - 1)\n",
        "input_data = '{\"data\": [' + str(list(X_test[random_index])) + \"]}\"\n",
        "\n",
        "headers = {\"Content-Type\": \"application/json\"}\n",
        "\n",
        "# for AKS deployment you'd need to the service key in the header as well\n",
        "# api_key = service.get_key()\n",
        "# headers = {'Content-Type':'application/json',  'Authorization':('Bearer '+ api_key)}\n",
        "\n",
        "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
        "\n",
        "print(\"POST to url\", service.scoring_uri)\n",
        "# print(\"input data:\", input_data)\n",
        "print(\"label:\", y_test[random_index])\n",
        "print(\"prediction:\", resp.text)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "\n",
        "### View the results of your training\n",
        "\n",
        "When you're finished with an experiment run, you can always return to view the results of your model training here in the Azure Machine Learning studio:\n",
        "\n",
        "1. Select **Experiments** (left-hand menu)\n",
        "1. Select **azure-ml-in10-mins-tutorial**\n",
        "1. Select **Run 1**\n",
        "1. Select the **Metrics** Tab\n",
        "\n",
        "The metrics tab will display the parameter values that were logged to the run."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "### View the model in the model registry\n",
        "\n",
        "You can see the stored model by navigating to **Models** in the left-hand menu bar. Select the **sklearn_mnist_model** to see the details of the model, including the experiment run ID that created the model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Clean up resources\n",
        "\n",
        "If you're not going to continue to use this model, delete the Model service using:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1612881556520
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "# if you want to keep workspace and only delete endpoint (it will incur cost while running)\n",
        "service.delete()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If you want to control cost further, stop the compute instance by selecting the \"Stop compute\" button next to the **Compute** dropdown.  Then start the compute instance again the next time you need it."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n",
        "## Next Steps\n",
        "\n",
        "In this quickstart, you learned how to run machine learning code in Azure Machine Learning.\n",
        "\n",
        "Now that you have working code in a development environment, learn how to submit a **_job_** - ideally on a schedule or trigger (for example, arrival of new data).\n",
        "\n",
        " [**Learn how to get started with Azure ML Job Submission**](../quickstart-azureml-python-sdk/quickstart-azureml-python-sdk.ipynb) "
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "cewidste"
      }
    ],
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",
      "name": "python36"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    },
    "notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
    "nteract": {
      "version": "nteract-front-end@1.0.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}