{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Enabling Data Collection for Models in Production\n",
"With this notebook, you can learn how to collect model input data from your Azure Machine Learning service into Azure Blob storage. Once enabled, the collected data gives you the opportunity to:\n",
"\n",
"* Monitor data drift as production data enters your model\n",
"* Make better decisions on when to retrain or optimize your model\n",
"* Retrain your model with the data collected\n",
"\n",
"## What data is collected?\n",
"* Model input data (voice, images, and video are not supported) from services deployed in Azure Kubernetes Service (AKS)\n",
"* Model predictions using production input data.\n",
"\n",
"**Note:** pre-aggregation and pre-calculation of this data are done by the user and are not included in this version of the product.\n",
"\n",
"## What is different compared to the standard production deployment process?\n",
"1. Update the scoring file.\n",
"2. Update the yml file with the new dependency.\n",
"3. Update the AKS configuration.\n",
"4. Build a new image and deploy it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Import your dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.webservice import Webservice, AksWebservice\n",
"import azureml.core\n",
"print(azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Set up your configuration and create a workspace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Register Model\n",
"Register an existing trained model, and add a description and tags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Register the model\n",
"from azureml.core.model import Model\n",
"model = Model.register(model_path = \"sklearn_regression_model.pkl\", # this points to a local file\n",
"                       model_name = \"sklearn_regression_model.pkl\", # this is the name the model is registered as\n",
"                       tags = {'area': \"diabetes\", 'type': \"regression\"},\n",
"                       description = \"Ridge regression model to predict diabetes\",\n",
"                       workspace = ws)\n",
"\n",
"print(model.name, model.description, model.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. *Update your scoring file with Data Collection*\n",
"The file below, compared to the file used in notebook 11, has the following changes:\n",
"### a. Import the module\n",
"```python \n",
"from azureml.monitoring import ModelDataCollector```\n",
"### b. In your init function add:\n",
"```python \n",
"global inputs_dc, prediction_dc\n",
"inputs_dc = ModelDataCollector(\"best_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\", \"feat3\", \"feat4\", \"feat5\", \"feat6\"])\n",
"prediction_dc = ModelDataCollector(\"best_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"])```\n",
" \n",
"* **Identifier**: The identifier is later used to build the folder structure in your Blob; for example, it can be used to separate \"raw\" data from \"processed\" data.\n",
"* **CorrelationId**: An optional parameter; you do not need to set it if your model doesn't require it. Having a correlationId in place makes it easier to map the collected data to other data. (Examples include: LoanNumber, CustomerId, etc.)\n",
"* **Feature Names**: These need to be listed in the same order as your features so that they become the column names when the .csv is created.\n",
"\n",
"### c. In your run function add:\n",
"```python\n",
"inputs_dc.collect(data)\n",
"prediction_dc.collect(result)```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile score.py\n",
"import pickle\n",
"import json\n",
"import numpy\n",
"from sklearn.externals import joblib\n",
"from sklearn.linear_model import Ridge\n",
"from azureml.core.model import Model\n",
"from azureml.monitoring import ModelDataCollector\n",
"import time\n",
"\n",
"def init():\n",
"    global model\n",
"    print(\"model initialized\" + time.strftime(\"%H:%M:%S\"))\n",
"    # note here \"sklearn_regression_model.pkl\" is the name of the model registered under the workspace\n",
"    # this call should return the path to the model.pkl file on the local disk.\n",
"    model_path = Model.get_model_path(model_name = 'sklearn_regression_model.pkl')\n",
"    # deserialize the model file back into a sklearn model\n",
"    model = joblib.load(model_path)\n",
"    global inputs_dc, prediction_dc\n",
"    # this setup will help us save our inputs under the \"inputs\" path in our Azure Blob\n",
"    inputs_dc = ModelDataCollector(model_name=\"sklearn_regression_model\", identifier=\"inputs\", feature_names=[\"feat1\", \"feat2\"])\n",
"    # this setup will help us save our predictions under the \"predictions\" path in our Azure Blob\n",
"    prediction_dc = ModelDataCollector(\"sklearn_regression_model\", identifier=\"predictions\", feature_names=[\"prediction1\", \"prediction2\"])\n",
"\n",
"# note you can pass in multiple rows for scoring\n",
"def run(raw_data):\n",
"    global inputs_dc, prediction_dc\n",
"    try:\n",
"        data = json.loads(raw_data)['data']\n",
"        data = numpy.array(data)\n",
"        result = model.predict(data)\n",
"        print(\"saving input data\" + time.strftime(\"%H:%M:%S\"))\n",
"        inputs_dc.collect(data)  # this call is saving our input data into our blob\n",
"        prediction_dc.collect(result)  # this call is saving our prediction data into our blob\n",
"        print(\"saving prediction data\" + time.strftime(\"%H:%M:%S\"))\n",
"        # you can return any data type as long as it is JSON-serializable\n",
"        return result.tolist()\n",
"    except Exception as e:\n",
"        error = str(e)\n",
"        print(error + time.strftime(\"%H:%M:%S\"))\n",
"        return error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. *Update your myenv.yml file with the required module*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
"myenv.add_pip_package(\"azureml-monitoring\")\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
"    f.write(myenv.serialize_to_string())"
]
},
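{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can print the generated file to confirm that `azureml-monitoring` was added as a pip dependency. This is a minimal check with plain Python; it assumes the cell above has already written `myenv.yml` to the current directory:\n",
"```python\n",
"with open(\"myenv.yml\") as f:\n",
"    print(f.read())```"
]
},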
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Create your new Image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
"                                                  runtime = \"python\",\n",
"                                                  conda_file = \"myenv.yml\",\n",
"                                                  description = \"Image with ridge regression model\",\n",
"                                                  tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
"                                                  )\n",
"\n",
"image = ContainerImage.create(name = \"myimage1\",\n",
"                              # this is the model object\n",
"                              models = [model],\n",
"                              image_config = image_config,\n",
"                              workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(model.name, model.description, model.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Deploy to AKS service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create AKS compute if you haven't done so."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the default configuration (can also provide parameters to customize)\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'my-aks-test1'\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws,\n",
"                                  name = aks_name,\n",
"                                  provisioning_configuration = prov_config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_target.wait_for_completion(show_output = True)\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you already have an AKS cluster, you can attach it to your workspace instead:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python \n",
"%%time\n",
"resource_id = '/subscriptions/<subscriptionid>/resourcegroups/<resourcegroupname>/providers/Microsoft.ContainerService/managedClusters/<aksservername>'\n",
"create_name = 'myaks4'\n",
"attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
"aks_target = ComputeTarget.attach(workspace = ws,\n",
"                                  name = create_name,\n",
"                                  attach_configuration = attach_config)\n",
"# Wait for the operation to complete\n",
"aks_target.wait_for_provisioning(True)```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a. *Activate Data Collection and App Insights through updating AKS Webservice configuration*\n",
"In order to enable Data Collection and App Insights in your service, you will need to update your AKS webservice deployment configuration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set the web service configuration\n",
"aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)"
]
},
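{
"cell_type": "markdown",
"metadata": {},
"source": [
"`deploy_configuration` also accepts resource and autoscale parameters alongside `collect_model_data` and `enable_app_insights`, so you can size the service in the same call. The sketch below is only illustrative; the CPU, memory, and autoscale values are placeholder assumptions to adjust to your workload:\n",
"```python\n",
"# placeholder resource values (assumptions), combined with data collection and App Insights\n",
"aks_config = AksWebservice.deploy_configuration(cpu_cores=1,\n",
"                                                memory_gb=2,\n",
"                                                autoscale_enabled=True,\n",
"                                                collect_model_data=True,\n",
"                                                enable_app_insights=True)```"
]
},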
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b. Deploy your service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if aks_target.provisioning_state == \"Succeeded\":\n",
"    aks_service_name = 'aks-w-dc0'\n",
"    aks_service = Webservice.deploy_from_image(workspace = ws,\n",
"                                               name = aks_service_name,\n",
"                                               image = image,\n",
"                                               deployment_config = aks_config,\n",
"                                               deployment_target = aks_target\n",
"                                               )\n",
"    aks_service.wait_for_deployment(show_output = True)\n",
"    print(aks_service.state)\n",
"else:\n",
"    raise ValueError(\"AKS provisioning failed, can't deploy service. Error: \", aks_target.provisioning_errors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Test your service and send some data\n",
"**Note**: It will take around 15 mins for your data to appear in your blob.\n",
"The data will appear in your Azure Blob following this format:\n",
"\n",
"/modeldata/subscriptionid/resourcegroupname/workspacename/webservicename/modelname/modelversion/identifier/year/month/day/data.csv "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"import json\n",
"\n",
"test_sample = json.dumps({'data': [\n",
"    [1,2,3,4,54,6,7,8,88,10],\n",
"    [10,9,8,37,36,45,4,33,2,1]\n",
"]})\n",
"test_sample = bytes(test_sample, encoding = 'utf8')\n",
"\n",
"if aks_service.state == \"Healthy\":\n",
"    prediction = aks_service.run(input_data=test_sample)\n",
"    print(prediction)\n",
"else:\n",
"    raise ValueError(\"Service deployment isn't healthy, can't call the service. Error: \", aks_service.error)"
]
},
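{
"cell_type": "markdown",
"metadata": {},
"source": [
"Production clients usually call the service over plain HTTP rather than through the SDK. The sketch below shows one way to do that with the `requests` package (an assumption: it is installed in your environment); the scoring URI and key are read from the deployed service, and the payload shape matches the cell above. Every scored request is collected once data collection is enabled.\n",
"```python\n",
"import requests\n",
"import json\n",
"\n",
"# scoring endpoint and key-based auth come from the deployed AKS service\n",
"scoring_uri = aks_service.scoring_uri\n",
"key, _ = aks_service.get_keys()\n",
"headers = {'Content-Type': 'application/json',\n",
"           'Authorization': 'Bearer ' + key}\n",
"\n",
"# same payload shape as the test above\n",
"payload = json.dumps({'data': [[1,2,3,4,54,6,7,8,88,10]]})\n",
"response = requests.post(scoring_uri, data=payload, headers=headers)\n",
"print(response.json())```"
]
},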
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. Validate your data and analyze it\n",
"You can look into your data following this path format in your Azure Blob (it takes up to 15 minutes for the data to appear):\n",
"\n",
"/modeldata/**subscriptionid**/**resourcegroupname**/**workspacename**/**webservicename**/**modelname**/**modelversion**/**identifier**/*year/month/day*/data.csv\n",
"\n",
"For further analysis you have multiple options:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a. Create a Databricks cluster and connect it to your blob\n",
"Follow https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal, or in your Databricks workspace look for the template \"Azure Blob Storage Import Example Notebook\".\n",
"\n",
"\n",
"Here is an example of setting up the file location to extract the relevant data:\n",
"\n",
"<code> file_location = \"wasbs://mycontainer@storageaccountname.blob.core.windows.net/unknown/unknown/unknown-bigdataset-unknown/my_iterate_parking_inputs/2018/*/*/data.csv\" \n",
"file_type = \"csv\"</code>\n"
]
},
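{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once `file_location` and `file_type` are set, a minimal sketch for loading the collected CSV files into a Spark DataFrame could look like the following. The storage account name and key below are placeholders, and granting access through the account key is only one of several options:\n",
"```python\n",
"# placeholder account name and key; this grants Spark access to the blob container\n",
"spark.conf.set(\"fs.azure.account.key.storageaccountname.blob.core.windows.net\", \"<your-storage-key>\")\n",
"\n",
"# read every data.csv under the chosen path into one DataFrame\n",
"df = (spark.read.format(file_type)\n",
"      .option(\"header\", \"true\")\n",
"      .option(\"inferSchema\", \"true\")\n",
"      .load(file_location))\n",
"display(df)```"
]
},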
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b. Connect your Blob to Power BI (small data only)\n",
"1. Download and open Power BI Desktop.\n",
"2. Select \"Get Data\" and click on \"Azure Blob Storage\" >> Connect.\n",
"3. Add your storage account and enter your storage key.\n",
"4. Select the container where your collected data is stored and click on Edit. \n",
"5. In the query editor, click under the \"Name\" column and add your storage account model path to the filter. Note: if you want to look only at files from a specific year or month, just extend the filter path. For example, to look only at March data: /modeldata/subscriptionid/resourcegroupname/workspacename/webservicename/modelname/modelversion/identifier/year/3\n",
"6. Click the double arrow beside the \"Content\" column to combine the files. \n",
"7. Click OK and the data will preload.\n",
"8. You can now click Close and Apply and start building your custom reports on your model input data."
]
},
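{
"cell_type": "markdown",
"metadata": {},
"source": [
"### c. Spot-check the raw files from Python\n",
"If you only want to confirm that files are landing, you can also list them directly with the `azure-storage-blob` package. This is a minimal sketch assuming version 12 of that package and that you have the storage account's connection string; the collected data lives in the `modeldata` container following the path format shown above:\n",
"```python\n",
"from azure.storage.blob import BlobServiceClient\n",
"\n",
"# placeholder connection string for the workspace's storage account\n",
"blob_service = BlobServiceClient.from_connection_string(\"<your-storage-connection-string>\")\n",
"container = blob_service.get_container_client(\"modeldata\")\n",
"\n",
"# list the collected CSV files; narrow this down with name_starts_with if needed\n",
"for blob in container.list_blobs():\n",
"    if blob.name.endswith(\"data.csv\"):\n",
"        print(blob.name)```"
]
},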
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Disable Data Collection"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.update(collect_model_data=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"aks_service.delete()\n",
"image.delete()\n",
"model.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "shipatel"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
} |