{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"# Quickstart: Learn how to submit batch jobs with the Azure Machine Learning Python SDK\n",
"\n",
"In this quickstart, you learn how to submit a batch training job using the Python SDK. In this example, we submit the job to the 'local' machine (the compute instance you are running this notebook on). However, you can use exactly the same method to submit the job to other compute targets (for example, AKS, Azure Machine Learning Compute Cluster, Synapse, etc.) by changing a single line of code. A full list of supported compute targets can be viewed [here](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target). \n",
"\n",
"This quickstart trains a simple logistic regression using the [MNIST](https://docs.microsoft.com/azure/open-datasets/dataset-mnist) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents. \n",
"\n",
"You will learn how to:\n",
"\n",
"> * Download a dataset and look at the data\n",
"> * Train an image classification model by submitting a batch job to a compute resource\n",
"> * Use MLflow autologging to track model metrics and log the model artifact\n",
"> * Review training results, find and register the best model"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Connect to your workspace and create an experiment\n",
"\n",
"You start by importing some libraries and creating an experiment to track the runs in your workspace. A workspace can have multiple experiments, and all the users that have access to the workspace can collaborate on them. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965838618
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core import Experiment\n",
"\n",
"# connect to your workspace\n",
"ws = Workspace.from_config()\n",
"\n",
"experiment_name = \"get-started-with-jobsubmission-tutorial\"\n",
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### The MNIST dataset\n",
"\n",
"Use Azure Open Datasets to get the raw MNIST data files. [Azure Open Datasets](https://docs.microsoft.com/azure/open-datasets/overview-what-are-open-datasets) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for more accurate models. Each dataset has a corresponding class, `MNIST` in this case, to retrieve the data in different ways.\n",
"\n",
"Follow this [how-to](https://aka.ms/azureml/howto/createdatasets) if you want to learn more about Datasets and how to use them.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965850391
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.opendatasets import MNIST\n",
"\n",
"mnist_file_dataset = MNIST.get_file_dataset()"
]
},
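{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you also want to look at the raw data (one of the learning objectives above), a minimal sketch is shown below. It downloads a local copy of the dataset's files; the folder name `mnist_data` is just an example chosen here.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"# download a local copy of the raw MNIST files so you can inspect them\n",
"data_folder = os.path.join(os.getcwd(), \"mnist_data\")\n",
"paths = mnist_file_dataset.download(data_folder, overwrite=True)\n",
"for p in paths:\n",
"    print(p)\n",
"```"
]
},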
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define the Environment\n",
"An Environment defines Python packages, environment variables, and Docker settings that are used in machine learning experiments. Here you will be using a curated environment that has already been made available through the workspace. \n",
"\n",
"Read [this article](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments) if you want to learn more about Environments and how to use them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965877458
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core.environment import Environment\n",
"\n",
"# use a curated environment that has already been built for you\n",
"env = Environment.get(workspace=ws, name=\"AzureML-sklearn-1.5\")"
]
},
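{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are not sure which curated environments exist in your workspace (the available set varies by workspace and SDK version), one way to check is sketched below.\n",
"\n",
"```python\n",
"# list the curated (Microsoft-provided) environments registered in the workspace\n",
"curated = [name for name in Environment.list(workspace=ws) if name.startswith(\"AzureML-\")]\n",
"print(len(curated), \"curated environments found, for example:\", curated[:5])\n",
"```"
]
},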
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Configure the training job\n",
"\n",
"Create a [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.script_run_config.scriptrunconfig?view=azure-ml-py) object to specify the configuration details of your training job, including your training script, the environment to use, and the compute target to run on. Configure the ScriptRunConfig by specifying:\n",
"\n",
"* The directory that contains your scripts. All the files in this directory are uploaded to the cluster nodes for execution. \n",
"* The compute target. In this case, you point to local compute.\n",
"* The training script name, train.py.\n",
"* An environment that contains the libraries needed to run the script.\n",
"* The arguments required by the training script. \n",
"\n",
"In this run you will submit to \"local\", which is the compute instance you are running this notebook on. If you have another compute target (for example, AKS, an Azure ML Compute Cluster, Azure Databricks, etc.), you only need to change the `compute_target` argument below; a sketch of this is shown after the job is submitted. You can learn more about other compute targets [here](https://docs.microsoft.com/azure/machine-learning/how-to-set-up-training-targets). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965882781
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"args = [\"--data-folder\", mnist_file_dataset.as_mount(), \"--regularization\", 0.5]\n",
"\n",
"src = ScriptRunConfig(\n",
"    source_directory=\"src\",\n",
"    script=\"train.py\",\n",
"    arguments=args,\n",
"    compute_target=\"local\",\n",
"    environment=env,\n",
")"
]
},
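{
"cell_type": "markdown",
"metadata": {},
"source": [
"The configuration above references `src/train.py`, which is not shown in this notebook. As a rough sketch of what such a script could look like (the actual script in the `src` folder may differ), the example below parses the same `--data-folder` and `--regularization` arguments, enables MLflow autologging, and fits a scikit-learn logistic regression on the mounted MNIST files.\n",
"\n",
"```python\n",
"# hypothetical sketch of a training script -- not necessarily identical to src/train.py\n",
"import argparse\n",
"import glob\n",
"import gzip\n",
"import os\n",
"import struct\n",
"\n",
"import mlflow\n",
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"\n",
"def load_images(path):\n",
"    # MNIST idx image files: 16-byte header (magic, count, rows, cols), then pixel bytes\n",
"    with gzip.open(path, \"rb\") as f:\n",
"        _, n, rows, cols = struct.unpack(\">IIII\", f.read(16))\n",
"        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows * cols) / 255.0\n",
"\n",
"\n",
"def load_labels(path):\n",
"    # MNIST idx label files: 8-byte header (magic, count), then one byte per label\n",
"    with gzip.open(path, \"rb\") as f:\n",
"        struct.unpack(\">II\", f.read(8))\n",
"        return np.frombuffer(f.read(), dtype=np.uint8)\n",
"\n",
"\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument(\"--data-folder\", type=str)\n",
"parser.add_argument(\"--regularization\", type=float, default=0.5)\n",
"args = parser.parse_args()\n",
"\n",
"mlflow.autolog()  # logs parameters, metrics and the fitted model automatically\n",
"\n",
"train_images = glob.glob(os.path.join(args.data_folder, \"**/train-images*\"), recursive=True)[0]\n",
"train_labels = glob.glob(os.path.join(args.data_folder, \"**/train-labels*\"), recursive=True)[0]\n",
"X_train, y_train = load_images(train_images), load_labels(train_labels)\n",
"\n",
"clf = LogisticRegression(C=1.0 / args.regularization, solver=\"liblinear\")\n",
"clf.fit(X_train, y_train)\n",
"```"
]
},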
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Submit the job\n",
"\n",
"Run the experiment by submitting the ScriptRunConfig object. Once submitted, you have several options for monitoring the run: navigate to the experiment \"get-started-with-jobsubmission-tutorial\" under __Jobs__ in the left menu, or monitor the run inline, since `run.wait_for_completion(show_output=True)` streams the run's logs. You will see that the environment is built for you to ensure reproducibility; this adds a couple of minutes to the run time. On subsequent runs the environment is reused, making the run faster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965911435
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"run = exp.submit(config=src)\n",
"run.wait_for_completion(show_output=True)"
]
},
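{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned earlier, running the same job on a different compute target only requires changing the `compute_target` argument. The sketch below assumes your workspace already contains a compute cluster; the name `cpu-cluster` is a placeholder for your own cluster's name.\n",
"\n",
"```python\n",
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# attach to an existing cluster in the workspace (replace the name with your own)\n",
"cluster = ComputeTarget(workspace=ws, name=\"cpu-cluster\")\n",
"\n",
"src_cluster = ScriptRunConfig(\n",
"    source_directory=\"src\",\n",
"    script=\"train.py\",\n",
"    arguments=args,\n",
"    compute_target=cluster,  # the only line that changes compared to the local run\n",
"    environment=env,\n",
")\n",
"run_on_cluster = exp.submit(config=src_cluster)\n",
"```"
]
},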
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Register model\n",
"\n",
"The training script used the MLflow autologging feature, so the model was captured and stored on your behalf. Below we register the model in the Azure Machine Learning model registry, which lets you keep track of all the models in your Azure Machine Learning workspace.\n",
"\n",
"Models are identified by name and version. Each time you register a model with the same name as an existing one, the registry assumes that it's a new version. The version is incremented, and the new model is registered under the same name.\n",
"\n",
"When you register the model, you can provide additional metadata tags and then use the tags when you search for models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612966068862
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# register model\n",
"model = run.register_model(\n",
"    model_name=\"sklearn_mnist\", model_path=\"model/model.pkl\"\n",
")\n",
"print(model.name, model.id, model.version, sep=\"\\t\")"
]
},
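{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the tagging mentioned above, the sketch below registers the same model again with a couple of example tags and then uses those tags to find it in the registry. The tag keys and values are arbitrary choices for this example.\n",
"\n",
"```python\n",
"from azureml.core.model import Model\n",
"\n",
"# register with illustrative tags (this creates a new version of sklearn_mnist)\n",
"tagged_model = run.register_model(\n",
"    model_name=\"sklearn_mnist\",\n",
"    model_path=\"model/model.pkl\",\n",
"    tags={\"dataset\": \"mnist\", \"framework\": \"sklearn\"},\n",
")\n",
"\n",
"# later, search the registry by tag\n",
"for m in Model.list(ws, tags=[[\"dataset\", \"mnist\"]]):\n",
"    print(m.name, m.version, m.tags)\n",
"```"
]
},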
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You will now be able to see the model in the registry by selecting __Models__ in the left-hand menu of the Azure Machine Learning studio."
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Control Cost\n",
"\n",
"If you want to control cost, you can stop the compute instance this notebook is running on by clicking the \"Stop compute\" button next to the status dropdown in the menu above.\n",
"\n",
"## Next Steps\n",
"\n",
"In this quickstart, you have seen how to submit batch jobs to run machine learning code in Azure Machine Learning. \n",
"\n",
"It is also possible to use automated machine learning in Azure Machine Learning to find the best model in an automated fashion. To see how this works, we recommend that you follow the next quickstart in this series, [**Fraud Classification using Automated ML**](../quickstart-azureml-automl/quickstart-azureml-automl.ipynb). That quickstart focuses on AutoML using the Python SDK."
]
}
],
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "cewidste"
|
|
}
|
|
],
|
|
"kernelspec": {
|
|
"display_name": "Python 3.8 - AzureML",
|
|
"language": "python",
|
|
"name": "python38-azureml"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.9"
|
|
},
|
|
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
|
|
"nteract": {
|
|
"version": "nteract-front-end@1.0.0"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
} |