mirror of https://github.com/Azure/MachineLearningNotebooks.git (synced 2025-12-20 09:37:04 -05:00)
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Use MLflow with Azure Machine Learning to Train and Deploy PyTorch Image Classifier\n",
        "\n",
        "This example shows you how to use MLflow together with Azure Machine Learning to track metrics and artifacts while training a PyTorch model to classify MNIST digit images, and to deploy the model as a web service. You'll learn how to:\n",
        "\n",
        " 1. Set up the MLflow tracking URI to use Azure ML\n",
        " 2. Create an experiment\n",
        " 3. Instrument your model with MLflow tracking\n",
        " 4. Train a PyTorch model locally\n",
        " 5. Train a model on GPU compute on Azure\n",
        " 6. View your experiment within your Azure ML Workspace in the Azure portal\n",
        " 7. Deploy the model as a web service on Azure Container Instance\n",
        " 8. Call the model to make predictions\n",
        " \n",
        "## Prerequisites\n",
        " \n",
        "If you are using a Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../../configuration.ipynb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.\n",
        "\n",
        "Install PyTorch; this notebook has been tested with torch==1.4.\n",
        "\n",
        "Also, install the azureml-mlflow package using ```pip install azureml-mlflow``` (see the cell below for one way to do this from within the notebook). Note that azureml-mlflow installs the mlflow package itself as a dependency if you haven't installed it previously.\n",
        "\n",
        "## Set-up\n",
        "\n",
        "Import the packages and check the versions of the Azure ML SDK and MLflow installed on your computer. Then connect to your Workspace."
      ]
    },
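    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If the packages above are not already installed in your environment, you can install them from within the notebook. The cell below is a minimal sketch based on the versions noted above; adjust them to your environment and restart the kernel after installing."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Optional: install the tested PyTorch version and the azureml-mlflow package.\n",
        "# Skip this cell if the packages are already installed (for example, on a Notebook VM).\n",
        "%pip install torch==1.4 azureml-mlflow"
      ]
    },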
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import sys, os\n",
        "import mlflow\n",
        "import mlflow.azureml\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core import Workspace\n",
        "\n",
        "\n",
        "print(\"SDK version:\", azureml.core.VERSION)\n",
        "print(\"MLflow version:\", mlflow.version.VERSION)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "ws.get_details()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Set tracking URI\n",
        "\n",
        "Set the MLflow tracking URI to point to your Azure ML Workspace. The subsequent logging calls from MLflow APIs will go to Azure ML services and will be tracked under your Workspace."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
      ]
    },
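    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Optionally, print the active tracking URI to confirm that MLflow now points at your Azure ML Workspace (a quick sanity check; the URI is specific to your workspace)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Confirm that the MLflow tracking URI now points at the Azure ML Workspace.\n",
        "print(mlflow.get_tracking_uri())"
      ]
    },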
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create Experiment\n",
        "\n",
        "In both MLflow and Azure ML, training runs are grouped into experiments. Let's create one for our experimentation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "experiment_name = \"pytorch-with-mlflow\"\n",
        "mlflow.set_experiment(experiment_name)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Train model locally while logging metrics and artifacts\n",
        "\n",
        "The ```scripts/train.py``` program contains the code to load the image dataset and to train and test the model. Within this program, the ```train.driver``` function wraps the end-to-end workflow.\n",
        "\n",
        "Within the driver, ```mlflow.start_run``` starts MLflow tracking. Then, ```mlflow.log_metric``` is used to track the convergence of the neural network training iterations. Finally, ```mlflow.pytorch.save_model``` is used to save the trained model in a framework-aware manner (this pattern is sketched below).\n",
        "\n",
        "Let's add the program to the search path, import it as a module, and invoke the driver function. Note that the training can take a few minutes."
      ]
    },
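    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The snippet below is a minimal, illustrative sketch of that instrumentation pattern, not the actual ```scripts/train.py```; the tiny linear model and random batches are stand-ins for the real MNIST classifier and data loader.\n",
        "\n",
        "```python\n",
        "import mlflow\n",
        "import mlflow.pytorch\n",
        "import torch\n",
        "\n",
        "model = torch.nn.Linear(784, 10)                      # stand-in for the MNIST classifier\n",
        "optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n",
        "\n",
        "with mlflow.start_run():                              # start MLflow tracking\n",
        "    for epoch in range(3):\n",
        "        x = torch.randn(32, 784)                      # dummy batch in place of MNIST images\n",
        "        y = torch.randint(0, 10, (32,))\n",
        "        loss = torch.nn.functional.cross_entropy(model(x), y)\n",
        "        optimizer.zero_grad()\n",
        "        loss.backward()\n",
        "        optimizer.step()\n",
        "        mlflow.log_metric(\"loss\", loss.item(), step=epoch)   # track convergence\n",
        "    mlflow.pytorch.save_model(model, \"sketch_model\")  # framework-aware model save\n",
        "```"
      ]
    },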
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "lib_path = os.path.abspath(\"scripts\")\n",
        "sys.path.append(lib_path)\n",
        "\n",
        "import train"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run = train.driver()"
      ]
    },
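    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick check that the metrics were tracked in your Workspace, you can list the runs of the active experiment through the MLflow API. This is an optional sanity check; ```mlflow.search_runs``` returns a pandas DataFrame of the runs recorded under the experiment set above."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# List the runs tracked under the active experiment (returns a pandas DataFrame).\n",
        "runs = mlflow.search_runs()\n",
        "runs.head()"
      ]
    },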
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Train model on GPU compute on Azure\n",
        "\n",
        "Next, let's run the same script on GPU-enabled compute for faster training. If you've completed the [Configuration](../../../configuration.ipynb) notebook, you should have a GPU cluster named \"gpu-cluster\" available in your workspace. Otherwise, follow the instructions in the notebook to create one. For simplicity, this example uses a single process on a single VM to train the model.\n",
        "\n",
        "Clone an environment object from the PyTorch 1.11 Azure ML curated environment. Azure ML curated environments are pre-configured environments that simplify ML setup; see [this doc](https://docs.microsoft.com/en-us/azure/machine-learning/v1/how-to-use-environments#use-a-curated-environment) for more information. To enable MLflow tracking, add ```azureml-mlflow``` as a pip package."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Environment\n",
        "\n",
        "env = Environment.get(workspace=ws, name=\"azureml-acpt-pytorch-1.11-cuda11.3\").clone(\"mlflow-env\")\n",
        "\n",
        "env.python.conda_dependencies.add_pip_package(\"azureml-mlflow\")\n",
        "env.python.conda_dependencies.add_pip_package(\"Pillow==6.0.0\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create a ScriptRunConfig to specify the training configuration: the script, the compute target, and the environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import ScriptRunConfig\n",
        "\n",
        "src = ScriptRunConfig(source_directory=\"./scripts\", script=\"train.py\")\n",
        "src.run_config.environment = env\n",
        "src.run_config.target = \"gpu-cluster\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Get a reference to the experiment you created previously, but this time as an Azure Machine Learning experiment object.\n",
        "\n",
        "Then, use the ```Experiment.submit``` method to start the remote training run. Note that the first training run often takes longer because the Azure Machine Learning service builds the Docker image for executing the script. Subsequent runs will be faster because the cached image is reused."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Experiment\n",
        "\n",
        "exp = Experiment(ws, experiment_name)\n",
        "run = exp.submit(src)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can monitor the run and its metrics in the Azure portal."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run"
      ]
    },
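    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If the azureml-widgets package is installed in your environment, you can also monitor the run from within the notebook using the RunDetails widget. This is an optional sketch; skip it if the widget package is not available."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Optional: live view of the run's status and logged metrics inside the notebook.\n",
        "from azureml.widgets import RunDetails\n",
        "\n",
        "RunDetails(run).show()"
      ]
    },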
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can also wait for the run to complete."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run.wait_for_completion(show_output=True)"
      ]
    },
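    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Once the run has finished, you can read back the metrics it logged through the MLflow client. This assumes the Azure ML run ID can be used as the MLflow run ID, which is the same assumption the deployment step below makes when it builds the ```runs:/``` model URI."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Fetch the completed run through MLflow and print the metrics logged by train.py.\n",
        "remote_run = mlflow.get_run(run.id)\n",
        "print(remote_run.data.metrics)"
      ]
    },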
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Deploy model as web service\n",
        "\n",
        "The ```client.create_deployment``` function registers the logged PyTorch model and deploys it in a framework-aware manner. It automatically creates the PyTorch-specific inference wrapper code and specifies the package dependencies for you. See [this doc](https://mlflow.org/docs/latest/models.html#id34) for more information on deploying models on Azure ML using MLflow.\n",
        "\n",
        "In this example, we deploy the model to Azure Container Instance, a serverless compute offering capable of running a single container. You can add tags and descriptions to help keep track of your web service.\n",
        "\n",
        "[Other inferencing compute choices](https://docs.microsoft.com/azure/machine-learning/v1/how-to-deploy-and-where) include Azure Kubernetes Service, which provides a scalable endpoint suitable for production use.\n",
        "\n",
        "Note that the service deployment can take several minutes."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "First, define your deployment target and customize parameters in the deployment config. Refer to [this documentation](https://docs.microsoft.com/azure/machine-learning/reference-azure-machine-learning-cli#azure-container-instance-deployment-configuration-schema) for more information."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import json\n",
        "\n",
        "# Minimal ACI deployment configuration\n",
        "deploy_config = {\n",
        "    \"computeType\": \"aci\"\n",
        "}\n",
        "\n",
        "# Serialize the configuration and write it to deployment_config.json\n",
        "json_object = json.dumps(deploy_config)\n",
        "\n",
        "with open(\"deployment_config.json\", \"w\") as outfile:\n",
        "    outfile.write(json_object)"
      ]
    },
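    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The config above only sets ```computeType```. Per the deployment-configuration schema linked earlier, you can also customize the container resources; the dictionary below is a hedged illustration of that shape (check the schema for the exact field names and defaults before using it).\n",
        "\n",
        "```python\n",
        "deploy_config = {\n",
        "    \"computeType\": \"aci\",\n",
        "    \"containerResourceRequirements\": {\"cpu\": 1, \"memoryInGB\": 1}\n",
        "}\n",
        "```"
      ]
    },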
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from mlflow.deployments import get_deploy_client\n",
        "\n",
        "# Create a deployment client from the workspace tracking URI\n",
        "client = get_deploy_client(mlflow.get_tracking_uri())\n",
        "\n",
        "# Artifact path under which the model was logged by the training run\n",
        "model_path = \"model\"\n",
        "\n",
        "# Point the deployment at the ACI config written above\n",
        "deployment_config_path = \"deployment_config.json\"\n",
        "test_config = {'deploy-config-file': deployment_config_path}\n",
        "\n",
        "# The \"name\" parameter below is the web service name; the model is registered\n",
        "# automatically and its registry name is autogenerated\n",
        "client.create_deployment(model_uri='runs:/{}/{}'.format(run.id, model_path),\n",
        "                         config=test_config,\n",
        "                         name=\"pytorch-aci-deployment\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Once the deployment has completed, you can check the scoring URI of the web service in the Azure ML studio UI under the Endpoints tab. Refer to [mlflow predict](https://mlflow.org/docs/latest/python_api/mlflow.deployments.html#mlflow.deployments.BaseDeploymentClient.predict) for how to test your deployment."
      ]
    },
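    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For example, a call through the deployment client could look like the sketch below. The payload shape is an assumption (one flattened 28x28 image in a pandas DataFrame); adapt it to the input signature your ```scripts/train.py``` model actually expects before running it.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "\n",
        "# One all-zero 28x28 image, flattened to 784 columns (placeholder input)\n",
        "sample = pd.DataFrame(np.zeros((1, 28 * 28), dtype=np.float32))\n",
        "\n",
        "prediction = client.predict(\"pytorch-aci-deployment\", sample)\n",
        "print(prediction)\n",
        "```"
      ]
    },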
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Clean up: delete the web service when you are finished\n",
        "client.delete(\"pytorch-aci-deployment\")"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "shipatel"
      }
    ],
    "category": "tutorial",
    "celltoolbar": "Edit Metadata",
    "compute": [
      "Local",
      "AML Compute"
    ],
    "datasets": [
      "MNIST"
    ],
    "deployment": [
      "Azure Container Instance"
    ],
    "exclude_from_index": false,
    "framework": [
      "PyTorch"
    ],
    "friendly_name": "Use MLflow with Azure Machine Learning to Train and Deploy PyTorch Image Classifier",
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.7"
    },
    "name": "mlflow-sparksummit-pytorch",
    "notebookId": 2495374963457641,
    "tags": [
      "mlflow",
      "pytorch"
    ],
    "task": "Use MLflow with Azure Machine Learning to train and deploy PyTorch image classifier model"
  },
  "nbformat": 4,
  "nbformat_minor": 4
}