mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-19 17:17:04 -05:00
544 lines
18 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Object detection with PyTorch, Mask R-CNN, and a custom Dockerfile\n",
|
|
"\n",
|
|
"In this tutorial, you will fine-tune a pre-trained [Mask R-CNN](https://arxiv.org/abs/1703.06870) model on images from the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/). The dataset has 170 images with 345 instances of pedestrians. After running this tutorial, you will have a model that predicts segmentation masks outlining the pedestrians in an image.\n",
|
|
"\n",
|
|
"You'll use Azure Machine Learning to:\n",
|
|
"\n",
|
|
"- Initialize a workspace \n",
|
|
"- Create a compute cluster\n",
|
|
"- Define a training environment\n",
|
|
"- Train a model remotely\n",
|
|
"- Register your model\n",
|
|
"- Generate predictions locally\n",
|
|
"\n",
|
|
"## Prerequisites\n",
|
|
"\n",
|
|
"- If you are using an Azure Machine Learning Notebook VM, your environment already meets these prerequisites. Otherwise, go through the [configuration notebook](../../../../../configuration.ipynb) to install the Azure Machine Learning Python SDK and [create an Azure ML Workspace](https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace#create-a-workspace). You also need matplotlib 3.2, pycocotools 2.0.0, torchvision >= 0.5.0, and torch >= 1.4.0.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Check core SDK version number, check other dependencies\n",
|
|
"import azureml.core\n",
|
|
"import matplotlib\n",
|
|
"import pycocotools\n",
|
|
"import torch\n",
|
|
"import torchvision\n",
|
|
"\n",
|
|
"print(\"SDK version:\", azureml.core.VERSION)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Diagnostics\n",
|
|
"\n",
|
|
"Opt-in diagnostics for better experience, quality, and security in future releases."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.telemetry import set_diagnostics_collection\n",
|
|
"\n",
|
|
"set_diagnostics_collection(send_diagnostics=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Initialize a workspace\n",
|
|
"\n",
|
|
"Initialize a [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) object from the existing workspace you created in the Prerequisites step. The [from_config()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#from-config-path-none--auth-none---logger-none---file-name-none-) method reads the workspace details (subscription ID, resource group, and workspace name) stored in `config.json` and returns a `Workspace` object."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.workspace import Workspace\n",
|
|
"\n",
|
|
"ws = Workspace.from_config()\n",
|
|
"print('Workspace name: ' + ws.name, \n",
|
|
" 'Azure region: ' + ws.location, \n",
|
|
" 'Subscription id: ' + ws.subscription_id, \n",
|
|
" 'Resource group: ' + ws.resource_group, sep='\\n')"
|
|
]
|
|
},
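`Workspace.from_config()` locates your workspace through three identifiers stored in `config.json`. The sketch below assumes the standard `subscription_id`, `resource_group`, and `workspace_name` keys and uses a hypothetical helper, `read_workspace_config`, to load and validate such a file with the standard library. It is only a way to sanity-check the file before calling the SDK; it does not authenticate or create a `Workspace` object.

```python
import json

# Keys assumed to be present in an Azure ML config.json file.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def read_workspace_config(path="config.json"):
    """Hypothetical helper: load config.json and check the workspace keys."""
    with open(path, "r") as f:
        config = json.load(f)
    # Report every missing key at once instead of failing on the first one.
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    return config
```

For example, `read_workspace_config('config.json')['workspace_name']` should match the workspace name printed by the cell above.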
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Create or attach to an existing Azure ML Managed Compute cluster\n",
|
|
"\n",
|
|
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/concept-compute-target) for training your model. In this tutorial, we use [Azure ML managed compute](https://docs.microsoft.com/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for our remote training compute resource. Specifically, the below code creates a `STANDARD_NC6` GPU cluster that autoscales from 0 to 4 nodes.\n",
|
|
"\n",
|
|
"**Creating the compute cluster takes approximately 5 minutes.** If a compute target with that name already exists in your workspace, the code reuses it and skips the creation process.\n",
|
|
"\n",
|
|
"As with other Azure services, there are limits on certain resources associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota.\n",
|
|
"\n",
|
|
"> Note that the code below creates GPU compute. If you want CPU compute instead, pass a different VM size, such as `STANDARD_D2_V2`, to the `vm_size` parameter."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
|
"\n",
|
|
"\n",
|
|
"# choose a name for your cluster\n",
|
|
"cluster_name = 'gpu-cluster'\n",
|
|
"\n",
|
|
"try:\n",
|
|
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
|
" print('Found existing compute target.')\n",
|
|
"except ComputeTargetException:\n",
|
|
" print('Creating a new compute target...')\n",
|
|
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
|
|
" max_nodes=4)\n",
|
|
"\n",
|
|
" # create the cluster\n",
|
|
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
|
|
"\n",
|
|
" compute_target.wait_for_completion(show_output=True)\n",
|
|
"\n",
|
|
"# use get_status() to get a detailed status for the current cluster. \n",
|
|
"print(compute_target.get_status().serialize())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Define a training environment\n",
|
|
"\n",
|
|
"### Create a project directory\n",
|
|
"Create a directory that will contain all the code from your local machine that the remote resource needs access to. This includes the training script and any additional files your training script depends on."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import os\n",
|
|
"\n",
|
|
"project_folder = './pytorch-peds'\n",
|
|
"\n",
|
|
"try:\n",
|
|
" os.makedirs(project_folder, exist_ok=False)\n",
|
|
"except FileExistsError:\n",
|
|
" print('project folder {} exists, moving on...'.format(project_folder))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Copy training script and dependencies into project directory"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import shutil\n",
|
|
"\n",
|
|
"files_to_copy = ['data', 'model', 'script', 'utils', 'transforms', 'coco_eval', 'engine', 'coco_utils']\n",
|
|
"for file in files_to_copy:\n",
|
|
" shutil.copy(os.path.join(os.getcwd(), (file + '.py')), project_folder)\n"
|
|
]
|
|
},
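The copy loop above stops at the first missing file and can leave a partially populated project folder behind. A minimal variant, using a hypothetical `copy_modules` helper, checks that every `<name>.py` exists before copying anything. It assumes the same flat layout as `files_to_copy`, with all modules sitting next to the notebook.

```python
import os
import shutil


def copy_modules(module_names, src_dir, dest_dir):
    """Copy <name>.py files from src_dir to dest_dir, failing early if any are missing."""
    sources = [os.path.join(src_dir, name + ".py") for name in module_names]
    # Validate the whole list first so a failure leaves dest_dir untouched.
    missing = [path for path in sources if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError("missing training files: " + ", ".join(missing))
    os.makedirs(dest_dir, exist_ok=True)
    for path in sources:
        shutil.copy(path, dest_dir)
    # Return the resulting folder contents for a quick visual check.
    return sorted(os.listdir(dest_dir))
```

In this notebook the call would be `copy_modules(files_to_copy, os.getcwd(), project_folder)`.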
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create an experiment"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Experiment\n",
|
|
"\n",
|
|
"experiment_name = 'pytorch-peds'\n",
|
|
"experiment = Experiment(ws, name=experiment_name)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Specify dependencies with a custom Dockerfile\n",
|
|
"\n",
|
|
"There are a number of ways to [use environments](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments) for specifying dependencies during model training. In this case, we use a custom Dockerfile."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Environment\n",
|
|
"\n",
|
|
"my_env = Environment(name='maskr-docker')\n",
|
|
"my_env.docker.enabled = True\n",
|
|
"with open(\"dockerfiles/Dockerfile\", \"r\") as f:\n",
|
|
" dockerfile_contents=f.read()\n",
|
|
"my_env.docker.base_dockerfile=dockerfile_contents\n",
|
|
"my_env.docker.base_image = None\n",
|
|
"my_env.python.interpreter_path = '/opt/miniconda/bin/python'\n",
|
|
"my_env.python.user_managed_dependencies = True\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create a ScriptRunConfig\n",
|
|
"\n",
|
|
"Use the [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py) class to define your run. Specify the source directory, the training script and its arguments, the compute target, and the environment."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import ScriptRunConfig\n",
|
|
"\n",
|
|
"model_name = 'pytorch-peds'\n",
|
|
"output_dir = './outputs/'\n",
|
|
"n_epochs = 2\n",
|
|
"\n",
|
|
"script_args = [\n",
|
|
" '--model_name', model_name,\n",
|
|
" '--output_dir', output_dir,\n",
|
|
" '--n_epochs', n_epochs,\n",
|
|
"]\n",
|
|
"# Add training script to run config\n",
|
|
"runconfig = ScriptRunConfig(\n",
|
|
" source_directory=project_folder,\n",
|
|
" script=\"script.py\",\n",
|
|
" arguments=script_args)\n",
|
|
"\n",
|
|
"# Attach compute target to run config\n",
|
|
"runconfig.run_config.target = cluster_name\n",
|
|
"\n",
|
|
"# Uncomment the line below if you want to try this locally first\n",
|
|
"#runconfig.run_config.target = \"local\"\n",
|
|
"\n",
|
|
"# Attach environment to run config\n",
|
|
"runconfig.run_config.environment = my_env"
|
|
]
|
|
},
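`script.py` is not shown in this notebook, so the following is only an assumption about how a training script could consume `script_args`. Every entry in `arguments` reaches the script as a command-line string, so `--n_epochs` needs `type=int` to come back as an integer. The function name `parse_training_args` and its default values are hypothetical.

```python
import argparse


def parse_training_args(argv=None):
    """Hypothetical parser mirroring the flags passed in script_args."""
    parser = argparse.ArgumentParser(description="Fine-tune Mask R-CNN on Penn-Fudan")
    parser.add_argument("--model_name", default="pytorch-peds")
    parser.add_argument("--output_dir", default="./outputs/")
    # Command-line values arrive as strings; convert the epoch count explicitly.
    parser.add_argument("--n_epochs", type=int, default=2)
    return parser.parse_args(argv)


# Simulate the command line produced from script_args (values become strings).
args = parse_training_args(["--model_name", "pytorch-peds",
                            "--output_dir", "./outputs/",
                            "--n_epochs", str(2)])
print(args.model_name, args.output_dir, args.n_epochs)
```

Writing the trained weights under `./outputs/` is what makes the later registration step work: Azure ML uploads the contents of the `outputs` folder to the run record, and `register_model` resolves `model_path` relative to the run's files.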
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Train remotely\n",
|
|
"\n",
|
|
"### Submit your run"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Submit run \n",
|
|
"run = experiment.submit(runconfig)\n",
|
|
"\n",
|
|
"# to get more details of your run\n",
|
|
"print(run.get_details())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Monitor your run\n",
|
|
"\n",
|
|
"Use a widget to keep track of your run. You can also view the status of the run within the [Azure Machine Learning service portal](https://ml.azure.com)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.widgets import RunDetails\n",
|
|
"\n",
|
|
"RunDetails(run).show()\n",
|
|
"run.wait_for_completion(show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Test your model\n",
|
|
"\n",
|
|
"Now that we are done training, let's see how well this model actually performs.\n",
|
|
"\n",
|
|
"### Get your latest run\n",
|
|
"First, pull the latest run using `experiment.get_runs()`, which lists runs from `experiment` in reverse chronological order."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Run\n",
|
|
"\n",
|
|
"last_run = next(experiment.get_runs())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Register your model\n",
|
|
"Next, [register the model](https://docs.microsoft.com/azure/machine-learning/concept-model-management-and-deployment#register-package-and-deploy-models-from-anywhere) from your run. Registering your model assigns it a version and helps you with auditability."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"last_run.register_model(model_name=model_name, model_path=os.path.join(output_dir, model_name))"
|
|
]
|
|
},
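Registering under the same `model_name` again does not overwrite the earlier model; the workspace keeps it and assigns the next version number. The toy in-memory registry below is purely illustrative (the class and its methods are hypothetical, not the Azure ML API) and models only two rules: versions count up per name, and a lookup by name alone, like `Model(workspace=ws, name=model_name)`, resolves to the latest version.

```python
from collections import defaultdict


class ToyModelRegistry:
    """Illustrative stand-in for a model registry (not the Azure ML API)."""

    def __init__(self):
        # Maps a model name to its registered paths, oldest first.
        self._versions = defaultdict(list)

    def register(self, name, path):
        # Each registration under an existing name receives the next version number.
        self._versions[name].append(path)
        return len(self._versions[name])

    def get(self, name, version=None):
        # Without an explicit version, resolve to the most recent registration.
        paths = self._versions.get(name, [])
        latest = len(paths)
        version = latest if version is None else version
        if not 1 <= version <= latest:
            raise KeyError(f"{name} has no version {version}")
        return version, paths[version - 1]


registry = ToyModelRegistry()
registry.register("pytorch-peds", "outputs/pytorch-peds")
registry.register("pytorch-peds", "outputs/pytorch-peds")
print(registry.get("pytorch-peds"))  # (2, 'outputs/pytorch-peds')
```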
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Download your model\n",
|
|
"Next, download this registered model. Notice how we can initialize the `Model` object with the name of the registered model, rather than a path to the file itself."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core import Model\n",
|
|
"\n",
|
|
"model = Model(workspace=ws, name=model_name)\n",
|
|
"path = model.download(target_dir='model', exist_ok=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Use your model to make a prediction\n",
|
|
"\n",
|
|
"Run inferencing on a single test image and display the results."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import torch\n",
|
|
"from data import PennFudanDataset\n",
|
|
"from script import get_transform, download_data, NUM_CLASSES\n",
|
|
"from model import get_instance_segmentation_model\n",
|
|
"\n",
|
|
"if torch.cuda.is_available():\n",
|
|
" device = torch.device('cuda')\n",
|
|
"else:\n",
|
|
" device = torch.device('cpu')\n",
|
|
"\n",
|
|
"# Instantiate model with correct weights, cast to correct device, place in evaluation mode\n",
|
|
"predict_model = get_instance_segmentation_model(NUM_CLASSES)\n",
|
|
"predict_model.to(device)\n",
|
|
"predict_model.load_state_dict(torch.load(path, map_location=device))\n",
|
|
"predict_model.eval()\n",
|
|
"\n",
|
|
"# Load dataset\n",
|
|
"root_dir=download_data()\n",
|
|
"dataset_test = PennFudanDataset(root=root_dir, transforms=get_transform(train=False))\n",
|
|
"\n",
|
|
"# pick the first image in the dataset\n",
|
|
"img, _ = dataset_test[0]\n",
|
|
"\n",
|
|
"with torch.no_grad():\n",
|
|
" prediction = predict_model([img.to(device)])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Display the input image\n",
|
|
"\n",
|
|
"While tensors are great for computers, a tensor of RGB values doesn't mean much to a human. Let's display the input image in a way that a human could understand."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from PIL import Image\n",
|
|
"\n",
|
|
"\n",
|
|
"Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())"
|
|
]
|
|
},
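The expression `img.mul(255).permute(1, 2, 0).byte()` undoes the dataset transform: the tensor is laid out as channels × height × width with values in [0, 1], whereas `PIL` expects height × width × channels with 8-bit values. The pure-Python sketch below performs the same reshuffle on nested lists so the index mapping is easy to follow. It assumes non-negative inputs already in [0, 1], for which `int()` truncation matches a float-to-uint8 cast.

```python
def chw_to_hwc_bytes(tensor_chw):
    """Convert a [C][H][W] list of floats in [0, 1] to [H][W][C] ints in 0..255."""
    channels = len(tensor_chw)
    height = len(tensor_chw[0])
    width = len(tensor_chw[0][0])
    # out[y][x][c] = in[c][y][x], i.e. permute(1, 2, 0); int() truncates like .byte()
    return [
        [[int(tensor_chw[c][y][x] * 255) for c in range(channels)] for x in range(width)]
        for y in range(height)
    ]


# A 3-channel, 1x2 image: the left pixel is pure red, the right pixel is mid grey.
image = [
    [[1.0, 0.5]],  # R channel
    [[0.0, 0.5]],  # G channel
    [[0.0, 0.5]],  # B channel
]
print(chw_to_hwc_bytes(image))  # [[[255, 0, 0], [127, 127, 127]]]
```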
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Display the predicted masks\n",
|
|
"\n",
|
|
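"The prediction is a list with one dictionary per input image. Its `masks` entry holds one soft mask per detected pedestrian, with per-pixel values between 0 and 1 that trace the pedestrian's outline. Let's take a look at the first two masks below."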
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"Image.fromarray(prediction[0]['masks'][1, 0].mul(255).byte().cpu().numpy())"
|
|
]
|
|
},
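In evaluation mode, torchvision's Mask R-CNN returns one dictionary per input image with `boxes`, `labels`, `scores`, and `masks`, where `masks` has shape `[N, 1, H, W]` (hence the `[i, 0]` indexing above) and holds probabilities rather than hard 0/1 values. A common post-processing step is to keep detections above a score threshold and binarize each mask. The stdlib sketch below shows that step on nested lists; the helper name `select_masks` and both 0.5 thresholds are assumptions, not values taken from this notebook.

```python
def select_masks(prediction, score_threshold=0.5, mask_threshold=0.5):
    """Keep confident detections and binarize their soft masks.

    `prediction` mimics one entry of the Mask R-CNN output: 'scores' is a list
    of floats and 'masks' a list of [1][H][W] probability grids (nested lists).
    """
    kept = []
    for score, mask in zip(prediction["scores"], prediction["masks"]):
        if score < score_threshold:
            continue  # drop low-confidence detections
        # mask[0] drops the singleton channel dimension, like masks[i, 0].
        binary = [[1 if p >= mask_threshold else 0 for p in row] for row in mask[0]]
        kept.append(binary)
    return kept


example = {
    "scores": [0.98, 0.30],
    "masks": [
        [[[0.9, 0.4], [0.6, 0.1]]],
        [[[0.7, 0.7], [0.7, 0.7]]],
    ],
}
print(select_masks(example))  # [[[1, 0], [1, 0]]]
```

With the real tensors, the same idea is roughly `keep = prediction[0]['scores'] > 0.5` followed by `prediction[0]['masks'][keep] >= 0.5`.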
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Next steps\n",
|
|
"\n",
|
|
"Congratulations! You just trained a Mask R-CNN model with PyTorch in Azure Machine Learning. As next steps, consider:\n",
|
|
"1. Learn more about using PyTorch in Azure Machine Learning by checking out the [README](./README.md).\n",
|
|
"2. Try exporting your model to [ONNX](https://docs.microsoft.com/azure/machine-learning/concept-onnx) for accelerated inferencing."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "gopalv"
|
|
}
|
|
],
|
|
"category": "training",
|
|
"compute": [
|
|
"AML Compute"
|
|
],
|
|
"datasets": [
|
|
"Custom"
|
|
],
|
|
"deployment": [
|
|
"None"
|
|
],
|
|
"exclude_from_index": false,
|
|
"framework": [
|
|
"PyTorch"
|
|
],
|
|
"friendly_name": "PyTorch object detection",
|
|
"index_order": 1,
|
|
"kernel_info": {
|
|
"name": "python3"
|
|
},
|
|
"kernelspec": {
|
|
"display_name": "Python 3.6",
|
|
"language": "python",
|
|
"name": "python36"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.5-final"
|
|
},
|
|
"nteract": {
|
|
"version": "nteract-front-end@1.0.0"
|
|
},
|
|
"tags": [
|
|
"remote run",
|
|
"docker"
|
|
],
|
|
"task": "Fine-tune PyTorch object detection model with a custom dockerfile"
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
} |