Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Object detection with PyTorch, Mask R-CNN, and a custom Dockerfile

In this tutorial, you will fine-tune a pre-trained [Mask R-CNN](https://arxiv.org/abs/1703.06870) model on images from the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/). The dataset has 170 images with 345 instances of pedestrians. After running this tutorial, you will have a model that can outline the silhouettes of pedestrians in an image.

You'll use Azure Machine Learning to:

- Initialize a workspace 
- Create a compute cluster
- Define a training environment
- Train a model remotely
- Register your model
- Generate predictions locally

## Prerequisites

- If you are using an Azure Machine Learning Notebook VM, your environment already meets these prerequisites. Otherwise, go through the [configuration notebook](../../../../../configuration.ipynb) to install the Azure Machine Learning Python SDK and [create an Azure ML Workspace](https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace#create-a-workspace). You also need matplotlib 3.2, pycocotools 2.0.0, torchvision >= 0.5.0, and torch >= 1.4.0.


In [None]:
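# Check core SDK version number, check other dependencies
import azureml.core
import matplotlib
import pycocotools  # imported only to confirm that it is installed
import torch
import torchvision

print("SDK version:", azureml.core.VERSION)
print("matplotlib version:", matplotlib.__version__)
print("torch version:", torch.__version__)
print("torchvision version:", torchvision.__version__)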
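
Printing version strings still leaves the comparison to you. To check the minimums from the prerequisites (torch >= 1.4.0, torchvision >= 0.5.0) automatically, a small comparison helper is one option. This is a sketch: `version_tuple`, `meets_minimum`, and `REQUIREMENTS` are illustrative names, not part of any library, and the helper deliberately ignores local tags such as `+cu101` and pre-release suffixes. For full PEP 440 semantics, `packaging.version.parse` is the more robust choice.

```python
import re

# Minimum versions taken from the prerequisites (illustrative dictionary).
REQUIREMENTS = {"torch": "1.4.0", "torchvision": "0.5.0"}


def version_tuple(version: str) -> tuple:
    """Turn '1.4.0+cu101' into (1, 4, 0), keeping only the leading numeric release."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Return True if `installed` is at least `required`, comparing numerically."""
    inst = version_tuple(installed)
    req = version_tuple(required)
    # Pad the shorter tuple with zeros so '1.4' compares equal to '1.4.0'.
    width = max(len(inst), len(req))
    inst += (0,) * (width - len(inst))
    req += (0,) * (width - len(req))
    return inst >= req
```

In the notebook you could call, for example, `meets_minimum(torch.__version__, REQUIREMENTS["torch"])`. Comparing integer tuples rather than raw strings matters: as strings, `"1.10.0"` would sort before `"1.4.0"`.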


## Diagnostics

Opt in to diagnostics collection to help improve the experience, quality, and security of future releases.

In [None]:
from azureml.telemetry import set_diagnostics_collection

set_diagnostics_collection(send_diagnostics=True)

## Initialize a workspace

Initialize a [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) object from the existing workspace you created in the Prerequisites step. The [from_config()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#from-config-path-none--auth-none---logger-none---file-name-none-) method creates the workspace object from the details stored in `config.json`.

In [None]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')
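
`config.json` is a small JSON file. The sketch below assumes the three keys a workspace config file typically contains (`subscription_id`, `resource_group`, `workspace_name`) and uses placeholder values. `read_workspace_config` is a hypothetical helper for spotting a malformed file early; it is not a replacement for `Workspace.from_config()`, which also handles authentication and, by default, searches the current directory and its parents for the file.

```python
import json
import os
import tempfile

# Keys assumed to be present in a workspace config file.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def read_workspace_config(path: str) -> dict:
    """Load a workspace config file and check that the expected keys are present."""
    with open(path, "r") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    return config


# Placeholder values only; this is not a real subscription.
example = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my-resource-group",
    "workspace_name": "my-workspace",
}

# Write the example to a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w") as f:
        json.dump(example, f)
    loaded = read_workspace_config(config_path)

print(loaded["workspace_name"])
```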

## Create or attach existing Azure ML Managed Compute

You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/concept-compute-target) for training your model. In this tutorial, we use [Azure ML managed compute](https://docs.microsoft.com/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for our remote training compute resource. Specifically, the below code creates a `STANDARD_NC6` GPU cluster that autoscales from 0 to 4 nodes.

**Creating the compute takes approximately 5 minutes.** If an Azure ML compute target with that name already exists in your workspace, the code skips the creation process.

As with other Azure services, there are limits on certain resources associated with the Azure Machine Learning service. Read [this article](https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota.

> Note that the below code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException


# choose a name for your cluster
cluster_name = 'gpu-cluster'

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', 
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())
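
The cell above leaves `min_nodes` of `AmlCompute.provisioning_configuration` at its default of 0, which is why the cluster can shrink to zero nodes (and zero compute cost) when idle. The toy functions below are a simplified mental model of the 0-to-4 bound, assuming one node per queued job. They are not Azure ML's actual scheduler, which, among other things, waits `idle_seconds_before_scaledown` seconds before releasing idle nodes.

```python
def nodes_for_queue(queued_jobs: int, min_nodes: int = 0, max_nodes: int = 4) -> int:
    """Toy model: request one node per queued job, clamped to [min_nodes, max_nodes]."""
    return max(min_nodes, min(max_nodes, queued_jobs))


def jobs_waiting(queued_jobs: int, max_nodes: int = 4) -> int:
    """Jobs that cannot start yet because every node is busy (one job per node)."""
    return max(0, queued_jobs - max_nodes)


for queued in [0, 1, 3, 6]:
    print(f"{queued} queued -> {nodes_for_queue(queued)} nodes, {jobs_waiting(queued)} waiting")
```

Under this model, a fifth or sixth concurrent job does not get a new node; it waits until one of the four nodes frees up.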

## Define a training environment

### Create a project directory
Create a directory that will contain all the code from your local machine that the remote resource needs. This includes the training script and any additional files your training script depends on.

In [None]:
import os

project_folder = './pytorch-peds'

try:
    os.makedirs(project_folder, exist_ok=False)
except FileExistsError:
    print('project folder {} exists, moving on...'.format(project_folder))

### Copy training script and dependencies into project directory

In [None]:
import shutil

files_to_copy = ['data', 'model', 'script', 'utils', 'transforms', 'coco_eval', 'engine', 'coco_utils']
for file in files_to_copy:
    shutil.copy(os.path.join(os.getcwd(), (file + '.py')), project_folder)
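
The loop above stops with a `FileNotFoundError` at the first module missing from the current directory. A variant that copies whatever it can and reports every missing file at once can make setup problems easier to diagnose. `copy_modules` is a hypothetical helper; it also uses `os.makedirs(..., exist_ok=True)` in place of the try/except pattern. The demo runs in throwaway temporary directories with placeholder files.

```python
import os
import shutil
import tempfile


def copy_modules(module_names, src_dir, dst_dir):
    """Copy `<name>.py` for each module into dst_dir; return names that were not found."""
    os.makedirs(dst_dir, exist_ok=True)
    missing = []
    for name in module_names:
        src = os.path.join(src_dir, name + ".py")
        if os.path.isfile(src):
            shutil.copy(src, dst_dir)
        else:
            missing.append(name)
    return missing


# Demo: only two of the three requested modules exist in the source directory.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in ["data", "model"]:
        with open(os.path.join(src, name + ".py"), "w") as f:
            f.write("# placeholder\n")
    missing = copy_modules(["data", "model", "script"], src, dst)
    copied = sorted(os.listdir(dst))

print("missing:", missing, "copied:", copied)
```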


### Create an experiment

In [None]:
from azureml.core import Experiment

experiment_name = 'pytorch-peds'
experiment = Experiment(ws, name=experiment_name)

### Specify dependencies with a custom Dockerfile

There are a number of ways to [use environments](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments) for specifying dependencies during model training. In this case, we use a custom Dockerfile.

In [None]:
from azureml.core import Environment

my_env = Environment(name='maskr-docker')
my_env.docker.enabled = True
with open("dockerfiles/Dockerfile", "r") as f:
    dockerfile_contents=f.read()
my_env.docker.base_dockerfile=dockerfile_contents
my_env.docker.base_image = None
my_env.python.interpreter_path = '/opt/miniconda/bin/python'
my_env.python.user_managed_dependencies = True



### Create a ScriptRunConfig

Use the [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py) class to define your run. Specify the source directory, compute target, and environment.

In [None]:
from azureml.core import ScriptRunConfig

model_name = 'pytorch-peds'
output_dir = './outputs/'
n_epochs = 2

script_args = [
    '--model_name', model_name,
    '--output_dir', output_dir,
    '--n_epochs', n_epochs,
]
# Add training script to run config
runconfig = ScriptRunConfig(
    source_directory=project_folder,
    script="script.py",
    arguments=script_args)

# Attach compute target to run config
runconfig.run_config.target = cluster_name

# Uncomment the line below if you want to try this locally first
#runconfig.run_config.target = "local"

# Attach environment to run config
runconfig.run_config.environment = my_env
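
`script.py` is not shown in this notebook, so the sketch below is only a guess at how a training script could read the three flags in `script_args` with `argparse`; `parse_args` and its defaults are assumptions. Because command-line arguments arrive as strings, `type=int` turns `--n_epochs 2` back into an integer. The `./outputs/` default matters: Azure ML uploads files written to the run's `outputs` folder to the run history, which is what later lets you register the model from the run.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical argument parser matching the flags in script_args."""
    parser = argparse.ArgumentParser(description="Fine-tune Mask R-CNN on Penn-Fudan")
    parser.add_argument("--model_name", type=str, default="pytorch-peds")
    parser.add_argument("--output_dir", type=str, default="./outputs/")
    parser.add_argument("--n_epochs", type=int, default=2)
    return parser.parse_args(argv)


# Simulate the command line the run would receive (every value is a string).
args = parse_args(["--model_name", "pytorch-peds", "--output_dir", "./outputs/", "--n_epochs", "2"])
print(args.model_name, args.output_dir, args.n_epochs)
```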

## Train remotely

### Submit your run

In [None]:
# Submit run 
run = experiment.submit(runconfig)

# to get more details of your run
print(run.get_details())

### Monitor your run

Use a widget to keep track of your run. You can also view the status of the run within the [Azure Machine Learning service portal](https://ml.azure.com).

In [None]:
from azureml.widgets import RunDetails

RunDetails(run).show()
run.wait_for_completion(show_output=True)

## Test your model

Now that we are done training, let's see how well this model actually performs.

### Get your latest run
First, pull the latest run using `experiment.get_runs()`, which lists runs from `experiment` in reverse chronological order.

In [None]:
last_run = next(experiment.get_runs())
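
`experiment.get_runs()` returns a generator, so `next(...)` takes its first (most recent) item and raises `StopIteration` if the experiment has no runs yet. Passing a default to `next` avoids that. The sketch uses plain strings as hypothetical stand-ins for `Run` objects. Also note that the most recent run is the one you just submitted only if nobody else has submitted to the same experiment since; the `run` object from the submission cell refers to your run directly.

```python
def latest_run(runs_newest_first):
    """Return the first item of a newest-first iterable, or None if it is empty."""
    return next(iter(runs_newest_first), None)


# Hypothetical run IDs, listed newest first as get_runs() would yield them.
history = (run_id for run_id in ["run-3", "run-2", "run-1"])
print(latest_run(history))
print(latest_run(iter([])))
```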

### Register your model
Next, [register the model](https://docs.microsoft.com/azure/machine-learning/concept-model-management-and-deployment#register-package-and-deploy-models-from-anywhere) from your run. Registering your model assigns it a version and helps you with auditability.

In [None]:
last_run.register_model(model_name=model_name, model_path=os.path.join(output_dir, model_name))
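
Registering twice under the same name does not overwrite the earlier model: Azure ML assigns a new version number, and constructing `Model(workspace=ws, name=model_name)` without a `version` argument returns the latest one. The toy in-memory registry below illustrates that bookkeeping only; `ToyModelRegistry` is hypothetical and says nothing about how Azure ML actually stores models.

```python
class ToyModelRegistry:
    """In-memory stand-in for a model registry: each name keeps an ordered list of versions."""

    def __init__(self):
        self._versions = {}

    def register(self, name, path):
        """Store a new version under `name` and return its version number (starting at 1)."""
        self._versions.setdefault(name, []).append(path)
        return len(self._versions[name])

    def get(self, name, version=None):
        """Return the path for `version`, defaulting to the latest one."""
        versions = self._versions.get(name, [])
        if not versions:
            raise KeyError(name)
        if version is None:
            version = len(versions)
        return versions[version - 1]


registry = ToyModelRegistry()
registry.register("pytorch-peds", "outputs/pytorch-peds (first run)")
registry.register("pytorch-peds", "outputs/pytorch-peds (second run)")
print(registry.get("pytorch-peds"))
```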

### Download your model
Next, download this registered model. Notice how we can initialize the `Model` object with the name of the registered model, rather than a path to the file itself.

In [None]:
from azureml.core import Model

model = Model(workspace=ws, name=model_name)
path = model.download(target_dir='model', exist_ok=True)

### Use your model to make a prediction

Run inference on a single test image and display the results.

In [None]:
import torch
from data import PennFudanDataset
from script import get_transform, download_data, NUM_CLASSES
from model import get_instance_segmentation_model

if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Instantiate model with correct weights, cast to correct device, place in evaluation mode
predict_model = get_instance_segmentation_model(NUM_CLASSES)
predict_model.to(device)
predict_model.load_state_dict(torch.load(path, map_location=device))
predict_model.eval()

# Load dataset
root_dir=download_data()
dataset_test = PennFudanDataset(root=root_dir, transforms=get_transform(train=False))

# Pick the first image (dataset_test is the full dataset loaded with evaluation transforms)
img, _ = dataset_test[0]

with torch.no_grad():
    prediction = predict_model([img.to(device)])


### Display the input image

While tensors are great for computers, a tensor of RGB values doesn't mean much to a human. Let's display the input image in a way that a human could understand.

In [None]:
from PIL import Image


Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
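
The one-liner above relies on the image tensor being in channel-first `(C, H, W)` layout with float values in `[0, 1]` (implied by the multiplication by 255), while PIL expects `(H, W, C)` bytes. The NumPy sketch below spells out the same conversion step by step. `chw_float_to_hwc_uint8` is a hypothetical helper, and the clip is a defensive addition that keeps values inside the `uint8` range.

```python
import numpy as np


def chw_float_to_hwc_uint8(chw: np.ndarray) -> np.ndarray:
    """Convert a (C, H, W) float image in [0, 1] to a (H, W, C) uint8 image."""
    scaled = np.clip(chw, 0.0, 1.0) * 255      # [0, 1] -> [0, 255]
    return scaled.transpose(1, 2, 0).astype(np.uint8)  # CHW -> HWC, truncate to bytes


# A tiny 3-channel, 2x4 test image: red channel 1.0, green 0.5, blue 0.0.
chw = np.stack([np.full((2, 4), 1.0), np.full((2, 4), 0.5), np.zeros((2, 4))])
hwc = chw_float_to_hwc_uint8(chw)
print(hwc.shape, hwc[0, 0])
```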

### Display the predicted masks

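The prediction is a list with one dictionary per input image. For Mask R-CNN, each dictionary holds `boxes`, `labels`, `scores`, and `masks`, where each mask is a per-pixel probability map outlining one detected pedestrian. Let's take a look at the first two masks below.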

In [None]:
Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())

In [None]:
Image.fromarray(prediction[0]['masks'][1, 0].mul(255).byte().cpu().numpy())
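
The raw masks are soft, per-pixel probabilities, and low-confidence detections are returned too. A common way to turn them into a single binary picture of all pedestrians is to keep detections whose score passes a threshold, binarize each mask, and take the union. The sketch below does this with NumPy on synthetic arrays; the 0.5 thresholds are assumptions to tune, not values from this tutorial, and `binary_masks` and `overlay` are hypothetical helpers.

```python
import numpy as np


def binary_masks(masks, scores, score_thresh=0.5, mask_thresh=0.5):
    """Keep confident detections and binarize their soft masks.

    masks: float array of shape (N, 1, H, W) with per-pixel probabilities.
    scores: float array of shape (N,) with detection confidences.
    Returns a bool array of shape (K, H, W) for the K detections at or above score_thresh.
    """
    keep = scores >= score_thresh
    return masks[keep, 0] > mask_thresh


def overlay(masks_bool):
    """Union of all kept masks as one uint8 image (255 marks a pedestrian pixel)."""
    return (masks_bool.any(axis=0) * 255).astype(np.uint8)


# Synthetic example: three 2x2 masks, the last one with a low detection score.
masks = np.zeros((3, 1, 2, 2))
masks[0, 0, 0, 0] = 0.9
masks[1, 0, 1, 1] = 0.8
masks[2, 0, 0, 1] = 0.9
scores = np.array([0.95, 0.7, 0.2])

kept = binary_masks(masks, scores)
union = overlay(kept)
print(kept.shape)
print(union)
```

With the prediction above, you could call `overlay(binary_masks(prediction[0]['masks'].cpu().numpy(), prediction[0]['scores'].cpu().numpy()))` and pass the result to `Image.fromarray`.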

## Next steps

Congratulations! You just trained a Mask R-CNN model with PyTorch in Azure Machine Learning. As next steps, consider:
1. Learn more about using PyTorch in Azure Machine Learning by checking out the [README](./README.md).
2. Try exporting your model to [ONNX](https://docs.microsoft.com/azure/machine-learning/concept-onnx) for accelerated inferencing.