Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Model Development with Custom Weights

This example shows how to retrain a model with custom weights, fine-tune it with quantization, and then deploy it to run on an FPGA. Only Windows is supported. We use TensorFlow and Keras to build the model, and we use transfer learning with ResNet50 as the featurizer. We drop the last layer of ResNet50 and add our own classification layer using Keras.

The custom weights are trained with ImageNet on ResNet50. We will use the Kaggle Cats and Dogs dataset to retrain and fine-tune the model. The dataset can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=54765). Download the zip and extract it to a directory named 'catsanddogs' under your user directory ("~/catsanddogs").

Please set up your environment as described in the [quick start](project-brainwave-quickstart.ipynb).

In [None]:
import os
import sys
import tensorflow as tf
import numpy as np
from keras import backend as K

## Setup Environment
After you train your model in float32, you'll write the weights to a place on disk. We also need a location to store the models that get downloaded.

In [None]:
custom_weights_dir = os.path.expanduser("~/custom-weights")
saved_model_dir = os.path.expanduser("~/models")
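
On a fresh machine the two directories above may not exist yet. A minimal sketch, assuming it is acceptable to create them up front; the helper name `ensure_dirs` is ours and is not part of the SDK:

```python
import os

def ensure_dirs(*paths):
    """Create each directory if it is missing and return the expanded paths."""
    expanded = [os.path.expanduser(p) for p in paths]
    for path in expanded:
        # exist_ok=True makes the call safe to repeat on later runs.
        os.makedirs(path, exist_ok=True)
    return expanded

# Example (hypothetical usage with the notebook's two locations):
# custom_weights_dir, saved_model_dir = ensure_dirs("~/custom-weights", "~/models")
```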

## Prepare Data
Load the files we are going to use for training and testing. By default this notebook uses only a very small subset of the Cats and Dogs dataset. That makes it run relatively quickly.

In [None]:
import glob
import imghdr
datadir = os.path.expanduser("~/catsanddogs")

cat_files = glob.glob(os.path.join(datadir, 'PetImages', 'Cat', '*.jpg'))
dog_files = glob.glob(os.path.join(datadir, 'PetImages', 'Dog', '*.jpg'))

# Limit the data set to make the notebook execute quickly.
cat_files = cat_files[:64]
dog_files = dog_files[:64]

# The data set has a few images that are not jpeg. Remove them.
cat_files = [f for f in cat_files if imghdr.what(f) == 'jpeg']
dog_files = [f for f in dog_files if imghdr.what(f) == 'jpeg']

if not cat_files or not dog_files:
    print("Please download the Kaggle Cats and Dogs dataset from https://www.microsoft.com/en-us/download/details.aspx?id=54765 and extract the zip to " + datadir)
    raise ValueError("Data not found")
else:
    print(cat_files[0])
    print(dog_files[0])
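
The `imghdr` module used above was deprecated in Python 3.11 and removed in Python 3.13. If you run this notebook on a newer interpreter, one possible stand-in is to check the JPEG start-of-image marker directly. This is a sketch under that assumption: the names `is_jpeg` and `keep_jpegs` are ours, and the check is looser than `imghdr.what`, so it may accept a few files that `imghdr` would reject.

```python
def is_jpeg(path):
    """Return True if the file begins with the JPEG start-of-image bytes."""
    with open(path, "rb") as f:
        # A JPEG stream starts with FF D8, followed by a marker that begins with FF.
        return f.read(3) == b"\xff\xd8\xff"

def keep_jpegs(paths):
    """Filter a list of paths down to the files that look like JPEGs."""
    return [p for p in paths if is_jpeg(p)]

# Example (hypothetical): cat_files = keep_jpegs(cat_files)
```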

In [None]:
# Construct a numpy array as labels
image_paths = cat_files + dog_files
total_files = len(cat_files) + len(dog_files)
labels = np.zeros(total_files)
labels[len(cat_files):] = 1

In [None]:
# Split the image data into training and test sets
from sklearn.model_selection import train_test_split
onehot_labels = np.array([[0,1] if i else [1,0] for i in labels])
img_train, img_test, label_train, label_test = train_test_split(image_paths, onehot_labels, random_state=42, shuffle=True)

print(len(img_train), len(img_test), label_train.shape, label_test.shape)
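
To make the label encoding and the split sizes concrete, here is a dependency-free sketch of the same two steps. It assumes a 25% hold-out, which is the scikit-learn default when `test_size` is not given. The function names are ours, and the shuffle order will not match what `train_test_split` produces for the same seed.

```python
import math
import random

def one_hot(labels, num_classes=2):
    """Turn integer class labels into one-hot rows, e.g. 1 -> [0, 1]."""
    return [[1 if int(label) == k else 0 for k in range(num_classes)]
            for label in labels]

def split_train_test(items, labels, test_fraction=0.25, seed=42):
    """Shuffle with a fixed seed, then hold out a fraction for testing."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    n_test = math.ceil(len(items) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    train_items = [items[i] for i in train_idx]
    test_items = [items[i] for i in test_idx]
    train_labels = [labels[i] for i in train_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_items, test_items, train_labels, test_labels

# With 64 cats and 64 dogs this gives 96 training and 32 test images.
paths = ["img%d.jpg" % i for i in range(128)]
encoded = one_hot([0] * 64 + [1] * 64)
train, test, train_y, test_y = split_train_test(paths, encoded)
print(len(train), len(test))
```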

## Construct Model
We use ResNet50 for the featurizer and build our own classifier using Keras layers. We train the featurizer and the classifier as one model. The weights trained on ImageNet are used as the starting point for the retraining of our featurizer. The weights are loaded from TensorFlow checkpoint files.

Before passing the image dataset to the ResNet50 featurizer, we need to preprocess the input files to get them into the form expected by ResNet50. ResNet50 expects float tensors representing the images in BGR, channel-last order. We've provided a default implementation of the preprocessing that you can use.

In [None]:
import azureml.contrib.brainwave.models.utils as utils

def preprocess_images():
    # Convert images to 3D tensors [width,height,channel] - channels are in BGR order.
    in_images = tf.placeholder(tf.string)
    image_tensors = utils.preprocess_array(in_images)
    return in_images, image_tensors
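
To illustrate what "BGR, channel-last" means, the sketch below reverses the channel order of a tiny image stored as nested lists of `[R, G, B]` pixels. It is only an illustration of the layout, not the implementation of `utils.preprocess_array`, which may also decode, resize, and normalize the images.

```python
def rgb_to_bgr(image):
    """Reverse the channel order of an image stored as rows of [R, G, B] pixels.

    The image is channel-last: image[row][col] is the list of channel values.
    """
    return [[pixel[::-1] for pixel in row] for row in image]

# A 1x2 image: one pure-red pixel and one pure-green pixel.
rgb = [[[255, 0, 0], [0, 255, 0]]]
bgr = rgb_to_bgr(rgb)
print(bgr)
```

After the swap, red sits in the last channel; green is in the middle and does not move.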

We use the Keras layer APIs to construct the classifier. Because we're using the TensorFlow backend, we can train this classifier in one session with our ResNet50 model.

In [None]:
def construct_classifier(in_tensor):
    from keras.layers import Dropout, Dense, Flatten
    K.set_session(tf.get_default_session())
    
    FC_SIZE = 1024
    NUM_CLASSES = 2

    x = Dropout(0.2, input_shape=(1, 1, 2048,))(in_tensor)
    x = Dense(FC_SIZE, activation='relu', input_dim=(1, 1, 2048,))(x)
    x = Flatten()(x)
    preds = Dense(NUM_CLASSES, activation='softmax', input_dim=FC_SIZE, name='classifier_output')(x)
    return preds
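
As a sanity check on the size of this classifier, the sketch below counts its trainable parameters. It assumes the featurizer output has shape (1, 1, 2048), as the `input_shape` above suggests, and that each `Dense` layer acts on the last axis with a bias term. Dropout and Flatten add no parameters.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

FEATURES = 2048      # channels coming out of the featurizer (assumed)
FC_SIZE = 1024       # hidden layer width, as in construct_classifier
NUM_CLASSES = 2      # cat, dog

hidden = dense_params(FEATURES, FC_SIZE)
output = dense_params(FC_SIZE, NUM_CLASSES)
total = hidden + output
print(hidden, output, total)
```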

Now that every component of the model is defined, we can construct the model. Constructing the model with the Project Brainwave models takes two steps: first we import the graph definition, then we restore the weights of the model into a TensorFlow session. Because the quantized graph definition and the float32 graph definition share the same node names, we can initially train the weights in float32 and then reload them with the quantized operations (which take longer) to fine-tune the model.

In [None]:
def construct_model(quantized, starting_weights_directory = None):
    from azureml.contrib.brainwave.models import Resnet50, QuantizedResnet50
    
    # Convert images to 3D tensors [width,height,channel]
    in_images, image_tensors = preprocess_images()

    # Construct featurizer using quantized or unquantized ResNet50 model
    if not quantized:
        featurizer = Resnet50(saved_model_dir)
    else:
        featurizer = QuantizedResnet50(saved_model_dir, custom_weights_directory = starting_weights_directory)


    features = featurizer.import_graph_def(input_tensor=image_tensors)
    # Construct classifier
    preds = construct_classifier(features)
    
    # Initialize weights
    sess = tf.get_default_session()
    tf.global_variables_initializer().run()

    featurizer.restore_weights(sess)

    return in_images, image_tensors, features, preds, featurizer

## Train Model
First we train the model with custom weights but without quantization. Training is done with native float precision (32-bit floats). We load the training data set and train in batches for up to 10 epochs. When the performance reaches the desired level or starts to degrade, we stop the training iteration and save the weights as TensorFlow checkpoint files.

In [None]:
def read_files(files):
    """ Read files to array"""
    contents = []
    for path in files:
        with open(path, 'rb') as f:
            contents.append(f.read())
    return contents

In [None]:
def train_model(preds, in_images, img_train, label_train, is_retrain = False, train_epoch = 10):
    """ training model """
    from keras.objectives import binary_crossentropy
    from tqdm import tqdm
    
    learning_rate = 0.001 if is_retrain else 0.01
        
    # Specify the loss function
    in_labels = tf.placeholder(tf.float32, shape=(None, 2))   
    cross_entropy = tf.reduce_mean(binary_crossentropy(in_labels, preds))
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)

    def chunks(a, b, n):
        """Yield successive n-sized chunks from a and b."""
        if (len(a) != len(b)):
            print("a and b are not equal in chunks(a,b,n)")
            raise ValueError("Parameter error")

        for i in range(0, len(a), n):
            yield a[i:i + n], b[i:i + n]

    chunk_size = 16
    # Number of batches per epoch, counting a final partial batch if there is one.
    chunk_num = int(np.ceil(len(label_train) / chunk_size))

    sess = tf.get_default_session()
    for epoch in range(train_epoch):
        avg_loss = 0
        for img_chunk, label_chunk in tqdm(chunks(img_train, label_train, chunk_size)):
            contents = read_files(img_chunk)
            _, loss = sess.run([optimizer, cross_entropy],
                                feed_dict={in_images: contents,
                                           in_labels: label_chunk,
                                           K.learning_phase(): 1})
            avg_loss += loss / chunk_num
        print("Epoch:", (epoch + 1), "loss = ", "{:.3f}".format(avg_loss))
            
        # Reach desired performance
        if (avg_loss < 0.001):
            break
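
The batching helper and the loss can be tried out without TensorFlow. The sketch below repeats the `chunks` generator and adds a plain-Python binary cross-entropy for a single example. It assumes the usual definition (predictions clipped away from 0 and 1, then averaged over the classes) and is meant to show the arithmetic, not to replace the Keras function.

```python
import math

def chunks(a, b, n):
    """Yield successive n-sized chunks from a and b."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    for i in range(0, len(a), n):
        yield a[i:i + n], b[i:i + n]

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the classes of one example."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

# 100 items in batches of 16 give 7 batches; the last one holds 4 items.
items = list(range(100))
batch_sizes = [len(x) for x, _ in chunks(items, items, 16)]
print(batch_sizes)

# An undecided prediction on a cat costs log(2) per class.
print(binary_crossentropy([1, 0], [0.5, 0.5]))
```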

In [None]:
def test_model(preds, in_images, img_test, label_test):
    """Test the model"""
    from keras.metrics import categorical_accuracy

    in_labels = tf.placeholder(tf.float32, shape=(None, 2))
    accuracy = tf.reduce_mean(categorical_accuracy(in_labels, preds))
    contents = read_files(img_test)

    accuracy = accuracy.eval(feed_dict={in_images: contents,
                                        in_labels: label_test,
                                        K.learning_phase(): 0})
    return accuracy
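
Categorical accuracy is the fraction of examples whose highest-scoring class matches the one-hot label. A small stand-alone sketch of that calculation, using made-up scores:

```python
def categorical_accuracy(labels, preds):
    """Fraction of rows where the arg-max of the prediction matches the label."""
    hits = 0
    for label, pred in zip(labels, preds):
        if label.index(max(label)) == pred.index(max(pred)):
            hits += 1
    return hits / len(labels)

# Illustrative values only: the third prediction picks the wrong class.
labels = [[1, 0], [0, 1], [0, 1], [1, 0]]
preds = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]
print(categorical_accuracy(labels, preds))
```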

In [None]:
# Launch the training
tf.reset_default_graph()
sess = tf.Session(graph=tf.get_default_graph())

with sess.as_default():
    in_images, image_tensors, features, preds, featurizer = construct_model(quantized=False)
    train_model(preds, in_images, img_train, label_train, is_retrain=False, train_epoch=10)    
    accuracy = test_model(preds, in_images, img_test, label_test)  
    print("Accuracy:", accuracy)
    featurizer.save_weights(custom_weights_dir + "/rn50", tf.get_default_session())

## Test Model
After training, we evaluate the trained model's accuracy on the test dataset with quantization, so that we know how the model will perform once it is deployed on the FPGA.

In [None]:
tf.reset_default_graph()
sess = tf.Session(graph=tf.get_default_graph())

with sess.as_default():
    print("Testing trained model with quantization")
    in_images, image_tensors, features, preds, quantized_featurizer = construct_model(quantized=True, starting_weights_directory=custom_weights_dir)
    accuracy = test_model(preds, in_images, img_test, label_test)      
    print("Accuracy:", accuracy)

## Fine-Tune Model
Sometimes the model's accuracy drops significantly after quantization. In those cases, we need to retrain the model with quantization enabled to recover accuracy.

In [None]:
if (accuracy < 0.93):
    with sess.as_default():
        print("Fine-tuning model with quantization")
        train_model(preds, in_images, img_train, label_train, is_retrain=True, train_epoch=10)
        accuracy = test_model(preds, in_images, img_test, label_test)        
        print("Accuracy:", accuracy)

## Service Definition
As in the QuickStart notebook, our service definition pipeline consists of three stages.

In [None]:
from azureml.contrib.brainwave.pipeline import ModelDefinition, TensorflowStage, BrainWaveStage

model_def_path = os.path.join(saved_model_dir, 'model_def.zip')

model_def = ModelDefinition()
model_def.pipeline.append(TensorflowStage(sess, in_images, image_tensors))
model_def.pipeline.append(BrainWaveStage(sess, quantized_featurizer))
model_def.pipeline.append(TensorflowStage(sess, features, preds))
model_def.save(model_def_path)
print(model_def_path)
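
Conceptually, the three stages run in sequence: preprocessing, then the featurizer, then the classifier, each consuming the previous stage's output. The sketch below shows that chaining with plain functions. The stage functions are hypothetical stand-ins and do not reflect how `ModelDefinition` executes its pipeline.

```python
def run_pipeline(stages, value):
    """Feed value through each stage in order and return the final result."""
    for stage in stages:
        value = stage(value)
    return value

# Hypothetical stand-ins for the three stages.
def preprocess(raw_bytes):
    return [b / 255.0 for b in raw_bytes]

def featurize(pixels):
    return [sum(pixels), float(len(pixels))]

def classify(features):
    total = sum(features)
    return [f / total for f in features]

scores = run_pipeline([preprocess, featurize, classify], bytes([255, 0, 255]))
print(scores)
```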

## Deploy
Go to our [GitHub repo](https://aka.ms/aml-real-time-ai) "docs" folder to learn how to create a Model Management Account and find the required information below.

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()

The first time the code below runs it will create a new service running your model. If you want to change the model you can make changes above in this notebook and save a new service definition. Then this code will update the running service in place to run the new model.

In [None]:
from azureml.core.model import Model
from azureml.core.image import Image
from azureml.core.webservice import Webservice
from azureml.contrib.brainwave import BrainwaveWebservice, BrainwaveImage
from azureml.exceptions import WebserviceException

model_name = "catsanddogs-resnet50-model"
image_name = "catsanddogs-resnet50-image"
service_name = "modelbuild-service"

registered_model = Model.register(ws, model_def_path, model_name)

image_config = BrainwaveImage.image_configuration()
deployment_config = BrainwaveWebservice.deploy_configuration()
    
try:
    service = Webservice(ws, service_name)
    service.delete()
    service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)
except WebserviceException:
    service = Webservice.deploy_from_model(ws, service_name, [registered_model], image_config, deployment_config)

The service is now running in Azure and ready to serve requests. We can check the address and port.

In [None]:
print(service.ipAddress + ':' + str(service.port))

## Client
There is a simple test client at azureml.contrib.brainwave.client.PredictionClient which can be used for testing. We'll use this client to score an image with our new service.

In [None]:
from azureml.contrib.brainwave.client import PredictionClient
client = PredictionClient(service.ipAddress, service.port)

You can adapt the client [code](../../pythonlib/amlrealtimeai/client.py) to meet your needs. There is also an example C# [client](../../sample-clients/csharp).

The service provides an API that is compatible with TensorFlow Serving. There are instructions to download a sample client [here](https://www.tensorflow.org/serving/setup).

## Request
Let's see how our service does on a few images. It may get a few wrong.

In [None]:
# Score a few cat and dog images and check the predicted class
print('CATS')
for image_file in cat_files[:8]:
    results = client.score_image(image_file)
    result = 'CORRECT ' if results[0] > results[1] else 'WRONG '
    print(result + str(results))
print('DOGS')
for image_file in dog_files[:8]:
    results = client.score_image(image_file)
    result = 'CORRECT ' if results[1] > results[0] else 'WRONG '
    print(result + str(results))
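
If you would like a single summary number in place of per-image lines, the results can be tallied. The sketch below assumes each result is a two-element score list ordered `[cat, dog]`, as in the loop above. The helper name and the sample scores are made up for illustration.

```python
def count_correct(score_list, expected_index):
    """Count results whose expected class scores strictly higher than the other."""
    other = 1 - expected_index
    return sum(1 for s in score_list if s[expected_index] > s[other])

# Illustrative scores only, not real service output.
cat_scores = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
dog_scores = [[0.2, 0.8], [0.1, 0.9]]

correct = count_correct(cat_scores, 0) + count_correct(dog_scores, 1)
total = len(cat_scores) + len(dog_scores)
print("{} of {} correct".format(correct, total))
```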

## Cleanup
Run the cell below to delete your service.

In [None]:
service.delete()

## Appendix

License for plot_confusion_matrix:

New BSD License

Copyright (c) 2007-2018 The scikit-learn developers.
All rights reserved.


Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

  a. Redistributions of source code must retain the above copyright notice,
     this list of conditions and the following disclaimer.
  b. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the distribution.
  c. Neither the name of the Scikit-learn Developers  nor the names of
     its contributors may be used to endorse or promote products
     derived from this software without specific prior written
     permission. 


THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.
