Here is what we are going to build in this post 😊

[Image: Jetson Screen]

Introduction

In a previous blog post, I explained how to set up the Jetson-Nano developer kit (it can be seen as a small and cheap server with a GPU for inference). In this post, I will go through the steps to train and deploy a Machine Learning model with a web interface. The goal is not to build a state-of-the-art recognition model but rather to build a simple computer vision (alphanumeric recognition) web application based on a Convolutional Neural Network. The tools needed for this mini-project are:

  • A server for inference: Cloud instances, Jetson-Nano or simply your powerful laptop 💻.
  • Data: EMNIST Balanced, an extended version of MNIST with 131,600 characters in 47 balanced classes.
  • PyTorch: To train the deep learning model.
  • Docker: To make our life easier by packaging the application in a container.
  • Flask: For API and user interface.

Workflow

The steps are as follows:

  • Prepare the Docker image.
  • Get the data and train the machine learning model on Colab. Take a look at this post to set up Colab with an SSH connection.
  • Show a simple inference example.
  • Build the API and the user interface.

Configuration of Nvidia-Docker

If we plan to deploy the application on a GPU instance, Nvidia-docker is the easiest way to create GPU-enabled containers. Unfortunately, Nvidia-docker is not supported on Jetson Nano. This is why we have to do it manually, with a simple script that mounts the device and drivers inside the container. In a nutshell, it is like creating Nvidia-docker from scratch.

The script can be found on my Gist; it simply replaces the docker command used to run a container.


# Instead of using "sudo docker run IMAGE_NAME",
# the following command should be used
sudo ./mydocker.sh run IMAGE_NAME

Preparing Docker image and requirements

Once Docker is running with GPU capabilities, we are ready to prepare the Dockerfile for the application. Usually, this step is done after training and preparing the model/app. In our case, since the project is simple and we know which tools we are going to use, we can build the Docker images before moving forward. The following Dockerfile includes a wide range of tools and is somewhat heavy, but it contains everything we may need.

  • Jetson-Nano Dockerfile: Building it is not recommended (it may take a few minutes). Instead, you can find the image on Docker Hub, ready to use.

  • Cloud/PC: A minimalist Dockerfile is provided here.

Jetson Nano cannot install all packages the way a usual cloud instance can. PyTorch and TorchVision are therefore installed from wheels/source inside the Dockerfile.

########## On Jetson-Nano ##########

# On Jetson-Nano we pull the image from Docker Hub.
# Don't build it unless you are willing to wait a long time.
docker pull imadelh/jetson_pytorch_flask:arm_v1

# Then run it with mydocker.sh (a kind of Nvidia-docker) to get a bash shell
sudo ./mydocker.sh run -i -t --rm -v /home/imad:/home/root/ imadelh/jetson_pytorch_flask:arm_v1

########## On Cloud/PC ##########

# Build the image using the provided Dockerfile
sudo docker build -t flaskml .

# Then run it
sudo docker run -i -t --rm -p 8888:8888 -v **absolute path to app directory**:/app flaskml


Dataset and Machine Learning Model

Now we can tackle the Machine Learning problem. Our goal is to recognize a handwritten letter or digit.

Dataset and training

For training the model, I used EMNIST Balanced, an extended version of MNIST with 131,600 characters and 47 balanced classes. Training was done on Google Colab with SSH access, as explained here. The following script downloads the data and trains the model, saving logs and the best weights to disk.

# Download the script cnn.py from https://git.io/fjyyU and run it
python cnn.py

Training should look like this (I was training several models in this screenshot).

[Image: Training]

Other important steps in training ML models have been skipped here for simplicity, such as hyperparameter tuning, or training several models and doing model selection/cross-validation.

Inference

After training the model and saving the weights, we can easily test the predictions of our model against random examples from the testing dataset. In this step, it is important to set the PyTorch model to evaluation mode and to only compute the outputs (no training, and hence no gradient computation, is needed).

The following notebook https://colab.research.google.com/github/imadelh/ML-web-app/blob/master/Notebooks/emnist_inference_cnn-2.ipynb contains the necessary steps for inference. It downloads the weights of a trained model and runs inference on a random image.

It is important to apply the same transformations as for the training dataset. For some reason, the raw inputs from the EMNIST dataset are transposed; therefore, if we train the network on transposed examples, we have to keep them transposed for the validation/testing datasets as well.
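To make the transposition concrete, here is a minimal sketch in plain Python that uses a tiny 3x3 grid as a stand-in for a 28x28 EMNIST image. The helper name `transpose_image` is hypothetical and is not taken from the training script; with NumPy arrays or PyTorch tensors, the same effect is usually obtained by transposing the two spatial axes.

```python
def transpose_image(image):
    """Swap rows and columns of a 2D image stored as a list of rows."""
    return [list(column) for column in zip(*image)]


# Toy 3x3 "image": a vertical stroke in the first column.
raw = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
]

# After transposition the stroke lies along the first row.
fixed = transpose_image(raw)
```

Whichever orientation is used during training has to be applied again, unchanged, at validation, test and inference time.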

Packaging

Now we need to think about how to “ship” our model. This is what we call packaging: we simply put the inference steps together behind a simple API that we can use later.

# Load model
inference_model = MyModel(weights = './ml_model/trained_weights.pth', device = 'cpu')

# Get raw data
input_img = BytesIO(base64.urlsafe_b64decode(request.form['img']))

# Do inference
# The inference_model.predict method takes the raw data,
# applies all necessary transformations and outputs a vector of probabilities

res = inference_model.predict(input_img)

This way, it is easier to update the network/weights later, and the application does not care about what happens during inference as long as it gets the desired outputs.
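As a rough illustration of this idea, the sketch below shows one way such a wrapper could be organized. It is not the author's `MyModel`: the class `PackagedModel`, the functions `toy_preprocess` and `toy_forward`, and the use of a softmax on the outputs are all assumptions made for the example, and the "network" is a stand-in so that the snippet runs with the standard library only.

```python
import base64
import math
from io import BytesIO


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class PackagedModel:
    """Hypothetical wrapper: hides preprocessing and the network behind predict()."""

    def __init__(self, preprocess, forward):
        self.preprocess = preprocess  # raw bytes -> model input
        self.forward = forward        # model input -> raw scores

    def predict(self, raw_stream):
        features = self.preprocess(raw_stream.read())
        return softmax(self.forward(features))


# Stand-ins for the real image decoding and the trained CNN.
def toy_preprocess(raw_bytes):
    return [b / 255 for b in raw_bytes]


def toy_forward(features):
    return [sum(features), 0.0, -sum(features)]


model = PackagedModel(toy_preprocess, toy_forward)

# The caller only deals with raw data, as in the snippet above.
encoded = base64.urlsafe_b64encode(bytes([255, 255]))
probabilities = model.predict(BytesIO(base64.urlsafe_b64decode(encoded)))
```

Swapping in a new network or new weights then only means changing what is passed to the wrapper; the code that calls `predict` stays the same.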


Flask API and user interface (Web Application)

Finally, we get to the exciting part, where we make our Machine Learning model work in a real environment with a user interface. To do this we use Flask, a simple and powerful Python web framework that can be used to build an API and therefore call the inference function from the browser. The application file looks like this.

from flask import Flask, request, jsonify, render_template
import base64, json
from io import BytesIO
from ml_model.model import MyModel
import numpy as np

# declare constants
HOST = '0.0.0.0'
PORT = 8888

# initialize flask application
app = Flask(__name__)

# Read model to keep it ready all the time
model = MyModel('./ml_model/trained_weights.pth', 'cpu')
CLASS_MAPPING = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt'

# Application template html/css/js
@app.route('/')
def home():
    return render_template("home.html")

# Prediction method
@app.route('/predict', methods=['GET','POST'])
def predict():
    results = {"prediction" :"Empty", "probability" :{}}

    # get data
    input_img = BytesIO(base64.urlsafe_b64decode(request.form['img']))

    # The model.predict method takes the raw data and outputs a vector of probabilities
    res = model.predict(input_img)

    results["prediction"] = str(CLASS_MAPPING[np.argmax(res)])
    results["probability"] = float(np.max(res))*100

    # output data
    return json.dumps(results)

if __name__ == '__main__':
    # run web server
    app.run(host=HOST,
            debug=True,  # automatic reloading enabled
            port=PORT)

This reads an input image and returns the prediction with its associated probability, which is then displayed by the HTML template file home.html, available in the templates folder (https://github.com/imadelh/ML-web-app/tree/master/app/templates). To run this application using the Docker image, we can proceed as follows:
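For readers who want to call the /predict route without the browser, here is a small sketch of the data format on both sides of the request. The helper names `build_form` and `parse_response` are hypothetical, the image bytes are a placeholder, and the response values are made up; the actual home.html may prepare the image differently.

```python
import base64
import json


def build_form(image_bytes):
    """Encode raw image bytes the way the /predict route decodes them."""
    return {"img": base64.urlsafe_b64encode(image_bytes).decode("ascii")}


def parse_response(body):
    """Read the JSON produced by the /predict route."""
    data = json.loads(body)
    return data["prediction"], data["probability"]


# Placeholder bytes; a real client would send the contents of the drawn image.
form = build_form(b"\x89PNG placeholder")

# The server side reverses the encoding with urlsafe_b64decode.
decoded = base64.urlsafe_b64decode(form["img"])

# Example response body with made-up values, in the shape the route returns.
label, percent = parse_response('{"prediction": "A", "probability": 97.5}')
```

The `form` dictionary would be sent as the form fields of a POST request to /predict, for example from a small test script.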

# Run a container
sudo ./mydocker.sh run -i -t --rm -v /home/imad:/home/root/ imadelh/jetson_pytorch_flask:arm_v1

# Now you are inside the container bash, go to the application directory and run
python3 app.py

# See github repo for more details https://github.com/imadelh/ML-web-app
# The service will be running at localhost:8888

For information, Flask's native “webserver” app.run() is not meant for a production environment that must scale to a large number of requests. Other tools, such as Gunicorn (https://gunicorn.org), may be used for that purpose.
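As a hedged sketch of what this could look like, Gunicorn can read its settings from a Python configuration file. The file name and the values below are assumptions chosen for illustration, not something taken from the repository, and Gunicorn is assumed to be installed in the image.

```python
# gunicorn.conf.py (hypothetical file name and values)
bind = "0.0.0.0:8888"   # same host and port as app.run() in app.py
workers = 2             # number of worker processes; tune for the machine
timeout = 60            # seconds before an unresponsive worker is restarted
```

The server would then be started from the application directory with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` refers to the `app` object defined in app.py. By default each worker process imports app.py, so the model is loaded once per worker, which matters on a memory-constrained device.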

Conclusion

This was a mini-project done over a weekend, and it is still very far from a real-world Machine Learning application that would scale to a high number of users. But out of small things, greater things have been produced, and we got to see the necessary steps to train and deploy an ML model in a very simplified way. Here are some points for improvement:

  • Hyperparameter optimization;
  • Neural network pruning;
  • Center and crop image;
  • Show top-n predictions;
  • Model versioning;
  • Load balancer in a cluster;
  • Add correction and submit options - Online Re-training.
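
One of the improvements listed above, showing the top-n predictions, could be sketched as follows. The function `top_n` is hypothetical and the probability vector is made up; in the application it would be the output of `model.predict`.

```python
CLASS_MAPPING = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt'


def top_n(probabilities, n=3):
    """Return the n most likely characters with their probabilities."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(CLASS_MAPPING[index], prob) for index, prob in ranked[:n]]


# Made-up probability vector with one value per class (47 in total).
probs = [0.0] * len(CLASS_MAPPING)
probs[10] = 0.7   # 'A'
probs[4] = 0.2    # '4'
probs[17] = 0.1   # 'H'

best = top_n(probs, n=3)
```

The resulting list of (character, probability) pairs could be added to the JSON returned by the /predict route and rendered by the template.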