Here is what we are going to build in this post 😊

[Figure: Jetson Screen]


In a previous blog post, I explained how to set up the Jetson Nano developer kit. In this post, I will go through the main steps to train and deploy a Machine Learning model with a web interface. The goal is not to build a state-of-the-art recognition model, but rather to illustrate and build a simple computer vision (alphanumeric recognition) web application based on Convolutional Neural Networks. The tools needed for this mini-project are:

  • Inference instance: Cloud instance, Jetson Nano or your personal computer 💻.
  • Data: EMNIST Balanced, an extended version of MNIST with 131,600 characters in 47 balanced classes.
  • PyTorch: To train the Deep Learning model.
  • Docker: To create a containerized application.
  • Flask: For API and user interface.


The steps are as follows:

  • Prepare the Docker image
  • Get the data and train the machine learning model. Take a look at this post to set up Colab with an SSH connection
  • Show a simple inference example
  • Build the API and the user interface

Configuration of Nvidia Docker

Update: Nvidia Docker is supported on Jetson:

If we are planning to deploy the application on a GPU instance, Nvidia Docker is the easiest way to create GPU-enabled containers. When this post was first written, Nvidia Docker was not supported on Jetson Nano (see the update above). The workaround on Jetson Nano is to mount the GPU and the drivers in the container so that they are available for inference.

The script provided in this Gist mounts all required drivers and can be used in place of the docker command to run a container.

# instead of using `sudo docker run IMAGE_NAME`
# use `./`
sudo ./ run IMAGE_NAME

Preparing Docker image and requirements

After adding GPU support to Docker, we can prepare the Dockerfile for the application. The following Dockerfile has all the needed tools and packages for this project.

  • Jetson Nano dockerfile: Building it on the device is not recommended (it may take a few minutes). Instead, you can find the image on Docker Hub, ready to use (with the tag arm_v1).

  • Cloud/PC: A minimalist Dockerfile is provided here.

On Jetson Nano, not every package can be installed the way it is on a typical cloud instance. PyTorch and TorchVision are therefore installed from wheels/source inside the Dockerfile.

########## ON JETSON Nano ##########

#On Jetson Nano we pull the image from Docker Hub
#Avoid build - Slow on Jetson
docker pull imadelh/jetson_pytorch_flask:arm_v1

#Then run a container with the GPU script to get a bash shell
sudo ./ run -i -t --rm -v $(pwd):/home/root/ imadelh/jetson_pytorch_flask:arm_v1

########## On Cloud/PC ##########
#Build the image using the provided Dockerfile
sudo docker build -t flaskml .

#then run it
sudo docker run -i -t --rm -p 8888:8888 -v $(pwd):/app flaskml

Dataset and Machine Learning Model

Now we can tackle the Machine Learning problem: recognizing handwritten letters of the alphabet and digits.

Dataset and training

To train the model, we use data from EMNIST Balanced, an extended version of MNIST with 131,600 characters and 47 balanced classes. Training is run on Google Colab with SSH access as explained here. This script downloads the data and trains the model while saving logs and the best weights on disk.

# Download the script from and run it

Running the training script looks like this (training multiple models).

[Figure: Training]

Other important steps in training the ML model have been skipped for simplicity (hyper-parameter tuning, training different architectures, model selection/cross-validation).


After training the model and saving the weights, we can easily test its predictions against random examples from the testing dataset. It is important to set the PyTorch model to evaluation mode with model.eval(), since no training happens at inference time: this makes layers such as dropout and batch normalization behave deterministically.

The following notebook contains the necessary steps for inference. It downloads the weights of a trained model and runs inference on a random image.

During inference, we need to apply the same transformations as in the training step. For some reason, the images in the EMNIST dataset are stored transposed, so if we train the network on transposed examples, we have to keep this transformation for the validation/testing dataset and for any new input.
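
One way to keep training and inference consistent is to route both through a single preprocessing function. The sketch below is only an illustration in pure Python (no PyTorch required); the names `transpose` and `preprocess` are hypothetical, and the tiny nested list stands in for a 28x28 EMNIST image.

```python
def transpose(image):
    """Swap rows and columns of an image stored as a list of rows."""
    return [list(column) for column in zip(*image)]


def preprocess(image):
    """Single preprocessing entry point shared by training and inference."""
    return transpose(image)


# A tiny 2x3 "image" standing in for a 28x28 EMNIST sample
sample = [[1, 2, 3],
          [4, 5, 6]]

train_input = preprocess(sample)      # what the network sees during training
inference_input = preprocess(sample)  # new inputs go through the same function

print(train_input)
print(train_input == inference_input)
```

Because both paths call the same function, a change to the transformation cannot silently apply to one side only.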


At this step, creating an API that takes an input and returns the expected output is a simple solution to use the model on new images.

# Load model
inference_model = MyModel(weights = './ml_model/trained_weights.pth', device = 'cpu')

# Get raw data
input_img = BytesIO(base64.urlsafe_b64decode(request.form['img']))

# Do inference
# inference_model.predict takes the raw data, applies all necessary
# transformations and outputs a vector of probabilities

res = inference_model.predict(input_img)

This approach gives the flexibility of updating the network/model later while keeping the same user interface.
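
To make the "raw data" step more concrete, here is a minimal sketch of the round trip implied by `base64.urlsafe_b64decode(request.form['img'])`: a client encodes image bytes as URL-safe base64 and sends them in a form field named `img`, and the server decodes them back into a `BytesIO`. The payload bytes are made up, and `urlencode`/`parse_qs` only approximate what a browser and Flask do with the form body.

```python
import base64
from io import BytesIO
from urllib.parse import parse_qs, urlencode

# Stand-in for the bytes of an image file (hypothetical payload, not a real image)
image_bytes = b"\x89PNG\r\n\x1a\n fake image payload"

# Client side: URL-safe base64 text, sent as the form field "img"
encoded = base64.urlsafe_b64encode(image_bytes).decode("ascii")
form_body = urlencode({"img": encoded})

# Server side: roughly what request.form["img"] yields once the body is parsed
received = parse_qs(form_body)["img"][0]
input_img = BytesIO(base64.urlsafe_b64decode(received))

# The decoded buffer holds exactly the bytes the client started with
print(input_img.getvalue() == image_bytes)
```

The URL-safe alphabet replaces `+` and `/` with `-` and `_`, which avoids characters that have a special meaning in URLs and form bodies.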

Flask API and user interface (Web Application)

To build the user interface, we use Flask, a Python web framework for building simple APIs. The application file is as follows:

from flask import Flask, request, jsonify, render_template
import base64, json
from io import BytesIO
from ml_model.model import MyModel
import numpy as np

# declare constants
HOST = '0.0.0.0'  # listen on all interfaces so the container port is reachable
PORT = 8888

# initialize flask application
app = Flask(__name__)

# Read model to keep it ready all the time
model = MyModel('./ml_model/trained_weights.pth', 'cpu')

# Application template html/css/js
@app.route('/')
def home():
    return render_template("home.html")

# Prediction method
@app.route('/predict', methods=['GET','POST'])
def predict():
    results = {"prediction" :"Empty", "probability" :{}}

    # get data
    input_img = BytesIO(base64.urlsafe_b64decode(request.form['img']))

    # model.predict takes the raw data and outputs a vector of probabilities
    res = model.predict(input_img)

    # CLASS_MAPPING (not shown here) maps a class index to its character
    results["prediction"] = str(CLASS_MAPPING[np.argmax(res)])
    results["probability"] = float(np.max(res))*100

    # output data
    return json.dumps(results)

if __name__ == '__main__':
    # run web server
    app.run(host=HOST,
            debug=True,  # automatic reloading enabled
            port=PORT)
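
The `predict` route indexes a `CLASS_MAPPING` that is not defined in the excerpt above. As a rough, self-contained sketch of that step, the code below assumes the standard EMNIST Balanced label order (digits, uppercase letters, then the 11 lowercase letters that keep their own class); the author's actual mapping may differ. The helper name `build_response` is hypothetical, and the argmax is written in pure Python so the example runs without NumPy.

```python
import string

# Assumed label order: the standard EMNIST Balanced mapping
# (digits, uppercase letters, then the 11 lowercase letters with their own class).
CLASS_MAPPING = string.digits + string.ascii_uppercase + "abdefghnqrt"


def build_response(probabilities):
    """Turn a vector of class probabilities into the dict returned by /predict."""
    if len(probabilities) != len(CLASS_MAPPING):
        raise ValueError("expected one probability per class")
    # Index of the most likely class (pure-Python equivalent of np.argmax)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return {
        "prediction": CLASS_MAPPING[best],
        "probability": float(probabilities[best]) * 100,
    }


# Example: a made-up probability vector with most of its mass on class 10 ('A')
fake_probs = [0.01] * 47
fake_probs[10] = 0.54
response = build_response(fake_probs)
print(response["prediction"])
```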

This reads an input image, returns the prediction with its associated probability, and displays it in the HTML template file home.html, available in the templates folder.

Commands to run this application using the Docker image:

# Run a container
sudo ./ run -i -t --rm -v $(pwd):/home/root/ imadelh/jetson_pytorch_flask:arm_v1

# Now you are inside the container bash, go to the application directory and run
cd app

# See the github repo for more details
# The service will be running at localhost:8888

Note that Flask's built-in web server is not meant for a production environment that has to scale to a large number of requests. Other tools, such as Gunicorn, can be used for that purpose.
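
As an illustration only, a Gunicorn setup for this application could start from a small configuration file like the one below. Gunicorn reads its settings from a plain Python file; the file name `gunicorn.conf.py` and the values chosen here are assumptions, not something taken from the original project.

```python
# gunicorn.conf.py (hypothetical file name; the values below are assumptions)
bind = "0.0.0.0:8888"   # same port as the Flask example
workers = 2             # a small count; by default each worker loads its own model copy
timeout = 60            # seconds before an unresponsive worker is restarted
```

Assuming the application file is named `app.py` and exposes the Flask object as `app`, the server would then be started with `gunicorn -c gunicorn.conf.py app:app`. Since each worker process holds its own copy of the model, the worker count is worth keeping low on a memory-constrained device such as the Jetson Nano.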


This project is a proof of concept on how to deploy an ML model on Jetson Nano. Real-world Machine Learning applications that would scale to a higher number of users would require a better architecture.


  • Hyperparameter optimization ;
  • Neural network pruning ;
  • Center and crop image ;
  • Show top-n predictions ;
  • Model versioning ;
  • Load balancer in a cluster ;
  • Add correction and submit options - Online Re-training.
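
As a small sketch of one item from this list, showing the top-n predictions only requires ranking the probability vector the model already returns. The function name `top_n` and the example labels and scores are hypothetical.

```python
import heapq


def top_n(probabilities, labels, n=3):
    """Return the n most likely (label, probability) pairs, best first."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    # Indices of the n largest probabilities, in descending order
    best = heapq.nlargest(n, range(len(probabilities)), key=probabilities.__getitem__)
    return [(labels[i], probabilities[i]) for i in best]


# Made-up scores for four easily confused characters
labels = ["0", "O", "D", "Q"]
probs = [0.40, 0.35, 0.20, 0.05]
print(top_n(probs, labels, n=2))
```

Returning a few candidates instead of a single character can help with look-alike classes, where the second guess is often the intended one.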

‘Out of small things, greater things have been produced’.