AI: Serving the Handwriting Model on Edge Devices

Profile

This document explains how to integrate Monkey's handwriting recognition model into edge devices.

Tech Stack

TensorFlow Lite
Unity

Introduction

Handwriting is an AI model that has been converted to the TensorFlow Lite format, which Google supports on multiple platforms.
The model can be downloaded here.

1. Hyperparameters

Some configurations of the model:

Name	Description	Value
Input shape	An image resized to 128 x 128 x 3 (width x height x channels, BGR order), with a leading batch dimension	1 x 128 x 128 x 3
Output	An array of scores, one per class. The predicted label is the class with the highest score	1 x 26
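As a rough illustration of these shapes, the sketch below loads a TensorFlow Lite file with the Python `tf.lite.Interpreter` and checks the tensor shapes before running inference. The helper names `validate_shapes` and `predict_scores` and the file name in the usage note are assumptions, not part of the original integration. The image is assumed to be already pre-processed to 128 x 128 x 3 float32 in BGR order.

```python
EXPECTED_INPUT_SHAPE = (1, 128, 128, 3)   # batch, 128, 128, channels (BGR)
EXPECTED_OUTPUT_SHAPE = (1, 26)           # one score per character a-z


def validate_shapes(input_shape, output_shape):
    """Raise ValueError if the tensor shapes differ from the documented ones."""
    input_shape = tuple(int(d) for d in input_shape)
    output_shape = tuple(int(d) for d in output_shape)
    if input_shape != EXPECTED_INPUT_SHAPE:
        raise ValueError(f"unexpected input shape: {input_shape}")
    if output_shape != EXPECTED_OUTPUT_SHAPE:
        raise ValueError(f"unexpected output shape: {output_shape}")


def predict_scores(model_path, image):
    """Run one pre-processed 128 x 128 x 3 image through the model."""
    # Imported here so the shape check above works without TensorFlow installed.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_info = interpreter.get_input_details()[0]
    output_info = interpreter.get_output_details()[0]
    validate_shapes(input_info["shape"], output_info["shape"])

    # Add the leading batch dimension: 128 x 128 x 3 -> 1 x 128 x 128 x 3
    batch = np.expand_dims(np.asarray(image, dtype=np.float32), axis=0)
    interpreter.set_tensor(input_info["index"], batch)
    interpreter.invoke()
    return interpreter.get_tensor(output_info["index"])  # shape 1 x 26
```

A call might look like `scores = predict_scores("handwriting.tflite", image)`, where the file name is a placeholder for wherever the downloaded model is stored.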

The mapping from output index to predicted character is shown below.

{"0": "a", "1": "b", "2": "c", "3": "d", "4": "e", "5": "f", "6": "g", "7": "h", "8": "i", "9": "j", "10": "k", "11": "l", "12": "m", "13": "n", "14": "o", "15": "p", "16": "q", "17": "r", "18": "s", "19": "t", "20": "u", "21": "v", "22": "w", "23": "x", "24": "y", "25": "z"}

Example: for one input image, the model produces the following output:

[[7.8083836e-03 1.0330592e-02 3.6540066e-04 1.1240702e-01 1.2986563e-01
  8.1596321e-05 2.7041843e-03 1.8760953e-02 1.5376755e-03 8.4590465e-05
  1.9241240e-02 2.4502007e-02 2.1457224e-01 1.2494331e-02 1.9096583e-02
  2.9417273e-04 2.1153286e-02 1.8904490e-02 6.2950579e-03 3.8062898e-03
  1.2752166e-01 2.5853007e-03 1.6490310e-01 3.3960843e-03 2.3415815e-03
  7.4946553e-02]]

Index 12 (counting from 0) holds the maximum value, with a confidence score of about 21%. Mapping this index to a label gives the character 'm'.
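The lookup can be sketched in plain Python. The function name `decode_prediction` is hypothetical, and the sketch assumes the scores are probabilities that can be read as percentages, as in the example output.

```python
import string

# Index-to-character mapping: 0 -> "a", 1 -> "b", ..., 25 -> "z".
LABELS = string.ascii_lowercase

# The 26 scores from the example output, flattened into one list.
EXAMPLE_SCORES = [
    7.8083836e-03, 1.0330592e-02, 3.6540066e-04, 1.1240702e-01, 1.2986563e-01,
    8.1596321e-05, 2.7041843e-03, 1.8760953e-02, 1.5376755e-03, 8.4590465e-05,
    1.9241240e-02, 2.4502007e-02, 2.1457224e-01, 1.2494331e-02, 1.9096583e-02,
    2.9417273e-04, 2.1153286e-02, 1.8904490e-02, 6.2950579e-03, 3.8062898e-03,
    1.2752166e-01, 2.5853007e-03, 1.6490310e-01, 3.3960843e-03, 2.3415815e-03,
    7.4946553e-02,
]


def decode_prediction(scores):
    """Return (index, character, confidence in percent) of the highest score."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, LABELS[best], scores[best] * 100.0


index, character, confidence = decode_prediction(EXAMPLE_SCORES)
print(index, character, f"{confidence:.1f}%")  # 12 m 21.5%
```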

2. Pre-processing for the model

The model requires an input image with a shape of 128 x 128 x 3.
For the model to recognize characters accurately, the image should be prepared according to the following requirements:

Text: black
Background: white
Object: centered in the image
Object size: 50 - 60% of the image
Resize the image to 128 x 128 x 3
Value type: FP32
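The 50 - 60% size requirement can be related to the padding used by this section's pre_process function, which places the cropped character on a square white canvas 1.9 times its longer side. The sketch below isolates that arithmetic in plain Python. The helper name `padding_layout` is hypothetical, while the factor 1.9, the centering offsets, and the erosion iteration count mirror that function.

```python
def padding_layout(img_w, img_h, scale=1.9, model_size=128):
    """Reproduce the canvas arithmetic of pre_process for a cropped character."""
    longer_side = max(img_w, img_h)
    canvas = int(scale * longer_side)        # side of the square white canvas
    start_w = int((canvas - img_w) / 2)      # left offset that centers the crop
    start_h = int((canvas - img_h) / 2)      # top offset that centers the crop
    iterations = int(canvas / model_size)    # erosion passes (applied only if > 1)
    ratio = longer_side / canvas             # share of the canvas side the crop spans
    return canvas, start_w, start_h, iterations, ratio


canvas, start_w, start_h, iterations, ratio = padding_layout(100, 60)
print(canvas, start_w, start_h, iterations, round(ratio, 3))  # 190 45 65 1 0.526
```

With a factor of 1.9, the longer side of the character covers roughly 1 / 1.9, or about 53%, of the canvas, which falls inside the required range.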

"""
import cv2
import numpy as np


def pre_process(self, img):
    # Intended as a method of a class whose self.size is the model input size (128).
    # img: BGR uint8 image with dark text on a pure white (255) background.

    # Cropping: bounding box of every pixel that is not pure white
    mask = (img != 255).any(2)
    mask0, mask1 = mask.any(0), mask.any(1)
    # The +1 keeps one extra pixel of margin; slicing clamps it at the image edge
    colstart, colend = mask0.argmax(), len(mask0) - mask0[::-1].argmax() + 1
    rowstart, rowend = mask1.argmax(), len(mask1) - mask1[::-1].argmax() + 1
    img = img[rowstart:rowend, colstart:colend]
    img_h, img_w = img.shape[0], img.shape[1]

    # Padding: create a square white canvas 1.9x the longer side of the crop
    img_size = int(1.9 * max(img_w, img_h))
    new_img = np.full((img_size, img_size, 3), 255, dtype=np.uint8)

    # Insert the cropped text into the center of the white canvas
    start_w = (img_size - img_w) // 2
    start_h = (img_size - img_h) // 2
    new_img[start_h:start_h + img_h, start_w:start_w + img_w, :] = img

    # Thicken the text: eroding the white background grows the dark strokes
    iterations = int(img_size / 128)
    if iterations > 1:
        kernel = np.ones((5, 5), np.uint8)
        new_img = cv2.erode(new_img, kernel, iterations=iterations)

    # Resize to the model input size and convert to float32
    new_img = cv2.resize(new_img, (self.size, self.size), interpolation=cv2.INTER_AREA)
    return new_img.astype(np.float32)

"""

Example input:

3. Installation

4. Using the API from the server

Live: https://app.monkeyenglish.net/mspeak/handwriting
Dev: https://ai.monkeyenglish.net/handwriting

Field	Description
URL
Method	POST
Header	'APIKEY': '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'
Body	{"image": "<base64-encoded image string>", "pre_process": true, "target": ""}
Response	{
    "status": true,
    "text": [
        {
            "character": "z",
            "conf_score": 100.0
        },
        {
            "character": "o",
            "conf_score": 100.0
        },
        {
            "character": "c",
            "conf_score": 100.0
        }
    ],
    "msg": ""
}

Note:

pre_process: when true, the server pre-processes the image before prediction to improve the model's accuracy
target: the text to compare with the prediction, for example "a" (nullable; any value can be sent)
In the response: text holds the predictions for the image, and conf_score is the confidence score (0-100%)
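A request matching the table above could be built as in the sketch below, which uses only the Python standard library. It assumes the body is sent as JSON, since the table does not state the content type. The helper names `build_request`, `predicted_characters`, and `recognize` are hypothetical, and the API key is passed in by the caller rather than hard-coded.

```python
import base64
import json
import urllib.request

DEV_URL = "https://ai.monkeyenglish.net/handwriting"


def build_request(url, api_key, image_bytes, pre_process=True, target=""):
    """Build a POST request carrying the image as a base64 string."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "pre_process": pre_process,
        "target": target,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: JSON body; the header name APIKEY comes from the table.
        headers={"APIKEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def predicted_characters(response):
    """Return the (character, conf_score) pairs from a parsed response."""
    if not response.get("status"):
        return []
    return [(item["character"], item["conf_score"]) for item in response.get("text", [])]


def recognize(image_path, api_key, url=DEV_URL):
    """Send an image file to the server and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        request = build_request(url, api_key, f.read())
    with urllib.request.urlopen(request, timeout=10) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

For example, `predicted_characters(recognize("letter.png", api_key))` would return the characters with their scores, where `letter.png` is a placeholder file name.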