Character recognition

Working with images is one of the most common tasks in machine learning. This article walks through one example: processing images, obtaining matrices (tensors) of numbers, preparing a training set, and choosing a neural network architecture.

An ordinary picture, which a person perceives unambiguously, has no meaning for a computer unless a trained neural network can assign it to a specific class. To work, such a network must first be trained on data: images that have been processed and fed to the network's input as a matrix of numbers, each describing the tone (color) at a given position in the image.

: (CAPTCHA). , . :

  • ;

  • ;

  • ;

  • , .

Fig. 1 example images (CAPTCHA)

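The data set is 100 images in «.png» format. The images contain 29 possible characters («12345789» and letters). The characters are rotated (from -1° to +15°), shifted, and overlap one another. The work is done in python 3 with the opencv, matplotlib and pillow libraries. An image is loaded and displayed as follows: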

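import cv2  # opencv library
image = cv2.imread('./Captcha.png')  # read the image; the result is a numpy array
# cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 4) draws a line on img
# from the start point to the end point; the color is given in BGR order,
# followed by the line thickness.
image = cv2.line(image, (14, 0), (14, 50), (0, 0, 255), 1)

# Helper that shows an image in a window until a key is pressed
def view_image(image, name_wind='default'):
    cv2.namedWindow(name_wind, cv2.WINDOW_NORMAL)  # create a resizable window
    cv2.imshow(name_wind, image)  # show the image in that window
    cv2.waitKey(0)  # wait for a key press; 0 means no timeout
    cv2.destroyAllWindows()  # close all windows
view_image(image)  # display the image with the marker line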

Fig. 2. Example of defining a character range

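The image can also be displayed with matplotlib, keeping in mind that matplotlib expects RGB while opencv stores the channels as BGR, so the red and blue channels have to be swapped first.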
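A minimal sketch of the channel swap, using a small synthetic array in place of a real captcha (the array and the names `bgr` and `rgb` are illustrative assumptions, not taken from the original code). Reversing the last axis turns BGR into RGB, the order that matplotlib's `imshow` expects.

```python
import numpy as np

# Synthetic 2x2 image in BGR order, standing in for the array cv2.imread returns
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[0, 0] = (0, 0, 255)   # pure red, written as (B, G, R)
bgr[1, 1] = (255, 0, 0)   # pure blue, written as (B, G, R)

# Reverse the channel axis: BGR -> RGB
rgb = bgr[:, :, ::-1]
```

The resulting `rgb` array could then be passed to `matplotlib.pyplot.imshow` without the colors appearing swapped.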

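The characters occupy rows 3 to 47. The column ranges are: first character 14–44, second character 32–62, third character 48–78. opencv loads the image as a numpy array of shape (50, 100, 3): 3 channels, each 50 rows by 100 columns. The channels are in BGR order (blue, green, red), and each of the 3 channels holds values in the range 0-255.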

Fig. 3. RGB color model

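In the BGR model, selecting a single color means bounding all three channels at once: B(n-m), G(k-l), R(y-z). The HSV model (Hue, Saturation, Value) is used instead. In opencv, Hue takes values 0–179, S 0–255 and V 0–255. The characters have S in the range 10–255 and V in the range 0–234, which separates them from the background.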

Fig. 4. RGB (BGR) and HSV color models
# Convert the image from BGR to HSV
image = cv2.imread('./Captcha.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

The HSV image also has shape (50, 100, 3): three numpy arrays of shape (50, 100), that is, 50 rows and 100 columns. The layers are: [:, :, 0] Hue, [:, :, 1] Saturation, [:, :, 2] Value.
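To make the value ranges concrete, here is a small sketch that converts single BGR pixels to HSV with the standard-library `colorsys` module, scaled to opencv's 8-bit convention (Hue in degrees divided by 2, S and V scaled to 0–255). The helper name `bgr_to_hsv_pixel` is hypothetical, and the rounding may differ slightly from `cv2.cvtColor`.

```python
import colorsys

def bgr_to_hsv_pixel(b, g, r):
    """Convert one 8-bit BGR pixel to (H, S, V) in opencv's 8-bit scaling."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue = int(round(h * 180)) % 180   # 0-360 degrees are stored as 0-179
    return hue, int(round(s * 255)), int(round(v * 255))

blue = bgr_to_hsv_pixel(255, 0, 0)
green = bgr_to_hsv_pixel(0, 255, 0)
red = bgr_to_hsv_pixel(0, 0, 255)
white = bgr_to_hsv_pixel(255, 255, 255)
```

Note that white has zero saturation and maximum value, which is why a light background can be isolated through the S and V layers alone.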

Original image

(0 – 255 – ).

Layer [:, :, 0] is Hue, with a maximum value of 179: roughly 160–179 and 0~30 are red, 60~100 green, and 110~150 blue.

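Layer [:, :, 1] is Saturation: the background has values 0~10, the characters >10.

Layer [:, :, 2] is Value: the background has values 240~255, the characters < 240.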

Masks are built from the S and V layers and then applied to the Hue layer.

mask_S = image[:, :, 1] < 10   # Saturation layer (image holds the HSV array here)
mask_V = image[:, :, 2] > 240  # Value layer

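The result is a boolean array of shape (50, 100): [[True, False, ..,], …, [..]]. The masks are applied to the Hue layer, where the masked pixels are set to 255. Hue only takes values 0-179, so 255 unambiguously marks a pixel that is not part of a character.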

# Write 255 into the Hue layer, a value outside the valid Hue range 0 - 179
image[:, :, 0][mask_S] = 255
image[:, :, 0][mask_V] = 255
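The masking step can be tried on a tiny synthetic HSV array. This is a sketch under assumed pixel values (the array below is invented for illustration); the thresholds 10 and 240 are the ones used in the text.

```python
import numpy as np

# Synthetic 2x3 HSV image; columns are (background, character, bright noise)
hsv = np.array([
    [[0, 5, 250], [106, 200, 120], [30, 80, 245]],
    [[0, 3, 255], [104, 180, 100], [0, 0, 255]],
], dtype=np.uint8)

mask_s = hsv[:, :, 1] < 10    # low saturation: background
mask_v = hsv[:, :, 2] > 240   # very bright: background or noise

hue = hsv[:, :, 0].copy()
hue[mask_s | mask_v] = 255    # 255 cannot be a real Hue (the maximum is 179)
```

Only the middle column keeps its original Hue values (106 and 104); everything else is marked 255.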

Fig. 5. Result: the background and part of the noise have the value 255

β€” . , .

Fig. 6. Separating characters into specific ranges
# Hue layer of each character: rows 3-47, 30 columns per character
img_char1 = image[3:47, 14:44, 0].copy()
img_char2 = image[3:47, 32:62, 0].copy()
img_char3 = image[3:47, 48:78, 0].copy()

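To find a character's color, count the Hue values in its region. The most frequent value is 255 (the background, 500–800 pixels); the next most frequent is the character's base Hue N. The character is then taken to be every pixel with Hue in the range N - 10 to N + 10.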

Fig. 7. Areas of characters 1 and 3 that contain no data from the 2nd character

1 3 2 . , .

import pandas as pd

# Hue values of the part of character 1 that character 2 does not overlap
val_count_1 = image[3:47, 14:32, 0].copy().reshape(-1)
val_color_hue_1 = pd.Series(val_count_1).value_counts()
# val_color_hue_1 -> 255: 741, 106: 11, 104: 11, 20: 1, 99: 1
# index[0] is the background (255); index[1] is the character's base Hue
val_base_hue_1 = int(val_color_hue_1.index[1])
# Character color range: base Hue -10 ... +10
val_color_char_hue_1_min = val_base_hue_1 - 10  # 106 - 10 = 96
val_color_char_hue_1_max = val_base_hue_1 + 10  # 106 + 10 = 116
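The same idea can be sketched with numpy alone. The function name `base_hue_range`, the clamping to 0–179, and the synthetic `region` array are assumptions made for illustration; the original code uses pandas `value_counts` as shown above.

```python
import numpy as np

def base_hue_range(region, background=255, tol=10):
    """Return (base Hue, lower bound, upper bound) for a character region."""
    values, counts = np.unique(region.reshape(-1), return_counts=True)
    keep = values != background          # ignore the background marker
    if not keep.any():
        raise ValueError("region contains only background")
    base = int(values[keep][np.argmax(counts[keep])])
    # Clamp to the valid opencv Hue range (an added safeguard, not in the text)
    return base, max(base - tol, 0), min(base + tol, 179)

# Synthetic region: mostly background, three pixels of Hue 106, one of 104
region = np.full((4, 5), 255, dtype=np.uint8)
region[1, 1:4] = 106
region[2, 2] = 104
base, hue_min, hue_max = base_hue_range(region)
```

Converting the base value to a plain `int` avoids unsigned 8-bit wrap-around when 10 is subtracted from a small Hue.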

Pixels whose Hue falls inside the range found for characters 1 and 3 are set to 0; all other pixels are set to 255.

mask_char1 = (img_char1 > 96) & (img_char1 < 116)
img_char1[~mask_char1] = 255  # everything outside the character's Hue range becomes background
img_char1[mask_char1] = 0  # character pixels

Fig. 8. Displaying the result as a pandas dataframe

The values are then converted to 0 and 1: character pixels become 1 and the background becomes 0.

img_char1[img_char1 == 0] = 1
img_char1[img_char1 == 255] = 0
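The two-stage conversion (0/255 first, then 1/0) can also be written as a single step. This is a sketch of an equivalent shortcut, not the author's code; `binarize_by_hue` is a hypothetical helper and the sample array is invented.

```python
import numpy as np

def binarize_by_hue(hue, hue_min, hue_max):
    """Return a 0/1 matrix: 1 where Hue lies strictly inside (hue_min, hue_max)."""
    mask = (hue > hue_min) & (hue < hue_max)
    return mask.astype(np.uint8)

hue = np.array([[255, 106, 104],
                [99, 255, 20]], dtype=np.uint8)
binary = binarize_by_hue(hue, 96, 116)
```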

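For the 2nd character, the colors of characters 1 and 3 are already known, so the pixels that belong to characters 1 and 3 are set to 255 in the 2nd character's matrix, leaving only the 2nd character.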

Fig. 9. Removing the 1st and 3rd characters from the 2nd character's data

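The matrices for characters 1, 2 and 3 now contain only 0 and 1. Where the characters overlapped, gaps remain in the strokes. They can be filled with opencv's morphological closing: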

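import numpy as np
kernel = np.ones((3, 3), np.uint8)  # 3x3 structuring element
# Closing = dilation followed by erosion; np_matrix is a character's 0/1 matrix
closing = cv2.morphologyEx(np_matrix, cv2.MORPH_CLOSE, kernel)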
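To show what closing does to a 0/1 matrix, here is a numpy-only sketch of a 3×3 closing (dilation followed by erosion). It is an illustration of the operation, not a replacement for `cv2.morphologyEx`; the helper names and the border handling (pixels outside the matrix never influence the result) are assumptions.

```python
import numpy as np

def _neighbourhoods(m, pad_value):
    """Stack the nine 3x3-shifted views of m, padding the border with pad_value."""
    padded = np.pad(m, 1, mode="constant", constant_values=pad_value)
    h, w = m.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def close_3x3(m):
    """Morphological closing of a 0/1 matrix with a 3x3 square kernel."""
    dilated = _neighbourhoods(m, 0).max(axis=0)      # grow the strokes
    return _neighbourhoods(dilated, 1).min(axis=0)   # shrink them back

# A horizontal stroke with a one-pixel gap at column 4
stroke = np.zeros((5, 9), dtype=np.uint8)
stroke[2] = [0, 0, 1, 1, 0, 1, 1, 0, 0]
closed = close_3x3(stroke)
```

The gap in the middle of the stroke is filled, while the stroke does not grow in length or thickness.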

Fig. 10. Correction of data, filling in gaps

, , , .

Fig. 11 Location of symbols in the middle of the matrix
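One way to place a character in the middle of its matrix is to cut out its bounding box and paste it into the center of an empty matrix of the same size. The sketch below assumes this approach; the original procedure is not shown in the text, and `center_in_matrix` is a hypothetical name.

```python
import numpy as np

def center_in_matrix(m):
    """Move the non-zero content of a 0/1 matrix to the center of the matrix."""
    rows = np.flatnonzero(m.any(axis=1))
    cols = np.flatnonzero(m.any(axis=0))
    if rows.size == 0:            # empty matrix: nothing to move
        return m.copy()
    glyph = m[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    out = np.zeros_like(m)
    top = (m.shape[0] - glyph.shape[0]) // 2
    left = (m.shape[1] - glyph.shape[1]) // 2
    out[top:top + glyph.shape[0], left:left + glyph.shape[1]] = glyph
    return out

# A 2x2 block in the top-left corner of a 6x6 matrix
m = np.zeros((6, 6), dtype=np.uint8)
m[0:2, 0:2] = 1
centered = center_in_matrix(m)
```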

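The ~100 images yield about 300 character matrices (44×30, values 0 and 1), which is too few to train a network. The training set is therefore generated: with the pillow library for python, a character is drawn on a 44×30 image with a random shift and rotation, and the result is converted to a numpy array.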

import random

shift_x = [1, 1, -1, -1, -2, 2, 0, 0, 0]
shift_y = [1, 1, -1, -1, -2, 2, 0, 0, 0]
rotor_char = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1]
char = '12345789'
# Repeated in a loop ~10_000 - 60_000 times
shift_x_r = random.choice(shift_x)
shift_y_r = random.choice(shift_y)
rotor_r = random.choice(rotor_char)
char_r = random.choice(char)
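The shifting part of the generation can be sketched with numpy alone. The sketch below covers only the random shift, using the shift lists from the text; rotation and drawing the character with pillow are left out, and the function names and the synthetic `glyph` are assumptions.

```python
import random
import numpy as np

SHIFT_X = [1, 1, -1, -1, -2, 2, 0, 0, 0]
SHIFT_Y = [1, 1, -1, -1, -2, 2, 0, 0, 0]

def shift_matrix(m, dx, dy):
    """Shift a 2D matrix by dx columns and dy rows, filling the gap with zeros."""
    h, w = m.shape
    out = np.zeros_like(m)
    src = m[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    top, left = max(0, dy), max(0, dx)
    out[top:top + src.shape[0], left:left + src.shape[1]] = src
    return out

def random_shifted_copies(m, n, seed=0):
    """Generate n randomly shifted copies of m (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [shift_matrix(m, rng.choice(SHIFT_X), rng.choice(SHIFT_Y))
            for _ in range(n)]

# Synthetic 44x30 "character": a 4x4 block near the center
glyph = np.zeros((44, 30), dtype=np.uint8)
glyph[20:24, 13:17] = 1
moved = shift_matrix(glyph, 2, -1)          # 2 columns right, 1 row up
samples = random_shifted_copies(glyph, 5)
```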

Fig. 13 An example of matched text and placement of symbols in matrices
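train_x = []  # filled with the generated 44x30 character matrices
train_x = np.array(train_x)
# Add a trailing channel axis: (N, 44, 30) -> (N, 44, 30, 1)
train_x = train_x.reshape(train_x.shape[0], train_x.shape[1], train_x.shape[2], 1)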
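A minimal sketch of the same reshaping on placeholder data (five empty matrices stand in for the generated characters). Indexing with `np.newaxis` is an equivalent way of adding the channel axis.

```python
import numpy as np

# Placeholder data: five empty 44x30 matrices instead of real characters
matrices = [np.zeros((44, 30), dtype=np.uint8) for _ in range(5)]

train_x = np.stack(matrices)          # shape (5, 44, 30)
train_x = train_x[..., np.newaxis]    # shape (5, 44, 30, 1)
```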


The resulting shape is (50000, 44, 30, 1), where the trailing (1) is the number of channels.

The labels are a list of class indices: char_y = [0, 4, …, 29], 50_000 values (numbers 0-29).

char = '12345789'  # 29 characters in total
dict_char = {char[i]: i for i in range(len(char))}  # character -> index
dict_char_reverse = {i: c for c, i in dict_char.items()}  # index -> character


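The labels are then converted to categorical form (one-hot encoding): each label becomes a vector with a single 1 at the position of its class, with 29 classes in total. For example, one character is encoded as ‘000000000100000000000000000000’.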

from tensorflow.keras import utils

Img_y = utils.to_categorical(Img_y)
# character 1 -> array([1, 0, 0, 0, …, 0, 0], dtype=float32)
# character 2 -> array([0, 1, 0, 0, …, 0, 0], dtype=float32)
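One-hot encoding can be reproduced with numpy to see exactly what the vectors look like. This sketch uses only the eight digits that survive in `char`, so the vectors have length 8 rather than 29; the sample string '1295' is an arbitrary illustration.

```python
import numpy as np

char = '12345789'                      # alphabet as it appears in the text
dict_char = {c: i for i, c in enumerate(char)}

# Class indices for the sample characters '1', '2', '9', '5'
labels = np.array([dict_char[c] for c in '1295'])

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(char), dtype=np.float32)[labels]
```

Taking `np.argmax` along axis 1 recovers the original class indices.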



from sklearn.model_selection import train_test_split

# Hold out 10% of the shuffled data for validation
x_train, x_test, y_train, y_test = train_test_split(
    out_train_x_rsh, out_train_y_sh,
    test_size=0.1, shuffle=True)


The network architecture follows a solution for mnist (28×28 images) from kaggle:

import tensorflow as tf

def model_detection():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(input_shape=(44, 30, 1), filters=32,
                kernel_size=(5, 5), padding='same', activation='relu'),
        tf.keras.layers.Conv2D(filters=32, kernel_size=(5, 5),
                padding='same', activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3),
                padding='same', activation='relu'),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3),
                padding='same', activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
        # Assumed: a Flatten layer is needed before the Dense layers
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(29, activation=tf.nn.softmax)])

    # Assumed: the accuracy metric, so that val_accuracy can be monitored
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = model_detection()
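The layer sizes can be checked with simple arithmetic. The sketch below assumes that a Flatten layer sits between the last pooling layer and the first Dense layer (the listing as extracted does not show one), that the 'same'-padded convolutions keep the spatial size, and that each 2×2 pooling halves it, rounding down.

```python
def pooled(size, pool=2):
    """Spatial size after a pooling layer with stride equal to the pool size."""
    return size // pool

height, width, filters = 44, 30, 64
for _ in range(2):                     # two MaxPool2D layers
    height, width = pooled(height), pooled(width)

flat_features = height * width * filters      # inputs to Dense(256)
dense_params = flat_features * 256 + 256      # weights plus biases
```

Under these assumptions the feature map shrinks from 44×30 to 11×7, so most of the network's parameters sit in the first Dense layer.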


During training, the weights with the best validation accuracy (val_accuracy) are saved.

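from tensorflow.keras.callbacks import ModelCheckpoint

# Note: recent Keras releases may require a '.keras' or '.h5' file name here
checkpoint = ModelCheckpoint('captcha_1.hdf5', monitor='val_accuracy',
                             save_best_only=True, verbose=1)
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test),
          verbose=1, callbacks=[checkpoint])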


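After training, the weights with the best val_accuracy are used. To recognize a captcha, the image goes through the same processing, which produces a numpy array for each character (1, 2, 3). The network predicts a class for each array, and the «index – character» dictionary converts the class back to a character.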

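model2 = model_detection()  # build the same architecture
model2.load_weights('captcha_1.hdf5')  # load the saved weights
prediction_ch_1 = model2.predict(char_1)  # 29 class probabilities
# Index of the most probable class
prediction_ch_1 = np.argmax(prediction_ch_1, axis=1)
# The index is mapped back to a character with dict_char_reverse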
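The final decoding step can be sketched without a trained model. Here `probs` is an invented stand-in for the output of `model2.predict` on three characters, and the alphabet is limited to the eight digits that survive in `char`.

```python
import numpy as np

char = '12345789'
dict_char_reverse = dict(enumerate(char))   # index -> character

# Stand-in for predict output: one row of class scores per character
probs = np.full((3, len(char)), 0.05)
probs[0, 1] = 0.65    # first character: class 1
probs[1, 7] = 0.65    # second character: class 7
probs[2, 0] = 0.65    # third character: class 0

indices = np.argmax(probs, axis=1)
text = ''.join(dict_char_reverse[int(i)] for i in indices)
```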


This algorithm processes color images containing letters and digits. The neural network recognizes individual characters with 95% accuracy, and whole captchas with 82% accuracy. Walking through the algorithm shows that most of the development effort goes into preparing, processing and generating data. Choosing an architecture and training the network is an essential part of the task, but not the most time-consuming one. There are many ways to approach the recognition of digits, letters, images of objects and so on; this article gives only one example, showing the steps of the solution, the difficulties that can arise along the way, and ways to overcome them. How do you work with captchas?
