Working with images is one of the most common tasks in machine learning. A picture that a person reads unambiguously carries no meaning for a computer unless a trained neural network is available to assign it to a specific class. Such a network first has to be trained on images that have been processed and fed to its input as matrices of numbers, each number describing the tone (color) at a given position in the image. This article walks through an example: processing the images, obtaining matrices (tensors) of numbers, preparing the training set, and choosing a neural network architecture.
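Task: recognize the characters in an image (CAPTCHA). The work splits into the following stages:
image processing;
obtaining matrices (tensors) of numbers;
preparing the training set;
choosing a neural network architecture, training it.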
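The source data is about 100 captcha images in «.png» format. The captchas use an alphabet of 29 characters («12345789» plus letters). The characters are rotated (from -1° to +15°) and vary in position and color. The processing is done in Python 3 with opencv, matplotlib, and pillow. Reading an image and drawing a guide line on it: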
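import cv2  # OpenCV: reading and processing images

image = cv2.imread(r'.\Captcha.png')  # read the file; the result is a numpy array
# cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 4) draws a line on img:
# start point, end point, color, thickness.
# The color is given in BGR order, not RGB.
image = cv2.line(image, (14, 0), (14, 50), (0, 0, 255), 1)
# …

# Helper that shows an image in a window until a key is pressed
def view_image(image, name_wind='default'):
    cv2.namedWindow(name_wind, cv2.WINDOW_NORMAL)  # create a resizable window
    cv2.imshow(name_wind, image)  # show image in that window
    cv2.waitKey(0)  # wait for a key press; 0 means wait indefinitely
    cv2.destroyAllWindows()  # close the window(s)

view_image(image)  # display the image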
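The image can also be displayed with matplotlib, but matplotlib expects RGB while opencv stores BGR, so the channels have to be reordered first; otherwise red and blue are swapped.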
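The channel swap can be illustrated without opencv or matplotlib, since it is just a reversal of the last axis of the array. The following is a minimal sketch on a synthetic 2×2 image; the helper name `bgr_to_rgb` is an assumption and not part of the original code.

```python
import numpy as np

def bgr_to_rgb(image_bgr):
    """Reverse the channel axis so that BGR becomes RGB (hypothetical helper)."""
    return image_bgr[:, :, ::-1]

# Synthetic 2x2 image filled with pure red in BGR order: (B=0, G=0, R=255)
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[:, :, 2] = 255

rgb = bgr_to_rgb(bgr)
print(rgb[0, 0])  # red is now in the first channel
```

With opencv available, `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` performs the same conversion.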
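Drawing guide lines on the image gives the boundaries of the characters. Each captcha holds 3 characters, located vertically between rows 3 and 47. Horizontally (the regions overlap) the first character occupies columns 14–44, the second 32–62, the third 48–78. opencv loads the image as a numpy array of shape (50, 100, 3): 3 layers of 50 rows by 100 columns. The layers are the BGR channels (blue, green, red), each holding values 0-255.
Separating the characters by color is awkward in BGR, because a single color is described by a range in each of the three channels: B(n-m) G(k-l) R(y-z). The HSV representation (Hue, Saturation, Value) is more convenient. In opencv Hue takes values 0–179, S 0–255, V 0–255. In these images the characters have S in 10–255 and V in 0–234; the rest is background.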
# Convert the image from BGR to HSV
image = cv2.imread(r'.\Captcha.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
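To get a feel for the opencv HSV ranges, the conversion of a single pixel can be reproduced with the standard library. This is a sketch that assumes the 8-bit opencv convention (Hue in degrees divided by 2, Saturation and Value scaled to 0-255); the function name is hypothetical, and rounding may differ by one unit from what `cv2.cvtColor` returns.

```python
import colorsys

def bgr_pixel_to_opencv_hsv(b, g, r):
    """Approximate opencv's 8-bit HSV for one BGR pixel (hypothetical helper)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # colorsys returns h, s, v in [0, 1]; opencv stores H as degrees / 2
    return round(h * 180) % 180, round(s * 255), round(v * 255)

print(bgr_pixel_to_opencv_hsv(0, 0, 255))      # pure red
print(bgr_pixel_to_opencv_hsv(0, 255, 0))      # pure green
print(bgr_pixel_to_opencv_hsv(255, 0, 0))      # pure blue
print(bgr_pixel_to_opencv_hsv(255, 255, 255))  # white background
```

White has Saturation 0 and Value 255, which is why low S and high V are natural markers of a light background.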
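The HSV image has the same shape (50, 100, 3): 3 numpy arrays of shape (50, 100), 50 rows by 100 columns. The layers are [:, :, 0] Hue, [:, :, 1] Saturation, [:, :, 2] Value.
Each layer can be viewed as a grayscale image (0 is black, 255 is white).
[:, :, 0] Hue: the maximum value is 179; 160–179 and 0~30 are reds, 60~100 greens, 110~150 blues. In the example the 9 has Hue around 160, and another character falls in 0~30.
[:, :, 1] Saturation: the background is 0~10, the characters are >10.
[:, :, 2] Value: the background is 240~255, the characters are <240.
S and V are used to build the masks (background), Hue to tell the characters apart (color).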
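image = hsv  # the steps below work on the HSV array
mask_S = image[:, :, 1] < 10   # low saturation: background
mask_V = image[:, :, 2] > 240  # high value: background
The result is two boolean matrices of shape (50, 100): [[True, False, ...], …, [...]]. Applying them to the Hue layer, we set the background pixels to 255. Hue itself only takes values 0-179, so 255 unambiguously marks the background (a value that cannot occur in Hue).
# 255 lies outside the Hue range 0 - 179
image[:, :, 0][mask_S] = 255
image[:, :, 0][mask_V] = 255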
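The masking step can be tried on a tiny synthetic array without loading any image. The sketch below uses made-up pixel values and combines the two masks with a logical OR, which gives the same result as applying them one after the other.

```python
import numpy as np

# Synthetic 2x3 "HSV" image: channel 0 = Hue, 1 = Saturation, 2 = Value
hsv = np.zeros((2, 3, 3), dtype=np.uint8)
hsv[:, :, 0] = [[106, 0, 20], [160, 0, 104]]
hsv[:, :, 1] = [[200, 3, 180], [150, 5, 90]]
hsv[:, :, 2] = [[120, 250, 245], [100, 255, 130]]

mask_s = hsv[:, :, 1] < 10    # nearly colorless pixels
mask_v = hsv[:, :, 2] > 240   # very bright pixels
background = mask_s | mask_v

hue = hsv[:, :, 0].copy()
hue[background] = 255         # 255 cannot be a real Hue value (0-179)
print(hue)
```

Working on a copy of the Hue layer keeps the original HSV array intact, which is convenient while tuning the thresholds.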
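Next we cut the Hue layer into three regions, one per character. The regions overlap, because neighboring characters overlap.
img_char1 = image[3:47, 14:44, 0].copy()
img_char2 = image[3:47, 32:62, 0].copy()
img_char3 = image[3:47, 48:78, 0].copy()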
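To find the color of a character, we count how often each Hue value occurs in its region: the most frequent value is 255 (the background, 500–800 pixels), and the next most frequent value N is the base Hue of the character. The pixels of the character are then taken from the range N - 10 to N + 10.
Regions 1 and 3 overlap region 2. So the base color of a character is determined on the part of its region that does not overlap the neighbor.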
import pandas as pd

# Hue values in the part of region 1 that does not overlap region 2
val_count_1 = image[3:47, 14:32, 0].copy().reshape(-1)
val_color_hue_1 = pd.Series(val_count_1).value_counts()
# val_color_hue_1 -> 255: 741, 106: 11, 104: 11, 20: 1, 99: 1.
# index[0] is the background (255); index[1] is the most frequent character Hue
val_base_hue_1 = int(val_color_hue_1.index[1])
# The Hue range of the character is the base value -10, +10.
val_color_char_hue_1_min = val_base_hue_1 - 10  # 106 - 10 = 96
val_color_char_hue_1_max = val_base_hue_1 + 10  # 106 + 10 = 116
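The same idea can be sketched with numpy alone. This variant is an assumption rather than the original approach: instead of taking the second entry of a frequency table, it drops the background marker explicitly and then picks the most frequent remaining value. The function name `base_hue` and the sample data are made up for illustration.

```python
import numpy as np

def base_hue(hue_region, background=255):
    """Most frequent Hue value in the region, ignoring the background marker."""
    values, counts = np.unique(hue_region[hue_region != background],
                               return_counts=True)
    return int(values[np.argmax(counts)])

# Synthetic region: mostly background, character pixels around Hue 106
region = np.full((6, 5), 255, dtype=np.uint8)
region[1:4, 2] = 106
region[4, 2] = 104
region[2, 3] = 20   # a stray pixel of another color

n = base_hue(region)
hue_min, hue_max = n - 10, n + 10
print(n, hue_min, hue_max)
```

Converting the result to a plain `int` avoids uint8 wrap-around when 10 is subtracted from a small Hue value. A region that contains only background would need a separate check, since there is nothing to count.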
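With the Hue range for character 1 (and likewise 3) we build a mask: the pixels of the character become 0, everything else 255.
mask_char1 = (img_char1 > 96) & (img_char1 < 116)
img_char1[~mask_char1] = 255  # everything else (background) becomes 255
img_char1[mask_char1] = 0  # the character becomes 0
Then we convert the values so that the character is 1 and the background is 0.
img_char1[img_char1 == 0] = 1
img_char1[img_char1 == 255] = 0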
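The two-step conversion (0/255, then 1/0) can also be written as a single mask-to-integer cast. A minimal sketch, assuming the same strict bounds as above; the helper name `binarize_by_hue` is hypothetical.

```python
import numpy as np

def binarize_by_hue(hue_region, hue_min, hue_max):
    """Return a uint8 matrix with 1 where hue_min < Hue < hue_max, else 0."""
    mask = (hue_region > hue_min) & (hue_region < hue_max)
    return mask.astype(np.uint8)

region = np.array([[255, 106, 255],
                   [104, 106,  20],
                   [255, 116, 255]], dtype=np.uint8)
print(binarize_by_hue(region, 96, 116))
```

Note that with strict comparisons the boundary value 116 itself is treated as background.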
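The second character is handled the same way, with one difference: its region overlaps regions 1 and 3, so the pixels in the Hue ranges of characters 1 and 3 are first set to 255 in region 2, which removes the fragments of characters 1 and 3.
The result is three matrices, for characters 1, 2, and 3, made of 0 and 1. Some of them have small gaps in the strokes. To repair them we use a morphological operation from opencv, closing (dilation followed by erosion),
import numpy as np
kernel = np.ones((3, 3), np.uint8)
closing = cv2.morphologyEx(np_matrix, cv2.MORPH_CLOSE, kernel)
Closing fills small holes and gaps inside the character while leaving its overall shape largely unchanged.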
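To see what closing does to a binary matrix, here is a small numpy-only sketch of the operation with a 3×3 square kernel. It is an illustration written for this text, not the opencv implementation: pixels outside the matrix are treated as 0 during dilation and as 1 during erosion, and the helper names are hypothetical.

```python
import numpy as np

def _neighborhood_stack(padded, rows, cols):
    """Stack the nine 3x3-shifted views of a padded matrix."""
    return np.stack([padded[dy:dy + rows, dx:dx + cols]
                     for dy in range(3) for dx in range(3)])

def closing_3x3(matrix):
    """Binary closing with a 3x3 square: dilation followed by erosion."""
    rows, cols = matrix.shape
    # Dilation: a pixel becomes 1 if any neighbor is 1 (outside counts as 0)
    dilated = _neighborhood_stack(np.pad(matrix, 1, constant_values=0),
                                  rows, cols).max(axis=0)
    # Erosion: a pixel stays 1 only if all neighbors are 1 (outside counts as 1)
    eroded = _neighborhood_stack(np.pad(dilated, 1, constant_values=1),
                                 rows, cols).min(axis=0)
    return eroded

# A 3x3 ring with a one-pixel hole in the middle of a 7x7 canvas
stroke = np.zeros((7, 7), dtype=np.uint8)
stroke[2:5, 2:5] = 1
stroke[3, 3] = 0

closed = closing_3x3(stroke)
print(closed)
```

The hole in the middle is filled, while the outer boundary of the shape stays where it was.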
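Training data. The ~100 captchas give only about 300 characters (44×30 matrices of 0 and 1), which is too few to train a neural network. So the training set is generated: with the pillow library for python we draw a character on a 44×30 image, apply a random shift and rotation, and convert the result to a numpy array. The parameters of the generator: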
import random
import numpy as np

shift_x = [1, 1, -1, -1, -2, 2, 0, 0, 0]
shift_y = [1, 1, -1, -1, -2, 2, 0, 0, 0]
rotor_char = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1]
char = '12345789'
# In a loop, ~10_000 – 60_000 times: pick random parameters and draw a character
shift_x_r = random.choice(shift_x)
shift_y_r = random.choice(shift_y)
rotor_r = random.choice(rotor_char)
char_r = random.choice(char)
train_x = []
# img_char: the generated 44x30 array for char_r (the drawing code is not shown; the name is a placeholder)
train_x.append(img_char)
train_x = np.array(train_x)
train_x = train_x.reshape(train_x.shape[0], train_x.shape[1], train_x.shape[2], 1)
The resulting array has shape (50000, 44, 30, 1), where the last dimension (1) is the number of channels.
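The generation loop can be sketched in a reduced form with numpy only. This is an assumption-heavy illustration: it applies just the random shift (rotation and drawing with pillow are left out), uses a small filled box as a stand-in for a rendered character, and the helper name `shift_matrix` is hypothetical.

```python
import random
import numpy as np

def shift_matrix(matrix, dx, dy):
    """Shift a 2-D matrix by dx columns and dy rows, filling exposed cells with 0."""
    rows, cols = matrix.shape
    shifted = np.zeros_like(matrix)
    src = matrix[max(0, -dy):rows - max(0, dy), max(0, -dx):cols - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted

shift_x = [1, 1, -1, -1, -2, 2, 0, 0, 0]
shift_y = [1, 1, -1, -1, -2, 2, 0, 0, 0]

# Stand-in for a rendered character: a 44x30 matrix with a small filled box
template = np.zeros((44, 30), dtype=np.uint8)
template[20:24, 13:17] = 1

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [shift_matrix(template, rng.choice(shift_x), rng.choice(shift_y))
           for _ in range(100)]
train_x = np.array(samples).reshape(len(samples), 44, 30, 1)
print(train_x.shape)
```

The repeated values in the shift lists act as weights: a shift of 0 is drawn more often than a shift of 2.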
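The labels are a list: char_y = [0, 4, …, 29], 50_000 class indices (numbers in 0-29), obtained from the characters with a dictionary:
char = '12345789'  # 29 characters in the full alphabet
dict_char = {char[i]: i for i in range(len(char))}
dict_char_reverse = {i[1]: i[0] for i in dict_char.items()}
The labels are then converted to one-hot encoding: each label becomes a vector of length 29 with a single 1 at the position of the class index and 0 everywhere else. For example, one character is encoded as “000000000100000000000000000000”.
from tensorflow.keras import utils
Img_y = utils.to_categorical(Img_y)
# 1 -> array([1, 0, 0, 0, …, 0, 0], dtype=float32)
# 2 -> array([0, 1, 0, 0, …, 0, 0], dtype=float32)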
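One-hot encoding is simple enough to write by hand, which makes the result of `to_categorical` easier to picture. The sketch below uses numpy only and just the eight characters shown in the text, so the vectors have length 8 rather than 29; the function name `one_hot` is an assumption.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class indices into one-hot rows of float32."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

char = '12345789'  # only the characters shown in the text
dict_char = {c: i for i, c in enumerate(char)}
labels = [dict_char[c] for c in '159']
print(one_hot(labels, len(char)))
```

Each row contains exactly one 1, at the index assigned to the character by `dict_char`.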
The data is split into a training set and a test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    out_train_x_rsh, out_train_y_sh,
    test_size=0.1, shuffle=True)
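For readers who want to see what a shuffled split does, here is a rough numpy-only equivalent. It is a sketch and not the scikit-learn implementation; the function name `split_train_test`, the seed, and the dummy data are assumptions.

```python
import numpy as np

def split_train_test(x, y, test_size=0.1, seed=0):
    """Shuffle x and y with the same permutation and split off a test part."""
    order = np.random.default_rng(seed).permutation(len(x))
    n_test = int(round(len(x) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

x = np.zeros((50, 44, 30, 1), dtype=np.uint8)
y = np.arange(50)
x_train, x_test, y_train, y_test = split_train_test(x, y)
print(x_train.shape, x_test.shape)
```

Using one permutation for both arrays keeps every image paired with its own label after shuffling.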
For the network we take an architecture used for mnist (28×28 images) in kaggle solutions. The architecture:
# Model architecture
import tensorflow as tf

def model_detection():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(input_shape=(44, 30, 1), filters=32,
                               kernel_size=(5, 5), padding='same', activation='relu'),
        tf.keras.layers.Conv2D(filters=32, kernel_size=(5, 5),
                               padding='same', activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3),
                               padding='same', activation='relu'),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3),
                               padding='same', activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(29, activation=tf.nn.softmax)])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Build the model
model = model_detection()
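It can help to trace how the 44×30 input shrinks on its way to the dense layers. The sketch below does this arithmetic in plain Python, assuming the usual Keras behavior for the settings above (`padding='same'` keeps height and width, 2×2 pooling without padding halves them with rounding down); the helper names are hypothetical.

```python
def conv_same(shape, filters):
    """Conv2D with padding='same' and stride 1 keeps height and width."""
    height, width, _ = shape
    return (height, width, filters)

def max_pool_2x2(shape):
    """MaxPool2D with pool 2x2, stride 2, no padding: floor division by 2."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)

shape = (44, 30, 1)
shape = conv_same(shape, 32)
shape = conv_same(shape, 32)
shape = max_pool_2x2(shape)      # (22, 15, 32)
shape = conv_same(shape, 64)
shape = conv_same(shape, 64)
shape = max_pool_2x2(shape)      # (11, 7, 64)
flatten_size = shape[0] * shape[1] * shape[2]
print(shape, flatten_size)
```

Dropout layers do not change the shape, so they are omitted from the trace. Calling `model.summary()` on the real model is the authoritative way to check these numbers.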
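During training we save the model with the best accuracy on the validation data (val_accuracy).
from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint('captcha_1.hdf5', monitor='val_accuracy',
                             save_best_only=True, verbose=1)
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test),
          verbose=1, callbacks=[checkpoint])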
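After training we have the weights of the model with the best val_accuracy. Recognition then works as follows: a captcha is loaded and processed as described above. The result is numpy arrays (the matrices of the characters). They are reshaped to the input format and fed to the network (characters 1, 2, 3). The network returns the class probabilities. The «index – character» dictionary turns the prediction into a character.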
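model2 = model_detection()  # build the model
model2.load_weights('captcha_1.hdf5')  # load the saved weights
prediction_ch_1 = model2.predict(char_1)  # probabilities of the 29 classes
# Take the index of the most probable class
prediction_ch_1 = np.argmax(prediction_ch_1, axis=1)
# Look up the character for that index
dict_char_reverse[int(prediction_ch_1[0])]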
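The decoding step can be tried without a trained model by feeding it made-up probabilities. In this sketch the probability values are invented for illustration, and only the eight characters shown in the text are used.

```python
import numpy as np

char = '12345789'  # only the characters shown in the text
dict_char_reverse = dict(enumerate(char))

# Made-up probabilities for three characters, one row per character
predictions = np.full((3, len(char)), 0.01)
predictions[0, 2] = 0.93
predictions[1, 7] = 0.93
predictions[2, 0] = 0.93

indices = np.argmax(predictions, axis=1)
captcha_text = ''.join(dict_char_reverse[int(i)] for i in indices)
print(captcha_text)
```

Stacking the three character matrices into one batch lets a single `predict` call return one row of probabilities per character.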
The algorithm processes color images containing letters and digits. The neural network recognizes individual characters with 95% accuracy, and whole captchas with 82% accuracy. Walking through the algorithm shows that most of the development effort goes into preparing, processing, and generating data. Choosing an architecture and training the network is an essential part of the task, but not the most time-consuming one. There are many ways to approach the recognition of digits, letters, objects, and so on; this article gives just one example, showing the steps of the solution, the difficulties that come up along the way, and ways to overcome them. How do you work with captchas?