Short description
Distinguishing a real person's face from a spoofed one in front of a camera remains one of the hardest problems in access-control systems. We propose a real-time algorithm that detects eye blinks in a video sequence from a standard camera, which serves as evidence that the person in front of the camera is real. Recent landmark detectors, trained on in-the-wild datasets, are robust to head orientation relative to the camera, varying illumination, and facial expressions. We show that the landmarks are detected accurately enough to reliably estimate the level of eye opening. The proposed algorithm therefore estimates the landmark positions and extracts a single scalar quantity, the eye aspect ratio (EAR), that characterizes the eye opening in each frame. Finally, an SVM classifier detects eye blinks as a pattern of EAR values in a short temporal window. The simple algorithm outperforms the state-of-the-art results on two standard datasets.
Dlib library
In this article, I use the facial landmark indexes for the dlib face regions. The facial landmark detector implemented in the dlib library produces 68 (x, y) coordinates that map to specific facial structures. These 68 point mappings were obtained by training a shape predictor on the labeled iBUG 300-W dataset.
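As a minimal sketch of how the 68 coordinates might be pulled into an array for further processing, the helper below converts a dlib-style landmark container into a NumPy array. The function name shape_to_np is a hypothetical helper, not part of dlib; it assumes only that the object exposes num_parts and part(i) with x and y attributes, as dlib's full_object_detection does.

```python
import numpy as np


def shape_to_np(shape):
    """Convert a dlib-style landmark container to an (N, 2) integer array.

    Assumption: `shape` exposes `num_parts` and `part(i)` returning an
    object with `x` and `y`, as dlib's full_object_detection does.
    """
    n = shape.num_parts
    coords = np.zeros((n, 2), dtype=int)
    for i in range(n):
        point = shape.part(i)
        coords[i] = (point.x, point.y)  # one (x, y) row per landmark
    return coords
```

With dlib installed, the container would typically come from a shape predictor applied to a detected face rectangle; the path to the trained 68-point model file has to be supplied by the reader.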
Below we can visualize what each of these 68 coordinates maps to:
Figure 1 - Rendering each of the 68 facial coordinate points from the iBUG 300-W dataset
By examining the image, we can see which points belong to each facial region. The ranges below use the one-based, inclusive labels from the figure; since Python is zero-indexed, subtract one from each label when indexing a landmark array:
- The mouth can be accessed through points [49, 68].
- Right eyebrow through points [18, 22].
- Left eyebrow through points [23, 27].
- The right eye through the points [37, 42].
- The left eye through the points [43, 48].
- The nose through the points [28, 36].
- And the jaw through the points [1, 17].
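The ranges above can be turned into Python slices with a small lookup table. This is a sketch under the assumption that the listed ranges are the one-based, inclusive labels from Figure 1; the names FACIAL_REGIONS_ONE_BASED and region_slice are illustrative and not part of dlib.

```python
# One-based, inclusive landmark labels, copied from the list above.
FACIAL_REGIONS_ONE_BASED = {
    "jaw": (1, 17),
    "right_eyebrow": (18, 22),
    "left_eyebrow": (23, 27),
    "nose": (28, 36),
    "right_eye": (37, 42),
    "left_eye": (43, 48),
    "mouth": (49, 68),
}


def region_slice(name):
    """Return a zero-based, half-open Python slice for a facial region."""
    first, last = FACIAL_REGIONS_ONE_BASED[name]
    # Shift the start down by one; the inclusive end label becomes the
    # exclusive stop index unchanged.
    return slice(first - 1, last)


# Stand-in for a real landmark array: element i holds the label i + 1.
landmarks = list(range(1, 69))
right_eye = landmarks[region_slice("right_eye")]
```

Here right_eye holds the six labels 37 through 42, which correspond to zero-based indices 36 through 41.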
Understanding the eye aspect ratio (EAR)
We can apply facial landmark detection to localize important areas of the face, including the eyes, eyebrows, nose, jaw, and mouth:
Figure 2 - Real-time detection of facial landmarks in a picture
This also implies that we can extract certain facial structures by knowing the indices of certain parts of the face:
Figure 3 - Application of facial landmarks to localize different areas of the face, namely the right eye and mouth
Blink detection
For blink detection, we are only interested in one pair of facial structures: the eyes. Each eye is represented by 6 (x, y) coordinates, starting at the left corner of the eye (as if you were looking at the person) and then working clockwise around the rest of the eye:
Figure 4 - 6 facial landmarks associated with the eye
From this image, we should take away one key point: there is a relationship between the width and the height of these coordinates. Building on the work of Soukupová and Čech in their 2016 paper "Real-Time Eye Blink Detection Using Facial Landmarks," we can derive an equation that reflects this relationship, called the eye aspect ratio (EAR):
Figure 5 - Eye aspect ratio equation
where p1, ..., p6 are 2D facial landmark locations. The numerator of this equation computes the distances between the vertical eye landmarks, while the denominator computes the distance between the horizontal eye landmarks. The denominator is weighted by two because there is only one set of horizontal points but two sets of vertical points.
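Written out, the relationship described here is EAR = (||p2 - p6|| + ||p3 - p5||) / (2 ||p1 - p4||). A minimal Python sketch of that computation follows; the function name eye_aspect_ratio and the sample coordinates are illustrative assumptions, and the six points are expected in the p1 to p6 order shown in Figure 4.

```python
import math


def eye_aspect_ratio(eye):
    """Compute the EAR from six (x, y) eye landmarks ordered p1..p6."""
    p1, p2, p3, p4, p5, p6 = eye
    # Two vertical distances: upper-lid points to their lower-lid partners.
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    # One horizontal distance: eye corner to eye corner.
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


# Made-up coordinates for illustration only.
open_eye = [(0, 0), (2, -2), (4, -2), (6, 0), (4, 2), (2, 2)]
closed_eye = [(0, 0), (2, 0), (4, 0), (6, 0), (4, 0), (2, 0)]

ear_open = eye_aspect_ratio(open_eye)      # 8 / 12, about 0.667
ear_closed = eye_aspect_ratio(closed_eye)  # 0.0
```

Because the ratio divides by the eye width, it does not change when the face is uniformly scaled in the image, for example as the person moves closer to or farther from the camera.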
The eye aspect ratio is approximately constant while the eye is open, but rapidly falls toward zero when a blink occurs. Using this simple equation, we can avoid image processing techniques and rely solely on the ratio of eye landmark distances to determine whether a person is blinking. To make this clearer, consider the following figure:
Figure 6 - Visualization of eye landmarks
In the upper left corner, we have a fully open eye; the aspect ratio here is larger and relatively constant over time. However, as soon as the person blinks (top right), the eye aspect ratio decreases dramatically, approaching zero. The graph in the figure plots the eye aspect ratio over time for a video clip. As we can see, the eye aspect ratio is constant, then rapidly drops close to zero, and then increases again, which indicates a single blink.
Figure 7 - Eye Blink Detection
Exceptions
However, due to noise in the video stream, poor facial landmark detections, or rapid changes in viewing angle, a simple threshold on the eye aspect ratio can produce a false positive, reporting a blink when the subject did not actually blink. According to one medical article we read, a person blinks about 20 times per minute on average, that is, roughly once every 3 seconds.
Based on this, to make our blink detector more robust to these problems, we require that at least 3 seconds pass before the next blink is registered and that a detected blink span at least 3 frames. This approach gave very good results in our experiments. The detector worked accurately: of twenty tests, eighteen were successful.
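One way to read the two rules above is sketched below: a blink is counted only when the EAR stays below a threshold for at least 3 consecutive frames and at least 3 seconds have passed since the previously counted blink. The class name BlinkCounter and the threshold value 0.2 are assumptions made for illustration; the text does not state the threshold that was used.

```python
class BlinkCounter:
    """Count blinks from per-frame EAR values with two robustness rules."""

    def __init__(self, ear_threshold=0.2, min_closed_frames=3,
                 min_interval_s=3.0):
        self.ear_threshold = ear_threshold          # assumed value
        self.min_closed_frames = min_closed_frames  # from the text: 3 frames
        self.min_interval_s = min_interval_s        # from the text: 3 seconds
        self.blinks = 0
        self._closed_frames = 0
        self._last_blink_s = None

    def update(self, ear, timestamp_s):
        """Feed one frame; return True if a blink is counted on this frame."""
        if ear < self.ear_threshold:
            # Eye looks closed: extend the current run of closed frames.
            self._closed_frames += 1
            return False
        # Eye is open again: decide whether the closed run was a blink.
        closed_run = self._closed_frames
        self._closed_frames = 0
        if closed_run < self.min_closed_frames:
            return False  # too short, likely noise
        if (self._last_blink_s is not None
                and timestamp_s - self._last_blink_s < self.min_interval_s):
            return False  # too soon after the previous counted blink
        self._last_blink_s = timestamp_s
        self.blinks += 1
        return True
```

A caller would compute the EAR for each video frame and pass it to update together with the frame timestamp in seconds. The threshold would normally need tuning for the camera and lighting in use.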
The problems of this approach
This approach also has unsolved problems. If a video of a blinking person's face is shown to the camera on a screen or other device, the system can produce a false positive. One way to address this is to use a stereo pair: with two cameras, we can compute a depth map and calculate the distance to the object.
Problem solving
This section describes how a stereo pair of cameras works. After the images are rectified, a search is performed for corresponding pairs of points in the two images. The simplest method is illustrated in Figure 8 and is as follows. For each pixel of the left image with coordinates (x0, y0), a matching pixel is searched for in the right image. The matching pixel is assumed to have coordinates (x0 - d, y0), where d is a quantity called the disparity. The search is performed by finding the maximum of a response function, which can be, for example, the correlation of the pixel neighborhoods.
Figure 8 - Depth map calculations
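The search described above can be sketched with NumPy for a single pixel. This is an illustrative example, not the implementation used here: it assumes rectified grayscale images stored as 2D arrays indexed [y, x], uses normalized cross-correlation of small windows as the response function, and assumes (x0, y0) lies at least half_window pixels from the image border. The names match_disparity and depth_from_disparity, and the parameters max_disparity, half_window, focal_length_px and baseline_m, are hypothetical. The depth conversion uses the standard rectified-stereo relation Z = f * B / d.

```python
import numpy as np


def match_disparity(left, right, x0, y0, max_disparity=16, half_window=2):
    """Return the disparity d maximizing window correlation for (x0, y0)."""
    h = half_window
    ref = left[y0 - h:y0 + h + 1, x0 - h:x0 + h + 1].astype(float)
    ref = ref - ref.mean()
    best_d, best_score = 0, -np.inf
    for d in range(max_disparity + 1):
        x = x0 - d  # candidate column in the right image
        if x - h < 0:
            break  # candidate window would leave the image
        cand = right[y0 - h:y0 + h + 1, x - h:x + h + 1].astype(float)
        cand = cand - cand.mean()
        denom = np.sqrt((ref ** 2).sum() * (cand ** 2).sum())
        # Normalized cross-correlation; flat windows get the lowest score.
        score = (ref * cand).sum() / denom if denom > 0 else -np.inf
        if score > best_score:
            best_d, best_score = d, score
    return best_d


def depth_from_disparity(d, focal_length_px, baseline_m):
    """Distance along the optical axis for a rectified pair: Z = f * B / d."""
    if d <= 0:
        raise ValueError("disparity must be positive to compute depth")
    return focal_length_px * baseline_m / d
```

A flat screen held in front of the cameras would give nearly the same depth across the whole face region, whereas a real face shows depth variation between, for example, the nose and the ears, which is what makes the depth map useful against video replay.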