An example of eye tracking for a participant without fatigue (left) and with mental fatigue (right) as they track an object following a circular path.
Eye movement has been widely studied by vision, language, and usability researchers since the 1970s. Beyond basic research, a better understanding of eye movement is useful in a wide variety of applications, including usability and user experience research, gaming, driving, and gaze-based interaction for accessibility. However, progress has been limited because most prior research relied on specialized eye-tracking hardware that is expensive and difficult to scale.
In "Accelerating eye movement research via accurate and affordable smartphone eye tracking", published in Nature Communications, and "Digital biomarker of mental fatigue", published in npj Digital Medicine, we present accurate, smartphone-based eye tracking powered by machine learning. It has the potential to unlock new research in vision, accessibility, health, and wellness, while scaling to diverse populations around the world, all using the front-facing camera of a smartphone. We also discuss the potential use of this technology as a digital biomarker of mental fatigue, which may be useful for improved wellness.
Model overview
The core of our gaze model was a multilayer feed-forward convolutional neural network (ConvNet) trained on the MIT GazeCapture dataset. A face detection algorithm selected the face region along with eye corner landmarks, which were used to crop the images down to the eye region alone. These cropped frames were fed through two identical ConvNet towers with shared weights. Each convolutional layer was followed by an average pooling layer. The eye corner landmarks were combined with the output of the two towers through fully connected layers. Rectified linear units (ReLU) were used for all layers except the final fully connected output layer (FC6), which had no activation.
The architecture of the base (non-personalized) gaze model. Eye regions extracted from the front-facing camera image serve as input to the convolutional neural network. Fully connected (FC) layers combine the output with the eye corner landmarks to predict the on-screen X and Y gaze coordinates via a multi-regression output layer.
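For illustration, the sketch below expresses such a two-tower architecture in Keras. It is not the published model: the input resolution, filter counts, and layer widths are assumptions made for the example; only the overall structure (shared-weight eye towers, average pooling after each convolution, landmark fusion through FC layers, and an unactivated multi-regression FC6 output) follows the description above.

```python
from tensorflow.keras import layers, Model

def make_eye_tower():
    # Shared ConvNet tower applied to each cropped eye image.
    # Each convolutional layer is followed by average pooling, as described above.
    inp = layers.Input(shape=(128, 128, 3))
    x = inp
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.AveragePooling2D(2)(x)
    x = layers.Flatten()(x)
    return Model(inp, x, name="eye_tower")

tower = make_eye_tower()  # a single instance => shared weights across both eyes

left_eye = layers.Input(shape=(128, 128, 3), name="left_eye")
right_eye = layers.Input(shape=(128, 128, 3), name="right_eye")
landmarks = layers.Input(shape=(8,), name="eye_corner_landmarks")  # 4 corners x (x, y)

# Combine the two tower outputs with the eye corner landmarks via FC layers.
merged = layers.Concatenate()([tower(left_eye), tower(right_eye), landmarks])
x = layers.Dense(128, activation="relu")(merged)
x = layers.Dense(16, activation="relu")(x)  # penultimate ReLU layer (used later for personalization)
gaze_xy = layers.Dense(2, activation=None, name="fc6")(x)  # FC6: multi-regression output, no activation

gaze_model = Model([left_eye, right_eye, landmarks], gaze_xy)
gaze_model.compile(optimizer="adam", loss="mse")
```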
The accuracy of the base gaze model was then improved through fine-tuning and per-participant personalization. For the latter, a lightweight regression model was fit to the base model's penultimate ReLU layer using data from that specific participant.
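A minimal sketch of that personalization step is shown below, assuming ridge regression as the lightweight regressor (the papers describe it only as a lightweight regression model) and a generic feature extractor that returns penultimate-layer activations.

```python
import numpy as np
from sklearn.linear_model import Ridge

def personalize(extract_features, calib_inputs, calib_targets_cm):
    """Fit a per-participant regressor on penultimate-layer features.

    extract_features: callable mapping model inputs to the base model's
        penultimate ReLU-layer activations (e.g., a Keras sub-model built
        from the sketch above).
    calib_inputs: ~30 s of calibration frames for one participant.
    calib_targets_cm: (N, 2) on-screen stimulus locations shown during calibration.
    """
    feats = np.asarray(extract_features(calib_inputs))
    return Ridge(alpha=1.0).fit(feats, np.asarray(calib_targets_cm))

def predict_gaze_cm(regressor, extract_features, inputs):
    """Personalized gaze prediction for new frames."""
    return regressor.predict(np.asarray(extract_features(inputs)))
```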
Model evaluation
To evaluate the model, we collected data from consenting study participants as they viewed dots that appeared at random locations on a blank screen. Model error was computed as the distance (in cm) between the stimulus location and the model's prediction. The results show that while the base (non-personalized) model has a high error, personalization with ~30 seconds of calibration data led to a more than fourfold reduction in error (from 1.92 cm to 0.46 cm). At a viewing distance of 25–40 cm, this corresponds to an accuracy of 0.6–1°, a significant improvement over the 2.4–3° reported in previous work [1, 2].
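The conversion from on-screen error in centimeters to degrees of visual angle follows directly from the viewing distance; the short helper below reproduces the numbers quoted above.

```python
import math

def error_to_degrees(error_cm, viewing_distance_cm):
    """Convert an on-screen gaze error (cm) to degrees of visual angle."""
    return math.degrees(2 * math.atan2(error_cm / 2, viewing_distance_cm))

# Personalized model: 0.46 cm error over a 25-40 cm viewing distance
print(round(error_to_degrees(0.46, 40), 2))  # ~0.66 degrees
print(round(error_to_degrees(0.46, 25), 2))  # ~1.05 degrees
```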
Additional experiments show that the accuracy of the smartphone eye tracker is comparable to that of state-of-the-art wearable eye trackers, both when the phone is mounted on a device stand and when users hold the phone freely in their hand with a near-frontal head pose. Unlike specialized eye-tracking equipment with multiple infrared cameras close to each eye, running our model on a single front-facing RGB smartphone camera is significantly cheaper (roughly 100x) and more scalable.
Using this smartphone technology, we were able to replicate key findings from prior eye movement research in neuroscience and psychology, covering both standard oculomotor tasks (which probe basic visual brain functions) and natural image understanding. For example, in a simple prosaccade task, which tests a person's ability to quickly move their eyes toward a stimulus that appears on the screen, we found that the mean saccade latency (the time it takes to move the eyes) was consistent with prior work for basic eye health (210 ms versus 200–250 ms). In visual search tasks, we were able to reproduce key findings such as the effects of target saliency and clutter on eye movements.
Example gaze scanpaths show the effect of the target's saliency (i.e., color contrast) on visual search performance. Fewer fixations are required to find a high-saliency target (left, distinct from the distractors), while more fixations are required to find a low-saliency target (right, similar to the distractors).
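As an illustration of how such oculomotor measures can be derived from smartphone gaze data, the sketch below estimates saccade latency for a single prosaccade trial using a simple velocity-threshold criterion. The threshold, the pixels-per-degree conversion, and the function name are placeholder assumptions, not values from the papers.

```python
import numpy as np

def saccade_latency_ms(gaze_xy, timestamps_ms, stimulus_onset_ms,
                       velocity_threshold_deg_s=30.0, px_per_deg=35.0):
    """Latency (ms) from stimulus onset to the first sample whose gaze
    velocity exceeds a threshold, or None if no saccade is detected.

    gaze_xy: (N, 2) gaze positions in pixels; timestamps_ms: (N,) sample times.
    """
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    timestamps_ms = np.asarray(timestamps_ms, dtype=float)
    dt_s = np.diff(timestamps_ms) / 1000.0
    step_px = np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1)
    velocity_deg_s = step_px / px_per_deg / dt_s
    after_onset = timestamps_ms[1:] >= stimulus_onset_ms
    idx = np.flatnonzero(after_onset & (velocity_deg_s > velocity_threshold_deg_s))
    if idx.size == 0:
        return None
    return timestamps_ms[1:][idx[0]] - stimulus_onset_ms
```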
For complex stimuli such as natural images, we found that the gaze distribution (computed by aggregating gaze positions across all participants) from our smartphone eye tracker was similar to that obtained from bulky, expensive eye trackers used in tightly controlled laboratory settings, such as with chin rests. While the smartphone gaze heatmaps are broader (i.e., appear more "blurred") than those from hardware eye trackers, they are highly correlated with them, both at the pixel level (r = 0.74) and at the object level (r = 0.90). These results suggest that this technology can be used to scale gaze analysis to complex stimuli such as natural and medical images (e.g., radiologists reviewing MRI/PET scans).
Gaze heatmaps from our smartphone eye tracker compared to those from a far more expensive (~100x) hardware eye tracker (OSIE dataset).
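The pixel-level agreement quoted above (r = 0.74) is a straightforward Pearson correlation between heatmaps; a minimal sketch is shown below, with the heatmap construction itself (e.g., Gaussian smoothing of fixation maps) left out.

```python
import numpy as np

def heatmap_correlation(heatmap_a, heatmap_b):
    """Pearson correlation between two gaze heatmaps of identical shape."""
    a = np.asarray(heatmap_a, dtype=float).ravel()
    b = np.asarray(heatmap_b, dtype=float).ravel()
    return np.corrcoef(a, b)[0, 1]
```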
We also found that smartphone gaze can help detect reading comprehension difficulty. Participants reading passages spent significantly more time looking at the relevant excerpts when they answered correctly. However, as comprehension difficulty increased, they spent more time looking at irrelevant excerpts in the passage before finding the relevant excerpt that contained the answer. The fraction of gaze time spent on the relevant excerpt was a good predictor of comprehension and correlated strongly and negatively with comprehension difficulty (r = -0.72).
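A hypothetical sketch of the comprehension feature described above (the fraction of gaze time spent on the relevant excerpt) is shown below; the function name and the rectangular region format are assumptions made for illustration.

```python
import numpy as np

def fraction_on_relevant(gaze_xy, relevant_box):
    """Fraction of gaze samples falling inside the relevant excerpt's box.

    gaze_xy: (N, 2) screen coordinates; relevant_box: (x_min, y_min, x_max, y_max)
    in the same coordinate system.
    """
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    x, y = gaze_xy[:, 0], gaze_xy[:, 1]
    x0, y0, x1, y1 = relevant_box
    inside = (x >= x0) & (x <= x1) & (y >= y0) & (y <= y1)
    return float(inside.mean())
```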
Digital biomarker of mental fatigue
Gaze detection is an important tool for assessing alertness and well-being, and it is extensively studied in medicine, sleep research, and mission-critical settings such as medical surgery and aviation safety. However, existing fatigue tests are subjective and often time-consuming. In our recent paper published in npj Digital Medicine, we demonstrated that smartphone gaze is significantly degraded by mental fatigue and can be used to track its onset and progression.
We found that a simple model can reliably predict mental fatigue using just a few minutes of gaze data from participants performing a task. We validated these findings in two different experiments, using a language-independent object-tracking task and a language-dependent proofreading task. As shown below, in the object-tracking task, participants' gaze initially follows the object's circular trajectory, but as they become fatigued, their gaze shows large errors and deviations. Given the ubiquity of phones, these results suggest that smartphone gaze could serve as a scalable digital biomarker of mental fatigue.
An example of eye tracking for a participant without fatigue (left) and with mental fatigue (right) as they track an object following a circular path.
Corresponding progression of fatigue ratings (self-reported) and model predictions as a function of time on task.
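As a hypothetical illustration of the kind of gaze feature such a model could use, the sketch below computes the deviation between gaze samples and the moving target in the circular object-tracking task; the feature definitions and the choice of a logistic-regression classifier are assumptions, not the published model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def tracking_deviation(gaze_xy, t, center, radius, angular_speed):
    """Mean and variability of the distance between gaze samples and a
    target moving along a circular path (target positions at times t)."""
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    t = np.asarray(t, dtype=float)
    target = np.stack([center[0] + radius * np.cos(angular_speed * t),
                       center[1] + radius * np.sin(angular_speed * t)], axis=1)
    dist = np.linalg.norm(gaze_xy - target, axis=1)
    return dist.mean(), dist.std()

# Given per-window features (e.g., mean/std deviation) and fatigue labels
# collected across participants, a simple classifier could then be trained:
# clf = LogisticRegression().fit(window_features, fatigue_labels)
```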
Beyond wellness, smartphone gaze could also provide a digital phenotype for screening or monitoring health conditions such as autism spectrum disorder, dyslexia, concussion, and more. This could enable timely and early intervention, especially in countries with limited access to healthcare services.
Another area that could benefit tremendously is accessibility. People with conditions such as ALS, locked-in syndrome, and stroke have impaired speech and motor abilities. Smartphone gaze could provide a powerful way to make daily tasks easier by using gaze for interaction, as recently demonstrated with Look to Speak.
Ethical considerations
Gaze research requires careful consideration, including being mindful of the correct use of such technology: applications should obtain explicit approval and fully informed consent from users for the specific task at hand. In our work, all data were collected for research purposes with users' explicit approval and consent. Users could also opt out at any time and request that their data be deleted. We continue to explore additional ways to ensure machine learning fairness and to improve the accuracy and robustness of gaze technology across demographics, in a responsible, privacy-preserving way.
Conclusion
Our results on accurate, affordable, machine-learning-based smartphone eye tracking open up the potential for large-scale eye movement research across many domains (e.g., neuroscience, psychology, and human-computer interaction). They also open up potential new applications for societal good, such as gaze-based interaction for accessibility and smartphone-based screening and monitoring tools for wellness and health.