Watch my shoulders: developers have created an algorithm that recognizes typed text from video



Lips are not the only thing that can be read during a conversation: so can hands fluttering over a keyboard. From the hand movements an attacker sees during a video call, it is possible to guess passwords and code words typed on a traditional QWERTY keyboard.



American developers have created an algorithm that tracks the movements of the contours of the shoulders and arms of a user typing on a keyboard, analyzes them, and maps them to the characters entered. The AI then prepares a list of the letters and numbers that were probably typed, which makes it possible, for example, to recover characters typed during login. All of this happens in almost real time, during a video conference.



Typed characters can also be determined from the sound of keystrokes, but this requires access to the target PC. That method is also inaccurate, because ambient noise greatly interferes with the analysis. In addition, acoustic cryptanalysis does not work on keyboards with quiet, low-amplitude keystrokes.



The algorithm, developed at the University of Texas at San Antonio, takes into account typing speed and the order in which the hands are used, tracks their movement, and estimates the probable number of letters in a word. The application's arsenal includes a dictionary of the most popular words used as passwords. According to the researchers, a video signal is less prone to distortion than audio.
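
To make the dictionary idea concrete, here is a minimal sketch of how candidate words could be narrowed down once the word length and the order of hand use are known. It is an illustration only, not the researchers' implementation: the left/right key split, the function names `hand_pattern` and `rank_candidates`, and the toy word list are all assumptions introduced for this example.

```python
from typing import List

# Assumption: a standard touch-typing split of the QWERTY letter keys.
# Real typists deviate from it, so this is only an approximation.
LEFT_HAND_KEYS = set("qwertasdfgzxcvb")


def hand_pattern(word: str) -> str:
    """Map each letter of a word to 'L' (left hand) or 'R' (right hand)."""
    return "".join("L" if ch in LEFT_HAND_KEYS else "R" for ch in word.lower())


def rank_candidates(observed: str, dictionary: List[str], top_k: int = 50) -> List[str]:
    """Return up to top_k dictionary words that fit an observed hand sequence.

    observed   -- hypothetical output of the video stage, e.g. "LLLLRR"
    dictionary -- words ordered from most to least popular
    Words of a different length are discarded; the rest are ranked by how
    many positions match, with ties broken by popularity.
    """
    scored = []
    for popularity_rank, word in enumerate(dictionary):
        if len(word) != len(observed):
            continue
        matches = sum(a == b for a, b in zip(hand_pattern(word), observed))
        scored.append((-matches, popularity_rank, word))
    scored.sort()
    return [word for _, _, word in scored[:top_k]]


if __name__ == "__main__":
    toy_dictionary = ["password", "dragon", "monkey", "qwerty", "letmein"]
    print(rank_candidates("LLLLRR", toy_dictionary))
```

Even this crude filter shows why a short list of candidates is a realistic output: the word length alone removes most of the dictionary, and the hand sequence orders what remains.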











The researchers tested the algorithm under various conditions.



In one case, they used a dictionary of the 65 thousand most popular words and had the algorithm return the 50 most probable candidates. Accuracy depended on the video-calling platform used. Words typed during Skype calls were predicted most accurately: under identical conditions, the result was 3.4% more accurate than with Zoom and 8% more accurate than with Hangouts.



In another case, they took a dictionary of 4 thousand words. Here, 75% of the words entered appeared in the list of the 200 most likely candidates.
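
Results of this kind are top-k hit rates: a prediction counts as a success if the word actually typed appears anywhere among the k most likely candidates. The sketch below shows how such a figure could be computed. The function name `top_k_accuracy` and the sample data are invented for illustration and are not taken from the study.

```python
from typing import List


def top_k_accuracy(ranked_predictions: List[List[str]], typed_words: List[str], k: int) -> float:
    """Fraction of typed words found among the first k candidates.

    ranked_predictions -- one candidate list per typed word, best guess first
    typed_words        -- the words that were actually typed
    """
    if not typed_words or len(ranked_predictions) != len(typed_words):
        raise ValueError("need one non-empty prediction list per typed word")
    hits = sum(
        truth in candidates[:k]
        for candidates, truth in zip(ranked_predictions, typed_words)
    )
    return hits / len(typed_words)


if __name__ == "__main__":
    # Invented toy data: four typed words and the candidates guessed for each.
    predictions = [
        ["dragon", "qwerty"],
        ["monkey", "donkey"],
        ["hello", "jello"],
        ["abc", "xyz"],
    ]
    truths = ["qwerty", "monkey", "world", "xyz"]
    print(top_k_accuracy(predictions, truths, k=1))
    print(top_k_accuracy(predictions, truths, k=2))
```

A larger k always gives an equal or higher hit rate, which is why a top-200 figure cannot be compared directly with a top-50 one.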



An interesting nuance: the algorithm's performance depends heavily on what the subjects are wearing. People with bare arms, for example, are more susceptible to the attack. Recognition accuracy for the entered characters was 81.7% when participants wore sleeveless clothes, versus 74.4% with long sleeves and 73% with short sleeves.



The type of keyboard and the distance between keys affect typing style, and so could be expected to affect recognition accuracy. As it turned out, however, they matter little: the Logitech keyboard is significantly larger than the Anker, yet the accuracy is almost identical.



In addition to testing in a laboratory setting, the developers observed 10 participants, seven men and three women, in their typical home environment. All participants had roughly the same typing speed of 3.7 keystrokes per second and an error rate of 86.7%. To keep the experiment controlled, a number of constraints were introduced: a call duration of 30 minutes, recommended ten-minute computer activities, and so on.



The experiment showed that, at home, not everyone positioned the camera the way it was positioned in the laboratory. Differences in webcam resolution also affected the accuracy of the algorithm's output. In one case, a participant's hair completely covered the forearm area, leaving the algorithm nothing to analyze. So, in general, it is not that difficult to defend yourself.





