Determining gender and age from a photo





In internal audit practice, some tasks require checking customer records for incorrectly entered data. One such problem is a mismatch between the entered data and the photo of the client taken when the product was registered.



Suppose the following information is available for each customer: gender, age, and a link to the photo. To check whether they match, we will use the Python library py-agender.



The library works in two stages. First, OpenCV locates the face in the photo. Second, a neural network with the EfficientNetB3 architecture, trained on the UTKFace dataset, estimates the gender and age of the person whose face was found.
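The two-stage structure can be sketched in plain Python. This is not py-agender's internal code: `two_stage_pipeline`, `detect_faces` and `estimate` are hypothetical names, and the stub lambdas at the bottom stand in for the OpenCV detector and the EfficientNetB3 network. The sketch only shows how a face box from the first stage and the (gender, age) pair from the second stage combine into one record with the same keys that detect_genders_ages returns.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: an image is a grid of pixel values (rows of columns),
# a box is (left, top, right, bottom) in pixels.
Image = Sequence[Sequence[int]]
Box = Tuple[int, int, int, int]


def two_stage_pipeline(
    image: Image,
    detect_faces: Callable[[Image], List[Box]],
    estimate: Callable[[Image], Tuple[float, float]],
) -> List[Dict[str, float]]:
    """Stage 1 finds face boxes; stage 2 estimates (gender, age) for each crop."""
    results = []
    for left, top, right, bottom in detect_faces(image):
        # Crop the face: rows top..bottom, columns left..right.
        crop = [list(row[left:right]) for row in image[top:bottom]]
        gender, age = estimate(crop)
        results.append({
            "left": left, "top": top, "right": right, "bottom": bottom,
            "width": right - left, "height": bottom - top,
            "gender": gender, "age": age,
        })
    return results


# Toy run with stub stages standing in for the real detector and network.
blank = [[0] * 300 for _ in range(300)]
faces = two_stage_pipeline(
    blank,
    detect_faces=lambda img: [(0, 5, 299, 299)],
    estimate=lambda crop: (0.0075, 41.6),
)
print(faces[0]["width"], faces[0]["height"])  # 299 294
```

Passing the two stages in as functions keeps the sketch independent of any particular detector or model.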



First, let's import the required libraries:



import cv2
from pyagender import PyAgender


Let's create a gender and age detector object:



agender = PyAgender()


Load the photo using OpenCV:



img = cv2.imread("habensky.jpeg")  # returns None if the file cannot be read


Next, we estimate the face characteristics with the detect_genders_ages method of the agender object:



face_info = agender.detect_genders_ages(img)


where the variable face_info contains the following information:



[{'left': 0,
  'top': 5,
  'right': 299,
  'bottom': 299,
  'width': 299,
  'height': 294,
  'gender': 0.0075379927,
  'age': 41.585840644804094}]


Here the parameters ('left', 'top', 'right', 'bottom', 'width', 'height') describe the position of the face in the photo. The gender parameter is a score between 0 and 1, where 0 corresponds to a man and 1 to a woman. So after processing the images, we split the sample into men and women using a threshold of 0.5. The age parameter is the estimated age in years.







The algorithm determined that this image shows a man (the gender value is very close to zero: 0.0075379927) and that he is about 41 and a half years old in this photo (41.5858). I don't know how old Konstantin Khabensky was when the photo was taken, but I think the algorithm is close to the truth.
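Applying the 0.5 threshold to face_info takes only a couple of lines. A minimal sketch follows; the helper names `main_face` and `label` are hypothetical, and choosing the largest face when several are detected (and treating an empty list as "no face found") are assumptions of this sketch, not documented py-agender behavior. A score of exactly 0.5 is assigned to "female" here, which is an arbitrary choice.

```python
from typing import Dict, List, Optional, Tuple

GENDER_THRESHOLD = 0.5  # the cut-off from the text: 0 = man, 1 = woman


def main_face(face_info: List[Dict[str, float]]) -> Optional[Dict[str, float]]:
    """Return the largest detected face, or None if the list is empty."""
    if not face_info:
        return None
    return max(face_info, key=lambda face: face["width"] * face["height"])


def label(face: Dict[str, float], threshold: float = GENDER_THRESHOLD) -> Tuple[str, float]:
    """Turn the raw gender score into a label and round the age to one decimal."""
    gender = "female" if face["gender"] >= threshold else "male"
    return gender, round(face["age"], 1)


# The face_info returned for the example photo in this article.
face_info = [{'left': 0, 'top': 5, 'right': 299, 'bottom': 299,
              'width': 299, 'height': 294,
              'gender': 0.0075379927, 'age': 41.585840644804094}]
print(label(main_face(face_info)))  # ('male', 41.6)
```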



A good example: a crisp image and a pretty impressive result. However, once you apply the algorithm to real data, things are not as rosy as we would like, and the problem lies less in the algorithm than in the quality of the source data.



In my case, the data was a set of 1542 images with a resolution of 300x300. For 64 images, the algorithm could not detect a face at all, mainly because of poor lighting when the photo was taken (the faces are barely visible). For the remaining 1478 images, the median age error was 4.96 years. The figure below shows the distribution of the error:







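The median error can be computed along these lines. The sketch assumes that "error" means the absolute difference between the age entered in the system and the predicted age, and that images without a detected face are excluded, as the 64 undetected images were; `age_errors` is a hypothetical helper and the records are made-up toy values.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence


def age_errors(declared_ages: Sequence[float],
               predictions: Sequence[Optional[Dict[str, float]]]) -> List[float]:
    """Absolute gaps between entered and predicted ages.

    None marks an image where no face was found; such images are skipped.
    """
    return [abs(declared - pred["age"])
            for declared, pred in zip(declared_ages, predictions)
            if pred is not None]


# Made-up toy records, not the data from this analysis.
declared = [42, 30, 55, 25]
predicted = [{"age": 41.6}, None, {"age": 48.0}, {"age": 27.0}]
errors = age_errors(declared, predicted)
print(len(errors), median(errors))  # 3 2.0
```

The median is less sensitive than the mean to a few badly mispredicted photos, which matters when some images are of poor quality.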
For 8.5% of the images (125 out of 1478), the algorithm got the person's gender wrong; in 122 of these cases, it mistook a woman for a man. Again, the algorithm is not to blame in every case: many of the misclassified photos show people wearing glasses, which may hide some facial features. The figure below shows the age distribution in the UTKFace dataset:







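Tallying gender mistakes comes down to comparing the entered gender with the thresholded score. In this sketch, `gender_confusion` and `error_rate` are hypothetical helpers and the records are made-up; the last line only checks the arithmetic behind the 8.5% quoted above (125 / 1478).

```python
from collections import Counter
from typing import Iterable, Tuple


def gender_confusion(records: Iterable[Tuple[str, float]],
                     threshold: float = 0.5) -> Counter:
    """Count (entered, predicted) gender pairs; the score is 0 = man, 1 = woman."""
    counts = Counter()
    for entered, score in records:
        predicted = "female" if score >= threshold else "male"
        counts[(entered, predicted)] += 1
    return counts


def error_rate(counts: Counter) -> float:
    """Share of records where the predicted gender differs from the entered one."""
    wrong = sum(n for (entered, predicted), n in counts.items() if entered != predicted)
    return wrong / sum(counts.values())


# Made-up toy records: (gender entered in the system, raw gender score).
toy = [("female", 0.2), ("female", 0.3), ("female", 0.9), ("male", 0.1)]
counts = gender_confusion(toy)
print(counts[("female", "male")], error_rate(counts))  # 2 0.5

# The figures from this analysis: 125 wrong out of 1478 images.
print(f"{125 / 1478:.1%}")  # 8.5%
```

Keeping the (entered, predicted) pairs rather than a single error count shows the direction of the mistakes, such as women classified as men.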
The distribution shows that most of the UTKFace dataset consists of images of people aged 20 to 40. Even so, the algorithm made most of its mistakes precisely in this age range, which suggests that the errors are related to the peculiarities of the dataset it was applied to rather than to a lack of training examples. The figure below shows the age distribution of the people for whom the algorithm made a mistake:







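An age distribution of misclassified people can be built by binning their ages. The sketch below groups ages into 10-year bins and prints a text histogram; `age_histogram` is a hypothetical helper, the bin width is an arbitrary choice, and the ages are made-up.

```python
from collections import Counter
from typing import Iterable


def age_histogram(ages: Iterable[float], bin_width: int = 10) -> Counter:
    """Count ages per bin; key 20 stands for the interval [20, 30)."""
    return Counter(int(age // bin_width) * bin_width for age in ages)


# Made-up ages of people for whom the algorithm made a mistake.
misclassified_ages = [23, 27, 34, 38, 45, 61]
for start, count in sorted(age_histogram(misclassified_ages).items()):
    print(f"{start}-{start + 9}: {'#' * count}")
```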
Py-Agender is an interesting tool that can help automate a number of routine tasks, or at least reduce the number of records that need manual review. This article evaluates it on one specific dataset; on your data, the algorithm may well perform better.
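As an illustration of reducing the sample for manual analysis, a record could be routed to a reviewer when no face is found, when the predicted gender disagrees with the entered one, or when the age gap is large. This is a minimal sketch: `needs_review` is a hypothetical helper, and the 10-year `age_tolerance` is an assumption chosen only for the example, not a value from this analysis.

```python
from typing import Dict, Optional


def needs_review(entered_gender: str, entered_age: float,
                 face: Optional[Dict[str, float]],
                 age_tolerance: float = 10.0,
                 threshold: float = 0.5) -> bool:
    """Decide whether a customer record should go to manual review."""
    if face is None:  # no face found, e.g. a badly lit photo
        return True
    predicted_gender = "female" if face["gender"] >= threshold else "male"
    if predicted_gender != entered_gender:
        return True
    # Hypothetical tolerance: flag only large age discrepancies.
    return abs(entered_age - face["age"]) > age_tolerance


example_face = {"gender": 0.0075379927, "age": 41.585840644804094}
print(needs_review("male", 42, example_face))    # False
print(needs_review("female", 42, example_face))  # True
print(needs_review("male", 42, None))            # True
```

Only the flagged records then need a human look; the tolerance trades review workload against the risk of missing a mismatch.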


