How I Taught the Samsung AI / ML / DL Course

Hello. I want to share my view of AI from the inside, that is, from within the educational and scientific process.



In 1998 I entered graduate school at the Russian State Agricultural Academy and chose AI / ML as the topic of my research. Those were harsh times, with neural networks going through yet another "ice age". It was then that Yann LeCun published his famous paper "Gradient-Based Learning Applied to Document Recognition" on the principles of organizing convolutional networks, which, in my opinion, marked the beginning of a new thaw. Amusingly, at the time I was working on some similar elements: when an idea's time comes, it really is in the air. However, not everyone manages to bring it to life. Unfortunately, I never took my work as far as a defense, but I always wanted to finish it someday.





Source: Hitecher



Twenty years later, when I began working as a lecturer at the Southern Federal University (SFedU) while also teaching in the "Samsung IT School" supplementary education program, I got a second chance. Samsung offered SFedU to be the first to launch the "Samsung IT Academy" training track in artificial intelligence for bachelor's and master's students. I had some doubts about whether the entire curriculum could be delivered in full, but I enthusiastically accepted the offer to teach the course. I realized that things had come full circle and that I had a second chance to do what I had once failed to do. It should be noted that the Samsung AI / ML course is one of the best Russian-language courses currently available for free on the Stepik platform (https://stepik.org/org/srr). In the university program, however, a project component was added to the theoretical and practical course. The annual curriculum of the "Samsung IT Academy" counted as completed once a student had passed two modules, "Neural Networks and Computer Vision" and "Neural Networks and Text Processing", received the corresponding Stepik certificates, and completed an individual project. The course ended with the defense of the students' projects, to which experts were invited, including employees of the Samsung AI Center in Moscow.



We started the course in September 2019 at the SFedU Institute of High Technologies and Piezotechnics. Naturally, quite a lot of students came for the hype, and many of them later dropped out. The program was not especially difficult, but it was extensive and required knowledge of:



  • linear algebra,

  • probability theory,

  • differential calculus,

  • the Python programming language.



None of the required knowledge and skills goes beyond the curriculum of the first three years of an undergraduate university program. Here are a couple of the harder examples:



  • Find the derivative of the hyperbolic tangent activation function and express the result in terms of th(x).

  • Find the derivative of the sigmoid activation function and express the result in terms of the sigmoid σ(x).

  • The computational graph in Fig. 1 shows a composite function y with parameters b1, b2, c1, c2. For convenience, the intermediate results of operations are labeled z1 to z9. Determine the derivative of y with respect to parameter b1.
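For the first two exercises the expected answers are th′(x) = 1 − th²(x) and σ′(x) = σ(x)(1 − σ(x)). The minimal Python sketch below is not part of the course materials. It checks both identities against a central finite-difference approximation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative expressed through the sigmoid itself: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    """Derivative expressed through tanh itself: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t


def numeric_grad(f, x: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

At x = 0 the formulas give σ′(0) = 0.5 · 0.5 = 0.25 and th′(0) = 1, which matches the slopes of both curves at the origin.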









To be honest, I had to learn some things quickly alongside my students, especially the modern algorithms for working with neural networks. The original plan was for students to study the video lectures of the Samsung online course on Stepik on their own, with classroom time reserved for workshops. However, I decided to lecture on the theory as well, because with a teacher you can work through a topic you don't understand, discuss the ideas that come up, and so on. Practical tasks were given as homework. The approach proved right: the atmosphere in class was lively, and I could see that the students were generally mastering the material quite successfully.



A month later, we moved smoothly from the neuron model to the first simple fully connected architectures, from simple regression to multi-class classification, and from basic gradient computation to gradient descent optimization algorithms (SGD, Adam, etc.). We finished the first half of the course with convolutional networks and modern deep network architectures. The final task of the first module, Computer Vision, was to take part in the "Dirty vs Cleaned" competition on Kaggle and exceed an accuracy threshold of 80%.
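The difference between the optimizers mentioned above can be shown with a minimal pure-Python sketch of the standard SGD and Adam update rules. Adam uses its usual default hyperparameters β1 = 0.9, β2 = 0.999 and ε = 1e-8. The toy loss f(w) = (w − 3)² and the learning rate are illustrative assumptions and do not come from the course.

```python
import math


def loss_grad(w: float) -> float:
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def sgd(w: float, lr: float = 0.1, steps: int = 500) -> float:
    """Plain gradient descent: w <- w - lr * grad."""
    for _ in range(steps):
        w -= lr * loss_grad(w)
    return w


def adam(w: float, lr: float = 0.1, steps: int = 500,
         beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> float:
    """Adam: moving averages of the gradient and its square, with bias correction."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = loss_grad(w)
        m = beta1 * m + (1 - beta1) * g        # first moment (momentum-like average)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (per-parameter scale)
        m_hat = m / (1 - beta1 ** t)           # correct the bias from zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

Both functions, when started from w = 0, should end up close to the minimum at w = 3. Adam does so by taking steps whose size is normalized by the running gradient scale rather than by the raw gradient.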



Another important factor, in my opinion, was that we were not closed off inside the university. The track organizers held webinars and master classes for us with invited experts from Samsung laboratories. These events boosted the students' motivation, and mine too, to be honest :). For example, there was an interesting career guidance event: an online video link between classrooms at SFedU, Moscow State University and Samsung, where employees of the Samsung AI Center in Moscow talked about current trends in AI / ML development and answered students' questions.



The second part of the course, devoted to text processing, began with the general theory of linguistic analysis. The students were then introduced to the vector and TF-IDF text models, followed by distributional semantics and word2vec. This material led to several interesting workshops: building word2vec embeddings and generating names and slogans. After that, we moved on to the theory and practice of applying convolutional and recurrent networks to text analysis.
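Here is a minimal sketch of the TF-IDF text model mentioned above. It uses one common variant: term frequency is normalized by document length and idf = log(N / df). Libraries such as scikit-learn apply slightly different smoothing, so their exact numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # document frequency: the number of documents each term occurs in
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that occurs in every document, such as a stop word, gets zero weight. A term that is frequent in one document but rare in the collection gets the highest weight.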



Meanwhile, I published an article in a VAK-listed journal (one recognized by Russia's Higher Attestation Commission) and began preparing the next one, gradually gathering material for a new dissertation. My students did not sit idle either and started working on their first projects. They chose their topics themselves, and the result was 7 graduation projects covering different applications of neural networks:



  1. « » , .

  2. « » .

  3. « » .

  4. « » .

  5. « » .

  6. « » .

  7. « » , .







All projects were defended, but they differed in complexity and depth of development, which was rightly reflected in their grades. Based on the defense results, four projects were selected for the annual Samsung IT Academy competition. I can proudly say that the jury awarded top places to two of our projects. Below is a brief description of these projects, based on materials provided by my students Alexander Grateful, Stanislav Krikunov and Vyacheslav Pandov, to whom many thanks. I believe the solutions they demonstrated can fairly be regarded as serious research work.



1st place in the "Artificial Intelligence" category of the "Samsung IT Academy" competition.

"Monitoring of human physical activity", Alexander Grateful, Stanislav Krikunov



The goal of the project was a mobile application that identifies and quantifies physical activity during a workout using the phone's sensors. Many mobile applications can already recognize a person's physical activity: Google Fit, Nike Training Club, MapMyFitness and others. However, these apps cannot recognize certain types of exercise or count repetitions.

One of the project's authors, Alexander Grateful, graduated from my Samsung IT School class in 2015. I was proud to see the mobile development skills he gained at the school applied in this way.





How is physical activity recognized? Let's start with how the timing of an exercise is determined. To detect the beginning and end of an exercise, the students used the acceleration modulus, calculated as the square root of the sum of the squared accelerations along the three axes. The current value was compared with a chosen threshold. When the acceleration rises above the threshold (its derivative is positive), the exercise is considered to have begun. When it falls back below the threshold (its derivative is negative), the exercise is considered to be over. Unfortunately, this approach does not allow real-time processing. A possible improvement is to apply a sliding window to the data and compute the result at each shift step.
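The start/end detection described above can be sketched as follows. The sketch assumes the raw accelerometer data arrives as an (n, 3) NumPy array and that the threshold is a hand-picked constant. The value the authors actually used is not given.

```python
import numpy as np


def acceleration_magnitude(acc: np.ndarray) -> np.ndarray:
    """Modulus of acceleration, sqrt(ax^2 + ay^2 + az^2), for each row of an (n, 3) array."""
    return np.sqrt((acc ** 2).sum(axis=1))


def detect_exercises(magnitude: np.ndarray, threshold: float) -> list[tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) where the magnitude stays above threshold.

    A start is an upward crossing of the threshold and an end is a downward crossing.
    An interval still open when the recording stops is closed at its last sample.
    """
    above = magnitude > threshold
    segments, start = [], None
    for i, is_above in enumerate(above):
        if is_above and start is None:
            start = i                      # upward crossing: exercise begins
        elif not is_above and start is not None:
            segments.append((start, i))    # downward crossing: exercise ends
            start = None
    if start is not None:
        segments.append((start, len(magnitude)))
    return segments
```

An end is only known once the signal has dropped back below the threshold. This matches the limitation noted above: a repetition can only be processed after it is complete.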



The authors collected the dataset themselves. Seven different exercises were recorded with 3 types of smartphones (Android versions 4.4, 9.0 and 10.0), each attached to the hand in a special pouch. In total, three volunteers performed 1800 repetitions. Technique errors could occur during execution for various reasons, so the sample was cleaned. Cross-correlation distributions were built for every exercise type. Then, for each exercise, a correlation threshold was chosen below which a repetition is considered unsuitable and excluded from the sample.
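The cleaning step might look roughly like this. It is a simplification that rests on several assumptions. Each repetition, already resampled to a common length, is compared with the average repetition of its exercise using the Pearson correlation coefficient (zero-lag normalized cross-correlation). It is dropped if its score falls below the chosen threshold. The authors' exact reference signal and threshold values are not given.

```python
import numpy as np


def filter_repetitions(reps: np.ndarray, threshold: float) -> tuple[np.ndarray, np.ndarray]:
    """Keep repetitions whose correlation with the mean template is at least `threshold`.

    reps: array of shape (n_repetitions, n_samples) for one exercise and one axis.
    Returns the kept repetitions and the correlation score of every repetition.
    """
    reps = np.asarray(reps, dtype=float)
    template = reps.mean(axis=0)  # hypothetical reference: the average repetition
    scores = np.array([np.corrcoef(rep, template)[0, 1] for rep in reps])
    return reps[scores >= threshold], scores
```

Plotting a histogram of `scores` for each exercise gives the kind of correlation distribution from which a per-exercise threshold can be picked.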



The same exercise takes a different amount of time from one repetition to another. To compensate, the data was interpolated to a fixed number of samples, regardless of how many came from the sensors. Take a target of 100 samples. If 50 samples arrive, the sampling rate is doubled, and each intermediate sample is the arithmetic mean of its neighbors. If 200 arrive, every second sample is discarded. Either way the number of samples stays constant, and the same approach works for any ratio between the number of input samples and the desired number of output samples.
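Linear interpolation generalizes the doubling/decimation scheme to any input length. Below is a minimal sketch that assumes NumPy and a target of 100 samples. Resampling onto an evenly spaced grid with `np.interp` differs slightly from inserting exact midpoints or dropping every second sample, so treat it as an approximation of the authors' method.

```python
import numpy as np


def resample(signal: np.ndarray, n_out: int = 100) -> np.ndarray:
    """Linearly interpolate an (n_in, n_axes) signal to exactly n_out samples per axis."""
    signal = np.asarray(signal, dtype=float)
    x_in = np.linspace(0.0, 1.0, signal.shape[0])   # original sample positions
    x_out = np.linspace(0.0, 1.0, n_out)            # evenly spaced target positions
    return np.column_stack([np.interp(x_out, x_in, signal[:, k])
                            for k in range(signal.shape[1])])
```

Every repetition then has the same shape, whether the sensors delivered 50, 200 or any other number of samples. This is what allows them to be fed to a network with a fixed input size.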



The data was fed to the neural network in the frequency domain. The human body is fairly heavy, so in most standard exercises the characteristic signal frequencies can be expected to lie in the low-frequency part of the spectrum. High frequencies can then be treated either as jitter during execution or as sensor noise. In practice, this means we can compute the signal's spectrum with the FFT and use only 10-20% of the data for analysis. Why so little? First, the spectrum of a real signal is symmetric, so half of the components can be discarded right away. Second, the essential information lies in only 20-40% of the remaining half, and 50% × (20-40%) gives 10-20%. These assumptions describe slow strength exercises especially well.
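Here is a sketch of this spectral feature extraction. It assumes a resampled (n_samples, 3) array. `numpy.fft.rfft` returns only the non-negative-frequency half of a real signal's spectrum, so it already exploits the symmetry. Keeping a fraction of the lowest bins then applies the low-frequency assumption. The fraction 0.3 is an illustrative value chosen within the range quoted above.

```python
import numpy as np


def low_frequency_spectrum(signal: np.ndarray, keep_fraction: float = 0.3) -> np.ndarray:
    """Amplitude spectrum of each axis, truncated to the lowest-frequency bins.

    For a real input of n samples, rfft returns n // 2 + 1 bins (the other half is
    redundant by symmetry); only the first keep_fraction of those bins are kept.
    """
    spectrum = np.abs(np.fft.rfft(np.asarray(signal, dtype=float), axis=0))
    n_keep = max(1, int(round(spectrum.shape[0] * keep_fraction)))
    return spectrum[:n_keep]
```

With 100 samples per axis, rfft yields 51 bins and this keeps 15 of them, about 15% of the original data.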





Normalized time series for different exercises





Normalized spectrum for different exercises



Before the spectrum is passed to the neural network, it is normalized by the maximum value across all three axes. This fits all exercise samples into the 0-1 amplitude range while preserving the proportions between the axes.
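A minimal NumPy sketch of this normalization follows, assuming the spectrum is stored as an (n_bins, 3) array. A single divisor shared by all three axes is what keeps the proportions intact.

```python
import numpy as np


def normalize_spectrum(spectrum: np.ndarray) -> np.ndarray:
    """Scale an (n_bins, 3) amplitude spectrum by its single largest value.

    One common divisor maps amplitudes into [0, 1] while preserving the ratios
    between the X, Y and Z axes; dividing each axis by its own maximum would not.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    peak = spectrum.max()
    return spectrum / peak if peak > 0 else spectrum
```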



The neural network classifies exercises. It outputs a vector of probabilities for all the exercises it was trained on, and the index of the largest element is the number of the exercise performed. If the confidence for that exercise is below 85%, no exercise is considered to have been performed. The network consists of 7 layers: 4 convolutional and 3 fully connected. The number of output neurons equals the number of exercises we want to recognize. To save computational resources, the architecture uses only convolutions with a 3x3 kernel. The relatively simple architecture is justified by the limited computing resources of smartphones, because our task requires recognition with minimal delay.
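The decision rule on top of the network output can be sketched as follows. The sketch assumes the output layer produces raw scores (logits) that a softmax turns into the probability vector. The exercise names in the usage example are hypothetical.

```python
import numpy as np


def classify_exercise(logits, labels, min_confidence: float = 0.85):
    """Map network outputs to an exercise label, or None if the network is not confident.

    logits: raw scores of the output layer, one per exercise; a softmax turns
    them into the probability vector described above.
    """
    z = np.asarray(logits, dtype=float)
    probs = np.exp(z - z.max())       # subtract the max for numerical stability
    probs /= probs.sum()
    best = int(np.argmax(probs))
    confidence = float(probs[best])
    if confidence < min_confidence:
        return None, confidence       # treated as "no exercise performed"
    return labels[best], confidence
```

For example, `classify_exercise([10.0, 0.0, 0.0], ["squat", "push-up", "lunge"])` returns `"squat"` with a confidence above 0.99. Nearly tied scores such as `[1.0, 1.0, 0.0]` fall under the 85% threshold and return `None`.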





Description of the neural network architecture



The network was trained over multiple epochs with batch normalization until the loss function on the training set reached its minimum.



Results: when the exercises are performed with reasonably good technique, the network's confidence is 95-99%. On the validation set, the accuracy was 99.8%.





Validation-set error during training





Confusion matrix of the neural network



The neural network was built into a mobile application and showed results similar to those obtained in training.



The study also tested other machine learning models commonly used today for classification: logistic regression, random forests and XGBoost. For these models, Tikhonov (L2) regularization, cross-validation and grid search were used to find the optimal parameters. The resulting accuracies were:



  • Logistic regression: 99.4%

  • Random forests: 99.1%

  • XGBoost: 97.5%
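For reference, an L2-regularized logistic regression baseline with a cross-validated grid search can be set up in scikit-learn roughly as follows. The sketch uses synthetic data from `make_classification` as a stand-in for the spectral features. The feature count and the parameter grid are assumptions and do not reflect the authors' settings. `LogisticRegression` applies L2 regularization by default, and `C` is the inverse of the regularization strength.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in for flattened spectral features of 7 exercises
X, y = make_classification(n_samples=700, n_features=45, n_informative=10,
                           n_classes=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Smaller C means a stronger L2 (Tikhonov) penalty; 5-fold CV picks the best value
search = GridSearchCV(LogisticRegression(max_iter=2000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                      cv=5)
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Random forests or XGBoost fit the same pattern: swap the estimator and its parameter grid and keep the cross-validation setup.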



The knowledge gained at the Samsung IT Academy broadened the authors' interests and made an invaluable contribution when they applied to the master's program at the Skolkovo Institute of Science and Technology. My former students are now doing research there on machine learning for communication systems.



Code on GitHub



II « » «IT Samsung».

« »,







The model's work is well described on this slide:







It all starts with a photograph, which in this implementation comes from a Telegram bot. Dlib's frontal_face_detector finds all faces in the image, and then 68 key 2D points are detected for each face using Dlib's shape_predictor_68_face_landmarks. Each point set is normalized by centering (subtracting the mean of X and of Y) and scaling (dividing by the largest absolute value of X and Y). As a result, every coordinate of a normalized point lies in the interval [-1, +1].
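The normalization step might look like this in NumPy. The sketch assumes the 68 landmarks are already in a (68, 2) array, and it does not show how they are extracted from Dlib. It also assumes a single common scale factor for X and Y, which preserves the face's aspect ratio. The text could also be read as scaling each axis separately.

```python
import numpy as np


def normalize_landmarks(points: np.ndarray) -> np.ndarray:
    """Center and scale a (68, 2) array of (X, Y) landmark coordinates.

    Centering subtracts the mean X and mean Y; scaling divides by the largest
    absolute coordinate, so every value ends up in [-1, +1].
    """
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)
    scale = np.abs(centered).max()
    return centered / scale if scale > 0 else centered
```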



Next, a neural network predicts the depth of each facial key point (its Z coordinate) from the normalized (X, Y) coordinates. This model was trained on the AFLW2000 dataset.



These points are then connected into a mesh mask, which can also be called face biometrics. The lengths of the mask's segments are used as one way to identify emotions. The idea is that each segment has a fixed position in the vector of segment lengths, and some of these lengths change depending on the emotion. In theory, each emotion corresponds to a limited set of such vectors. Experiments confirmed this hypothesis. The model was trained on the Cohn-Kanade+ (CK+), JAFFE and RAF-DB datasets.
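The segment-length vector can be sketched as follows. The sketch assumes the 3D points are in a (68, 3) array. The list of landmark pairs that form the mesh is hypothetical, since the project's actual mesh topology is not described. What matters is that the pair order is fixed, so each length always occupies the same position in the vector.

```python
import numpy as np


def segment_lengths(points_3d: np.ndarray, edges: list[tuple[int, int]]) -> np.ndarray:
    """Length of every mesh segment, in the fixed order given by `edges`.

    points_3d: (68, 3) array of X, Y and predicted Z for each landmark.
    edges: hypothetical pairs of landmark indices that form the mesh.
    """
    pts = np.asarray(points_3d, dtype=float)
    idx = np.asarray(edges)
    return np.linalg.norm(pts[idx[:, 0]] - pts[idx[:, 1]], axis=1)
```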



In parallel, another network learns to classify emotions from the image itself. Face images are cropped from the rectangles found by Dlib, converted to single-channel grayscale and downscaled to 48x48 pixels. This model was trained on the same datasets as the biometrics model, plus the FER2013 dataset.
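Here is a NumPy-only sketch of this preprocessing. The sketch assumes an (H, W, 3) RGB array and a (left, top, right, bottom) box, which could be taken from a Dlib rectangle's `left()`, `top()`, `right()` and `bottom()` methods. The project's resizing method is not stated. Nearest-neighbour resizing is used here only for brevity.

```python
import numpy as np


def preprocess_face(image: np.ndarray, box: tuple[int, int, int, int], size: int = 48) -> np.ndarray:
    """Crop a face rectangle, convert it to grayscale and resize it to size x size.

    image: (H, W, 3) RGB array; box: (left, top, right, bottom) pixel coordinates.
    """
    left, top, right, bottom = box
    face = np.asarray(image, dtype=float)[top:bottom, left:right]
    # ITU-R BT.601 luma weights, the same ones Pillow uses for mode "L"
    gray = face @ np.array([0.299, 0.587, 0.114])
    # nearest-neighbour resize: pick evenly spaced source rows and columns
    rows = (np.arange(size) * gray.shape[0] / size).astype(int)
    cols = (np.arange(size) * gray.shape[1] / size).astype(int)
    return gray[np.ix_(rows, cols)]
```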



Finally, a third neural network comes into play. Its architecture combines the two previous networks, pretrained and frozen, with one trainable layer. The last fully connected layers of both networks are also redefined. Instead of the usual probability vector from which the target class can be determined, they now return lower-level features. The combining layer is then trained to map this information to the target class.
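The combination step can be illustrated in NumPy under simplifying assumptions. Each frozen network is represented only by the feature vectors it outputs. These features are concatenated, and a single softmax layer on top is the only trained part, fitted with plain batch gradient descent on cross-entropy. The feature sizes, learning rate and number of epochs are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_fusion_layer(feat_a, feat_b, labels, n_classes, lr=0.5, epochs=200):
    """Train only a softmax layer on concatenated features from two frozen networks."""
    x = np.concatenate([feat_a, feat_b], axis=1)  # frozen features, never updated
    w = np.zeros((x.shape[1], n_classes))
    b = np.zeros(n_classes)
    y = np.eye(n_classes)[labels]                 # one-hot targets
    for _ in range(epochs):
        grad = (softmax(x @ w + b) - y) / len(x)  # cross-entropy gradient w.r.t. logits
        w -= lr * x.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(feat_a, feat_b, w, b):
    x = np.concatenate([feat_a, feat_b], axis=1)
    return np.argmax(x @ w + b, axis=1)
```

Freezing the two feature extractors means only the small combining layer needs training data, which is one reason this design suits a modest dataset.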



Similar solutions include EmoPy, DLP-CNN (RAF-DB), FER2013 and EmotioNet. However, it is difficult to compare them directly because they were trained on different data.



Code on GitHub



Conclusion



In conclusion, I would like to say that the pilot course proved its worth. In the 2020/21 academic year, the program is already being taught at 23 partner universities of the Samsung IT Academy in Russia and Kazakhstan. The complete list can be seen here. This year a group of master's and bachelor's students is studying with us (there is even a PhD holder in the group!), and so far most of them are successfully tackling the hard science. They have yet to find ideas for their individual projects, but they are full of optimism. Of course, the next individual project competition will be ten times tougher, but we hope our students' achievements will continue to earn high marks. Most importantly, I am sure that the knowledge and experience they gain will be a great help to our graduates as they develop further in IT.



Rostov-on-Don, 2020. SFedU, Samsung IT Academy.





Dmitry Yatsenko

Senior Lecturer of the Department of Information and Measuring Technologies, Faculty of High Technologies, Southern Federal University,

Lecturer of the Samsung IT School,

Lecturer of the AI track at the Samsung IT Academy.


