How AI Systems Aim to Simplify Sound Engineering

This weekend we decided to look at work from two American universities that helps generate a reasonably believable audio track for silent videos.

Photo: Free To Use Sounds / Unsplash



The Foley artist's difficult job



Sounds for films and TV shows, such as the rustle of rain, are very difficult to record properly on set while a scene is being shot: there is too much extraneous noise, and the recording can clash with the actors' voices and other equipment. For this reason, almost all sounds are recorded separately and mixed in during editing. This is the job of Foley artists.



If a movie calls for the sound of a breaking window, sound designers go into the studio and break glass under controlled acoustic conditions. Recording continues until the sound matches what is happening on screen. In particularly difficult cases this can take dozens of takes, which complicates filmmaking and drives up its cost.



Engineers at the University of Texas have proposed an alternative. They developed an AI system that detects what is happening in the frame and automatically suggests a matching audio track.



How it works



The engineers described how the system works in a paper for the IEEE (PDF). They designed two machine learning models. The first extracts image features from the footage, such as color. The second analyzes an object's movement across frames and determines its nature in order to select the appropriate sound.
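
To make this two-model idea concrete, here is a minimal sketch in PyTorch. The class names, layer sizes, and the NUM_SOUND_CLASSES constant are illustrative assumptions rather than the architecture from the paper: one branch extracts per-frame appearance features, and a recurrent branch reads those features over time to pick a sound category.

```python
import torch
import torch.nn as nn

NUM_SOUND_CLASSES = 10  # hypothetical number of sound categories

class FrameFeatureExtractor(nn.Module):
    """Branch 1: per-frame appearance features (color, texture)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, frames):                 # frames: (batch, time, 3, H, W)
        b, t, c, h, w = frames.shape
        x = self.conv(frames.reshape(b * t, c, h, w)).flatten(1)
        return self.fc(x).reshape(b, t, -1)    # (batch, time, feat_dim)

class MotionClassifier(nn.Module):
    """Branch 2: an LSTM over the frame features to capture how the
    object moves over time, ending in a sound-category prediction."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_SOUND_CLASSES)

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])                # logits over sound categories

# Usage on a dummy 16-frame clip:
clip = torch.randn(1, 16, 3, 64, 64)
logits = MotionClassifier()(FrameFeatureExtractor()(clip))
print(logits.shape)  # torch.Size([1, 10])
```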



To assemble the audio track itself, the engineers developed a program called AutoFoley. It synthesizes a new sound from thousands of short audio samples: rain, the ticking of a clock, a galloping horse. The results are quite convincing.
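
As a rough illustration of what building a track from short samples can look like, here is a toy sketch in Python. The function name, SAMPLE_RATE, and the bank layout are assumptions for illustration; AutoFoley's actual synthesis is learned from data rather than simple tiling.

```python
import numpy as np

SAMPLE_RATE = 16_000  # audio samples per second (assumed)

def assemble_track(category, clip_seconds, sample_bank, seed=0):
    """Tile random short samples of the predicted category until the
    track covers the length of the video clip, then trim to size."""
    rng = np.random.default_rng(seed)
    target_len = int(clip_seconds * SAMPLE_RATE)
    clips = sample_bank[category]
    pieces, total = [], 0
    while total < target_len:
        piece = clips[rng.integers(len(clips))]  # pick a random sample
        pieces.append(piece)
        total += len(piece)
    return np.concatenate(pieces)[:target_len]

# Usage with a dummy bank of half-second "rain" samples:
bank = {"rain": [np.random.randn(SAMPLE_RATE // 2) for _ in range(8)]}
track = assemble_track("rain", clip_seconds=2.0, sample_bank=bank)
print(track.shape)  # (32000,)
```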

Unfortunately, the system still has a number of serious limitations. It is only suitable for footage where the sound does not have to match the video perfectly; otherwise the desynchronization becomes noticeable, as in this video. The object also has to stay in the frame the whole time so that the ML model can recognize it. For now the developers are busy registering a patent, but after that they plan to address these flaws.



Who else is working on similar projects



In 2016, researchers from MIT and Stanford introduced a machine learning model capable of adding sound to silent video. It predicts sound based on the properties of an object in the frame, such as its material. In one experiment, the engineers fed the system a video of a person hitting various surfaces with a drumstick: metal, earth, grass, and others.
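
Conceptually, such models often work by predicting a feature vector describing the expected sound and then retrieving the closest sample from a library. The sketch below illustrates that retrieval step; the feature dimension and the dummy library are assumptions, not the MIT/Stanford implementation.

```python
import numpy as np

FEAT_DIM = 32  # assumed size of the predicted sound-feature vector

def nearest_sample(predicted_feat, library_feats):
    """Return the index of the library sound whose feature vector is
    closest (Euclidean distance) to the predicted one."""
    dists = np.linalg.norm(library_feats - predicted_feat, axis=1)
    return int(np.argmin(dists))

# Usage with a dummy library of 100 sounds:
rng = np.random.default_rng(42)
library = rng.standard_normal((100, FEAT_DIM))
pred = rng.standard_normal(FEAT_DIM)
print(nearest_sample(pred, library))  # index of the best match
```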

The developers assessed the algorithm's effectiveness with an online survey. The most realistic results were the sounds of leaves and dirt (62% of respondents called them real), while wood and metal fared worst: metal sounded natural only 18% of the time.



This system also needs improvement. It generates the sounds of colliding objects but cannot recreate ambient sound such as wind noise, and the algorithm fails when objects move too fast. Even so, such solutions have potential: they could simplify the work of Foley artists and transform the film industry.
