Video Color: Video Search Technology

A little about search



When we talk about search, we immediately picture the Google search page: a field for a text query and hundreds of links to the pages found. But let's think about what we actually search for.



What are we looking for?



  • Text
  • Documents
  • HTML pages
  • Images
  • Audio
  • Video
  • Binary files


There are specialized search engines for some types of data. For example, there are sites specializing in finding DLL files.



Video search







Let's look at searching for video. How could this be done, at least in theory?



  • By text
  • By image
  • By a short video clip
  • By a short audio fragment


Current state of affairs



Search engines



  • Google
  • Microsoft
  • Yandex


These are the three largest search engines, and all of them let you search for videos by text and by image.



image



Disadvantages of modern search engines



Unfortunately, they all suffer from the following problems:



  • The exact title of the video found is not always given. Instead, the engine may correctly identify the actor and show other images of him.
  • The position within the video is not reported, although this is very useful information.


image



This really is the case. Try it yourself and you will see. Search engines tend to give imprecise answers. Look at the screenshot above, the one showing Tom Hanks: it gives neither the title of the film nor the position in the film at which the frame appears.



image



Problem statement



Before we start solving the problem, let's try to describe it. So what do we want?



Desired query execution time



Nowadays nobody will wait several minutes for a search query to complete. However, the amount of data and computation may be such that processing a request takes some time, so a compromise is needed. We will tentatively limit query execution time to 10 seconds (give or take a few seconds). On the one hand, this is short enough that the browser will not drop the connection; on the other, it gives the scripts time to process the data.



How much data is there?



Let's make a rough estimate.



Number of videos



According to the IMDb film database, approximately 2.6 million films have been made in total, including individual episodes of TV shows, cartoons and shorts (as of November 13, 2018).



To begin with, let's limit ourselves to a round 1 million videos. Clearly, we are not even trying to cover YouTube and similar services, where the volume of video is many times larger. And, most importantly, that snowball will only keep growing.



Number of frames



Some films and TV episodes are quite short, 15-20 minutes. On the other hand, quite a few films run 2 hours or more. To keep things simple, let's take the average video duration to be 1 hour.



Many films were shot at 24 frames per second, but higher rates exist too. Nowadays anyone can shoot their own film, and its frame rate may be 60, 100, 200 FPS or more, depending on the device: a camcorder, a camera, an action camera, a smartphone, a surveillance camera and so on. As a first approximation, though, let's take the frame rate of an average video to be 30 FPS.



In this case, the average video will be:



30 FPS * 3600 sec = 108,000 frames



Rounding down, we get that the average video is about 100,000 frames.



Data volume



How much storage does the information about one frame require? Obviously, this depends on the algorithm used to compare frames in our database with a given sample. We use two algorithms to compare data: one requires about 30 bytes per frame, the other about 10 bytes. Let's take the average, 20 bytes.



This means that storing information about 1 million videos requires 1,000,000 videos * 100,000 frames * 20 bytes = 2,000,000,000,000 bytes.







Simply put, we need about 2 TB to describe all our frames in some form. Generally speaking, that is not so bad, because this much data fits on a modern HDD or SSD. On the other hand, the data has to be organized somehow, otherwise merely reading 2 TB will take a long time, and we agreed that the user will not wait more than 10 seconds.



Even if we read from the disk at 500 MB/s, we will need 4,000 seconds (2,000,000 MB / 500 MB/s), that is, more than an hour!
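The estimate above is easy to re-run with other inputs. Here is a minimal sketch that uses only the figures quoted in the text (1 million videos, 100,000 frames per video, 20 bytes per frame, 500 MB/s); the variable names are ours and purely illustrative.

```python
# Back-of-envelope estimate built only from the figures quoted in the text.
VIDEOS = 1_000_000                      # videos to index
FRAMES_PER_VIDEO = 100_000              # rounded from 30 FPS * 3600 s = 108,000
BYTES_PER_FRAME = 20                    # average of the two algorithms (30 and 10 bytes)
DISK_READ_BYTES_PER_SEC = 500 * 10**6   # 500 MB/s sequential read

# Total index size and the time for one sequential pass over all of it.
total_bytes = VIDEOS * FRAMES_PER_VIDEO * BYTES_PER_FRAME
full_scan_seconds = total_bytes / DISK_READ_BYTES_PER_SEC

print(f"index size: {total_bytes / 10**12:.1f} TB")
print(f"full scan:  {full_scan_seconds:.0f} s (~{full_scan_seconds / 60:.0f} min)")
```

Changing any one constant (for example, 30 bytes per frame instead of 20) shows how sensitive the total is to each assumption.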



How many servers do we need to search within the time limit?



If we spread the data evenly across several servers, the amount each one has to process for a single search query decreases. For example, with 10 servers each has to process not 2 TB but only 200 GB; with 100 servers, not 2 TB but 20 GB. In principle, that should be enough for such a search engine to work.
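To get a feel for these numbers, here is a rough sketch of the per-server scan time. It assumes the worst case, in which every query reads a server's whole share of the index from disk at the 500 MB/s used earlier; keeping the data in RAM or narrowing the search to a subset of the data would change the picture considerably. The function names are hypothetical.

```python
import math

TOTAL_BYTES = 2 * 10**12           # ~2 TB index, from the estimate in the text
READ_BYTES_PER_SEC = 500 * 10**6   # 500 MB/s, the disk speed used in the text
TIME_BUDGET_SEC = 10               # query time limit chosen earlier

def scan_seconds(servers: int) -> float:
    """Time for each server to read its equal share of the index once."""
    return TOTAL_BYTES / servers / READ_BYTES_PER_SEC

def servers_for_budget(budget_sec: float) -> int:
    """Smallest server count whose per-server full scan fits the budget."""
    return math.ceil(TOTAL_BYTES / (READ_BYTES_PER_SEC * budget_sec))

for n in (10, 100):
    share_gb = TOTAL_BYTES / n / 10**9
    print(f"{n:>3} servers: {share_gb:.0f} GB each, {scan_seconds(n):.0f} s per full scan")

print("servers for a full scan within the budget:", servers_for_budget(TIME_BUDGET_SEC))
```

Under this naive model a full scan fits the budget only with several hundred servers, which is one more reason to organize the data so that a query touches only a small part of it.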



How many requests per second can such a system handle?



It is hard to say exactly, but most likely a few tens of requests per second at most.



What was done



First we implemented search by video fragment. Search by image followed soon after.



History



1 July 2019



On this day, the first version of the VideoColor package was released. It included three parts:



  • Manager (source video indexing)
  • Server (the back end that accepts requests and looks for a match in the index database)
  • Client (a client application that allows you to play AVI files and send search queries to the server).


March 2020



A website was created with the ability to identify videos by the uploaded video fragment.



14 April 2020



The first version of the application that identifies a video and the current playback position by capturing part of the screen was released.



23 June 2020



The first version of the application for adding index and descriptive information about videos to the site database was released.



Search by video fragments



Main idea







We treat a video as a sequence of images. For each image we find the average red, green and blue values. This gives three graphs over time. We build and store these graphs for every video we want to index.







Having received a video fragment for identification, we build the same graphs for it and compare them with the stored ones. Of course, the comparison has to be made along the entire length of every indexed video. If, at some position, the difference between the graphs is below a certain threshold, we consider the fragment found at that position.



Note that this is a simplified description. The actual workflow differs from it in several details, but this is the general idea.
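The idea can be sketched in a few lines of Python. This is a toy illustration of the simplified scheme only, not the production algorithm: frames are plain lists of (R, G, B) pixels, the comparison is a mean absolute difference, and the names `color_trace`, `find_fragment` and the threshold value are our assumptions.

```python
from statistics import fmean
from typing import Optional

Pixel = tuple[int, int, int]               # (R, G, B), each 0..255
Frame = list[Pixel]                        # one frame flattened to a list of pixels
Trace = list[tuple[float, float, float]]   # one mean-colour point per frame

def mean_color(frame: Frame) -> tuple[float, float, float]:
    """Average red, green and blue over all pixels of one frame."""
    return (fmean(p[0] for p in frame),
            fmean(p[1] for p in frame),
            fmean(p[2] for p in frame))

def color_trace(frames: list[Frame]) -> Trace:
    """The three graphs over time, stored as one (R, G, B) point per frame."""
    return [mean_color(f) for f in frames]

def find_fragment(video: Trace, fragment: Trace, threshold: float) -> Optional[int]:
    """Slide the fragment along the video and return the first frame offset at
    which the mean absolute difference between the traces is below threshold."""
    if not fragment:
        return None
    n = len(fragment)
    for offset in range(len(video) - n + 1):
        diff = fmean(abs(v - f)
                     for vp, fp in zip(video[offset:offset + n], fragment)
                     for v, f in zip(vp, fp))
        if diff < threshold:
            return offset
    return None

# Tiny synthetic example: six single-colour "frames" of four pixels each.
video_frames = [[(i * 40, 10, 200 - i * 30)] * 4 for i in range(6)]
video = color_trace(video_frames)
fragment = color_trace(video_frames[2:5])   # frames 2..4 of the same video
print(find_fragment(video, fragment, threshold=1.0))   # prints 2
```

A linear scan like this over every indexed video is exactly the cost discussed in the data volume estimate, which is why the real system cannot stop at this naive form.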









Search by image



Divide the original image into an M x N grid of cells and find the average red, green and blue values in each cell. The set of these values is the signature of the image, which lets us tell images apart. We store this signature in the database together with a reference to the video description (Video ID) and the frame's sequence number within the video. The only remaining question is what values M and N should take. We chose 5 x 5, but you can try other values. With small values there is a chance of many duplicates; with large values we spend a lot of memory.
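A minimal sketch of such a grid signature is shown below. The image is represented as rows of (R, G, B) tuples, and the function name and the rounding to integers are our assumptions; the sketch makes no attempt to reproduce the actual storage format.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]   # rows of (R, G, B) pixels

def grid_signature(image: Image, m: int = 5, n: int = 5) -> list[int]:
    """Split the image into m rows x n columns of cells and return the average
    R, G, B of each cell, flattened to 3 * m * n integers in the range 0..255.
    Assumes the image is at least m pixels high and n pixels wide."""
    height, width = len(image), len(image[0])
    signature: list[int] = []
    for i in range(m):
        top, bottom = i * height // m, (i + 1) * height // m
        for j in range(n):
            left, right = j * width // n, (j + 1) * width // n
            cell = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
            for channel in range(3):
                signature.append(round(sum(p[channel] for p in cell) / len(cell)))
    return signature

# A 10 x 10 test image: left half red, right half blue.
image = [[(255, 0, 0)] * 5 + [(0, 0, 255)] * 5 for _ in range(10)]
sig = grid_signature(image, m=2, n=2)
print(sig)   # [255, 0, 0, 0, 0, 255, 255, 0, 0, 0, 0, 255]
```

With the default 5 x 5 grid the signature has 75 values, which makes the trade-off visible: a finer grid distinguishes more images but costs more memory per frame.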







However, that is not all. If every search had to go through all of these signatures, each request would take a long time to process. What can be done? You can calculate the average R, G and B values for the whole image and use them to group the records in the data set. For example: R = 200, G = 188, B = 212. We then store the frame's record in the corresponding partition, or add a field to the table. When searching, we compute the same components for the query and search only among records with matching values. This greatly reduces the amount of data compared and speeds up the search.
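The grouping can be illustrated with an in-memory index keyed by a coarsely quantised average colour. This is only a sketch: the quantisation step of 16, the `FrameIndex` class and the record layout are our assumptions, and a real system would use database partitions or an indexed column instead of a Python dictionary.

```python
from collections import defaultdict

BUCKET_STEP = 16   # assumed quantisation step; not a value from the article

def group_key(avg_rgb: tuple[int, int, int], step: int = BUCKET_STEP) -> tuple[int, int, int]:
    """Quantise the whole-image average colour into a coarse group key."""
    r, g, b = avg_rgb
    return (r // step, g // step, b // step)

class FrameIndex:
    """In-memory stand-in for the database: group key -> list of frame records."""

    def __init__(self) -> None:
        self.groups: dict = defaultdict(list)

    def add(self, avg_rgb, signature, video_id: int, frame_no: int) -> None:
        """Store one frame record in the group chosen by its average colour."""
        self.groups[group_key(avg_rgb)].append((signature, video_id, frame_no))

    def candidates(self, avg_rgb) -> list:
        """Only the records in the query's group need a full comparison."""
        return self.groups.get(group_key(avg_rgb), [])

index = FrameIndex()
index.add((200, 188, 212), signature=[0] * 75, video_id=7, frame_no=1234)
print(group_key((200, 188, 212)))           # (12, 11, 13)
print(len(index.candidates((201, 190, 215))))  # 1: same group as the stored frame
print(len(index.candidates((10, 10, 10))))     # 0: different group, nothing to compare
```

The coarser the step, the fewer groups there are and the more records each lookup has to compare; the finer the step, the more likely a slightly altered frame lands in a different group.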







To be honest, this is only the theory; in practice things are a little different. But that is a topic for a separate article.



Pros



  • Relatively small data size.
  • It is possible to split all data into groups and search by groups, which significantly speeds up the search.
  • Unlike the previous method, it does not require permanent storage of large amounts of data in RAM.
  • Low probability of error.




Cons



  • After transcoding, a video may differ slightly from the original, and JPEG encoding (when searching by image) also alters the original, so the group may be determined incorrectly. This requires either widening the range of the group (which reduces search speed) or issuing additional search queries (which also slows the search down).
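The cost of widening the range can be made concrete. The sketch below assumes groups are identified by a quantised (R, G, B) key with 16 steps per channel (indices 0..15); both that layout and the `neighbour_keys` helper are hypothetical and serve only to illustrate the trade-off.

```python
from itertools import product

def neighbour_keys(key: tuple[int, int, int], radius: int = 1,
                   max_index: int = 15) -> list[tuple[int, int, int]]:
    """All group keys within `radius` steps of `key` on each colour axis,
    clipped to the valid index range 0..max_index."""
    ranges = [range(max(0, c - radius), min(max_index, c + radius) + 1) for c in key]
    return list(product(*ranges))

exact = neighbour_keys((12, 11, 13), radius=0)
widened = neighbour_keys((12, 11, 13), radius=1)
print(len(exact), len(widened))   # 1 27
```

Tolerating a shift of one step per channel means looking in up to 27 groups instead of one, which is the slowdown mentioned above.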


Tools



To date, several applications have been written; some of them are now outdated and no longer supported.



Video search (client side)



  • Via a web form on the site
  • Via "Video Color Capture" application


Video search (backend)



  • Video Color Server: Windows and Linux (crontab).



Adding videos to the database



  • "Video Color Creator"












Identifying an unknown video



Example: Suppose you have a file with a meaningless name. The opening titles are either absent (by the author's design) or cut out. What film is this? You would like to read the description and the comments of those who have watched it.



Finding and cutting out ad blocks



Example: You have written your own video player and want your users to see your ads, rather than those of the major TV channels, when watching streaming video.

Checking parts of a video for material borrowed from other films (plagiarism)

Example: You suspect that someone is using your footage (shot from a quadcopter) in their video.



Determining the exact publication date and the name of a show (program) when a repost omits this information



Example: You are watching a video show hosted on an unknown site. You may even know what the show is called, but you don't know when it was shown. A year ago or two?



Determining the more or less exact position in a streaming video when a previously indexed video is being broadcast



Example: This may be needed if you want to accompany someone else's streaming video with an application that shows titles or other contextual information (maps, links, news, etc.). First the video is captured, its index is computed, and the server identifies the video and the position; then the application displays the contextual information in a separate window, synchronized with the video being played.



How to use the service



Searching for video through the web form on the site



To do this, upload a video fragment or an image in the corresponding field of the form.







Note that if a video fragment is uploaded, the server first has to split it into frames and process them, which takes additional time.



The result page shows the title of the film, the director, the country of origin, the year of release, the genre, the cast, a short description, the duration of the video, the position found in the video, links to additional information and a grid of frames from the video.







Searching for video using the application



Searching with the application is much faster, because all preprocessing is done on the client side and only a small part of the original data is sent to the server. This reduces the load on the network and speeds up the search query.



















Can I single-handedly populate a database with index information for one million videos?



Most likely not. Where would I get these videos? How would I transfer them over the network? Where would I find the computing resources to process them?



But the database can be opened up so that users fill it themselves, and this has already been implemented. We have loaded one hundred videos into the database so that you can check that the service works. You can also download and install a free application that indexes a video, lets you add a description and uploads the data to the server. The application also lets you work with the uploaded data later: delete it, edit the description, view it and search it.







If you decide to add a video, your own or any other, please first make sure that it is not already in the database. In the application you can search by title, director, year of creation and other parameters.







The speed of creating index information depends on the power of your computer and the characteristics of the video itself (resolution, codec, frame rate). On average, processing takes a few minutes. Meanwhile, the user can fill in the text fields of the video description.







Plans for the future



  • Search acceleration.
  • Improving search accuracy.
  • Search by audio fragments.


Searching for videos by short audio fragments will complement the existing two search methods (by video fragments and images).



Summary



  • In this post, we reviewed the current state of video search.
  • We looked at methods for finding a video by a short video fragment and by an image.
  • We talked about the Video Color Capture video search application.
  • We mentioned the Video Color Creator application for adding videos to the shared AAP Software video database.


Links



Website



http://www.videocolor.aapsoftware.ru/

The site offers a search by a short video fragment, as well as by an image from the video.


