Object video analytics in transport

There are many tasks in which data must be processed "at the edge", that is, in the immediate vicinity of the data source (the cameras). This applies in particular to object video analytics, for example in projects to optimize transport infrastructure.



Let's consider several joint solutions from the Russian integrator Larga Group and ComBox Technology, a developer of object video analytics systems.






Tasks:



  1. Installing passenger counters on buses to check the number of tickets sold and to collect statistics on passenger load along each route.
  2. Driver monitoring (detecting smoking and mobile phone use).


Requirements:



  1. Neural network inference and analytics run "at the edge", both to minimize traffic and because the communication channels are unstable and expensive.
  2. Detectors can be used together or separately (scalability).
  3. Data is transmitted for further processing over mobile networks.


We chose the AAEON VPC-3350S because it has the following characteristics that matter to us:



  • Built-in LTE module.
  • Expandable with an Intel Myriad X VPU accelerator.
  • Integrated Intel HD Graphics 500 with hardware video decoders and encoders for processing video streams.
  • Multiple LAN ports for connecting network cameras directly, without a switch.
  • Wide operating temperature range (-20 to +70 °C).


AAEON VPC-3350S




Let's first consider the case where a detector is used on its own. In car sharing, renters are already fined for smoking inside rented cars; depending on the company, the fine ranges from 5,000 to 15,000 rubles. Unlike object video analytics, smoke detection sensors do not pick up vapes and other devices for smoking mixtures, and they are practically insensitive when the car windows are open. Yet an open window does not change the fact of the violation or the fine the contract provides for it.



In addition, several neural networks can be applied in vehicles in a cascade (sequentially), for example smoking detection together with detecting whether and for how long a mobile phone is used. Such systems will naturally be extended further, for example by integrating telematics and connecting to the car's CAN bus so that phone use is tracked only while the vehicle is moving, but these are integration details.
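A minimal sketch of how detectors could be chained and gated on vehicle motion, assuming each detector is a callable that returns a probability for a frame and that the current speed is already available (for example, read from the CAN bus; that integration is not shown). The names `DetectorConfig` and `run_cascade`, the stub detectors and the 0.5 thresholds are hypothetical, not part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: a detector takes a frame and returns the probability
# of its event. Real models (run under OpenVINO) are replaced by stubs below.
Detector = Callable[[object], float]


@dataclass
class DetectorConfig:
    detect: Detector
    threshold: float
    needs_motion: bool = False  # e.g. phone use matters only while driving


def run_cascade(frame: object, detectors: Dict[str, DetectorConfig],
                speed_kmh: float) -> List[str]:
    """Run the enabled detectors one after another on a single frame and
    return the names of those whose probability reaches their threshold."""
    moving = speed_kmh > 0.0  # speed is assumed to come from the CAN bus
    events = []
    for name, cfg in detectors.items():
        if cfg.needs_motion and not moving:
            continue  # skip this model entirely while the car is parked
        if cfg.detect(frame) >= cfg.threshold:
            events.append(name)
    return events


# Stub detectors with fixed probabilities, for illustration only.
detectors = {
    "smoking": DetectorConfig(detect=lambda frame: 0.8, threshold=0.5),
    "phone": DetectorConfig(detect=lambda frame: 0.9, threshold=0.5, needs_motion=True),
}
print(run_cascade(None, detectors, speed_kmh=0.0))   # ['smoking']
print(run_cascade(None, detectors, speed_kmh=40.0))  # ['smoking', 'phone']
```

Keeping each detector behind the same small interface is one way to meet the requirement that detectors can be used together or separately: enabling or disabling one is a change to the dictionary, not to the pipeline.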



An illustrative example of what we specifically detect and what we get as a result:



Detection of a mobile phone in the hands of a vehicle driver



Smoking detection in cars



Demonstration via Telegram bots (input: a picture from a smartphone camera or gallery; output: a probability):





Our particular AAEON VPC-3350S is equipped with an Intel Atom x5-E3940 processor. If needed, expansion cards with Intel Myriad X can be installed and neural network inference moved to the VPU without significant changes, since the solution is built on the Intel OpenVINO framework.
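In OpenVINO the target device is chosen by name when a model is compiled, so moving inference from the iGPU to a Myriad X card is largely a matter of passing a different device string. The sketch below picks a device from a preference list; the helper `pick_device` and the preference order are our own assumptions. It deliberately does not import OpenVINO, so it runs anywhere.

```python
from typing import Sequence

# Assumed preference: a Myriad X VPU if installed, then the iGPU, then the CPU.
DEFAULT_PREFERENCE = ("MYRIAD", "GPU", "CPU")


def pick_device(available: Sequence[str],
                preference: Sequence[str] = DEFAULT_PREFERENCE) -> str:
    """Return the first available OpenVINO device name in preference order."""
    for wanted in preference:
        for device in available:
            # Devices may be reported with an index, e.g. "GPU.0" or "GPU.1".
            if device == wanted or device.startswith(wanted + "."):
                return device
    raise RuntimeError(f"none of {list(preference)} found in {list(available)}")


print(pick_device(["CPU", "GPU"]))            # GPU
print(pick_device(["CPU", "GPU", "MYRIAD"]))  # MYRIAD
```

With the OpenVINO 2022.x Python API, the result would be passed to `Core.compile_model`, e.g. `core = Core()` (from `openvino.runtime`) followed by `core.compile_model(core.read_model("model.xml"), pick_device(core.available_devices))`, where `model.xml` is a placeholder path. As far as we know, the MYRIAD plugin ships only with releases up to 2022.3 LTS.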



Let's look at the inference speed (FP16) on various devices, including CPU, iGPU (Intel HD) AAEON VPC-3350, VPU Intel Movidius and solutions from other manufacturers:



Inference speed (FP16) on various devices, including CPU, iGPU (Intel HD) AAEON VPC-3350, VPU Intel Movidius and third-party solutions




Thus, the iGPU of the Intel Atom x5-E3940 delivers 54 FPS, and adding an Intel Movidius VPU gives another 45 FPS. For smoking detection, 15 FPS per camera is enough, so the processor graphics alone can handle up to 3 streams. Keep in mind that, besides the resources allocated to inference, the incoming RTSP streams also have to be decoded. Let's look at the decoder tests:



AAEON VPC 3350 decoder test



At maximum CPU and graphics load, the device decodes 30 720p streams at 15 FPS each, that is, 450 frames per second at 720p. At 1080p, the figure is about 150 frames per second.
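The stream budget above comes down to simple division. The sketch below reproduces it from the measured figures, treating the iGPU and VPU inference rates as additive and the inference and decoding throughputs as independent upper bounds. That independence is an assumption: decoding and inference share the same CPU and iGPU, so the real number of streams may be lower.

```python
import math


def streams_supported(total_fps: float, fps_per_stream: float) -> int:
    """Whole number of streams that a fixed frame-rate budget can serve."""
    return math.floor(total_fps / fps_per_stream)


PER_CAMERA_FPS = 15  # enough for smoking detection, per the text

igpu = streams_supported(54, PER_CAMERA_FPS)           # inference on the iGPU
igpu_vpu = streams_supported(54 + 45, PER_CAMERA_FPS)  # iGPU plus Movidius VPU
decode_720p = streams_supported(450, PER_CAMERA_FPS)   # decoder ceiling, 720p
decode_1080p = streams_supported(150, PER_CAMERA_FPS)  # decoder ceiling, 1080p

# Inference, not decoding, is the bottleneck with these numbers.
print(min(igpu, decode_720p), min(igpu_vpu, decode_720p))  # 3 6
```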



Consider the composition of the kit for use in car sharing and the main steps of data processing:



  1. The car is equipped with IP cameras powered over Ethernet (PoE): one for the driver, or two (driver and passenger).
  2. , AAEON NVR 3350.
  3. .
  4. .
  5. ( ). . , , 50%, ( ).
  6. /.
  7. (10 ), . :

    • ,
    • .
    • vehicle identifier (static GUID)
    • camera number (0, 1)
    • event type
  8. When a 3G/LTE connection is available, event data is transmitted to the central data processing server, which is integrated with the car-sharing operator's existing information system for billing.
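Step 8 implies a store-and-forward pattern: events are kept on the device and sent only when 3G/LTE is available. A minimal sketch, assuming a hypothetical `send` callback that returns `True` on successful delivery and a JSON payload; the class names and the `timestamp` field are our own additions.

```python
import json
import time
import uuid
from collections import deque
from dataclasses import asdict, dataclass
from typing import Callable, Deque


@dataclass
class Event:
    vehicle_id: str   # static GUID of the vehicle
    camera: int       # camera number, 0 or 1
    event_type: str   # e.g. "smoking" or "phone"
    timestamp: float  # assumption: not in the field list above


class EventOutbox:
    """Queues events locally and flushes them when the mobile link is up."""

    def __init__(self, send: Callable[[str], bool]):
        self._send = send  # hypothetical uplink; returns True on success
        self._queue: Deque[Event] = deque()

    def add(self, event: Event) -> None:
        self._queue.append(event)

    def flush(self, link_up: bool) -> int:
        """Send queued events in order; stop at the first failed send."""
        sent = 0
        while link_up and self._queue:
            if not self._send(json.dumps(asdict(self._queue[0]))):
                break  # keep the event and retry on the next flush
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)


delivered = []
outbox = EventOutbox(lambda payload: delivered.append(payload) or True)
vehicle = str(uuid.uuid4())
outbox.add(Event(vehicle, 0, "smoking", time.time()))
outbox.add(Event(vehicle, 0, "phone", time.time()))
print(outbox.flush(link_up=False), len(outbox))  # 0 2
print(outbox.flush(link_up=True), len(outbox))   # 2 0
```

This queue lives in memory only; on a real device it would be persisted (for example, in the local database) so that events survive a reboot while the car is out of coverage.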


Now let's return to the second use case for the AAEON VPC-3350S, detecting and counting passengers on buses:







Stages of work performed:



  1. Preparatory work (testing cameras, choosing a focal length, setting the boundary conditions of the problem):

    • Annotating 600 frames from several cameras with different focal lengths
    • Training the neural network on an NVIDIA GPU, 10k steps
    • Testing the model on a validation dataset
    • Converting the model to the Intel OpenVINO format
    • Testing the converted model in Intel OpenVINO on the validation dataset and comparing its quality and speed with the model before conversion
    • , (, CPU, VPU)
  2. ( + )
  3. , 20 .
  4. nVidia GPU
  5. OpenVINO
  6. :

    • gstreamer/ffserver
    • (, , , )
    • mongoDB/PostgreSQL
    • REST API
  7. Β« Β»
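Step 1 includes comparing the converted model's quality with the original. One common way to do this for a head detector is to match predicted boxes to ground-truth boxes at an IoU threshold and compare precision and recall. The sketch below shows that generic technique, not necessarily the metric the team used; the 0.5 threshold and the example boxes are made up, and predictions are assumed to be sorted by descending confidence.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truth: List[Box],
                     thr: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching of predictions to ground-truth boxes."""
    unmatched = list(truth)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thr:
            unmatched.remove(best)  # each ground-truth box matches only once
            tp += 1
    precision = tp / len(preds) if preds else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall


truth = [(10, 10, 50, 50), (100, 100, 140, 140)]
original = [(12, 11, 51, 49), (98, 102, 139, 141)]
converted = [(12, 11, 51, 49)]  # e.g. the converted model missed one head
print(precision_recall(original, truth))   # (1.0, 1.0)
print(precision_recall(converted, truth))  # (1.0, 0.5)
```

Run over the whole validation set, a drop in recall after conversion like the one in this toy example would show up immediately, alongside the speed comparison.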




The training process itself:







The Larga Group client portal, showing passenger traffic reports:






The process of detecting people in the bus entrance area and marking the zones



Counter operation algorithm:



  1. Splitting the RTSP stream into frames
  2. Detecting heads in every frame
  3. Trajectory analysis (tracking each head across frames as it moves)
  4. Analysis of the direction of movement based on the sequence of intersection of 3 pre-marked zones
  5. Recording events in the local database, taking into account the direction of movement (entry / exit)
  6. Providing access via REST API to third-party information systems and reporting systems
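Steps 3 and 4 can be sketched as follows, under stated assumptions: head centroids per frame are taken as given (the detector is not shown); tracking uses a simple greedy nearest-centroid association, swapped in for whatever tracker the real system uses; and the three zones are hypothetical horizontal bands with zone 1 at the door and zone 3 inside the bus. A track that passes through zones 1, 2, 3 in that order counts as an entry, and 3, 2, 1 as an exit.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]               # head centroid (x, y) in pixels
Zone = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Hypothetical layout: three bands, zone 1 at the door, zone 3 inside the bus.
ZONES: Dict[int, Zone] = {
    1: (0, 0, 100, 60),
    2: (0, 60, 100, 120),
    3: (0, 120, 100, 180),
}


def zone_of(p: Point) -> Optional[int]:
    """Return the id of the zone containing the point, or None."""
    for zone_id, (x1, y1, x2, y2) in ZONES.items():
        if x1 <= p[0] < x2 and y1 <= p[1] < y2:
            return zone_id
    return None


def direction(zones: List[int]) -> Optional[str]:
    """Classify a track by the order in which it crossed the zones."""
    # Collapse repeats, e.g. [1, 1, 2, 3, 3] -> [1, 2, 3].
    visited = [z for i, z in enumerate(zones) if i == 0 or z != zones[i - 1]]
    if not visited or 2 not in visited:
        return None
    if visited[0] == 1 and visited[-1] == 3:
        return "entry"
    if visited[0] == 3 and visited[-1] == 1:
        return "exit"
    return None  # e.g. stepped in and back out: [1, 2, 1]


class CentroidTracker:
    """Greedy nearest-centroid tracker (a simple stand-in)."""

    def __init__(self, max_dist: float = 40.0):
        self.max_dist = max_dist
        self.tracks: Dict[int, List[Point]] = {}
        self._next_id = 0

    def update(self, heads: List[Point]) -> None:
        free = dict(self.tracks)  # tracks not yet matched in this frame
        for head in heads:
            best = min(free, key=lambda t: math.dist(free[t][-1], head), default=None)
            if best is not None and math.dist(free[best][-1], head) <= self.max_dist:
                self.tracks[best].append(head)
                del free[best]
            else:  # no close track: start a new one
                self.tracks[self._next_id] = [head]
                self._next_id += 1


# Two simulated passengers: A walks in at x=20, B walks out at x=80.
frames = [
    [(20, 30), (80, 170)],
    [(20, 65), (80, 135)],
    [(20, 100), (80, 100)],
    [(20, 135), (80, 65)],
    [(20, 170), (80, 30)],
]
tracker = CentroidTracker()
for heads in frames:
    tracker.update(heads)

counts = {"entry": 0, "exit": 0}
for track in tracker.tracks.values():
    result = direction([z for z in map(zone_of, track) if z is not None])
    if result:
        counts[result] += 1
print(counts)  # {'entry': 1, 'exit': 1}
```

A production tracker would also retire tracks that have not been updated for several frames; steps 5 and 6 (the local database and the REST API) are not shown.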


Since a hybrid approach to neural network inference (at the edge, with part of the data processed in the data center) was planned from the start, let's consider the pros and cons of both approaches:







Thus, centralized processing in the data center gives the lowest cost per stream, but requires high-quality, fast and reliably available communication channels. Edge solutions cost more, but place minimal demands on communication channels and require no channel redundancy.
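To make the traffic side of this trade-off concrete, here is a back-of-the-envelope estimate of monthly mobile traffic per camera: uploading the video stream for centralized processing versus sending only events from the edge. The bitrate, operating hours, event rate and event size in the example are hypothetical inputs, not measured values from this project; sizes use decimal units (1 GB = 10^9 bytes).

```python
def monthly_video_gb(bitrate_mbps: float, hours_per_day: float, days: int = 30) -> float:
    """GB per month for continuously uploading one camera stream (centralized)."""
    seconds = hours_per_day * 3600 * days
    return bitrate_mbps * 1e6 / 8 * seconds / 1e9  # bits -> bytes -> GB


def monthly_events_mb(events_per_day: int, bytes_per_event: int, days: int = 30) -> float:
    """MB per month when the edge device sends only event records."""
    return events_per_day * bytes_per_event * days / 1e6


# Hypothetical inputs: a 2 Mbit/s stream, 12 operating hours a day,
# 200 events a day of about 500 bytes each.
video = monthly_video_gb(2, 12)
events = monthly_events_mb(200, 500)
print(f"{video:.0f} GB vs {events:.1f} MB per camera per month")
```

Whatever the exact inputs, the gap is several orders of magnitude, which is why the edge variant tolerates unstable, expensive channels while the centralized one needs reliable, fast links.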


