Writing a video chat for a local network, or mastering WebRTC in 2020

Against the backdrop of well-known events, and with little else to do, I have been trying to master WebRTC as part of my professional growth. As you know, the best way to learn is to build something at least potentially useful, and along the way to share the experience of building it and of the bumps collected in the process.



As a task, I decided to make a simple application for audio and video calls between two (for now) desktop or mobile devices on a local network, with no Internet connection required. Installation and initial configuration should be simple enough that any reasonably capable jack-of-all-trades IT support person can handle them without trouble and show users how to make calls, and, given the right skills, make minor improvements to the design and features. A client can be any device that has multimedia input and output hardware and can run a suitable browser (Firefox or Chrome; I tested on what were, I believe, the May versions).



How is it done



As you know, WebRTC handles communication between two peers through an RTCPeerConnection object, and the developer's main task is to organize the exchange of text data (SDP offer, SDP answer, ICE candidates) between the caller and the callee. In other words, the developer first has to build a text chat with an API for browser JavaScript, and then attach the multimedia part to it: the RTCPeerConnection events and methods for sending data and processing what is received.
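The exchange described above can be sketched roughly as follows. This is a minimal illustration, not the code from this chat: the function names (startCall, handleSignal, forwardCandidates), the JSON envelope with a "type" field, and the send callback (whatever text channel the application already has) are all assumptions. Only the RTCPeerConnection methods and the onicecandidate event are standard.

```javascript
// Forward every locally gathered ICE candidate to the other side as text.
// Gathering starts once setLocalDescription() has been called.
function forwardCandidates(pc, send) {
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      send(JSON.stringify({ type: "candidate", candidate: event.candidate }));
    }
  };
}

// Caller side: create the SDP offer, apply it locally, then send it.
async function startCall(pc, send) {
  forwardCandidates(pc, send);
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  send(JSON.stringify({ type: "offer", sdp: pc.localDescription }));
}

// Both sides: react to one text message received from the peer.
async function handleSignal(pc, text, send) {
  const msg = JSON.parse(text);
  switch (msg.type) {
    case "offer": {
      // Callee side: accept the offer and reply with an SDP answer.
      forwardCandidates(pc, send);
      await pc.setRemoteDescription(msg.sdp);
      const answer = await pc.createAnswer();
      await pc.setLocalDescription(answer);
      send(JSON.stringify({ type: "answer", sdp: pc.localDescription }));
      break;
    }
    case "answer":
      // Caller side: the callee accepted.
      await pc.setRemoteDescription(msg.sdp);
      break;
    case "candidate":
      await pc.addIceCandidate(msg.candidate);
      break;
    default:
      throw new Error("unknown signal type: " + msg.type);
  }
}
```

In a browser, pc would be a new RTCPeerConnection() with the local media tracks already added, and send would post the text to the peer through the chat API.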



The choice of technologies for implementing the text chat and its API is left to the developer. Many (in particular Mozilla, in their official RTCPeerConnection example) prefer the WebSocket API with a matching server, for example on Node.js. But since the goal was to keep deployment as simple as possible, I decided not to overcomplicate the server application at first, especially because a Web server was needed anyway to deliver pages and scripts to the client device. So the API uses XMLHttpRequest, with clients periodically polling that same Web server. I can't say it is ideal in terms of resource (and battery) consumption on the client device or freedom from lag, but it works reliably as long as a few nuances are taken into account during development. Perhaps in some later version I will add a WebSocket server and rework the API accordingly, but not everything at once.
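A periodic polling loop of this kind might look like the sketch below. It is not the API of this chat: the /api/poll URL, the after parameter, the numeric id field of a message and the names createPoller and xhrRequest are hypothetical. The transport and the timer are passed in, so the same loop can run over XMLHttpRequest in a browser or over a stub elsewhere.

```javascript
// Polling loop: ask the server for messages newer than the last one seen,
// deliver them, then schedule the next request.
function createPoller({ request, onMessage, schedule = setTimeout, intervalMs = 1000 }) {
  let lastId = 0;      // id of the newest message already delivered
  let stopped = false;

  function tick() {
    if (stopped) return;
    request("/api/poll?after=" + lastId, (err, messages) => {
      if (!err) {
        for (const msg of messages) {
          lastId = Math.max(lastId, msg.id);
          onMessage(msg);
        }
      }
      // The next poll is scheduled only after this one has finished,
      // so slow responses never pile up on top of each other.
      if (!stopped) schedule(tick, intervalMs);
    });
  }

  return { start: tick, stop() { stopped = true; } };
}

// Browser transport built on XMLHttpRequest; expects a JSON array in the body.
function xhrRequest(url, done) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.onload = () => {
    if (xhr.status !== 200) return done(new Error("HTTP " + xhr.status));
    let messages;
    try {
      messages = JSON.parse(xhr.responseText);
    } catch (e) {
      return done(e);
    }
    done(null, messages);
  };
  xhr.onerror = () => done(new Error("network error"));
  xhr.send();
}
```

In a page this would be started with createPoller({ request: xhrRequest, onMessage: showMessage }).start(), where showMessage is whatever handler the page uses. A longer interval saves battery at the cost of slower call setup, which is the trade-off mentioned above.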



I decided to write the server part in Lazarus for Windows; networking is provided by the Synapse package. In some ways this is probably perverse, and I had to tinker a lot and collect a few bumps to make everything work as intended. But one exe, two DLLs (the OpenSSL libraries), the files of a self-signed SSL certificate and its key, and a few configuration files (plus the static files) mean you don't have to worry much about the server hardware or about how the application is launched. I tested the first version of this server, in a 32-bit build, even on a 2009 Asus Eee PC 900 running Windows XP, although not without a cheat: the stock, extremely slow SSD had recently been replaced with a more modern and roomier one. So much for performance. "Installing" the server means unpacking the downloaded zip archive into any suitable folder, editing the JSON file with the user account configuration, and launching the program's exe file (the window has a start button, but a command-line parameter can start the Web server immediately). One way or another, I am thinking about a more serious server side, since I have that kind of experience. All in good time.



Besides implementing the API itself, the server serves static files to browsers (the login and chat web pages, styles, images, scripts, the ringtone). In general I tried to avoid third-party libraries as far as possible, but since design and HTML layout are not my strong suit, I decided to use jQuery UI and, accordingly, jQuery, which the Web server also serves as static files. All static files live in a separate subfolder of the program folder; they can of course be inspected and even modified if you wish and have the skills. The JavaScript code is commented, so you can learn from it if needed.



How to organize communication



To organize communication, the main thing is to pick the client devices (computers, laptops, smartphones, tablets) and join them into one network with the Windows machine running the "server" (which can also act as a client). As clients I tested several inexpensive smartphones from the last few years running Android 7 and later, as well as a desktop computer and a laptop running Windows 10, including with two Web cameras connected; they all performed well. For fun, I even tested the first version on an Orange Pi One with the manufacturer's Lubuntu (or Kubuntu, I don't remember offhand). Surprisingly, even that worked, although the video lagged and the chat page was far from quick to open (I don't even want to talk about booting the system and starting the browser).



The server is installed on the "server" machine as described above, and user accounts are configured. Each user needs a login and a password.



It all works like this. Users open the "server" machine in a browser over HTTPS, using its IP address or domain name. There they enter their login and password and land on the chat page with a list of contacts. Clicking a contact opens a dialog window with the text message history (by the way, the server keeps it only in RAM; it cannot be saved to a file yet), a field for chat messages, and an audio/video call form with checkboxes for selecting audio and/or video. To make a video call, the user ticks the appropriate checkboxes, presses the call button and grants the browser permission to use the devices. On the called side a ringtone starts playing and an answer form opens with the same checkboxes. After the answer button is clicked, the browser there also asks for permission to access the multimedia devices. Then the call window opens.



I can't say I have much experience with software for video conferencing, video consulting and the like, but in Google Hangouts on a computer, for example (I don't know about mobile devices), I did not find a way to show my own video full-screen. In theory that can be needed in remote consultations, when you have to see clearly what you are showing the other person (for example, through the rear camera of a smartphone). In this chat, I decided to give the call dialog two video tabs: one for the other party and one for the user. Starting with the current version, the user tab has, besides the video itself, fields for selecting the camera and microphone; their values can be changed on the fly during a conversation. Perhaps someone will find this useful.
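One way to change the camera during a call is sketched below. It is not necessarily what videoChat.js does: the function name switchVideoInput and its parameters are hypothetical, and stopping the old track before opening the new camera is a precaution I am assuming here for devices that may not open two cameras at once. The calls themselves (getUserMedia with a deviceId constraint, RTCPeerConnection.getSenders, RTCRtpSender.replaceTrack, MediaStreamTrack.stop) are standard.

```javascript
// Replace the outgoing video track with one from another camera.
// pc            - the active RTCPeerConnection
// mediaDevices  - navigator.mediaDevices (passed in so it can be stubbed)
// localStream   - the MediaStream shown in the user's own video element
// deviceId      - deviceId of the camera picked in the selection field
async function switchVideoInput(pc, mediaDevices, localStream, deviceId) {
  // Assumption: release the current camera first.
  for (const track of localStream.getVideoTracks()) {
    track.stop();
    localStream.removeTrack(track);
  }

  // Open exactly the requested camera.
  const fresh = await mediaDevices.getUserMedia({
    video: { deviceId: { exact: deviceId } },
  });
  const [newTrack] = fresh.getVideoTracks();
  localStream.addTrack(newTrack);

  // Swap the track on the existing sender instead of adding a new one.
  const sender = pc
    .getSenders()
    .find((s) => s.track && s.track.kind === "video");
  if (sender) {
    await sender.replaceTrack(newTrack);
  }
  return newTrack;
}
```

The microphone can be switched the same way with an audio constraint and the sender whose track kind is "audio".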



Now I will briefly describe the bumps I collected during development; maybe this will help someone when developing and debugging their own solutions.



Current quirks of how WebRTC works and is implemented, and of working with multimedia in JavaScript in general



Only briefly here; details are in the comments in the JavaScript file static/js/videoChat.js.



  1. Chrome for certain, and possibly other browsers as well, allows getUserMedia only on sites served over HTTPS.
  2. The list of audio and video input devices can be obtained only after a successful call to getUserMedia.
  3. Starting sound playback automatically from JavaScript (via the play() method of an HTML video or audio element) is possible only after the user has interacted with the site, for example by clicking some control.
  4. promise setLocalDescription , offer. RTCPeerConnection ICE-, .
  5. « » getUserMedia RTCPeerConnection. , , .
  6. Many descriptions for mobile devices refer to the facingMode property for selecting the front or rear camera. I don't know about older devices, but on the smartphones I tested, switching in this chat works even without that property, as long as item 5 is strictly observed.
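Items 2 and 3 can be illustrated with the sketch below. The function names listInputDevices and tryPlay are hypothetical and the code is not taken from videoChat.js; getUserMedia, enumerateDevices and the promise returned by play() are standard. The sketch assumes the device has both a microphone and a camera; otherwise the probing getUserMedia call would have to ask for only one of them.

```javascript
// Item 2: request access first, then read the device list.
// mediaDevices is navigator.mediaDevices, passed in so it can be stubbed.
async function listInputDevices(mediaDevices) {
  const probe = await mediaDevices.getUserMedia({ audio: true, video: true });
  try {
    const devices = await mediaDevices.enumerateDevices();
    return {
      cameras: devices.filter((d) => d.kind === "videoinput"),
      microphones: devices.filter((d) => d.kind === "audioinput"),
    };
  } finally {
    // The probe stream was only needed to unlock the list.
    probe.getTracks().forEach((t) => t.stop());
  }
}

// Item 3: play() returns a promise that is rejected with NotAllowedError
// when the browser blocks playback because the user has not yet interacted
// with the page. Returns true if playback started, false if it was blocked.
async function tryPlay(mediaElement) {
  try {
    await mediaElement.play();
    return true;
  } catch (err) {
    if (err && err.name === "NotAllowedError") return false;
    throw err;
  }
}
```

When tryPlay returns false for the ringtone, the page can fall back to a visible prompt and retry from a click handler.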


The list is most likely not exhaustive; I expect a lot more to come up during further development. If you know how to get around these restrictions and thereby simplify the program or working with it, please write in the comments.



A bump for developers of network applications in Lazarus



Synapse currently supports only the OpenSSL 1.0.x libraries; in 1.1 many things are implemented differently, and even the library file names are different. In addition, simply placing the DLLs in the program folder is not enough. You also need a configuration file (openssl.cnf), whose path is set through the OPENSSL_CONF environment variable.



Where can I download



Distributions of the program for Win32 and Win64, and the source code of the server side in Lazarus, are available on the program page at www.lubezniy.ru/soft/videochat



PS: By the way, does anyone know how to make Lazarus automatically build two different executables, for Win32 and Win64, from the same sources? The cross compiler from Win64 to Win32 is installed and working, but changing the project options every time is not the right way.



PPS: If you have tried it, please share your impressions in the comments.


