How the youtube-dl program was born

As you know, at the moment the youtube-dl repository on GitHub is blocked by a DMCA request from the RIAA. I cannot comment on the maintainers' current plans or discussions, but after the accusations made by the RIAA, I thought it would be useful for me, as the creator of the program and its first maintainer, to talk about the early years of youtube-dl.



Copper collectors



Any good story needs a villain, and for that role I have chosen the copper collectors: thieves who stole non-ferrous metals in the area. It was they who prompted the creation of youtube-dl. Back in 2006, I lived in a village 5-10 kilometers from the small town of Aviles in northern Spain. Aviles residents enjoyed good infrastructure and services, including cable TV and ADSL internet access. There was nothing like that in my area: it was too far from an ADSL telephone exchange, and for years copper collectors had stolen the copper wires leading to it, occasionally interrupting the telephone service and forcing the telephone company to replace those wires with weaker and thinner ones, because the replacements, too, were likely to be stolen. This went on for several years.



The only way to access the Internet from home was a 56k V.90 modem. In fact, the connection quality was so bad that the speed had to be reduced to 33.6 kbps for stability. Actual download speeds rarely exceeded 4 KB/s. Around the same time, an interesting video service called YouTube appeared on the Internet; it was rapidly gaining popularity, and at the end of that year it was bought by Google.



Stay up all night to watch a piece



Watching any YouTube video over a 33.6 kbps connection was an excruciating experience. Pretty much any video took forever to download. For example, a short 10 MB video took about 40 minutes to download, which made streaming impossible. A longer, higher-quality video took several hours and completely saturated the line, not to mention that the connection could drop at any time, and then the download had to start over! Imagine that you really liked a particular video and wanted to watch it a second or third time. Repeating this process was practically an act of masochism.
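
The 40-minute figure follows directly from the numbers above. A minimal sketch of the arithmetic, assuming 1 MB = 1024 KB and a steady 4 KB/s (the function name is made up for this illustration):

```python
def download_minutes(size_mb: float, speed_kb_per_s: float) -> float:
    """Return how many minutes it takes to transfer size_mb at speed_kb_per_s."""
    size_kb = size_mb * 1024          # assumption: 1 MB = 1024 KB
    seconds = size_kb / speed_kb_per_s
    return seconds / 60


# Theoretical ceiling of a 33.6 kbps line, in KB/s (8 bits per byte).
line_ceiling_kb_per_s = 33.6 * 1000 / 8 / 1024

print(f"line ceiling: {line_ceiling_kb_per_s:.1f} KB/s")
print(f"10 MB at 4 KB/s: {download_minutes(10, 4):.0f} minutes")
```

The line ceiling comes out just above 4 KB/s, which matches the observation that real download speeds rarely exceeded that value.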



In this situation, I began to think about downloading the video files: if a video was interesting, having a copy would let you rewatch it several times later. And if the download program was really good, it would be able to resume an interrupted download from where it left off!
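
Resuming from where a download left off is commonly done with the HTTP `Range` header. The sketch below shows that general technique with Python's standard library; it is not the code of the original youtube-dl, and the function names are hypothetical. It assumes the server supports range requests and does not handle the case where the partial file is already complete.

```python
import os
import urllib.request


def build_resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes we do not have yet."""
    request = urllib.request.Request(url)
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if offset > 0:
        # "bytes=N-" means "send everything from byte N to the end of the file".
        request.add_header("Range", f"bytes={offset}-")
    return request


def resume_download(url: str, partial_path: str, chunk_size: int = 16 * 1024) -> None:
    """Download url into partial_path, appending if the server honors the range."""
    request = build_resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content: the server sent only the missing part, so append.
        # Any other success status: the server sent the whole file, so overwrite.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as output:
            while chunk := response.read(chunk_size):
                output.write(chunk)
```

Calling `resume_download(url, "video.flv.part")` again after a dropped connection would then fetch only the remaining bytes, provided the server cooperates.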



There were other solutions for downloading YouTube videos at the time, including a rather popular Greasemonkey script. As it happened, I could not get any of the existing solutions to work, so I decided to write my own tool. This is how youtube-dl came about. It was more convenient and faster for me to launch it from the console, so it has no graphical interface. I chose Python because of its extensive standard library, with the pleasant side effect that the program would work on any platform.



Ethereal launch



The first version only worked with YouTube. The program had practically no architecture to speak of, because none was needed. Written as a simple script, it went straight to the point. The program was 223 lines long: 143 lines of actual code, 44 lines of comments and 36 empty lines. The name was chosen purely for convenience: youtube-dl is obvious, understandable, hard to forget and can be intuitively entered as “YOU-TAB” in the console.



Since I had been using Linux for several years by then, I decided to publish the program under a free license (MIT in the first versions) in case someone found it useful. Back then, GitHub did not exist yet and we had to make do with SourceForge. But there, creating a new project meant filling out a tedious form. So instead of SourceForge, I quickly posted the code to my personal page, which was provided by my Internet provider. While this seems unusual today, ISPs used to give users an email address and some hosting space where they could upload files via FTP. That way, you could host your own personal website. The first version of the program was published on August 8, 2006, although by that time I had been using it for several weeks.



During development, I needed to understand what the Firefox browser did when playing videos on YouTube. If I recall correctly, Firefox did not yet have built-in developer tools for analyzing network activity. Connections were made primarily over HTTP, so Wireshark, known at the time as Ethereal, became an invaluable tool for analyzing network traffic. I wrote youtube-dl with the specific purpose of doing the same thing the web browser did when retrieving a video. The program even sent the same user-agent string, copied verbatim from Firefox for Linux, to make sure the site would send the program the same web pages as the browser.
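
Sending a browser's user-agent string from a script looks roughly like this in today's Python standard library. This is only a sketch of the idea: the string below is an illustrative placeholder in the style of Firefox for Linux from that era, not the one the original program used, and `browser_like_request` is a hypothetical helper name.

```python
import urllib.request

# Illustrative placeholder only; not the string from the original program.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) "
    "Gecko/20060728 Firefox/1.5.0.6"
)


def browser_like_request(url: str) -> urllib.request.Request:
    """Build a request that identifies itself with a browser's user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


request = browser_like_request("http://www.example.com/watch?v=placeholder")
print(request.get_header("User-agent"))
```

Passing the resulting object to `urllib.request.urlopen` would send the request with that header instead of Python's default one.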



Also, YouTube used the Adobe Flash player back then. The videos were served as Flash Video (FLV) files, so a proprietary plugin was required to view them in a browser (many will remember the dreaded libflashplayer.so library), and any in-browser development tools would have been useless. This proprietary plugin was a constant source of security vulnerabilities and problems. I had a Firefox extension called Flashblock that prevented Flash content from loading by default and replaced it with placeholders with a clickable icon, so content would only load upon request, and the plugin library would not be used unless the user asked for it.



In addition to improving security, Flashblock had two other benefits. First, it removed a lot of noisy and nasty banners, which could also be a source of security problems. Second, it made it easier to analyze how the video was loaded into the player. I waited for the page to load fully and then launched Wireshark just before clicking on the Flashblock icon to start the video download. Thus, the only traffic to analyze was related to the plugin loading the video player application and the application itself loading the video.



It is also worth noting that the Flash Player plugin at that time already saved a copy of the video to the hard disk (under Linux, these copies were stored in /tmp), and many users relied on this behavior to make a copy without additional tools. So youtube-dl was more convenient only in that, for example, it extracted the title of the video and used it to name the file automatically.



Oh, fresh meat!



Eventually, the Flash Player was changed to make videos more difficult to extract. One of the first measures was to unlink the video file right after creating it, so that the inode still existed and was available to the process using it (until the file was closed), while the file itself was invisible from the file system's point of view. It was still possible to grab the file by using the /proc filesystem to examine the file descriptors held by the browser process, but with each of these small steps youtube-dl became more and more convenient.
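
The unlink trick and the /proc workaround can be reproduced in a few lines. The sketch below is Linux-only and illustrates the mechanism within a single process; it is not how the Flash plugin or youtube-dl was implemented, and the function name is hypothetical.

```python
import os
import tempfile


def read_unlinked_file(payload: bytes) -> bytes:
    """Write payload to a temp file, unlink it, then read it back via /proc."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Remove the directory entry. The inode survives while fd stays open,
        # but the file no longer has a name in the file system.
        os.unlink(path)
        # On Linux, /proc/self/fd/<n> still refers to the open file, so it can
        # be reopened and read from the beginning.
        with open(f"/proc/self/fd/{fd}", "rb") as recovered:
            return recovered.read()
    finally:
        os.close(fd)


print(read_unlinked_file(b"pretend this is FLV data"))
```

For another process, such as a browser, the same path would be `/proc/<pid>/fd/<n>`, readable by the same user.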



Like many open source enthusiasts at the time, I used Freshmeat to subscribe to new releases of projects that interested me. When I created youtube-dl, I also created a project entry on that website so that users could be notified of new releases and see a changelog listing new features, fixes and improvements. The Freshmeat catalog could also be used to discover new and interesting projects; the latest updates were published on the front page, usually several dozen per day. I suppose that this is how Joe Barr (rest in peace), an editor at linux.com, found out about the program and decided to write an article about it back in 2006. Linux.com was one of the most popular resources for Linux enthusiasts at the time, along with other classic sites like Slashdot or Linux Weekly News. At least for me.



From that moment on, the popularity of youtube-dl began to grow, and from time to time I began to receive letters of gratitude for the creation and support of the program.



Traffic counting



Fast forward to 2008. The popularity of youtube-dl continued to grow slowly, and users often asked me to make similar programs for downloading from other sites; I gave in to such requests several times. It was at this point that I decided to rewrite the program from scratch in order to support multiple video sites natively. I had some simple ideas on how to divide the internals of the program into several parts in order to simplify the most important ones: a file downloader, common to all websites, and, separately, information extractors: objects (classes) containing the code specific to a particular video site. When given a URL or pseudo-URL, the extractors are asked which of them can handle that type of URL, and the matching one is then asked to extract information about the video or list of videos, with the main purpose of obtaining a video URL or a list of URLs with the available formats, as well as other metadata such as titles.
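
The split described above can be sketched roughly as follows. This is a minimal illustration of the idea, not youtube-dl's actual code: the class names, the `URL_PATTERN` attribute, the `resolve` method and the example.com URLs are all assumptions made for this example, and no network access is performed.

```python
import re
from typing import Dict, List, Optional


class InfoExtractor:
    """Base class: everything specific to one video site lives in a subclass."""

    URL_PATTERN = r"(?!)"  # never matches; subclasses override it

    @classmethod
    def suitable(cls, url: str) -> bool:
        """Report whether this extractor can handle the given URL."""
        return re.match(cls.URL_PATTERN, url) is not None

    def extract(self, url: str) -> Dict[str, str]:
        raise NotImplementedError


class ExampleSiteExtractor(InfoExtractor):
    """Hypothetical extractor for an imaginary site."""

    URL_PATTERN = r"https?://video\.example\.com/watch/(?P<id>\w+)"

    def extract(self, url: str) -> Dict[str, str]:
        video_id = re.match(self.URL_PATTERN, url).group("id")
        # A real extractor would fetch and parse the web page here.
        return {
            "id": video_id,
            "title": f"Video {video_id}",
            "url": f"https://cdn.example.com/{video_id}.flv",
        }


class FileDownloader:
    """Site-independent part: picks an extractor, then handles the file."""

    def __init__(self) -> None:
        self._extractors: List[InfoExtractor] = []

    def add_extractor(self, extractor: InfoExtractor) -> None:
        self._extractors.append(extractor)

    def resolve(self, url: str) -> Optional[Dict[str, str]]:
        """Ask each extractor in turn; return info from the first that fits."""
        for extractor in self._extractors:
            if extractor.suitable(url):
                return extractor.extract(url)
        return None


downloader = FileDownloader()
downloader.add_extractor(ExampleSiteExtractor())
print(downloader.resolve("https://video.example.com/watch/abc123"))
```

With this layout, supporting a new site means adding one more extractor class, while the download logic stays untouched.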



I also took the opportunity to change the version control system and move the project to another host. At the time, Git was winning the distributed version control war, but Mercurial also had many users. I tested both and decided that I liked Mercurial a little more than Git. I started using it for youtube-dl and published the project on Bitbucket, which was the natural choice. At the time, Bitbucket only hosted Mercurial repositories, and GitHub only hosted Git. Both launched in 2008 and were a breath of fresh air compared to SourceForge. Separate project namespaces for each user (that is, your project name did not have to be globally unique, only unique among your own projects), combined with distributed version control, meant that you could publish your personal projects in a matter of minutes on either site. In any case, migrating the project history to Git and moving the project to GitHub followed a couple of years later.



When rewriting the project, I should no doubt have seized the opportunity to rename it, but I didn't want to confuse existing users, so I kept the name to preserve what little popularity the program had.



The technological context also changed slightly that year. Mobile data plans began to gain traction, and at the end of that year I bought myself a 3G modem and a data plan that, for the first time, allowed me to surf the web at a decent speed. Still, it didn't make me stop using youtube-dl. I paid 45 euros per month, but the plan was capped at 5 GB per month, meaning I could only use about 150 MB per day on average. The speed would have allowed me to download much more than that, so I had to monitor my traffic and choose content selectively, avoiding large downloads where possible. Youtube-dl therefore helped a lot in avoiding repeated downloads of large video files and staying within the plan.



Episode: New Home



Some time later, at the end of 2009, I moved and finally started living with my girlfriend (now my wife and mother of two) in Aviles. For the first time, I had high-speed internet, which had been the norm for many of my friends and family for years. I remember it was a 100/10 Mbps cable connection (download/upload) with no data cap. This definitely marked a turning point in how often I used youtube-dl and how much attention I paid to the project.



Later, I finally moved the code to Git and GitHub. At the time, YouTube was beginning to experiment with HTML5 video, which would become the default around 2015. By 2011, I had been working full-time as a software engineer for several years, and after coming home from work I generally did not feel like programming and tinkering with youtube-dl, or implementing, at the request of users, a feature that I was never going to use myself.



In the second half of 2011, in the midst of another important project, I decided to step down as youtube-dl maintainer, since I had not been able to keep up with the task for several months. Philip Hagemeister had proved to be a great programmer, and he had submitted several pull requests on GitHub with fixes that many people were interested in. I gave him commit access to my youtube-dl repository, and that was essentially the end of the story on my part. The upstream log shows a continuous stream of commits from me until March 2011 and then a jump to August 2011 with a merge from Philip. Since then I have made only one commit, in 2013, changing rg3.github.com to rg3.github.io in the source code when GitHub moved user pages from USERNAME.github.com to USERNAME.github.io to avoid security issues with malicious code on its own domain, if I remember correctly.



Although I no longer participated in the development of youtube-dl, for many years the official project pages still lived under my account at https://github.com/rg3/youtube-dl and https://rg3.github.io/youtube-dl/. I had to step in whenever Philip or other maintainers asked me to give commit access to additional developers, such as Filippo Valsorda or Sergey M., one of the current maintainers. Unfortunately, in 2019 there was a small problem with trolls in the issue tracker, and only project owners are allowed to block users. This finally made us move the project to a GitHub organization, to which everyone who had commit access was invited (although not everyone joined). The GitHub organization allowed the maintainers to act more freely, without having to contact me over every little thing.



I would like to once again express my most sincere gratitude to the various project maintainers over the years, who have significantly improved the code, built a real community around the project and made it much more popular than it was when I left almost 10 years ago.



Offline and free



I would like to note once again that the purpose of youtube-dl as a tool has hardly changed over the 14 years of its existence. Before and after the DMCA letter from the RIAA arrived, many people talked about how they use youtube-dl for different purposes.



For me, it has always been about offline access to videos that are already available to the general public on the Internet. In a world of mobile networks and always-on connectivity, you might ask whether that is really necessary. I think so, given that Netflix, Amazon, Disney and HBO have all implemented similar functionality in their hugely popular streaming apps. For long road trips or trips abroad, especially with kids, or underground, or on an airplane, or in a place with poor or limited connectivity, it is incredibly convenient to have offline access to a podcast, lecture, review, news program or work of art.



An additional side effect of youtube-dl is access to content when the online interface is not up to the task. The old proprietary Flash plugin didn't work on every platform and architecture. Nowadays, browsers can play videos, but they sometimes fail to take advantage of efficient GPU decoding and waste a lot of battery power. Youtube-dl can be combined with a native video player to make certain videos playable and/or to play them more efficiently. For example, the mpv player includes built-in support for youtube-dl. You only need to pass it the URL, and it uses youtube-dl to access the video stream and play it without saving anything to your hard drive.



The default online interface may also lack accessibility features that some people need, such as better navigation or color filters for color-blind people, which may again be available in a native video player application.



Last but not least, tools like youtube-dl allow you to access online videos using only free software. I understand that there are not many supporters of strictly free and open source software in the world. I don't even consider myself one, by and large. Proprietary software is constantly present in our modern lives and is delivered to us every day as huge amounts of JavaScript in the browser, serving many different purposes and not always in the best interests of users. The emergence of the GDPR, with all its flaws and problems, is proof of that. Accessing online videos with youtube-dl can give you a peace of mind that incognito mode, uBlock Origin or Privacy Badger can barely approach.


