My eight-year quest to digitize 45 videotapes. Part 2

Part 1 describes my difficult quest to digitize old family videos and split them into separate scenes. After processing all the clips, I wanted to make watching them online as convenient as on YouTube. Since these are personal family memories, I couldn't post them on YouTube itself. I needed hosting that was private, convenient, and secure.



Step 3. Publishing



ClipBucket, an open source YouTube clone that can be installed on your own server



My first try was ClipBucket, which calls itself an open source YouTube clone that you can install on your own server.







Surprisingly, ClipBucket doesn't have any installation instructions. With the help of a third-party tutorial, I automated the installation process using Ansible, a server configuration management tool.



Part of the difficulty was that ClipBucket's installation scripts were completely broken. At the time, I worked at Google, and under the terms of my contract I had no right to contribute to an open source YouTube clone, but I filed a bug report that made the necessary fixes easy to work out. Months passed, and the maintainers still hadn't figured out what the problem was. Instead, they added more and more bugs with each release.



ClipBucket operated on a consulting model: they released their code for free and charged a fee to help with deployment. Gradually, it dawned on me that a company that makes money from paid support is probably not very interested in letting customers install the product themselves.



MediaGoblin, a more modern alternative



After months of frustration with ClipBucket, I reviewed the available options and found MediaGoblin.





MediaGoblin is a standalone media sharing platform



MediaGoblin has a lot going for it. Unlike ClipBucket, which is written in unsightly PHP, MediaGoblin is written in Python, a language I have a lot of experience with. It has a command line interface that makes it easy to automate video uploads. Most importantly, MediaGoblin ships as a Docker image, which should eliminate any installation issues.



Docker is a technology that creates a self-contained environment for an application so that it can run anywhere. I use Docker in many of my projects.


The surprising difficulty of re-Dockerizing MediaGoblin



I assumed that deploying the MediaGoblin Docker image would be trivial. Well, it didn't quite work out that way.



The prebuilt image was missing two features I needed:



  • Authentication
    • MediaGoblin creates a public media portal by default, and I needed a way to restrict unauthorized access.
  • Transcoding
    • Every time you upload a video, MediaGoblin tries to transcode it for optimal streaming. If the video is already in a streaming-ready format, transcoding only degrades the quality.
    • MediaGoblin lets you disable transcoding via configuration options, but this cannot be done in the existing Docker image.


Well, no problem. The Docker image is open source, so you can rebuild it yourself.



Unfortunately, the Docker image no longer builds from the current MediaGoblin repository. I tried syncing it to the version from the last successful build, but that didn't work either. Although I used exactly the same code, MediaGoblin's external dependencies had changed, breaking the build. I spent dozens of hours rerunning MediaGoblin's 10-15 minute build process over and over until it finally worked.



A few months later, the same thing happened. In total, over the past couple of years, MediaGoblin's dependency chain has broken my build several times, most recently just as I was writing this article. I ended up publishing my own fork of MediaGoblin with hardcoded dependencies and explicit library versions. In other words, instead of dubiously claiming that MediaGoblin works with any version of celery >= 3.0, I pinned the dependency to celery 4.2.1 because that is the version I tested MediaGoblin with. The product really needs a reproducible build mechanism, but I haven't built one yet.
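To make the difference between a floating version range and an exact pin concrete, here is a minimal Python sketch. The helper name `is_pinned` and the sample requirement strings are hypothetical and chosen only for illustration; this is not code from MediaGoblin or from the fork.

```python
import re

# Matches "name==1.2.3"-style requirements: one exact version, no wildcard.
# This pattern is a simplification for illustration, not a full
# requirement-specifier parser.
_PINNED = re.compile(r"^[A-Za-z0-9._-]+==[0-9][A-Za-z0-9.]*$")


def is_pinned(requirement: str) -> bool:
    """Return True if the requirement names exactly one version."""
    return bool(_PINNED.match(requirement.replace(" ", "")))


# Hypothetical sample lines: a floating range and an exact pin.
requirements = ["celery>=3.0", "celery==4.2.1"]
for req in requirements:
    print(req, "->", "pinned" if is_pinned(req) else "floating")
```

A range such as `celery>=3.0` resolves to whatever release is newest on the day of the build, which is how an unchanged codebase can stop building. An exact pin trades automatic upgrades for repeatable builds.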



Anyway, after many hours of struggle, I was finally able to build and configure MediaGoblin in a Docker image. From there, it was easy to skip the unnecessary transcoding and set up Nginx for authentication.



Step 4. Hosting



Now that MediaGoblin was running in Docker on my local machine, the next step was to deploy it to a cloud server so the family could watch the videos.



MediaGoblin and the video storage problem



There are many platforms that take a Docker image and host it at a public URL. The catch was that in addition to the app itself, I had to publish 33 GB of video files. I could have baked them into the Docker image, but that would have been cumbersome and ugly. Changing one line of configuration would have required redeploying 33 GB of data.



When I used ClipBucket, I solved this problem with gcsfuse, a utility that lets the operating system mount Google Cloud Storage buckets as normal file system paths. I uploaded the video files to Google Cloud and used gcsfuse to expose them to ClipBucket as local files.



The difference was that ClipBucket ran in a real VM, while MediaGoblin ran in a Docker container. Here, mounting files from cloud storage turned out to be much more difficult. I spent dozens of hours solving all the problems and wrote a whole blog post about it.





The initial integration of MediaGoblin with Google Cloud storage, which I talked about in 2018.



After several weeks of tweaking all the components, everything worked. Without making any changes to the MediaGoblin code, I had tricked it into reading and writing its media files in Google Cloud Storage.



The only problem was that MediaGoblin became obscenely slow. It took a whopping 20 seconds to load the video thumbnails on the home page. If you jumped forward while watching a video, MediaGoblin would pause for a seemingly endless 10 seconds before resuming playback.



The main problem was that videos and images took a long, roundabout path to the user. They had to travel from Google Cloud Storage through gcsfuse to MediaGoblin, then through Nginx, and only then did they reach the user's browser. The main bottleneck was gcsfuse, which is not optimized for speed. The developers warn about the utility's high latency right on the project's main page:





Performance warnings in the gcsfuse documentation



Ideally, the browser should fetch files directly from Google Cloud, bypassing all intermediate layers. How can I do this without diving into the MediaGoblin codebase and adding complex Google Cloud integration logic?



Nginx sub_filter trick



Fortunately, I found a simple solution, albeit a slightly ugly one. I added the following filter to the default.conf configuration in Nginx:



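# Rewrite relative media paths in MediaGoblin's HTML to point at the bucket.
sub_filter "/mgoblin_media/media_entries/" "https://storage.googleapis.com/MY-GCS-BUCKET/media_entries/";
# Replace every occurrence in a response, not just the first one.
sub_filter_once off;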


In my installation, Nginx worked as a proxy between MediaGoblin and the end user. The above directive tells Nginx to run a search-and-replace on every HTML response from MediaGoblin before passing it on to the end user. Nginx replaces all relative paths to MediaGoblin media files with URLs in Google Cloud Storage.



For example, MediaGoblin generates HTML like this:



<video width="720" height="480" controls autoplay>
  <source
    src="/mgoblin_media/media_entries/16/Michael-riding-a-bike.mp4"
    type="video/mp4">
</video>


Nginx changes the response:



<video width="720" height="480" controls autoplay>
  <source
    src="https://storage.googleapis.com/MY-GCS-BUCKET/media_entries/16/Michael-riding-a-bike.mp4"
    type="video/mp4">
</video>
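The effect of the filter can be approximated in a few lines of Python. This is only a sketch of the substitution, not how Nginx implements it: the function name `rewrite_media_urls` is made up for illustration, and `MY-GCS-BUCKET` is the same placeholder used in the configuration above.

```python
# The two arguments of the sub_filter directive shown above.
LOCAL_PREFIX = "/mgoblin_media/media_entries/"
BUCKET_PREFIX = "https://storage.googleapis.com/MY-GCS-BUCKET/media_entries/"


def rewrite_media_urls(html: str) -> str:
    """Replace every occurrence, as sub_filter does with sub_filter_once off."""
    return html.replace(LOCAL_PREFIX, BUCKET_PREFIX)


page = (
    '<source src="/mgoblin_media/media_entries/16/Michael-riding-a-bike.mp4" '
    'type="video/mp4">'
)
print(rewrite_media_urls(page))
```

One difference worth knowing: Python's `str.replace` is case-sensitive, while Nginx matches the search string without regard to case. For these lowercase paths the result is the same.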


Now everything works as intended:





Nginx rewrites responses from MediaGoblin so that clients can request media files directly from the Google cloud storage



The best part about my solution is that it does not require any changes to the MediaGoblin code. A two-line Nginx directive seamlessly integrates MediaGoblin and Google Cloud, even though these services know absolutely nothing about each other.



Note: This solution requires files in Google Cloud Storage to be readable by everyone. To mitigate the risk of unauthorized access, I use a long, random bucket name (for example, mediagoblin-39dpduhfz1wstbprmyk5ak29) and verify that the bucket's access control policy does not allow unauthorized users to list the bucket's contents.
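A name of that shape is easy to generate. Here is a minimal sketch, assuming a `mediagoblin` prefix and a 24-character suffix of lowercase letters and digits to match the example above. The function name `random_bucket_name` is hypothetical.

```python
import secrets
import string

# Lowercase letters and digits only, to stay within bucket naming rules.
ALPHABET = string.ascii_lowercase + string.digits


def random_bucket_name(prefix: str = "mediagoblin", length: int = 24) -> str:
    """Return the prefix plus a hard-to-guess random suffix."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}-{suffix}"


print(random_bucket_name())
```

The `secrets` module draws from the operating system's cryptographic random source, which matters here because the name is the only thing standing between a stranger and the files.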


Final product



At this point, I had a complete, working solution. MediaGoblin happily ran in its own container on Google's cloud platform, so it didn't need to be patched or updated frequently. Everything in my process was automated and reproducible, allowing easy edits or rollbacks.



My family loved how easy it was to watch the videos. With the Nginx hack above, video playback was as fast as on YouTube.



The preview screen looks like this:





Contents of the family video catalog under the "Featured" tag



Clicking a thumbnail displays the following screen:





Viewing a single clip on the media server



After many years of work, I was incredibly pleased to give my relatives the opportunity to watch our videos in the same user-friendly interface as on YouTube, as I originally wanted.



Bonus: cost reduction to less than $1 per month



People don't watch home videos often, only every few months. My family collectively generated about 20 hours of traffic in a year, but the server was running around the clock. I was paying $15 a month for a server that sat idle 99.7% of the time.
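The idle figure follows from simple arithmetic. Here is a quick check, assuming a 365-day year; the function name `idle_fraction` is made up for illustration.

```python
def idle_fraction(hours_used: float, hours_available: float) -> float:
    """Share of the available time during which the server had nothing to do."""
    return 1 - hours_used / hours_available


HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year

idle = idle_fraction(20, HOURS_PER_YEAR)
print(f"idle: {idle:.1%}")
```

Twenty hours out of 8,760 is a little over 0.2% of the year, so the server was busy for only a sliver of the time I was paying for.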



At the end of 2018, Google released the Cloud Run product. Its killer feature was launching Docker containers quickly enough that an application could start up in response to an HTTP request. That is, the server could stay in standby mode and start only when someone wanted to access it. For rarely used apps like mine, costs would drop from $15 a month to a few cents a year.



For reasons I can no longer remember, Cloud Run didn't work with my MediaGoblin image. But the arrival of Cloud Run reminded me that Heroku offers a similar service for free, and its tools are much more convenient than Google's.



With a free application server, the only expense is data storage. Google's standard regional storage costs 2.3 cents per GB per month. The video archive is 33 GB, so I only pay 77 cents a month.





The cost of this solution is only $0.77 per month
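The storage bill is a single multiplication. Here is a rough sketch using the price and archive size quoted above; the function name `monthly_storage_cost` is hypothetical, and the calculation ignores network egress and per-request charges.

```python
def monthly_storage_cost(size_gb: float, price_per_gb: float) -> float:
    """Monthly storage cost in dollars for an archive of the given size."""
    return size_gb * price_per_gb


# 33 GB at $0.023 per GB per month, the figures quoted in the text.
cost = monthly_storage_cost(33, 0.023)
print(f"${cost:.3f} per month")
```

With exactly 33 GB this lands within a cent of the 77 cents on my bill. The small difference presumably comes from the archive being slightly larger than a round 33 GB, though that is a guess.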



Tips for those looking to try



Obviously, this process took me a long time. But I hope this article saves you 80-90% of the effort of digitizing and publishing your own home videos. In a separate section, you can find a detailed step-by-step guide to the entire process, but here are some general tips:



  • Save as much metadata as possible during the digitizing and editing phase.


