About the original problem
The PREMIER video platform, as befits a modern service, has built a machine-learning-based recommendation service for its clients. The platform serves a large audience - about a million users a day, PREMIER is very popular - and requests come in from the web interface, from mobile applications, and from Smart TV.
The raw data our machine-learning service works with is stored in the columnar ClickHouse DBMS. On a schedule, it is processed in the background to build the models that produce the final recommendations. The results of these calculations are saved in the relational PostgreSQL DBMS.
Fast interaction between the application server and the client is always a pressing problem. To achieve the required speed - the response time had to stay under 50 ms - we had to optimize the structure of the relational database, introduce caching, and scale horizontally. Some of these techniques are covered below.
About caching
Caching is a common optimization technique: we create an additional intermediate storage that is faster than the main one and place the most frequently requested data there. Access to cached data is dramatically faster, which noticeably speeds up the application. When developing a high-load application, caching is the most common way to improve performance without expanding hardware resources. It also saves the resources that would otherwise be spent re-generating the same output for the same input.
Cache capacity is limited - the faster the storage, the more expensive it is - so it needs to be used efficiently. In theory you can do without caching at all if the main storage is fast enough, but that is rarely economical: it usually means a significant hardware upgrade, typically more RAM and/or replacing HDDs with SSDs, i.e. a substantial increase in infrastructure requirements that affects the economics of the whole application. Without time-tested caching, it is hard to build a mass-market product.
However, adding a caching layer does not automatically solve every problem. You also need to think through the rules for populating it, which depend on the specifics of the task and may change with the situation and as the service evolves. Keep in mind that caching is not a panacea, only a remedy that relieves the symptoms of performance problems in specific parts of the application. If the application has deep architectural problems or runs in a mediocre execution environment, caching is more likely to add problems of its own.
There are several options for where to store cached resources: locally on the client (in the browser), in a third-party CDN service, or on the application side. Here we will talk about in-application caching. Process memory is probably the most common place to cache data you will come across. However, this approach has a drawback: the memory belongs to the process performing a specific task. This matters even more if you plan to scale the application out, since memory is not shared between processes - the cache will not be visible to other processes, such as those handling asynchronous work. Caching still works, but we do not get the full benefit from it.
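As a rough illustration (the function names and the simulated query delay are made up for the example, not taken from the PREMIER codebase), an in-process cache in Python might look like this; note that every worker process keeps its own independent copy:

```python
import time
from functools import lru_cache

def expensive_db_lookup(user_id: int) -> tuple:
    """Stand-in for a slow PostgreSQL query."""
    time.sleep(0.05)
    return tuple(f"video-{user_id}-{i}" for i in range(10))

@lru_cache(maxsize=10_000)
def get_recommendations(user_id: int) -> tuple:
    # Cached only inside this process: a second web worker or an async
    # task runner starts with an empty cache and never sees the entries
    # accumulated here.
    return expensive_db_lookup(user_id)
```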
Recommender caching
Coming back to the project, we realized we needed a centralized caching solution. For a shared cache you can use, for example, Memcached: if the application connects to the same instance, it can be used from many processes. On the one hand, Memcached is simple and convenient; on the other, it is rather limited when it comes to fine-grained invalidation, data typing, and more complex queries against the cached data. Today Redis has effectively become the standard for caching tasks, and it is free of Memcached's shortcomings.
Redis is a fast key-value store. It makes working with data more efficient because the data can be given structure. It provides granular control over invalidation and eviction, letting you choose from several eviction policies, and supports both lazy and eager eviction as well as time-based expiration. Using Redis data structures can yield tangible optimizations, depending on the business entities. For example, instead of storing objects as serialized strings, developers can use the hash data structure to store and manipulate individual fields and values by key. A hash removes the need to fetch the entire string, deserialize it, and replace it in the cache with a new value on every update, which means less resource consumption and better performance.

Other data structures offered by Redis (lists, sets, sorted sets, HyperLogLogs, bitmaps, and geospatial indexes) can be used for even more complex scenarios. Sorted sets reduce the complexity and volume of processing and transferring time-series data. The HyperLogLog structure can count unique items in a set using only a small, constant amount of memory: 12 KB per HyperLogLog (plus a few bytes for the key itself). A significant part of the roughly 200 commands available in Redis is dedicated to data-processing operations, and Lua scripts allow logic to be embedded in the database itself. Built-in commands and scripting make it possible to process data directly in Redis without sending it over the network to the application, reducing the overhead of additional caching logic. And because Redis can persist data to disk, the cache contents are available immediately after a restart, which significantly reduces cache warm-up time and relieves the burden of recalculating the cache from the main data store. We will talk about the details of Redis configuration and the prospects for clustering in the following articles.
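A minimal sketch of the difference between a serialized string and a hash (the key names and fields are illustrative, not from the project):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

user_id = 42
rec = {"top_video": "12345", "score": "0.87", "updated_at": "2021-03-01"}

# As a serialized string: any update means GET -> deserialize -> SET of the whole object.
r.set(f"rec:{user_id}", json.dumps(rec))

# As a hash: individual fields can be read and updated in place.
r.hset(f"rec:hash:{user_id}", mapping=rec)
r.hset(f"rec:hash:{user_id}", "score", "0.91")     # update a single field
print(r.hget(f"rec:hash:{user_id}", "top_video"))  # read a single field
```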
After choosing a caching tool, the main problem became keeping the data in the cache synchronized with the data in the application. There are different ways to do this, depending on the business logic of your application. In our case, the difficulty lay in designing an invalidation algorithm. Data placed in the cache is stored there for a limited time, as long as it is needed in the current situation, or at least likely to be. As the situation changes, it must make room for other data that is now more in demand. The main task here is choosing the criteria by which data is evicted from the cache. Most often this is how long the data stays relevant, but other parameters are worth remembering: size, ranking (given roughly equal lifetimes), category (primary or auxiliary data), and so on.
The basic and most common way to keep data up to date is time-based expiration. We use this method too, taking into account the periodic, centralized refresh of the recommendation data. However, it is not as simple as it looks at first glance: it is extremely important to track popularity so that only genuinely in-demand data ends up in the cache. This is made possible by collecting query statistics and by "pre-warming" the cache, i.e. preloading data into it when the application starts. Cache size management is another important aspect: our application generates millions of recommendations, so it is unrealistic to keep all of them in the cache. Size is managed by removing data from the cache to make room for new entries. There are several standard approaches: TTL, FIFO, LIFO, least recently accessed. For now we use TTL, since the Redis instance stays within the allocated memory and disk resources.
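A simplified sketch of TTL-based caching with pre-warming (the key format, the TTL value, and the loader function are assumptions for illustration, not the project's actual code):

```python
import json
import redis

r = redis.Redis(decode_responses=True)
CACHE_TTL = 15 * 60  # seconds; chosen arbitrarily for the example

def load_from_postgres(user_id: int) -> list:
    """Stand-in for reading precomputed recommendations from PostgreSQL."""
    return [f"video-{user_id}-{i}" for i in range(10)]

def warm_up(popular_user_ids: list) -> None:
    # Pre-warming: preload recommendations for the most requested users
    # when the application starts, based on collected query statistics.
    for user_id in popular_user_ids:
        r.setex(f"rec:{user_id}", CACHE_TTL, json.dumps(load_from_postgres(user_id)))

def get_recommendations(user_id: int) -> list:
    cached = r.get(f"rec:{user_id}")
    if cached is not None:
        return json.loads(cached)
    data = load_from_postgres(user_id)
    r.setex(f"rec:{user_id}", CACHE_TTL, json.dumps(data))  # entry expires after CACHE_TTL
    return data
```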
Recall that caches come in different flavors. Most often there are two categories: write-through and write-back (deferred). In the first, a write goes synchronously to both the cache and the main storage. In the second, a write initially goes only to the cache, and the write to the main storage is deferred until the changed data is about to be displaced by another cache block. A write-back cache is harder to implement, since it requires tracking the data in the cache so it can be written to the main store when it is evicted. We use write-through in combination with a cache warm-up procedure. Note that warming up is an important and difficult task in itself; our solution will be discussed in the following articles.
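For comparison, a write-through update in the same style (the repository object is hypothetical) writes to both places in one step, so the cache never lags behind the main store for that key:

```python
import json

def save_recommendations(user_id, data, repository, cache, ttl=900):
    # Write-through: the main store and the cache are updated together,
    # so a subsequent read never sees a stale entry for this key.
    repository.save(user_id, data)                        # hypothetical PostgreSQL-backed repository
    cache.setex(f"rec:{user_id}", ttl, json.dumps(data))  # Redis client passed in by the caller
```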
Scale out in a recommendation app
To give PREMIER access to the recommendation application we use HTTP, which is usually the default choice for application-to-application interaction, especially when Kubernetes and an Ingress controller form the infrastructure environment. Kubernetes makes scaling easier: it can automatically balance requests across the pods in the cluster to spread the load evenly, which simplifies scaling for developers. The Ingress controller is responsible for this; it defines the rules for connecting to applications in Kubernetes from outside. By default, applications in Kubernetes are not reachable from the external network, so to expose them you declare an Ingress resource, which supports automatic balancing. We use the Nginx Ingress Controller, which supports SSL/TLS, URI rewrite rules, and the VirtualServer and VirtualServerRoute resources for routing requests to different applications based on the URI and the Host header.
The basic Ingress configuration exposes only the basic functions of Nginx, namely routing by host and path; additional features such as URI rewrite rules, extra response headers, and connection timeouts are not available. Annotations applied to the Ingress resource give access to the features of Nginx itself (normally set in Nginx's own configuration) and change Nginx's behavior for each individual Ingress resource.
We plan to use the Nginx Ingress Controller not only in the project under consideration but also in a number of other applications, which we will cover in the following articles.
Risks and consequences of using caching
Any team working to optimize a data-intensive application will have a lot of questions about how to optimize caching. At the same time, the problem with caching cannot be solved "once and for all"; over time, various problems arise. For example, as your user base grows, you may encounter conflicting statistics on the popularity of data by category, and this problem will need to be addressed.
While caching is a powerful tool for speeding up an application, it is not the only solution. Depending on your situation, you may need to optimize the application logic or change the stack and infrastructure environment. You should also be careful when validating non-functional requirements: it may turn out, after discussion with the product owner, that the requirements are overstated. Remember that each solution has its own trade-offs.
The risks of serving stale data, increasing the overall complexity of the solution, and introducing latent bugs must be weighed before applying any caching method to a project. Otherwise caching does not so much solve the problem as hide performance and scalability issues: database queries are slow? - cache the results in fast storage! API calls are slow? - cache the results on the client! Meanwhile, the complexity of the code that manages the cache grows significantly as the complexity of the business logic grows.
In the first versions of an application, caching really does have a tangible effect immediately after it is introduced: the business logic is still simple - you save a value and get it back - and invalidation is easy because the dependencies between business entities are trivial or nonexistent. Over time, however, you will need to cache more and more business entities to keep improving performance.
Caching is not a silver bullet for performance issues. In many cases, optimizing your code and your main storage will serve you better in the long run. Moreover, caching should be introduced as a reaction to a real problem, not as premature optimization.
In conclusion, optimizing application performance and making it scalable is an ongoing process that aims to achieve predictable behavior within specified non-functional requirements. Caching is needed to reduce the hardware cost and development time spent on improving performance and scalability.