It all started with a simple task: show a user new products while taking his individual preferences into account. Fetching the new products themselves was no problem, but matching them against the preferences (statistics analysis) already created a tangible load (let's put it at about 4 seconds). A peculiarity of the task was that entire organizations can act as a single user, so it is not uncommon for 200-300 requests belonging to one user to hit the server at once (within 2-3 seconds). In other words, the same block is generated for many users at once.
The obvious solution is to cache the result in RAM (we won't torture the DBMS by forcing it to process a heavy stream of identical calls). The classic scheme:
- A request comes in
- Check the cache. If the data is there and is not stale, just return it
- No data => generate the output
- Send it to the user
- Additionally, write it to the cache with a TTL
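The steps above can be sketched as follows (Python for brevity; an in-memory dict stands in for the real cache store, and `generate` is a placeholder for whatever expensive generation the application does):

```python
import time

cache = {}  # key -> (expires_at, value); stands in for Redis/Memcached/etc.

def get_items(key, generate, ttl=3600):
    """Classic cache-aside: serve from the cache, otherwise generate and store."""
    entry = cache.get(key)
    if entry is not None and entry[0] > time.time():
        return entry[1]                       # fresh hit: just return it
    value = generate()                        # miss or expired: pay the full cost
    cache[key] = (time.time() + ttl, value)   # store with a single TTL
    return value
```

Note that in this scheme every caller who arrives while the entry is missing pays the full `generate()` cost, which is exactly the drawback discussed next.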
The disadvantage of this solution: if the data is not in the cache, every request that arrives during the first generation will generate it again, spending server resources on it (load peaks). And of course, all users will be kept waiting on this "first call".
Note also that with per-user cache values the number of records can grow so large that the server's RAM is simply not enough. Then it seems logical to use the server's local HDD as the cache storage, but we immediately lose speed.
So what do we do?
The first thing that comes to mind: it would be great to store records in two places, in RAM (the frequently requested ones) and on the HDD (everything, or only the rarely requested ones). This is the concept of "hot and cold data" in its purest form. There are many implementations of this approach, so we will not dwell on it; let's just call this component 2L. In my case it is successfully implemented on top of the Scylla DBMS.
But how do we get rid of the latency spikes when a cache record expires? This is where the 2R concept comes in, and its meaning is simple: a cache record needs not one TTL value but two. TTL1 is a timestamp meaning "the data is stale and should be regenerated, but it can still be used"; TTL2 means "the data is so stale it can no longer be used."
Thus, we get a slightly different caching scheme:
- A request comes in
- Look for the data in the cache. If it is there and not stale (t < TTL1), return it to the user as usual and do nothing else.
- The data is there and stale, but still usable (TTL1 < t < TTL2): return it to the user AND schedule an update of the cache record
- There is no data at all (deleted after TTL2 expired): generate it "as usual" and write it to the cache.
- After serving the content to the user, or in a parallel thread, run the procedures that update the cache records.
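The three states of a record can be made explicit with a small classifier (a sketch in Python; the record layout with `tm`, `ttl1` and `ttl2` fields is an assumption for illustration):

```python
import time

def classify(entry, now=None):
    """Classify a 2R cache record as 'fresh', 'stale' (usable, needs rebuild)
    or 'miss'. entry is None, or a dict with 'tm' (write timestamp) and
    'ttl1'/'ttl2' lifetimes in seconds."""
    if now is None:
        now = time.time()
    if entry is None:
        return 'miss'            # never cached, or deleted after TTL2
    age = now - entry['tm']
    if age < entry['ttl1']:
        return 'fresh'           # serve as-is, do nothing else
    if age < entry['ttl2']:
        return 'stale'           # serve it AND enqueue a rebuild task
    return 'miss'                # too old to use: regenerate synchronously
```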
As a result, we have:
- if cache records are requested often enough, the user never ends up "waiting for the cache to update": he always receives a ready-made result.
- if the update queue is organized properly, then when several simultaneous requests hit a record with TTL1 < t < TTL2, only one update task ends up in the queue instead of several identical ones.
As an example: for a new-products feed you can set TTL1 = 1 hour (after all, new content does not appear all that often) and TTL2 = 1 week.
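One simple way to guarantee a single pending update per key is to keep a "already scheduled" flag next to the queue. A minimal in-process sketch in Python (in production the author uses a real queue and an external flag store; the names here are illustrative):

```python
from collections import deque

rebuild_queue = deque()   # stands in for a real message queue
queued_keys = set()       # "this key is already scheduled" flags

def schedule_rebuild(key, method):
    """Enqueue a rebuild task unless one is already pending for this key."""
    if key in queued_keys:
        return False              # a worker will refresh it soon anyway
    queued_keys.add(key)
    rebuild_queue.append((key, method))
    return True
```

The flag must be set atomically with respect to concurrent requests (a plain set only works within one process), which is why an external store is used in practice.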
In the simplest case, the PHP code implementing 2R might look like this:
```php
$tmp = cache_get($key);
if (!$tmp) {
    // No usable record (missing, or deleted after TTL2): generate synchronously
    $items = generate_items();
    cache_set($key, ['tm' => time(), 'items' => $items], 60*60, 60*60*24*7);
} else {
    $items = $tmp['items'];
    if (time() - $tmp['tm'] > 60*60) {
        // Past TTL1: the data is still usable but stale, schedule a rebuild
        $need_rebuild[] = ['to' => $key, 'method' => 'generate_items'];
    }
}
// ...
// Serve the (possibly stale) content to the user
echo json_encode($items);
// ...
// After the response is sent, refresh the stale records
if (isset($need_rebuild) && count($need_rebuild) > 0) {
    foreach ($need_rebuild as $v) {
        $tmp = ['tm' => time(), 'items' => $v['method']()];
        cache_set($v['to'], $tmp, 60*60, 60*60*24*7);
    }
}
```
In practice, of course, the implementation is likely to be more complex. For example, the cache record generator can be a separate script running as a service; the queue can go through RabbitMQ, and the "this key is already queued for regeneration" flag through Redis or Scylla.
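The worker side of that architecture can be sketched like this (Python; a deque and a set stand in for RabbitMQ and the Redis/Scylla flag, and `generators` maps method names to hypothetical generation functions):

```python
import time
from collections import deque

cache = {}                # key -> {'tm': ..., 'items': ...}; RAM/Scylla in production
rebuild_queue = deque()   # RabbitMQ in production
queued_keys = set()       # Redis/Scylla "already queued" flags in production

def worker_step(generators):
    """Process one rebuild task: regenerate the record, store it with a fresh
    timestamp, then clear the 'already queued' flag for the key."""
    if not rebuild_queue:
        return False
    key, method = rebuild_queue.popleft()
    cache[key] = {'tm': time.time(), 'items': generators[method]()}
    queued_keys.discard(key)   # the key may be scheduled again from now on
    return True
```

Clearing the flag only after the record is written ensures that, between scheduling and completion, duplicate requests for the same stale key do not pile up in the queue.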
So, if we combine the "two-band" (2R) approach with the concept of "hot/cold" data (2L), we get 2R2L.
Thanks!