There is a proven architecture I've seen many times to support your web services and applications:
- PostgreSQL for data storage
- Redis for coordinating background job queues (and some limited atomic operations)
Redis is fantastic, but what if I told you that its most common use cases for this stack can actually be achieved using only PostgreSQL?
Scenario 1: job queue
Perhaps the most common use of Redis I've seen is coordinating the submission of jobs from your web service to the background worker pool. The idea is that you want to record the desire to do some kind of background job (perhaps with some input) and ensure that only one of your many background workers will execute it. Redis helps with this because it provides a rich set of atomic operations for its data structures.
But since the release of version 9.5, PostgreSQL has a SKIP
LOCKED function for the SELECT… FOR… statement ( documentation here ). When this option is specified, PostgreSQL will simply ignore any rows that need to wait until the lock is released.
Let's look at this example from the perspective of a background worker:
BEGIN;
WITH job AS (
SELECT
id
FROM
jobs
WHERE
status = 'pending'
LIMIT 1
FOR UPDATE SKIP LOCKED
)
UPDATE
jobs
SET
status = 'running'
WHERE
jobs.id = job.id
RETURNING
jobs.*;
COMMIT;
When FOR UPDATE SKIP LOCKED is specified, row-level locking is implicitly set on any rows returned from SELECT. Also, since you specified SKIP LOCKED, this statement will not be blocked for another transaction. If there is another job ready to be processed, it will be returned. There is no concern that multiple worker processes executing this command will receive the same row due to row-level blocking.
The biggest caveat for this method is that if you have a large number of workers trying to execute this queue and a large number of jobs supply them, they might spend some time jumping between jobs and trying to acquire a lock. In practice, most applications I've worked on have less than a dozen background workers, and the costs are unlikely to be significant.
Scenario 2: application blocking
Let's say you have a synchronization routine with a third-party service, and you only need one instance of it, working for any given user across all server processes. Another common Redis application I've seen is distributed locking.
PostgreSQL can also achieve this using recommendation locks (advisory locks). advisory locks allow you to use the same locking mechanism that PostgreSQL uses internally, for your own application-defined purposes.
Scenario 3: Pub / Sub
I left the coolest example for last: sending events to your active clients. For example, suppose you need to notify a user that they have a new message available for reading. Or perhaps you want to transfer data to the client when it becomes available. Typically websockets are the transport layer for these events, while Redis serves as the Pub / Sub engine.
However, starting in version 9, PostgreSQL also provides this functionality with the LISTEN and NOTIFY operators ... Any PostgreSQL client can subscribe (LISTEN) to a specific message channel, which is just an arbitrary string. When any other client sends a NOTIFY message over this channel, all other subscribed clients will be notified. Optionally, you can attach a small message.
If you are using Rails and ActionCable, using PostgreSQL is even supported out of the box.
Taking full advantage of PostgreSQL
Redis fundamentally occupies a different niche than PostgreSQL, and excels in what PostgreSQL does not strive for. Examples include caching data with TTL, and storing and processing ephemeral data.
However, PostgreSQL has a lot more power than you might expect when you approach it in terms of just another SQL database or some mysterious entity that lives behind your ORM.
There is a good chance that the things you use Redis for might turn out to be good things for PostgreSQL as well. It might be worth ditching Redis and saving on operational costs and development complexity by relying on multiple data services.