Every time some production environment can be simplified, it is good news in my opinion. The ideal situation with Rails would be if there is a simple way to switch back to Redis, so that you can start simple, and as soon as you hit some fundamental issue with using SolidQueue (mostly scalability, I guess, in environments where the queue is truly stressed -- and you don't want to have a Postgres scalability problem because of your queue), you have a simple upgrade path. But I bet a lot of Rails apps don't have high volumes, and having to maintain two systems can be just more complexity.
the problem i see here is that we end up treating the background job/task processor as part of the production system (e.g. the server that responds to requests, in the case of a web application) instead of a separate standalone thing. rails doesn’t make this distinction clear enough. it’s okay to back your tasks processor with a pg database (e.g. river[0]) but, as you indirectly pointed out, it shouldn’t be the same as the production database. this is why redis was preferred anyways: it was a lightweight database for the task processor to store state, etc. there’s still great arguments in favor of this setup. from what i’ve seen so far, solidqueue doesn’t make this separation.
For people who think it doesn't scale: a similar implementation in Elixir is Oban. Their benchmark shows a million jobs per minute on a single node (and I am sure it could be increased further with more optimizations). I bet 99.99999% of apps have fewer than a million background jobs per minute.
This benchmark is probably as far removed from how applications use task queues as it could possibly be. The headline is "1 million jobs per minute", which is true. However...
- this is achieved by queuing batches of 5000 jobs, so on the queue side this is actually not 1 million insert transactions per minute, but rather 200 per minute (roughly 3 per second). I've never seen any significant batching of background job creation.
- the dispatch is also batched to a few hundred TPS (5ms ... 2ms).
- acknowledgements are also batched.
So instead of the ~50-100k TPS that you would expect to get to 17k jobs/sec, this is probably performing just a few hundred transactions per second on the SQL side. Correspondingly, if you don't batch everything (job submission, acking; dispatch is reasonable), throughput likely drops to that level, which is much more in line with expectations.
Semantically this benchmark is much closer to queuing and running 200 invocations of a "for i in range(5000)" loop in under a minute, which most would expect virtually any DB to handle (even SQLite).
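To make the batching arithmetic above concrete, here is a rough back-of-the-envelope sketch. The only inputs are the two figures quoted from the benchmark (1 million jobs per minute, batches of 5000); the variable names are mine, and the sketch says nothing about dispatch or ack batching.

```python
# Figures quoted in the comment above; everything else is derived from them.
JOBS_PER_MINUTE = 1_000_000
BATCH_SIZE = 5_000

# Job throughput the headline implies.
jobs_per_second = JOBS_PER_MINUTE / 60

# Insert transactions actually needed when every insert carries a full batch.
insert_batches_per_minute = JOBS_PER_MINUTE // BATCH_SIZE
insert_batches_per_second = insert_batches_per_minute / 60

print(f"jobs/sec:          {jobs_per_second:,.0f}")
print(f"insert batches/min: {insert_batches_per_minute}")
print(f"insert batches/sec: {insert_batches_per_second:.1f}")
```

So the enqueue side only has to sustain a handful of multi-row inserts per second, which is a very different workload from one insert per job.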
The one use case where a DB-backed queue will fail for sure is when the payload is large. For example, if you queue a large JSON payload for a worker to pick up and process, the DB write overhead alone makes the background worker useless.
I've benchmarked Redis (Sidekiq), Postgres (using GoodJob) and SQLite (SolidQueue); Redis beats everything else for the above use case.
SolidQueue backed by SQLite may be good when you are just passing around primary keys. I still wonder if you can have a lot of workers polling from the same database and update the queue with the job status. I've done something similar in the past using SQLite for some personal work and it is easy to hit the wall even with 10 or so workers.
Interesting, as a self-contained minimalistic setup.
Shouldn't one be using a storage system such as S3/garage with ephemeral settings and/or clean-up triggers after the job ends? I get the appeal of using one system for everything, but won't you need a storage system anyway for other parts of your system?
Have you written up your benchmarks somewhere, and where the cutoffs are (payload size / throughput / latency)?
I'm guessing that in that case you're adding indirection for what you're actually processing? So I guess the counter-case would be when you don't want/need that indirection.
If I understand what you're saying, instead of doing:
- Create job with payload (maybe big) > Put in queue > Let worker take from queue > Done
You're suggesting:
- Create job with ID of payload (stored elsewhere) > Put in queue > Let worker take from queue, then resolve ID to the data needed for processing > Done
Is that more or less what you mean? I can definitely see use cases for both; it heavily depends on the situation, but more indirection isn't always better, nor are big payloads always OK.
- Persist payload in db > Queue with id > Process via worker.
Queueing it directly via the queue can be tricky. Any queue system will usually have limits on the payload, for good reasons. Plus, if you have already committed to the DB, you can guarantee the data is not lost and can be processed again however you want later. But if your queue is having issues, or the enqueue fails, you might lose it forever.
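A minimal sketch of the "persist payload in db > queue with id > process via worker" flow described above. It uses an in-memory SQLite database and Python's `queue.Queue` purely as stand-ins; the table name `payloads`, the `processed` column, and all function names are hypothetical and not taken from any particular queue library.

```python
import json
import queue
import sqlite3


def persist_and_enqueue(conn, job_queue, payload):
    """Commit the payload to the DB first, then enqueue only its row id."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO payloads (body, processed) VALUES (?, 0)",
            (json.dumps(payload),),
        )
        payload_id = cur.lastrowid
    job_queue.put(payload_id)  # the queue message stays tiny
    return payload_id


def work_one(conn, job_queue):
    """Take one id from the queue, resolve it to the payload, mark it done."""
    payload_id = job_queue.get()
    row = conn.execute(
        "SELECT body FROM payloads WHERE id = ?", (payload_id,)
    ).fetchone()
    payload = json.loads(row[0])
    with conn:
        conn.execute(
            "UPDATE payloads SET processed = 1 WHERE id = ?", (payload_id,)
        )
    return payload


def unprocessed_ids(conn):
    """Ids that can be re-enqueued if the queue lost them."""
    rows = conn.execute("SELECT id FROM payloads WHERE processed = 0")
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payloads ("
    "id INTEGER PRIMARY KEY, body TEXT NOT NULL, processed INTEGER NOT NULL)"
)
jobs = queue.Queue()

big_payload = {"report": list(range(1000))}
persist_and_enqueue(conn, jobs, big_payload)
lost_id = persist_and_enqueue(conn, jobs, {"report": []})

result = work_one(conn, jobs)  # processes the first job only
pending = unprocessed_ids(conn)  # the second job is still recoverable
```

The point of the indirection is the last line: even if the queue message for the second job vanished, the row is still in the database and can be enqueued again.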
> Job latency under 1ms is critical to your business. This is a real and pressing concern for real-time bidding, high frequency trading (HFT), and other applications in the same ilk.
From TFA. Are there really people using Rails for HFT?
Interesting migration story! I've been using Redis for background jobs for years and it's been solid, but the operational overhead is real.
Curious about your experience with SolidQueue's reliability - have you run into any edge cases or issues with job retries/failures? Redis has been battle-tested for so long that switching always feels risky.
Would love to hear more about your production experience after a few months!
Email is in my profile. I’m currently building something in this space and I’m looking for early adopters. Reach out, I’d love to show you what we have!
For Node.js, my startup used to use [Graphile Worker](https://github.com/graphile/worker) which utilised the same "SKIP LOCKED" mechanism under the hood.
We ran into some serious issues in high throughput scenarios (~2k jobs/min currently, and ~5k jobs/min during peak hours) and switched to Redis+BullMQ and have never looked back since.
Our bottleneck was Postgres performance.
I wonder if SolidQueue runs into similar issues during high load, high throughput scenarios...
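For anyone unfamiliar with the "SKIP LOCKED" mechanism mentioned above, here is a rough sketch of the general pattern, assuming Postgres and any DB-API style connection. The `jobs` table, its `state` and `payload` columns, and `claim_job` are hypothetical; this is not Graphile Worker's or SolidQueue's actual query.

```python
# Claim one queued job. Rows already locked by another worker's in-flight
# claim are skipped instead of waited on, so workers don't block each other.
CLAIM_SQL = """
UPDATE jobs
SET state = 'running'
WHERE id = (
    SELECT id
    FROM jobs
    WHERE state = 'queued'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_job(conn):
    """Return (id, payload) for one claimed job, or None if nothing is free.

    `conn` is assumed to be a DB-API connection to Postgres.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        conn.commit()  # releases the row lock; the job is now marked running
        return row
    finally:
        cur.close()
```

Every claim is still a write transaction against Postgres, which is where the throughput ceiling in the comment above would come from.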
Comparing Redis to SQL is kinda off topic. Sure you can replace the one with the other but then we are talking about completely different concepts aren't we?
When all we are talking about is "good enough" the bar is set at a whole different level.
We're talking about business challenges/features which can be solved by using either of the solutions and analyzing pros/cons. It's not like Redis is bad, but sometimes it's an over-engineered solution and too costly.
I wish you'd have expanded on that. I almost always learn about some interesting lower-level tech through people trying to avoid a full-featured heavy-for-their-use-case tool or system.
Nice article, I'm just productionising a Rails 8 app and was wondering whether it was worth switching from SolidQueue (which has given me no stress in dev) to Redis ... maybe not.
Wearing my Ruby T-Shirt (ok, Rubyconf.TH, but you get the gist) while reading this makes me fully approving and appreciating your post! It totally resonates with my current project setups and my trying to get them as simple as possible.
Especially when building new and unproven applications I'm always looking for things that trade the time I need to set things up properly for the time I need to BUILD THE ACTUAL PRODUCT. Therefore I really like the recent changes to the Ruby on Rails ecosystem very much.
What we need is a larger user base setting everything up and discovering edge-cases and (!) writing about it (AND notifying the people around Rails). The more experience and knowledge there is, the better the tooling becomes. The happy path needs to become as broad as a road!
Like Kamal, at first only used by 37signals and now used by them and me. :D At least, of course.
[0]: https://riverqueue.com/
https://oban.pro/articles/one-million-jobs-a-minute-with-oba...
That being said, I regret that we switched from good_job (https://github.com/bensheldon/good_job). The thing is, Basecamp is a MySQL shop and their policy is not to accept RDBMS-engine-specific queries. You can see in their GitHub issues that they try to stick to "universal" SQL and are mostly concerned with how it performs in MySQL (https://github.com/rails/solid_queue/issues/567#issuecomment... , https://github.com/rails/solid_queue/issues/508#issuecomment...). They also still have no support for batch jobs: https://github.com/rails/solid_queue/pull/142 .
Reminds me of antirez's blog post arguing that when Redis is configured for durability it becomes similar to, or slower than, PostgreSQL: http://oldblog.antirez.com/post/redis-persistence-demystifie...
https://nanovms.com/dev/tutorials/running-postgres-as-a-unik...
The MySQL + Redis + AWS' elasti-cron (or whatever) was a ghetto compared to Postgres.
Turns out it is a matter of feature set.
Kudos!
Best, Steviee