> The reason DBs like Mongo or Dynamo exist is because Postgres has a scaling problem.
I've used Postgres at a few places and the #1 problem was always high availability, not scaling. One Postgres cluster could easily handle 100000 transactions per minute, but when a primary node went down it was a page and manually failing over to the spare then manually replacing the spare. The manual tooling was very finicky but at least it worked, no automated solution came even close. Lack of a good HA story is why I avoid self-managed Postgres as much as possible.
I've extensively used Dynamo (internally at Amazon and externally) and even founded a DB startup with it at it's core. Boiling down scalability of Postgres vs Dynamo as it's written in blog is a bit terse. Dynamo scales writes horizontally with the keyspace, forever. Postgres simply can't, and no number of layers between the machines and the developer changes that. Sharding, pooling, Citus are all layered on top of an engine where a given row's writes still land on one
primary.
Curious how the DB startup with Dynamo at its core went. We use it heavily. The primary tricky thing for us at the moment is aligning pricing with workload value.
We obsessed over optimizations and pushing the apis to the limits of how we could pack it.
So much so, we re-wrote the DynamoSDK to squeeze out more optimizations so we could be the same cost even though we were a layer in front of dynamo. We used key encoding and other various technique as well as managed capacity (on demand vs reserved) to transparently optimize workloads for price. In our experience we saw dramatic gains vs just vanilla SDK usage.
If you're curious, here was the marketing website, but we're now part of Databricks: https://stately.cloud/
Interesting! We interact with the low-level APIs too vs the SDK, also: an IO scheduler for request batching and conn management, request hedging, full MVCC transactions, etc. We store raw bytes in DDB and manage schema/etc elsewhere. Curious if there is other low-hanging fruit, or not so low, you found that we haven't discovered yet.
Dynamo is a fundamentally different DB to Postgres. If your problem fits into the dynamo approach (I'd argue that more problems do), then you should be using it. No all problems fit, though.
Agreed, my critique was about how the article frames scalability. I've yet to see an OLTP problem that can't live in something like Dynamo. KV can model anything if you put in the work, the question is how much modeling discipline you trade for the scale, and in my experience the up front work is always worth it. Most of the time operational issues are swept under the rug and not consider tech debt.
Take for example AuroraDB: the sheer engineering it took to make SQL do scalable OLTP at all tells you how much that flexibility actually costs to keep.
Not by itself if it's naive, but if it's able to assess target health and avoid degraded instances then it becomes a component in HA, the other being integrating an orchestrator for gracious recovery.
> PgDog does not detect primary failure and will not call pg_promote(). It is expected that the databases are managed externally by another tool, like Patroni or AWS RDS, which handle replica promotion.
HA has to be all the way through, in which case you might not need a load balancer because each client already connects to a separate server. If you do, then you can have one load balancer per client machine.
CNPG is quite nice and robust but I'd still be a bit reluctant to stack PG on k8s for really big clusters just because k8s ecosystem moves quite quickly and there's lots of patching/maintenance/churn which means more PG failovers so depends on how well your workload handles that (they're normally only a few seconds)
"Why Us" => "I ran Postgres at Instacart, where we scaled the company 5x in April of 2020. The biggest problem we had was making Postgres serve 100,000s of grocery delivery orders per minute"
Instacard have released a public dataset[1] on their orders, so it should be even easier to verify this claim. From what I could find in some analysis[2] of this dataset around 100k orders per day and not per minute seems accurate.
I assume they are referring to how many database requests they have due to customers orders or a similar metric and just worded it poorly.
I worked at a company that had billions of views per year on a single big Postgres instance. Extremely read heavy with many queries needed for a page load. You can cache a lot of things.
It was one of the top real estate portals in the world. A lot of geolocation searches. New search every time someone moves the map. A ton of data sent to the client. Analytics in every page view.
No clue how a shopping cart or checkout flow would drastically increase database load. It should just be basic CRUD. Building a shopping cart is something every student makes. Pages in a web store can be cached relatively easily since items won't change often.
A primary DB with a few replicas and caching can go a really long way.
The composition of the average transaction will be different in a shopping cart (lots of writes and updates) compared to your use case which sounds like it skewed read heavy. With Postgres it’s generally easier to scale reads because it doesn’t really matter which replica the query hits, as long as it contains the data it needs. Whereas write-heavy workloads route through a single-writer bottleneck.
There’s challenges scaling read-heavy workloads, for sure — but they’re generally more straight forward than scaling write-heavy workloads. You can get away with more dumb horizontal scaling than with writes.
That doesn't necessarily mean _new_ orders per minute. Their app or website could poll for updates every 15 seconds
Could just be looking at the "orders" endpoint in their app which might also include incremental updates as shoppers get items from the store. It's a fairly ambiguous statement
Is that still a lot? Feels like a single 64-core, 256GB RDS instance with some caching should handle that fine. RDS has instances up to 192-core and 768GB.
I am trying to gain a basic understanding of this:
Right now I have a 4TB DB on one large box.
Is the idea that using a proxy tool like PGDog I could spin up 8 smaller boxes handling ~500GB each and then one medium box for the proxy?
Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this.
We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us.
We are looking at maybe moving the bulk of the static data to clickhouse to reduce the DB size but I would love to hear if PgDog or other kind of sharding could be useful for this use case.
> 8 smaller boxes handling ~500GB each and then one medium box for the proxy?
That's exactly right. Get in touch (lev@pgdog.dev), happy to help or at the very least tell you what current works (or doesn't) so you know what your options are.
I'm curious how this might help with our biggest downtime-causer with postgres, which is major version upgrades. Poolers do a great job for failover and load balancing, but we consistently need ~10-20 minutes of downtime once or twice a year to do upgrades. Logical replication between old->new versions could probably help, but it would still require flipping everything over to the new cluster without partial writes or anything silly. Anybody have experience with this?
Logical replication is how this is typically done. If you have some infra-as-code setup, you create a new cluster with identical settings except for the major version, import the schema, start copying data from a read-replica running the old version, stop accepting writes from the old version (downtime starts), sync the sequence numbers, and point your services to the new cluster (downtime ends).
If you use something like CloudNativePG they automate parts of the process with cli tools and declarative syntax. Otherwise you take the time to figure it out by hand. It might sound complicated, but just practice on your staging DB, and if all goes well you do the same procedure in prod.
Seconded. Coming from MySQL this is a huge regression that makes Postgres look like something from the 80s. I still wonder why this isn't seen as the absolutely highest priority.
I have not ran MySQL for some years but it at least used to have exactly the same issue. Upgrading a database with MySQL can take a long time if you have many tables. The main difference is only really that PostgreSQL does it with a separate tool, pg_upgrade, while MySQL does it as part of the main binary.
For both MySQL and PostgreSQL you will need to use some kind of logical upgrades if you want no downtime.
MySQL has advocated for decades spinning up a replica with the upgraded version, waiting for it to catch up to master before promoting it to the new master. You can do the same thing with Postgres.
Exactly, MySQL and PostgreSQL are the same here. Maybe one is a bit faster than the other at doing major version upgrades but the behaviours are quite similar.
It is also a bit tricky tradeoff. You do not want to be stuck with the same data format forever. So databases like MySQL and PostgreSQL need a downtime when doing a major version upgrade. They both try to keep it short, usually seconds, but minutes can happen in either database.
Logical replication needs a special 'upgrade' use case that will automate most of its pain points away. I understand why DDL does not replicate, and that you may want to replicate to a data warehouse that only needs some columns, etc, but there should be a case just for upgrading that handles all DDL, sequences all existing everything, and just works...
Do other RDBMSs have this? I genuinely have no clue. I've been fortunate enough to be able to get away with one primary and multiple secondaries at my largest usage of Postgres. Multi-master is the kind of thing I am fully out of my depth on, so I'm curious if there's a well defined path for implementation here or what.
Commercial RDBMS (oracle/mssql) have had it in some form for awhile, with pluses and minuses. Open source DBs have had bolt-ons, including BDR for pgsql.
Multi-master is hard. The main issue is what to do with commit/replication lag. It's far "easier" if support for eventual consistency is ok with your use case. In some cases it's not. Also, the problems related to read-only lag can happen on multi-master instances. If somebody does a giant long running query on one of the masters, the target instance needs to hold the data state for the query, even if the underlying DB is getting updates. It also needs to still keep up with other masters. This means the whole cluster can slow down if the multi-master replication is synchronous. Depending on a variety of factors, that can chew up disk space, memory, etc.
There are ways of dealing with these issues (and others), but it comes with tradeoffs with performance, etc.
It has been tried many times. Good luck to pgdog, but there’s a reason these projects don’t stick.
Multi master, from even a conceptual perspective, is incredibly complicated. Databases, transactions, consistency, parallelism are all very complicated.
It’s something that always seems promising at the start but as soon as maintenance and long term improvements enter the picture(ie integrating new Postgres versions), the complexity becomes too much.
In fact postgresML took naming heat because Postgres is right there in the name and they weren't affiliated with the brand. "pg" is just two letters. like WP-engine (literally the name as they say it is "double U P engine").
And a cat and a dog is fun.
don't think they're trying to get one over on you.
> you should own that layer yourself inside of your infrastructure
Unless you have millions of users, you don't really need this. It would be nice to have but its not a pressing need. So why invest into developing something that you only need once you are at massive scale? At this point you might as well switch away from Postgres because you'll surely have the manpower to do it.
Even with a proxy like PgDog the Postgres sharding story isn't solved. Resharding with logical replication is unlikely to work with databases which are already TBs in size. I never got it to catch up, I had to sync data at the filesystem level which is terrible. Tools like pg_repack also fall apart at scale.
For those that get to a point where a sharding proxy is required, switching databases is a very appealing solution.
And for those that are almost there, application side sharding is more flexible than building a query routing proxy.
Doesn’t PgDog also handle the sharding by proxying the writes? Maybe I missed something but I thought this is their value prop. It’s not just another PgBouncer.
Love PgDog. I don't need it honestly, but using it in my on-prem k8s because I heard about you in Postgres FM podcast randomly when I had nothing to listen to on a hike in the woods and it picked up my interest.
I notice there is an Enterprise Edition, can you please specify which features are not open source? Do you predict new features you add will be ee licensed as a way to pay back your VC funders?
If your working set is 20 TB, then it's pretty big. Each database has its own mix of hot/cold data, so it's impossible to compare without more information. A better measure might be IOPS. RDS has fairly low maximum IOPS unless you spend a lot more for provisioned IOPS or use Aurora.
You are correct. As a point of comparison: almost ten years ago at Segment we had a single Aurora PostgreSQL instance with ~50T of data, it was used to index potential identity data in a much larger corpus of files stored in S3.
I tried out PgDog a while ago, but couldn't find a good way of handling the config except for having this users / pgdog toml file, which makes it a bit awkward to handle in kubernetes where we often do multi-tenancy in postgres - or rather having many databases on the same instance(s), and have them come and go at will.
Also had an issue with it because it cached authentication requests when doing passthrough it seems, I'd changed the roles password, but it kept using the old one, which was no bueno ;).
PgDog seems to make more sense when you really care about a few databases that need massive scale, rather than a simple proxy in front of postgres. I'll keep following the development though, it is much needed in this space, postgres can use all the investment it can get to get it past the single machine scale that it excels at currently.
We successfully did this with pgdog at $JOB using our own "controller" -- the same service that handles deploying new instances of our application (instancing an argoCD Application that fires Crossplane DB creation, making new Deployments of bricks, etc) will also, at the end of that process, scan the cluster for Database CRDs, use those to generate a new pgdog.toml + users.toml, update the Secrets in the cluster, enable maintenance mode on all pgdog pods, do a live config reload on each of them, then disable maintenance mode (this is to make the change atomic between all the pgdog instances). Downtime there is about 2-3 seconds and all it does is make new SQL requests from existing clients wait, it doesn't break the connection or anything.
Not the place and not the time, but we are building an enterprise edition that "just works" out of the box. Not saying that the open source experience cannot be better - it always can and we'll keep improving. What you've experienced is definitely a known issue with our specific implementation of passthrough auth. Scram made things a bit harder, since we can't validate user's passwords at login time anymore (that's what makes scram secure fwiw).
Happy to chat about this, but we use the AWS secrets manager flowing into External Secrets Operator to generate a pgdog_users.toml. We then kick off a workflow to refresh things, but our rate of change here is much smaller than a super dynamic multi-tenant system.
You could also build a watcher side car that watches for changes of the pgdog_users.toml and have pgdog refresh itself then too with this combination. We thought about that but prefer to control the reloads for our needs.
I'd love to advocate for PgDog if there were more than 2 managed service providers. Adding a single company with no substitute in your supply chain feels hard
> With $5.5M from Basis Set, YC, Pioneer Fund and other great investors, we have years of runway,
This is years of product development with a three person team. If Enterprise sales and support are a big part of your business plan it will suck up a lot more than that.
Is there an explainer for people who are broadly familiar with the DB space? It sounds like you're building an equivalent to Vitesse for Postgres, but it's not super clear from the article (which I know is not the point of this, but still :) ).
Edit: It also might be interesting to point out how your solution differs from what the folks at Planetscale are building https://planetscale.com/neki
This is not an extension, it's a proxy! Very different. You can deploy it anywhere already without having to wait for upstreaming or your cloud provider adding support for it. It's one of the two reasons why we built it this way, the other being performance (it's much faster to do this in the proxy than inside Postgres).
>PgDog is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code.
Still trying to figure out how this works technically, is the performance gain really just re-write in rust?
Not quite. The performance gain is to bring those features to Postgres!
Edit:
Performance gains are from having the ability to load balance reads (horizontal scaling for read queries) and scale out writes (with sharding). Once instance bottleneck in Postgres has many faces:
1. Behind schedule vacuums because of too many dead tuples (too many writes)
2. The WALWriter is single-threaded and IO-bound - Postgres can only do about 200-300MB/sec in writes per instance (real prod numbers on EC2 with NVMes and ZFS, basically best case scenario).
3. Bulkheading: single primary is a single point of failure. With 12 primaries, if one fails, 91% of your customers don't notice.
The list goes on. Rust is just a side effect. We love it because it's fast and correct - the perfect match for a database product.
Suggestion: have more than just helm and Docker in your quickstart documentation. I'd like to try this out just to see what it can do, but not quite enough to fire up one of those systems for it.
We should add it to brew/apt/etc for sure. Also, we could add it to crates.io so you could do something like `cargo install pgdog`. Distribution, distribution, distribution.
The docker compose example is just a demo. I don't know anyone who runs Postgres with docker compose / swarm in prod :) But yes, happy to add volumes so it seems more real.
OLAP means different things to different people. For us, it's just making sure your admin dashboard keeps working basically:
SELECT tenant_id, COUNT(clicks)
FROM users
GROUP BY tenant_id
ORDER BY 2 DESC
LIMIT 25;
Performance is a side effect - definitely needed and we'll do everything we can, but we are not competing with ClickHouse or Snowflake - just trying to make sharded Postgres work with your app.
Re OLAP: It's probably ~good enough~ for a lean team that's trying to keep the tech stack standard and/or doesn't have a dedicated data person to take advantage of a columnar store.
I'm a big PGDog fan! It really helped us scale our connection proxy needs pretty substantially and it has great features like auto mode to support Aurora failovers neatly. It's infra that just works.
I've loved using pgdog for the last 6 months. It's been incredibly stable. It's nifty how they've solved the LISTEN/NOTIFY on a transaction pooler problem.
It’s surprising they don’t mention advantages over other sharding systems like Citus. Maybe it’s just the fact that it’s only a proxy and not core extensions? But that could limit capabilities.
The same old processes vs. threads debate, plus having the ability to scale the coordinator past a single machine. So, if you're OLTP, definitely consider PgDog. OLAP - Citus still wins because of its advanced query engine. We'll get there.
the reason mongo is a joy to use in scaled env is because no additional setup/software needed and all drivers natively support secondary/primary writes/reads and topological changes. so it's end to end, and adding is as a new proxy in frontend of postgres leads to all clients being incompatible or the code itself has no control anymore about when to use a secondary and what allowed stall is acceptable for a particular query. Any solutions to this by pgdog?
> all drivers natively support secondary/primary writes/reads and topological changes.
Expanding on that a bit, mongo drivers even have a shared specification of the state machine for monitoring topology changes[1] and algorithm for selecting the server to send an operation to[2] (along with various declarative test cases that the drivers use to validate them alongside the specs in the repo). I think people sometimes underestimate how important the client-side work is to this sort of experience; for all of the faults mongo has had over the years, the amount of investment that they put into the client libraries is something I've never seen anywhere else (although having spent several years working on some of these libraries, my take is likely very biased).
once mongo rewrote their engine - it's performant, scales & easy to run. seems a lot of devs got burnt by the early issues don't consider it all together.
its probably the easiest database to run at scale. run & forget. you just have to do a little more work on the data modeling part before you write your application i.e consider your query patterns.
I do tenant per PG schema, most are smallish some are bigger (not much, can do all in a single box) but moving forward eventually will need something like this. Also plan to provide "get your own VPS" for more enterprise customers.
Depends. Only pooling, very little. Load balancing/sharding needs to parse queries, so a bit more. Could go up to a GB per pod, sometimes more if you have a lot of unique SQL queries (unique by text, not by parameters). We cache query ASTs to avoid parsing them on each request - that's the bulk of memory usage.
Semi related question - I have always wondered, how do you tackle OOM issues at the proxy layer, i.e. let's say a particular SQL query requires proxy to fan out the query to multiple shards, which return a pretty large dataset. I'm assuming you would need to load this dataset in the ram to perform certain operations. What happens if the resulting dataset causes the proxy pod to go OOM?
Nit-Pick: It might be anti-marketing, still it would be helpful if the use cases can be articulated in a way where it would make sense to use this Vs any other type of database. Honesty goes a long way with the more technical folks for anything related to infrastructure.
Surfacing where and how PG is better than Dynamo or any other database is probably a good starting point instead of calling out PG a silver bullet for everything. At the end of the day its all a trade-off.
i am not using any tool like pgbouncer and have not run into any issues so far. Is it even required these days? Have you guys tested your setup without these connection poolers/multiplexers?
Wrt. the pooler, how do you compare with pgbouncer?
I'm interested because I have a postgres instance, low-traffic but still like ... tens of r(eads)ps. I was not running anything close to the machine limits but still added pgbouncer to improve performance and didn't see a noticeable difference. I was stress-testing the machine obv., I'm not talking about the 10 rps, lol.
For context, my numbers were something like 10k rps +/- 1k vanilla postgres and like 9k rps +/- 1k with pgbouncer in front of it. So ... slightly slower but big error bars so I wouldn't say for sure. I ended up not using pgbouncer as the benefit was immaterial.
Also yeah, in case you want to check it out, it's the db that backs this project: https://httpstate.com.
They are not just some random 3 have decades of real db experience behind them. They also just got funded which gives them the ability to expand and stay longer in the game.
I am odd, yes. I also care deeply about supply chain security and focused on it when I led the crates.io team as well as during my time at the Rust Foundation. You can rest assured that my occasional shitposts are not opening an attack avenue for your supply chain.
> The reason DBs like Mongo or Dynamo exist is because
Not quite. The reason "DBs" like those exist is purely due to fashion. Lets not kid ourselves into thinking they do anything better, save the exception of making data hard to access, which might be a project goal in some cases.
I've used Postgres at a few places and the #1 problem was always high availability, not scaling. One Postgres cluster could easily handle 100000 transactions per minute, but when a primary node went down it was a page and manually failing over to the spare then manually replacing the spare. The manual tooling was very finicky but at least it worked, no automated solution came even close. Lack of a good HA story is why I avoid self-managed Postgres as much as possible.
Load balancer with health checks and failover, works out of the box. :) Battle-tested at this point too, so could be worth a look.
So much so, we re-wrote the DynamoSDK to squeeze out more optimizations so we could be the same cost even though we were a layer in front of dynamo. We used key encoding and other various technique as well as managed capacity (on demand vs reserved) to transparently optimize workloads for price. In our experience we saw dramatic gains vs just vanilla SDK usage.
If you're curious, here was the marketing website, but we're now part of Databricks: https://stately.cloud/
I don’t think the backend matters. It’s the frontend wrapper that makes or breaks HA.
Take for example AuroraDB: the sheer engineering it took to make SQL do scalable OLTP at all tells you how much that flexibility actually costs to keep.
> PgDog does not detect primary failure and will not call pg_promote(). It is expected that the databases are managed externally by another tool, like Patroni or AWS RDS, which handle replica promotion.
https://github.com/patroni/patroni
Couldn't be a better why us :)
Instacart doesn't need "100,000s of grocery delivery orders per minute".
There must be some 0s added for the sake of the story.
It might make 100k row level changes per minute, but that’s a different metric.
https://www.sec.gov/Archives/edgar/data/1579091/000157909126...
I assume they are referring to how many database requests they have due to customers orders or a similar metric and just worded it poorly.
[1] https://www.kaggle.com/datasets/psparks/instacart-market-bas... [2] https://rstudio-pubs-static.s3.amazonaws.com/284199_5c498037...
No clue how a shopping cart or checkout flow would drastically increase database load. It should just be basic CRUD. Building a shopping cart is something every student makes. Pages in a web store can be cached relatively easily since items won't change often.
A primary DB with a few replicas and caching can go a really long way.
There’s challenges scaling read-heavy workloads, for sure — but they’re generally more straight forward than scaling write-heavy workloads. You can get away with more dumb horizontal scaling than with writes.
Could just be looking at the "orders" endpoint in their app which might also include incremental updates as shoppers get items from the store. It's a fairly ambiguous statement
Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this. We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us. We are looking at maybe moving the bulk of the static data to clickhouse to reduce the DB size but I would love to hear if PgDog or other kind of sharding could be useful for this use case.
That's exactly right. Get in touch (lev@pgdog.dev), happy to help or at the very least tell you what current works (or doesn't) so you know what your options are.
This is for DBs that are ~1-1.5TB but doesnt have a huge amount of churn/qps
Effectively what is described here https://www.pgedge.com/blog/always-online-or-bust-zero-downt...
If you use something like CloudNativePG they automate parts of the process with cli tools and declarative syntax. Otherwise you take the time to figure it out by hand. It might sound complicated, but just practice on your staging DB, and if all goes well you do the same procedure in prod.
Edit: Apparently Postgres 19 has a patch for one-shot logical replication of sequences! https://www.depesz.com/2025/11/11/waiting-for-postgresql-19-...
For both MySQL and PostgreSQL you will need to use some kind of logical upgrades if you want no downtime.
At this point i wonder if i'll ever see that.
Multi-master is hard. The main issue is what to do with commit/replication lag. It's far "easier" if support for eventual consistency is ok with your use case. In some cases it's not. Also, the problems related to read-only lag can happen on multi-master instances. If somebody does a giant long running query on one of the masters, the target instance needs to hold the data state for the query, even if the underlying DB is getting updates. It also needs to still keep up with other masters. This means the whole cluster can slow down if the multi-master replication is synchronous. Depending on a variety of factors, that can chew up disk space, memory, etc.
There are ways of dealing with these issues (and others), but it comes with tradeoffs with performance, etc.
Multi master, from even a conceptual perspective, is incredibly complicated. Databases, transactions, consistency, parallelism are all very complicated.
It’s something that always seems promising at the start but as soon as maintenance and long term improvements enter the picture(ie integrating new Postgres versions), the complexity becomes too much.
Don't pay a startup for your DB proxy, you should own that layer yourself inside of your infrastructure.
pg cat
pg dog
What's he going to name the next version?
pg emu ?
In fact postgresML took naming heat because Postgres is right there in the name and they weren't affiliated with the brand. "pg" is just two letters. like WP-engine (literally the name as they say it is "double U P engine").
And a cat and a dog is fun.
don't think they're trying to get one over on you.
Unless you have millions of users, you don't really need this. It would be nice to have but its not a pressing need. So why invest into developing something that you only need once you are at massive scale? At this point you might as well switch away from Postgres because you'll surely have the manpower to do it.
Even with a proxy like PgDog the Postgres sharding story isn't solved. Resharding with logical replication is unlikely to work with databases which are already TBs in size. I never got it to catch up, I had to sync data at the filesystem level which is terrible. Tools like pg_repack also fall apart at scale.
For those that get to a point where a sharding proxy is required, switching databases is a very appealing solution.
And for those that are almost there, application side sharding is more flexible than building a query routing proxy.
https://open.spotify.com/episode/6qgpfiW68KcvRASs6649Fb
1. Control plane to manage multi-node deployments; "works out of the box" experience to make PgDog easy to deploy and use
2. QoS (quality of service): automatically block bad queries from taking down the database
Last but not least, you get SLA-backed support from us (up to P0).
New features are broken down into two categories:
1. Sharding / running Postgres at scale: always open source.
2. Infra management / making it easy to run PgDog at scale: enterprise.
Also had an issue with it because it cached authentication requests when doing passthrough it seems, I'd changed the roles password, but it kept using the old one, which was no bueno ;).
PgDog seems to make more sense when you really care about a few databases that need massive scale, rather than a simple proxy in front of postgres. I'll keep following the development though, it is much needed in this space, postgres can use all the investment it can get to get it past the single machine scale that it excels at currently.
We'll get there.
You could also build a watcher side car that watches for changes of the pgdog_users.toml and have pgdog refresh itself then too with this combination. We thought about that but prefer to control the reloads for our needs.
1. pool exhaustion from idle connections inside open long-running transactions
2. SQLAlchemy's client-side pool using dead connections that PgBouncer had already killed, causing periodic request errors
3. Some tasks have to bypass PgBouncer when they use SET or prepared statements
I've already sharded large datasets at the application layer, but looks like PgDog solves the above problems for any future work?
I had to disable application pooling as it was causing read only transactions I could couldnt pin down the cause.
This is years of product development with a three person team. If Enterprise sales and support are a big part of your business plan it will suck up a lot more than that.
Edit: It also might be interesting to point out how your solution differs from what the folks at Planetscale are building https://planetscale.com/neki
1. Neki as you mentioned 2. PgDog 3. Multigres, headed by original creator of Vitesse
My question is, has any of them been talked about being upstreamed to postgres itself? Or, adding a custom built in feature to postgres itself?
Still trying to figure out how this works technically, is the performance gain really just re-write in rust?
Edit:
Performance gains are from having the ability to load balance reads (horizontal scaling for read queries) and scale out writes (with sharding). Once instance bottleneck in Postgres has many faces:
1. Behind schedule vacuums because of too many dead tuples (too many writes)
2. The WALWriter is single-threaded and IO-bound - Postgres can only do about 200-300MB/sec in writes per instance (real prod numbers on EC2 with NVMes and ZFS, basically best case scenario).
3. Bulkheading: single primary is a single point of failure. With 12 primaries, if one fails, 91% of your customers don't notice.
The list goes on. Rust is just a side effect. We love it because it's fast and correct - the perfect match for a database product.
Is there a binary I can run directly?
Then again, sharding on a single host probably isn't very useful anyway - but it might work with docker in swarm mode?
If you’re already sharding by tenant for other reasons, OK… But I see CDC to a true OLAP system as more scalable.
PostgreSQL still needs real columnar tables in the core, hopefully one day
The same old processes vs. threads debate, plus having the ability to scale the coordinator past a single machine. So, if you're OLTP, definitely consider PgDog. OLAP - Citus still wins because of its advanced query engine. We'll get there.
TLDR: Tokio concurrency > Process concurrency in OLTP.
Expanding on that a bit, mongo drivers even have a shared specification of the state machine for monitoring topology changes[1] and algorithm for selecting the server to send an operation to[2] (along with various declarative test cases that the drivers use to validate them alongside the specs in the repo). I think people sometimes underestimate how important the client-side work is to this sort of experience; for all of the faults mongo has had over the years, the amount of investment that they put into the client libraries is something I've never seen anywhere else (although having spent several years working on some of these libraries, my take is likely very biased).
[1]: https://github.com/mongodb/specifications/blob/master/source... [2]: https://github.com/mongodb/specifications/blob/master/source...
its probably the easiest database to run at scale. run & forget. you just have to do a little more work on the data modeling part before you write your application i.e consider your query patterns.
This kind of tool will help in this case?
You _could_ make that ACID, but it's not going to be faster than a single machine.
I don't know how the pg scaling story gets fixed unless certain things are rewritten. that's my fear of going all in pg.
mysql has vitess etc & even upgrades are easier. though pg is more extensible.
1. Let it crash. Increase the RAM, try again.
2. Page to disk (swap), make it slow but ultimately work.
Both have their trade-offs. There is no free lunch here.
Surfacing where and how PG is better than Dynamo or any other database is probably a good starting point instead of calling out PG a silver bullet for everything. At the end of the day its all a trade-off.
This solves the thousands of clients case for read in a way that is transparent to the clients.
Yes it's required at large scale, especially if you want to distribute reads or shard to a particular geographical area.
Wrt. the pooler, how do you compare with pgbouncer?
I'm interested because I have a postgres instance, low-traffic but still like ... tens of r(eads)ps. I was not running anything close to the machine limits but still added pgbouncer to improve performance and didn't see a noticeable difference. I was stress-testing the machine obv., I'm not talking about the 10 rps, lol.
For context, my numbers were something like 10k rps +/- 1k vanilla postgres and like 9k rps +/- 1k with pgbouncer in front of it. So ... slightly slower but big error bars so I wouldn't say for sure. I ended up not using pgbouncer as the benefit was immaterial.
Also yeah, in case you want to check it out, it's the db that backs this project: https://httpstate.com.
As long as they don't get undercut by the equivalent of AWS https://aws.amazon.com/rds/proxy/ which is a managed pgbouncer.
You’d need a ton of faith in these 3 people.
Feels more like it would work better inside of a bigger organization.
The QA tester in me is kinda risk adverse.
https://github.com/pgdogdev/pgdog/commit/36434f93f03dec1d7d4...
I want to have as much fun as the next developer, but that makes me worry, what with supply chain attacks in the news and all.
- Sage
In all seriousness, we review every single line of code that goes in and only people who work for PgDog Inc are allowed to merge.
i'm sure you'll get 100x comments about "why not just have one fast SSD? it can do 2000 trillion writes/s"
Not quite. The reason "DBs" like those exist is purely due to fashion. Lets not kid ourselves into thinking they do anything better, save the exception of making data hard to access, which might be a project goal in some cases.