Grafeo – A fast, lean, embeddable graph database built in Rust

(grafeo.dev)

100 points | by 0x1997 4 hours ago

11 comments

adsharma 3 hours ago
There are 25 graph databases all going "me too" in the AI/LLM-driven cycle.
Writing it in Rust gets visibility because of the popularity of the language on HN.
Here's why we are not doing it for LadybugDB.
Would love to explore a more gradual/incremental path.
Also focusing on just one query language: strongly typed Cypher.
https://github.com/LadybugDB/ladybug/discussions/141
- tadfisher 2 hours ago
  Is LadybugDB not one of these 25 projects?
  - adsharma 2 hours ago
    LadybugDB is backed by this tech (I didn't write it)
    https://vldb.org/cidrdb/2023/kuzu-graph-database-management-...
    You can judge for yourself what work has been done in the last 5 months. New open source contributors I didn't know before are ramping up. There are many short videos here:
    https://youtube.com/@ladybugdb
natdempk 20 minutes ago
Serious question: are there any actually good and useful graph databases that people would trust in production at reasonable scale and are available as a vendor or as open source? e.g. not Meta's TAO
- cjlm 13 minutes ago
  Serious answer, limiting to open source: JanusGraph, Dgraph, Apache AGE, HugeGraph, Memgraph and ArcadeDB all meet those criteria.
- pphysch 9 minutes ago
  Yeah: Postgres, etc.
  When you actually need to run graph algorithms against your relational data, you export the subset of that data into something like Grafeo (embedded mode is a big plus here) and run your analysis.
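A minimal sketch of that workflow, assuming Python's built-in `sqlite3` as a stand-in for Postgres and a plain adjacency dict as a stand-in for an embedded graph engine. The `follows` table, its columns, and the year filter are hypothetical, and none of this is Grafeo's API: export only the relevant edge subset, then run a graph algorithm (here breadth-first reachability) on the exported copy.

```python
import sqlite3
from collections import defaultdict, deque

# Hypothetical relational data: a "follows" table of directed edges with a year column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (src INTEGER, dst INTEGER, since INTEGER);
    INSERT INTO follows VALUES (1, 2, 2020), (2, 3, 2021), (3, 4, 2022),
                               (5, 6, 2023), (1, 3, 2019);
""")

def export_subgraph(conn, min_year):
    """Pull only the edges we care about out of the relational store."""
    adj = defaultdict(list)
    rows = conn.execute("SELECT src, dst FROM follows WHERE since >= ?", (min_year,))
    for src, dst in rows:
        adj[src].append(dst)
    return adj

def reachable(adj, start):
    """Breadth-first search over the exported adjacency lists."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

adj = export_subgraph(conn, min_year=2020)
print(sorted(reachable(adj, 1)))
```

The relational store stays the source of truth; the graph copy is disposable and only needs to be as large as the analysis requires.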
Aurornis 3 hours ago
Does anyone have any experience with this DB? Or context about where it came from?
From the commit history it's obvious that this is an AI coded project. It was started a few months ago, 99% of commits are from 1 contributor, and that 1 contributor has sometimes committed 100,000 lines of code per week. (EDIT: 200,000 lines of code in the first week)
I'm not anti-LLM, but I've done enough AI coding to know that one person submitting 100,000 lines of code a week is not doing deep thought and review on the AI output. I also know from experience that letting AI code the majority of a complex project leads to something very fragile, overly complicated, and not well thought out. I've been burned enough times by investigating projects that turned out to be AI slop with polished landing pages. In some cases the claimed benchmarks were improperly run or just hallucinated by the AI.
So is anyone actually using this? Or is this someone's personal experiment in building a resume portfolio project by letting AI run against a problem for a few months?
- jandrewrogers 2 hours ago
  That is a lot of code for what appears to be a vanilla graph database with a conventional architecture. The thing I would be cautious about is that graph database engines in particular are known for hiding many sharp edges without a lot of subtle and sophisticated design. It isn't obvious that the necessary level of attention to detail has been paid here.
  - adsharma 2 hours ago
    Are you talking about the Andy Pavlo bet here?
    https://news.ycombinator.com/item?id=29737326
    The Kuzu folks took some of these discussions and implemented them: SIP, ASP joins, factorized joins and worst-case optimal joins (WCOJ).
    Internally it's structured very similarly to DuckDB, except for the differences noted above.
    DuckDB 1.5 implemented sideways information passing (SIP). And LadybugDB is bringing in support for DuckDB node tables.
    So the idea that graph databases have shaky internals stems primarily from pre-2021 incumbents.
    4 more years to go to 2030!
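To illustrate what worst-case optimal joins change, here is a minimal sketch of the generic-join idea on the triangle query E(a,b), E(b,c), E(a,c). A binary-join plan can first materialize every 2-path a->b->c and only then check a->c; generic join instead binds one variable at a time and intersects every relation that mentions it. This is a toy in-memory illustration with hypothetical names, not Kuzu's or LadybugDB's implementation.

```python
from collections import defaultdict

def build_out(edges):
    """Out-neighbor sets per vertex."""
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)
    return out

def triangles_generic_join(edges):
    """Directed triangles a->b, b->c, a->c, binding a, then b, then c.

    For c, both E(b, c) and E(a, c) constrain the candidates, so we intersect
    the two neighbor sets instead of enumerating all 2-paths first.
    (CPython's set intersection iterates over the smaller operand.)
    """
    out = build_out(edges)
    results = []
    for a in sorted(out):                                  # a: any vertex with an out-edge
        for b in sorted(out[a]):                           # b: must satisfy E(a, b)
            for c in sorted(out.get(b, set()) & out[a]):   # c: E(b, c) and E(a, c)
                results.append((a, b, c))
    return results

edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (1, 4)]
print(triangles_generic_join(edges))
```

Summed over all edges (a, b), each intersection costs at most min(deg a, deg b), which is what keeps the work bounded by O(E^1.5) for triangles rather than growing with the number of 2-paths.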
    - jandrewrogers 1 hour ago
      I wasn't referring to the Pavlo bet but I would make the same one! Poor algorithm and architecture scalability is a serious bottleneck. I was part of a research program working on the fundamental computer science of high-scale graph databases ~15 years ago. Even back then we could show that the architectures you mention couldn't scale even in theory. Just about everyone has been re-hashing the same basic design for decades.
      As I like to point out, for two decades DARPA has offered to pay many millions of dollars to anyone who can demonstrate a graph database that can handle a sparse trillion-edge graph. That data model easily fits on a single machine. No one has been able to claim the money.
      Inexplicably, major advances in this area 15-20 years ago under the auspices of government programs never bled into the academic literature even though it materially improved the situation. (This case is the best example I've seen of obviously valuable advanced research that became lost for mundane reasons, which is pretty wild if you think about it.)
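To make "easily fits on a single machine" concrete, a back-of-the-envelope sizing under assumptions that are not from the thread: a compressed sparse row (CSR) layout, 10 billion vertices, 8-byte vertex ids and offsets, no edge properties, no compression.

```python
# Rough size of a trillion-edge graph stored as CSR (compressed sparse row).
# Hypothetical assumptions: 10 billion vertices, 8-byte ids/offsets, no properties.
EDGES = 10**12
VERTICES = 10**10
ID_BYTES = 8  # 32-bit ids top out near 4.29 billion values, so 10 billion vertices need 64-bit

neighbor_array = EDGES * ID_BYTES            # one target id per edge
offset_array = (VERTICES + 1) * ID_BYTES     # one start offset per vertex, plus a sentinel
total_bytes = neighbor_array + offset_array

print(f"neighbors: {neighbor_array / 1e12:.2f} TB")
print(f"offsets:   {offset_array / 1e9:.2f} GB")
print(f"total:     {total_bytes / 1e12:.2f} TB")
```

Under these assumptions the structure is about 8.08 TB, dominated by the neighbor array, which is within reach of a single large server's local storage; the hard part is the access pattern, not the raw size.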
    - adsharma 1 hour ago
      Source: https://www.theregister.com/2023/03/08/great_graph_debate_we...
      > There are some additional optimizations that are specific to graphs that a relational DBMS needs to incorporate: [...]
      This is essentially what Kuzu implemented and DuckDB tried to implement (DuckPGQ), without touching relational storage.
      The jury is out on which one is a better approach.
  - justonceokay 2 hours ago
    Yes, a graph database will happily lead you down an O(n^3) (or worse!) path when trying to query for a single relation if you are not wise about your indexes, etc.
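A minimal sketch of how the cost compounds, using a hypothetical toy edge list rather than any specific engine's planner: without an index, each hop of a path pattern rescans the whole edge list, so a k-hop pattern evaluated with nested scans costs on the order of E^k in the worst case, while an adjacency index turns each hop into a lookup.

```python
from collections import defaultdict

edges = [(1, 2), (2, 3), (2, 4), (4, 5), (6, 7)]  # toy (src, dst) edge list

def two_hop_scan(edges, start):
    """No index: rescan the entire edge list for every node reached on the first hop."""
    result = set()
    for s1, d1 in edges:
        if s1 != start:
            continue
        for s2, d2 in edges:        # full rescan for the second hop
            if s2 == d1:
                result.add(d2)
    return result

def two_hop_indexed(edges, start):
    """Build an adjacency index once; each hop is then a dictionary lookup."""
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    return {d2 for d1 in out.get(start, []) for d2 in out.get(d1, [])}

assert two_hop_scan(edges, 1) == two_hop_indexed(edges, 1) == {3, 4}
```

Adding a third hop to the scan version nests one more full pass over the edges, which is where the cubic blowup comes from.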
    - cluckindan 54 minutes ago
      That sounds like a "graph" DB which implements edges as separate tables, like building a graph in a standard SQL RDB.
      If you wish to avoid that particular caveat, look for a graph DB which materializes edges within vertices/nodes. The obvious caveat there is that the edges are not normalized, which may or may not be an issue for your particular application.
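A minimal sketch of the two layouts contrasted above, with hypothetical toy types: edges kept in a separate table that traversals must scan or join against, versus edges materialized inside each vertex record (often called index-free adjacency). The second layout avoids the join but stores each edge on both endpoints, so every write has to keep the copies consistent.

```python
from dataclasses import dataclass, field

# Layout 1: edges live in their own table; traversals go through that table.
@dataclass
class EdgeRow:
    src: int
    dst: int
    weight: float

edge_table = [EdgeRow(1, 2, 0.5), EdgeRow(1, 3, 0.9), EdgeRow(2, 3, 0.1)]

def neighbors_via_table(node_id):
    # Scans the separate edge table (a real engine would probe an index here).
    return [e.dst for e in edge_table if e.src == node_id]

# Layout 2: each vertex record carries its own adjacency, edge properties included.
@dataclass
class Vertex:
    id: int
    out_edges: dict = field(default_factory=dict)  # dst id -> weight
    in_edges: dict = field(default_factory=dict)   # src id -> weight

vertices = {i: Vertex(i) for i in (1, 2, 3)}

def add_edge(src, dst, weight):
    # Denormalized: the same edge (and its weight) is written into both endpoints,
    # so any later update must also touch both copies.
    vertices[src].out_edges[dst] = weight
    vertices[dst].in_edges[src] = weight

for e in edge_table:
    add_edge(e.src, e.dst, e.weight)

def neighbors_materialized(node_id):
    # No join: the adjacency is read straight off the vertex record.
    return list(vertices[node_id].out_edges)
```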
    - adsharma 2 hours ago
      Are you talking about the query plan for scanning the rel table? Kuzu used a hash index and a join.
      Trying to make it optional.
      Try
      explain match (a)-[b]->(c) return a.rowid, b.rowid, c.rowid;
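A toy sketch of the plan shape described above (a hash index on node keys plus a join over the rel table), extended with a hypothetical filter on `a` to show sideways information passing (SIP): the filtered node side is evaluated first and its surviving keys prune the rel-table scan before further join work. Table contents, names, and the filter are assumptions; this does not reproduce Kuzu's or LadybugDB's actual operators or `explain` output.

```python
# Toy tables. Node primary keys are strings; rowids are list positions.
nodes = ["alice", "bob", "carol", "dave"]
rels = [("alice", "bob"), ("bob", "carol"),
        ("carol", "dave"), ("alice", "dave")]

# Hash index: node primary key -> node rowid, used to resolve rel endpoints.
pk_index = {pk: rowid for rowid, pk in enumerate(nodes)}

def match_all():
    """(a)-[b]->(c): scan the rel table and resolve both endpoints through the hash index."""
    return [(pk_index[src], b_rowid, pk_index[dst])
            for b_rowid, (src, dst) in enumerate(rels)]

def match_filtered(a_pred):
    """Same pattern with a hypothetical predicate on a, using a SIP-style semi-join filter.

    The node side is filtered first; its surviving keys are passed sideways to the
    rel-table scan so non-matching edges are dropped before endpoint resolution.
    """
    a_keys = {pk for pk in nodes if a_pred(pk)}  # build side
    out = []
    for b_rowid, (src, dst) in enumerate(rels):
        if src not in a_keys:                    # pruned during the scan
            continue
        out.append((pk_index[src], b_rowid, pk_index[dst]))
    return out

print(match_all())
print(match_filtered(lambda name: name.startswith("a")))
```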
- ozgrakkurt 53 minutes ago
  Using an LLM-coded database sounds like hell considering even major databases can have some rough edges and be painful to use.
- arthurjean 1 hour ago
  Sounds about right for someone who ships fast and iterates. 54 days for a v0 that probably needs refactoring isn't that crazy if the dev has a real DB background. We've all seen open source projects drag on for 3 years without shipping anything; that's not necessarily better.
- gdotv 2 hours ago
  Agreed, there's been a literal explosion in the last 3 months of new graph databases coded from scratch, clearly largely LLM assisted. I'm having to keep track of the industry quite a bit to decide what to add support for on https://gdotv.com and frankly these days it's getting tedious.
  - aorth 28 minutes ago
    Figurative!
  - piyh 59 minutes ago
    I'm turning off my brain and using neo4j
cjlm 27 minutes ago
Overwhelmed by the sheer number of graph databases? I released a new site this week that lists and categorises them. https://gdb-engines.com
satvikpendem 3 hours ago
There seem to be a lot of these; how does it compare to Helix DB, for example? Also, why would you ever want to query a database with GraphQL, which was explicitly not made for that purpose?
cluckindan 58 minutes ago
The d:Document syntax looks so happy!
OtomotO 1 hour ago
Interesting... Need to check how this differs from agdb, with which I had some success for a side project in the past.
https://github.com/agnesoft/agdb
Ah, yeah, a different query language.
nexxuz 1 hour ago
I was ready to learn more about this but I saw "written in Rust" and I literally rolled my eyes and said never mind.
- chuckadams 17 minutes ago
  Too bad you don't do the same for commenting on HN.
- ComputerGuru 53 minutes ago
  I think "written by genAI" should be a bigger turnoff than "written in Rust".
  - andriy_koval 27 minutes ago
    alternative opinion:
    * it is possible to write high quality software using GenAI
    * not using GenAI could mean project won't be competitive in current landscape
    - quantumHazer 9 minutes ago
      > not using GenAI could mean project won't be competitive in current landscape
      Why? This is false in my opinion: iterating fast is not a good indicator of quality or competitiveness.
measurablefunc 2 hours ago
This looks like another avant-garde "art" project.