A “frozen” dictionary for Python

(lwn.net)

179 points | by jwilk 12 hours ago

29 comments

  • pansa2 9 hours ago
    I wonder whether Raymond Hettinger has an opinion on this PEP. A long time ago, he wrote: "freezing dicts is a can of worms and not especially useful".

    https://mail.python.org/pipermail/python-dev/2006-February/0...

    • pkulak 3 hours ago
      This is why I love how Rust approached this; almost by accident to make borrow checking work. Every reference is either mutable or not, and (with safe code), you can't use an immutable reference to get a mutable reference anywhere down the chain. So you can slowly construct a map through a mutable reference, but then return it out of a function as immutable, and that's the end of it. It's no longer ever mutable, and no key or value is either. There's no need to make a whole new object called FrozenHashMap, and then FrozenList, and FrozenSet, etc. You don't need a StringBuilder because String is mutable, unless you don't want it to be. It's all just part of the language.

      Kotlin _kinda_ does this as well, but if you have a reference to an immutable map in Kotlin, you are still free to mutate the values (and even keys!) as much as you like.

      • vlovich123 1 hour ago
        You cannot return an immutable version. You can return it owned (in which case you can assign/reassign it to a mut variable at any point) or you can take a mut reference and return an immutable reference - but whoever is the owner can almost always access it mutably.
        • pkulak 1 hour ago
          Arg, you’re right. Not sure what I was thinking there. I still think my point stands, because you get the benefits of immutability, but yeah, I didn’t explain it well.
        • tekne 41 minutes ago
          I mean, if you return an immutable reference, the owner in fact cannot mutate it until that reference is dropped.

          If you in fact return e.g. an Rc::new(thing) or Arc::new(thing), that's forever (though of course you can unwrap the last reference!)

      • rcxdude 1 hour ago
        Only if you're returning a reference or wrapping it in something that will only ever return a reference. If you return an object by value ('owned'), then you can do what you like with it and 'mut' is just an light guardrail on that particular name for it.
      • the__alchemist 3 hours ago
        My favorite part of rust is its explicit control over mutability, in the manner you describe.
    • jonathaneunice 9 hours ago
      That's a great link and recommended reading.

      It explains a lot about the design of Python container classes, and the boundaries of polymorphism / duck typing with them, and mutation between them.

      I don't always agree with the choices made in Python's container APIs...but I always want to understand them as well as possible.

      Also worth noting that understanding changes over time. Remember when GvR and the rest of the core developers argued adamantly against ordered dictionaries? Haha! Good times! Thank goodness their first wave of understanding wasn't their last. Concurrency and parallelism in Python was a TINY issue in 2006, but at the forefront of Python evolution these days. And immutability has come a long way as a design theme, even for languages that fully embrace stateful change.

      • zahlman 9 hours ago
        > Also worth noting that understanding changes over time. Remember when GvR and the rest of the core developers argued adamantly against ordered dictionaries? Haha! Good times!

        The new implementation has saved space, but there are opportunities to save more space (specifically after deleting keys) that they've now denied themselves by offering the ordering guarantee.

        • jonathaneunice 8 hours ago
          Ordering, like stability in sorting, is an incredibly useful property. If it costs a little, then so be it.

          This is optimizing for the common case, where memory is generally plentiful and dicts grow more than they shrink. Python has so many memory inefficiencies that occasional tombstones in the dict internal structure is unlikely to be a major effect. If you're really concerned, do `d = dict(d)` after aggressive deletion.

          • zahlman 8 hours ago
            > Ordering, like stability in sorting, is an incredibly useful property.

            I can't say I've noticed any good reasons to rely on it. Didn't reach for `OrderedDict` often back in the day either. I've had more use for actual sorting than for preserving the insertion order.

            • xen0 3 hours ago
              It's sometimes nice to be deterministic.

              I don't often care about a specific order, only that I get the same order every time.

              • no_wizard 3 hours ago
                Thinking about this upfront for me, I am actually wondering why this is useful outside of equality comparisons.

                Granted, I live and work in TypeScript, where I can't `===` two objects but I could see this deterministic behavior making it easier for a language to compare two objects, especially if equality comparison is dependent on a generated hash.

                The other is guaranteed iteration order, if you are reliant on the index-contents relationship of an iterable, but we're talking about Dicts which are keyed, but extending this idea to List, I see this usefulness in some scenarios.

                Beyond that, I'm not sure it matters, but I also realize I could simply not have enough imagination at the moment to think of other benefits

                • dontlaugh 1 hour ago
                  Being able to parse something into a dict and then serialise it back to the same thing is a bit easier. Not a huge advantage, though.
                • xen0 3 hours ago
                  I work on a build system (Bazel), so perhaps I care more than most.

                  But maybe it does all just come down to equality comparisons. Just not always within your own code.

            • mcherm 4 hours ago
              Personally, I find lots of reasons to prefer an orders Dict to an unordered one. Even small effects like "the debugging output will appear in a consistent order making it easier to compare" can be motivation enough in many use cases.
            • kzrdude 2 hours ago
              It seems like opinions really differ on this item then. I love insertion sort ordering in mappings, and python with it was a big revelation. The main reason is that keys need some order, and insertion order -> iteration order is a lot better than pseudorandom order (hash based orders).

              For me, it creates more reproducible programs and scripts, even simple ones.

            • morshu9001 4 hours ago
              Same. Recently I saw interview feedback where someone complained that the candidate used OrderedDict instead of the built-in dict that is now ordered, but they'll let it slide... As if writing code that will silently do different things depending on minor Python version is a good idea.
              • tehnub 4 hours ago
                Well it's been guaranteed since 3.7 which came out in 2018, and 3.6 reached end-of-life in 2021, so it's been a while. I could see the advantage if you're writing code for the public (libraries, applications), but for example I know at my job my code is never going to be run with Python 3.6 or older.
                • morshu9001 2 hours ago
                  Yeah, if you have that guarantee then I wouldn't fault anyone for using dict, but also wouldn't complain about OrderedDict.
            • vanviegen 7 hours ago
              Indeed! I don't understand why it isn't more common for stdlibs to include key-ordered maps and sets. Way more useful than insertion ordering.
              • zahlman 6 hours ago
                Presumably because it involves different performance characteristics.
            • BiteCode_dev 4 hours ago
              Ordering is very useful for testing.

              This morning for example, I tested an object serialized through a JSON API. My test data seems to never match the next run.

              After a while, I realized one of the objects was using a set of objects, which in the API was turned into a JSON array, but the order of said array would change depending of the initial Python VM state.

              3 days ago, I used itertools.group by to group a bunch of things. But itertools.group by only works on iterable that are sorted by the grouping key.

              Now granted, none of those recent example are related to dicts, but dict is not a special case. And it's iterated over regularly.

          • seanhunter 6 hours ago
            Ordering is specifically a property (useful or not) that a set doesn't have. You need a poset for it to be ordered.

            I would expect to use a different data structure if I needed an ordered set.

          • LtWorf 6 hours ago
            Does your code actually rely on that? I've never once needed it.
    • mvanbaak 9 hours ago
      This was 19 (almost) 20 years ago. As stated in the lwn.net article, a lot of concurrency has been added to python, and it might now be time for something like a frozendict.

      Things that were not useful in 2006 might be totally useful in 2026 ;P

      Still, like you, I'm curious wether he has anything to say about it.

    • dkarl 8 hours ago
      It's interesting that he concludes that freezing dicts is "not especially useful" after addressing only a single motivation: the use of a dictionary as a key.

      He doesn't address the reason that most of us in 2025 immediately think of, which is that it's easier to reason about code if you know that certain values can't change after they're created.

      What a change in culture over the last 20 years!

      • morshu9001 4 hours ago
        You can't really tell though. Maybe the dict is frozen but the values inside aren't. C++ tried to handle this with constness, but that has its own caveats that make some people argue against using it.
        • krick 2 hours ago
          Indeed. So I don't really understand what this proposal tries to achieve. It even explicitly says that dict → frozendict will be O(n) shallow-copy, and the contention is only about O(n) part. So… yeah, I'm sure they are useful for some cases, but as Raymond has said — it doesn't seem to be especially useful, and I don't understand what people ITT are getting excited about.
          • morshu9001 42 minutes ago
            Maybe treating Python like a systems language, so applying the same reasoning for const in C++ and Rust to it
    • morshu9001 4 hours ago
      I agree, same with frozenset. If you really want to use one of those as a key, convert to a tuple. There might be niche use cases for all this, but it's not something that the language or even the standard lib need to support.
      • boothby 4 hours ago
        Problem being that sets aren't consistently ordered and conversion to a tuple can result in an exponential (specifically, factorial) explosion in the number of possible keys associated with a single set. Nor can you sort all objects. Safe conversion of sets to tuples for use as keys is possible but the only technique I know requires an auxiliary store of objects (mapping objects to the order in which they were first observed), which doesn't parallelize well.
        • morshu9001 3 hours ago
          tuple(sorted(s)) and if you can't even sort the values, they're probably not hashable. I get that this involves a copy, but so does frozenset, and you can cross that bridge in various ways if it's ever a problem.
          • boothby 2 hours ago
            Here are some types that support hashing:

              str
              bytes
              int, float
              complex
              tuple
              frozenset
            
            Aside from int and float, you cannot perform comparisons between objects of different types. Moreover, you cannot sort complex numbers at all.

            I have crossed that bridge, and I'm telling you (again) that a sorted tuple is not a generic solution.

            • morshu9001 2 hours ago
              I'm not saying the problem with tuple doesn't exist, but that there doesn't need to be a built-in way to deal with it. If for some unfortunate reason you've got a mixed-type set that you also want to use as a dict key, you can write a helper.
    • zahlman 9 hours ago
      > Another PEP 351 world view is that tuples can serve as frozenlists; however, that view represents a Liskov violation (tuples don't support the same methods). This idea resurfaces and has be shot down again every few months.

      ... Well, yes; it doesn't support the methods for mutation. Thinking of ImmutableFoo as a subclass of Foo is never going to work. And, indeed, `set` and `frozenset` don't have an inheritance relationship.

      I normally find Hettinger very insightful so this one is disappointing. But nobody's perfect, and we change over time (and so do the underlying conditions). I've felt like frozendict was missing for a long time, though. And really I think the language would have been better with a more formal concept of immutability (e.g. linking it more explicitly to hashability; having explicit recognition of "cache" attributes, ...), even if it didn't go the immutable-by-default route.

      • kccqzy 7 hours ago
        Apple (or perhaps NeXT) has solved this problem already in Objective-C. Look at NSArray and NSMutableArray, or NSData and NSMutableData. It’s intuitive and Liskov-correct to make the mutable version a subclass of the immutable version. And it’s clearly wrong to have the subclass relationship the other way around.

        Given how dynamic Python is, such a subclass relationship need not be evident at the C level. You can totally make one class whose implementation is independent of another class a subclass of the other, using PEP 3119. This gives implementations complete flexibility in how to implement the class while retaining the ontological subclass relationship.

      • pansa2 8 hours ago
        > ImmutableFoo as a subclass of Foo is never going to work. And, indeed, `set` and `frozenset` don't have an inheritance relationship.

        Theoretically, could `set` be a subclass of `frozenset` (and `dict` of `frozendict`)? Do other languages take that approach?

        > linking [immutability] more explicitly to hashability

        AFAIK immutability and hashability are equivalent for the language's "core" types. Would it be possible to enforce that equivalence for user-defined types, given that mutability and the implementation of `__hash__` are entirely controlled by the programmer?

        • kccqzy 7 hours ago
          Yes you could. Other languages do. See NSMutableSet and NSSet in Objective-C.
        • chriswarbo 8 hours ago
          > Theoretically, could `set` be a subclass of `frozenset` (and `dict` of `frozendict`)?

          At one extreme: sure, anything can be made a subclass of anything else, if we wanted to.

          At the other extreme: no, since Liskov substitution is an impossibly-high bar to reach; especially in a language that's as dynamic/loose as Python. For example, consider an expression like '"pop" in dir(mySet)'

          • tremon 8 hours ago
            > consider an expression like '"pop" in dir(mySet)'

              class frozenset:
                pass
              
              class set(frozenset):
                def pop(self, key):
                  pass
            
            I don't see why hasattr(mySet, 'pop') should be a problem here?
            • chriswarbo 6 hours ago
              > I don't see why hasattr(mySet, 'pop') should be a problem here?

              I never said it's a problem (and I never said it's not!). I was specifically addressing two things:

              - The "theoretical" nature of the question I quoted (i.e. ignoring other aspects like subjectivity, practicality, convention, etc.)

              - The reasoning about "Liskov violation", which was quoted further up this thread.

              For context, here's Liskov's definition of their principle (from https://en.wikipedia.org/wiki/Liskov_substitution_principle ):

              > Barbara Liskov and Jeannette Wing described the principle succinctly in a 1994 paper as follows:[1]

              > > Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.

              My expression `"pop" in dir(mySet)` gives an explicit example of how `set` and `frozenset` are not subtypes of each other (regardless of how they're encoded in the language, with "subclasses" or whatever). In this case `ϕ(x)` would be a property like `'"pop" in dir(x)' = 'False'`, which holds for objects x of type frozenset. Yet it does not hold for objects y of type set.

              Your example of `hasattr(mySet, 'pop')` gives another property that would be violated.

              My point is that avoiding "Liskov violations" is ("theoretically") impossible, especially in Python (which allows programs to introspect/reflect on values, using facilities like 'dir', 'hasattr', etc.).

              (FYI I became rather jaded on the Liskov substitution principle after reading https://okmij.org/ftp/Computation/Subtyping )

              • kccqzy 2 hours ago
                > I became rather jaded on the Liskov substitution principle after reading https://okmij.org/ftp/Computation/Subtyping

                The root of the issue here is that Liskov substitution principle simply references ϕ(x) to be some property satisfied by objects of a class. It does not distinguish between properties that are designed by the author of the class to be satisfied or properties that happen to be satisfied in this particular implementation. But the Hyrum’s Law also states that properties that are accidentally true can become relied upon and as time passes become an intrinsic property. This to me suggests that the crux of the problem is that people don’t communicate sufficiently about invariants and non-invariants of their code.

              • tremon 6 hours ago
                > > Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.

                This says "if hasattr(parent, 'pop') == True then hasattr(child, 'pop') must be True". This is not violated in this case, since hasattr(parent, 'pop') is False. If you want to extend the above definition so that negative proofs concerning the parent should also hold true for the child, then subtyping becomes impossible since all parent and child types must be identical, by definition.

                • minitech 1 hour ago
                  The property in question is `hasattr(x, "pop") is False`.

                  > If you want to extend the above definition so that negative proofs concerning the parent should also hold true for the child, then subtyping becomes impossible since all parent and child types must be identical, by definition.

                  The distinction isn’t “negative proofs”, but yes, that’s their point. In Python, you have to draw a line as to which observable properties are eligible.

      • tmp10423288442 3 hours ago
        And, to the point of this proposal, `dict` and `frozendict` don't have an inheritance relationship either.
      • immibis 2 hours ago
        ImmutableFoo can't be a subclass of Foo, since it loses the mutator methods. But nor can Foo be a subclass of ImmutableFoo, since it loses the axiom of immutability (e.g. thread-safety) that ImmutableFoo has.

        When you interpret Liskov substitution properly, it's very rare that anything Liskov-substitutes anything, making the entire property meaningless. So just do things based on what works best in the real world and aim for as much Liskov-substitution as is reasonable. Python is duck-typed anyway.

        It's a decent guiding principle - Set and ImmutableSet are more substitutable than Set and Map, so Set deriving from ImmutableSet makes more sense than Set deriving from Map. It's just not something you can ever actually achieve.

    • ndr 7 hours ago
      Immutability it's a joy to work with. Ask anyone who's worked with Clojure's dicts.
  • perrygeo 7 hours ago
    Python discovers immutable data, and gets it wrong. Frozendict is a blunt instrument - instead of a mutable free-for-all, it just locks the data for the entire lifecycle. Brace for the wave of code littered with deep copies, proudly proclaiming how functional it all is.

    If you want real immutable data structures, not a cheap imitation, check out pyrsistent.

    • cylemons 7 hours ago
      How else would you "modify" immutable data other than by copying it?
      • ndr 7 hours ago
        You don't have to copy everything.

        Look up persistent data structures [0]

        The main trick is to store everything in a tree with a large branching factor.

        The elements are on your leaves. When you want to make an edit you need to create only the nodes from the root to the actual leaf. Then you return the new root. Everything else you can share.

        This is a log operation, normally implemented with branch 32.

        It works with vectors as well as dicts. Creating a copy is constant, it's immutable can be shared freely (especially on GC'd languages).

        Insert/Delete are O(log32(n)) instead of O(n) they'd be with a full copy.

        [0] https://en.wikipedia.org/wiki/Persistent_data_structure#Pers...

      • thechao 7 hours ago
        And, yet, somehow, functional languages have been doing this since forever? I jest, I jest! There's an entire field of study focused on this! I'm no expert, but imagine you've got a singly-linked list, and we're going to allow only push_back() and pull_back() (no insert()). Let's go ahead and let multiple owners have at this list. If A does "pull_back()" what really happens is A has a local "tail" pointer that moves back along the end of the list and has a new belief about the end of the list. When A does "push_back()" it begins attaching links at its new tail. Since the links just "point backwards" without modifying the old links, this doesn't mutate the list "as is", it only mutates A's version of the list. So, if B is also looking at the list, it could just start doing "push_back()" and add its own "tail". The result is a data-structure that's a "bunch of shared singly linked lists" in a bush structure that's more akin to a tree than a singly linked list.

        You can do the same thing with binary trees and other structures. It's more fiddly, and there's definitely places where single-ownership allows mutability "under the hood" for real performance gains, but that's the basic idea.

        • cylemons 2 hours ago
          Interesting, I didn't think about that, so it is copying on write but on a more granular level
    • shadowgovt 7 hours ago
      If your dicts are frozen, you shouldn't need to deep-copy. The point of immutability is that if you want a new frozendict based on another one, you just rebuild the indirection data structure up top and leave the values it references alone.
      • perrygeo 5 hours ago
        You're absolutely right: an "indirection data structure" is necessary. Freezing the data is the least interesting part - it doesn't give you any of the benefits typically associated with immutable data structures in functional languages. That's my point - Python is shipping a half solution that's being mistaken for a proper one.

        You think Python developers are going to roll their own HAMT on top of frozendicts? Or are they just gonna make copies? Personally, I'd just use pyrsistent which seems to get it right.

        • rented_mule 4 hours ago
          It's a little early to say that "Python is shipping" anything related to this. The PEP is still in draft status and it may be modified (as it was three hours ago) or even rejected. The article says it is likely to be accepted, but that has not happened yet. That also means there is time to comment on the PEP in the discussion it links to if you'd like.

          https://peps.python.org/pep-0814/

        • BiteCode_dev 4 hours ago
          pyrsistent is super slow, though. Just ran a quick benchmark:

            - Creation - 8-12x slower  
            - Lookup - 22-27x slower  
            - Contains check - 30-34x slower  
            - Iteration - 5-14x slower  
            - Merge - 32-158x slower  
           
          Except at 10k+ items, batchup dates on 100K+ items or inserting 100 keys.

          This is rarely the case in practice, most dictionaries and dict operations are small, if you have a huge dict, you probably should be chunking your load or delegating that to infra.

          Not to mention pyrsistent's API is incompatible with dicts, so you can't pass it to external code without conversion.

          You'd better have an incredible ROI to justify that.

          • instig007 2 hours ago
            > pyrsistent is super slow, though

            Since when is Python about speed?

            > Just ran a quick benchmark

            Where's the code? Have you observed the bottleneck call?

            > Except at 10k+ items, batchup dates on 100K+ items or inserting 100 keys.

            > This is rarely the case in practice

            Where's the stats on the actual practice?

            > You'd better have an incredible ROI to justify that.

            The ROI being: fearless API design where 1) multiple instances of high level components are truly independent and could easily parallelize, 2) calling sites know that they keep the original data intact and that callees behave within the immutability constraints, 3) default func inputs and global scope objects are immutable without having to implement another PEP, 4) collections are hashable in general.

            • BiteCode_dev 39 minutes ago
              Clearly the ROI is perfect for you.

              I won't waste more of your time.

  • polyrand 10 hours ago
    A frozen dictionary would be very welcome. You can already do something similar using MappingProxyType [0]

      from types import MappingProxyType
      
      d = {}
      
      d["a"] = 1
      d["b"] = 2
      
      print(d)
      
      frozen = MappingProxyType(d)
      
      print(frozen["a"])
      
      # Error:
      frozen["b"] = "new"
    
    
    [0]: https://docs.python.org/3/library/types.html#types.MappingPr...
    • zahlman 8 hours ago
      > You can already do something similar

      Only if you deny access to the underlying real dict.

      • ali_m 8 hours ago
        Yes, this only prevents the callee from mutating it, it can't provide a strong guarantee that the underlying mapping won't be changed upstream (and hence MappingProxyType can't be washable).
  • sevensor 8 hours ago
    Concurrency is a good motivation, but this is super useful even in straight line code. There’s a huge difference between functions that might mutate a dictionary you pass in to them and functions that definitely won’t. Using Mapping is great, but it’s a shallow guarantee because you can violate it at run time.
  • code_biologist 11 hours ago
    Shoutout to pyrsistent, a personal favorite library for frozen/immutable/hashable data structures whenever I need to use lists, sets, or dicts as dict keys, or make sets of such items: https://github.com/tobgu/pyrsistent
  • sundarurfriend 10 hours ago
    Can someone ELI5 the core difference between this and named tuples, for someone who is not deep into Python? ChatGPT's answer boiled down to: unordered (this) vs ordered (NTs), "arbitrary keys, decided at runtime" vs "fixed set of fields decided at definition time" (can't an NT's keys also be interpolated from runtime values?), and a different API (`.keys()`, `.items()`), etc (I'm just giving this as context btw, no idea if there's inaccuracies in these).

    So could this also have been approached from the other side, as in making unordered NamedTuples with support for the Mapping API? The line between dictionaries and named tuples and structs (across various languages) has always seemed a bit blurry to me, so I'm trying to get a better picture of it all through this.

    • quietbritishjim 9 hours ago
      > arbitrary keys, decided at runtime" vs "fixed set of fields decided at definition time" (can't an NT's keys also be interpolated from runtime values?)

      If you want to create a named tuple with arbitrary field names at runtime then you need to create a new named tuple type before you create the instance. That is possible, since Python is a very dynamic language, but it's not particularly efficient and, more importantly, just feels a bit wrong. And are you going to somehow cache the types and reuse them if the field names match? It's all a bit of a mess compared to just passing the entries to the frozendict type.

    • _flux 9 hours ago
      The key difference is what the GPT outlined: tuples have order for the fields and named tuples are not indexed by the name, but instead a field accessor is used, i.e. foo.bar vs foo["bar"]. In addition namedtuples can be indexed using that order like tuples can (foo[0]), which clearly isn't possible with dicts and would be quite confusing if dict had integer key.

      So I think the differences aren't great, but they are sufficient. A frozendict is not going to be indexable by an integer. Python already has an abstract type for this, for mostly the use of type checking I imagine: https://docs.python.org/3/glossary.html#term-mapping

      Documentation for namedtuple: https://docs.python.org/3/library/collections.html#collectio...

    • minitech 1 hour ago
      On top of generating types dynamically being slow and bad as quietbritishjim said, “str values that also happen to be valid identifiers” is a very limiting dict key requirement.
    • willvarfar 9 hours ago
      The values in tuples cannot change. The values that keys point to in a frozen dict can?

      But yeah I'd be in favour of something that looked a lot like a named tuple but with mutable values and supporting [name] access too. And of course some nice syntactic sugar rather like dicts and sets have with curly brackets today.

      • pansa2 9 hours ago
        > The values in tuples cannot change. The values that keys point to in a frozen dict can?

        The entries of a tuple cannot be assigned to, but the values can be mutated. The same is true for a `frozendict` (according to the PEP they don't support `__setitem__`, but "values can be mutable").

        • vscode-rest 8 hours ago
          Tuple entries must be hashable, which (as far as standard library is concerned) means immutable.
          • pansa2 8 hours ago

              >>> hash([1, 2])
              TypeError: unhashable type: 'list'
            
              >>> t = ([1, 2], [3, 4])
              >>> print(t)
              ([1, 2], [3, 4])
            • vscode-rest 8 hours ago
              Ah. Of course… that’s how the workaround to use tuples as frozen dicts can work in the first place. Slow morning for me!
    • grimgrin 10 hours ago
      I think you could have asked this same comment w/o mentioning ChatGPT and you wouldn't have been downvoted to oblivion in 3 minutes

      I don't see anything wrong with your asking to understand

      • chistev 9 hours ago
        This place hates ChatGPT and AI. Lol.

        Edit: Of course, I get down voted as I predicted I would. Lol.

        • acdha 9 hours ago
          This place hates laziness and imprecision. Using ChatGPT for editing or inspiration is okay as long as you personally review the results for accuracy and completeness, at which point people care about it as much as you announcing that you used a spell checker.
          • delaminator 9 hours ago
            Pasting chat GPT responses is against the site rules.

            always has been even before GPT

            https://news.ycombinator.com/item?id=46206457

            • acdha 9 hours ago
              Fair, because they’re not your words. I’ll edit my comment for what I had in mind: that it can be helpful for that like a spell checker - for example, I know non-native English speakers who find them useful for editing but they completely understand the topic intellectually.
        • grimgrin 7 hours ago
          you made a non-comment, that's why you're downvoted

          comments don't have to be substantial, but they should be adding something that might merit more responses, or could. yours does not. what kind of reply could you even give to "this place [and so, its users] hates [thing]"

          I'd ask you to qualify it, but you can't really qualify such a statement

  • rurban 5 hours ago
    > so having a safe way to share dictionaries between threads will be a boon

    Since only the keys are const, the values not, frozendict is not thread-safe per se. There needs to be a small lock around the value getter and setter.

    • varelaz 5 hours ago
      it's thread safe on operations on the dict but not on the values. Same relates to other immutable structures like tuples. Lock will not help here cause unsafety comes from operation on value after value is obtained.
  • vlovich123 1 hour ago
    I’m curious why the dict->frozendict operation is not an O(1) operation when there’s a refcnt of 1 on the dict - that resolves the spooky action at a distance problem raised and solves the performance for the most common usage pattern (build up the dict progressively and convert to a frozendict for concurrent reads).
  • bilsbie 9 hours ago
    Wow weird Mandela effect for me. I really remember this being a built and actually using it.
    • aewens 8 hours ago
      You may be thinking of the `frozenset()` built in or the third party Python module [frozendict](https://pypi.org/project/frozendict/)?

      Personally, I’ve been using a wrapper around `collections.namedtuple` as an underlying data structure to create frozen dictionaries when I’ve needed something like that for a project.

      • guidopallemans 8 hours ago
        When you are making str -> Any dictionaries it's quite likely you're better off with dataclasses or namedtuples anyway.
        • TheFlyingFish 7 hours ago
          That works if you're dealing with a known set of keys (i.e. what most statically-typed languages would call a struct). It falls down if you need something where the keys are unknowable until runtime, like a lookup table.

          I do like dataclasses, though. I find them sneaking into my code more and more as time goes on. Having a declared set of properties is really useful, and it doesn't hurt either that they're syntactically nicer to use.

    • pansa2 9 hours ago
      There was a previous PEP (in 2012) with the exact same title:

      https://peps.python.org/pep-0416/

      Also one in 2019 for a "frozenmap":

      https://peps.python.org/pep-0603/

    • Qem 9 hours ago
      Perhaps you used the frozen dict implementation from the pip installable boltons library: https://boltons.readthedocs.io/en/latest/dictutils.html#bolt...
    • hiddencost 9 hours ago
  • Strilanc 7 hours ago
    This is a type that I would use a lot.

    For example, I often write classes that do cacheable analysis that results in a dict (e.g. the class stores a list of tiles defined by points and users want a point-to-tiles mapping for convenience). It's worth caching those transformations, e.g. using @functools.cached_property, but this introduces a risk where any caller could ruin the returned cached value by editing it. Currently, I take the safety hit (cache a dict) instead of the performance hit (make a new copy for each caller). Caching a frozendict would be a better tradeoff.

    • clickety_clack 7 hours ago
      Maybe you should take a look at pyrsistent. That allows you to make frozen “maps”. You can “evolve” them into new versions of the objects and it keeps the references to the unchanged keys and values under the hood so it’s fast and memory efficient.
  • mwsherman 8 hours ago
    I’ve found C#’s frozen dictionary to be useful: https://learn.microsoft.com/en-us/dotnet/api/system.collecti...

    It’s optimized for fast reads in exchange for expensive creation.

  • codethief 11 hours ago
    Next step: Auto-inferring the correct (most narrow) TypedDict type from a frozendict. (Think `const foo = { … } as const` in TypeScript.) I miss this TS feature in Python on the daily.
    • mrweasel 10 hours ago
      What would that do exactly? Auto-generate a TypedDict class?

      Bringing up TypedDict is sort of interesting, because it seems like a frozen dictionary could have been implemented by PEP 705, if typing.ReadOnly was enforced at runtime, and not just a hint to a static type checker.

      • dmurray 10 hours ago
        Auto-generate a TypedDict type while type checking, while doing nothing at runtime.

        I expect this is not a very big ask and the various typecheckers will have versions of this soon after release.

        • codethief 6 hours ago
          After what release? Type checkers already support the various TypedDict PEPs, even those that haven't been accepted yet. Yet, none of them infers TypedDict types.
        • mrweasel 9 hours ago
          Okay so basically just "inline":

            class MyType(TypedDict):
              a: str
              b: int
          
          or infer the type hint in:

            my_typed_dict: dict[str, int] = {"a": 5}
          
          It should probably be something like:

            auto_typed: dict[typing.Const, typing.Const] = {"a": 5}
          
          where typing.Const defaults to Any for an empty dict.
          • codethief 6 hours ago
            Not sure we're talking about the same thing. Inline TypedDicts are already in the process of being formalized, see https://peps.python.org/pep-0764/ , and have experimental support in e.g. Pyright.

            What I meant was that

              foo = {"a": 5}
            
            should be inferred as

              foo: TypedDict[{ "a": Literal[5] }] = {"a": 5}
            • mrweasel 1 hour ago
              Ah, I didn't know about PEP 795. I don't really know if I like it though. It looks like something someone invented because they don't want to write classes or specifically want to be able to access a data structure like a dict, for whatever reason.

              It basically provides data types, but also ensures that you have no easy way to reuse them, and it will miss cases where two data structures happen to have the same signature.

              It's getting a little messy when we have class, namedtuple and TypedDict, which all can do much the same thing.

    • adzm 10 hours ago
      Curious what this means for typescript as well; honestly I've only reached for as const rarely.
    • anentropic 9 hours ago
      Agreed... but it shouldn't need a frozendict for this

      IMHO TypedDict in Python are essentially broken/useless as is

      What is needed is TS style structural matching, like a Protocol for dicts

    • virtue3 10 hours ago
      I also SUPER LOVE this feature. Especially when you make a type union of the keys for easier indexing.
    • tweakimp 10 hours ago
      Can you give a Python example of a use case for this?
  • bvrmn 3 hours ago
    It would be great to have **kwargs as frozendict by default. It could help with caching decorators to speedup key construction. For example natural key is simply `(args, kwargs)`.
  • drhagen 10 hours ago
    If this gets wide enough use, they could add an optimization for code like this:

        n = 1000
        a = {}
        for i in range(n):
            a[i] = str(i)
        a = frozendict(a)  # O(n) operation can be turned to O(1)
    
    It is relatively easy for the JIT to detect the `frozendict` constructor, the `dict` input, and the single reference immediately overwritten. Not sure if this would ever be worth it.
    • _flux 9 hours ago
      Wouldn't ref-counting CPython already know that a has a single reference, allowing this optimization without needing any particular smart JIT?
      • zbentley 8 hours ago
        I think GP was talking about optimizing away the O(N) call on the last line. The GC will take care of removing the reference to the old (mutable) dict, but constructing a new frozendict from a mutable dict would, in the current proposal, be an O(N) shallow copy.

        There are also potentially other optimizations that could be applied (not specific to dict/frozendict) to reduce the memory overhead on operations like "a = f(a)" for selected values of "f".

      • tracnar 2 hours ago
        Indeed, the Tcl implementation does this so e.g. `set d [dict] ; dict set d key value` can modify d in place instead of creating a copy (since everything is immutable).
      • zahlman 8 hours ago
        First thought: I would very much expect it to be able to do this optimization given the similar things it does for string concatenation.

        But actually, I suspect it can't do this optimization simply because the name `frozendict` could be shadowed.

  • steadyelk 6 hours ago
    Agree it's about time to make this built in. Other functional packages like JAX [0] are already using the concept but they build it into their library from scratch.

    [0] https://flax.readthedocs.io/en/latest/api_reference/flax.cor...

  • the__alchemist 7 hours ago
    I wish Python could move away from types-as-mutability-determiners. Or paper over them; all could have mutable and non-mutable variants, with "to_mut()" and "to_immut()" methods. And in the same vein, explicit control over whether functions mutate a variable in place, or make a local copy. Python is my first lang, and I still am unable to consistently reason about these.
  • ok123456 7 hours ago
    Ruby has had frozen dictionaries since version 1.0, and they are generally considered a good thing to use where possible.
  • zzzeek 9 hours ago
    SQLAlchemy has its own frozendict which we've had in use for many years, we have it as a pretty well performing cython implementation these days, and I use it ubiquitously. Would be a very welcome addition to the stdlib.

    This proposal is important enough that I chimed in on the thread with a detailed example of how SQLAlchemy uses this pattern:

    https://discuss.python.org/t/pep-814-add-frozendict-built-in...

  • xamuel 8 hours ago
    This subject always seems to get bogged down in discussions about ordered vs. unordered keys, which to me seems totally irrelevant. No-one seems to mention the glaring shortcoming which is that, since dictionary keys are required to be hashable, Python has the bizarre situation where dicts cannot be dict keys, as in...

    {{'foo': 'bar'}: 1, {3:4, 5:6}: 7}

    ...and there is no reasonable builtin way to get around this!

    You may ask: "Why on earth would you ever want a dictionary with dictionaries for its keys?"

    More generally, sometimes you have an array, and for whatever reason, it is convenient to use its members as keys. Sometimes, the array in question happens to be an array of dicts. Bang, suddenly it's impossible to use said array's elements as keys! I'm not sure what infuriates me more: said impossibility, or the python community's collective attitude that "that never happens or is needed, therefore no frozendict for you"

    • kzrdude 8 hours ago
      Turning a dictionary into a tuple of tuples `((k1, v1), (k2, v2), ...)`; isn't that a reasonable way?

      If you want to have hash map keys, you need to think about how to hash them and how to compare for equality, it's just that. There will be complications to that such as floats, which have a tricky notion of equality, or in Python mutable collections which don't want to be hashable.

      • xamuel 7 hours ago
        That argument would apply to sets too, and yet frozenset is builtin.
  • westurner 3 hours ago
    PMap and PVector, functional Python libraries,

    "PEP 351 – The freeze protocol" (2005, rejected) https://peps.python.org/pep-0351/ ; IIUC the freeze protocol proposed basically:

      def freeze(obj)
        return cache.setsefault(hash(obj),
          obj.__freeze__())
    
    
    /? "Existing threads re: consts and applications thereof"

    I wasn't able to find a URL to this post (2021) from the python-ideas mailing list archives using a double quoted search term today; I had to use the python mailing list search engine? Did something break crawling of mailing lists? Old mailman HTML archives were very simple to crawl.. ENH: pypa: add a sitemap.xml for/of mailing list archives, forums; @pypa: ask for search engine indexing advice: "How do we make sure that the python mailing list archives will be search indexed?" (as they traditionally were)

    How to find the .txt of mailing list archives posts these days?

    From "[Python-ideas] Re: Introduce constant variables in Python" (2021) https://mail.python.org/archives/list/python-ideas@python.or... :

    - pyrsistent: PMap, PVector

    I'm out of time for this; (reformatting this for HN so that URLs will be auto-linkified but newlines won't be eliminated) here's the full email as .txt, the mailing list archive has a hyperlinkified version with newlines preserved. GH Markdown and CommonMark Markdown also preserve newlines and auto-linkify:

      From: [@westurner]
      Date: Thu, Jun 17, 2021, 10:43 AM
      Subject: Re: [Python-ideas] Re: Introduce constant variables in Python
      Cc: python-ideas <python-ideas@python.org>
      
      
      On Mon, May 24, 2021 at 5:43 PM Chris Angelico <rosuav@gmail.com> wrote:
      
      Requiring that a name not be rebound is well-defined and testable.
      Requiring that an object not change is either trivial (in the case of,
      say, an integer) or virtually impossible (in the case of most
      objects).
      
      What would be the advantage of such a declaration?
      
      ChrisA
      
      ## Existing threads re: consts and applications thereof
      
       So, `/? from:me pyrsistent` I found a few results:
      - "[Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)" 2016-04
        https://mail.python.org/pipermail/python-dev/2016-April/143958.html
        - ~Sandboxing python within python is nontrivial to impossible; consts might help a bit
          - https://mail.python.org/pipermail/python-dev/2016-April/143958.html
      
      - "Proposal to add const-ness type hints to PEP-484"
        https://mail.python.org/archives/list/python-ideas@python.org/thread/OVPF5I6IOVF6GOJQRH5UGCCU3R7PQHUF/
        - https://github.com/python/typing/issues/242
          - "Final names and attributes" https://github.com/python/mypy/pull/5522
             This is where `typing.Final` comes from.
      
      - "[Python-ideas] "Immutable Builder" Pattern and Operator"
        https://mail.python.org/pipermail/python-ideas/2017-January/044374.html
        - [pyrsistent] and "fn.py [do] immutables:
          https://github.com/kachayev/fn.py/blob/master/README.rst#persistent-data-structures "
      
      - "[Python-ideas] Add recordlcass to collections module"
        https://groups.google.com/g/python-ideas/c/9crHfcCBgYs/m/6_EEaWJAAgAJ
        - ORMs (e.g. Django, SQLAlchemy) require "dirty state" checking to know which object attributes have changed and need an SQL statement to be executed to synchronize the state; this is relevant because when we're asking for mutable namedtuple we're often trying to do exactly this pattern.
      
      - "[Python-ideas] Suggestions: dict.flow_update and dict.__add__"
        https://www.google.com/search?q=%22%5BPython-ideas%5D+Suggestions%3A+dict.flow_update+and+dict.__add__%22
      
        > dicttoolz has functions for working with these objects; including dicttoolz.merge (which returns a reference to the merged dicts but does not mutate the arguments passed).
        > 
        > https://toolz.readthedocs.io/en/latest/api.html#dicttoolz
        > https://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoolz.merge
        > 
        > pyrsistent has a PRecord class with invariants and type checking that precedes dataclasses. pyrsistent also has 'freeze' and 'thaw' functions for immutability. PRecord extends PMap, which implements __add__ as self.update(arg) (which does not mutate self)
      https://github.com/tobgu/pyrsistent/blob/master/README.rst#precord
        > 
        > https://github.com/tobgu/pyrsistent/blob/master/pyrsistent/_pmap.py
      
      - "[Python-ideas] How to prevent shared memory from being corrupted ?"
        https://www.google.com/search?q=%22How+to+prevent+shared+memory+from+being+corrupted+%3F%22
        > PyArrow Plasma object ids, "sealing" makes an object immutable, pyristent
        > 
        > https://arrow.apache.org/docs/python/plasma.html#object-ids
        > https://arrow.apache.org/docs/python/plasma.html#creating-an-object-buffer
      
        > > Objects are created in Plasma in two stages. First, they are created, which allocates a buffer for the object. At this point, the client can write to the buffer and construct the object within the allocated buffer. [...]
      
      - [Python-ideas] Experimenting with dict performance, and an immutable dict
        https://mail.python.org/archives/list/python-ideas@python.org/message/DNBGUJHDH4UTPSETMFFWMJHNXQXIWX4I/
      
        > https://pyrsistent.readthedocs.io/en/latest/intro.html#pyrsistent :
        > 
        >> Pyrsistent is a number of persistent collections (by some referred to as functional data structures). Persistent in the sense that they are immutable.
        >>
        >> All methods on a data structure that would normally mutate it instead return a new copy of the structure containing the requested updates. The original structure is left untouched.
        >>
        >> This will simplify the reasoning about what a program does since no hidden side effects ever can take place to these data structures. You can rest assured that the object you hold a reference to will remain the same throughout its lifetime and need not worry that somewhere five stack levels below you in the darkest corner of your application someone has decided to remove that element that you expected to be there.
        >>
        >> Pyrsistent is influenced by persistent data structures such as those found in the standard library of Clojure. The data structures are designed to share common elements through path copying. It aims at taking these concepts and make them as pythonic as possible so that they can be easily integrated into any python program without hassle.
      
        
      
      > What would be the advantage of such a declaration?
      
      Constants don't need to be locked or unlocked; which is advantageous for parallelism and reasoning about program correctness.
      True consts (wherein everything referred to in that object is 'frozen' and immutable or at least only modifiable with e.g. copy-on-write)
      wouldn't require locks,
      which would be post-GIL advantageous.
      
      You could do consts by never releasing a threading.Lock (or similar):
      - https://docs.python.org/3/library/asyncio-sync.html#locks
      - https://docs.python.org/3/library/threading.html#lock-objects
      - This from 
      https://docs.python.org/2/library/sets.html?highlight=immutable#immutable-transforms re ImmutableSet/FrozenSet
      is not present in the python 3 docs: 
      https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset 
      
      Though - even if Python enforced normal consts in the language - all of the other code objects would still be mutable, so you still have the impossibility of sandboxing python.
      
      Functional and contracts coding styles rely upon invariance; 
      which can be accomplished with various third-party packages that enforce const-ness throughout what may be an object tree behind that reference that would otherwise need to be copy.deepcopy()'d.
      
      ## pyrsistent
      Src: https://github.com/tobgu/pyrsistent
      
      > - PVector, similar to a python list
      > - PMap, similar to dict
      > - PSet, similar to set
      > - PRecord, a PMap on steroids with fixed fields, optional type and invariant checking and much more
      > - PClass, a Python class fixed fields, optional type and invariant checking and much more
      > - Checked collections, PVector, PMap and PSet with optional type and invariance checks and more
      > - PBag, similar to collections.Counter
      > - PList, a classic singly linked list
      > - PDeque, similar to collections.deque
      > - Immutable object type (immutable) built on the named tuple
      > -   freeze and thaw functions to convert between python standard collections and pyrsistent collections.
      > - Flexible transformations of arbitrarily complex structures built from PMaps and PVectors.
      
      
      ## icontract
      Src: https://github.com/Parquery/icontract
      
      > icontract provides design-by-contract to Python3 with informative violation messages and inheritance.
      >
      > It also gives a base for a flourishing of a wider ecosystem:
      > 
      > - A linter pyicontract-lint,
      > - A sphinx plug-in sphinx-icontract,
      > - A tool icontract-hypothesis for automated testing and ghostwriting test files which infers Hypothesis strategies based on the contracts,
      together with IDE integrations such as icontract-hypothesis-vim, icontract-hypothesis-pycharm, and icontract-hypothesis-vscode,
      > - Directly integrated into CrossHair, a tool for automatic verification of Python programs,
      together with IDE integrations such as crosshair-pycharm and crosshair-vscode, and
      > - An integration with FastAPI through fastapi-icontract to enforce contracts on your HTTP API and display them in OpenAPI 3 schema and Swagger UI.
      
      
    https://en.wikipedia.org/wiki/Design_by_contract

    https://en.wikipedia.org/wiki/Invariant_(mathematics)#Invari... [ https://en.wikipedia.org/wiki/Class_invariant ]

    > What is the difference between "invariant" and "constant" and "final"?

  • shadowgovt 7 hours ago
    Regarding the spooky-action-at-a-distance concerns of a `.freeze()` method on dict:

    `.freeze()` should probably just return a frozendict instead of in-place mutating the dict, and they should be separate types. Under the hood, you'll have to build the hashtable anyway to make the frozendict; as long as you're doing that work, you may as well build an object to contain the hashtable and just have that object be separate from the dict that birthed it.

    The values referenced by both the original dict and the frozendict can be the same values; no need to clone them.

  • DeathArrow 3 hours ago
    >But dictionaries are mutable, which makes them problematic for sharing data in concurrent code.

    Not really, C# has ConcurrentDictionary.

  • whimsicalism 5 hours ago
    yes! no more `MappingProxyType` hacks
  • semiinfinitely 3 hours ago
    its called dataclass
  • lou1306 8 hours ago
    Can someone give a strong rationale for a separate built-in class? Because "it prevents any unintended modifications" is a bit weak.

    If you have fixed keys, a frozen dataclass will do. If you don't, you can always start with a normal dict d, then store tuple(sorted(d.items())) to have immutability and efficient lookups (binary search), then throw away d.

  • drhagen 10 hours ago
    Great! Now make `set` have a stable order and we're done here.
    • cr125rider 10 hours ago
      Aren’t sets unsorted by definition? Or do repeated accesses without modification yield different results?
      • ledauphin 9 hours ago
        this is likely in reference to the fact that dicts have maintained insertion order since Python ~3.6 as property of the language. Mathematically there's no defined order to a set, and a dict is really just a set in disguise, but it's very convenient for determinism to "add" this invariant to the language.
        • zahlman 8 hours ago
          Sets use a different implementation intentionally (i.e. they are not "a dict without values") exactly because it's expected that they have different use cases (e.g. union/intersection operations).
        • jonathaneunice 9 hours ago
          Debugging is a completely different and better animal when collections have a predictable ordering. Else, every dict needs ordering before printing, studying, or comparing. Needlessly onerous, even if philosophically justifiable.
      • adrian_b 8 hours ago
        Not related to Python, but one of the possible implementations of a set, i.e. of an equivalence class on sequences, is as a sorted array (with duplicates eliminated, unless it is a multiset, where non-unique elements are allowed in the sorted array), as opposed to the unsorted array that can store an arbitrary sequence.

        So sets can be viewed as implicitly sorted, which is why the order of the elements cannot be used to differentiate two sets.

        Being sorted internally to enforce the equivalence between sets with elements provided in different orders does not imply anything about the existence of an operation that would retrieve elements in a desired order or which would return subsets less or greater than a threshold. When such operations are desired, an order relation must be externally defined on the set.

        So a possible definition of sets and multisets is as sorted arrays with or without unicity of elements, while sequences are unsorted arrays (which may also have the constraint of unique elements). However the standard set operations do not provide external access to the internal order, which is an order between arbitrary identifiers attached to the elements of the set, which have no meaning externally.

      • Retr0id 9 hours ago
        A stable order does not imply sorted order. If you convert the same set to a list twice you'll probably get the same order both times but it isn't guaranteed anywhere, and that order may change between python implementations and versions. The order may also change as the set grows or shrinks.
      • sltkr 9 hours ago
        So are dictionary keys, but Python decided to make them insertion ordered (after having them be unordered just like set elements for decades). There is no fundamental reason sets couldn't have a defined order. That's what languages like JavaScript have done too.
        • cpburns2009 8 hours ago
          Python's decision to make dict keys ordered in the spec was a mistake. It may be the best implementation so far, but it eliminates potential improvements in the future.
          • mrweasel 7 hours ago
            Agreed. The only reason to make them sorted is because people would wrongly assume that they where. You can argue that a programming language should not have unexpected behaviors, and apparently unsorted dictionary keys where a surprise to many, on the other hand I feel like it's a failure of education.

            The problem was that assuming that keys would be sorted was frequently true, but not guaranteed. An alternative solution would have been to randomize them more, but that would probably break a lot of old code. Sorting the keys makes no difference if you don't expect them to be, but it will now be a greater surprise if you switch language.

  • malkia 6 hours ago
    Welcome to starlark :)
  • neonsunset 7 hours ago
    [dead]