8 comments

  • PaulHoule 1 day ago
    It strikes me as a worst-of-all-worlds solution: it is awkward to write superfunctions (gotta write all those conditionals) and awkward to use them (gotta call fn.something()).

    For async to be useful you've got to be accessing the network, a pipe, or a sleep, so you have some context. You might as well encapsulate that context in an object, and if you're doing that, the object is going to look like

      class base_class:
          @maybe_async
          def api_call(self, parameters):
              # ... transform the parameters for an http call ...
              response = maybe_await(self.http_call(**http_parameters))
              # ... transform the response into a result ...
              return result
    
    Almost every time I found myself wishing I could generate sync and async from the same source, it was for some kind of wrapper around an HTTP API where everything had the same structure -- and these wrappers can be a huge amount of code, because the API has a huge number of calls, even though the code all has the same shape.
    • zelphirkalt 1 day ago
      Why is it only "maybe async" and only "maybe await"? Isn't it clear at the time you write the code which one it is, async or sync?
      • jcranmer 1 day ago
        There is some code that wants to be generic over its executor. This is particularly true of something like parsing code, where you logically have a "get more data" call and you really don't care about the details of that call otherwise. So if the user provided a synchronous "get more data" source, you'd want the parse method to be synchronous; if it were asynchronous, you'd want the parse method to be asynchronous.

        That's basically the genesis of the idea of maybe-async. I've cooled tremendously on the idea, personally, because it turns out that a lot of code has rather different designs throughout the entire stack if you're relying on sync I/O versus async I/O, and this isn't all that useful in practice.
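
        One way to get that kind of genericity in today's Python is the "sans-io" pattern: the parser yields requests for bytes, and a thin sync or async driver satisfies them. A minimal sketch (parse_message, run_sync, and run_async are illustrative names, not a real library):

          # The parser never does I/O itself; it yields "give me n bytes"
          # requests, so one body serves both sync and async callers.
          def parse_message():
              header = yield 4                     # request 4 header bytes
              length = int.from_bytes(header, "big")
              body = yield length                  # request the body
              return body.decode()

          def run_sync(gen, read):                 # read(n) is a blocking call
              try:
                  n = next(gen)
                  while True:
                      n = gen.send(read(n))
              except StopIteration as stop:
                  return stop.value

          async def run_async(gen, read):          # read(n) is a coroutine
              try:
                  n = next(gen)
                  while True:
                      n = gen.send(await read(n))
              except StopIteration as stop:
                  return stop.value

          # e.g. run_sync(parse_message(), sock.recv) or
          # await run_async(parse_message(), reader.read); a real driver
          # would loop until exactly n bytes have arrived.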

      • PaulHoule 1 day ago
        For an HTTP API (say something like boto3 or a client for ArangoDB) you might want to use the API from either a sync or an async application. Since the code is almost all the same, you can code-generate both a sync and an async version of the API, which is particularly easy if you use

        https://www.python-httpx.org/

        since you can use basically the same http client for both sides. One way to do it is to write code like the sample I showed and use

        https://docs.python.org/3/library/ast.html

        to scan the tree and either remove the maybe_await() or replace it with an await, accordingly. You could do this transformation when the application boots, or have some code that builds both sync and async packages and publishes them to PyPI. There are lots of ways to do it.
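
        A minimal sketch of that AST rewrite (maybe_async, maybe_await, and api_template.py are the hypothetical names from the sample above, not a real library):

          import ast

          class MaybeAwaitTransformer(ast.NodeTransformer):
              """Emit a sync or an async variant of a "maybe async" module."""

              def __init__(self, make_async):
                  self.make_async = make_async

              def visit_FunctionDef(self, node):
                  self.generic_visit(node)
                  if self.make_async and any(
                      isinstance(d, ast.Name) and d.id == "maybe_async"
                      for d in node.decorator_list
                  ):
                      # FunctionDef and AsyncFunctionDef have identical fields,
                      # so swapping the node class turns `def` into `async def`.
                      node.__class__ = ast.AsyncFunctionDef
                  return node

              def visit_Call(self, node):
                  self.generic_visit(node)
                  if isinstance(node.func, ast.Name) and node.func.id == "maybe_await":
                      inner = node.args[0]
                      # Async build: maybe_await(x) -> await x; sync build: just x.
                      return ast.Await(value=inner) if self.make_async else inner
                  return node

          source = open("api_template.py").read()
          tree = MaybeAwaitTransformer(make_async=True).visit(ast.parse(source))
          ast.fix_missing_locations(tree)
          print(ast.unparse(tree))  # the async variant; ast.unparse needs 3.9+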

        • zelphirkalt 22 hours ago
          Why not just make the decision to query an API async and be done with it, instead of having something be "maybe"? Since this whole topic is about having a good way to have one's cake and eat it too, assuming one has such a way, I don't see any downside to choosing async for API calls.
  • operator-name 1 day ago
    Certainly an interesting approach compared to asgiref or synchronicity but I have doubts about the approach.

    Does this not add further function colors - that of a transfunction, a tilde superfunction, and a non-tilde superfunction? Now every time you call one from another you need to use both the context managers and know which variant you are calling.

    asgiref provides the simple wrappers sync_to_async() and async_to_sync(). Easy to understand and to transition with slowly. The caveat is the performance impact if they are overused.

    synchronicity uses a different approach - write 100% async code and expose both a sync and an async interface: async def foo() becomes def foo() for sync callers and foo.aio() for async ones.

    https://github.com/django/asgiref https://github.com/modal-labs/synchronicity
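
    For reference, the asgiref wrappers in use (a minimal sketch; fetch and do_work are placeholder functions):

      from asgiref.sync import async_to_sync, sync_to_async

      async def fetch():              # placeholder async function
          return "data"

      def do_work():                  # placeholder blocking function
          return "done"

      # Call async code from sync code:
      result = async_to_sync(fetch)()

      # Call blocking code from async code (asgiref runs it in a thread pool):
      async def handler():
          return await sync_to_async(do_work)()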

  • nilslindemann 1 day ago
    "async" is a misnomer. What we call "async" is actually "chaotic async" or "time-optimized async" or "switchable async" and what we call "sync" is actually "ordered async" or "unoptimized async", and what we call "parallel" (a good name) is actually "sync".

    Because nomen est omen, everything done now will just result in a growing pile of complexity (see also the "class" misnomer for types), until someone looks again and gives the proper name - or operator - to the concept.

    I imagine a future where we have a single operator on a line, like a ".", which says: do everything from the last point to here in parallel or async - however you want, in any order you want - but this is the point where everything has to be done ("joined") before proceeding.
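
    Python's asyncio.gather() is already roughly that join point (a small sketch; fetch is a placeholder for real I/O):

      import asyncio

      async def fetch(n):
          await asyncio.sleep(0.1)      # stand-in for real I/O
          return n * n

      async def main():
          # The three calls may run in any order; nothing after this line
          # runs until all of them are done ("joined").
          results = await asyncio.gather(fetch(1), fetch(2), fetch(3))
          print(results)

      asyncio.run(main())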

    • 7bit 23 hours ago
      > "async" is a misnomer. What we call "async" is actually "chaotic async" or "time-optimized async" or "switchable async" and what we call "sync" is actually "ordered async" or "unoptimized async", and what we call "parallel" (a good name) is actually "sync".

      That's just semantic nitpicking. Everybody knows what async/sync mean. The terms have been established for a very long time.

      • nilslindemann 19 hours ago
        Using a logical fallacy like "everybody knows ..." indicates you are either not sure about your argument or being intentionally dishonest.
  • pomponchik 4 days ago
    Many old Python libraries got mirrored async reflections after the popularization of asynchrony; as a result, the entire Python ecosystem was duplicated. Superfunctions are the first solution to this problem that lets you partially eliminate the duplication, by giving the client the choice (similar to how it is done in Zig) of whether to use the regular or the asynchronous version of the function.
    • nine_k 1 day ago
      But AFAICT in Zig you don't have to have async and sync versions. Instead, the runtime may choose to interpret `try` as an async or as a synchronous call; the latter is equivalent to a future / promise that resolves before the next statement [1]. This is a sane approach.

      Having separate sync / async versions that look like the same function is a great way to introduce subtle bugs.

      [1]: https://kristoff.it/blog/zig-new-async-io/

  • rsyring 1 day ago

      ~my_superfunction()
      #> so, it's just usual function!
      
    > Yes, the tilde syntax simply means putting the ~ symbol in front of the function name when calling it.

    There's a way to work around that but...

    > The fact is that this mode uses a special trick with a reference counter, a special mechanism inside the interpreter that cleans up memory. When there is no reference to an object, the interpreter deletes it, and you can link your callback to this process. It is inside such a callback that the contents of your function are actually executed. This imposes some restrictions on you:

    > - You cannot use the return values from this function in any way...

    > - Exceptions will not work normally inside this function...

    Ummm...I'm maybe not the target audience for this library. But...no. Just no.
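
    The quoted trick is, roughly, attaching the function body to object destruction, which is why return values and exceptions get lost. A sketch of the general mechanism using weakref.finalize (not the library's actual code):

      import weakref

      class _Trigger:
          pass

      def run_on_collect(fn):
          obj = _Trigger()
          # fn fires when obj's refcount hits zero; its return value goes
          # nowhere, and any exception is reported to stderr, not raised.
          weakref.finalize(obj, fn)
          return obj

      # The returned object is discarded immediately, so fn runs right away
      # under CPython's reference counting.
      run_on_collect(lambda: print("ran at collection time"))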

    • operator-name 1 day ago
      I think that only applies to tilde_syntax=False, but the way it is written isn't very clear.
  • RS-232 1 day ago
    I don’t see the utility here. You’re still duplicating code inside the template function with context managers.

    The decorator would be a lot more useful if it abstracted all that away automagically. I/O bound stuff could be async and everything else would be normal.

  • OutOfHere 1 day ago
    I advise users to just abandon async in Python for long-term success, because the future of Python is free-threaded and async is inherently single-threaded. Even if you don't need multiple threads now, your CPU will thank you later when you do, saving you a full rewrite. As of Python 3.14, free-threading is an established optional feature of Python.
    • zbentley 1 day ago
      The two don't really compete, because async/await is primarily about parallelizing IO.

      If I want to (say) probe a dozen URLs for liveness in parallel, or write data to/from thousands of client sockets from my webserver, doing that with threads--especially free-threaded Python threads, which are still quite lock-happy inside the interpreter, GIL or not--has a very quickly-noticeable performance and resource cost.

      Async/await's primary value is as a capable interface for making I/O concurrent (and parallel as well, in many cases), regardless of whether threads are in use.

      Hell, even golang multiplexes concurrent goroutines' threads onto concurrent IO schedulers behind the scenes, as do Java's NIO, Erlang/BEAM, and many, many similar systems.
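
      The URL-liveness case, on a single event-loop thread (a minimal stdlib sketch; the hosts are placeholders):

        import asyncio

        async def probe(host, port=443, timeout=3.0):
            # Liveness check: can we open a TCP connection in time?
            try:
                _, writer = await asyncio.wait_for(
                    asyncio.open_connection(host, port), timeout)
                writer.close()
                await writer.wait_closed()
                return host, True
            except (OSError, asyncio.TimeoutError):
                return host, False

        async def main(hosts):
            # All probes share one thread; their I/O waits overlap.
            return await asyncio.gather(*(probe(h) for h in hosts))

        print(asyncio.run(main(["example.com", "python.org", "nosuchhost.invalid"])))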

      • PaulHoule 1 day ago
        People who talk about there being a hard line between parallelism and concurrency are always writing code with race conditions they deny exist, or code with performance bottlenecks they can't understand because they deny those exist.

        I like working in Java because you can use the same threading primitives for both and have systems that work well in both IO-dominated and CPU-dominated regimes which sometimes happen in the same application under different conditions.

        Personally, I think there are enough details to work out that we might be up to Python 3.24 before you can really count on all your dependencies being thread-safe. One of the reasons Java has been successful is its extreme xenophobia (not to mention the painful JNI), which meant stuff got re-implemented to be thread-safe in pure Java instead of sucking in a lot of C/C++ code that will never be thread-safe.

        • zelphirkalt 1 day ago
          Using multiple OS-level threads is not very efficient when the only problem is waiting for some IO and wanting to do something else in the meantime. A more lightweight concurrency primitive is needed. I am not saying that async is necessarily it.
          • PaulHoule 1 day ago
            It's not efficient if you need 100,000 threads; 1,000 threads are almost free. What's expensive is thread switching in cases where parallelism matters. I have pair-programmed embarrassingly parallel problems many times: every time my partner wanted to argue about the need to batch the work, every time we failed to get a speedup on the first try, and every time, after adding batching, we got close to ideal speedup.

            People are still hung up on this 1999 article

            https://www.kegel.com/c10k.html

            but hardware has moved on. I'm typing this on a machine with 16 CPU cores, and I'm more worried about leaving easy parallelism on the table (even when it only gets, say, a 40% overall speedup) than I am about 5% overhead from OS threads. I've seen so much buggy code that tries to get it correct with select() and io_uring and all that but doesn't quite. Google has to handle enough questions to care, and you probably don't. If you worry about these things (right or wrong), Java has your back with virtual threads.
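
            The batching point, in concurrent.futures terms (an illustrative sketch; work is a placeholder task):

              from concurrent.futures import ProcessPoolExecutor

              def work(x):
                  return x * x          # stand-in for a small CPU-bound task

              if __name__ == "__main__":
                  items = range(1_000_000)
                  with ProcessPoolExecutor() as pool:
                      # chunksize batches many small tasks into one dispatch,
                      # so workers spend time computing instead of switching
                      # and communicating per item.
                      results = list(pool.map(work, items, chunksize=10_000))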

            • zbentley 1 day ago
              You're mostly right; computers are fast at core/thread parallelism now and leaving capacity on the table is bad.

              But there are a lot of common problems where the overhead of threads shows up a lot earlier in the scaling process than you'd think. Probing sets of ~100 URLs to see which are live? Even in a fast, well-threaded language, the overhead of creating/probing/shutting down all those threads adds up to noticeable latency very quickly. Creating lingering threadpools requires pre-planning available thread-parallelism and scheduling (say) concurrent web requests' workloads onto those shared pools with care. Compared to saying "I know these concurrent computations are 99.9% slow IO-wait-bound while probing URLs, let the event loop's epoll take care of it", that's a hassle--and just as hard to get right as manually messing about with select/poll/epoll multiplexers.

              You're right that there are a lot of buggy IO multiplexer implementations out there; that kind of stuff is best farmed out to a battle-hardened event loop, ideally one that's distributed with your language runtime. But such systems aren't "google scale only", they're needful in a lot of places for ordinary programs' work--whether or not they're ever coded manually or interacted with directly in those programs.

              If you don't often see problems like the hypothetical URL liveness checker, consider the main socket read/write components of a webserver harness: do you really want to spin up a thread as soon as bytes arrive on a socket to be read/written? Or would you rather allocate a precious request-handler thread (or process, or whatever) only after an efficient IO-multiplexed event loop has parsed enough of a request off of the socket to begin handling it? Plenty of popular webservers take both approaches. The ones that prefer evented I/O for their core request flow are less vulnerable to SlowLoris.

              There are more benefits to async/await concurrency (and costs, too), including cancellation/timeout semantics, prioritization, and more, but even without getting into those, near-automatic IO multiplexing is pretty huge.

              • PaulHoule 1 day ago
                As a pythoner, one of the things that drives me crazy is that none of the concurrency control mechanisms are completely x-platform; different aio loops have different strengths and weaknesses on Windows, and if I want to use gunicorn, well, I'd better do it in WSL2. I used aiohttp for my YOShInOn RSS reader [1] and my Fraxinus image sorter, and gave up on it when Fraxinus was serving images and other content at a high concurrency count: it was getting hung up on something, and I could either try to debug that with an X% chance of succeeding, or just go to gunicorn and know I can use all the cores on my machine. It is still fronted by IIS, and IIS serves the images because that's even better for performance. If I weren't on my Frankstack it would be nginx.

                The real problem in web crawlers and such is supporting "one thread per target server and a limited request rate per target server", which is devilishly hard to do with whatever framework you're using on a single machine, and even harder with a distributed crawler. I think some of the rage people have about AI crawlers is about the high cost of bandwidth out of AWS, but some of it has to be that the AI crawlers don't even seem to try anymore.

                [1] https://mastodon.social/@UP8/114887102728039235 really!

                • zelphirkalt 23 hours ago
                  What is not working with multiprocessing.Pool?
      • amelius 1 day ago
        Async is nice in theory, but then some manager asks you to do not only IO but also some computation, and there goes the entire plan. Better to use threads from the start, because they can be used to manage both types of resource (IO and CPU) in a uniform way.
        • OutOfHere 1 day ago
          That is precisely what happens. People then jump to absurdities like microservices, which severely complicate the architecture.
      • OutOfHere 1 day ago
        Go lang will happily use multiple cores if the load calls for it, multiple cores are available, and GOMAXPROCS is not restricted to 1.

        With Python's asyncio, yes, you can read from many client sockets on a core, but you really can't do much work with them on that core, otherwise even the IO will slow down. It is not future-proof at all.

        • zbentley 1 day ago
          Not for I/O. If many goroutines all request a socket read, the golang runtime shoves them all onto an IO multiplexing event loop very similar to Python's asyncio stdlib event loops: https://www.sobyte.net/post/2022-01/go-netpoller/

          And sure, Golang's much better than a cooperative concurrency system at giving work out to multiple cores once the IO finishes, no argument there.

          But again . . . async/await in Python (and JavaScript, Java NIO, and many more) is not about using multiple cores for computations; it's about efficiently making IO concurrent (and, if possible, parallel) for unpredictably "IO-wide" workloads, of which there are many.

          • wahern 1 day ago
            > at giving work out to multiple cores once the IO finishes

            The IO (read, write) doesn't need to finish, just the poll call. That's a very different thing in terms of core utilization, and while it's technically a serialization point, it's not the only serialization point in Go's architecture; it falls out from Go having a single, global run queue, requiring all (idle) threads to serialize on dequeueing tasks.

            But thank you for pointing that out. TIL, there's a single epoll queue for the whole Go process: https://github.com/golang/go/issues/65064

          • OutOfHere 1 day ago
            Today you want just IO, but tomorrow you will inevitably want some computation too, and later a lot of computation, and then everything will slow down, even the IO, because everything is still running in a single thread. This is how it goes, not just typically, but inevitably always.
  • anon291 1 day ago
    So much work to implement the monad type class
    • adamwk 1 day ago
      I don’t think this implements anything monad shaped
      • codebje 1 day ago
        Async is monad shaped. Not-async is monad-shaped, for a degenerate monad. Writing a function that works in both async and not-async contexts just means writing a function that works for any monad.
    • zbentley 1 day ago
      Not really; the problem is that languages with IO monads often provide a runtime that can schedule IO-ful things concurrently (or, in Haskell's case, lazily) based on the type. Python has no such scheduler; users have to run their own in the form of an async-capable event loop or a sequential (threadpool/processpool) executor for blocking code.

      Because of that missing runtime for scheduling and evaluating IO-ful things, tools like superfunctions are necessary.

      In other words: IO monads are only as useful as the thing that evaluates them; Python doesn't have a built-in way to do that, so people have to make code that looks "upward" to determine what kind of IO behavior (blocking/nonblocking/concurrent/lazy/etc.) is needed.

      • codebje 1 day ago
        If you want a function to be usable in both an async and a non-async environment, the monads in question are ones for async and identity, not IO. The choice between a true concurrent runtime and a single threaded cooperative coroutine runtime is up to you in GHC Haskell.

        Monad-agnostic functions are exactly looking upwards to allow the calling context to determine behaviour.

      • operator-name 1 day ago
        You frame that as if Python doesn't have a choice, but it chose to have explicit syntax.

        There’s no reason a python-like language couldn’t have deeper semantics for async, and language level implementation.

        • zbentley 1 day ago
          Well, there aren't reasons why Python couldn't have that, but there certainly are good reasons why the Python maintainers decided that type of runtime was not appropriate for that specific language--not least among them that maintaining a scheduling/pre-empting runtime (which is the only approach I'm aware of that works here without basically making Python into an entirely unrelated language) is very labor-intensive! There's a reason there are no usable alternative implementations of ERTS/BEAM, a reason gccgo is maintained in fits and starts, and a reason the Python GIL has been around for so long. Getting that type of system right is very, very hard.