10 comments

  • wk_end 2 hours ago
    > 25K parameters is about 70 million times smaller than GPT-4. It will produce broken sentences. That's the point - the architecture works at this scale.

    Since it seems to just produce broken and nonsensical sentences (at least based on the one example given), I'm not sure it does work at this scale.

    Anyway, as written, this passage doesn't really make a whole lot of sense (the point is that it produces broken sentences?), and given that it was almost certainly written by an AI, it demonstrates that the architecture doesn't work especially well at any scale (I kid, I kid).

    • forinti 1 hour ago
      How does it compare to a Markov chain generator, I wonder.
      • jll29 12 minutes ago
        A Transformer is a more powerful model than a Markov chain, but on a machine as weak as the C64 an MC could output text faster. It would surely sound "psychedelic", though: memory limits an MC to a first- or second-order model, so at most the two preceding words are taken into account as context when predicting the next one (and there's no attention).

        On a plain vanilla C64, the Transformer can't really show what it's capable of. An implementation using 2 bits per weight (vectorized) could perhaps do slightly better.
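
        For the curious, here's a rough sketch of what such a second-order generator boils down to (illustrative Python, not C64 code; the corpus and seed are made up):

          import random
          from collections import defaultdict

          # Illustrative second-order (two-word context) Markov text generator.
          def train(words):
              # Map each consecutive word pair to the words observed after it.
              table = defaultdict(list)
              for a, b, c in zip(words, words[1:], words[2:]):
                  table[(a, b)].append(c)
              return table

          def generate(table, seed, n=20):
              a, b = seed
              out = [a, b]
              for _ in range(n):
                  followers = table.get((a, b))
                  if not followers:
                      break
                  # No attention: only the last two words matter.
                  a, b = b, random.choice(followers)
                  out.append(b)
              return " ".join(out)

          corpus = "the cat sat on the mat and the dog sat on the rug".split()
          print(generate(train(corpus), ("the", "cat")))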

  • daemonologist 53 minutes ago
    You can chat with the model on the project page: https://indiepixel.de/meful/index.html

    It (v3) mostly only says hello and bye, but I guess for 25k parameters you can't complain. (I think the rather exuberant copy is probably the product of Claude et al.)

  • anyfoo 1 hour ago
    This would have blown me away back in the late 80s/early 90s.

    (Or maybe not, if it doesn't perform better than random; I haven't actually tried it out yet. Some more examples would have been nice!)

    I wonder how far you could push this while still staying period-correct, e.g. by adding a REU (RAM Expansion Unit), or even a GeoRAM (basically a REU on steroids).

    SuperCPU would also be an option, but for me that blurs the line of "what is a C64" a bit too much, and it would likely just make it faster anyway.

  • mixmastamyk 40 minutes ago
    This just reminded me of the random sentence generator program on my VIC-20. I had changed most of the words to all the bad words a preteen could think up. So many laughs with the neighborhood kids.
  • classichasclass 1 hour ago
    If you're running this in VICE, run it under the SuperCPU with warp mode on.
    • bartread 1 hour ago
      That's a good idea because, although I love this, one minute per token is absolutely savage. If you can juice the performance, you're into semi-credible Jar Jar Binks simulator territory.

      It does also make me wonder what you could do with somewhat more powerful retro hardware. I'd love to see what a transformer running on a PSX or an N64 could do.

  • brcmthrowaway 1 hour ago
    How does this compare to ELIZA?
    • Geee 2 minutes ago
      ELIZA is better, because this doesn't seem to generate anything coherent. You can try the original ELIZA with the DOCTOR script here: https://anthay.github.io/eliza.html
    • jll29 16 minutes ago
      Joseph Weizenbaum's ELIZA was rule-based and ran on even slower (1960s) hardware, but because it relied on simple pattern matching instead of neural nets, it would easily have been more responsive. (The Emacs editor/operating system includes an implementation; start it with M-x doctor RETURN.)

      ELIZA was not written in assembler: the original was in MAD-SLIP, with later versions in Fortran and Lisp.

      https://dl.acm.org/doi/pdf/10.1145/365153.365168
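
      The core mechanism fits in a few lines. A minimal sketch of ELIZA-style keyword matching with pronoun reflection (illustrative Python, not Weizenbaum's MAD-SLIP; the rules here are invented, not the real DOCTOR script):

        import re

        # Toy rules in the spirit of DOCTOR: match a keyword, reflect
        # pronouns in the captured text, splice it into a canned reply.
        REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}
        RULES = [
            (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
            (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
        ]

        def reflect(text):
            # Swap first-person words for second-person ones.
            return " ".join(REFLECTIONS.get(w.lower(), w) for w in text.split())

        def respond(line):
            for pattern, template in RULES:
                m = pattern.search(line)
                if m:
                    return template.format(reflect(m.group(1)))
            return "Please go on."  # fallback when no keyword matches

        print(respond("I feel my code is judging me"))
        # -> Why do you feel your code is judging you?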

  • harel 2 hours ago
    Eliza called and asked if we'd seen her grandkids...
    • tclancy 1 hour ago
      What makes you say that? This is about you, not me.

      (Came here to say an update to Eliza could really mess with the last person still talking to her.)

  • ghstinda 1 hour ago
    but can you make mac keyboards feel like a c64c?
  • Lerc 1 hour ago
    Ok now we need 1541 flash attention.

    I'm not sure what the Venn diagram of the knowledge needed to understand what that sentence is suggesting looks like; the intersection is probably more crowded than one might think.

  • bighead1 2 hours ago
    i hate ai, and i love the c64, but i'll allow it.