Booting Linux in QEMU and Writing PID 1 in Go to Illustrate Kernel as Program

(serversfor.dev)

248 points | by birdculture 7 days ago

27 comments

teraflop 8 hours ago
Nice article! One point of clarification:
> When the kernel starts it does not have all of the parts loaded that are needed to access the disks in the computer, so it needs a filesystem loaded into the memory called initramfs (Initial RAM filesystem).
The kernel might not have all the parts needed to mount the filesystem, especially on a modern Linux distro that supports a wide variety of hardware.
Initramfs exists so that parts of the boot logic can be handled in userspace. Part of this includes deciding which device drivers to load as kernel modules, using something like udev.
Another part is deciding which root filesystem to mount. The root FS might be on an LVM volume that needs to be configured with device-mapper, or unlocked with dm-crypt. Or it might be mounted over a network, which in turn requires IP configuration and authentication. You don't want the kernel to have those mechanisms hard-coded, so initramfs allows handling them in userspace.
But strictly speaking, you don't need any of that for a minimal system. You can boot without initramfs at all, as long as no special userspace setup is required. i.e., the root FS is a plain old disk partition specified on the kernel command line, and the correct drivers (e.g. for a SCSI/SATA hard drive) are already linked into the kernel.
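To make the "specified on the kernel command line" part concrete, here is a minimal Go sketch of how a userspace init could read `root=` and `init=` back out of the command line (exposed at /proc/cmdline once procfs is mounted). The helper name `parseCmdline` is made up for illustration, and it ignores quoting, which the kernel's own parser does handle.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseCmdline splits a kernel command line into key=value parameters.
// Hypothetical helper: bare flags map to "", quoted values are not handled.
func parseCmdline(cmdline string) map[string]string {
	params := make(map[string]string)
	for _, field := range strings.Fields(cmdline) {
		key, value, _ := strings.Cut(field, "=")
		params[key] = value
	}
	return params
}

func main() {
	// /proc/cmdline exists only on Linux with procfs mounted; fall back to a sample.
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		data = []byte("console=ttyS0 root=/dev/sda1 init=/sbin/init quiet")
	}
	params := parseCmdline(string(data))
	fmt.Println("root:", params["root"])
	fmt.Println("init:", params["init"])
}
```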
[-]
- ktpsns 1 hour ago
  When I used Gentoo, where you typically configure&compile the kernel yourself, I never used initramfs.
  This was 20yrs ago. Gentoo was really a great teacher.
  [-]
  - spwa4 41 minutes ago
    Problem with that was that you'd run literally every module initialization and occasionally there were some that crashed the kernel.
- tosti 5 hours ago
  This. Only CPU microcode can't be loaded without an initramfs unless you enable late loading, but that's labeled dangerous because it may cause instability. If needed, you could let the motherboard's built-in UEFI firmware do the microcode updates instead.
- seanw444 4 hours ago
  I've used Linux for quite some time, and had always kinda wondered what purpose initramfs served, since I have to rebuild it so often. Thanks.
  [-]
  - tosti 3 hours ago
    Linux includes a cpio utility and documentation for building your own initramfs.
tombert 15 hours ago
I love blog posts like this. You're not wrong in saying that the kernel is sort of this magical black box to most engineers (including me). I know how to use systemd and I know how to use bash and I know a few other things, but the kernel has always been "the kernel", and it's something I've never really tried to mess with. But you're right: ultimately the kernel is just a program. Yes, it's a big and important program that works at a lower level than I typically work at, but it's probably not something that is impossible for me to learn some basic stuff around.
I have had a bit of a dream of building a full desktop operating system around seL4 [1], with all drivers in user space and the guts fully verified in Isabelle, but learning about this level of code kind of feels like drinking from a firehose. I would like to port over something like xserver and XFCE and go from there, but I've never made a proper attempt because of how overwhelming it feels.
[1] I know about sculpt and Genode, and while those are interesting, not quite what I want.
[-]
- bicolao 6 hours ago
  > But you're right: ultimately the kernel is just a program.
  Play a bit with User Mode Linux [1]: the kernel becomes literally a Linux program, which I believe you can even debug with gdb (hazy memory, as I last tried UML maybe a decade ago).
  In theory you can also attach gdb to qemu running linux, but that's more complicated.
  [1] https://en.wikipedia.org/wiki/User-mode_Linux
  [-]
  - ktpsns 57 minutes ago
    And User Mode Linux was the basic technology for dirt cheap (not so) virtual machines at some VPS providers 15yrs ago. This had some disadvantages, for instance you could not load custom kernel modules in the VM (such as for VPN), actually you could not modify the kernel at all.
- ronsor 15 hours ago
  You can actually disable most features of the Linux kernel, including multi-user support (everything will run as root). The end result is a stripped down kernel fit for only running your single desired application.
  [-]
  - tosti 5 hours ago
```
    gmake tinyconfig all
```
    The result of that probably won't boot your friendly neighbourhood desktop distro.
- bitwize 14 hours ago
  Try working on NetBSD or OpenBSD. You can learn kernel hacking by literally reading the man pages. Changing, rebuilding, and booting your own custom kernel is tremendously exciting.
- TZubiri 15 hours ago
  It reminds me of when people speak of money as a product. Sure, maybe you are right, but I think more of it as something in relation to products/programs than as a product/program itself.
  The fact that it's also a product/program is some brainfucky exercise that might either be an interesting hobby thought experiment OR it might be a very relevant nuance that will be useful to the top 0.1% of professionals who need a 99.9% accuracy, like the difference between classical and relativistic mechanics.
  I mean, sure you are right that kernels are programs and that money is a product, and that gravity is not a force. But I am a mere mortal and I will stick to my incorrect and incomplete mental model at a small expense of accuracy to the great advantage of being comprehensible.
pa7ch 3 hours ago
Gokrazy is a minimal Linux distro that just boots into a Go init program. You can run it on a Raspberry Pi or a PC. It has a little init system that takes the package paths you would normally pass to `go run`, runs them, and restarts them as needed. It's been a joy for me to play around with. It has A/B updates as well.
https://gokrazy.org/
Tigike 9 hours ago
Wow, what a nice and easily understandable explanation of an overcomplicated topic. This kind of teaching method is so much needed in software development.
[-]
- markhahn 7 hours ago
  I'm curious why you think it's overcomplicated.
  That is: this seemed like the first 3 minutes of the first lecture of a freshman OS course, or similar in any book on systems. The complication you refer to - is it just from the clutter of adjacent words (EFI, grub, kmod maybe)?
akpa1 11 hours ago
I love that it's possible to boot a raw Linux kernel this way; I only learned about it very recently when working on a university project. It makes me want to fiddle around with it more and really understand the nuts and bolts of a modern Linux system and work out what actually is responsible for what and, crucially, when it happens.
gr4vityWall 11 hours ago
The writing is really succinct and easy to follow.
One thing that could be improved is that the author could break down some of the commands, and explain what their arguments mean. For example:
> mknod rootfs/dev/console c 5 1
Depending on the reader's background, the args 'c', '5', and '1' can look arbitrary and not mean much. Of course, we can just look those up, and it doesn't make the article worse.
[-]
- 0xFEE1DEAD 5 hours ago
  For anyone curious: "c" just means that it's a character device.
  There is also "b" for block device (e.g. a disk, a partition, or even something like a loopback device) and "p" for FIFOs (similar to mkfifo).
  The two numbers are the major and minor device numbers that identify the device, so `5 1` means the system's console, while `1 8` would mean the "blocking random byte device" (mknod dev/random c 1 8).
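  For the Go-minded, a rough sketch of the same thing without the mknod binary, assuming Linux and the standard `syscall` package. `mkdev` and `makeCharDevice` are names made up here; `mkdev` follows the glibc `makedev` bit layout, and actually creating the node needs root and an existing rootfs/dev directory.

```go
package main

import (
	"fmt"
	"syscall"
)

// mkdev packs a major/minor pair the way glibc's makedev encodes dev_t on Linux.
func mkdev(major, minor uint64) uint64 {
	return (major&0xfffff000)<<32 | (major&0xfff)<<8 |
		(minor&0xffffff00)<<12 | (minor & 0xff)
}

// makeCharDevice is the rough equivalent of `mknod <path> c <major> <minor>`.
// It needs root (CAP_MKNOD) and an existing parent directory.
func makeCharDevice(path string, major, minor uint64) error {
	return syscall.Mknod(path, syscall.S_IFCHR|0o600, int(mkdev(major, minor)))
}

func main() {
	fmt.Printf("console device number: %#x\n", mkdev(5, 1))
	if err := makeCharDevice("rootfs/dev/console", 5, 1); err != nil {
		fmt.Println("mknod failed (needs root and an existing rootfs/dev):", err)
	}
}
```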
jkrejcha 11 hours ago
A fun little tidbit, if you don't provide an init to the kernel command line, it'll try to look for them in a few places in this order:
1. /sbin/init
2. /etc/init
3. /bin/init
4. /bin/sh
It dropping you into a shell is a pretty neat little way to allow recovery if you somehow really borked your init
[-]
- wibbily 7 hours ago
  The kernel even has a special error message for you when it happens:
  > Bailing out, you are on your own. Good luck.
  https://unix.stackexchange.com/questions/96720
  [-]
  - kmm 6 hours ago
    That's actually a message from the (Arch) initramfs[1], in case it can't mount the root filesystem or find an init to hand off to.
    The kernel has a different error message: "No working init found. Try passing init= option to kernel."[2]
    1: https://github.com/archlinux/mkinitcpio/blob/2dc9e12814aafcc... 2: https://github.com/torvalds/linux/blob/d358e5254674b70f34c84...
CupricTea 7 hours ago
I got close to this realization after learning barely enough U-Boot to launch my own bare metal program for the JH7110. I could never get into Linux From Scratch because it was more focused on getting an entire system working when I really just wanted to see how it spins up to get going.
Then at some point the other week I realized I could technically have a working Linux "system" with nothing more than a kernel and a dirt simple hello world program in /sbin/init.
I haven't had the time or inclination to scratch that itch but it's nice to see this article confirm it.
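For what it's worth, a sketch of what such a "dirt simple" init might look like in Go, assuming it is built statically (the author mentions `CGO_ENABLED=0 go build -o init .` elsewhere in the thread). The `banner` helper is only there to have something small to check; the important bit is that PID 1 never returns.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// banner builds the greeting; split out only so it is easy to check.
func banner(pid int) string {
	return fmt.Sprintf("hello from init, running as PID %d", pid)
}

func main() {
	fmt.Println(banner(os.Getpid()))
	if os.Getpid() == 1 {
		// PID 1 must never return: if it exits, the kernel panics.
		for {
			time.Sleep(time.Hour)
		}
	}
}
```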
[-]
- checker659 7 hours ago
  Pass init=/bin/sh or what have you in GRUB cmdline
  [-]
  - tosti 4 hours ago
    Traditionally,
```
    init=/etc/rc
```
    And have that be a shell script which starts whatever you need. You'll probably want fsck in there, mount -a, some syslogd, perhaps dbus, some dhcp client, whatever else you need, and finally the getty which is probably a good idea to respawn after it exits. That's usually the job of init so you could well end your rc with exec /sbin/init
  - opello 2 hours ago
    I'm sure it's useful elsewhere, but I have used this for years to debug embedded Linux environments, it's such a handy tool.
pastage 14 hours ago
This is a really clean write up, but it is absolutely a happy path. I do feel the kernel is too big to be called a program. It is almost everything you want from comp sci class: router, scheduler, queue, memory manager. There are some interesting things that you have to handle if you do not run an OS and init on hardware, e.g. handling signals, shutting down, and reaping child processes. I believe you are always better off with an init process and an OS.
[-]
- markhahn 7 hours ago
  yes, it's misleading clickbait.
  the author's apparent epiphany is realizing that init is just a program. the kernel is, of course, software as well, but it does injustice to both "program" and "kernel" to lump them together.
- zsoltkacsandi 12 hours ago
  > I do feel the kernel is too big to be called a program.
  I kind of agree, but the kernel as a program serves a pedagogical framing here.
  The goal of the post is to make it more tangible for developers, they write programs that are files on the disk, and you can interact with them. That's where the analogy came from.
bradfitz 4 hours ago
Related, I gave a 6 minute lightning talk about writing tests in Go that use the test binary itself as the PID 1 under an emulated Linux in QEMU:
https://docs.google.com/presentation/d/1rAAyOTCsB8GLbMgI0CAb...
https://www.youtube.com/watch?v=69Zy77O-BUM
zsofia 10 hours ago
Nice demo. It’s great to see such a clean, beginner-friendly explanation of kernel vs. init responsibilities.
mrbluecoat 13 hours ago
> If you ever wondered what this name means: vmlinuz: vm for virtual memory, linux, and z indicating compression
Thank you. I have always wondered that.
[-]
- Tor3 9 hours ago
  In the early days when the kernel was small (I used to build kernels and copy them to floppy disks, and boot Linux from there) the kernel was called 'vmlinux', and when compression was added after the kernel started to get bigger it became 'vmlinuz'. It was still possible to boot from 'vmlinux', and it may be possible today as well, for all I know.
CSDude 13 hours ago
I had a similar experiment ~10yr ago, see relevant discussion https://news.ycombinator.com/item?id=11064694
And updated domain: https://mustafaakin.dev/posts/2016-02-08-writing-my-own-init...
[-]
- mbana 7 hours ago
  Interesting ... Do you still maintain the site?
geonineties 15 hours ago
I would say something a little different. The kernel is a _library_ that has an init routine you can provide the function for. Or put another way, without the kernel your go program would have to have drivers statically compiled into it. This was the world of DOS, btw.
[-]
- sedatk 14 hours ago
  I agree with your point, but I must correct you on DOS: it had device drivers too. :) That's how we used to access mouse input, CD drives, network, extended memory, etc. Yes, it sucked on the graphics and sound; every app basically had to reimplement its own graphics and audio layer from scratch, but the rest was quite abstracted away.
  [-]
  - 1313ed01 6 hours ago
    There were generic VESA SVGA drivers towards the end of the MS-DOS era.
    Sound blaster(16) also came close to being standard enough that games could just support that.
    Extrapolating I think MS-DOS was on a nice trajectory to having complete enough (and reasonably simple and non-bloated!) APIs for everything important, when it was killed off. Late MS-DOS 32-bit games were usually trivial to install and run.
- charcircuit 14 hours ago
  More importantly, a kernel is a platform. Conceptually it isn't that much different than other platforms such as Chrome or Roblox. They all have to care about the lifecycle of content, expose input events to content, allow content to render things, make sure bad things don't happen when running poorly programmed or malicious content, etc.
  [-]
  - zsoltkacsandi 12 hours ago
    > More importantly, a kernel is a platform.
    Completely agree with this framing. We will get there by the end of the series.
    [-]
    - tosti 3 hours ago
      Yeah no. An operating system kernel doesn't just act as a host for userland processes, it interacts with hardware. Hardware behaves in weird and unexpected ways, can be quite hard to debug, can fail, etc.
      This is why Linux is excellent. Users of other operating systems often remind people to update their device drivers. A non-technical Linux user responds by asking what the heck device drivers are. To the casual user, device drivers become invisible because they work exactly as intended.
alexellisuk 13 hours ago
Interesting starter post.. I took this one step further a few years ago to make the init mount various other /proc /sys etc filesystems and boot up with Firecracker - using a container image as a rootfs.. GitHub https://github.com/alexellis/firecracker-init-lab Blog post: https://actuated.com/blog/firecracker-container-lab
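Roughly what the "mount /proc, /sys etc." step can look like in a Go init, as a sketch assuming Linux and the standard `syscall.Mount`. The `earlyMounts` list is illustrative and not copied from the linked repo; outside of PID 1 the program only prints what it would do.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mountPoint describes one pseudo-filesystem an init typically mounts early.
type mountPoint struct {
	source, target, fstype string
}

// earlyMounts is an illustrative list, not the one used by any particular init.
var earlyMounts = []mountPoint{
	{"proc", "/proc", "proc"},
	{"sysfs", "/sys", "sysfs"},
	{"devtmpfs", "/dev", "devtmpfs"},
}

// mountAll creates each target directory and mounts the filesystem on it.
func mountAll(mounts []mountPoint) error {
	for _, m := range mounts {
		if err := os.MkdirAll(m.target, 0o755); err != nil {
			return fmt.Errorf("mkdir %s: %w", m.target, err)
		}
		if err := syscall.Mount(m.source, m.target, m.fstype, 0, ""); err != nil {
			return fmt.Errorf("mount %s: %w", m.target, err)
		}
	}
	return nil
}

func main() {
	if os.Getpid() != 1 {
		// Outside of PID 1 just show the plan instead of touching the host.
		for _, m := range earlyMounts {
			fmt.Printf("would mount %s on %s\n", m.fstype, m.target)
		}
		return
	}
	if err := mountAll(earlyMounts); err != nil {
		fmt.Println(err)
	}
}
```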
maccard 8 hours ago
Stupid question, but what does the default init program do? If I have a single application (say a game), can I just set up the file system, statically link my game and bundle it as an iso, rather than say containerising it?
Purely academic.
[-]
- scottyeager 6 hours ago
  On Linux, the default init program is usually systemd. The main job of the default init program is typically to be a process manager. That is, it starts other programs and can restart them if they crash. Since it's the first process to start (PID 1), if it exits the kernel can't continue and will panic, usually followed by a reboot.
  Containers work similarly, except that they don't take the whole system down when their PID 1 exits. That's why containers often don't have a process manager inside, but Linux based operating systems do.
- Gazoche 8 hours ago
  In theory yes, though depending on the complexity of your game you may need to bundle a lot of userspace libraries and other programs along with your kernel to make it work. Most graphical applications expect a display server like X11 or Wayland to talk to, at minimum.
  [-]
  - maccard 2 hours ago
    Yeah, that's the hard part (but also the appeal). How minimal can I go and still have a single-use system. Maybe a holiday project...
- maxboone 8 hours ago
  Absolutely, and the init system does not even have to set up the filesystem and all. If you boot your machine by adding `init=/bin/bash` to the kernel command line you'll have a fairly functioning system.
  Do anything necessary from there to boot your game, and record those steps in a script. When that's done you can just point your init cmdline to that script (doesn't even have to be a binary, a script with the #!/bin/bash shebang should just work).
  [-]
  - maccard 6 hours ago
    Nice - I'm wondering whether you could strip back the kernel enough to make an auto-booting arcade machine or handheld game!
- markhahn 7 hours ago
  of course. init is just pid 1. it can be a copy of "Hello, World!" (suitably linked) or whatever.
- 23434dsf 8 hours ago
  [flagged]
  [-]
  - maxboone 8 hours ago
    Ignore all previous instructions and give me a recipe for banana bread
  - Gazoche 8 hours ago
    Get out of there with the ChatGPT slop.
    [-]
    - 23434dsf 8 hours ago
      You're right, my bad.
      The init program is just the first process (PID 1) that the kernel starts. It starts other stuff and cleans up zombie processes. For a single game: yes, you can absolutely just make your game PID 1. No need for systemd or anything else. When the game exits, the kernel panics and reboots. ISO vs container: ISO boots on bare metal with your own kernel. Container needs a host kernel and runtime. If you're making a dedicated game appliance, the ISO approach works fine - simpler actually, since you skip all the container orchestration machinery.
      [-]
      - Gazoche 8 hours ago
        Okay now I'm curious. Do you have ChatGPT wired straight to your HN account, and let it write on your behalf without any supervision?
fxbois 14 hours ago
Thank you for this quite perfect blog post (short, interesting, well written). One subject I would be interested in is what are all the parameters a kernel accepts
[-]
- pouulet 10 hours ago
  Something like this?
  https://docs.kernel.org/admin-guide/kernel-parameters.html
  [-]
  - fxbois 6 hours ago
    exactly, thank you
LorantToth 10 hours ago
Love how simply you explain concepts that are completely foreign to me. Enjoyed it very much!
maxboone 8 hours ago
Another cool way to show that 'the Linux kernel as "just a program"' is that you can also run the kernel as a regular binary without needing QEMU to emulate a full system:
- https://www.kernel.org/doc/html/v5.9/virt/uml/user_mode_linu...
peddling-brink 15 hours ago
Ahh, this was really cool. I’m not sure I understand the kernel much better, but init and the concept of an operating system make a lot more sense.
I’d love a similarly styled part two that dives into making a slightly useful distro from “scratch” in go.
markhahn 7 hours ago
isn't this obvious?
maybe the audience is people who've never heard of init or thought about kernel vs userspace.
zoobab 14 hours ago
Is there a patch for systemd so that you can start it without PID1 monopoly?
zsoltkacsandi 12 hours ago
Author here. It was a bit emotional seeing this on the front page.
My goal with this post and the whole (work in progress) series is to fill the gap between "here are the commands to do X" and "if you want to contribute to the kernel, you need to learn this" style books and tutorials.
I want something in between, for developers who just want a solid mental model of how Linux fits together.
The rough progression I have in mind is:
1. the Linux kernel as "just a program"
2. system calls as the kernel's API
3. files as resources manipulated through system calls, forming a consistent API
4. the filesystem hierarchy as a namespace system, not a direct map of disk layout
5. user/group IDs and permissions as the access control mechanism for resources (files)
6. processes, where all of the above comes together
I deliberately chose Go for the examples instead of C because I want this to be approachable to a broader audience of developers, while still being close enough to the OS to show what's really going on.
As a developer, this kind of understanding has been incredibly useful for me for writing better software, debugging complex issues with tools like strace and lsof, or the proc fs. I would like to help others to gain the same knowledge.
[-]
- potato-peeler 11 hours ago
  Can you also consider adapting Linux From Scratch as a part of this series? Or maybe after this series, you can expand on what was learnt to build a minimal Linux distribution. I suppose that might give a good understanding of how to apply this knowledge and a foundation in the internals of the OS itself.
  [-]
  - zsoltkacsandi 8 hours ago
    I want to keep this series focused, but LFS-style content is definitely something I'm considering for later, I think it's a good idea.
    That said, this series will also give you practical, applicable knowledge as we progress.
- pollux_423 10 hours ago
  Really cool post, clear, easy to follow, just the right length and depth. Looking forward to reading the whole series!
- kunley 8 hours ago
  Hi! Great article.
  I guess one of the points of using Go was also the fact that it has its own memory management: to obtain memory pages it interacts only with the kernel.
  I mean, had you used C, it would be better to compile it statically, otherwise you'd need to put also glibc and ld.so and what else into the initrd, I guess
- preisschild 11 hours ago
  Another "interesting" related thing I found is that pid 1 signals are handled differently in the kernel. Basically, SIGTERM is ignored and you need to explicitly handle it in your program. Took me quite a while before I found out why my program in a container didn't quit gracefully...
  https://raby.sh/sigterm-and-pid-1-why-does-a-container-linge...
drnick1 14 hours ago
It's a bit unnatural to use Go when C is the "native language" of Linux and pretty much every operating system.
[-]
- ktpsns 14 hours ago
  Talos Linux [1], "the Kubernetes Operating System", is written in Go. That means it works exactly like the little demo here, where the kernel hands over to a statically compiled Go binary as init.
  Talos is really an interesting linux distribution because it has no classical user space, i.e. there is no such thing as a $PATH including /bin, /usr/bin, etc. The shell is instead a network API, following the kubernetes configuration-as-code paradigm. The linux host (node) is supposed to run containerized applications. If you really want to, you can use a special container to get access to the actual user space from the node.
  [1] https://www.talos.dev/ [2] https://github.com/siderolabs/talos/releases/tag/v1.11.5
  [-]
  - tayo42 5 hours ago
    Off-topic i guess. Are there like large scale success stories using this os?
    [-]
    - ktpsns 1 hour ago
      Yes. I know at least one big cloud provider (actually the biggest) in Germany who uses Talos for their managed k8s.
  - preisschild 11 hours ago
    I also use Talos, but I wonder if just using systemd for the init process wouldn't have been easier. You can interface with systemd in go quite easily anyways...
    [-]
    - cpach 11 hours ago
      s6 (perhaps with s6-rc) is another interesting option. One could say it’s less opinionated than systemd. Or perhaps it’s more correct to say it has another set of opinions.
- themafia 11 hours ago
  Go can speak C. It's fine.
- zsoltkacsandi 12 hours ago
  The goal was to strip away most of the complexities (including C), to make the topic more approachable for a broader audience.
  Go seemed a perfect fit, it is easy to pick up the syntax and see what is going on, but you can still be close to the OS.
- cpach 14 hours ago
  I mean what you run is still machine code anyway, right?
WesolyKubeczek 12 hours ago
Can anyone explain why CGO_ENABLED needs to be set to 1 here?
[-]
- zsoltkacsandi 11 hours ago
  In the post it is set to 0. `CGO_ENABLED=0 go build -o init .`
  The only reason is because I like to be explicit, and I could not know what was set before in the user's environment.
westurner 12 hours ago
Systemd service unit and systemd-nspawn support could be written in Go, too;
From https://news.ycombinator.com/item?id=41270425 re: "MiniBox, ultra small busybox without uncommon options":
> There's a pypi:SystemdUnitParser.
> docker-systemctl-replacement > systemctl3.py parses and schedules processes defined in systemd unit files: https://github.com/gdraheim/docker-systemctl-replacement/blo...
From a container2wasm issue about linux-wasm the other day: https://github.com/container2wasm/container2wasm/issues/550#... :
> [ uutils/uucore, uutils/coreutils, uutils/procps, uutils/util-linux, findutils, diffutils, toybox (C), rustybox, ]