> When the kernel starts it does not have all of the parts loaded that are needed to access the disks in the computer, so it needs a filesystem loaded into the memory called initramfs (Initial RAM filesystem).
The kernel might not have all the parts needed to mount the filesystem, especially on a modern Linux distro that supports a wide variety of hardware.
Initramfs exists so that parts of the boot logic can be handled in userspace. Part of this includes deciding which device drivers to load as kernel modules, using something like udev.
Another part is deciding which root filesystem to mount. The root FS might be on an LVM volume that needs to be configured with device-mapper, or unlocked with decrypt. Or it might be mounted over a network, which in turn requires IP configuration and authentication. You don't want the kernel to have those mechanisms hard-coded, so initramfs allows handling them in userspace.
But strictly speaking, you don't need any of that for a minimal system. You can boot without initramfs at all, as long as no special userspace setup is required. i.e., the root FS is a plain old disk partition specified on the kernel command line, and the correct drivers (e.g. for a SCSI/SATA hard drive) are already linked into the kernel.
This. Only CPU microcode can't be loaded without an initramfs unless you enable late loading, but that's labeled dangerous because it may cause instability. If needed, you could let the built-in motherboard uefi do the microcode updates instead.
I love blog posts like this. You're not wrong in saying that the kernel is sort of this magical block box to most engineers (including me). I know how to use systemd and I know how to use bash and I know a few other things, but the kernel has always been "the kernel", and it's something I've never really tried to mess with. But you're right: ulimately the kernel is just a program. Yes, it's a big and important program that works at a lower level than I typically work at, but it's probably not something that is impossible for me to learn some basic stuff around.
I have had a bit of a dream of building a full desktop operating system around seL4 [1], with all drivers in user space and the guts fully verified in Isabelle, but learning about this level of code kind of feels like drinking from a firehose. I would like to port over something like xserver and XFCE and go from there, but I've never made a proper attempt because of how overwhelming it feels.
[1] I know about sculpt and Genode, and while those are interesting, not quite what I want.
> But you're right: ulimately the kernel is just a program.
Play a bit with user mode linux [1] the kernel becomes literally a linux program, that I believe you can even debug with gdb (hazy memory as I tried uml last time maybe a decade ago)
In theory you can also attach gdb to qemu running linux, but that's more complicated.
And User Mode Linux was the basic technology for dirt cheap (not so) virtual machines at some VPS providers 15yrs ago. This had some disadvantages, for instance you could not load custom kernel modules in the VM (such as for VPN), actually you could not modify the kernel at all.
You can actually disable most features of the Linux kernel, including multi-user support (everything will run as root). The end result is a stripped down kernel fit for only running your single desired application.
Try working on NetBSD or OpenBSD. You can learn kernel hacking by literally reading the man pages. Changing, rebuilding,and booting your own custom kernel is tremendously exciting.
It reminds me of when people speak of money as a product. Sure, maybe you are right, but I think more of it as something in relation to products/programs than as a product/program itself.
The fact that it's also a product/program is some brainfucky exercise that might either be an interesting hobby thought experiment OR it might be a very relevant nuance that will be useful to the top 0.1% of professionals who need a 99.9% accuracy, like the difference between classical and relativistic mechanics.
I mean, sure you are right that kernels are programs and that money is a product, and that gravity is not a force. But I am a mere mortal and I will stick to my incorrect and incomplete mental model at a small expense of accuracy to the great advantage of being comprehensible.
Gokrazy is a minimal linux distro that just boots into a go init program. You can run on a raspberry pi or pc. It has a little init system that just takes a path you normally use in `go run` and just runs them and restarts as needed. Its been a joy for me to play around with. Has A/B updates as well.
Wow, what a nice and easily understandable explanation of an overcomplicated topic. This kind of teaching method is so much needed in software development.
That is: this seemed like the first 3 minutes of the first lecture on an freshman OS course, or similar in any book on systems. The complication you refer to - is it just from the clutter of adjacent words (EFI, grub, kmod maybe?)
I love that it's possible to boot a raw Linux kernel this way; I only learned about it very recently when working on a university project. It makes me want to fiddle around with it more and really understand the nuts and bolts of a modern Linux system and work out what actually is responsible for what and, crucially, when it happens.
The writing is really succinct and easy to follow.
One thing that could be improved is that the author could break down some of the commands, and explain what their arguments mean. For example:
> mknod rootfs/dev/console c 5 1
Depending on the reader's background, the args 'c', '5', and '1' can look arbitrary and not mean much. Of course, we can just look those up, and it doesn't make the article worse.
For anyone curious: "c" just means that it's a character device.
There is also "b" for block device (e.g. a disk, a partition, or even something like a loopback device) and "p" for FIFOs (similar to mkfifo).
The two numbers are just identifiers to specify the device, so in case of `5 1` it means the systems tty console, while `1 8` would mean "blocking random byte device" (mknod dev/random c 1 8)
I got close to this realization after learning barely enough U-Boot to launch my own bare metal program for the JH7110. I could never get into Linux From Scratch because it was more focused on getting an entire system working when I really just wanted to see how it spins up to get going.
Then at some point the other week I realized I could technically have a working Linux "system" with nothing more than a kernel and a dirt simple hello world program in /sbin/init.
I haven't had the time or inclination to scratch that itch but it's nice to see this article confirm it.
And have that be a shell script which starts whatever you need. You'll probably want fsck in there, mount -a, some syslogd, perhaps dbus, some dhcp client, whatever else you need, and finally the getty which is probably a good idea to respawn after it exits. That's usually the job of init so you could well end your rc with exec /sbin/init
This is a really clean write up, but it is absolutely a happy path. I do feel the kernel is too big to be called a program. It is almost everything you want from comp sci class, router, scheduler, queue, memory manager. There are some interesting things that you have to handle if you do not run and OS and init on hardware e.g. handle signals, how do you shutdown, reap child process. I believe you are always better off with an init process and an OS.
the author's apparent epiphany is realizing that init is just a program. the kernel is, of course, software as well, but it does injustice to both "program" and "kernel" to lump them together.
> I do feel the kernel is too big to be called a program.
I kind of agree, but the kernel as a program serves a pedagogical framing here.
The goal of the post is to make it more tangible for developers, they write programs that are files on the disk, and you can interact with them. That's where the analogy came from.
In the early days when the kernel was small (I used to build kernels and copy them to floppy disks, and boot Linux from there) the kernel was called 'vmlinux', and when compression was added after the kernel started to get bigger it became 'vmlinuz'. It was still possible to boot from 'vmlinux', and it may be possible today as well, for all I know.
I would say something a little different. The kernel is a _library_ that has an init routine you can provide the function for. Or put another way, without the kernel your go program would have to have drivers statically compiled into it. This was the world of DOS, btw.
I agree with your point, but I must correct you on DOS: it had device drivers too. :) That's how we used to access mouse input, CD drives, network, extended memory, etc. Yes, it sucked on the graphics and sound; every app basically had to reimplement its own graphics and audio layer from scratch, but the rest was quite abstracted away.
There were generic VESA SVGA drivers towards the end of the MS-DOS era.
Sound blaster(16) also came close to being standard enough that games could just support that.
Extrapolating I think MS-DOS was on a nice trajectory to having complete enough (and reasonably simple and non-bloated!) APIs for everything important, when it was killed off. Late MS-DOS 32-bit games were usually trivial to install and run.
More importantly, a kernel is a platform. Conceptually it isn't that much different than other platforms such as Chrome or Roblox. They all have to care about the lifecycle of content, expose input events to content, allow content to render things, make sure bad things don't happen when running poorly programmed or malicous content, etc.
Yeah no. An operating system kernel doesn't just act as a host for userland processes, it interacts with hardware. Hardware behaves in weird and unexpected ways, can be quite hard to debug, can fail, etc.
This is why Linux is excellent. Users of other operating systems often remind people to update their device drivers. A non-technical Linux responds asking what the heck device drivers are. To the casual user, device drivers become invisible because they work exactly as intended.
Stupid question, but what does the default init program do? If I have a single application (say a game), can I just set up the file system, statically link my game and bundle it as an iso, rather than say containerising it?
On Linux, the default init program is usually systemd. The main job of the default init program is typically to be a process manager. That is, it starts other programs and can restart them if they crash. Since it's the first process to start (PID 1), if it exits the kernel can't continue and will panic, usually followed by a reboot.
Containers work similarly, except that they don't take the whole system down when their PID 1 exits. That's why containers often don't have a process manager inside, but Linux based operating systems do.
In theory yes, though depending on the complexity of your game you may need to bundle a lot of userspace libraries and other programs along with your kernel to make it work. Most graphical applications expect a display server like X11 or Wayland to talk to, at minimum.
Absolutely, and the init system does not even have to set up the filesystem and all. If you boot your machine by adding `init=/bin/bash` to the kernel command line you'll have a fairly functioning system.
Do anything necessary from there to boot your game, and record those steps in a script. When that's done you can just point your init cmdline to that script (doesn't even have to be a binary, a script with the #!/bin/bash shebang should just work).
The init program is just the first process (PID 1) that the kernel starts. It starts other stuff and cleans up zombie processes.
For a single game: yes, you can absolutely just make your game PID 1. No need for systemd or anything else. When the game exits, the kernel panics and reboots.
ISO vs container: ISO boots on bare metal with your own kernel. Container needs a host kernel and runtime. If you're making a dedicated game appliance, the ISO approach works fine - simpler actually,
since you skip all the container orchestration machinery.
Thank you for this quite perfect blog post (short, interesting, well written). One subject I would be interested in is what are all the parameters a kernel accepts
Another cool way to show that 'the Linux kernel as "just a program"' is that you can also run the kernel as a regular binary without needing QEMU to emulate a full system:
Author here. It was a bit emotional seeing this on the front page.
My goal with this post and the whole (work in progress) series is to fill the gap between "here are the commands to do X" and "if you want to contribute to the kernel, you need to learn this" style books and tutorials.
I want something in between, for developers who just want a solid mental model of how Linux fits together.
The rough progression I have in mind is:
1. the Linux kernel as "just a program"
2. system calls as the kernel's API
3. files as resources manipulated through system calls, forming a consistent API
4. the filesystem hierarchy as a namespace system, not a direct map of disk layout
5. user/group IDs and permissions as the access control mechanism for resources (files)
6. processes, where all of the above comes together
I deliberately chose Go for the examples instead of C because I want this to be approachable to a broader audience of developers, while still being close enough to the OS to show what's really going on.
As a developer, this kind of understanding has been incredibly useful for me for writing better software, debugging complex issues with tools like strace and lsof, or the proc fs. I would like to help others to gain the same knowledge.
Can you also consider adapting Linux from scratch as a part of this series? Or Maybe after this series, you can expand what is learnt to build a minimal Linux distribution. I suppose that might give a good understanding on how to apply this knowledge and a have a foundation on the internals of the os itself.
I guess also one of the points of using Go was the fact it has own memory management for obtaining memory pages it interacts only with the kernel.
I mean, had you used C, it would be better to compile it statically, otherwise you'd need to put also glibc and ld.so and what else into the initrd, I guess
Another "interesting" related thing I found is that pid 1 signals are handled differently in the kernel. Basically, SIGTERM is ignored and you need to explicitly handle it in your program. Took me quite a while before I found out why my program in a container didn't quit gracefully...
Talos Linux [1], "the Kubernetes Operating System", is written in Go. That means it exactly works as the little demo here, where the Kernel hands over to a statically compiled Go code as init script.
Talos is really an interesting linux distribution because it has no classical user space, i.e. there is no such thing as a $PATH including /bin, /usr/bin, etc. The shell is instead a network API, following the kubernetes configuration-as-code paradigm. The linux host (node) is supposed to run containerized applications. If you really want to, you can use a special container to get access to the actual user space from the node.
I also use Talos, but I wonder if just using systemd for the init process wouldn't have been easier. You can interface with systemd in go quite easily anyways...
s6 (perhaps with s6-rc) is another interesting option. One could say it’s less opinionated than systemd. Or perhaps it’s more correct to say it has another set of opinions.
> When the kernel starts it does not have all of the parts loaded that are needed to access the disks in the computer, so it needs a filesystem loaded into the memory called initramfs (Initial RAM filesystem).
The kernel might not have all the parts needed to mount the filesystem, especially on a modern Linux distro that supports a wide variety of hardware.
Initramfs exists so that parts of the boot logic can be handled in userspace. Part of this includes deciding which device drivers to load as kernel modules, using something like udev.
Another part is deciding which root filesystem to mount. The root FS might be on an LVM volume that needs to be configured with device-mapper, or unlocked with decrypt. Or it might be mounted over a network, which in turn requires IP configuration and authentication. You don't want the kernel to have those mechanisms hard-coded, so initramfs allows handling them in userspace.
But strictly speaking, you don't need any of that for a minimal system. You can boot without initramfs at all, as long as no special userspace setup is required. i.e., the root FS is a plain old disk partition specified on the kernel command line, and the correct drivers (e.g. for a SCSI/SATA hard drive) are already linked into the kernel.
This was 20yrs ago. Gentoo was really a great teacher.
I have had a bit of a dream of building a full desktop operating system around seL4 [1], with all drivers in user space and the guts fully verified in Isabelle, but learning about this level of code kind of feels like drinking from a firehose. I would like to port over something like xserver and XFCE and go from there, but I've never made a proper attempt because of how overwhelming it feels.
[1] I know about sculpt and Genode, and while those are interesting, not quite what I want.
Play a bit with user mode linux [1] the kernel becomes literally a linux program, that I believe you can even debug with gdb (hazy memory as I tried uml last time maybe a decade ago)
In theory you can also attach gdb to qemu running linux, but that's more complicated.
[1] https://en.wikipedia.org/wiki/User-mode_Linux
The fact that it's also a product/program is some brainfucky exercise that might either be an interesting hobby thought experiment OR it might be a very relevant nuance that will be useful to the top 0.1% of professionals who need a 99.9% accuracy, like the difference between classical and relativistic mechanics.
I mean, sure you are right that kernels are programs and that money is a product, and that gravity is not a force. But I am a mere mortal and I will stick to my incorrect and incomplete mental model at a small expense of accuracy to the great advantage of being comprehensible.
https://gokrazy.org/
That is: this seemed like the first 3 minutes of the first lecture on an freshman OS course, or similar in any book on systems. The complication you refer to - is it just from the clutter of adjacent words (EFI, grub, kmod maybe?)
One thing that could be improved is that the author could break down some of the commands, and explain what their arguments mean. For example:
> mknod rootfs/dev/console c 5 1
Depending on the reader's background, the args 'c', '5', and '1' can look arbitrary and not mean much. Of course, we can just look those up, and it doesn't make the article worse.
There is also "b" for block device (e.g. a disk, a partition, or even something like a loopback device) and "p" for FIFOs (similar to mkfifo).
The two numbers are just identifiers to specify the device, so in case of `5 1` it means the systems tty console, while `1 8` would mean "blocking random byte device" (mknod dev/random c 1 8)
1. /sbin/init
2. /etc/init
3. /bin/init
4. /bin/sh
It dropping you into a shell is a pretty neat little way to allow recovery if you somehow really borked your init
> Bailing out, you are on your own. Good luck.
https://unix.stackexchange.com/questions/96720
The kernel has a different error message: "No working init found. Try passing init= option to kernel."[2]
1: https://github.com/archlinux/mkinitcpio/blob/2dc9e12814aafcc... 2: https://github.com/torvalds/linux/blob/d358e5254674b70f34c84...
Then at some point the other week I realized I could technically have a working Linux "system" with nothing more than a kernel and a dirt simple hello world program in /sbin/init.
I haven't had the time or inclination to scratch that itch but it's nice to see this article confirm it.
the author's apparent epiphany is realizing that init is just a program. the kernel is, of course, software as well, but it does injustice to both "program" and "kernel" to lump them together.
I kind of agree, but the kernel as a program serves a pedagogical framing here.
The goal of the post is to make it more tangible for developers, they write programs that are files on the disk, and you can interact with them. That's where the analogy came from.
https://docs.google.com/presentation/d/1rAAyOTCsB8GLbMgI0CAb...
https://www.youtube.com/watch?v=69Zy77O-BUM
Thank you. I have always wondered that.
And updated domain: https://mustafaakin.dev/posts/2016-02-08-writing-my-own-init...
Sound blaster(16) also came close to being standard enough that games could just support that.
Extrapolating I think MS-DOS was on a nice trajectory to having complete enough (and reasonably simple and non-bloated!) APIs for everything important, when it was killed off. Late MS-DOS 32-bit games were usually trivial to install and run.
Completely agree with this framing. We will get there by the end of the series.
This is why Linux is excellent. Users of other operating systems often remind people to update their device drivers. A non-technical Linux responds asking what the heck device drivers are. To the casual user, device drivers become invisible because they work exactly as intended.
Purely academic.
Containers work similarly, except that they don't take the whole system down when their PID 1 exits. That's why containers often don't have a process manager inside, but Linux based operating systems do.
Do anything necessary from there to boot your game, and record those steps in a script. When that's done you can just point your init cmdline to that script (doesn't even have to be a binary, a script with the #!/bin/bash shebang should just work).
https://docs.kernel.org/admin-guide/kernel-parameters.html
- https://www.kernel.org/doc/html/v5.9/virt/uml/user_mode_linu...
I’d love a similarly styled part two that dives into making a slightly useful distro from “scratch” in go.
maybe the audience is people who've never heard of init or thought about kernel vs userspace.
My goal with this post and the whole (work in progress) series is to fill the gap between "here are the commands to do X" and "if you want to contribute to the kernel, you need to learn this" style books and tutorials.
I want something in between, for developers who just want a solid mental model of how Linux fits together.
The rough progression I have in mind is:
1. the Linux kernel as "just a program"
2. system calls as the kernel's API
3. files as resources manipulated through system calls, forming a consistent API
4. the filesystem hierarchy as a namespace system, not a direct map of disk layout
5. user/group IDs and permissions as the access control mechanism for resources (files)
6. processes, where all of the above comes together
I deliberately chose Go for the examples instead of C because I want this to be approachable to a broader audience of developers, while still being close enough to the OS to show what's really going on.
As a developer, this kind of understanding has been incredibly useful for me for writing better software, debugging complex issues with tools like strace and lsof, or the proc fs. I would like to help others to gain the same knowledge.
That said, this series will also give you practical, applicable knowledge as we progress.
I guess also one of the points of using Go was the fact it has own memory management for obtaining memory pages it interacts only with the kernel.
I mean, had you used C, it would be better to compile it statically, otherwise you'd need to put also glibc and ld.so and what else into the initrd, I guess
https://raby.sh/sigterm-and-pid-1-why-does-a-container-linge...
Talos is really an interesting linux distribution because it has no classical user space, i.e. there is no such thing as a $PATH including /bin, /usr/bin, etc. The shell is instead a network API, following the kubernetes configuration-as-code paradigm. The linux host (node) is supposed to run containerized applications. If you really want to, you can use a special container to get access to the actual user space from the node.
[1] https://www.talos.dev/ [2] https://github.com/siderolabs/talos/releases/tag/v1.11.5
Go seemed a perfect fit, it is easy to pick up the syntax and see what is going on, but you can still be close to the OS.
The only reason is because I like to be explicit, and I could not know what was set before in the user's environment.
From https://news.ycombinator.com/item?id=41270425 re: "MiniBox, ultra small busybox without uncommon options":
> There's a pypi:SystemdUnitParser.
> docker-systemctl-replacement > systemctl3.py parses and schedules processes defined in systemd unit files: https://github.com/gdraheim/docker-systemctl-replacement/blo...
From a container2wasm issue about linux-wasm the other day: https://github.com/container2wasm/container2wasm/issues/550#... :
> [ uutils/uucore, uutils/coreutils, uutils/procps, uutils/util-linux, findutils, diffutils, toybox (C), rustybox, ]