5 Jun 2014

From pre-history to beyond the global thermonuclear war

This is a short and vague glimpse into the interfaces that the Linux kernel offers to user space for display and graphics management, from their history to what is hot and new, and to what might perhaps be coming after. The topic became current for me when I started preparing Weston for the global thermonuclear war.

The pre-history


In the age of dragons, kernel mode setting did not exist. There was only user space mode setting, where the job of the kernel driver (if any) was simply to give user space direct access to the graphics card registers. A user space driver (well, Xorg video DDX, really, err... or what it was at the time of XFree86) would then poke the card registers to set a mode. The kernel had no idea of anything.

The kernel DRM infrastructure was started as an out-of-tree kernel module to coordinate between multiple programs wanting to access the graphics card's resources. Later it was (partially?) merged into the kernel tree (the year is a lie, 2.3.18 came out in 1999), and much, much later it was finally deleted from the libdrm repository.

The middle age


For some time, the kernel DRM existed alongside user space mode setting. It was a dark time, full of crazy hacks to keep it all together with duct tape, barbed wire and luck. GPUs and hardware-accelerated OpenGL started to come up.

The new age


With the advent of kernel mode setting (KMS), the DRM kernel drivers took charge of the graphics card's resources: outputs, video modes, memory allocations, hotplug! User space mode setting became obsolete and was eventually killed. The kernel driver was finally, actually, in control of the graphics hardware.

KMS probably started with just setting the main framebuffer (primary plane) for each "CRTC" and programming the video mode. CRTC stands for "cathode-ray tube controller", but it essentially means a block that reads memory (a framebuffer) and produces a bitstream according to video mode timings. The bitstream is directed into an "encoder", which turns it into a proper physical/analogue signal, like VGA or digital DVI. The signal then exits the graphics card through a "connector". CRTC, encoder, and connector are the basic concepts in the KMS API. Quite often these can be combined in certain restricted ways, like a single CRTC feeding two encoders for clone mode.
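
To make these concepts concrete, here is a minimal sketch (mine, not from any KMS documentation) of listing these objects with libdrm; it assumes a KMS-capable driver and that the device node is /dev/dri/card0.

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    int main(void)
    {
        /* Open the DRM device node; the path is an assumption. */
        int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
        if (fd < 0)
            return 1;

        /* Fetch the lists of CRTCs, encoders and connectors. */
        drmModeRes *res = drmModeGetResources(fd);
        if (!res) {
            close(fd);
            return 1;
        }

        printf("%d CRTCs, %d encoders, %d connectors\n",
               res->count_crtcs, res->count_encoders, res->count_connectors);

        for (int i = 0; i < res->count_connectors; i++) {
            drmModeConnector *conn =
                drmModeGetConnector(fd, res->connectors[i]);
            if (!conn)
                continue;
            printf("connector %u: %s, %d modes\n", conn->connector_id,
                   conn->connection == DRM_MODE_CONNECTED ?
                       "connected" : "disconnected",
                   conn->count_modes);
            drmModeFreeConnector(conn);
        }

        drmModeFreeResources(res);
        close(fd);
        return 0;
    }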

Even ancient hardware supported hardware cursors: a small sprite that was composited into the outgoing video signal on the fly, which meant that it was very cheap to move around. The cursor, being so special and often using a funny color format (alpha!), got its very own DRM ioctls.
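
In libdrm those ioctls are wrapped by drmModeSetCursor() and drmModeMoveCursor(); a rough sketch of using them, where fd, crtc_id and the ARGB buffer object handle are assumed to exist already:

    /* Attach a 64x64 ARGB buffer object as the cursor image... */
    drmModeSetCursor(fd, crtc_id, cursor_bo_handle, 64, 64);

    /* ...then moving it is a tiny ioctl, no framebuffer repaint needed. */
    drmModeMoveCursor(fd, crtc_id, pointer_x, pointer_y);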

There were also hardware overlays (additional or secondary planes) on some hardware. While the primary framebuffer covers the whole display, an overlay is another buffer (just like the cursor) that gets mixed into the bitstream at the CRTC level. It is like basic compositing done on the scanout hardware level. Overlays usually had additional benefits, for example they could apply scaling or color space conversion (hello, video players) very efficiently. Overlays being different, they too got their very own DRM ioctls.
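
For illustration, a sketch of putting an overlay on screen via libdrm's drmModeSetPlane() wrapper; the IDs are assumed to have been discovered beforehand, and the source rectangle is given in 16.16 fixed point:

    /* Scan out a 640x360 source framebuffer, scaled up to 1280x720 on the
     * CRTC; this kind of scaling is something overlays often do for free. */
    drmModeSetPlane(fd, plane_id, crtc_id, fb_id, 0,
                    0, 0, 1280, 720,             /* destination on the CRTC */
                    0, 0, 640 << 16, 360 << 16); /* source rect, 16.16 fixed point */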

The KMS user space ABI was anything but atomic. In the X11 tradition, it wasn't too important how the displays got updated, as long as the end result eventually was what you wanted. Race conditions in content updates didn't matter too much either, as X was racy as hell anyway. You update the CRTC. Then you update each overlay. You might update the cursor, too. With luck, all these updates hit the same vblank. Or not. Or you don't hit vblank at all, and get tearing. No big deal, as X was essentially all about front-buffer rendering anyway. (And then there were huge efforts in trying to fix it all up with X, GLX, Mesa and GL-compositors, and avoid tearing, and it ended up complicated.)
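
Roughly, that legacy flow is a sequence of independent ioctls, each on its own schedule; a sketch with assumed IDs:

    /* Each call is a separate ioctl; nothing ties them to the same vblank. */
    drmModeSetCrtc(fd, crtc_id, primary_fb, 0, 0, &connector_id, 1, &mode);
    drmModeSetPlane(fd, overlay_id, crtc_id, overlay_fb, 0,
                    x, y, w, h, 0, 0, w << 16, h << 16);
    drmModeMoveCursor(fd, crtc_id, cursor_x, cursor_y);

    /* Later, flipping just the primary framebuffer to a new one,
     * asking for an event when the flip actually happens: */
    drmModePageFlip(fd, crtc_id, next_fb, DRM_MODE_PAGE_FLIP_EVENT, NULL);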

With the advent of X compositing managers, which did not play well with the awkward X11 protocol (Xv) or the hardware overlays, and with the rise of GPU power and OpenGL, it was thought that hardware overlays would eventually die out. It turned out the benefits of hardware overlays were too great to abandon, and with Wayland we again have a decent chance to make the most of them while still enjoying compositing.

The global thermonuclear war (named after a git branch by Rob Clark)


The quality of display updates became important. People do not like tearing. Someone actually wanted to update the primary framebuffer and the overlays on the same vblank, guaranteed. And the cursor as the cherry on top.

We needed one ABI to rule them all.

Universal planes bring framebuffers (primary planes), overlays (secondary planes) and cursors (cursor planes) together under the same API. No more type-specific ioctls; instead, common ioctls shared by them all. As these objects are still somewhat different, with overlays having wildly differing features and vendors wanting to expose their own stuff, object properties were invented.
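
From user space this looks roughly like the following sketch: once the universal planes client capability is enabled, the single plane list covers primary, overlay and cursor planes alike (fd is an already-open DRM device):

    /* Ask the kernel to expose primary and cursor planes in the plane list. */
    drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1);

    drmModePlaneRes *planes = drmModeGetPlaneResources(fd);
    for (uint32_t i = 0; planes && i < planes->count_planes; i++) {
        drmModePlane *p = drmModeGetPlane(fd, planes->planes[i]);
        if (!p)
            continue;
        printf("plane %u: possible CRTCs bitmask 0x%x\n",
               p->plane_id, p->possible_crtcs);
        drmModeFreePlane(p);
    }
    drmModeFreePlaneResources(planes);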

An object property is essentially a {key, value} pair. In the API, the name of a key is a string. Each object has its own set of keys. To use a key, you must know it by name, fetch the handle, and then use the handle when setting the value. Handles seem to be per-object, so make sure to fetch them separately for each.
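
A small helper sketch for that name-to-handle lookup; the property name "FB_ID" is just an illustrative example, and fd plus the object IDs are assumed:

    static uint32_t find_prop_id(int fd, uint32_t obj_id, uint32_t obj_type,
                                 const char *name)
    {
        drmModeObjectProperties *props =
            drmModeObjectGetProperties(fd, obj_id, obj_type);
        uint32_t id = 0;

        for (uint32_t i = 0; props && i < props->count_props; i++) {
            drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[i]);
            if (prop && strcmp(prop->name, name) == 0)
                id = prop->prop_id;
            drmModeFreeProperty(prop);
        }
        drmModeFreeObjectProperties(props);
        return id;
    }

    /* Fetch the handle once per object, then reuse it when setting values. */
    uint32_t fb_prop = find_prop_id(fd, plane_id, DRM_MODE_OBJECT_PLANE, "FB_ID");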

Atomic mode setting and nuclear pageflip are two sides of the same feature. Atomicity is achieved by gathering a set of property changes, and then pushing them all into the kernel in a single ioctl call. Then that call either succeeds or fails as a whole. Libdrm offers a drmModePropertySet for gathering the changes. Everything is exposed as properties: the attached FB, overlay position, video mode, etc.
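
For a feel of the flow, a sketch using the atomic request API as it eventually shipped in libdrm (drmModeAtomicReq); the drmModePropertySet mentioned above worked along the same lines. All the IDs are assumed to have been looked up beforehand.

    drmModeAtomicReq *req = drmModeAtomicAlloc();

    /* Gather any number of property changes on any objects... */
    drmModeAtomicAddProperty(req, plane_id, fb_prop, new_fb);
    drmModeAtomicAddProperty(req, plane_id, crtc_x_prop, 100);
    drmModeAtomicAddProperty(req, cursor_plane_id, cursor_fb_prop, cursor_fb);

    /* ...then push them all in with one ioctl: the whole set is applied,
     * or none of it is. */
    int ret = drmModeAtomicCommit(fd, req, 0, NULL);

    drmModeAtomicFree(req);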

Atomic mode setting means setting the output modes of a single graphics device, more or less. Devices may have hard-to-express limitations. A simple example is the available scanout memory bandwidth: you can drive either two mid-resolution outputs, or one high-resolution output. Or maybe some CRTC-encoder-connector combination is not possible alongside a particular combination for another output. Collecting the video mode, encoder and connector setup over the whole graphics card into a single operation avoids flicker: either the whole set succeeds, or it fails. Without atomic mode setting, changing multiple outputs would not only take longer, but if some step failed, you'd have to undo all the earlier steps (and hope the undo steps don't fail). Plus, there would be no way to easily test whether a certain combination is possible. Atomic mode setting fixes all this.
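
With the same atomic API, that testing looks roughly like this sketch: a test-only commit validates the whole configuration without touching the hardware (req is a fully populated request as above):

    /* Ask "would this work?" without changing anything on the screen. */
    int ret = drmModeAtomicCommit(fd, req,
                                  DRM_MODE_ATOMIC_TEST_ONLY |
                                  DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
    if (ret == 0) {
        /* The combination is possible; commit it for real. */
        ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
    }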

Nuclear pageflip is about synchronizing the update of a single output (monitor) and making that atomic. This means that when user space wants to update the primary framebuffer, move the cursor, and update a couple of overlays, all those changes happen at the same vblank. Again it all either succeeds or fails. "Every frame is perfect."
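
A sketch of such a flip with the eventual atomic API: several planes are updated in one request, a completion event is asked for, and the event is read back once the flip has landed on a vblank (the handler and the IDs are illustrative assumptions):

    /* One request that swaps the primary, an overlay and the cursor. */
    drmModeAtomicAddProperty(req, primary_plane, fb_prop, next_primary_fb);
    drmModeAtomicAddProperty(req, overlay_plane, overlay_fb_prop, next_overlay_fb);
    drmModeAtomicAddProperty(req, cursor_plane, cursor_fb_prop, next_cursor_fb);

    /* Commit it, asking for an event when the flip completes. */
    drmModeAtomicCommit(fd, req, DRM_MODE_PAGE_FLIP_EVENT, NULL);

    /* Read and dispatch the event, typically after poll() says the DRM fd
     * is readable; page_flip_handler is whatever function you registered. */
    drmEventContext evctx = {
        .version = DRM_EVENT_CONTEXT_VERSION,
        .page_flip_handler = page_flip_handler,
    };
    drmHandleEvent(fd, &evctx);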

And then there shall be ponies (at the end of the rainbow)


Once the global thermonuclear war is over, we have the perfect ABI for driving display updates.

Well, almost. Enter Nvidia G-Sync, or AMD's FreeSync, which is actually backed by a VESA standard: dynamically variable refresh rate. We have no way yet of timing display updates in DRM. All we can do is kick out a display update, and it will hopefully land on the next vblank, whenever that is, but we cannot tell DRM when we would like it to happen. Everything so far assumes that the display refresh rate is a constant, apart from an explicit mode switch. Though I have heard that e.g. Chrome for Intel (i915, LVDS/eDP reclocking) has some hacks that opportunistically drop the refresh rate to save power.

There is also a culprit in the DRM of today (Jun 3rd, 2014). You can schedule a pageflip, but if that framebuffer still has rendering pending on the same GPU that is presenting it, the pageflip will not happen until the rendering completes. And you do not know when it will complete, which means you do not know whether you will hit the very next vblank or something later.

If the rendering GPU is not the same graphics device that presents the framebuffer, you do not get synchronization at all. That means that you may be scanning out an incomplete rendering for a frame or two, or you have to stall the GPU to make sure it is done before scheduling the page flip. This should be fixed with the fences related to dma-bufs (Hi, Maarten Lankhorst).

And so the unicorn keeps on running.

7 comments:

Ismael said...

This was very educational, thank you.
I have a little question: how do G-Sync/FreeSync work? Can the CRTC just push frames to the monitor at any time, or is it just a seamless transition between 30/45/60 Hz?

R.S. said...

Unfortunately (or not) since the latest x.org versions I have become quite an "expert" in V-Sync / G-Sync / V-blank and all of that jazz, simply because my lovely and expensive GTX 650 doesn't want to play nice with X / Gnome / KDE / XFCE / or anything remotely related to GL, and so I had to choose between living in the present (Ubuntu 14 / Gnome 3.12 / Rolling Releases) and have a bunch of tearing on everything that moves on everywhere and anywhere on my screen, or just do what I ended up doing, to live in the past (Ubuntu 12) and be able to see things on the screen move flawlessly, like it's 2014!.... : (
... well, 2015 it's just around the corner, so who knows? maybe then I'll be able to go to the present then!
... So... maybe....yes!.. a Global Thermonuclear War is really necessary to solve all tearing in the World (of Linux, at least) !!!!
... What are they waiting for ???? FIRE!!!!!

pq said...

Ismael, I haven't really looked deeper into G-Sync nor FreeSync or the VESA standard it relies on, so I can't really tell yet, but at least from Nvidia's marketing info I got the impression that you could decide per-frame when to start scanning it out, as long as you keep within a certain frequency range.

Anonymous said...

You left quite a large portion of modesetting history out. Absolutely nothing from before Q3 2008 for instance. Not a mention as to how we got from "It's all in the BIOS, and it is impossible to do differently" to where the term modesetting came from.

Anonymous said...

This is illuminating! Thank you for taking the time to write this insightful blog.

pq said...

libv, yes, that was intentional, as I knew nothing about that time. "Short and vague", just like I wrote. ;-)

Anonymous said...

Pq: well, it is rather important in this respect. There would've been a full property based system in 2006 if things hadn't been shot down (http://libv.livejournal.com/13443.html). RandR1.2 only gained "attributes" later on, when keithp first encountered a tv encoder. The initial kms header (and it was just a header) was nothing more than the short sighted RandR1.2 thrown into the kernel.