ADR-014: Compositor ↔ Scanout-Driver Protocol

Status: Accepted
Date: 2026-04-15
Depends on: ADR-005 (IPC channels), ADR-011 (Graphics architecture)
Related: ADR-012 (input flow into compositor; same modularity pattern, different transport).
Supersedes: N/A

Problem

ADR-011 specifies the layered graphics stack (compositor in the middle, gpu-driver below, clients above) and names the seam between compositor and gpu-driver as a “scanout channel per display.” It does not specify the protocol crossing that seam: what messages flow, who allocates the scanout buffer, how display capabilities are advertised, what happens on hotplug or mode change, how fallback to a non-GPU backend works, and how the compositor remains agnostic to which scanout backend is bound.

The pressure to specify it now is two-pronged. First, the compositor implementation is about to start (Phase GUI-3, this session). It will talk this protocol on day one whether or not the protocol is written down. If it isn’t written down, the protocol is whatever the compositor’s first IPC happens to look like, and the future virtio-gpu / Intel-UHD / Limine-FB backends will have to retrofit themselves to that accident. Second, the user is about to start a parallel build effort targeting real hardware (Dell Precision 3630, Intel UHD). For that effort to slot into the compositor without cross-contamination — without the compositor needing to know it’s talking to Intel — the protocol has to exist as a contract before either side codes against the other.

This ADR writes the contract. The compositor will speak it; every scanout backend (user/scanout-virtio-gpu, user/scanout-intel, user/scanout-limine, future) will implement it. The compositor links no backend; the scanout-driver knows nothing about windows. The seam is the protocol.

Module Boundary

The split, restated in load-bearing form:

┌────────────────────────────────────────────────────────────┐
│ user/compositor                                             │
│  Knows: client surfaces, window state, focus, z-order,     │
│         damage tracking, input routing, scanout protocol.  │
│  Doesn't know: framebuffer addresses, GPU registers,       │
│         hardware-specific buffer formats, vsync mechanics, │
│         which scanout backend is bound today.              │
└────────────────────────────────────────────────────────────┘
                          ↕ scanout protocol (this ADR)
┌────────────────────────────────────────────────────────────┐
│ user/scanout-<name>     One per backend. Examples:          │
│  • user/scanout-virtio-gpu  (QEMU, dev/CI)                  │
│  • user/scanout-intel       (Dell 3630, bare metal)         │
│  • user/scanout-limine      (linear-FB fallback)            │
│  Knows: hardware MMIO, GPU command submission, scanout     │
│         buffer alignment/tiling, vsync, hotplug, EDID,     │
│         scanout protocol.                                   │
│  Doesn't know: window state, clients, focus, damage policy,│
│         compositor's internal data structures.             │
└────────────────────────────────────────────────────────────┘
                          ↕ MapMmio / AllocDma / WaitIrq /
                            SYS_MAP_FRAMEBUFFER (fallback only)
┌────────────────────────────────────────────────────────────┐
│ KERNEL                                                      │
└────────────────────────────────────────────────────────────┘

The compositor links exactly zero backend code. Backend selection is a runtime probe. There is no if cfg!(target = "intel") { ... } in the compositor. Cross-contamination between modules is a bug, not a tradeoff.

Decision

Connection topology

At most one compositor process per system. Singleton. Identified by Principal at boot (similar to POLICY_SERVICE_PID for the policy service).
At most one scanout-driver process per system. The compositor pairs with exactly one. Multiple physical displays are surfaced by the same driver, not by multiple drivers. Multi-GPU topologies (integrated + discrete, dual-GPU workstation) are deferred to a future ADR — see Open Questions § “Two GPUs in one machine (multi-driver topology)” for the deferred design discussion. Until that ADR exists, the singleton-driver assumption is load-bearing for the protocol; revisit when a real multi-GPU target appears.
Pairing is at startup, not per-frame. Driver registers; compositor binds; from then on the pair is fixed for the boot.
Both sides are signed boot modules holding bound Principals. Pairing handshake verifies Principals against each other (driver’s Principal is in compositor’s trusted-driver list and vice versa) — concrete authority list lives in the policy service eventually; for v0 the trust list is compiled into both processes.

Two transports per the ADR-005 control + bulk pattern

Following ADR-005:

Control IPC (256-byte messages, capability-checked, identity-stamped) for protocol messages: handshake, capability advertisement, frame-ready notifications, hotplug events, mode changes, frame-displayed acknowledgments. Low frequency (handshake once, hotplug rarely, frame-ready ~120/sec per display).
Shared-memory channels (ADR-005 § Data Channels) for scanout buffers themselves. One channel per active display. The bytes never go through the kernel after channel attach.

Endpoints

SCANOUT_DRIVER_ENDPOINT — fixed endpoint number (proposed 27, first free after the ADR-005 channel syscalls). The active scanout-driver registers this endpoint at boot. The compositor sends control messages here.
COMPOSITOR_ENDPOINT — fixed endpoint number (proposed 28). The compositor registers this. The scanout-driver sends async events (hotplug, frame-displayed) here.

These join the existing well-known endpoint table (16 = fs-service, 17 = key-store, etc.). Both numbers are documented in the ADR-005 sense: they are part of the OS interface, not internal to either service.

Scanout buffer ownership

The driver allocates, the compositor writes. Reasoning: the driver knows hardware constraints (DMA alignment, GPU-visible memory, tile ordering for some GPUs, NO_CACHE attributes for linear FBs). The compositor knows nothing about hardware. Putting allocation on the driver keeps the compositor hardware-agnostic.

Mechanism:

On display connect (or at handshake for displays already connected), the driver creates a shared-memory channel (SYS_CHANNEL_CREATE) sized for that display’s full scanout (pitch × height bytes), with peer_principal = compositor's Principal, role = Producer-from-compositor's-perspective — meaning the compositor side is RW, the driver side is RO. The driver gets the compositor-facing ChannelId.
Driver sends ScanoutBufferAllocated { display_id, channel_id, geometry, format } to compositor over control IPC.
Compositor attaches (SYS_CHANNEL_ATTACH(channel_id)) and writes pixels into the mapped region.
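
The sizing step in the mechanism above can be sketched as follows. This is a minimal sketch, assuming 4 KiB pages (which is what ADR-005's MAX_CHANNEL_PAGES = 65536 covering 256 MiB implies); the helper names scanout_bytes and scanout_pages are hypothetical and are not taken from libsys or any driver.

```rust
/// Assumed page size: 65536 pages covering 256 MiB implies 4 KiB pages.
const PAGE_SIZE: u64 = 4096;
/// Channel size ceiling quoted from ADR-005.
const MAX_CHANNEL_PAGES: u64 = 65_536;

/// Bytes needed for one display's full scanout: pitch x height.
fn scanout_bytes(pitch: u32, height: u32) -> u64 {
    u64::from(pitch) * u64::from(height)
}

/// Pages the driver would request for the channel, or None if the
/// buffer is empty or would exceed the channel size ceiling.
fn scanout_pages(pitch: u32, height: u32) -> Option<u64> {
    let bytes = scanout_bytes(pitch, height);
    if bytes == 0 {
        return None;
    }
    let pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE; // round up to whole pages
    (pages <= MAX_CHANNEL_PAGES).then_some(pages)
}
```

For a 3840 × 2160 mode with a 15360-byte pitch this comes to 8100 pages, well inside the channel ceiling.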

The scanout buffer is single-buffered in this protocol. The compositor’s own back-buffer-as-mirror pattern (per ADR-011) lives entirely on the compositor side. The driver sees one buffer per display; what flows in is whatever the compositor put there at the moment of FrameReady. If a backend wants double-buffering for tear-free presentation, it does that internally (e.g., an Intel driver might allocate two scanout planes and flip between them; the compositor never sees that). This keeps the protocol minimal and simplifies fallback backends.

Display capability advertisement

At handshake (and on hotplug), the driver enumerates displays and announces each:

struct DisplayInfo {              // packs into the 256-byte control IPC
    display_id: u32,              // stable per physical port for this boot
    state: DisplayState,          // Connected | Disconnected
    physical_geometry: Geometry,  // current mode: width, height, pitch, bpp
    backing_scale: u16,           // 1×=100, 2×=200, fractional×=125, etc.
    refresh_hz: u16,
    pixel_format: PixelFormat,    // XRGB8888 | BGRA8888 | ARGB2_10_10_10 (HDR) | ...
    capabilities: u32,            // bitfield: HDR_HDR10, VRR, partial_update, ...
    edid_hash: [u8; 32],          // Blake3 of full EDID, for identity/fingerprinting
}

Mode lists (alternative resolutions / refresh rates per display) are larger than 256 bytes, so they are queried on demand via a separate request: QueryDisplayModes { display_id } returns a mode count, followed by a paginated mode walk. The full design is left to the implementation; this ADR commits only to “modes are queryable, not pushed.”
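
One way to keep the “packs into the 256-byte control IPC” claim honest is a compile-time size check. The sketch below is illustrative only: the field widths of Geometry, the u32 representation of the two enums, and their discriminant values are assumptions, since this ADR deliberately leaves binary layouts to the implementation.

```rust
use std::mem::size_of;

#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DisplayState { Disconnected = 0, Connected = 1 } // values are assumptions

#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PixelFormat { Xrgb8888 = 0, Bgra8888 = 1, Argb2101010 = 2 } // values are assumptions

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Geometry { width: u32, height: u32, pitch: u32, bpp: u32 } // widths assumed

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DisplayInfo {
    display_id: u32,
    state: DisplayState,
    physical_geometry: Geometry,
    backing_scale: u16, // 100 = 1x, 200 = 2x
    refresh_hz: u16,
    pixel_format: PixelFormat,
    capabilities: u32,
    edid_hash: [u8; 32],
}

const CONTROL_MSG_BYTES: usize = 256;
const TAG_BYTES: usize = 4;

// Fails the build if the struct plus the 4-byte tag outgrows one control message.
const _: () = assert!(TAG_BYTES + size_of::<DisplayInfo>() <= CONTROL_MSG_BYTES);
```

Under these assumed widths the struct is 68 bytes, leaving ample room for protocol evolution.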

Frame lifecycle

compositor                            scanout-driver
    |                                       |
    |--- write pixels into scanout ch ----->|  (no kernel involvement;
    |    (direct memory, MMU enforces)      |   see ADR-005)
    |                                       |
    |--- FrameReady{display, damage[]} ---->|  (control IPC, 256B)
    |                                       |
    |                                       | program hardware to scan/flip;
    |                                       | wait for vsync/completion IRQ
    |                                       |
    |<-- FrameDisplayed{display, time_ns} --|  (control IPC, 256B)
    |                                       |

damage is a small list of dirty rectangles (≤ 16 rects per message; if more, compositor sends “full surface dirty”). Drivers may use the damage list to optimize (partial scanout, region-of-interest copy) or ignore it (full-frame flip). Either is conformant — damage is a hint, not a constraint.
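
The 16-rect rule can be sketched on the compositor side as below. The Rect field widths and the Damage enum are hypothetical illustrations; only the limit of 16 and the “full surface dirty” fallback come from this ADR.

```rust
const MAX_DAMAGE_RECTS_PER_FRAME: usize = 16;

/// Hypothetical rect layout; the ADR does not fix field widths.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Rect { x: u16, y: u16, w: u16, h: u16 }

#[derive(Debug, PartialEq, Eq)]
enum Damage<'a> {
    /// At most MAX_DAMAGE_RECTS_PER_FRAME dirty rectangles.
    Rects(&'a [Rect]),
    /// Too many rects to list: the whole surface is treated as dirty.
    FullSurface,
}

/// Picks what the compositor would put into FrameReady for this frame.
fn damage_for_frame(dirty: &[Rect]) -> Damage<'_> {
    if dirty.len() > MAX_DAMAGE_RECTS_PER_FRAME {
        Damage::FullSurface
    } else {
        Damage::Rects(dirty)
    }
}
```

Because damage is a hint, a driver that ignores the list and flips the full frame stays conformant with either variant.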

FrameDisplayed carries the wall-time of presentation in tick units (kernel GetTime units). Compositor uses this for animation timing and vsync alignment without needing direct hardware access.

The compositor SHOULD wait for FrameDisplayed on a display before sending the next FrameReady for that same display (back-pressure). Drivers MAY drop a FrameReady that arrives while a previous frame is still in flight, returning FrameDropped { display, reason } instead of acking; the compositor uses this signal to slow its render loop.
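
A minimal sketch of the compositor-side back-pressure bookkeeping, assuming one in-flight frame per display. FramePacer and its method names are hypothetical; the ADR only requires the SHOULD/MAY behavior described above.

```rust
const MAX_DISPLAYS_PER_DRIVER: usize = 8;

/// Tracks, per display, whether a FrameReady is awaiting its answer.
#[derive(Default)]
struct FramePacer {
    in_flight: [bool; MAX_DISPLAYS_PER_DRIVER],
}

impl FramePacer {
    /// Returns true if a FrameReady may be sent now for `display`,
    /// and marks the display as having a frame in flight.
    fn try_begin_frame(&mut self, display: usize) -> bool {
        if let Some(slot) = self.in_flight.get_mut(display) {
            if !*slot {
                *slot = true;
                return true;
            }
        }
        false // unknown display, or previous frame still in flight
    }

    /// Call on FrameDisplayed or FrameDropped: the display is free again.
    fn frame_finished(&mut self, display: usize) {
        if let Some(slot) = self.in_flight.get_mut(display) {
            *slot = false;
        }
    }
}
```

Displays are paced independently here, so a slow display does not stall frames for the others.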

Hotplug + mode change events

Driver-initiated, sent to COMPOSITOR_ENDPOINT:

DisplayConnected { display_id, info: DisplayInfo, scanout_channel_id } — new display attached and a scanout buffer is already allocated and ready for compositor to attach. Compositor responds by SYS_CHANNEL_ATTACH-ing and beginning to render to it.
DisplayDisconnected { display_id } — physical disconnect or driver-side teardown. Compositor stops rendering to it and detaches the channel. Driver closes the channel after the compositor detaches.
DisplayModeChanged { display_id, new_info: DisplayInfo, new_scanout_channel_id } — user changed resolution, new HDR mode negotiated, etc. New buffer is allocated; compositor switches over and detaches the old.

Compositor-initiated, sent to SCANOUT_DRIVER_ENDPOINT:

RequestModeChange { display_id, requested_mode } — user setting a new resolution from the OS UI. Driver responds with DisplayModeChanged (success) or ModeRejected { reason } (failure).
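
The driver-initiated events above reduce to one update rule on the compositor side: record the new channel (if any) and detach the old one (if any). The sketch below illustrates that rule; ChannelId as u64, the DisplayTable type, and the omission of the DisplayInfo payload are simplifying assumptions.

```rust
const MAX_DISPLAYS_PER_DRIVER: usize = 8;

type ChannelId = u64; // hypothetical representation

#[derive(Debug, PartialEq, Eq)]
enum ScanoutEvent {
    DisplayConnected { display_id: u32, scanout_channel_id: ChannelId },
    DisplayDisconnected { display_id: u32 },
    DisplayModeChanged { display_id: u32, new_scanout_channel_id: ChannelId },
}

/// Which scanout channel the compositor currently has attached per display.
#[derive(Default)]
struct DisplayTable {
    attached: [Option<ChannelId>; MAX_DISPLAYS_PER_DRIVER],
}

impl DisplayTable {
    /// Applies one event and returns the old channel to detach, if any.
    fn apply(&mut self, ev: &ScanoutEvent) -> Option<ChannelId> {
        let (id, new) = match *ev {
            ScanoutEvent::DisplayConnected { display_id, scanout_channel_id } => {
                (display_id, Some(scanout_channel_id))
            }
            ScanoutEvent::DisplayDisconnected { display_id } => (display_id, None),
            ScanoutEvent::DisplayModeChanged { display_id, new_scanout_channel_id } => {
                (display_id, Some(new_scanout_channel_id))
            }
        };
        let slot = self.attached.get_mut(id as usize)?; // ignore out-of-range ids
        std::mem::replace(slot, new)
    }
}
```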

Capability model

New CapabilityKind (lands when this protocol is implemented, not now):

ScanoutDriverRegister — authorizes a process to claim the singleton scanout-driver role. Granted at boot to the compiled-in scanout driver service.
CompositorRegister — authorizes a process to claim the singleton compositor role. Granted at boot to user/compositor.
MapFramebuffer (already exists, ADR-011) — required by the fallback scanout-limine driver, not by the compositor.
MapMmio / AllocDma / WaitIrq / LegacyPortIo — required by hardware-touching scanout drivers (virtio-gpu, intel) per their access needs. Compositor needs none of these.

The compositor’s complete kernel-syscall surface is: RegisterEndpoint, RecvMsg/Write (IPC), ChannelCreate/Attach/Close (for client surface buffers), Yield, GetTime, Print. No hardware access, no MMIO, no DMA. This is a load-bearing property — if the compositor ever needs MapMmio or MapFramebuffer, the modular boundary has been violated.

Trait abstraction in the compositor (userspace dyn dispatch)

The compositor’s internal API for talking to whichever scanout-driver is bound:

trait ScanoutBackend {
    fn enumerate_displays(&self) -> &[DisplayInfo];
    fn attach_scanout(&mut self, display_id: u32) -> Result<ScanoutBuffer, ScanoutError>;
    fn submit_frame(&mut self, display_id: u32, damage: &[Rect]) -> Result<(), ScanoutError>;
    fn poll_event(&mut self) -> Option<ScanoutEvent>;
    fn request_mode(&mut self, display_id: u32, mode: Mode) -> Result<(), ScanoutError>;
}

// At compositor startup, after probing which scanout-driver registered:
let backend: Box<dyn ScanoutBackend> = match probe_scanout_backend()? {
    Backend::VirtioGpu => Box::new(VirtioGpuBackend::attach()?),
    Backend::Intel     => Box::new(IntelGpuBackend::attach()?),
    Backend::Limine    => Box::new(LimineFbBackend::attach()?),
};

The dyn dispatch is intra-process, in the compositor only. Each Box<dyn ScanoutBackend> is a thin wrapper over an IPC client to the scanout-driver service; the actual rendering work happens in the scanout-driver process. This is userspace dyn dispatch (Box<dyn ScanoutBackend> in user/compositor), explicitly allowed under CLAUDE.md’s verification stance because:

Verification scope is the kernel, not userspace. The “no trait objects in kernel hot paths” rule in CLAUDE.md applies to src/. User-space services are programs the kernel runs; they are subject to the capability model and audit, not formal verification.
The cost is invisible. A vtable indirect call is a few cycles. Each submit_frame call it dispatches is an IPC roundtrip (thousands of cycles) plus hardware programming (more). The dispatch overhead is rounding error.
The choice is genuinely runtime. Compositor probes hardware at startup and binds the right backend. Generic monomorphization would require shipping multiple compositor binaries or compile-time backend selection — neither acceptable when the dev environment includes QEMU, AArch64, and bare-metal Dell concurrently.

The scanout-driver service itself is monomorphic — it talks to one specific hardware family with concrete types throughout.

Fallback rules + non-local-display backends

No backend baked into the compositor. If no scanout-driver registers within SCANOUT_DRIVER_HANDSHAKE_TIMEOUT (5 seconds at 100Hz tick) of compositor startup, the compositor logs an error and enters Headless mode — accepts client connections, composes to memory, never displays. This is a real mode (useful for screen sharing / capture without a local display) and the cleanest semantics for “no scanout available.”
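
The timeout rule can be sketched as a pure function of tick counts. StartupState and startup_state are hypothetical names; the 500-tick value and the 100Hz tick come from this ADR.

```rust
const TICK_HZ: u64 = 100;
const SCANOUT_DRIVER_HANDSHAKE_TIMEOUT: u64 = 500; // ticks

// 500 ticks at 100Hz is the 5 seconds quoted in the text.
const _: () = assert!(SCANOUT_DRIVER_HANDSHAKE_TIMEOUT / TICK_HZ == 5);

#[derive(Debug, PartialEq, Eq)]
enum StartupState {
    Waiting,  // still inside the handshake window
    Bound,    // a scanout-driver registered
    Headless, // window expired with no driver
}

/// Decides the compositor's startup state from GetTime-style tick values.
fn startup_state(start_tick: u64, now_tick: u64, driver_registered: bool) -> StartupState {
    if driver_registered {
        StartupState::Bound
    } else if now_tick.saturating_sub(start_tick) >= SCANOUT_DRIVER_HANDSHAKE_TIMEOUT {
        StartupState::Headless
    } else {
        StartupState::Waiting
    }
}
```

In this sketch a late registration still yields Bound, so Headless is not a terminal state.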

The Limine fallback is its own service: user/scanout-limine. It uses SYS_MAP_FRAMEBUFFER to map the linear framebuffer Limine provided, and copies compositor-written scanout buffer regions into it on FrameReady. No hardware, no DMA, no IRQs — just memory copies. Good for QEMU dev sessions and as a sanity backend when bringing up new hardware.

Non-local-display backends fit the same protocol. The compositor doesn’t know what a “display” is physically — it talks to a scanout-driver advertising display geometries. Any service that can fulfill that contract is a valid backend:

Remote display (RDP/VNC/SPICE/Looking-Glass-style) — user/scanout-remote would advertise virtual displays over the wire and ship pixels over a network channel rather than a local PCIe bus. Compositor sees ordinary displays; the wire is somebody else’s problem. This means remote access lands as a backend, not as a special compositor mode — the modular boundary holds.
KVM switch / monitor lid close / display sleep — these are hotplug events. When the user flips the KVM, the now-disconnected display becomes DisplayDisconnected; on the other side it becomes DisplayConnected. Lid close on a laptop is the same shape: a DisplayDisconnected { display_id = lid_panel } event from the driver. Compositor reacts identically to physical unplug. No protocol extension needed.
Headless server with on-demand attach — boot in Headless mode, accept a remote-display backend connection later, suddenly the compositor has displays. Same DisplayConnected flow as a hotplug. The 5-second handshake timeout governs when Headless mode is entered, not whether the compositor can leave it later.

The pattern: any change in display availability — physical, virtual, remote, switched — is a DisplayConnected / DisplayDisconnected event from the active scanout-driver. The compositor has one event-handling code path for all of them.

What this protocol does not cover yet: display power state (active vs. DPMS-suspended vs. off-but-still-attached). For v0 we model power-off as disconnect + reconnect — coarse but works. A finer-grained DisplayPowerState event lands when a real workload demands it (laptop suspend/resume, energy-saving display blanking). Listed in Open Questions.

Wire encoding

All control-IPC messages carry a 4-byte tag at offset 0 indicating message type, followed by a packed payload. Layouts are #[repr(C)] little-endian, designed to fit in 256 bytes with room for protocol evolution. Detailed binary layouts go in the implementation, not this ADR — the contract is “what messages exist and what they mean,” not “what bit goes where” (which would calcify too early).
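
As an illustration of the framing only (not of any real message layout), the tag-plus-payload rule could look like the sketch below. The function names and the example tag value are hypothetical.

```rust
const CONTROL_MSG_BYTES: usize = 256;
const TAG_BYTES: usize = 4;

/// Writes a little-endian 4-byte tag at offset 0, then the payload.
/// Returns the buffer and the number of bytes used, or None if it will not fit.
fn encode(tag: u32, payload: &[u8]) -> Option<([u8; CONTROL_MSG_BYTES], usize)> {
    let len = TAG_BYTES.checked_add(payload.len())?;
    if len > CONTROL_MSG_BYTES {
        return None;
    }
    let mut buf = [0u8; CONTROL_MSG_BYTES];
    buf[..TAG_BYTES].copy_from_slice(&tag.to_le_bytes());
    buf[TAG_BYTES..len].copy_from_slice(payload);
    Some((buf, len))
}

/// Splits a received message into its tag and payload bytes.
fn decode(msg: &[u8]) -> Option<(u32, &[u8])> {
    if msg.len() < TAG_BYTES || msg.len() > CONTROL_MSG_BYTES {
        return None;
    }
    let tag = u32::from_le_bytes([msg[0], msg[1], msg[2], msg[3]]);
    Some((tag, &msg[TAG_BYTES..]))
}
```

With a 4-byte tag, the largest payload that fits in one control message is 252 bytes.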

Reserved bounds (lands with implementation, listed here for visibility)

MAX_DISPLAYS_PER_DRIVER = 8 (matches MAX_FRAMEBUFFERS in src/boot/mod.rs)
MAX_DAMAGE_RECTS_PER_FRAME = 16 (fits in 256-byte control message; above this the compositor sends “full surface dirty”)
SCANOUT_DRIVER_HANDSHAKE_TIMEOUT = 500 ticks (5 seconds at 100Hz tick) — after which compositor enters Headless mode.

All three will get full SCAFFOLDING tags + ASSUMPTIONS.md rows when they enter code.

Rationale

Why a separate ADR rather than appending to ADR-011. ADR-011 specifies the stack (what processes exist, what each one’s job is). This ADR specifies the protocol (the wire contract between two of those processes). Different lifetime — ADR-011’s stack design is settled-ish; the protocol will evolve with each new backend. Better to keep the protocol’s evolution in its own divergence record than to muddy ADR-011’s settled design with iterative protocol notes.

Why driver-allocates rather than compositor-allocates the scanout buffer. Hardware constraints dominate: GPU memory may need to be DMA-aligned, tiled, in specific physical address ranges, marked uncacheable. The compositor knows none of that. Putting allocation on the driver keeps the compositor’s syscall surface to the IPC + channel primitives — no AllocDma, no MapMmio. The driver is the only side that ever touches the hardware allocator.

Why singleton driver + singleton compositor. Two compositors fighting over input focus and z-order would be incoherent. Two scanout drivers competing for the same display would be a configuration bug. Singleton-by-Principal is the cleanest enforcement — only one process holds the registration capability — and matches the existing pattern (one policy service, one fs-service, one key-store).

Why control IPC + shared-memory channels rather than one or the other. ADR-005’s pattern: the kernel mediates policy (capability checks per IPC send), not bytes. Frame data is bytes (megabytes per frame at 4K) — must not go through kernel-mediated copies. Frame metadata is policy (which display, what damage, when displayed) — small, structured, identity-stamped, capability-checked, exactly the control IPC’s job. Splitting them is the same call ADR-005 already made for video; this ADR just applies the call.

Why pixels never go through the kernel after channel attach. This is the load-bearing performance property. A 4K @ 120Hz scanout is roughly 32 MiB × 120 ≈ 3.75 GiB/sec per display; three displays come to about 11.25 GiB/sec. Kernel-mediated copies cannot scale to that. MMU-enforced shared memory does: it runs at full memory bandwidth, with no per-byte kernel involvement. The capability check at channel-attach time is the single per-channel security decision; everything after is hardware-enforced.
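
The bandwidth estimate can be reproduced with a few lines of arithmetic. The text rounds one frame up to 32 MiB; the sketch below instead assumes an exact 3840 × 2160 frame at 4 bytes per pixel, which lands slightly lower at about 3.7 GiB/sec per display. Both function names are hypothetical.

```rust
/// Raw scanout bandwidth in bytes per second for one display.
fn scanout_bytes_per_sec(width: u64, height: u64, bytes_per_pixel: u64, hz: u64) -> u64 {
    width * height * bytes_per_pixel * hz
}

/// Converts a byte count to GiB (2^30 bytes) for readability.
fn as_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

The order of magnitude, not the exact digit, is what makes per-byte kernel copies untenable.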

Why ride on the existing ADR-005 channels rather than invent new IPC primitives. Channels already do exactly what we need: capability-gated, MMU-enforced shared memory between two named Principals, with role-based access (Producer/Consumer/Bidirectional). Inventing a “graphics channel” alongside the existing channel would be duplicate design surface for no benefit. The graphics use case is one of the workloads ADR-005’s MAX_CHANNEL_PAGES = 65536 (256 MiB) was sized for; see ADR-011 § Numeric bounds raised.

Phased implementation

Phase	What lands	Prerequisites
Scanout-0 (this ADR)	Protocol contract written. No code.	—
Scanout-1	`user/compositor` scaffold: process, libsys boot module, `ScanoutBackend` trait, no-op render loop, `Headless` mode reachable. No real backend.	This ADR.
Scanout-2	`user/scanout-limine` — simplest backend. Maps Limine framebuffer, copies compositor’s scanout to it on `FrameReady`. Validates the protocol end-to-end if the `SYS_MAP_FRAMEBUFFER` stall (STATUS.md known issue) is fixed first.	Scanout-1; SYS_MAP_FRAMEBUFFER stall resolution.
Scanout-3	First GUI client: simple “hello-window” boot module that opens a window and draws a colored rectangle. End-to-end validation.	Scanout-2.
Scanout-4	`user/scanout-virtio-gpu` — first hardware-accelerated backend. Validates the protocol against a real (well, emulated) GPU.	Scanout-3 + virtio-gpu spec implementation work.
Scanout-5	`user/scanout-intel` — Dell 3630 bare metal target. The user’s parallel build effort lands here; the compositor accepts it without modification because the protocol contract held.	Scanout-4 (architecture proven) + Intel UHD driver work (substantial).

Scanout-1, the compositor scaffold, is what lands now. The rest follow.

Open Questions

Two GPUs in one machine (multi-driver topology)

Future workstations may have an integrated GPU + discrete GPU, each driving different displays. Today’s “one scanout-driver per system” rule precludes this. Options when the time comes: (a) allow multiple scanout-driver registrations, each owning a disjoint display set, with the compositor multiplexing across them; (b) a “scanout-driver multiplexer” service that wraps multiple physical drivers behind one protocol endpoint. Lean toward (b), since it keeps the compositor protocol unchanged. Out of scope until a real multi-GPU target appears.

Per-display scanout vs. unified scanout

This ADR assumes one scanout buffer per display. An alternative (“unified scanout”) would have a single virtual desktop scanout that the driver crops/distributes per display. Unified scanout is simpler for the compositor (one buffer to write) but constrains the driver (display geometries must align). Per-display is what every modern OS does; sticking with it.

Color management / HDR pipeline

The protocol carries pixel_format and HDR capability bits, but doesn’t specify color-space conversion responsibility. Compositor in linear-light? Driver in display color-space? Open until first HDR backend lands.

Display power state (DPMS-equivalent)

v0 models display power as binary: connected = on, disconnected = off. Real systems have richer states — DPMS standby/suspend/off, laptop lid close (physical disconnect vs. logical sleep), variable-backlight, etc. A finer-grained DisplayPowerState event would let the compositor stop submitting frames to a sleeping display without losing the display’s state and capabilities. Lands when a battery-life or laptop-suspend workload demands it.

Surface forwarding for direct scanout

A future optimization: compositor tells driver “this client surface IS the scanout buffer for this display, no compositing needed” (e.g., fullscreen video, fullscreen game). Eliminates the compositor copy. Not in v0; design when a real workload demands it.

Divergence

2026-04-20 — Scanout-4.a / 4.b landed; modern virtio-pci transport plumbed through a new kernel syscall

What changed from the plan. The original ADR said the scanout-driver “knows hardware MMIO” and is otherwise free to discover its device however it wants. The 4.a implementation took a specific path worth pinning down so future modern-virtio drivers inherit it:

Kernel-side PCI cap parsing. Modern virtio-pci devices (device IDs 0x1040..=0x107F) advertise their register structures through vendor-specific capabilities (cap_vndr == 0x09, virtio spec §4.1.4). Parsing those caps in userspace would require exposing raw PCI config-space reads through the syscall surface — a broader privilege than scanout drivers should hold. Instead, pci::scan() now parses virtio-modern caps at boot and stashes the (BAR, offset, length) triples for common/notify/isr/device-cfg plus notify_off_multiplier on every PciDevice. The parser is a pure function on a 256-byte config-space snapshot, so host-side unit tests can exercise it without hardware.
SYS_VIRTIO_MODERN_CAPS = 38 as the boundary. A new identity-required (not capability-gated) syscall writes the parsed VirtioModernCaps struct to a user buffer. It landed with ADR-020 UserWriteSlice<'ctx> from day one, making it the first syscall built on the typed user-buffer slice. Cost to drivers: one syscall + one struct. Compared with the old “every driver walks PCI caps itself” path, the kernel owns the spec knowledge and the driver owns only its device’s register semantics.
Single-BAR simplifying assumption in scanout-virtio-gpu. QEMU’s virtio-gpu-pci packs all four cap structures into one BAR (BAR 4 on -device virtio-gpu-pci, BAR 2 on -vga virtio). 4.a’s transport layer refuses to init if any cap points at a different BAR (InitError::CapsSpanMultipleBars). Multi-BAR support is deferred; the observable trigger is “a real device splits the structures across BARs.” Both QEMU virtio-gpu shapes work because the driver reads caps.common_cfg.bar at runtime rather than hardcoding an index.
Frame path is double-copy in 4.b. The ADR said “driver allocates, compositor writes.” It did not commit to a single-copy path. 4.b’s implementation accepts a double copy (compositor → channel RAM pages → driver’s alloc_dma backing → virtio-gpu’s internal TRANSFER copy) because channel pages are RAM and RESOURCE_ATTACH_BACKING needs DMA-contiguous physical addresses. A zero-copy path requires a new kernel primitive — either a DMA-backed channel flag on SYS_CHANNEL_CREATE, or a share_dma_with_principal syscall. Both are non-trivial design decisions. Revisit when: compositor frametime exceeds the memcpy budget (measurable: ~240 MB/s per display at 4 MiB × 60 Hz, invisible on QEMU); OR the first real-hardware scanout driver port (Intel UHD, Scanout-5) where DMA-cache behavior makes the copy cost actually visible.
QEMU flag reality: -vga virtio replaces -vga std + -device virtio-gpu-pci. 4.a added -device virtio-gpu-pci as a secondary adapter alongside the default cirrus VGA. Visible output still came from the cirrus side (driven by scanout-limine). 4.b required the visible side to be virtio-gpu; -vga virtio is the single-adapter shape that makes virtio-vga (same device ID, same driver-side probe) the primary display so Cocoa/GTK show what our driver programs. Documented in Makefile run / run-gui comments.
scanout-limine coexistence resolved by boot-manifest swap, not runtime probe. The plan mentioned scanout-limine as a fallback but didn’t specify how the driver is chosen. 4.b removes scanout-limine.elf from limine.conf; the crate remains buildable (make scanout-limine) for future manual runs without virtio-gpu. Runtime probing (“if virtio-gpu not present, fall back to Limine FB”) is a 4.c concern when a real host actually needs it.
hello-window (Scanout-3) paints green, composited over compositor’s blue-green test frame. The ADR doesn’t commit to initial colors; this is implementation detail worth noting because the make run-gui observable is “blue-green canvas + green rectangle” (the compositor test frame uses cyan-ish background, hello-window is pure green).
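
The single-BAR check from the list above can be sketched as follows. Only caps.common_cfg.bar and InitError::CapsSpanMultipleBars are named in this ADR; the other field names and the CapRegion type are assumptions made for illustration.

```rust
/// One (BAR, offset, length) triple as parsed by the kernel.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CapRegion { bar: u8, offset: u32, length: u32 }

/// Hypothetical shape of the struct returned by SYS_VIRTIO_MODERN_CAPS.
#[derive(Clone, Copy, Debug)]
struct VirtioModernCaps {
    common_cfg: CapRegion,
    notify_cfg: CapRegion,
    isr_cfg: CapRegion,
    device_cfg: CapRegion,
    notify_off_multiplier: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum InitError { CapsSpanMultipleBars }

/// Returns the one BAR all four structures live in, read at runtime
/// from the caps rather than hardcoded, or refuses to init.
fn single_bar(caps: &VirtioModernCaps) -> Result<u8, InitError> {
    let bar = caps.common_cfg.bar;
    let others = [caps.notify_cfg.bar, caps.isr_cfg.bar, caps.device_cfg.bar];
    if others.iter().all(|&b| b == bar) {
        Ok(bar)
    } else {
        Err(InitError::CapsSpanMultipleBars)
    }
}
```

Because the BAR index is taken from the caps, the same check accepts both QEMU shapes (BAR 4 and BAR 2).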

What did NOT diverge. The ScanoutBackend trait shape in the compositor stayed exactly as specified. The control-IPC + shared-memory-channel split per ADR-005 stayed. Endpoint numbers (27, 28) stayed. Both sides register-by-Principal at boot. Single compositor, single scanout driver, pairing at startup — all as designed.

Phases still deferred, with triggers.

Scanout-4.c — damage-rect-aware partial TRANSFER/FLUSH; zero-copy frame path; multi-display; hotplug; EDID; RequestModeChange; aarch64/riscv64 scanout-virtio-gpu builds.
Scanout-5 (Intel UHD) — bare-metal Dell target. Untouched.
Remote / headless backends — still in the Open Questions section above; no work started.

2026-05-08 — Client ↔ compositor protocol formalized (libgui-proto)

What changed from the plan. ADR-014’s § Module Boundary names “client surfaces, window state, focus, z-order” as things the compositor knows but does not specify the wire protocol crossing the seam between GUI clients and the compositor. The C9-C12c chain (titlebar drag + focus border + cursor + edge-grab resize, all landed 2026-05-07 / 05-08) wrote that protocol into user/libgui-proto/; this appendix ratifies it inside ADR-014 rather than spinning a separate ADR. Promotion criterion below.

Endpoints.

COMPOSITOR_ENDPOINT = 28 — registered by the compositor at boot. Every libgui client sends control messages here. Already defined in ADR-014 § Endpoints; reused rather than duplicated.
COMPOSITOR_INPUT_ENDPOINT = 30 — registered by the compositor at boot specifically for input drivers (per ADR-012). Compositor decodes raw 96-byte InputEvents from this endpoint and forwards to the focused window via the InputEvent libgui-proto message below — the pump_input_once pipeline.

Both numbers are part of the OS interface (same sense as SCANOUT_DRIVER_ENDPOINT = 27) and live as pub const in libgui-proto.

Message tag convention. 16-bit tags split by direction:

0x3xxx — client → compositor (request).
0x4xxx — compositor → client (reply or async event).

Same shape as ADR-014’s existing 0x1xxx (compositor → scanout-driver) / 0x2xxx (scanout-driver → compositor) split, just on a different seam. Direction is enforced by handle_client_payload rejecting any 0x4xxx tag with InvalidMessage (peer can’t send a reply-direction message).
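
The tag convention lends itself to a check on the top nibble. The sketch below is an illustration, not the handle_client_payload implementation: it is stricter than the text requires, rejecting every tag outside 0x3xxx rather than only 0x4xxx, and all names except InvalidMessage are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Direction {
    CompositorToDriver, // 0x1xxx
    DriverToCompositor, // 0x2xxx
    ClientToCompositor, // 0x3xxx
    CompositorToClient, // 0x4xxx
}

/// Classifies a 16-bit tag by its top nibble.
fn direction_of(tag: u16) -> Option<Direction> {
    match tag >> 12 {
        0x1 => Some(Direction::CompositorToDriver),
        0x2 => Some(Direction::DriverToCompositor),
        0x3 => Some(Direction::ClientToCompositor),
        0x4 => Some(Direction::CompositorToClient),
        _ => None,
    }
}

#[derive(Debug, PartialEq, Eq)]
struct InvalidMessage;

/// Accepts only request-direction tags on the client-facing endpoint.
fn check_client_tag(tag: u16) -> Result<(), InvalidMessage> {
    match direction_of(tag) {
        Some(Direction::ClientToCompositor) => Ok(()),
        _ => Err(InvalidMessage),
    }
}
```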

Messages (v1, ten total).

Tag	Name	Direction	Purpose
`0x3001`	`CreateWindow`	client → compositor	Allocate a window + surface channel. v2 layout (20 B): width, height, z_order, alpha_blend, reply_endpoint. v0 (12 B) + v1 (16 B) layouts still decode for legacy clients.
`0x3010`	`FrameReady`	client → compositor	Client just wrote pixels; up to 16 damage rects (17 or more are sent as “full surface dirty” via header `damage_count = 0`).
`0x3020`	`DestroyWindow`	client → compositor	Owner-Principal teardown of a specific window.
`0x3030`	`DragWindowBy`	client → compositor	Title-bar drag from libgui’s `decorate()`: moves the window’s `(x, y)` in scanout coords by `(dx, dy)`. No surface realloc.
`0x3031`	`RequestResize`	client → compositor	Client-initiated resize; compositor runs the two-phase teardown protocol (ADR-027) and replies with `WindowResized`.
`0x4001`	`WelcomeClient`	compositor → client	Compositor’s reply to `CreateWindow`: window_id, surface channel id, geometry, pixel format.
`0x4010`	`WindowClosed`	compositor → client	Compositor-initiated teardown notice (currently unused; reserved for forced-close, e.g., compositor shutdown).
`0x4020`	`ErrorResponse`	compositor → client	Per-request error (TooManyWindows, InvalidDimensions, NoSuchWindow, InvalidMessage, CompositorShuttingDown, SurfaceAllocFailed).
`0x4030`	`InputEvent`	compositor → client	Forwarded input event (96-byte libinput-proto envelope) for the focused window. PointerButton coords translated to window-local before forward.
`0x4040`	`WindowResized`	compositor → client	Reply to `RequestResize` and to compositor-initiated edge-grab commits: new channel id + new geometry. Client closes its old surface mapping (returns `InvalidState` because the old slot is `Revoking`; libgui swallows it) and `channel_attach`es the new one.

Detailed binary layouts are intentionally not in this ADR — same call ADR-014 makes for the scanout-driver protocol (“the contract is what messages exist, not what bit goes where”). cambios-libgui-proto’s encoders/decoders are the authoritative wire definition.

Authority model.

Every C→C message is identity-stamped: the compositor reads sender_principal from the kernel’s recv_msg header (Window::owner_principal is set from the CreateWindow sender). Any subsequent message claiming a window_id whose owner_principal doesn’t match the kernel-stamped sender is rejected as NoSuchWindow (chosen over Forbidden to avoid leaking window-existence to non-owners).
C→C messages reach the compositor at COMPOSITOR_ENDPOINT = 28; the compositor replies to the client’s recorded client_endpoint (the first endpoint the client registered, looked up via the kernel’s REPLY_ENDPOINT map). CreateWindow.reply_endpoint != 0 overrides this for multi-layer clients whose reply queue isn’t the kernel-stamped first endpoint.
The drag/resize messages (0x3030 / 0x3031) are the only ones today where the compositor can also synthesize equivalent state changes itself — title-bar drag is libgui-driven, but resize-drag is compositor-driven (edge-grab UX, C12c). For that path the compositor calls commit_window_resize directly, bypassing the per-message Principal check (compositor is the trusted authority for edge-drag UX). The kernel-stamped check is preserved on the client-initiated path.
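The ownership rule above can be sketched as a single lookup that treats “wrong owner” and “no such window” identically. `Principal` is modelled here as an opaque integer and `authorize` is a hypothetical helper name; neither is taken from the compositor source.

```rust
/// Stand-in for the kernel-stamped sender identity (real type not specified here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Principal(pub u64);

/// Only the error code relevant to this check is modelled.
#[derive(Debug, PartialEq, Eq)]
pub enum ErrorCode {
    NoSuchWindow,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Window {
    pub window_id: u32,
    /// Set from the kernel-stamped sender of the original CreateWindow.
    pub owner_principal: Principal,
}

/// Resolve `window_id` on behalf of `sender`.
/// A window owned by someone else is reported exactly like a missing one,
/// so a non-owner cannot probe which window ids exist.
pub fn authorize(
    windows: &[Window],
    window_id: u32,
    sender: Principal,
) -> Result<&Window, ErrorCode> {
    windows
        .iter()
        .find(|w| w.window_id == window_id && w.owner_principal == sender)
        .ok_or(ErrorCode::NoSuchWindow)
}
```

Folding the id match and the owner match into one predicate is what makes the two failure cases indistinguishable to the caller. The compositor-driven edge-grab path would not go through this function at all, matching the bypass described above.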

Surface channel role convention.

SYS_CHANNEL_CREATE is called by the compositor with role = Consumer and peer_principal = client's Principal. From the kernel’s perspective the compositor reads (Consumer) and the peer (client) writes. In the user’s mental model, the client renders into the surface, and the compositor reads the rendered pixels and composites them into the scanout. Both ends are mapped RW: the kernel maps Consumer-role channels RW for the creator and RW for the peer, so the role is a documentation/audit hint, not a permission gate.
Surface channels are bound 1:1 to a Window. Resize replaces the channel atomically (begin_teardown → channel_create → WindowResized → complete_teardown, per ADR-027 § Phase 1). Ownership is the window’s owner_principal; revoke on owner exit goes through ADR-007 Divergence 7 tombstone-on-revoke.
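The resize ordering above (begin_teardown → channel_create → WindowResized → complete_teardown) can be expressed as a small ordering check. This is only a sketch of the sequence as listed here. It does not model ADR-027’s slot states or failure paths, and `ResizeStep`, `next_step`, and `is_valid_trace` are hypothetical names.

```rust
/// The four steps of a surface-channel replacement, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResizeStep {
    BeginTeardown,
    ChannelCreate,
    SendWindowResized,
    CompleteTeardown,
}

/// The step that must follow `done`; `None` input means nothing has run yet,
/// `None` output means the sequence is complete.
pub fn next_step(done: Option<ResizeStep>) -> Option<ResizeStep> {
    match done {
        None => Some(ResizeStep::BeginTeardown),
        Some(ResizeStep::BeginTeardown) => Some(ResizeStep::ChannelCreate),
        Some(ResizeStep::ChannelCreate) => Some(ResizeStep::SendWindowResized),
        Some(ResizeStep::SendWindowResized) => Some(ResizeStep::CompleteTeardown),
        Some(ResizeStep::CompleteTeardown) => None,
    }
}

/// True only if `trace` is the full sequence, in order, with no gaps or repeats.
pub fn is_valid_trace(trace: &[ResizeStep]) -> bool {
    let mut done = None;
    for &step in trace {
        if next_step(done) != Some(step) {
            return false;
        }
        done = Some(step);
    }
    next_step(done).is_none()
}
```

A check like this could back a debug assertion in the resize path, so that an out-of-order call (for example, sending WindowResized before the new channel exists) is caught in testing.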

Bounds (lands inside libgui-proto, listed here for visibility).

MAX_WINDOWS = 32 — compositor’s window-table size + libgui-proto’s wire-protocol upper bound.
MAX_WINDOW_DIMENSION = 8192 — per-axis cap; RequestResize rejects above this.
MAX_DAMAGE_RECTS_PER_FRAME = 16 — fits a FrameReady in 256 bytes; above this, clients send “full surface dirty.”
MAX_MESSAGE_SIZE = 256 — control-IPC message ceiling, matching ADR-005’s bound.

All four are SCAFFOLDING in libgui-proto/src/lib.rs with replace-when triggers tied to the multi-monitor / cluster-of-windows / 4K-Retina endgame work — same envelope ADR-011 sized the kernel-side bounds for.
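For visibility, the four bounds and two of the checks they imply might look like the following. The constant names and values are from the list above; the integer types, the helper names, the choice of InvalidDimensions as the error code, and the rejection of zero-sized axes are assumptions for illustration only.

```rust
pub const MAX_WINDOWS: usize = 32;
pub const MAX_WINDOW_DIMENSION: u32 = 8192;
pub const MAX_DAMAGE_RECTS_PER_FRAME: usize = 16;
pub const MAX_MESSAGE_SIZE: usize = 256;

/// Only the error code relevant to this check is modelled.
#[derive(Debug, PartialEq, Eq)]
pub enum ErrorCode {
    InvalidDimensions,
}

/// Per-axis cap check. Rejecting zero is an assumption; the list above
/// only states the upper bound.
pub fn check_dimensions(width: u32, height: u32) -> Result<(), ErrorCode> {
    let in_range = |v: u32| (1..=MAX_WINDOW_DIMENSION).contains(&v);
    if in_range(width) && in_range(height) {
        Ok(())
    } else {
        Err(ErrorCode::InvalidDimensions)
    }
}

/// Value a client would put in the FrameReady header for `rect_count` rects.
/// Above the cap the client sends 0, which means "full surface dirty".
pub fn header_damage_count(rect_count: usize) -> u8 {
    if rect_count > MAX_DAMAGE_RECTS_PER_FRAME {
        0
    } else {
        rect_count as u8 // safe: rect_count <= 16 here
    }
}
```

Note that under this sketch a frame with no damage rects also encodes as `damage_count = 0`, so the header value alone does not distinguish “nothing listed” from “too many to list”. Both are treated as a full-surface repaint.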

What did NOT diverge. ADR-014’s compositor↔scanout-driver protocol shape (ScanoutBackend trait, control-IPC + shared-memory split, endpoint numbers 27/28, register-by-Principal) is unchanged. The C↔C protocol layers on top of the same channel substrate (ADR-005) — no new IPC primitives. Capability surface stays as named in § Capability model (compositor still needs no MMIO, DMA, or framebuffer mappings).

Promotion trigger (when this becomes its own ADR). Lift the appendix into a standalone ADR — numbered at promotion time, not pre-allocated — when any of:

The protocol gains a 6th client→compositor or 6th compositor→client message beyond the v1 ten (five in each direction).
A non-libgui client implementation surfaces (alternate widget toolkit, native app bypassing libgui), making “libgui-proto is the de-facto definition” insufficient as the contract.
A trusted-overlay channel lands for auth UI (mentioned in ADR-011’s spoofing follow-up) — that’s a structurally different conversation between compositor and a privileged client and warrants its own protocol surface.

Until then, this appendix is the ratified record and the libgui-proto crate is the implementation.