Store Halfword Byte-Reverse Indexed

A Power Technical Blog

Fuzzing grub: part 1

Recently a set of 8 vulnerabilities were disclosed for the grub bootloader. I found 2 of them (CVE-2021-20225 and CVE-2021-20233), and contributed a number of other fixes for crashing bugs which we don't believe are exploitable. I found them by applying fuzz testing to grub. Here's how.

This is a multi-part series: I think it will end up being 4 posts. I'm hoping to cover:

  • Part 1 (this post): getting started with fuzzing grub
  • Part 2: going faster by doing lots more work
  • Part 3: fuzzing filesystems and more
  • Part 4: potential next steps and avenues for further work

Fuzz testing

Let's begin with part one: getting started with fuzzing grub.

One of my all-time favourite techniques for testing programs, especially programs that handle untrusted input, and especially-especially programs written in C that parse untrusted input, is fuzz testing. Fuzz testing (or fuzzing) is the process of repeatedly throwing randomised data at your program under test and seeing what it does.

(For the secure boot threat model, untrusted input is anything not validated by a cryptographic signature - so config files are untrusted for our purposes, but grub modules can only be loaded if they are signed, so they are trusted.)

Fuzzing has a long history and has recently received a new lease on life with coverage-guided fuzzing tools like AFL and more recently AFL++.

Building grub for AFL++

AFL++ is extremely easy to use ... if your program:

  1. is built as a single binary with a regular tool-chain
  2. runs as a regular user-space program on Linux
  3. reads a small input files from disk and then exits
  4. doesn't do anything fancy with threads or signals

Beyond that, it gets a bit more complex.

On the face of it, grub fails 3 of these 4 criteria:

  • grub is a highly modular program: it loads almost all of its functionality as modules which are linked as separate ELF relocatable files. (Not runnable programs, but not shared libraries either.)

  • grub usually runs as a bootloader, not as a regular app.

  • grub reads all sorts of things, ranging in size from small files to full disks. After loading most things, it returns to a command prompt rather than exiting.

Fortunately, these problems are not insurmountable.

We'll start with the 'running as a bootloader' problem. Here, grub helps us out a bit, because it provides an 'emulator' target, which runs most of grub functionality as a userspace program. It doesn't support actually booting anything (unsurprisingly) but it does support most other modules, including things like the config file parser.

We can configure grub to build the emulator. We disable the graphical frontend for now.

./bootstrap
./configure --with-platform=emu --disable-grub-emu-sdl

At this point in building a fuzzing target, we'd normally try to configure with afl-cc to get the instrumentation that makes AFL(++) so powerful. However, the grub configure script is not a fan:

./configure --with-platform=emu --disable-grub-emu-sdl CC=$AFL_PATH/afl-cc
...
checking whether target compiler is working... no
configure: error: cannot compile for the target

It also doesn't work with afl-gcc.

Hmm, ok, so what if we just... lie a bit?

./configure --with-platform=emu --disable-grub-emu-sdl
make CC="$AFL_PATH/afl-gcc" 

(Normally I'd use CC=clang and afl-cc, but clang support is slightly broken upstream at the moment.)

After a small fix for gcc-10 compatibility, we get the userspace tools (potentially handy!) but a bunch of link errors for grub-emu:

/usr/bin/ld: disk.module:(.bss+0x20): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here
/usr/bin/ld: regexp.module:(.bss+0x70): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here
/usr/bin/ld: blocklist.module:(.bss+0x28): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here

The problem is the module linkage that I talked about earlier: because there is a link stage of sorts for each module, some AFL support code gets linked in to both the grub kernel (kernel.exec) and each module (here disk.module, regexp.module, ...). The linker doesn't like it being in both, which is fair enough.

To get started, let's instead take advantage of the smarts of AFL++ using Qemu mode instead. This builds a specially instrumented qemu user-mode emulator that's capable of doing coverage-guided fuzzing on uninstrumented binaries at the cost of a significant performance penalty.

make clean
make

Now we have a grub-emu binary. If you run it directly, you'll pick up your system boot configuration, but the -d option can point it to a directory of your choosing. Let's set up one for fuzzing:

mkdir stage
echo "echo Hello sthbrx readers" > stage/grub.cfg
cd stage
../grub-core/grub-emu -d .

You probably won't see the message because the screen gets blanked at the end of running the config file, but if you pipe it through less or something you'll see it.

Running the fuzzer

So, that seems to work - let's create a test input and try fuzzing:

cd ..
mkdir in
echo "echo hi" > in/echo-hi

cd stage
# -Q qemu mode
# -M main fuzzer
# -d don't do deterministic steps (too slow for a text format)
# -f create file grub.cfg
$AFL_PATH/afl-fuzz -Q -i ../in -o ../out -M main -d -- ../grub-core/grub-emu -d .

Sadly:

[-] The program took more than 1000 ms to process one of the initial test cases.
    This is bad news; raising the limit with the -t option is possible, but
    will probably make the fuzzing process extremely slow.

    If this test case is just a fluke, the other option is to just avoid it
    altogether, and find one that is less of a CPU hog.

[-] PROGRAM ABORT : Test case 'id:000000,time:0,orig:echo-hi' results in a timeout
         Location : perform_dry_run(), src/afl-fuzz-init.c:866

What we're seeing here (and indeed what you can observe if you run grub-emu directly) is that grub-emu isn't exiting when it's done. It's waiting for more input, and will keep waiting for input until it's killed by afl-fuzz.

We need to patch grub to sort that out. It's on my GitHub.

Apply that, rebuild with FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION, and voila:

cd ..
make CFLAGS="-DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION"
cd stage
$AFL_PATH/afl-fuzz -Q -i ../in -o ../out -M main -d -f grub.cfg -- ../grub-core/grub-emu -d .

And fuzzing is happening!

afl-fuzz fuzzing grub, showing fuzzing happening

This is enough to find some of the (now-fixed) bugs in the grub config file parsing!

Fuzzing beyond the config file

You can also extend this to fuzzing other things that don't require the graphical UI, such as grub's transparent decompression support:

cd ..
rm -rf in out stage
mkdir in stage
echo hi > in/hi
gzip in/hi
cd stage
echo "cat thefile" > grub.cfg
$AFL_PATH/afl-fuzz -Q -i ../in -o ../out -M main -f thefile -- ../grub-core/grub-emu -d .

You should be able to find a hang pretty quickly with this, an as-yet-unfixed bug where grub will print output forever from a corrupt file: (your mileage may vary, as will the paths.)

cp ../out/main/hangs/id:000000,src:000000,time:43383,op:havoc,rep:16 thefile
../grub-core/grub-emu -d . | less # observe this going on forever

zcat, on the other hand, reports it as simply corrupt:

$ zcat thefile

gzip: thefile: invalid compressed data--format violated

(Feel free to fix that and send a patch to the list!)

That wraps up part 1. Eventually I'll be back with part 2, where I explain the hoops to jump through to go faster with the afl-cc instrumentation.

linux.conf.au 2020 recap

It's that time of year again. Most of OzLabs headed up to the Gold Coast for linux.conf.au 2020.

linux.conf.au is one of the longest-running community-led Linux and Free Software events in the world, and attracts a crowd from Australia, New Zealand and much further afield. OzLabbers have been involved in LCA since the very beginning and this year was no exception with myself running the Kernel Miniconf and several others speaking.

The list below contains some of our highlights that we think you should check out. This is just a few of the talks that we managed to make it to - there's plenty more worthwhile stuff on the linux.conf.au YouTube channel.

We'll see you all at LCA2021 right here in Canberra...

Keynotes

A couple of the keynotes really stood out:

Sean is a forensic structural engineer who shows us a variety of examples, from structural collapses and firefighting disasters, where trained professionals were blinded by their expertise and couldn't bring themselves to do things that were obvious.

There's nothing quite like cryptography proofs presented to a keynote audience at 9:30 in the morning. Vanessa goes over the issues with electronic voting systems in Australia, and especially internet voting as used in NSW, including flaws in their implementation of cryptographic algorithms. There continues to be no good way to do internet voting, but with developments in methodologies like risk-limiting audits there may be reasonably safe ways to do in-person electronic voting.

OpenPOWER

There was an OpenISA miniconf, co-organised by none other than Hugh Blemings of the OpenPOWER Foundation.

Anton (on Mikey's behalf) introduces the Power OpenISA and the Microwatt FPGA core which has been released to go with it.

Anton live demos Microwatt in simulation, and also tries to synthesise it for his FPGA but runs out of time...

Paul presents an in-depth overview of the design of the Microwatt core.

Kernel

There were quite a few kernel talks, both in the Kernel Miniconf and throughout the main conference. These are just some of them:

There's been many cases where we've introduced a syscall only to find out later on that we need to add some new parameters - how do we make our syscalls extensible so we can add new parameters later on without needing to define a whole new syscall, while maintaining both forward and backward compatibility? It turns out it's pretty simple but needs a few more kernel helpers.

There are a bunch of tools out there which you can use to make your kernel hacking experience much more pleasant. You should use them.

Among other security issues with container runtimes, using procfs to setup security controls during the startup of a container is fraught with hilarious problems, because procfs and the Linux filesystem API aren't really designed to do this safely, and also have a bunch of amusing bugs.

Control Flow Integrity is a technique for restricting exploit techniques that hijack a program's control flow (e.g. by overwriting a return address on the stack (ROP), or overwriting a function pointer that's used in an indirect jump). Kees goes through the current state of CFI supporting features in hardware and what is currently available to enable CFI in the kernel.

Linux has supported huge pages for many years, which has significantly improved CPU performance. However, the huge page mechanism was driven by hardware advancements and is somewhat inflexible, and it's just as important to consider software overhead. Matthew has been working on supporting more flexible "large pages" in the page cache to do just that.

Spoiler: the magical fantasy land is a trap.

Community

Lots of community and ethics discussion this year - one talk which stood out to me:

Bradley and Karen argue that while open source has "won", software freedom has regressed in recent years, and present their vision for what modern, pragmatic Free Software activism should look like.

Other

Among the variety of other technical talks at LCA...

Quantum compilers are not really like regular classical compilers (indeed, they're really closer to FPGA synthesis tools). Matthew talks through how quantum compilers map a program on to IBM's quantum hardware and the types of optimisations they apply.

Clevis and Tang provide an implementation of "network bound encryption", allowing you to magically decrypt your secrets when you are on a secure network with access to the appropriate Tang servers. This talk outlines use cases and provides a demonstration.

Christoph discusses how to deal with the hardware and software limitations that make it difficult to capture traffic at wire speed on fast fibre networks.

rfid and hrfid

I was staring at some assembly recently, and for not the first time encountered rfid and hrfid, two instructions that we use when doing things like returning to userspace, returning from OPAL to the kernel, or from a host kernel into a guest.

rfid copies various bits from the register SRR1 (Machine Status Save/Restore Register 1) into the MSR (Machine State Register), and then jumps to an address given in SRR0 (Machine Status Save/Restore Register 0). hrfid does something similar, using HSRR0 and HSRR1 (Hypervisor Machine Status Save/Restore Registers 0/1), and slightly different handling of MSR bits.

The various Save/Restore Registers are used to preserve the state of the CPU before jumping to an interrupt handler, entering the kernel, etc, and are set up as part of instructions like sc (System Call), by the interrupt mechanism, or manually (using instructions like mtsrr1).

Anyway, the way in which rfid and hrfid restores MSR bits is documented somewhat obtusely in the ISA (if you don't believe me, look it up), and I was annoyed by this, so here, have a more useful definition. Leave a comment if I got something wrong.

rfid - Return From Interrupt Doubleword

Machine State Register

Copy all bits (except some reserved bits) from SRR1 into the MSR, with the following exceptions:

  • MSR_3 (HV, Hypervisor State) = MSR_3 & SRR1_3
    [We won't put the thread into hypervisor state if we're not already in hypervisor state]

  • If MSR_29:31 != 0b010 [Transaction State Suspended, TM not available], or SRR1_29:31 != 0b000 [Transaction State Non-transactional, TM not available] then:

    • MSR_29:30 (TS, Transaction State) = SRR1_29:30
    • MSR_31 (TM, Transactional Memory Available) = SRR1_31

    [See the ISA description for explanation on how rfid interacts with TM and resulting interrupts]

  • MSR_48 (EE, External Interrupt Enable) = SRR1_48 | SRR1_49 (PR, Problem State)
    [If going into problem state, external interrupts will be enabled]

  • MSR_51 (ME, Machine Check Interrupt Enable) = (MSR_3 (HV, Hypervisor State) & SRR1_51) | ((! MSR_3) & MSR_51)
    [If we're not already in hypervisor state, we won't alter ME]

  • MSR_58 (IR, Instruction Relocate) = SRR1_58 | SRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

  • MSR_59 (DR, Data Relocate) = SRR1_59 | SRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

Next Instruction Address

  • NIA = SRR0_0:61 || 0b00
    [Jump to SRR0, set last 2 bits to zero to ensure address is aligned to 4 bytes]

hrfid - Hypervisor Return From Interrupt Doubleword

Machine State Register

Copy all bits (except some reserved bits) from HSRR1 into the MSR, with the following exceptions:

  • If MSR_29:31 != 0b010 [Transaction State Suspended, TM not available], or HSRR1_29:31 != 0b000 [Transaction State Non-transactional, TM not available] then:

    • MSR_29:30 (TS, Transaction State) = HSRR1_29:30
    • MSR_31 (TM, Transactional Memory Available) = HSRR1_31

    [See the ISA description for explanation on how rfid interacts with TM and resulting interrupts]

  • MSR_48 (EE, External Interrupt Enable) = HSRR1_48 | HSRR1_49 (PR, Problem State)
    [If going into problem state, external interrupts will be enabled]

  • MSR_58 (IR, Instruction Relocate) = HSRR1_58 | HSRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

  • MSR_59 (DR, Data Relocate) = HSRR1_59 | HSRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

Next Instruction Address

  • NIA = HSRR0_0:61 || 0b00
    [Jump to HSRR0, set last 2 bits to zero to ensure address is aligned to 4 bytes]

TEN THOUSAND DISKS

In OpenPOWER land we have a project called op-test-framework which (for all its strengths and weaknesses) allows us to test firmware on a variety of different hardware platforms and even emulators like Qemu.

Qemu is a fantasic tool allowing us to relatively quickly test against an emulated POWER model, and of course is a critical part of KVM virtual machines running natively on POWER hardware. However the default POWER model in Qemu is based on the "pseries" machine type, which models something closer to a virtual machine or a PowerVM partition rather than a "bare metal" machine.

Luckily we have C├ędric Le Goater who is developing and maintaining a Qemu "powernv" machine type which more accurately models running directly on an OpenPOWER machine. It's an unwritten rule that if you're using Qemu in op-test, you've compiled this version of Qemu!

Teething Problems

Because the "powernv" type does more accurately model the physical system some extra care needs to be taken when setting it up. In particular at one point we noticed that the pretend CDROM and disk drive we attached to the model were.. not being attached. This commit took care of that; the problem was that the PCI topology defined by the layout required us to be more exact about where PCI devices were to be added. By default only three spare PCI "slots" are available but as the commit says, "This can be expanded by adding bridges"...

More Slots!

Never one to stop at a just-enough solution, I wondered how easy it would be to add an extra PCI bridge or two to give the Qemu model more available slots for PCI devices. It turns out, easy enough once you know the correct invocation. For example, adding a PCI bridge in the first slot of the first default PHB is:

-device pcie-pci-bridge,id=pcie.3,bus=pcie.0,addr=0x0

And inserting a device in that bridge just requires us to specify the bus and slot:

-device virtio-blk-pci,drive=cdrom01,id=virtio02,bus=pcie.4,addr=3

Great! Each bridge provides 31 slots, so now we have plenty of room for extra devices.

Why Stop There?

We have three free slots, and we don't have a strict requirement on where devices are plugged in, so lets just plug a bridge into each of those slots while we're here:

-device pcie-pci-bridge,id=pcie.3,bus=pcie.0,addr=0x0 \
-device pcie-pci-bridge,id=pcie.4,bus=pcie.1,addr=0x0 \
-device pcie-pci-bridge,id=pcie.5,bus=pcie.2,addr=0x0

What happens if we insert a new PCI bridge into another PCI bridge? Aside from stressing out our PCI developers, a bunch of extra slots! And then we could plug bridges into those bridges and then..


Thus was born "OpTestQemu: Add PCI bridges to support more devices." and the testcase "Petitboot10000Disks". The changes to the Qemu model setup fill up each PCI bridge as long as we have devices to add, but reserve the first slot to add another bridge if we run out of room... and so on..

Officially this is to support adding interesting disk topologies to test Pettiboot use cases, stress test device handling, and so on, but while we're here... what happens with 10,000 temporary disks?

======================================================================
ERROR: testListDisks (testcases.Petitboot10000Disks.ConfigEditorTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sam/git/op-test-framework/testcases/Petitboot10000Disks.py", line 27, in setUp
    self.system.goto_state(OpSystemState.PETITBOOT_SHELL)
  File "/home/sam/git/op-test-framework/common/OpTestSystem.py", line 366, in goto_state
    self.state = self.stateHandlers[self.state](state)
  File "/home/sam/git/op-test-framework/common/OpTestSystem.py", line 695, in run_IPLing
    raise my_exception
UnknownStateTransition: Something happened system state="2" and we transitioned to UNKNOWN state.  Review the following for more details
Message="OpTestSystem in run_IPLing and the Exception=
"filedescriptor out of range in select()"
 caused the system to go to UNKNOWN_BAD and the system will be stopping."

Yeah that's probably to be expected without some more massaging. What about a more modest 512?

I: Resetting PHBs and training links...
[   55.293343496,5] PCI: Probing slots...
[   56.364337089,3] PHB#0000:02:01.0 pci_find_ecap hit a loop !
[   56.364973775,3] PHB#0000:02:01.0 pci_find_ecap hit a loop !
[   57.127964432,3] PHB#0000:03:01.0 pci_find_ecap hit a loop !
[   57.128545637,3] PHB#0000:03:01.0 pci_find_ecap hit a loop !
[   57.395489618,3] PHB#0000:04:01.0 pci_find_ecap hit a loop !
[   57.396048285,3] PHB#0000:04:01.0 pci_find_ecap hit a loop !
[   58.145944205,3] PHB#0000:05:01.0 pci_find_ecap hit a loop !
[   58.146465795,3] PHB#0000:05:01.0 pci_find_ecap hit a loop !
[   58.404954853,3] PHB#0000:06:01.0 pci_find_ecap hit a loop !
[   58.405485438,3] PHB#0000:06:01.0 pci_find_ecap hit a loop !
[   60.178957315,3] PHB#0001:02:01.0 pci_find_ecap hit a loop !
[   60.179524173,3] PHB#0001:02:01.0 pci_find_ecap hit a loop !
[   60.198502097,3] PHB#0001:02:02.0 pci_find_ecap hit a loop !
[   60.198982582,3] PHB#0001:02:02.0 pci_find_ecap hit a loop !
[   60.435096197,3] PHB#0001:03:01.0 pci_find_ecap hit a loop !
[   60.435634380,3] PHB#0001:03:01.0 pci_find_ecap hit a loop !
[   61.171512439,3] PHB#0001:04:01.0 pci_find_ecap hit a loop !
[   61.172029071,3] PHB#0001:04:01.0 pci_find_ecap hit a loop !
[   61.425416049,3] PHB#0001:05:01.0 pci_find_ecap hit a loop !
[   61.425934524,3] PHB#0001:05:01.0 pci_find_ecap hit a loop !
[   62.172664549,3] PHB#0001:06:01.0 pci_find_ecap hit a loop !
[   62.173186458,3] PHB#0001:06:01.0 pci_find_ecap hit a loop !
[   63.434516732,3] PHB#0002:02:01.0 pci_find_ecap hit a loop !
[   63.435062124,3] PHB#0002:02:01.0 pci_find_ecap hit a loop !
[   64.177567772,3] PHB#0002:03:01.0 pci_find_ecap hit a loop !
[   64.178099773,3] PHB#0002:03:01.0 pci_find_ecap hit a loop !
[   64.431763989,3] PHB#0002:04:01.0 pci_find_ecap hit a loop !
[   64.432285000,3] PHB#0002:04:01.0 pci_find_ecap hit a loop !
[   65.180506790,3] PHB#0002:05:01.0 pci_find_ecap hit a loop !
[   65.181049905,3] PHB#0002:05:01.0 pci_find_ecap hit a loop !
[   65.432105600,3] PHB#0002:06:01.0 pci_find_ecap hit a loop !
[   65.432654326,3] PHB#0002:06:01.0 pci_find_ecap hit a loop !

(That isn't good)

[   66.177240655,5] PCI Summary:
[   66.177906083,5] PHB#0000:00:00.0 [ROOT] 1014 03dc R:00 C:060400 B:01..07 
[   66.178760724,5] PHB#0000:01:00.0 [ETOX] 1b36 000e R:00 C:060400 B:02..07 
[   66.179501494,5] PHB#0000:02:01.0 [ETOX] 1b36 000e R:00 C:060400 B:03..07 
[   66.180227773,5] PHB#0000:03:01.0 [ETOX] 1b36 000e R:00 C:060400 B:04..07 
[   66.180953149,5] PHB#0000:04:01.0 [ETOX] 1b36 000e R:00 C:060400 B:05..07 
[   66.181673576,5] PHB#0000:05:01.0 [ETOX] 1b36 000e R:00 C:060400 B:06..07 
[   66.182395253,5] PHB#0000:06:01.0 [ETOX] 1b36 000e R:00 C:060400 B:07..07 
[   66.183207399,5] PHB#0000:07:02.0 [PCID] 1af4 1001 R:00 C:010000 (          scsi) 
[   66.183969138,5] PHB#0000:07:03.0 [PCID] 1af4 1001 R:00 C:010000 (          scsi) 

(a lot more of this)

[   67.055196945,5] PHB#0002:02:1e.0 [PCID] 1af4 1001 R:00 C:010000 (          scsi) 
[   67.055926264,5] PHB#0002:02:1f.0 [PCID] 1af4 1001 R:00 C:010000 (          scsi) 
[   67.094591773,5] INIT: Waiting for kernel...
[   67.095105901,5] INIT: 64-bit LE kernel discovered
[   68.095749915,5] INIT: Starting kernel at 0x20010000, fdt at 0x3075d270 168365 bytes

zImage starting: loaded at 0x0000000020010000 (sp: 0x0000000020d30ee8)
Allocating 0x1dc5098 bytes for kernel...
Decompressing (0x0000000000000000 <- 0x000000002001f000:0x0000000020d2e578)...
Done! Decompressed 0x1c22900 bytes

Linux/PowerPC load: 
Finalizing device tree... flat tree at 0x20d320a0
[   10.120562] watchdog: CPU 0 self-detected hard LOCKUP @ pnv_pci_cfg_write+0x88/0xa4
[   10.120746] watchdog: CPU 0 TB:50402010473, last heartbeat TB:45261673150 (10039ms ago)
[   10.120808] Modules linked in:
[   10.120906] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.5-openpower1 #2
[   10.120956] NIP:  c000000000058544 LR: c00000000004d458 CTR: 0000000030052768
[   10.121006] REGS: c0000000fff5bd70 TRAP: 0900   Not tainted  (5.0.5-openpower1)
[   10.121030] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 48002482  XER: 20000000
[   10.121215] CFAR: c00000000004d454 IRQMASK: 1 
[   10.121260] GPR00: 00000000300051ec c0000000fd7c3130 c000000001bcaf00 0000000000000000 
[   10.121368] GPR04: 0000000048002482 c000000000058544 9000000002009033 0000000031c40060 
[   10.121476] GPR08: 0000000000000000 0000000031c40060 c00000000004d46c 9000000002001003 
[   10.121584] GPR12: 0000000031c40000 c000000001dd0000 c00000000000f560 0000000000000000 
[   10.121692] GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000000000 
[   10.121800] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   10.121908] GPR24: 0000000000000005 0000000000000000 0000000000000000 0000000000000104 
[   10.122016] GPR28: 0000000000000002 0000000000000004 0000000000000086 c0000000fd9fba00 
[   10.122150] NIP [c000000000058544] pnv_pci_cfg_write+0x88/0xa4
[   10.122187] LR [c00000000004d458] opal_return+0x14/0x48
[   10.122204] Call Trace:
[   10.122251] [c0000000fd7c3130] [c000000000058544] pnv_pci_cfg_write+0x88/0xa4 (unreliable)
[   10.122332] [c0000000fd7c3150] [c0000000000585d0] pnv_pci_write_config+0x70/0x9c
[   10.122398] [c0000000fd7c31a0] [c000000000234fec] pci_bus_write_config_word+0x74/0x98
[   10.122458] [c0000000fd7c31f0] [c00000000023764c] __pci_read_base+0x88/0x3a4
[   10.122518] [c0000000fd7c32c0] [c000000000237a18] pci_read_bases+0xb0/0xc8
[   10.122605] [c0000000fd7c3300] [c0000000002384bc] pci_setup_device+0x4f8/0x5b0
[   10.122670] [c0000000fd7c33a0] [c000000000238d9c] pci_scan_single_device+0x9c/0xd4
[   10.122729] [c0000000fd7c33f0] [c000000000238e2c] pci_scan_slot+0x58/0xf4
[   10.122796] [c0000000fd7c3430] [c000000000239eb8] pci_scan_child_bus_extend+0x40/0x2a8
[   10.122861] [c0000000fd7c34a0] [c000000000239e34] pci_scan_bridge_extend+0x4d4/0x504
[   10.122928] [c0000000fd7c3580] [c00000000023a0f8] pci_scan_child_bus_extend+0x280/0x2a8
[   10.122993] [c0000000fd7c35f0] [c000000000239e34] pci_scan_bridge_extend+0x4d4/0x504
[   10.123059] [c0000000fd7c36d0] [c00000000023a0f8] pci_scan_child_bus_extend+0x280/0x2a8
[   10.123124] [c0000000fd7c3740] [c000000000239e34] pci_scan_bridge_extend+0x4d4/0x504
[   10.123191] [c0000000fd7c3820] [c00000000023a0f8] pci_scan_child_bus_extend+0x280/0x2a8
[   10.123256] [c0000000fd7c3890] [c000000000239b5c] pci_scan_bridge_extend+0x1fc/0x504
[   10.123322] [c0000000fd7c3970] [c00000000023a064] pci_scan_child_bus_extend+0x1ec/0x2a8
[   10.123388] [c0000000fd7c39e0] [c000000000239b5c] pci_scan_bridge_extend+0x1fc/0x504
[   10.123454] [c0000000fd7c3ac0] [c00000000023a064] pci_scan_child_bus_extend+0x1ec/0x2a8
[   10.123516] [c0000000fd7c3b30] [c000000000030dcc] pcibios_scan_phb+0x134/0x1f4
[   10.123574] [c0000000fd7c3bd0] [c00000000100a800] pcibios_init+0x9c/0xbc
[   10.123635] [c0000000fd7c3c50] [c00000000000f398] do_one_initcall+0x80/0x15c
[   10.123698] [c0000000fd7c3d10] [c000000001000e94] kernel_init_freeable+0x248/0x24c
[   10.123756] [c0000000fd7c3db0] [c00000000000f574] kernel_init+0x1c/0x150
[   10.123820] [c0000000fd7c3e20] [c00000000000b72c] ret_from_kernel_thread+0x5c/0x70
[   10.123854] Instruction dump:
[   10.123885] 7d054378 4bff56f5 60000000 38600000 38210020 e8010010 7c0803a6 4e800020 
[   10.124022] e86a0018 54c6043e 7d054378 4bff5731 <60000000> 4bffffd8 e86a0018 7d054378 
[   10.124180] Kernel panic - not syncing: Hard LOCKUP
[   10.124232] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.5-openpower1 #2
[   10.124251] Call Trace:

I wonder if I can submit that bug without someone throwing something at my desk.

dm-crypt: Password Prompts Proliferate

Just recently Petitboot added a method to ask the user for a password before allowing certain actions to proceed. Underneath the covers this is checking against the root password, but the UI "pop-up" asking for the password is relatively generic. Something else which has been on the to-do list for a while is support for mounting encrpyted partitions, but there wasn't a good way to retrieve the password for them - until now!

With the password problem solved, there isn't too much else to do. If Petitboot sees an encrypted partition it makes a note of it and informs the UI via the device_add interface. Seeing this the UI shows this device in the UI even though there aren't any boot options associated with it yet:

encrypted_hdr

Unlike normal devices in the menu these are selectable; once that happens the user is prompted for the password:

encrypted_password

With password in hand pb-discover will then try to open the device with cryptsetup. If that succeeds the encrypted device is removed from the UI and replaced with the new un-encrypted device:

unencrypted_hdr

That's it! These devices can't be auto-booted from at the moment since the password needs to be manually entered. The UI also doesn't have a way yet to select specific options for cryptsetup, but if you find yourself needing to do so you can run cryptsetup manually from the shell and pb-discover will recognise the new unencrypted device automatically.

This is in Petitboot as of v1.10.3 so go check it out! Just make sure your image includes the kernel and cryptsetup dependencies.