Running ppc64le_hello on real hardware

Wed 03 June 2015

Posted by Daniel Axtens

So today I saw Freestanding “Hello World” for OpenPower on Hacker News. Sadly Andrei hadn't been able to test it on real hardware, so I set out to get it running on a real OpenPOWER box. Here's what I did.

Firstly, clone the repo, and, as mentioned in the README, comment out mambo_write. Build it.

Grab op-build, and build a Habanero defconfig. To save yourself a fair bit of time, first edit openpower/configs/habanero_defconfig to answer n about a custom kernel source. That'll save you hours of waiting for git.

This will build you a PNOR that will boot a Linux kernel with Petitboot. This is almost what you want: you need Skiboot, Hostboot and a bunch of the POWER-specific bits and bobs, but you don't actually want the Linux boot kernel.

Then, based on op-build/openpower/package/openpower-pnor/openpower-pnor.mk, we look through the output of op-build for a create_pnor_image.pl command, something like this monstrosity:

PATH="/scratch/dja/public/op-build/output/host/bin:/scratch/dja/public/op-build/output/host/sbin:/scratch/dja/public/op-build/output/host/usr/bin:/scratch/dja/public/op-build/output/host/usr/sbin:/home/dja/bin:/home/dja/bin:/home/dja/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/opt/openpower/common/x86_64/bin" \
  /scratch/dja/public/op-build/output/build/openpower-pnor-ed1682e10526ebd85825427fbf397361bb0e34aa/create_pnor_image.pl \
  -xml_layout_file /scratch/dja/public/op-build/output/build/openpower-pnor-ed1682e10526ebd85825427fbf397361bb0e34aa/"defaultPnorLayoutWithGoldenSide.xml" \
  -pnor_filename /scratch/dja/public/op-build/output/host/usr/powerpc64-buildroot-linux-gnu/sysroot/pnor/"habanero.pnor" \
  -hb_image_dir /scratch/dja/public/op-build/output/host/usr/powerpc64-buildroot-linux-gnu/sysroot/hostboot_build_images/ \
  -scratch_dir /scratch/dja/public/op-build/output/host/usr/powerpc64-buildroot-linux-gnu/sysroot/openpower_pnor_scratch/ \
  -outdir /scratch/dja/public/op-build/output/host/usr/powerpc64-buildroot-linux-gnu/sysroot/pnor/ \
  -payload /scratch/dja/public/op-build/output/images/"skiboot.lid" \
  -bootkernel /scratch/dja/public/op-build/output/images/zImage.epapr \
  -sbe_binary_filename "venice_sbe.img.ecc" \
  -sbec_binary_filename "centaur_sbec_pad.img.ecc" \
  -wink_binary_filename "p8.ref_image.hdr.bin.ecc" \
  -occ_binary_filename /scratch/dja/public/op-build/output/host/usr/powerpc64-buildroot-linux-gnu/sysroot/occ/"occ.bin" \
  -targeting_binary_filename "HABANERO_HB.targeting.bin.ecc" \
  -openpower_version_filename /scratch/dja/public/op-build/output/host/usr/powerpc64-buildroot-linux-gnu/sysroot/openpower_version/openpower-pnor.version.txt

Replace the -bootkernel argument with the path to ppc64le_hello, e.g.: -bootkernel /scratch/dja/public/ppc64le_hello/ppc64le_hello

Run the edited command, and don't forget to move the resulting PNOR into place!

mv output/host/usr/powerpc64-buildroot-linux-gnu/sysroot/pnor/habanero.pnor output/images/habanero.pnor

Then we can use skiboot's boot test script (written by Cyril and me, coincidentally!) to flash it.

ppc64le_hello/skiboot/external/boot-tests/boot_test.sh -vp -t hab2-bmc -P <path to>/habanero.pnor

It's not going to get into Petitboot, so just interrupt it after it powers up the box and connect with IPMI. It boots, kinda:

[11012941323,5] INIT: Starting kernel at 0x20010000, fdt at 0x3044db68 (size 0x11cc3)
Hello OPAL!
           _start = 0x20010000
                              _bss   = 0x20017E28
                                                 _stack = 0x20018000
                                                                    _end   = 0x2001A000
                                                                                       KPCR   = 0x20017E50
                                                                                                          OPAL   = 0x30000000
                                                                                                                             FDT    = 0x3044DB68
                                                                                                                                                CPU0 not found?

                                                                                                                                                               Pick your poison:
                                                                                                                                                                                Choices: (MMU = disabled):
                                                                                                                                                                                                             (d) 5s delay
                                                                                                                                                                                                                            (e) test exception
    (n) test nested exception
                                (f) dump FDT
                                               (M) enable MMU
                                                                (m) disable MMU
                                                                                  (t) test MMU
                                                                                                 (u) test non-priviledged code
                                                                                                                                 (I) enable ints
                                                                                                                                                   (i) disable ints
                                                                                                                                                                      (H) enable HV dec
                                                                                                                                                                                          (h) disable HV dec
                                                                                                                                                                                                               (q) poweroff
                                                                                                                                                                                                                             1.42486|ERRL|Dumping errors reported prior to registration

Yes, it does wrap horribly. However, the big issue here (which you'll have to scroll to see!) is the "CPU0 not found?". Fortunately, we can fix this with a little patch to cpu_init in main.c that falls back to the POWER8 node name when /cpus/cpu@0 is missing:

    cpu0_node = fdt_path_offset(fdt, "/cpus/cpu@0");
    if (cpu0_node < 0) {
        cpu0_node = fdt_path_offset(fdt, "/cpus/PowerPC,POWER8@20");
    }
    if (cpu0_node < 0) {
        printk("CPU0 not found?\n");
        return;
    }

This is definitely the wrong way to do this, but it works for now.
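A slightly less hard-coded variant would be to look at each child of /cpus and accept the first one whose name looks like a CPU. The sketch below covers only the name test, and assumes the only two naming schemes are the ones seen in the patch above (cpu@N and PowerPC,<model>@N); the helper name is made up for illustration and is not part of ppc64le_hello.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: does a child of /cpus look like a CPU node?
 * Assumption: CPU nodes are named either "cpu@<unit>" or
 * "PowerPC,<model>@<unit>", as in the two paths tried in the patch.
 */
bool looks_like_cpu_node(const char *name)
{
    assert(name != NULL);
    return strncmp(name, "cpu@", 4) == 0 ||
           strncmp(name, "PowerPC,", 8) == 0;
}
```

With libfdt, the walk itself could be done with fdt_first_subnode, fdt_next_subnode and fdt_get_name, assuming the bundled libfdt is recent enough to provide them. Checking each node's device_type property for "cpu" would be more conventional still, but I haven't tried either on the Habanero.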

Now, correcting for weird wrapping, we get:

Hello OPAL!
_start = 0x20010000
_bss   = 0x20017E28
_stack = 0x20018000
_end   = 0x2001A000
KPCR   = 0x20017E50
OPAL   = 0x30000000
FDT    = 0x3044DB68
Assuming default SLB size
SLB size = 0x20
TB freq = 512000000
[13205442015,3] OPAL: Trying a CPU re-init with flags: 0x2
Unrecoverable exception stack top @ 0x20019EC8
HTAB (2048 ptegs, mask 0x7FF, size 0x40000) @ 0x20040000
SLB entries:
1: E 0x8000000 V 0x4000000000000400
EA 0x20040000 -> hash 0x20040 -> pteg 0x200 = RA 0x20040000
EA 0x20041000 -> hash 0x20041 -> pteg 0x208 = RA 0x20041000
EA 0x20042000 -> hash 0x20042 -> pteg 0x210 = RA 0x20042000
EA 0x20043000 -> hash 0x20043 -> pteg 0x218 = RA 0x20043000
EA 0x20044000 -> hash 0x20044 -> pteg 0x220 = RA 0x20044000
EA 0x20045000 -> hash 0x20045 -> pteg 0x228 = RA 0x20045000
EA 0x20046000 -> hash 0x20046 -> pteg 0x230 = RA 0x20046000
EA 0x20047000 -> hash 0x20047 -> pteg 0x238 = RA 0x20047000
EA 0x20048000 -> hash 0x20048 -> pteg 0x240 = RA 0x20048000
...
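
The hash and pteg columns in that log follow a simple pattern, which can be handy when sanity-checking the output. The sketch below only reproduces the arithmetic visible in the log (4 KiB pages, 8 PTEs per group, 16 bytes per PTE); it is not taken from ppc64le_hello, the function names are made up, and the real Power hash function also mixes in the VSID, which this ignores.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12u  /* 4 KiB pages, inferred from the 0x1000 steps in the log */
#define PTES_PER_PTEG 8u   /* PTEs in one PTE group */
#define PTE_BYTES     16u  /* size of one PTE */

/* Hash as printed in the log: the effective address shifted down by the page size. */
uint64_t ea_to_hash(uint64_t ea)
{
    return ea >> PAGE_SHIFT;
}

/* The "pteg" value printed in the log: index of the first PTE in the selected group. */
uint64_t hash_to_pteg(uint64_t hash, uint64_t mask)
{
    assert((mask & (mask + 1)) == 0); /* mask must be of the form 2^n - 1 */
    return (hash & mask) * PTES_PER_PTEG;
}

/* Total size of a hash table with the given number of PTE groups. */
uint64_t htab_size_bytes(uint64_t ptegs)
{
    return ptegs * PTES_PER_PTEG * PTE_BYTES;
}
```

For example, EA 0x20048000 hashes to 0x20048, and masking with 0x7FF and multiplying by 8 gives the 0x240 shown above. Likewise, 2048 groups of 8 PTEs at 16 bytes each gives the reported table size of 0x40000.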

The weird wrapping seems to be caused by NULLs getting printed to OPAL, but I haven't traced what causes that.

Anyway, now it largely works! Here's a transcript of some things it can do on real hardware.

Choices: (MMU = disabled):
   (d) 5s delay
   (e) test exception
   (n) test nested exception
   (f) dump FDT
   (M) enable MMU
   (m) disable MMU
   (t) test MMU
   (u) test non-priviledged code
   (I) enable ints
   (i) disable ints
   (H) enable HV dec
   (h) disable HV dec
   (q) poweroff
<press e>
Testing exception handling...
sc(feed) => 0xFEEDFACE
Choices: (MMU = disabled):
   (d) 5s delay
   (e) test exception
   (n) test nested exception
   (f) dump FDT
   (M) enable MMU
   (m) disable MMU
   (t) test MMU
   (u) test non-priviledged code
   (I) enable ints
   (i) disable ints
   (H) enable HV dec
   (h) disable HV dec
   (q) poweroff
<press t>
EA 0xFFFFFFF000 -> hash 0xFFFFFFF -> pteg 0x3FF8 = RA 0x20010000
mapped 0xFFFFFFF000 to 0x20010000 correctly
EA 0xFFFFFFF000 -> hash 0xFFFFFFF -> pteg 0x3FF8 = unmap
EA 0xFFFFFFF000 -> hash 0xFFFFFFF -> pteg 0x3FF8 = RA 0x20011000
mapped 0xFFFFFFF000 to 0x20011000 incorrectly
EA 0xFFFFFFF000 -> hash 0xFFFFFFF -> pteg 0x3FF8 = unmap
Choices: (MMU = disabled):
   (d) 5s delay
   (e) test exception
   (n) test nested exception
   (f) dump FDT
   (M) enable MMU
   (m) disable MMU
   (t) test MMU
   (u) test non-priviledged code
   (I) enable ints
   (i) disable ints
   (H) enable HV dec
   (h) disable HV dec
   (q) poweroff
<press u>
EA 0xFFFFFFF000 -> hash 0xFFFFFFF -> pteg 0x3FF8 = RA 0x20080000
returning to user code
returning to kernel code
EA 0xFFFFFFF000 -> hash 0xFFFFFFF -> pteg 0x3FF8 = unmap

I also tested the other functions and they all seem to work. Running non-privileged code with the MMU on works. Dumping the FDT and the 5s delay both worked, although they tend to stress IPMI a lot. The delay seems to correspond well with real time as well.
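
For a rough sense of what the 5s delay means in timebase terms: the boot log reports "TB freq = 512000000", so five seconds should be about 2.56 billion ticks. The helpers below are a minimal sketch of that arithmetic. The names are made up, and this is not how ppc64le_hello itself implements the delay.

```c
#include <assert.h>
#include <stdint.h>

/* Timebase ticks that make up a delay of `secs` seconds. */
uint64_t tb_ticks_for_seconds(uint64_t tb_freq_hz, uint64_t secs)
{
    return tb_freq_hz * secs;
}

/*
 * Convert a tick count to whole milliseconds. Splitting into whole
 * seconds and remainder avoids overflow for realistic frequencies.
 */
uint64_t tb_ticks_to_ms(uint64_t ticks, uint64_t tb_freq_hz)
{
    assert(tb_freq_hz != 0);
    return (ticks / tb_freq_hz) * 1000 +
           (ticks % tb_freq_hz) * 1000 / tb_freq_hz;
}
```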

It does tend to error out and reboot quite often, usually on the menu screen, for reasons that are not clear to me. It usually starts with something entirely uninformative from Hostboot, like this:

1.41801|ERRL|Dumping errors reported prior to registration
  2.89873|Ignoring boot flags, incorrect version 0x0

That may be easy to fix, but again I haven't had time to trace it.

All in all, it's very exciting to see something come out of the simulator and onto real hardware. Hopefully with the proliferation of OpenPOWER hardware, prices will fall and these sorts of systems will become increasingly accessible to people with cool low-level projects like this!

Petitboot Autoboot Changes

Tue 02 June 2015

Posted by Samuel Mendoza-Jonas

The way autoboot behaves in Petitboot has undergone some significant changes recently, so in order to ward off any angry emails let's take a quick tour of how the new system works.

Old & Busted

For some context, here is the old (or current depending on what you're running) section of the configuration screen.

Old Autoboot

This gives you three main options: don't autoboot, autoboot from anything, or autoboot only from a specific device. For the majority of installations this is fine, such as when you have only one default option, or know exactly which device you'll be booting from.

A side note about default options: not all boot options are valid autoboot options. A boot option is only considered for autobooting if it is marked default, e.g. 'set default' in GRUB and 'default' in PXE options.

New Hotness

Below is the new autoboot configuration.

New Autoboot

The new design allows you to specify an ordered list of autoboot options. The last two of the three buttons are self-explanatory - clear the list and autoboot any device, or clear the list completely (no autoboot).

Selecting the first button, 'Add Device', brings up the following screen:

Device Selection

From here you can select any device or class of device to add to the boot order. Once a device has been added, its position in the boot order can be changed with the left and right arrow keys, and it can be removed from the list with the minus key ('-').

This allows you to create additional autoboot configurations such as "Try to boot from sda2, otherwise boot from the network", or "Give priority to PXE options from eth0, otherwise try any other netboot option". You can retain the original behaviour by only putting one option into the list (either 'Any Device' or a specific device).
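
To make the priority idea concrete, here is a minimal sketch of how an ordered list like that could be resolved against the discovered boot options. This is only an illustrative model under my own assumptions: the struct and function names are hypothetical and this is not Petitboot's actual code. It walks the boot order from highest priority down and picks the first option that is marked default and matches the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a discovered boot option; not Petitboot's real types. */
struct boot_option {
    const char *device;  /* e.g. "sda2", "eth0" */
    const char *type;    /* e.g. "disk", "network" */
    int is_default;      /* marked default by its config (GRUB, PXE, ...) */
};

/* One entry in the ordered autoboot list. */
struct order_entry {
    const char *device;  /* NULL means any device */
    const char *type;    /* NULL means any type */
};

static int entry_matches(const struct order_entry *e, const struct boot_option *o)
{
    if (e->device && strcmp(e->device, o->device) != 0)
        return 0;
    if (e->type && strcmp(e->type, o->type) != 0)
        return 0;
    return 1;
}

/* Return the index of the option to autoboot, or -1 if nothing qualifies. */
int pick_autoboot(const struct order_entry *order, size_t n_order,
                  const struct boot_option *opts, size_t n_opts)
{
    assert(order != NULL || n_order == 0);
    for (size_t i = 0; i < n_order; i++)
        for (size_t j = 0; j < n_opts; j++)
            if (opts[j].is_default && entry_matches(&order[i], &opts[j]))
                return (int)j;
    return -1;
}
```

In this model "Try to boot from sda2, otherwise boot from the network" is the two-entry list {"sda2", NULL}, {NULL, "network"}, 'Any Device' is a single {NULL, NULL} entry, and an empty list means no autoboot.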

Presently you can add any option into the list and order them how you like - which means you can do silly things like this:

If you send me a bug report with this in it I may laugh at you

IPMI

Shortly before the boot order changes, Petitboot also received an update to its IPMI handling. IPMI 'bootdev' commands allow you to override the current autoboot configuration remotely, either by specifying a device type to boot (e.g. PXE), or by forcing Petitboot to boot into the 'setup' or 'safe' modes. IPMI overrides are either persistent or non-persistent. A non-persistent override will disappear after a successful boot - that is, a successful boot of a boot option, not booting to Petitboot itself - whereas a persistent override will, well, persist!
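
The persistent/non-persistent rule can be written down as a tiny state update. Again this is just a sketch of the behaviour described above, with hypothetical names, and not Petitboot's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an IPMI boot override; not Petitboot's real types. */
struct ipmi_override {
    bool active;      /* an override is currently in effect */
    bool persistent;  /* only meaningful while active */
};

/*
 * Update the override after a boot attempt. booted_option is true only
 * when a boot option (not Petitboot itself) was booted successfully.
 */
struct ipmi_override after_boot_attempt(struct ipmi_override ov, bool booted_option)
{
    assert(ov.active || !ov.persistent);
    if (ov.active && !ov.persistent && booted_option) {
        /* A non-persistent override is consumed by the first successful boot. */
        ov.active = false;
    }
    return ov;
}
```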

If there is an IPMI override currently active, it will appear in the configuration screen with an option to manually clear it:

IPMI Overrides

That sums up the recent changes to autoboot: a bit more flexibility in assigning priority, and options for a more detailed autoboot order if you need it. New versions of Petitboot are backwards compatible and will recognise older saved settings, so updating your firmware won't cause your machines to start booting things at random.

Joining the CAPI project

Wed 27 May 2015

Posted by Daniel Axtens

(I wrote this blog post a couple of months ago, but it's still quite relevant.)

Hi, I'm Daniel! I work in OzLabs, part of IBM's Australian Development Labs. Recently, I've been assigned to the CAPI project, and I've been given the opportunity to give you an idea of what this is, and what I'll be up to in the future!

What even is CAPI?

To help you understand CAPI, think back to the time before computers. We had a variety of machines: machines to build things, to check things, to count things, but they were all specialised --- good at one and only one thing.

Specialised machines, while great at their intended task, are really expensive to develop. Not only that, it's often impossible to change how they operate, even in very small ways.

Computer processors, on the other hand, are generalists. They are cheap. They can do a lot of things. If you can break a task down into simple steps, it's easy to get them to do it. The trade-off is that computer processors are incredibly inefficient at everything.

Now imagine, if you will, that a specialised machine is a highly trained and experienced professional, and a computer processor is a hungover university student.

Over the years, we've tried lots of things to make the student faster. Firstly, we gave the student lots of caffeine to make them go as fast as they can. That worked for a while, but you can only give someone so much caffeine before they become unreliable. Then we tried teaming the student up with another student, so they can do two things at once. That worked, so we added more and more students. Unfortunately, lots of tasks can only be done by one person at a time, and team-work is complicated to co-ordinate. We've also recently noticed that some tasks come up often, so we've given them some tools for those specific tasks. Sadly, the tools are only useful for those specific situations.

Sometimes, what you really need is a professional.

However, there are a few difficulties in getting a professional to work with uni students. They don't speak the same way; they don't think the same way, and they don't work the same way. You need to teach the uni students how to work with the professional, and vice versa.

Previously, developing this interface – this connection between a generalist processor and a specialist machine – has been particularly difficult. The interface between processors and these specialised machines – known as accelerators – has also tended to suffer from bottlenecks and inefficiencies.

This is the problem CAPI solves. CAPI provides a simpler and more optimised way to interface specialised hardware accelerators with IBM's most recent line of processors, POWER8. It's a common 'language' that the processor and the accelerator talk, that makes it much easier to build the hardware side and easier to program the software side. In our Canberra lab, we're working primarily on the operating system side of this. We are working with some external companies who are building CAPI devices and the optimised software products which use them.

From a technical point of view, CAPI provides coherent access to system memory and processor caches, eliminating a major bottleneck in using external devices as accelerators. This is illustrated really well by the following graphic from an IBM promotional video. In the non-CAPI case, you can see there's a lot of data (the little boxes) stalled in the PCIe subsystem, whereas with CAPI, the accelerator has direct access to the memory subsystem, which makes everything go faster.

Slide showing CAPI's memory access

Uses of CAPI

CAPI technology is already powering a few really cool products.

Firstly, we have an implementation of Redis that sits on top of flash storage connected over CAPI. Or, to take out the buzzwords, CAPI lets us do really, really fast NoSQL databases. There's a video online giving more details.

Secondly, our partner Mellanox is using CAPI to make network cards that run at speeds of up to 100Gb/s.

CAPI is also part of IBM's OpenPOWER initiative, where we're trying to grow a community of companies around our POWER system designs. So in many ways, CAPI is both a really cool technology, and a brand new ecosystem that we're growing here in the Canberra labs. It's very cool to be a part of!

OpenPOWER Powers Forward

Thu 21 May 2015

Posted by Cyril Bur

I wrote this blog post late last year, but it is very relevant to this blog, so I'll repost it here.

With the launch of TYAN's OpenPOWER reference system, now is a good time to reflect on the team responsible for so much of the research, design and development behind this very first groundbreaking step of OpenPOWER, with their start-to-finish involvement in this new Power platform.

ADL Canberra have been integral to the success of this launch, providing the OpenPOWER Abstraction Layer (OPAL) firmware. OPAL breathes new life into Linux on Power, finally allowing Linux to run directly on the hardware. While OPAL harnesses the hardware, ADL Canberra significantly improved Linux to sit on top and take direct control of IBM's new POWER8 processor without needing to negotiate with a hypervisor. With all the Linux expertise present at ADL Canberra, it's no wonder that a Linux-based bootloader was developed to make this system work. Petitboot leverages all the resources of the Linux kernel to create a light, fast and yet extremely versatile bootloader. Petitboot provides a large number of tools for debugging and system configuration without the need to load an operating system.

TYAN have developed great and highly customisable hardware. ADL Canberra have been there since day 1 performing vital platform enablement (bringup) of this new hardware. ADL Canberra have put work into the entire software stack: low-level work to get OPAL and Linux to talk to the new BMC chip, as well as higher-level work enabling Linux to run in either endianness. Linux is now even capable of running KVM guests of either endianness, irrespective of the host's endianness. Furthermore, a subset of ADL Canberra have been key to getting the Coherent Accelerator Processor Interface (CAPI) off the ground, enabling almost endless customisation and greater diversity within the OpenPOWER ecosystem.

ADL Canberra is the home for Linux on Power and the beginning of the OpenPOWER hardware sees much of the hard work by ADL Canberra come to fruition.

Store Halfword Byte-Reverse Indexed

A Power Technical Blog