Store Half Byte-Reverse Indexed

A Power Technical Blog

High Power Lustre

(Most of the hard work here was done by fellow blogger Rashmica - I just verified her instructions and wrote up this post.)

Lustre is a high-performance clustered file system. Traditionally the Lustre client and server have run on x86, but both the server and client will also work on Power. Here's how to get them running.

Server

Lustre normally requires a patched 'enterprise' kernel - typically an old RHEL, CentOS or SUSE kernel. We tested with a CentOS 7.3 kernel. We tried to follow the Intel instructions for building the kernel as much as possible - any deviations we had to make are listed below.

Setup quirks

We are told to edit ~/kernel/rpmbuild/SPEC/kernel.spec. This doesn't exist because the directory is SPECS not SPEC: you need to edit ~/kernel/rpmbuild/SPECS/kernel.spec.

I also found there was an extra quote mark in the supplied patch script after -lustre.patch. I removed that and ran this instead:

for patch in $(<"3.10-rhel7.series"); do
      patch_file="$HOME/lustre-release/lustre/kernel_patches/patches/${patch}"
      cat "${patch_file}" >> "$HOME/lustre-kernel-x86_64-lustre.patch"
done

The fact that there is 'x86_64' in the patch name doesn't matter as you're about to copy it under a different name to a place where it will be included by the spec file.
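For concreteness, the copy step is along these lines. The destination filename below is an assumption on my part - check kernel.spec for the exact name it expects:

# destination filename is an assumption - kernel.spec defines the real one
cp "$HOME/lustre-kernel-x86_64-lustre.patch" \
   "$HOME/kernel/rpmbuild/SOURCES/patch-3.10.0-lustre.patch"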

Building for ppc64le

Building for ppc64le was reasonably straightforward. I had one small issue:

[build@dja-centos-guest rpmbuild]$ rpmbuild -bp --target=`uname -m` ./SPECS/kernel.spec
Building target platforms: ppc64le
Building for target ppc64le
error: Failed build dependencies:
       net-tools is needed by kernel-3.10.0-327.36.3.el7.ppc64le

Fixing this was as simple as a yum install net-tools.

This was sufficient to build the kernel RPMs. I installed them and booted to my patched kernel - so far so good!

Building the client packages: CentOS

I then tried to build and install the RPMs from lustre-release. This repository provides the sources required to build the client and utility binaries.

./configure and make succeeded, but when I went to install the packages with rpm, I found I was missing some dependencies:

error: Failed dependencies:
        ldiskfsprogs >= 1.42.7.wc1 is needed by kmod-lustre-osd-ldiskfs-2.9.52_60_g1d2fbad_dirty-1.el7.centos.ppc64le
        sg3_utils is needed by lustre-iokit-2.9.52_60_g1d2fbad_dirty-1.el7.centos.ppc64le
        attr is needed by lustre-tests-2.9.52_60_g1d2fbad_dirty-1.el7.centos.ppc64le
        lsof is needed by lustre-tests-2.9.52_60_g1d2fbad_dirty-1.el7.centos.ppc64le

I was able to install sg3_utils, attr and lsof, but I was still missing ldiskfsprogs.

It seems we need the lustre-patched version of e2fsprogs - I found a mailing list post to that effect.

So, following the instructions on the walkthrough, I grabbed the SRPM and installed the dependencies: yum install -y texinfo libblkid-devel libuuid-devel

I then tried rpmbuild -ba SPECS/e2fsprogs-RHEL-7.spec. This built but failed tests. Some failed because I ran out of disk space - they were using tens of gigabytes. The spec file has some comments about this, with suggested tests to disable, so I did that. Even with that fix, I was still failing two tests:

  • f_pgsize_gt_blksize: Intel added this to their fork, and no equivalent exists in the master e2fsprogs branches. This relates to Intel specific assumptions about page sizes which don't hold on Power.
  • f_eofblocks: This may need fixing for large page sizes, see this bug.

I disabled the tests by adding the following two lines to the spec file, just before make %{?_smp_mflags} check.

rm -rf tests/f_pgsize_gt_blksize
rm -rf tests/f_eofblocks

With those tests disabled I was able to build the packages successfully. I installed them with yum localinstall *1.42.13.wc5* (I needed that rather weird pattern to pick up important RPMs that didn't fit the e2fs* pattern - things like libcom_err and libss).

Following that I went back to the lustre-release build products and was able to successfully run yum localinstall *ppc64le.rpm!

Testing the server

After disabling SELinux and rebooting, I ran the test script:

sudo /usr/lib64/lustre/tests/llmount.sh

This spat out one scary warning:

mount.lustre FATAL: unhandled/unloaded fs type 0 'ext3'

The test did seem to succeed overall, and this appears to be a known problem, so I pressed on undeterred.

I then attached a couple of virtual hard drives for the metadata and object store volumes, and having set them up, proceeded to try to mount my freshly minted lustre volume from some clients.

Testing with a ppc64le client

My first step was to test whether another ppc64le machine would work as a client.

I tried with an existing Ubuntu 16.04 VM that I use for much of my day to day development.

A quick google suggested that I could grab the lustre-release repository and run make debs to get Debian packages for my system.

I needed the following dependencies:

sudo apt install module-assistant debhelper dpatch libsnmp-dev quilt

With those the packages built successfully, and could be easily installed:

dpkg -i lustre-client-modules-4.4.0-57-generic_2.9.52-60-g1d2fbad-dirty-1_ppc64el.deb lustre-utils_2.9.52-60-g1d2fbad-dirty-1_ppc64el.deb

I tried to connect to the server:

sudo mount -t lustre $SERVER_IP@tcp:/lustre /lustre/

Initially I wasn't able to connect to the server at all. I remembered that (unlike Ubuntu), CentOS comes with quite an aggressive firewall by default. I ran the following on the server:

systemctl stop firewalld

And voila! I was able to connect, mount the lustre volume, and successfully read and write to it. This is very much an over-the-top hack - I should have poked holes in the firewall to allow just the ports lustre needed. This is left as an exercise for the reader.
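If you'd rather not leave it as an exercise: LNet over TCP defaults to port 988, so something along these lines should be all the firewall needs to allow:

# open just the default LNet TCP port instead of disabling the firewall entirely
firewall-cmd --permanent --add-port=988/tcp
firewall-cmd --reload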

Testing with an x86_64 client

I then tried to run make debs on my Ubuntu 16.10 x86_64 laptop.

This did not go well - I got the following error:

liblustreapi.c: In function ‘llapi_get_poollist’:
liblustreapi.c:1201:3: error: ‘readdir_r’ is deprecated [-Werror=deprecated-declarations]

This looks like one of the new warnings introduced in recent GCC versions, and is a known bug. To work around it, I found the following stanza in lustre/autoconf/lustre-core.m4, and removed the -Werror:

AS_IF([test $target_cpu == "i686" -o $target_cpu == "x86_64"],
        [CFLAGS="$CFLAGS -Wall -Werror"])
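After the edit, the stanza reads:

AS_IF([test $target_cpu == "i686" -o $target_cpu == "x86_64"],
        [CFLAGS="$CFLAGS -Wall"])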

Even this wasn't enough: I got the following errors:

/home/dja/dev/lustre-release/debian/tmp/modules-deb/usr_src/modules/lustre/lustre/llite/dcache.c:387:22: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
         .d_compare = ll_dcompare,
                  ^~~~~~~~~~~
/home/dja/dev/lustre-release/debian/tmp/modules-deb/usr_src/modules/lustre/lustre/llite/dcache.c:387:22: note: (near initialization for ‘ll_d_ops.d_compare’)

I figured this was probably because Ubuntu 16.10 has a 4.8 kernel, and Ubuntu 16.04 has a 4.4 kernel. Work on supporting 4.8 is ongoing.

Sure enough, when I fired up a 16.04 x86_64 VM with a 4.4 kernel, I was able to build and install fine.

Connecting didn't work the first time - the guest failed to mount, but I did get the following helpful error on the server:

LNetError: 2595:0:(acceptor.c:406:lnet_acceptor()) Refusing connection from 10.61.2.227: insecure port 1024

Refusing insecure port 1024 made me think that perhaps the NATing that qemu was performing for me was interfering - perhaps the server expected the connection to come from a privileged source port, and qemu wouldn't be able to arrange that with NAT.

Sure enough, switching NAT to bridging was enough to get the x86 VM to talk to the ppc64le server. I verified that ls, reading and writing all succeeded.
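For reference, in plain QEMU terms that change is roughly the following, assuming a configured host bridge br0 and a working qemu-bridge-helper:

# before: user-mode networking (NAT)
#   -netdev user,id=net0 -device virtio-net-pci,netdev=net0
# after: bridged networking via the host bridge br0
#   -netdev bridge,id=net0,br=br0 -device virtio-net-pci,netdev=net0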

Next steps

The obvious next steps are following up the disabled tests in e2fsprogs, and doing a lot of internal performance and functionality testing.

Happily, it looks like Lustre might be in the mainline kernel before too long - parts have already started to go into staging. This will make our lives a lot easier: for example, the breakage between 4.4 and 4.8 would probably have already been picked up and fixed if it were in the main kernel tree rather than an out-of-tree patch set.

In the long run, we'd like to make Lustre on Power just as easy as Lustre on x86. (And, of course, more performant!) We'll keep you up to date!

(Thanks to fellow bloggers Daniel Black and Andrew Donnellan for useful feedback on this post.)

NAMD on NVLink

NAMD is a molecular dynamics program that can use GPU acceleration to speed up its calculations. Recent OpenPOWER machines like the IBM Power Systems S822LC for High Performance Computing (Minsky) come with a new interconnect for GPUs called NVLink, which offers extremely high bandwidth to a number of very powerful Nvidia Pascal P100 GPUs. So they're ideal machines for this sort of workload.

Here's how to set up NAMD 2.12 on your Minsky, and how to debug some common issues. We've targeted this script for CentOS, but we've successfully compiled NAMD on Ubuntu as well.

Prerequisites

GPU Drivers and CUDA

Firstly, you'll need CUDA and the NVidia drivers.

You can install CUDA by following the instructions on NVidia's CUDA Downloads page.

yum install epel-release
yum install dkms
# download the rpm from the NVidia website
rpm -i cuda-repo-rhel7-8-0-local-ga2-8.0.54-1.ppc64le.rpm
yum clean expire-cache
yum install cuda
# this will take a while...

Then, we set up a profile file to automatically load CUDA into our path:

cat >  /etc/profile.d/cuda_path.sh <<EOF
# From http://developer.download.nvidia.com/compute/cuda/8.0/secure/prod/docs/sidebar/CUDA_Quick_Start_Guide.pdf - 4.4.2.1
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF

Now, open a new terminal session and check to see if it works:

cuda-install-samples-8.0.sh ~
cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/bandwidthTest
make && ./bandwidthTest

If you see a figure of ~32GB/s, that means NVLink is working as expected. A figure of ~7-8GB/s indicates that only PCI is working, and more debugging is required.

Compilers

You need a C++ compiler:

yum install gcc-c++

Building NAMD

Once CUDA and the compilers are installed, building NAMD is reasonably straightforward. The one hitch is that because we're using CUDA 8.0, and the NAMD build scripts assume CUDA 7.5, we need to supply an updated Linux-POWER.cuda file. (We also enable code generation for the Pascal in this file.)
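For a flavour of what the override changes, the key lines boil down to something like this - a sketch only, with variable names as used in NAMD's stock arch files; the real file comes with the script below:

# point the build at CUDA 8.0 rather than the assumed 7.5
CUDADIR=/usr/local/cuda-8.0
# enable code generation for Pascal (the P100 is compute capability 6.0)
CUDAGENCODE=-gencode arch=compute_60,code=sm_60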

We've documented the entire process as a script which you can download. We'd recommend executing the commands one by one, but if you're brave you can run the script directly.

The script will fetch NAMD 2.12 and build it for you, but won't install it. It will look for the CUDA override file in the directory you are running the script from, and will automatically move it into the correct place so it is picked up by the build system.

The script compiles for a single multicore machine setup, rather than for a cluster. However, it should be a good start for an Ethernet or Infiniband setup.

If you're doing things by hand, you may see some errors during the compilation of charm - as long as charm++ builds successfully at the end, you should be OK.

Testing NAMD

We have been testing NAMD using the STMV files available from the NAMD website:

cd NAMD_2.12_Source/Linux-POWER-g++
wget http://www.ks.uiuc.edu/Research/namd/utilities/stmv.tar.gz
tar -xf stmv.tar.gz
sudo ./charmrun +p80 ./namd2 +pemap 0-159:2 +idlepoll +commthread stmv/stmv.namd

This binds a namd worker thread to every second hardware thread. This is because hardware threads share resources, so using every hardware thread costs overhead and doesn't give us access to any more physical resources.
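Spelling out the arithmetic behind those flags:

# 160 hardware threads (CPUs 0-159) at SMT8 = 20 physical cores
# +p80           -> start 80 worker threads
# +pemap 0-159:2 -> pin them to CPUs 0,2,4,...,158 (every second hardware thread)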

You should see messages about finding and using GPUs:

Pe 0 physical rank 0 binding to CUDA device 0 on <hostname>: 'Graphics Device'  Mem: 4042MB  Rev: 6.0

This should be significantly faster than on non-NVLink machines - we saw a gain of about 2x in speed going from a machine with Nvidia K80s to a Minsky. If things aren't faster for you, let us know!

Other notes

NAMD requires some libraries, some of which are supplied as binary downloads on the NAMD website. Make sure you get the ppc64le versions, not the ppc64 versions, otherwise you'll get errors like:

/bin/ld: failed to merge target specific data of file .rootdir/tcl/lib/libtcl8.5.a(regfree.o)
/bin/ld: .rootdir/tcl/lib/libtcl8.5.a(regerror.o): compiled for a big endian system and target is little endian
/bin/ld: failed to merge target specific data of file .rootdir/tcl/lib/libtcl8.5.a(regerror.o)
/bin/ld: .rootdir/tcl/lib/libtcl8.5.a(tclAlloc.o): compiled for a big endian system and target is little endian

The script we supply should get these right automatically.

linux.conf.au 2017 review

I recently attended LCA 2017, where I gave a talk at the Linux Kernel miniconf (run by fellow sthbrx blogger Andrew Donnellan!) and a talk at the main conference.

I received some really interesting feedback so I've taken the opportunity to write some of it down to complement the talk videos and slides that are online. (And to remind me to follow up on it!)

Miniconf talk: Sparse Warnings

My kernel miniconf talk was on sparse warnings (pdf slides, 23m video).

The abstract read (in part):

sparse is a semantic parser for C, and is one of the static analysis tools available to kernel devs.

Sparse is a powerful tool with good integration into the kernel build system. However, we suffer from warning overload - there are too many sparse warnings to spot the serious issues amongst the trivial. This makes it difficult to use, both for developers and maintainers.

Happily, I received some feedback that suggests it's not all doom and gloom like I had thought!

  • Dave Chinner told me that the xfs team uses sparse regularly to make sure that the file system is endian-safe. This is good news - we really would like that to be endian-safe!

  • Paul McKenney let me know that the 0day bot does do some sparse checking - it would just seem that it's not done on PowerPC.

Main talk: 400,000 Ephemeral Containers

My main talk was entitled "400,000 Ephemeral Containers: testing entire ecosystems with Docker". You can read the abstract for full details, but it boils down to:

What if you want to test how all the packages in a given ecosystem work in a given situation?

My main example was testing how many of the Ruby packages successfully install on Power, but I also talk about other languages and other cool tests you could run.

The 44m video is online. I haven't put the slides up yet but they should be available on GitHub soonish.

Unlike with the kernel talk, I didn't catch the names of most of the people with feedback.

Docker memory issues

One of the questions I received during the talk was about running into memory issues in Docker. I attempted to answer that during the Q&A. The person who asked the question then had a chat with me afterwards, and it turns out I had completely misunderstood the question. I thought it was about memory usage of running containers in parallel. It was actually about memory usage in the docker daemon when running lots of containers in serial. Apparently the docker daemon doesn't free memory during the life of the process, and the question was whether or not I had observed that during my runs.

I didn't have a good answer for this at the time other than "it worked for me", so I have gone back and looked at the docker daemon memory usage.

After a full Ruby run, the daemon is using about 13.9G of virtual memory, and 1.975G of resident memory. If I restart it, the memory usage drops to 1.6G of virtual and 43M of resident memory. So it would appear that the person asking the question was right, and I'm just not seeing it have an effect.
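If you want to check this on your own machine, the daemon's memory usage can be read with something like the following (assuming the daemon process is named dockerd):

# virtual (VSZ) and resident (RSS) size of the docker daemon, in kilobytes
ps -o vsz,rss -C dockerd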

Other interesting feedback

  • Someone was quite interested in testing on Sparc, once they got their Go runtime nailed down.

  • A Rackspacer was quite interested in Python testing for OpenStack - this has some intricacies around Py2/Py3, but we had an interesting discussion around just testing to see if packages that claim Py3 support provide Py3 support.

  • A large jobs site mentioned using this technique to help them migrate their dependencies between versions of Go.

  • I was 'gently encouraged' to try to do better with how long the process takes to run - if for no other reason than to avoid burning more coal. This is a fair point. I did not explain very well what I meant with diminishing returns in the talk: there's lots you could do to make the process faster, it just comes at the cost of the simplicity that I really wanted when I first started the project. I am working (on and off) on better ways to deal with this by considering the dependency graph.

Extracting Early Boot Messages in QEMU

Be me, you're a kernel hacker, you make some changes to your kernel, you boot test it in QEMU, and it fails to boot. Even worse is the fact that it just hangs without any failure message, no stack trace, no nothing. "Now what?" you think to yourself.

You probably do the first thing you learnt in debugging101 and add abundant print statements all over the place to try and make some sense of what's happening and where it is that you're actually crashing. So you do this, you recompile your kernel, boot it in QEMU and lo and behold, nothing... What happened? You added all these shiny new print statements, where did the output go? The kernel still failed to boot (obviously), but where you were hoping to get some clue to go on you were again left with an empty screen. "Maybe I didn't print early enough" or "maybe I got the code paths wrong" you think, "maybe I just need more prints" even. So let's delve a bit deeper: why didn't you see those prints, where did they go, and how can you get at them?

__log_buf

So what happens when you call printk()? Well what normally happens is, depending on the log level you set, the output is sent to the console or logged so you can see it in dmesg. But what happens if we haven't registered a console yet? Well then we can't print the message, can we? So it's logged in a buffer - the kernel log buffer to be exact, helpfully named __log_buf.

Console Registration

So how come I eventually see print statements on my screen? Well at some point during the boot process a console is registered with the printk system, and any buffered output can then be displayed. On ppc this happens in register_early_udbg_console(), called in setup_arch() from start_kernel(), the generic kernel entry point. From this point forward anything you print will be displayed on the console, but what if you crash before this? What are you supposed to do then?

Extracting Early Boot Messages in QEMU

And now the moment you've all been waiting for: how do I extract those early boot messages in QEMU if my kernel crashes before the console is registered? Well it's quite simple really. QEMU is nice enough to allow us to dump guest memory, and we know the log buffer is in there somewhere, so we just need to dump the part of memory which corresponds to the log buffer.

Locating __log_buf

Before we can dump the log buffer we need to know where it is. Luckily for us this is fairly simple, we just need to dump all the kernel symbols and look for the right one.

> nm vmlinux > tmp; grep __log_buf tmp;
c000000000f5e3dc b __log_buf

We use the nm tool to list all the kernel symbols and output this into a temporary file, which we then grep for the log buffer (which we know to be named __log_buf). Presto, we are told that it lives at 0xc000000000f5e3dc. The leading 0xc000000000000000 is just the kernel's linear-mapping offset, so the physical address we're after is 0xf5e3dc.
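You can sanity-check that arithmetic from the shell:

> printf '0x%x\n' $((0xc000000000f5e3dc - 0xc000000000000000))
0xf5e3dc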

Dumping Guest Memory

It's then simply a case of dumping guest memory from the QEMU console. So first we press ^a+c to get us to the QEMU console, then we can use the aptly named dump-guest-memory.

> help dump-guest-memory
dump-guest-memory [-p] [-d] [-z|-l|-s] filename [begin length] -- dump guest memory into file 'filename'.
            -p: do paging to get guest's memory mapping.
            -d: return immediately (do not wait for completion).
            -z: dump in kdump-compressed format, with zlib compression.
            -l: dump in kdump-compressed format, with lzo compression.
            -s: dump in kdump-compressed format, with snappy compression.
            begin: the starting physical address.
            length: the memory size, in bytes.

We just give it a filename for where we want our output to go, we know the starting address, we just don't know the length. We could choose some arbitrary length, but inspection of the kernel code shows us that:

#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
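Looking at the pseries_defconfig file tells us what CONFIG_LOG_BUF_SHIFT is set to - you can grep it straight out of the kernel tree:

> grep LOG_BUF_SHIFT arch/powerpc/configs/pseries_defconfig
CONFIG_LOG_BUF_SHIFT=18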

With LOG_BUF_SHIFT set to 18, we know that the buffer is 2^18 bytes, or 256KB. So now we run:

> dump-guest-memory tmp 0xf5e3dc 262144

And we now get our log buffer in the file tmp. This can simply be viewed with:

> hexdump -C tmp

This gives a readable, if poorly formatted output. I'm sure you can find something better but I'll leave that as an exercise for the reader.
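As a head start on that exercise, strings(1) strips most of the binary noise and leaves just the log text:

> strings tmp | less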

Conclusion

So if, like me, your kernel hangs somewhere early in the boot process and you're left without your console output, you are now fully equipped to extract the log buffer in QEMU - and hopefully therein lies the answer to why you failed to boot.

Installing CentOS 7.2 on IBM Power Systems S822LC for High Performance Computing (Minsky) with a USB device

Introduction

If you are installing Linux on your IBM Power Systems S822LC server, the instructions in this article will help you to start and run your system. These instructions are specific to installing CentOS 7 on an IBM Power System S822LC for High Performance Computing (Minsky), but also work for RHEL 7 - just swap CentOS for RHEL.

Prerequisites

Before you power on the system, ensure that you have the following items:

  • Ethernet cables;
  • USB storage device of 7G or greater;
  • An installed ethernet network with a DHCP server;
  • Access to the DHCP server's logs;
  • Power cords and outlet for your system;
  • PC or notebook that has IPMItool level 1.8.15 or greater; and
  • a VNC client.

Download the CentOS ISO file from the CentOS mirror. Select the "Everything" ISO file.

Note: You must use the 1611 release (dated 2016-12-22) or later due to Linux Kernel support for the server hardware.

Step 1: Preparing to power on your system

Follow these steps to prepare your system:

  1. If your system belongs in a rack, install your system into that rack. For instructions, see IBM POWER8 Systems information.
  2. Connect an Ethernet cable to the left embedded Ethernet port next to the serial port on the back of your system and the other end to your network. This Ethernet port is used for the BMC/IPMI interface.
  3. Connect another Ethernet cable to the right Ethernet port to provide a network connection for the operating system.
  4. Connect the power cords to the system and plug them into the outlets.

At this point, your firmware is booting.

Step 2: Determining the BMC firmware IP address

To determine the IP address of the BMC, examine the latest DHCP server logs for the network connected to the server. The IP address will be requested approximately 2 minutes after being powered on.
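What to look for depends on your DHCP server; as one illustrative sketch, with an ISC dhcpd server running under systemd:

# on the DHCP server - adjust the unit name and log source for your setup
journalctl -u dhcpd | grep -i DHCPACK | tail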

It is possible to set the BMC to a static IP address by following the IBM documentation on IPMI.

Step 3: Connecting to the BMC firmware with IPMItool

After you have a network connection set up for your BMC firmware, you can connect using Intelligent Platform Management Interface (IPMI). IPMI is the default console to use when connecting to the Open Power Abstraction Layer (OPAL) firmware.

The default authentication for servers over IPMI is:

  • Default user: ADMIN
  • Default password: admin

To power on your server from a PC or notebook that is running Linux®, follow these steps:

Open a terminal program on your PC or notebook and activate Serial-over-LAN using IPMI (see the commands below). Use the other commands as needed.

For the following ipmitool commands, server_ip_address is the IP address of the BMC from Step 2, and ipmi_user and ipmi_password are the default user ID and password for IPMI.

Power On using IPMI

If your server is not powered on, run the following command to power the server on:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password chassis power on

Activate Serial-Over-Lan using IPMI

Activate your IPMI console by running this command:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password sol activate

After powering on your system, the Petitboot interface loads. If you do not interrupt the boot process by pressing any key within 10 seconds, Petitboot automatically boots the first option. At this point the IPMI console will be connected to the operating system's serial console. If you get to this stage accidentally, you can deactivate and reboot as per the following two commands.

Deactivate Serial-Over-Lan using IPMI

If you need to power off or reboot your system, deactivate the console by running this command:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password sol deactivate

Reboot using IPMI

If you need to reboot the system, run this command:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password chassis power reset

Step 4: Creating a USB device and booting

At this point, your IPMI console should contain a Petitboot bootloader menu, as illustrated below, and you are ready to install CentOS 7 on your server.

Petitboot menu over IPMI

Use one of the following USB devices:

  • USB attached DVD player with a single USB cable to stay under 1.0 Amps, or
  • A USB 2.0 (or later) flash drive of 7 GB or more.

Follow these instructions:

  1. To create the bootable USB device, follow the instructions in the CentOS wiki's How to Set Up a USB to Install CentOS.
  2. Insert your bootable USB device into the front USB port. CentOS AltArch installer will automatically appear as a boot option on the Petitboot main screen. If the USB device does not appear select Rescan devices. If your device is not detected, you might have to try a different type.
  3. Arrow up to select the CentOS boot option. Press e (Edit) to open the Petitboot Option Editor window.
  4. Move the cursor to the Boot arguments section and include the following information: ro inst.stage2=hd:LABEL=CentOS_7_ppc64le:/ console=hvc0 ip=dhcp (if using RHEL the LABEL will be similar to RHEL-7.3\x20Server.ppc64le:/)

Petitboot edited "Install CentOS AltArch 7 (64-bit kernel)" option

Notes about the boot arguments:

  • ip=dhcp to ensure network is started for VNC installation.
  • console=hvc0 is needed as this is not the default.
  • inst.stage2 is needed as the boot process won't automatically find the stage2 install on the install disk.
  • append inst.proxy=URL where URL is the proxy URL if installing in a network that requires a proxy to connect externally.

You can find additional options at Anaconda Boot Options.

  5. Select OK to save your options and return to the Main menu.
  6. On the Petitboot main screen, select the CentOS AltArch option and then press Enter.

Step 5: Complete your installation

After you select to boot the CentOS installer, the installer wizard walks you through the steps.

  1. If the CentOS installer was able to obtain a network address via DHCP, it will present an option to enable VNC. If no option is presented, check your network cables.
  2. Select the Start VNC option and it will provide an OS server IP address. Note that this will be different to the BMC address previously obtained.
  3. Run a VNC client program on your PC or notebook and connect to the OS server IP address.

VNC of Installer

During the install over VNC, there are a couple of consoles active. To switch between them in the ipmitool terminal, press ctrl-b and then a number from 1 to 4 as indicated.

Using the VNC client program:

  1. Select "Install Destination"
  2. Select a device from "Local Standard Disks"
  3. Select "Full disk summary and boot device"
  4. Select the device again from "Selected Disks" with the Boot enabled
  5. Select "Do not install boot loader" from the device.

Without disabling the boot loader, the installer complains about an invalid stage1 device. I suspect it needs a manually created 10M PReP partition to make the installer happy.
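If you want to test that suspicion, an untested sketch with parted might look like this (replace /dev/sdX with your install disk; this destroys its contents):

# untested: create a 10M PReP boot partition at the start of the disk
parted /dev/sdX mklabel gpt
parted /dev/sdX mkpart prep 1MiB 11MiB
parted /dev/sdX set 1 prep on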

If you have a local CentOS repository you can set this by selecting "Install Source" - the directories at this URL should look like CentOS's Install Source for ppc64le.

Step 6: Before reboot and using the IPMI Serial-Over-LAN

Before reboot, generate the grub.cfg file as Petitboot uses this to generate its boot menu:

  1. Switch to the ipmitool shell console (ctrl-b 2).
  2. Enter the following commands to generate a grub.cfg file:
chroot /mnt/sysimage
rm /etc/grub.d/30_os-prober
grub2-mkconfig -o /boot/grub2/grub.cfg
exit

/etc/grub.d/30_os-prober is removed as Petitboot probes the other devices anyway so including it would create lots of duplicate menu items.

The last step is to restart your system.

Note: While your system is restarting, remove the USB device.

After the system restarts, Petitboot displays the option to boot CentOS 7.2. Select this option and press Enter.

Conclusion

After you have booted CentOS, your server is ready to go!