Store Half Byte-Reverse Indexed

A Power Technical Blog

Visual Studio Code for Linux kernel development

Here we are again - back in 2016 I wrote an article on using Atom for kernel development, but I didn't stay using it for too long, instead moving back to Emacs. Atom had too many shortcomings - it had that distinctive Electron feel, which is a problem for a text editor - you need it to be snappy. On top of that, vim support was mediocre at best, and even as a vim scrub I would find myself trying to do things that weren't implemented.

So in the meantime I switched to spacemacs, which is a very well integrated "vim in Emacs" experience, with a lot of opinionated (but good) defaults. spacemacs was pretty good to me but had some issues - disturbingly long startup times, mediocre completions and go-to-definitions, and integrating any module into spacemacs that wasn't already integrated was a big pain.

After that I switched to Doom Emacs, which is like spacemacs but faster and closer to Emacs itself. It's very user configurable but much less user friendly, and I didn't really change much as my elisp-fu is practically non-existent. I was decently happy with this, but there were still some issues, some of which are just inherent to Emacs itself - like no actually usable inbuilt terminal, despite having (at least) four of them.

Anyway, since 2016 when I used Atom, Visual Studio Code (henceforth referred to as Code) came along and ate its lunch, using the framework (Electron) that was created for Atom. I did try it years ago, but I was very turned off by its Microsoft-ness, it seeming lack of distinguishing features from Atom, and it didn't feel like a native editor at all. Since it's massively grown in popularity since then, I decided I'd give it a try.

Visual Studio Code

Vim emulation

First things first for me is getting a vim mode going, and Code has a pretty good one of those. The key feature for me is that there's Neovim integration for Ex-commands, filling a lot of shortcomings that come with most attempts at vim emulation. In any case, everything I've tried to do that I'd do in vim (or Emacs) has worked, and there are a ton of options and things to tinker with. Obviously it's not going to do as much as you could do with Vimscript, but it's definitely not bad.

Theming and UI customisation

As far as the editor goes - it's good. A ton of different themes, you can change the colour of pretty much everything in the config file or in the UI, including icons for the sidebar. There's a huge sore point though, you can't customise the interface outside the editor pretty much at all. There's an extension for loading custom CSS, but it's out of the way, finnicky, and if I wanted to write CSS I wouldn't have become a kernel developer.

Extensibility

Extensibility is definitely a strong point, the ecosystem of extensions is good. All the language extensions I've tried have been very fully featured with a ton of different options, integration into language-specific linters and build tools. This is probably Code's strongest feature - the breadth of the extension ecosystem and the level of quality found within.

Kernel development

Okay, let's get into the main thing that matters - how well does the thing actually edit code. The kernel is tricky. It's huge, it has its own build system, and in my case I build it with cross compilers for another architecture. Also, y'know, it's all in C and built with make, not exactly great for any kind of IDE integration.

The first thing I did was check out the vscode-linux-kernel project by GitHub user "amezin", which is a great starting point. All you have to do is clone the repo, build your kernel (with a cross compiler works fine too), and run the Python script to generate the compile_commands.json file. Once you've done this, go-to-definition (gd in vim mode) works pretty well. It's not flawless, but it does go cross-file, and will pop up a nice UI if it can't figure out which file you're after.

Code has good built-in git support, so actions like staging files for a commit can be done from within the editor. Ctrl-P lets you quickly navigate to any file with fuzzy-matching (which is impressively fast for a project of this size), and Ctrl-Shift-P will let you search commands, which I've been using for some git stuff.

git command completion in Code

There are some rough edges, though. Code is set on what so many modern editors are set on, which is the "one window per project" concept - so to get things working the way you want, you would open your kernel source as the current project. This makes it a pain to just open something else to edit, like some script, or checking the value of something in firmware, or chucking something in your bashrc.

Auto-triggering builds on change isn't something that makes a ton of sense for the kernel, and it's not present here. The kernel support in the repo above is decent, but it's not going to get you close to what more modern languages can get you in an editor like this.

Oh, and it has a powerpc assembly extension, but I didn't find it anywhere near as good as the one I "wrote" for Atom (I just took the x86 one and switched the instructions), so I'd rather use the C mode.

Terminal

Code has an actually good inbuilt terminal that uses your login shell. You can bring it up with Ctrl-`. The biggest gripe I have always had with Emacs is that you can never have a shell that you can actually do anything in, whether it's eshell or shell or term or ansi-term, you try to do something in it and it doesn't work or clashes with some Emacs command, and then when you try to do something Emacs-y in there it doesn't work. No such issue is present here, and it's a pleasure to use for things like triggering a remote build or doing some git operation you don't want to do with commands in the editor itself.

Not the most important feature, but I do like not having to alt-tab out and lose focus.

Well...is it good?

Yeah, it is. It has shortcomings, but installing Code and using the repo above to get started is probably the simplest way to get a competent kernel development environment going, with more features than most kernel developers (probably) have in their editors. Code is open source and so are its extensions, and it'd be the first thing I recommend to new developers who aren't already super invested into vim or Emacs, and it's worth a try if you have gripes with your current environment.

Article Review: Curing the Vulnerable Parser

Every once in a while I read papers or articles. Previously, I've just read them myself, but I was wondering if there were more useful things I could do beyond that. So I've written up a summary and my thoughts on an article I read - let me know if it's useful!

I recently read Curing the Vulnerable Parser: Design Patterns for Secure Input Handling (Bratus, et al; USENIX ;login: Spring 2017). It's not a formal academic paper but an article in the Usenix magazine, so it doesn't have a formal abstract I can quote, but in short it takes the long history of parser and parsing vulnerabilities and uses that as a springboard to talk about how you could design better ones. It introduces a toolkit based on that design for more safely parsing some binary formats.

Background

It's worth noting early on that this comes out of the LangSec crowd. They have a pretty strong underpinning philosophy:

The Language-theoretic approach (LANGSEC) regards the Internet insecurity epidemic as a consequence of ad hoc programming of input handling at all layers of network stacks, and in other kinds of software stacks. LANGSEC posits that the only path to trustworthy software that takes untrusted inputs is treating all valid or expected inputs as a formal language, and the respective input-handling routines as a recognizer for that language. The recognition must be feasible, and the recognizer must match the language in required computation power.

A big theme in this article is predictability:

Trustworthy input is input with predictable effects. The goal of input-checking is being able to predict the input’s effects on the rest of your program.

This seems sensible enough at first, but leads to some questionable assertions, such as:

Safety is predictability. When it's impossible to predict what the effects of the input will be (however valid), there is no safety.

They follow this with an example of Ethereum contracts stealing money from the DAO. The example is compelling enough, but again comes with a very strong assertion about the impossibility of securing a language virtual machine:

From the viewpoint of language-theoretic security, a catastrophic exploit in Ethereum was only a matter of time: one can only find out what such programs do by running them. By then it is too late.

I'm not sure that (a) I buy the assertions, or that (b) they provide a useful way to deal with the world as we find it.

Is this even correct?

You can tease out 2 contentions in the first part of the article:

  • there should be a formal language that describes the data, and
  • this language should be as simple as possible, ideally being regular and context-free.

Neither of these are bad ideas - in fact they're both good ideas - but I don't know that I draw the same links between them and security.

Consider PostScript as a possible counter-example. It's a Turing-complete language, so it absolutely cannot have predictable results. It has a well documented specification and executes in a restricted virtual machine. So let's say that it satisfies only the first plank of their argument.

I'd say that PostScript has a good security record, despite being Turing complete. PostScript has been around since 1985 and apart from the recent bugs in GhostScript, it doesn't have a long history of bugs and exploits. Maybe this just because no-one has really looked, or maybe it is possible to have reasonably safe complex languages by restricting the execution environment, as PostScript consciously and deliberately does.

Indeed, if you consider the recent spate of GhostScript bugs, perhaps some may be avoided by stricter compliance with a formal language specification. However, most seem to me to arise from the desirability of implementing some of the PostScript functionality in PostScript itself, and some of the GhostScript-specific, stupendously powerful operators exposed to the language to enable this. The bugs involve tricks to allow a user to get access to these operators. A non-Turing-complete language may be sufficient to prevent these attacks, but it is not necessary: just not doing this sort of meta-programming with such dangerous operators would also have worked. Storing the true values of the security state outside of a language-accessible object would also be good.

Is this a useful way to deal with the world as we find it?

My main problem with the general LangSec approach that this article takes is this: to get to their desired world, we need to rewrite a bunch of things with entirely different language foundations. The article talks about HTML and PDFs as examples of unsafe formats, but I cannot imagine the sudden wholesale replacement of either of these - although I would love to be proven wrong.

Can we get even part of the way with existing standards? Kinda-sorta, but mostly no, and to the authors' credit, they are open about this. They argue that formal definition parsing the language should be the "most restrictive input definition" - they specifically require you to "give up attempting to accept arbitrarily complex data", and call for "subsetting of many protocols, formats, encodings and command languages, including eliminating unneeded variability and introducing determinism and static values".

No doubt we would be in a better place if people took up these ideas for future programs. However, for our current set of programs and use cases, this is probably not tractable in any meaningful way.

The rest of the paper

The rest of the paper is reasonably interesting. Their general theory is that you should build your parsers based on a formal definition of a language, and that the parser should convert the input data to a set of objects, and then your business logic should deal with those objects. This is the 'recognizer pattern', and is illustrated below:

The recognizer pattern: separate code parses input according to a formal grammar, creating valid objects that are passed to the business logic

In short, the article is full of great ideas if you happen to be parsing a simple language, or are designing not just a parser but a full language ecosystem. They do also provide a binary parser toolkit that might be helpful if you are parsing a binary format that can be expressed with a parser combinator.

Overall, however, I think the burden of maintaining old systems is such that a security paradigm that relies on new code is pretty unlikely, and one that relies on new languages is fatally doomed from the outset. New systems should take up these ideas, yes. But I'd really like to see people grappling with how to deal with the complex and irregular languages that we're stuck with (HTML, PDF, etc) in secure ways.

What Do You Mean "No"?

Quite often when building small Linux images having separate user accounts isn't always at the top of the list of things to include. Petitboot is no different; the most common operations like mounting disks, configuring interfaces, and calling kexec all require root and Petitboot generally only exists long enough to boot into the next thing, so why not run it all as root?

The picture is less clear when we start to think about what is possible to do in Petitboot by default. If someone comes across an open Petitboot console they're only a few keystrokes away from wiping disks, changing files, or even flashing firmware. Depending on how your system is used that may or may not be something you care about, but over time there have been a few requests to "add a password screen to Petitboot" to at least make it so that the system isn't open season for whoever sees it.

Enter Password:

The most direct way to avoid this would be to slap a password prompt onto Petitboot before any changes can be made. There are two immediate drawbacks to this:

  • The Petitboot UI still runs as root, and
  • Exiting to the shell gives the user root permissions as well.

There is already a mechanism to prevent the user exiting to the shell, but this puts all of our eggs in the basket of petitboot-nc being a secure program. If a user can accidentally or otherwise find a way to exit or crash the UI then they're immediately in a root shell, and while petitboot-nc is a good UI it was never designed to be a hardened program protecting the system.

You Have No Power Here

The idea instead as of Petitboot v1.10.0 is not to care if the user drops to the shell; because now it's completely unprivileged.

Normal shell

The only process now that runs as root is pb-discover itself; the console, UI, and helper scripts run as a new 'petituser'. For the server and clients to still communicate the "petitiboot.ui" socket permissions are modified to allow processes that are part of the 'petitgroup' to connect. However now if pb-discover notices that a client in the petitgroup is connecting (or more accurately the client isn't running as root) by default it ignores any commands from it that would configure or boot the system.

A new command, PB_PROTOCOL_ACTION_AUTHENTICATE, lets a client send a password to the server to then be allowed to send all the usual commands like updating the config or booting a specific option. This keeps all the authentication on the server side, avoiding writing any "secure" ncurses code. In the UI the biggest difference is that when trying to change something the user will hit a password field:

Denied

Then the password is sent to the server, checked, and if correct the action goes ahead.

Whose Passwords?

But where does this password come from? Technically it's just the root password. The server computes a hash of the supplied password and compares it against the system's root password. Similarly in the shell the user can run sudo with the root password to enter a full shell if needed:

Oops

Petitboot of course runs in memory, and writing a root password into the image itself would mean recompiling to change the password, so instead Petitboot pulls the root password from NVRAM. On startup Petitboot reads the petitboot,password parameter which is the hash of the root password and updates /etc/shadow with it. This happens before any clients are up or can connect to the server.

Don't Panic

By default no password is set. After all we don't want people upgrading and then being somehow locked out of their system. For ease of use, and for testing-purposes, if no password is configured and the user drops to the shell it is automatically upgraded to a root shell:

Elevated

To set a password there is a new subscreen in System Configuration:

New Password

This sends an authentication command to the server, and assuming the client is authenticated with the current password as well pb-discover updates the shadow file, and writes the hash back to NVRAM.


User support exists in Petitboot v1.10.0 onwards. You will also need some support in your build system to set up the users, see how op-build did it for an example.

There are a few items on the TODO list still which would be good to have. For example storing the password hash in an attached TPM if available, as well as splitting out more of what runs as root; for example the bootloader parsers in pb-discover preferably wouldn't run with root privileges, but they are all part of the one binary.

As always, comments, suggestions, and patches welcome on the list!

IPMI: Initiating Better Overrides

On platforms that support it Petitboot can interact with the inband IPMI interface to pull information from the BMC. One particularly useful example of this is the "Get System Boot Options" command which we use to implement boot "overrides". By setting parameter 5 of the command a user can remotely force Petitboot to boot from only one class of device or disable autoboot completely. This is great for automation or debug purposes, but since it can only specify device types like "disk" or "network" it can't be used to boot from specific devices.

Introducing..

The Boot Initiator Mailbox

Alexander Amelkin pointed out that the "Get System Boot Options" command also specifies parameter 7, "Boot Initiator Mailbox". This parameter just defines a region of vendor-defined data that can be used to influence the booting behaviour of the system. The parameter description specifies that a BMC must support at least 80 bytes of data in that mailbox so as Alex pointed out we could easily use it to set a partition UUID. But why stop there? Let's go further and use the mailbox to provide an alterate "petitboot,bootdevs=.." parameter and let a user set a full substitute boot order!

The Mailbox Format

Parameter 7 has two fields, 1 byte for the "set selector", and up to 16 bytes of "block data". The spec sets the minimum amount of data to support at 80 bytes, which means a BMC must support at least 5 of these 16-byte "blocks" which can be individually accessed via the set selector. Aside from the first 3 bytes which must be an IANA ID number, the rest of the data is defined by us.

So if we want to set an alternate Petitboot boot order such as "network, usb, disk", the format of the mailbox would be:

Block # |       Block Data              |
-----------------------------------------
0       |2|0|0|p|e|t|i|t|b|o|o|t|,|b|o|o|
1       |t|d|e|v|s|=|n|e|t|w|o|r|k| |u|s|
2       |b| |d|i|s|k| | | | | | | | | | |
3       | | | | | | | | | | | | | | | | |
4       | | | | | | | | | | | | | | | | |

Where the string is null-terminated, 2,0,0 is the IBM IANA ID, and the contents of any remaining data is not important. The ipmi-mailbox-config.py script constructs and sends the required IPMI commands from a given parameter string to make this easier, eg:

./utils/ipmi-mailbox-config.py -b bmc-ip -u user -p pass -m 5 \
        -c "petitboot,bootdevs=uuid:c6e4c4f9-a9a2-4c30-b0db-6fa00f433b3b"

Active Mailbox Override


That is basically all there is to it. Setting a boot order this way overrides the existing order from NVRAM if there is one. Parameter 7 doesn't have a 'persistent' flag so the contents need to either be manually cleared from the BMC or cleared via the "Clear" button in the System Configuration screen.

From the machines I've been able to test on at least AMI BMCs support the mailbox, and hopefully OpenBMC will be able to add it to their IPMI implementation. This is supported in Petitboot as of v1.10.0 so go ahead and try it out!

OpenPOWER Summit Europe 2018: A Software Developer's Introduction to OpenCAPI

Last month, I was in Amsterdam at OpenPOWER Summit Europe. It was great to see so much interest in OpenPOWER, with a particularly strong contingent of researchers sharing how they're exploiting the unique advantages of OpenPOWER platforms, and a number of OpenPOWER hardware partners announcing products.

(It was also my first time visiting Europe, so I had a lot of fun exploring Amsterdam, taking a few days off in Vienna, then meeting some of my IBM Linux Technology Centre colleagues in Toulouse. I also now appreciate just what ~50 hours on planes does to you!)

One particular area which got a lot of attention at the Summit was OpenCAPI, an open coherent high-performance bus interface designed for accelerators, which is supported on POWER9. We had plenty of talks about OpenCAPI and the interesting work that is already happening with OpenCAPI accelerators.

I was invited to present on the Linux Technology Centre's work on enabling OpenCAPI from the software side. In this talk, I outline the OpenCAPI software stack and how you can interface with an OpenCAPI device through the ocxl kernel driver and the libocxl userspace library.

My slides are available, though you'll want to watch the presentation for context.

Apart from myself, the OzLabs team were well represented at the Summit:

Unfortunately none of their videos are up yet, but they'll be there over the next few weeks. Keep an eye on the Summit website and the Summit YouTube playlist, where you'll find all the rest of the Summit content.

If you've got any questions about OpenCAPI feel free to leave a comment!