Monday, June 6, 2022

Exploiting the Wii U's USB Descriptor parsing

In this write-up we're going to take a look at exploiting the Wii U's USB Host Stack. Over the past few months I spent a lot of time reverse engineering USB-related things on the Wii U.

Overview

The Wii U contains an ARM chip running an embedded operating system called IOSU. IOSU consists of several modules which contain device drivers and other components not handled on the main PPC CPU.

The most relevant module for this write-up is IOS-USB, which handles all USB devices which can be plugged into the external USB ports.
IOS-USB contains UHS which is the USB Host Stack of the Wii U.

USB Descriptor parsing

Every USB device contains several descriptors which describe different information about the device to the host. All devices have a device descriptor and one or more configuration descriptors. Each descriptor starts with a size and a type field.
After plugging in a device, UHS starts with reading and parsing the device descriptor. The device descriptor contains things like the USB version, the Vendor and Product ID of the device, and the number of configuration descriptors.
After that, all configuration descriptors are read from the device. The configuration descriptor is a bit more complicated than the device descriptor. Instead of having a fixed size, the configuration can have any size, which is specified in the wTotalLength field of the configuration descriptor. After the configuration descriptor, multiple interface and endpoint descriptors are appended to the configuration.

UHS loads up to 32 configurations into memory and then parses all interface and endpoint descriptors of the first configuration.

The bug

To read the full configuration into a buffer, UHS starts with reading the first 9 bytes of the configuration, which contains the configuration descriptor. It then allocates a buffer from the heap with the total configuration size, which is specified in the wTotalLength field. The full configuration is now read into that buffer.
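
As a rough illustration of the 9-byte header read described above, the following sketch pulls wTotalLength out of a configuration descriptor. The field offsets follow the standard USB configuration descriptor layout; the function names are made up for this example and are not taken from UHS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USB_DT_CONFIG      0x02 /* configuration descriptor type */
#define USB_CONFIG_HDR_LEN 9    /* size of the configuration descriptor itself */

/* Read a little-endian 16-bit field byte by byte, so the result is the same
 * on a big-endian host such as the IOSU's ARM core. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return wTotalLength from the first 9 bytes of a configuration, or 0 if the
 * header does not look like a configuration descriptor.
 * (Hypothetical helper, not a UHS function.) */
uint16_t config_total_length(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < USB_CONFIG_HDR_LEN)
        return 0;
    if (hdr[0] < USB_CONFIG_HDR_LEN || hdr[1] != USB_DT_CONFIG)
        return 0;
    return read_le16(hdr + 2); /* wTotalLength sits at offsets 2..3 */
}
```

The value returned here is what would be used as the allocation size for the full configuration buffer.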

Since all USB descriptor values are stored as little-endian and the ARM is running as big-endian, some descriptor fields need to be byteswapped.
To byteswap all of the endpoint descriptor fields of the configuration, UHS walks the buffer in a loop. The same loop also parses interface descriptors, which is left out here for simplicity.

The loop goes over all endpoint descriptors and byteswaps them until wTotalLength is reached. So, where's the bug?
UHS doesn't verify that the wTotalLength of the fully read configuration matches the wTotalLength of the initially read configuration descriptor, which was used to determine the buffer size. This means the total length can be larger than the allocated buffer, which allows pointing endpoint descriptors past the configuration buffer, causing out-of-bounds byteswaps.
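
The loop described above can be sketched roughly as follows. This is not the UHS code, only a minimal model of the behaviour: all names are hypothetical, and the extra mem_size parameter exists purely so that the simulation never leaves its own array while still showing swaps that land past the allocated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USB_DT_ENDPOINT 0x05 /* endpoint descriptor type */

/* Walk descriptors until total_len (the device-supplied wTotalLength) and
 * byteswap wMaxPacketSize (bytes 4 and 5) of every endpoint descriptor.
 * alloc_size is the size the buffer was allocated with, mem_size the size of
 * the simulated memory behind it. Returns how many swaps touched bytes past
 * alloc_size. */
int swap_endpoints(uint8_t *mem, size_t mem_size,
                   size_t alloc_size, size_t total_len)
{
    assert(mem != NULL && alloc_size <= mem_size);
    int oob_swaps = 0;
    size_t off = 0;

    while (off < total_len && off + 6 <= mem_size) {
        uint8_t length = mem[off];
        uint8_t type = mem[off + 1];

        if (length == 0)
            break; /* the sketch bails out; UHS would loop forever here */

        if (type == USB_DT_ENDPOINT) {
            uint8_t tmp = mem[off + 4];
            mem[off + 4] = mem[off + 5];
            mem[off + 5] = tmp;
            if (off + 6 > alloc_size)
                oob_swaps++; /* this swap landed outside the allocation */
        }
        off += length;
    }
    return oob_swaps;
}
```

With total_len equal to alloc_size the function reports no out-of-bounds swaps; once total_len is larger, any byte pair behind the buffer that happens to look like an endpoint descriptor gets swapped as well.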

Exploiting a 16-bit byteswap

So can that byteswap be exploited?
There are several devices which make emulating a USB device possible. Microcontrollers like the Raspberry Pi Pico allow full control over descriptors when emulating a USB client.
To perform a byteswap of an endpoint descriptor, the second byte of the descriptor needs to be 0x05, which is the endpoint descriptor type. Additionally, the first byte cannot be 0, otherwise UHS will never break out of the byteswap loop and get stuck, since the offset never increases.
If these conditions are met, bytes 4 and 5 (wMaxPacketSize), containing the maximum packet size of that endpoint, will be swapped.

After looking for days for structures stored after the configuration on the heap which meet these conditions, I gave up. This seems extremely hard, if not impossible to exploit.

After working on other projects for a while, I decided to give this another try. This time I decided to look directly at the heap block headers. 

Heap Blocks

Each allocated block on the heap has a header which contains a magic value, the size of the block, a pointer to the previous block, and a pointer to the next block.

Example of a free heap block

The magic also indicates the state of this heap block. The following magic values are used:
  • 0xBABE0000: Free block
  • 0xBABE0001: Allocated outer block
  • 0xBABE0002: Allocated inner block
The magic never contains a 0x05, the size is rounded to a multiple of 0x10 and we can only allocate blocks up to 0x10000 bytes in size, so we can't use any of those fields.
What if the previous pointer contains a 0x05 though? This would swap 2 bytes in the next pointer! Ideally the previous pointer should look something like XXXX05XX. This would cause the 2 bytes in the middle of the next pointer (bytes 4 and 5 of the endpoint descriptor) to be swapped.
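
To make the byte arithmetic concrete, here is a small sketch that models a heap block header as 16 big-endian bytes and applies the endpoint swap to it. The layout (four 32-bit fields in the order magic, size, prev, next) is an assumption based on the description above, and the pointer values are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed header layout: four big-endian 32-bit fields. */
enum { HDR_MAGIC = 0, HDR_SIZE = 4, HDR_PREV = 8, HDR_NEXT = 12, HDR_LEN = 16 };

void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Treat the bytes at hdr + off as an endpoint descriptor. If the length byte
 * is non-zero and the type byte is 0x05, swap descriptor bytes 4 and 5.
 * Returns 1 if a swap happened. */
int try_endpoint_swap(uint8_t *hdr, unsigned off)
{
    assert(off + 6 <= HDR_LEN);
    if (hdr[off] == 0 || hdr[off + 1] != 0x05)
        return 0;
    uint8_t tmp = hdr[off + 4];
    hdr[off + 4] = hdr[off + 5];
    hdr[off + 5] = tmp;
    return 1;
}
```

With a prev pointer of 0x10A305C0, a descriptor starting at the second byte of prev (offset 9 in the header) has 0xA3 as its length and 0x05 as its type, so bytes 13 and 14 are swapped and a next pointer of 0x10A41280 becomes 0x1012A480.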

To achieve this we need to prepare the heap a bit. IOSU will merge consecutive free blocks, so we need a way to create "free holes" in the heap.
By placing an endpoint descriptor right at the end of the first configuration and setting the total size accordingly, we can swap 2 bytes in the magic of the next heap block.
Example of swapping the magic of the next configuration buffer

The magic value of Configuration 1's heap block is now 0xBEBA0001 instead of 0xBABE0001. Since the heap state is checked once the block is freed from the heap, this block can no longer be freed.
We can have up to 32 configurations, which will be allocated before parsing the first configuration. If we carefully choose various sizes for these descriptors we can create the ideal heap header. 
Let's connect a device which uses 6 configurations. By controlling the size of buffer 1 we can make sure the address of buffer 2 contains a 0x05. This way we can create an ideal heap layout.
After disconnecting our emulated device, the heap looks like this:

The ideal heap layout
Red: corrupted blocks, Green: free Blocks
So what if we reconnect the emulated device and point an endpoint descriptor to the prev pointer, which now matches the XXXX05XX pattern?
Swapping the next pointer of buffer 4
The next pointer gets swapped and now points into the middle of the heap!
If we now reconnect the device, the next configuration buffer gets allocated in the middle of the heap, provided that the memory the next pointer now points to holds a large enough value where the allocator expects the block's size field. By controlling the size of buffer 5 we can control the address of the free heap block, allowing us to roughly control the next pointer.
This now allows overwriting existing buffers on the heap.

My initial idea was to point the next block directly into the stack, which is also allocated on the heap. Since UHS memsets the buffer after allocation, this would only result in a crash though. So I needed to find some other structure on the heap to overwrite.

UhsCtrlXferMgr

As the name implies, the "UHS Control Transfer Manager (UhsCtrlXferMgr)" manages transfers on the control endpoint of the device. Right before the transfer manager on the heap is a large buffer called pEp0DmaBuf, which receives the data of transfers on the control endpoint (Endpoint 0). If we point the next pointer into this buffer, we can overwrite the transfer manager after it.
Since each configuration can "only" be 0xffff bytes in size, we'll use 2 buffers to reach the transfer manager.
Overwriting UhsCtrlXferMgr

With full control over the transfer manager we can insert a custom transfer event into its transfer queue. This allows us to transfer any amount of data anywhere.
It is now possible to start a transfer into the stack and get kernel code execution using a ROP chain.
The data for the kernel code can simply be placed into one of the existing config descriptors which have been allocated on the heap.
The ROP is based on the one in mocha and extremely similar to the one found in bluubomb, so check out the previous write-up if you're interested in details.

Conclusion

So what am I calling this?
UDPIH (pronounced like "mud pie" without the m), which stands for USB Descriptor Parsing Is Hard, since apparently it's really hard to properly parse these descriptors :P
It is now possible to get IOSU code execution by simply plugging a Raspberry Pi Pico/Zero or similar device into the console. This even works before the PPC side has booted properly, allowing for things like CBHC bricks to be fixed.

Since everything using IOS-USB will shift around the heap, UDPIH roughly works after you can see the "Wii U" logo. This is right after IOS-NET and IOS-FS have registered their drivers for USB Mass storage and USB Ethernet, and before the Wii U menu has booted and queries connected hard drives.

In addition to UDPIH itself, I'll also release a simple recovery menu which allows fixing several bricks from the IOSU side.

Hope this is useful for someone :)

Links:

Sunday, May 16, 2021

BluuBomb: Exploiting Bluetooth on the Wii U

BluuBomb allows running an IOSU kernel binary by sending data from an emulated Wii Remote.  

Overview

The Wii U's operating system consists of CafeOS, which runs on the PowerPC, and the IOSU, which itself runs on an ARM chip called the Starbuck. 

While the Wii U has several exploits and entrypoints on the PPC side, the IOSU only has a few exploits and no direct entrypoints.  

To get IOSU code execution you always have to go through the PPC side.

The IOSU consists of several modules. I was working on reverse engineering some IOSU modules to gain some additional knowledge for a project I was working on.  

The most interesting modules for this write-up are IOS-PAD and IOS-KERNEL.  

IOS-PAD handles most of the controller-related things. It communicates with 2 libraries on the PPC side: vpad.rpl and padscore.rpl

I was curious to see how the communication between padscore and IOS-PAD is handled, so I decided to take a look at IOS-PAD and padscore. 

IOS-KERNEL is the kernel of the IOSU (obviously). It has the most permissions in the IOSU and can write to executable memory which is usually read-only.  

The main goal of an IOSU exploit would be to load a custom binary into the kernel.  

Communication between padscore and IOS-PAD

IOS-PAD uses a resource manager named /dev/usb/btrm. Padscore will then use ioctl/ioctlv calls to communicate with it.  

Once padscore is loaded it will set up a deque called SMD, which I assume stands for "Simple Message Deque" (?). It uses a buffer shared between the PPC and IOSU and is able to send and receive messages between both of the chips.

The SMD functions on the PPC side are called smdPpc and are part of the coreinit library, while on the IOSU they're called smdIop.  

The buffer is allocated in padscore and its address is passed to the IOSU using btrm's ioctl call 1.

Once an HID-report is received via Bluetooth it's copied to the stack and passed to the PPC via a smdIopSendMessage call.  

The bug

When an HID report is received a function called bta_hh_co_data is called.  

That function looks something like this:

void bta_hh_co_data(uint8_t dev_handle, void *p_rpt, uint16_t len, uint8_t mode, uint8_t sub_class, uint8_t ctry_code, void *peer_addr, uint8_t app_id)
{
    HidBuffer hid;
...
    if (len == 0) {
        log_printf("BT: [Err] received invalid HID len==0 \n");
    }
    else {
        hid.len = len;
        hid.sub_class = sub_class;
        hid.app_id = app_id;
        hid.dev_handle = dev_handle;
        hid.mode = mode;
        memcpy(hid.data, p_rpt, len); /* no upper bound on len; hid.data only holds 58 bytes */
...
        res = smdIopSendMessage(smdIopIndex, &hid, 0x40);
        if (res == 0) {
            numSucessfulSmdMessages = numSucessfulSmdMessages + 1;
        }
        else {
...
            pad_printf("BT: [Err] SMD send message failed with status %d:%d\n", res);
            numFailedSmdMessages = numFailedSmdMessages + 1;
        }
    }
...
}

The vulnerability is really easy to spot here.

It checks that len isn't 0 and then copies the report data to the buffer on the stack.

This buffer can store up to 58 bytes. We can send as many bytes as the MTU allows though.

This allows writing to the stack and we can easily overwrite the LR stored on the stack, which allows us to jump anywhere in IOS-PAD.
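
The effect of the unchecked copy can be modelled with a flat byte array standing in for the stack frame. This is only a simulation under assumptions: the 58-byte capacity comes from the text above, but the distance between the buffer and the saved LR (SAVED_LR_OFF) and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HID_DATA_SIZE 58 /* capacity stated in the write-up */
#define SAVED_LR_OFF  64 /* hypothetical distance from hid.data to the saved LR */
#define FRAME_SIZE    (SAVED_LR_OFF + 4)

/* Read the simulated saved LR as a big-endian word. */
uint32_t saved_lr(const uint8_t *frame)
{
    const uint8_t *p = frame + SAVED_LR_OFF;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Mimic the unchecked memcpy into hid.data. The clamp to FRAME_SIZE exists
 * only so that the simulation itself stays in bounds; the vulnerable code has
 * no such limit. Returns 1 if the copy ran past the 58-byte data area. */
int copy_report(uint8_t *frame, const uint8_t *report, size_t len)
{
    assert(frame != NULL && report != NULL);
    size_t n = len < FRAME_SIZE ? len : FRAME_SIZE;
    memcpy(frame, report, n);
    return len > HID_DATA_SIZE;
}
```

A report of 58 bytes or less leaves the simulated LR alone, while a 68-byte report replaces it with whatever the last four bytes of the report contain.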

Exploiting the bug

At first, I needed a fake controller that can send malicious data to the Wii U.  

I found a repository called WiimoteEmulator on GitHub. Sadly I wasn't able to pair the emulated Wii Remote to my Wii U.

After hooking into some functions on the IOSU, I figured out the Wii U will try to use something called "Secure Simple Pairing". If the "Secure Simple Pairing" Mode (SSP) is enabled on the client, pairing on the Wii U will fail.

The fix was relatively simple. All I had to do was to disable SSP on the client while BluuBomb was running.

We can now send our own data using the WiimoteEmulator.


All data packets start with 0xa1 if they're sent from the Wii Remote to the Wii U, and with 0xa2 if they're sent from the Wii U to the Wii Remote.  

The Wii U will copy all the data behind this byte to the earlier mentioned buffer.  
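
As a small sketch of the framing described above, the following helper prepends the 0xa1 byte to a payload. The function and macro names are invented for this example and are not part of WiimoteEmulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HID_FROM_REMOTE  0xa1 /* Wii Remote -> Wii U */
#define HID_FROM_CONSOLE 0xa2 /* Wii U -> Wii Remote */

/* Write 0xa1 followed by the payload into out. Returns the total packet
 * length, or 0 if the packet does not fit into out_cap bytes.
 * (Hypothetical helper for illustration.) */
size_t build_remote_packet(uint8_t *out, size_t out_cap,
                           const uint8_t *payload, size_t payload_len)
{
    assert(out != NULL && (payload != NULL || payload_len == 0));
    if (payload_len >= out_cap)
        return 0;
    out[0] = HID_FROM_REMOTE;
    if (payload_len > 0)
        memcpy(out + 1, payload, payload_len);
    return payload_len + 1;
}
```

Everything after the first byte is what ends up in the HID buffer on the console side.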

We can't write to any executable memory without kernel access, so we'll have to use existing instructions stored in executable memory. 

On ARM the function address returned to after calling a function will be stored in the Link Register.  

When a function needs to call another function this Link Register is pushed to the stack and popped back into the program counter when returning from the original function.  

Since we now control the stack we can modify the return addresses stored in it. This allows us to create a so-called ROP chain.  

We can use existing useful instructions, so-called "gadgets", and jump to them.

When the gadget returns it will read the return address from our controlled stack.

I've mostly used ROPgadget or Ghidra itself to find useful gadgets.

We can also execute ARM instructions as Thumb instructions. Thumb is a special mode in ARM processors that allows using 2-byte instructions instead of 4 bytes.

By loading an address with the last bit set to 1 into the program counter the following instructions will be interpreted as Thumb.
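
When building a ROP chain this boils down to setting or checking the lowest bit of each gadget address. A minimal sketch, with helper names made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Turn the address of a Thumb gadget into a value that selects Thumb mode
 * when it is loaded into the program counter. */
uint32_t as_thumb(uint32_t addr)
{
    assert((addr & 1u) == 0); /* expects the plain, halfword-aligned address */
    return addr | 1u;
}

/* Returns 1 if a chain entry would be executed as Thumb code. */
int is_thumb(uint32_t chain_entry)
{
    return (chain_entry & 1u) != 0;
}
```

For example, a Thumb gadget located at the hypothetical address 0x11f00100 would be placed on the stack as 0x11f00101.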

The MTU is around 512 bytes and we only have around 130 bytes until we reach the top of our stack.

This won't be enough to exploit the kernel or load a kernel binary, so we need a way to load more data into memory.

I started by making a simple payload which will upload data.

It uses the 58 bytes of the HID buffer and calls memcpy to copy the data to a location we've specified.

It then continues with normal execution. To achieve this we only overwrite a specific amount of the stack and jump to a location that expects the stack pointer to be exactly where it currently is.


We can now upload 58 bytes at a time!

This allows us to upload a bigger ROP chain and a kernel binary.  
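
On the sending side this means splitting whatever needs to be uploaded into pieces of at most 58 bytes. A minimal sketch of that bookkeeping, assuming each upload packet carries up to 58 bytes of data (the helper names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE 58 /* bytes of data per upload packet */

/* Number of upload packets needed for total bytes. */
size_t chunk_count(size_t total)
{
    return (total + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* Length of the chunk with the given index; only the last one may be short. */
size_t chunk_length(size_t total, size_t index)
{
    assert(index < chunk_count(total));
    size_t left = total - index * CHUNK_SIZE;
    return left < CHUNK_SIZE ? left : CHUNK_SIZE;
}
```

An 871-byte binary, for instance, takes 16 packets: 15 full ones and a final one carrying a single byte.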


Unfortunately, we overwrite the address of the report buffer packet and can no longer free it.  

That means with all the ROP chains we only have 871 bytes for the kernel binary. Ouch, that's small.  


Let's start with the big ROP chain we'll use to gain kernel access.  

We can use a flaw in the IOS_CreateThread call which will clear parts of the specified stack with zeroes. Since this is cleared with kernel permissions we can use this anywhere in memory.  

An all-zero word decodes to andeq r0, r0, r0 on ARM, which is effectively a NOP. This allows us to patch parts of the IOS_SetPanicBehaviour syscall which makes it possible to use it for arbitrary write.  

We can now use this syscall to write bytes anywhere in memory.  

This allows us to write our own instructions and turn the IOS_SetFaultBehaviour syscall into a function which copies our kernel binary from memory and executes it with kernel permissions.  

This is highly based on the ROP chain used in Mocha CFW, but adjusted to work in IOS-PAD.  


Now all that was left is a payload that pivots the stack into our bigger ROP chain which is placed into memory.  

We need to pivot the stack or else we'll write over the top of our stack. The IOSU will prevent us from using syscalls if the stack pointer is invalid.  

There only seems to be a single instruction in IOS-PAD which makes it possible to pivot the stack.  

This instruction is add sp, sp, r2, which adds the value in register 2 to our stack pointer.  

To properly return from this gadget, without executing instructions we don't want, we use IOS_CreateThread again to nop out some instructions.  

We can now offset the stack by any amount we want. The thread we're running on has a stack size of 0x1000.  

We'll now set register 2 to -0x600 and run the stack pivot ROP chain.  

This will give us enough space for the big ROP chain.  
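
The pivot arithmetic itself is just a 32-bit addition with a negative offset. The sketch below models it and checks that the result stays inside the thread's 0x1000-byte stack, which matters because syscalls are refused with an invalid stack pointer. The stack addresses used with it are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_STACK_SIZE 0x1000u /* stack size of the thread, from the text */
#define PIVOT_OFFSET      (-0x600) /* value loaded into r2 */

/* Model of "add sp, sp, r2": 32-bit wrap-around addition. */
uint32_t pivot_sp(uint32_t sp, int32_t r2)
{
    return sp + (uint32_t)r2;
}

/* Returns 1 if the pivoted stack pointer still lies inside the stack that
 * starts at stack_base (lowest address) and spans THREAD_STACK_SIZE bytes. */
int pivot_stays_in_stack(uint32_t stack_base, uint32_t sp, int32_t r2)
{
    assert(sp >= stack_base && sp - stack_base <= THREAD_STACK_SIZE);
    uint32_t new_sp = pivot_sp(sp, r2);
    return new_sp >= stack_base && new_sp - stack_base < THREAD_STACK_SIZE;
}
```

With a hypothetical stack at 0x12000000 and the stack pointer at 0x12000e00, the pivot lands at 0x12000800, leaving 0x600 bytes above it for the bigger ROP chain.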


And that's basically it. We can now:

  • Use the upload payload to upload our kernel binary into unused memory
  • Use the upload payload to upload our bigger ROP chain to the stack
  • Use the stack pivot payload to pivot the stack into the bigger ROP chain
  • Copy the kernel binary into the kernel and run it

The 871 bytes are luckily enough for loading a custom .rpx or a custom fw.img.

Conclusion

This is the first fully implemented Wii U exploit that directly exploits the IOSU.  

The only thing you need for BluuBomb is a Wii U that is in a state able to pair a Wii Remote and a PC with Bluetooth support.

For the fw.img loader you also need to be able to access and exit System Settings on the Wii U.

While the browser is still the more convenient entrypoint, this should be able to repair a few soft bricks. 


Hope this is useful for someone :)


Funnily enough, at the same time as I was finishing implementing this exploit, the Switch got an update that fixes this exact issue.  
