Dumping the MSP430FR BSL

MSP430FR BSL dumper

Tools to try to dump the MSP430FR BSL, mainly targeting the MSP430FR5994 (on an MSP-EXP430FR5994 devboard).

The idea

The MSP430FR bootloader ('BSL') resides at 0x1000. This memory cannot be read, and user code can only jump to 0x1000 or 0x1002, called the "Z-area", to run certain functions of the BSL. However, it is very likely that the CPU can access this memory as data while it is executing from inside this region, as that is often needed to store e.g. structs, lookup tables, and so on. Several other "execute-only" memory implementations function in a similar way, such as the Nintendo GameBoy Advance and DS boot ROMs ("BIOS"es, citation below), as well as other systems analyzed by Schink and Obermaier (publication also linked below).

The BSL (according to the datasheet) doesn't disable interrupts. That means that, while the BSL is executing, it is possible to interrupt its execution flow and jump to code controlled by the user. An interrupt handler can inspect and modify the registers of the BSL code at the time the interrupt happened, as well as the stack contents. By running a timer at the same frequency as the CPU and having its interrupt dump the register and stack contents after an interval that increases by one cycle every iteration, it is possible to trace the instruction flow of the CPU, as well as which registers and stack contents it is accessing, and how, even though the code itself is not visible. Furthermore, the MSP430 CPU uses a variable-length instruction set and instructions can take a variable number of cycles, so these traces can also be used to infer more information about which instructions are executed: the pc register will never point to the middle of an instruction, and only advances to the next instruction once the current one has finished executing.
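The sweep described above can be illustrated with a toy model in Python. This is only a sketch under loud assumptions: `HIDDEN_PROGRAM` is a made-up straight-line instruction list (not real BSL code), interrupt latency is ignored, and every pc value is assumed to be visited only once (no loops or branches). It shows how the last delay at which a given pc is observed marks the cycle where that instruction starts, from which sizes and cycle counts follow.

```python
# Hypothetical hidden program: (address, size in bytes, cycles) per instruction.
HIDDEN_PROGRAM = [
    (0x1000, 2, 1),
    (0x1002, 4, 3),
    (0x1006, 2, 2),
    (0x1008, 6, 5),
]


def pc_at_interrupt(delay):
    """Return the pc an interrupt handler would see `delay` cycles after entry.

    Interrupts are only taken between instructions, so the saved pc is
    always the address of the next instruction to execute.
    """
    elapsed = 0
    for addr, _size, cycles in HIDDEN_PROGRAM:
        if elapsed >= delay:
            return addr
        elapsed += cycles
    last_addr, last_size, _ = HIDDEN_PROGRAM[-1]
    return last_addr + last_size


def sweep(max_delay):
    """Sample the pc once per delay value, increasing by one cycle each time."""
    return [(delay, pc_at_interrupt(delay)) for delay in range(max_delay + 1)]


def reconstruct(samples):
    """Infer (address, size, cycles) from the samples alone."""
    start = {}
    for delay, pc in samples:
        start[pc] = delay  # later samples overwrite: keeps the last delay seen
    pcs = sorted(start, key=start.get)
    result = []
    # The start cycle of the final observed pc is unknown (the sweep just
    # ended there), so only pairs whose successor start is certain are used.
    for cur, nxt in zip(pcs, pcs[1:-1]):
        result.append((cur, nxt - cur, start[nxt] - start[cur]))
    return result


if __name__ == "__main__":
    for addr, size, cycles in reconstruct(sweep(15)):
        print(f"{addr:#06x}: {size} bytes, {cycles} cycles")
```

On real hardware the samples would come from the timer interrupt handler rather than from `pc_at_interrupt`, and branches, loops and interrupt latency make the reconstruction considerably messier than this model suggests.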

Function epilogues typically first pop a number of values off the stack and load these back into registers, and then return. By controlling the stack pointer value, these can be used as a way to perform arbitrary reads. However, as we are targeting non-writable memory, an interrupt needs to happen before the return occurs, otherwise CPU execution becomes very unpredictable.

You can find these epilogues by staring at many, many execution traces (obtained from these timer interrupts) and thinking really hard (this is the hard, time-consuming and labor-intensive part).

Alternatively, by timing an interrupt or a DMA transfer such that it happens after a function is called but before it returns, it is possible to overwrite the memory popped off the stack when an epilogue executes, thereby gaining control of a few register values as well as the program counter. Then, CPU execution can be redirected to another code snippet that performs the memory read before returning. With control over the address it reads from, this can be used as an arbitrary read to read one word of the BSL, then return to user code to do the next iteration.
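The read primitive can be sketched as a toy simulation in Python. Everything here is an assumption made for illustration: `ToyCpu`, `GADGET_ADDR` and the `pop r10; ret` gadget are hypothetical, and how pc actually ends up at the gadget (stack overwrite from an interrupt or DMA) is abstracted away. The model only captures the two rules discussed above: BSL memory is readable as data only while pc is inside the BSL, and the interrupt has to fire after the pop but before the return.

```python
BSL_START, BSL_END = 0x1000, 0x1800  # region 0x1000 to 0x17FF
BLOCKED_READ = 0x3FFF                # value user code sees when reading the BSL
GADGET_ADDR = 0x1234                 # hypothetical "pop r10; ret" epilogue
USER_CODE = 0x4000                   # hypothetical user code address


class ToyCpu:
    def __init__(self, bsl):
        assert len(bsl) == BSL_END - BSL_START
        self.bsl = bsl
        self.pc = USER_CODE
        self.sp = 0x2000
        self.r10 = 0

    def read_word(self, addr):
        """Data read that honours the toy execute-only rule."""
        if not BSL_START <= addr < BSL_END:
            return 0  # other memory is not modelled
        if not BSL_START <= self.pc < BSL_END:
            return BLOCKED_READ
        off = addr - BSL_START
        return int.from_bytes(self.bsl[off:off + 2], "little")

    def run_gadget_until_interrupt(self, target):
        """Run only the `pop r10` half of the gadget, then take the interrupt."""
        self.sp = target            # stack pointer aimed at the word to leak
        self.pc = GADGET_ADDR       # now executing inside the BSL
        self.r10 = self.read_word(self.sp)
        self.sp += 2
        # The timer interrupt fires here, before `ret` would pop a bogus
        # return address; the handler saves r10 and resumes user code.
        leaked = self.r10
        self.pc = USER_CODE
        return leaked


def dump_bsl(cpu):
    """Leak the whole region one word at a time."""
    words = [cpu.run_gadget_until_interrupt(addr)
             for addr in range(BSL_START, BSL_END, 2)]
    return b"".join(word.to_bytes(2, "little") for word in words)
```

In this model a direct `read_word(0x1000)` from user code returns 0x3FFF, while `dump_bsl` recovers the full contents, one word per iteration.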

The "using interrupts to figure out what execute-only code is doing" trick was first (afaik) used by Martin Korth to find such a gadget inside the Nintendo DS ARM7 boot ROM to read it out (and dump some keys), see here and here, but is also described in the academic literature, e.g. here.

The "use DMA to get ROP" trick comes from here (described near the end; the article is quite large).

What has been implemented correctly

  1. Memory in the BSL region cannot be read using data accesses from user code. Reads come back as 3f ff, which decodes as an infinite loop.
  2. Arbitrary code in the BSL region cannot be jumped to from user code; the CPU execution path has to go through the Z-area. Attempting to jump anywhere else causes an infinite loop or a reset.
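To see why 3f ff reads as an infinite loop, here is a small Python decoder for the MSP430 jump instruction format (top three bits 001, a 3-bit condition, and a signed 10-bit word offset relative to the address after the instruction). The function name `decode_jump` is just illustrative.

```python
JUMP_MNEMONICS = ["jne", "jeq", "jnc", "jc", "jn", "jge", "jl", "jmp"]


def decode_jump(word, addr):
    """Decode a 16-bit MSP430 jump instruction located at `addr`.

    Returns (mnemonic, target address).
    """
    if (word >> 13) != 0b001:
        raise ValueError(f"{word:#06x} is not a jump-format instruction")
    condition = (word >> 10) & 0x7
    offset = word & 0x3FF
    if offset & 0x200:          # sign-extend the 10-bit offset
        offset -= 0x400
    target = (addr + 2 + 2 * offset) & 0xFFFF
    return JUMP_MNEMONICS[condition], target


if __name__ == "__main__":
    mnemonic, target = decode_jump(0x3FFF, 0x1000)
    print(f"{mnemonic} {target:#06x}")
```

For 0x3FFF the condition field is 111 (unconditional `jmp`) and the offset is -1, so the target equals the instruction's own address: a jump to itself.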

Vulnerabilities of the BSL against a readout attack

  1. When the CPU is executing the BSL, it can perform data accesses to other BSL areas. Thus, if an arbitrary read gadget is found, it can be used to dump the entire BSL region. This is the same issue as present in the Nintendo DS ARM7 boot ROM.
  2. The routine at 0x1002 provides such a gadget, as indicated in SLAU550AA.
  3. The BSL execution is allowed to be interrupted, thus the instruction flow can be traced by dumping CPU register values throughout the BSL execution. This allows for finding arbitrary read gadgets.
  4. Interrupts can also be used to change any register value while the BSL is executing, even at a specific point in time. This can be used to skip over certain instructions during analysis, for example.

Vulnerabilities of the BSL against use as a source of ROP gadgets

  1. The routine at 0x1002 returns quickly, as indicated in SLAU550AA. Therefore, it can be used as an easy ROP entrypoint. This bypasses the "only call code from the Z-area" limitation.
  2. Interrupts can be used to change return addresses etc., to jump to arbitrary locations inside the BSL.
  3. Potentially, DMA transfers can also be used to change the stack contents, including return addresses, while the BSL is executing.

Inaccuracies of the datasheets

  1. The BSL clears all RAM from 0x1C00 to 0x3FC7, not just 0x1C00 to 0x1FFF.
  2. The BSL also clears Tiny RAM and some "reserved" low addresses, from 6 to 0x1F.
  3. The BSL sets up Timer A, while the datasheet only mentions Timer B usage in other BSLs, and nothing about this one. This is wrong: it changes the clock settings, which influences which clock source a timer uses.

What has not been checked

  1. Pipelining: can code running at 0x0FFE (or a similar address) access the BSL memory, (mis)using the possibility that the effective value of pc might differ from the executed address due to pipelining effects? (cf. MerryMage's GBA BIOS dump)
  2. DMA: can a DMA transfer be used to change the stack contents during BSL execution? (Most likely yes, just as interrupts can; I simply haven't checked.)

Hashes

These are the hashes of the memory region 0x1000 to 0x17FF, on an MSP430FR5994, with BSL 00.08.35.B3:

| Hash function | Value |
| --- | --- |
| MD5 | 4bb3bb753face80fffe1fef7a762884a |
| SHA-1 | 1b4c13e006121a9b1c1ebcd4fbc6ec7c96cc017f |
| SHA-256 | e4d0d171013f847a357eebe5467bcd413ecb41dc01424b7e4ee636538d820766 |
| SHA-512 | fed28a7e9643a551789075b79d9b04fa6e8cdca74d783c1c3830ece07e5c9141dda9532b3c442416a1ddab90d752e679c6918c0d5333ac6da9fd23ab6c33d1bb |
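To compare your own dump against these values, a minimal Python sketch using the standard `hashlib` module could look like the following. The helper names `digests` and `matches_readme` are made up for this example, and it assumes the dump is a raw 0x800-byte binary of 0x1000 to 0x17FF.

```python
import hashlib

# Hashes listed above for BSL 00.08.35.B3 on an MSP430FR5994.
README_HASHES = {
    "md5": "4bb3bb753face80fffe1fef7a762884a",
    "sha1": "1b4c13e006121a9b1c1ebcd4fbc6ec7c96cc017f",
    "sha256": "e4d0d171013f847a357eebe5467bcd413ecb41dc01424b7e4ee636538d820766",
    "sha512": "fed28a7e9643a551789075b79d9b04fa6e8cdca74d783c1c3830ece07e5c9141"
              "dda9532b3c442416a1ddab90d752e679c6918c0d5333ac6da9fd23ab6c33d1bb",
}


def digests(data):
    """Compute the same four digests over a raw dump."""
    return {name: hashlib.new(name, data).hexdigest() for name in README_HASHES}


def matches_readme(data):
    """True if the dump has the expected size and all four digests match."""
    return len(data) == 0x800 and digests(data) == README_HASHES
```

Usage would be along the lines of `matches_readme(open("bsl.bin", "rb").read())`, where `bsl.bin` is a hypothetical file name for the captured dump. A mismatch may simply mean a different BSL version or chip.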

Proof of concept

The code in src/main.c will dump the content of the BSL to eUSCI_A0 in UART mode. Tested on an MSP430FR5994, but no other chips.

By setting the DUMP_MODE preprocessor definition to 0, it can instead be used as an instruction tracer, accompanied by logtracer.py.