Lazy FPU store (without disabling FPU) #23

nielsdos · 2020-08-24T22:34:04Z

Some quick benchmarks show that an FPU store + restore takes about 32 ns on my system (4th-gen i7, 3.6 GHz).
A single FPU store takes about 16 ns.
Ping-pong IPC requires two context switches, which means about 64 ns of FPU state overhead per round trip.

Doing a lazy FPU store + restore the classic way requires disabling the FPU and relying on an ISR to re-enable it and restore the state.
However, as I measured, this approach has more overhead (> 32 ns) than it saves.
Besides the overhead issue, there's also the lazy FPU restore vulnerability. So if a lazy mechanism is implemented, it can only be applied to storing the FPU state: the current task must always have its own FPU state loaded.

An alternative way to implement lazy FPU store is to rely on the compiler.
If we could detect the use of FPU instructions in a basic block, we could insert an instruction in that basic block which sets a flag, something like this:

movb $1, %fs:0 // With fs the TLS register, assuming offset 0 is the flag offset

This would mean that the hardware overhead for lazy FPU store is replaced by a single move instruction, which should be pretty cheap.
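
To make the idea concrete, here is a rough user-space sketch in C of how the context switch could use such a flag. Everything in it is an assumption made for illustration: the `struct task` layout, the names `fpu_dirty`, `task_uses_fpu` and `context_switch`, and the `fake_fpu_regs` buffer that stands in for the real FPU registers (a kernel would use something like FXSAVE/FXRSTOR there instead of `memcpy`). It is not taken from any existing code.

```c
#include <assert.h>
#include <string.h>

#define FPU_STATE_SIZE 512 /* stand-in size, matching an FXSAVE area */

/* Hypothetical per-task data; in the proposal the flag would sit at %fs:0. */
struct task {
    unsigned char fpu_dirty;                 /* set by the compiler-inserted movb */
    unsigned char fpu_state[FPU_STATE_SIZE]; /* last saved FPU image */
};

/* Stand-in for the hardware FPU registers so the sketch runs in user space. */
static unsigned char fake_fpu_regs[FPU_STATE_SIZE];

/* Counts how many stores were actually performed. */
static unsigned fpu_saves;

/* Models an instrumented basic block: set the flag, then touch the FPU. */
static void task_uses_fpu(struct task *t, unsigned char value)
{
    t->fpu_dirty = 1;
    memset(fake_fpu_regs, value, sizeof fake_fpu_regs);
}

/* Lazy store, eager restore. */
static void context_switch(struct task *prev, struct task *next)
{
    assert(prev && next);

    /* Store only if prev touched the FPU since its state was last saved. */
    if (prev->fpu_dirty) {
        memcpy(prev->fpu_state, fake_fpu_regs, sizeof fake_fpu_regs);
        prev->fpu_dirty = 0;
        fpu_saves++;
    }

    /* Always load the state of next, so the running task never sees
       another task's registers. */
    memcpy(fake_fpu_regs, next->fpu_state, sizeof fake_fpu_regs);
}
```

In this sketch a task that never sets the flag between two switches costs no store at all, while the restore stays unconditional, which is what keeps the current task's own FPU state loaded at all times.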

nielsdos added the enhancement New feature or request label Aug 24, 2020
