Re: SBI: We can fast handle some SBI functions for extreme performance in assembly code implementation if SBI extension‘s FID equals to zero

Anup Patel

On Wed, Dec 22, 2021 at 9:55 AM 洛佳 Luo Jia <me@...> wrote:

[Edited Message Follows]
[Reason: Typo fix]

RISC-V SBI provides platform-agnostic functions for kernels. The normal handling procedure in an SBI implementation includes a context switch and a call into higher-level language code, e.g. Rust or C. However, when an SBI function has FID=0 (whatever the extension's EID is), we can provide that function without a context switch. The assembly code would be written as follows:

unsafe fn trap_begin() {
"bnez a6, 1f", // set_timer EID(a7): 0x54494D45, FID(a6): 0
"li a6, 0x54494D45",
"bne a6, a7, 1f",
"csrr a6, mcause",
"li a7, 9",
"bne a6, a7, 1f", // if mcause != supervisor ecall, jump to conventional way of handling
"li a6, 0x200bff8", // CLINT mtimecmp address (if the device tree matches this address, use trap_begin as mtvec; otherwise don't use it, which only costs performance but stays correct)
This is a platform-specific address, and it will break on platforms
that have the MTIME register at some other address. Such optimizations
increasingly make an SBI implementation platform-specific.

"sd a0, 0(a6)", // a0: stime_value
This is another place where this code is broken, because it always
programs the MTIMECMP register at offset 0x0.

On SMP systems, the mapping of MTIMECMP registers to HART could be
totally arbitrary. In fact, this mapping will not be related to
mhartid for systems with sparse hartids. The only source of truth for
MTIMECMP to HART mapping is the ACLINT/CLINT DT node.
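
To illustrate the mapping point, here is a minimal Rust sketch, assuming the hart-to-register mapping has already been parsed from the ACLINT/CLINT DT node at boot. The `MtimerMap` type, its fields, and `mtimecmp_addr` are hypothetical names invented for this example, and the base address in the usage note is only an assumed value.

```rust
/// Hypothetical description of one MTIMER device, filled in at boot from the
/// ACLINT/CLINT device tree node (the parsing itself is not shown here).
struct MtimerMap<'a> {
    /// Address of the first MTIMECMP register of this device.
    mtimecmp_base: usize,
    /// (mhartid, index into the MTIMECMP array) pairs; hartids may be sparse.
    hart_to_index: &'a [(usize, usize)],
}

impl<'a> MtimerMap<'a> {
    /// Address of the 64-bit MTIMECMP register for `hartid`, or `None` if
    /// this device does not serve that hart.
    fn mtimecmp_addr(&self, hartid: usize) -> Option<usize> {
        self.hart_to_index
            .iter()
            .find(|&&(h, _)| h == hartid)
            .map(|&(_, idx)| self.mtimecmp_base + 8 * idx)
    }
}
```

With an assumed base of 0x0200_4000 and sparse hartids 0, 4 and 9 mapped to indices 0, 1 and 2, hart 4 would use the register at base + 8 rather than base + 32, which a fixed address or an mhartid-based offset in the fast path would get wrong.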

"mret", // return to supervisor without context restore
"1:", // target of the `1f` branches above: conventional trap handling
"csrrw sp, mscratch, sp",
"sd ra, 0(sp)",
// ...context save...
"call {rust_trap}",
"ld ra, 0(sp)",
// ...context restore...
"csrrw sp, mscratch, sp",

The core idea of this assembly code is that the entry condition for a given SBI function (in this example, set_timer) can be stated as: mcause == 9 && EID == 0x54494D45 && FID == 0. Comparing a register against a non-zero constant requires another register to hold the constant, but comparing against zero does not, because RISC-V provides the `zero` register to compare with, so `bnez` runs without any auxiliary register. Because `&&` is commutative, the comparisons can be made in any order: usually we would compare `mcause` first, but comparing FID first gives the same result. Once FID == 0 has been checked, the `a6` register that formerly held the FID value is free to use as a temporary, so we can go on to compare EID and then mcause.
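
The entry condition described above can also be written as a plain Rust predicate. This is only an illustrative sketch: the constant names and `is_fast_set_timer` are made up for this example, and a real fast path would stay in assembly as in the code above.

```rust
/// EID of the SBI Timer extension ("TIME" in ASCII), as used in the code above.
const EID_TIME: u64 = 0x5449_4D45;
/// FID of set_timer within the Timer extension.
const FID_SET_TIMER: u64 = 0;
/// mcause value for an environment call from S-mode.
const MCAUSE_ECALL_FROM_S: u64 = 9;

/// Hypothetical helper: true when the trap is an S-mode `set_timer` call.
/// The checks run in the same order as the assembly (FID, then EID, then
/// mcause); any other order yields the same result.
fn is_fast_set_timer(mcause: u64, eid: u64, fid: u64) -> bool {
    fid == FID_SET_TIMER && eid == EID_TIME && mcause == MCAUSE_ECALL_FROM_S
}
```

Only the FID check compares against zero, which is why it is the one that can run before any register has been freed up.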

In this way we can accelerate such SBI calls, as only a few assembly instructions are run and no context switch or call into a higher-level language is required. But this 'irregular' approach only works if one of the comparisons is Value == 0; in SBI that would be FID == 0 (EID == 0 means a legacy module).
Well, an SBI implementation can become more and more platform-specific
and do more stuff in assembly, but the platform vendor will end up
maintaining their platform-specific SBI implementation.

Instead of creating such a platform-specific SBI implementation to
optimize SBI timer calls, platforms can go for the Priv Sstc
extension, with which SBI Timer calls can be avoided. The same thing can
be done by platforms to avoid SBI IPI calls using ACLINT SSWI or AIA
IMSIC devices.

In future SBI extensions and vendor-defined extensions, it might be better if we suggest that any function that requires extreme performance or is called frequently has an FID equal to zero. The current SBI 1.0-rc extensions already define most performance-critical functions (FENCE.I ipi, set_timer, etc.) with FID == 0. If possible, we could add a rule or formal advice to the SBI standard that performance-critical functions in extensions should preferably be defined with FID == 0, or that if an SBI module includes only one function, that function's FID should preferably be 0, to help implementations improve SBI call performance.
If an SBI extension is in the hot path for OSes, then its functionality
will eventually be replaced by some ISA or non-ISA specification. For
example, the SBI IPI and Timer extension functionality is already
replaced by the AIA, ACLINT, and Priv Sstc specifications.

We only define an SBI extension for a functionality when there is no
other way left and corresponding ISA or non-ISA specifications are not
desired/possible in the near future (1-2 years). Further, defining a new
SBI extension also requires a detailed proof-of-concept implementation
(such as QEMU, Linux, OpenSBI, etc.), which allows us to further refine
the SBI extension based on real-world experience.

