Re: Resumable NMI proposal


Jonathan Behrens <behrensj@...>
 

I'd expect that S-mode software will also want to be able to receive RNMIs, so it is probably worth thinking through how that would work at the same time.

Jonathan


On Tue, Jan 19, 2021 at 11:42 AM Brian Grayson via lists.riscv.org <brian.grayson=sifive.com@...> wrote:
I'll jump in with a few more. :)

From an architectural point of view, I don't like the fact that an NMI blocks further NMIs until its handler exits. There may be NMIs that can save the state they need and then unblock further NMIs, i.e., they might be stackable in some cases. This spec precludes that by making rnmie not software-writable.

I also fundamentally dislike the use of the term "non-maskable", because the proposed rnmie bit literally masks non-maskable interrupts. That's an impossible thing to put into a spec, as remarked upon in your NOTE.

From a prior art point of view, the PowerPC Book E architecture (the embedded flavor) handled this differently. It called NMIs "critical interrupts", and all critical interrupts are higher priority than non-critical. They can be masked, and further critical interrupts are automatically masked when a critical interrupt is taken, but software in the critical interrupt handler can save the state it needs, and then re-enable further critical interrupts if that is desired. I believe this is ultimately the desired behavior for us as well, and is IMO better architectural terminology and a better mental framework.

Within non-critical interrupts, some debug events were considered highest priority and some lowest priority, depending on the type of debug event. It can be difficult (or even impossible) to assert that all debug events are uniformly higher or uniformly lower in priority than a given exception event.

See Chapter 7 "Interrupts and Exceptions" of https://www.nxp.com/docs/en/user-guide/BOOK_EUM.pdf for more details, and in particular 7.9.1 that talks about the relative priority of all the different types of interrupts in both classes, and where various debug interrupts fell.

Back to the proposal, I think there's a mistake near the end:

If the hart encounters an exception while the `rnmie` bit is clear, the
exception state is written to `mepc` and `mcause`, `mstatus.mpp` is
set to M-mode, and the hart jumps to the RNMI exception handler
address.  

If the enable bit is clear, the exception can't be taken, right? Is this supposed to discuss the case of an ordinary exception being observed while in RNMI, where the exception will actually be instantly taken once we leave RNMI state, in which case we would jump to the ordinary exception handler? Or am I misunderstanding?

Brian

On Mon, Jan 18, 2021 at 9:09 PM Greg Favor <gfavor@...> wrote:
Even though this is hot off the press, I'll jump in with a few small comments:

- In mnstatus, shouldn't there also be a bit like the mstatus.MPV bit (for when the H extension is implemented and enabled)?

- The width of the mnstatus CSR is not explicitly defined (e.g. as an XLEN-bit or 32-bit read-write register).

- What is the relative priority for RNMI versus Debug Halt Request?  Maybe it is the responsibility ultimately for the Debug spec to specify this (?), but what should be said over there?

- Can you briefly comment on the types of systems that show the most need for recoverable NMIs (versus being not ideal but ok with the current NMI situation)?  Semi-equivalently, would you expect this extension to eventually become a requirement (or optional) in RVA22 and/or RVM22?

Greg

On Mon, Jan 18, 2021 at 6:39 PM Krste Asanovic <krste@...> wrote:

Current RISC-V specs only have a non-resumable NMI definition.  The
following proposal would add resumable NMI support.  This was one of
the features requested for priv 1.12 or RVA/RVM22.

This is up for discussion, but I think it is small enough to go
through the fast-track process.

Krste

:sectnums:
:toc: left

= Resumable NMI support in RISC-V
Version 0.2.1-Draft

== Background and Motivation

The RISC-V privileged architecture version 1.11 supports only
unresumable non-maskable interrupts (UNMIs), where the NMI jumps to a
handler in machine mode, overwriting the current `mepc` and `mcause`
register values.  If the hart had been executing machine-mode code in
a trap handler, the previous values in `mepc` and `mcause` would not
be recoverable and so execution is not generally resumable.

This proposal adds support for resumable non-maskable interrupts
(RNMIs) to RISC-V.  The extension adds four new CSRs (`mnepc`,
`mncause`, `mnstatus`, and `mnscratch`) to hold the interrupted state,
and a new instruction to resume from the RNMI handler.

== RNMI Interrupt Signals

The `rnmi` interrupt signals are inputs to the hart.  These interrupts
have higher priority than any other interrupt or exception on the hart
and cannot be disabled by software.  Specifically, they are not
disabled by clearing the `mstatus.mie` bit.

== RNMI Handler Addresses

The RNMI interrupt trap handler address is implementation-defined.

RNMI also has an associated exception trap handler address, which is
also implementation-defined.

== New RNMI CSRs

This proposal adds additional M-mode CSRs to enable a resumable
non-maskable interrupt (RNMI).

.NMI additional CSRs
[cols="2,2,2,2"]
[%autowidth]
|===
| Number | Privilege | Name        | Description

| 0x350  | MRW       | `mnscratch` | Resumable Non-maskable scratch register
| 0x351  | MRW       | `mnepc`     | Resumable Non-maskable EPC value
| 0x352  | MRW       | `mncause`   | Resumable Non-maskable cause value
| 0x353  | MRW       | `mnstatus`  | Resumable Non-maskable status
|===

The `mnscratch` CSR is an XLEN-bit read-write register which enables
the NMI trap handler to save and restore the context that was
interrupted.

The `mnepc` CSR is an XLEN-bit read-write register which on entry
to the NMI trap handler holds the PC of the instruction that took the
interrupt. The lowest bit of `mnepc` is hardwired to zero.

The `mncause` CSR holds the reason for the NMI.  Bit XLEN-1 is set to
1, and the least-significant bits hold the NMI cause, or zero if NMI
causes are not supported.
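
The `mncause` layout described above can be illustrated with a small sketch. This is not part of the proposal: the helper names, the default `XLEN` of 64, and the example cause numbers are assumptions chosen for illustration only.

```python
# Illustrative sketch only; helper names and the default XLEN are assumptions.
XLEN = 64


def make_mncause(nmi_cause=0, xlen=XLEN):
    """Build an mncause value: bit XLEN-1 set, cause in the low bits."""
    top_bit = 1 << (xlen - 1)
    if not 0 <= nmi_cause < top_bit:
        raise ValueError("cause does not fit below bit XLEN-1")
    return top_bit | nmi_cause


def split_mncause(value, xlen=XLEN):
    """Return (top bit set?, cause field) for an mncause value."""
    top_bit = 1 << (xlen - 1)
    return bool(value & top_bit), value & (top_bit - 1)
```

With these helpers, an implementation that does not support NMI causes would always report `make_mncause(0)`, i.e. only bit XLEN-1 set.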

The `mnstatus` CSR contains a two-bit field in bits `mnstatus[12:11]`
which on entry to the trap handler holds the privilege mode of the
interrupted context, encoded in the same manner as `mstatus.mpp`.  The
other bits in `mnstatus` are _reserved_; software should write zeros
to them and hardware implementations should return zeros.
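
As a rough illustration of the `mnstatus[12:11]` field, the sketch below packs and unpacks the interrupted privilege mode. The constant and function names are hypothetical; the privilege encodings (U=0, S=1, M=3) are the ones `mstatus.mpp` uses in the privileged spec.

```python
# Illustrative sketch only; names are assumptions, not part of the proposal.
MNSTATUS_MPP_SHIFT = 11                      # field occupies bits [12:11]
MNSTATUS_MPP_MASK = 0b11 << MNSTATUS_MPP_SHIFT

# Privilege encodings as used by mstatus.mpp.
PRIV_NAMES = {0b00: "U", 0b01: "S", 0b11: "M"}


def mnstatus_for(priv):
    """Value written to mnstatus on RNMI entry; reserved bits stay zero."""
    if priv not in PRIV_NAMES:
        raise ValueError("unknown privilege encoding")
    return priv << MNSTATUS_MPP_SHIFT


def interrupted_priv(mnstatus):
    """Extract the interrupted privilege mode from an mnstatus value."""
    return (mnstatus & MNSTATUS_MPP_MASK) >> MNSTATUS_MPP_SHIFT
```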

== New MNRET instruction

This new M-mode-only instruction uses the values in `mnepc` and
`mnstatus` to return to the program counter and privilege mode of the
interrupted context, respectively.  This instruction also sets the
`rnmie` state bit.

The MNRET instruction encoding is the same as MRET except with bit 30
set (i.e., `funct7`=`0111000`).
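
The encoding rule above can be checked with a few lines of arithmetic. This sketch starts from the standard 32-bit MRET encoding (`0x30200073`) and sets bit 30; the helper name `funct7` is just a label for bits 31:25 of the instruction word.

```python
# Illustrative sketch: derive the proposed MNRET encoding from MRET.
MRET = 0x30200073            # standard MRET encoding (funct7 = 0011000)
MNRET = MRET | (1 << 30)     # same encoding with bit 30 set


def funct7(insn):
    """Return bits 31:25 of a 32-bit instruction word."""
    return (insn >> 25) & 0x7F
```

Under this rule `MNRET` differs from `MRET` only in `funct7`, so a decoder can share the rest of the SYSTEM-opcode decode path.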

== RNMI Operation

When an RNMI interrupt is detected, the interrupted PC is written to
the `mnepc` CSR, the type of RNMI to the `mncause` CSR, and the
privilege mode of the interrupted context to the `mnstatus` CSR.  An
internal microarchitectural state bit `rnmie` is cleared to indicate
that the processor is in an RNMI handler and cannot take a new RNMI
interrupt.  When clear, the internal `rnmie` bit also disables all
other interrupts.

NOTE: These interrupts are called non-maskable because software cannot
mask the interrupts, but for correct operation other instances of the
same interrupt must be held off until the handler is completed, hence
the internal state bit.

The core then enters machine-mode and jumps to the RNMI trap handler
address.

The RNMI handler can resume original execution using the new MNRET
instruction, which restores the PC from `mnepc`, the privilege mode
from `mnstatus`, and also sets the internal `rnmie` state bit, which
reenables other interrupts.

If the hart encounters an exception while the `rnmie` bit is clear, the
exception state is written to `mepc` and `mcause`, `mstatus.mpp` is
set to M-mode, and the hart jumps to the RNMI exception handler
address.

NOTE: Traps in the RNMI handler can only be resumed if they occur while
the handler is servicing an interrupt that occurred outside of
machine-mode.
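
The entry, exception, and return sequence in this section can be summarized as a small behavioral model. This is only a sketch under stated assumptions: the `Hart` class, its field names, and the two handler addresses (`0x1000`, `0x2000`) are hypothetical stand-ins for implementation-defined values, and an RNMI arriving while `rnmie` is clear is simply reported as not taken, since the draft does not say how it is held pending.

```python
# Behavioral sketch of the RNMI operation described above (not normative).
from dataclasses import dataclass

M_MODE = 0b11
MPP_SHIFT = 11  # mnstatus[12:11]


@dataclass
class Hart:
    pc: int = 0
    priv: int = 0
    xlen: int = 64
    rnmie: bool = True                    # internal state bit, not a CSR
    mnepc: int = 0
    mncause: int = 0
    mnstatus: int = 0
    mepc: int = 0
    mcause: int = 0
    mstatus_mpp: int = 0
    rnmi_trap_vector: int = 0x1000        # hypothetical address
    rnmi_exception_vector: int = 0x2000   # hypothetical address

    def take_rnmi(self, cause=0):
        """Enter the RNMI handler; return False if rnmie is clear."""
        if not self.rnmie:
            return False                  # assumption: RNMI is held off
        self.mnepc = self.pc & ~1         # lowest bit hardwired to zero
        self.mncause = (1 << (self.xlen - 1)) | cause
        self.mnstatus = self.priv << MPP_SHIFT
        self.rnmie = False                # also disables other interrupts
        self.priv = M_MODE
        self.pc = self.rnmi_trap_vector
        return True

    def take_exception(self, cause):
        """Exception while rnmie is clear: uses mepc/mcause/mstatus.mpp."""
        if self.rnmie:
            raise NotImplementedError("ordinary trap path is not modelled")
        self.mepc = self.pc
        self.mcause = cause
        self.mstatus_mpp = M_MODE
        self.pc = self.rnmi_exception_vector

    def mnret(self):
        """Resume the interrupted context and re-enable interrupts."""
        self.pc = self.mnepc
        self.priv = (self.mnstatus >> MPP_SHIFT) & 0b11
        self.rnmie = True
```

For example, a hart interrupted in S-mode at some PC would see `take_rnmi()` return `True`, run the handler in M-mode with `rnmie` clear, and return to the same PC in S-mode after `mnret()`. Note that the model overwrites `mepc` and `mcause` on an exception inside the handler, which is why such traps are only resumable when the original interrupt arrived outside machine-mode.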

== Interaction with debugger

The debugger can be configured such that an RNMI event drops the
system into the debugger.




