Re: Resumable NMI proposal

Krste Asanovic

On Tue, 19 Jan 2021 10:36:26 -0600, Brian Grayson <brian.grayson@...> said:
| I'll jump in with a few more. :)
| From an architectural point of view, I don't like the fact that an NMI blocks further NMI until it leaves its handler. There may be NMIs that can save
| the state they need, and then unblock further NMIs, i.e., they might be stackable in some cases. This spec precludes that by making rnmie not
| software-writable.

| I also fundamentally dislike the use of the term "non-maskable", because the proposed rnmie bit literally masks non-maskable interrupts. That's an
| impossible thing to put into a spec, as remarked upon in your NOTE.

rnmie should be settable but not clearable in M-mode to support
nesting NMIs.

(This was an editing error on my part: I simplified an earlier spec
but didn't carry over this mod.)

This still makes them non-maskable by software; NMI is the
industry-standard term.
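
To make the intended rnmie semantics concrete, here is a minimal
behavioral sketch in Python. It is illustrative only: the class and
method names are made up for this example, and the initial value of 1
is an assumption, since the proposal does not state a reset value.

```python
class Rnmie:
    """Toy model of the rnmie bit: settable, but not clearable, by M-mode software."""

    def __init__(self, value=1):
        # Assumption: RNMIs start out enabled; the proposal gives no reset value.
        self.value = value & 1

    def hw_take_rnmi(self):
        # Only hardware clears the bit, on entry to the RNMI handler.
        self.value = 0

    def sw_write(self, new_value):
        # Software writes are OR-ed in: writing 1 sets the bit, writing 0 is ignored.
        self.value |= new_value & 1


bit = Rnmie()
bit.hw_take_rnmi()   # RNMI taken, further RNMIs held off
bit.sw_write(0)      # no effect: software cannot clear (mask) the bit
bit.sw_write(1)      # handler has saved its state and re-enables nesting
```

Because a software write can only move the bit from 0 to 1, a handler
can opt in to nesting once its state is saved, but no software can use
the bit to mask NMIs.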

| From a prior art point of view, the PowerPC Book E architecture (the embedded flavor) handled this differently. It called NMIs "critical interrupts",
| and all critical interrupts are higher priority than non-critical. They can be masked, and further critical interrupts are automatically masked when a
| critical interrupt is taken, but software in the critical interrupt handler can save the state it needs, and then re-enable further critical interrupts
| if that is desired. I believe this is ultimately the desired behavior for us as well, and is IMO better architectural terminology and a better mental
| framework.

The fast interrupts proposal has pre-emptible levels intended for embedded use.

I think these are qualitatively different from NMIs, which are needed
in application processors too.

| Within non-critical interrupts, some debug events were considered highest priority, some were considered lowest priority, depending on the type of debug
| event. It can be difficult (or even impossible) to assert that all debug events are all higher or all lower than a given exception event.

| See Chapter 7 "Interrupts and Exceptions" of for more details, and in particular 7.9.1 that talks
| about the relative priority of all the different types of interrupts in both classes, and where various debug interrupts fell.

| Back to the proposal, I think there's a mistake near the end:

| If the hart encounters an exception while the `rnmie` bit is clear, the
| exception state is written to `mepc` and `mcause`, `mstatus.mpp` is
| set to M-mode, and the hart jumps to the RNMI exception handler
| address.  

| If the enable bit is clear, the exception can't be taken, right? Is this supposed to discuss the case of an ordinary exception being observed while in
| RNMI, where the exception will actually be instantly taken once we leave RNMI state, in which case we would jump to the ordinary exception handler? Or
| am I misunderstanding?

Exceptions are not interrupts. NMI exceptions go to a different vector
address than regular exceptions.
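
A rough Python model of the flow described in the proposal may help
here. It is a sketch under stated assumptions, not a definitive
implementation: the three vector addresses are hypothetical (the
proposal leaves the RNMI ones implementation-defined), `mnpp` stands
in for `mnstatus[12:11]`, and trap delegation and vectored `mtvec`
are ignored.

```python
from dataclasses import dataclass

M_MODE = 3  # mstatus.mpp encoding for machine mode

# Hypothetical addresses, chosen only for illustration.
RNMI_INTERRUPT_VECTOR = 0x0000_1000
RNMI_EXCEPTION_VECTOR = 0x0000_1100
MTVEC_BASE = 0x8000_0000  # assumes direct-mode mtvec, no delegation


@dataclass
class Hart:
    pc: int = 0
    priv: int = M_MODE
    rnmie: int = 1
    mepc: int = 0
    mcause: int = 0
    mpp: int = 0
    mnepc: int = 0
    mncause: int = 0
    mnpp: int = 0  # models mnstatus[12:11]

    def take_rnmi(self, cause: int, xlen: int = 64) -> None:
        # Interrupted state goes to the mn* CSRs, so mepc/mcause survive.
        self.mnepc = self.pc & ~1
        self.mncause = (1 << (xlen - 1)) | cause
        self.mnpp = self.priv
        self.rnmie = 0
        self.priv = M_MODE
        self.pc = RNMI_INTERRUPT_VECTOR

    def take_exception(self, cause: int) -> None:
        # Exceptions are always taken; rnmie only selects the vector.
        self.mepc = self.pc
        self.mcause = cause
        self.mpp = self.priv if self.rnmie else M_MODE
        self.priv = M_MODE
        self.pc = MTVEC_BASE if self.rnmie else RNMI_EXCEPTION_VECTOR

    def mnret(self) -> None:
        # Resume the interrupted context and re-enable RNMIs.
        self.pc = self.mnepc
        self.priv = self.mnpp
        self.rnmie = 1
```

In this model, an exception raised with rnmie clear still updates
`mepc` and `mcause`, which is why a trap inside the handler is only
resumable when the RNMI itself interrupted code outside machine mode.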


| Brian

| On Mon, Jan 18, 2021 at 9:09 PM Greg Favor <gfavor@...> wrote:

| Even though this is hot off the press, I'll jump in with a few small comments:

| - In mnstatus, shouldn't there also be a bit like the mstatus.MPV bit (for when the H extension is implemented and enabled)?

| - The width of the mnstatus CSR is not explicitly defined (e.g. as an XLEN-bit or 32-bit read-write register).

| - What is the relative priority for RNMI versus Debug Halt Request?  Maybe it is the responsibility ultimately for the Debug spec to specify this
| (?), but what should be said over there?

| - Can you briefly comment on the types of systems that show the most need for recoverable NMIs (versus being not ideal but ok with the current NMI
| situation).  Semi-equivalently, would you expect this extension to eventually become a requirement (or optional) in RVA22 and/or RVM22?

| Greg

| On Mon, Jan 18, 2021 at 6:39 PM Krste Asanovic <krste@...> wrote:

| Current RISC-V specs only have a non-resumable NMI definition.  The
| following proposal would add resumable NMI support.  This was one of
| the features requested for priv 1.12 or RVA/RVM22.

| This is up for discussion, but I think it is small enough to go
| through fast track process.

| Krste

| :sectnums:
| :toc: left

| = Resumable NMI support in RISC-V
| Version 0.2.1-Draft

| == Background and Motivation

| The RISC-V privileged architecture version 1.11 supports only
| unresumable non-maskable interrupts (UNMIs), where the NMI jumps to a
| handler in machine mode, overwriting the current `mepc` and `mcause`
| register values.  If the hart had been executing machine-mode code in
| a trap handler, the previous values in `mepc` and `mcause` would not
| be recoverable and so execution is not generally resumable.

| This proposal adds support for resumable non-maskable interrupts
| (RNMIs) to RISC-V.  The extension adds four new CSRs (`mnepc`,
| `mncause`, `mnstatus`, and `mnscratch`) to hold the interrupted state,
| and a new instruction to resume from the RNMI handler.

| == RNMI Interrupt Signals

| The `rnmi` interrupt signals are inputs to
| the hart.  These interrupts have higher priority than any other
| interrupt or exception on the hart and cannot be disabled by software.
| Specifically, they are not disabled by clearing the `mstatus.mie`
| register.

| == RNMI Handler Addresses

| The RNMI interrupt trap handler address is implementation-defined.

| RNMI also has an associated exception trap handler address, which is
| implementation defined.

| == New RNMI CSRs

| This proposal adds additional M-mode CSRs to enable a resumable
| non-maskable interrupt (RNMI).

| .NMI additional CSRs
| [cols="2,2,2,2"]
| [%autowidth]
| |===
| | Number | Privilege | Name        | Description

| | 0x350  | MRW       | `mnscratch` | Resumable Non-maskable scratch register
| | 0x351  | MRW       | `mnepc`     | Resumable Non-maskable EPC value
| | 0x352  | MRW       | `mncause`   | Resumable Non-maskable cause value
| | 0x353  | MRW       | `mnstatus`  | Resumable Non-maskable status
| |===

| The `mnscratch` CSR holds an XLEN-bit read-write register which
| enables the NMI trap handler to save and restore the context that was
| interrupted.

| The `mnepc` CSR is an XLEN-bit read-write register which on entry
| to the NMI trap handler holds the PC of the instruction that took the
| interrupt. The lowest bit of `mnepc` is hardwired to zero.

| The `mncause` CSR holds the reason for the NMI, with bit XLEN-1 set to
| 1, and the NMI cause encoded in the least-significant bits or zero if
| NMI causes are not supported.

| The `mnstatus` CSR holds a two-bit field which on entry to the trap
| handler holds the privilege mode of the interrupted context encoded in
| bits `mnstatus[12:11]` in the same manner as `mstatus.mpp`.  The other
| bits in `mnstatus` are _reserved_, but software should write zeros and
| hardware implementations should return zeros.

| == New MNRET instruction

| This new M-mode only instruction uses the values in `mnepc` and
| `mnstatus` to return to the program counter and privileged mode of the
| interrupted context respectively.  This instruction also sets the
| `rnmie` state bit.

| The MNRET instruction encoding is the same as MRET except with bit 30 set
| (i.e., `funct7`=`0111000`).
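
As a quick sanity check on that encoding, a small Python sketch: it
starts from the MRET encoding in the privileged spec (0x30200073),
sets bit 30, and extracts `funct7` from bits 31:25. The helper names
are made up for this example.

```python
MRET = 0x30200073  # MRET encoding from the RISC-V privileged spec


def set_bit(word: int, bit: int) -> int:
    """Return word with the given bit position set."""
    return word | (1 << bit)


def funct7(word: int) -> int:
    """Extract the funct7 field (bits 31:25) of a 32-bit instruction."""
    return (word >> 25) & 0x7F


MNRET = set_bit(MRET, 30)
print(hex(MNRET), format(funct7(MNRET), "07b"))  # prints: 0x70200073 0111000
```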

| == RNMI Operation

| When an RNMI interrupt is detected, the interrupted PC is written to
| the `mnepc` CSR, the type of RNMI to the `mncause` CSR, and the
| privilege mode of the interrupted context to the `mnstatus` CSR.  An
| internal microarchitectural state bit `rnmie` is cleared to indicate
| that the processor is in an RNMI handler and cannot take a new RNMI
| interrupt.  The internal `rnmie` bit when clear also disables all
| other interrupts.

| NOTE: These interrupts are called non-maskable because software cannot
| mask the interrupts, but for correct operation other instances of the
| same interrupt must be held off until the handler is completed, hence
| the internal state bit.

| The core then enters machine-mode and jumps to the RNMI trap handler
| address.

| The RNMI handler can resume original execution using the new MNRET
| instruction, which restores the PC from `mnepc`, the privilege mode
| from `mnstatus`, and also sets the internal `rnmie` state bit, which
| reenables other interrupts.

| If the hart encounters an exception while the `rnmie` bit is clear, the
| exception state is written to `mepc` and `mcause`, `mstatus.mpp` is
| set to M-mode, and the hart jumps to the RNMI exception handler
| address.

| NOTE: Traps in the RNMI handler can only be resumed if they occur while
| the handler was servicing an interrupt that occurred outside of
| machine-mode.

| == Interaction with debugger

| The debugger can be configured such that an RNMI event drops the
| system into the debugger.

