Re: Resumable NMI proposal

Krste Asanovic

On Mon, 18 Jan 2021 19:09:26 -0800, Greg Favor <gfavor@...> said:
| Even though this is hot off the press, I'll jump in with a few small comments:
| - In mnstatus, shouldn't there also be a bit like the mstatus.MPV bit (for when the H extension is implemented and enabled)?

I'll let hypervisor authors address this.

| - The width of the mnstatus CSR is not explicitly defined (e.g. as an XLEN-bit or 32-bit read-write register).
Should be XLEN to match mstatus.

| - What is the relative priority for RNMI versus Debug Halt Request?  Maybe it is the responsibility ultimately for the Debug spec to specify this (?),
| but what should be said over there?
I'll let debug group figure this out.

| - Can you briefly comment on the types of systems that show the most need for recoverable NMIs (versus being not ideal but ok with the current NMI
| situation).  Semi-equivalently, would you expect this extension to eventually become a requirement (or optional) in RVA22 and/or RVM22?

Systems where an external agent outside the OS stack has to respond to
the interrupt and then resume the OS, e.g., hardware error
logging/reporting (including watchdogs), or some forms of power down.

Yes for RVA22/RVM22, and I would think mandatory in RVA22, optional in

Platform specs would have to indicate what/how NMIs are


| Greg

| On Mon, Jan 18, 2021 at 6:39 PM Krste Asanovic <krste@...> wrote:

| Current RISC-V specs only have a non-resumable NMI definition.  The
| following proposal would add resumable NMI support.  This was one of
| the features requested for priv 1.12 or RVA/RVM22.

| This is up for discussion, but I think it is small enough to go
| through fast track process.

| Krste

| :sectnums:
| :toc: left

| = Resumable NMI support in RISC-V
| Version 0.2.1-Draft

| == Background and Motivation

| The RISC-V privileged architecture version 1.11 supports only
| unresumable non-maskable interrupts (UNMIs), where the NMI jumps to a
| handler in machine mode, overwriting the current `mepc` and `mcause`
| register values.  If the hart had been executing machine-mode code in
| a trap handler, the previous values in `mepc` and `mcause` would not
| be recoverable and so execution is not generally resumable.

| This proposal adds support for resumable non-maskable interrupts
| (RNMIs) to RISC-V.  The extension adds four new CSRs (`mnepc`,
| `mncause`, `mnstatus`, and `mnscratch`) to hold the interrupted state,
| and a new instruction to resume from the RNMI handler.

| == RNMI Interrupt Signals

| The `rnmi` interrupt signals are inputs to
| the hart.  These interrupts have higher priority than any other
| interrupt or exception on the hart and cannot be disabled by software.
| Specifically, they are not disabled by clearing the `mstatus.mie`
| bit.

| == RNMI Handler Addresses

| The RNMI interrupt trap handler address is implementation-defined.

| RNMI also has an associated exception trap handler address, which is
| also implementation-defined.

| == New RNMI CSRs

| This proposal adds additional M-mode CSRs to enable a resumable
| non-maskable interrupt (RNMI).

| .NMI additional CSRs
| [cols="2,2,2,2"]
| [%autowidth]
| |===
| | Number | Privilege | Name        | Description

| | 0x350  | MRW       | `mnscratch` | Resumable Non-maskable scratch register
| | 0x351  | MRW       | `mnepc`     | Resumable Non-maskable EPC value
| | 0x352  | MRW       | `mncause`   | Resumable Non-maskable cause value
| | 0x353  | MRW       | `mnstatus`  | Resumable Non-maskable status
| |===

| The `mnscratch` CSR is an XLEN-bit read-write register which
| enables the NMI trap handler to save and restore the context that was
| interrupted.

| The `mnepc` CSR is an XLEN-bit read-write register which on entry
| to the NMI trap handler holds the PC of the instruction that took the
| interrupt. The lowest bit of `mnepc` is hardwired to zero.

| The `mncause` CSR holds the reason for the NMI.  Bit XLEN-1 is set to
| 1, and the least-significant bits hold the NMI cause, or zero if
| NMI causes are not supported.

| The `mnstatus` CSR holds a two-bit field which on entry to the trap
| handler holds the privilege mode of the interrupted context encoded in
| bits `mnstatus[12:11]` in the same manner as `mstatus.mpp`.  The other
| bits in `mnstatus` are _reserved_, but software should write zeros and
| hardware implementations should return zeros.
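
A minimal sketch of the field layouts described above, for illustration
only.  The helper names and the default XLEN are assumptions, not part of
the proposal; the privilege encodings are the usual ones used by
`mstatus.mpp` (U=0, S=1, M=3).

```python
XLEN = 64  # assumption for illustration; the proposal applies to any XLEN

# Privilege-mode encodings, as used by mstatus.mpp.
PRIV_U, PRIV_S, PRIV_M = 0b00, 0b01, 0b11

# The interrupted privilege mode lives in mnstatus[12:11].
MNSTATUS_MPP_SHIFT = 11
MNSTATUS_MPP_MASK = 0b11 << MNSTATUS_MPP_SHIFT


def make_mnstatus(priv: int) -> int:
    """Build an mnstatus value; all other (reserved) bits are zero."""
    return (priv << MNSTATUS_MPP_SHIFT) & MNSTATUS_MPP_MASK


def mnstatus_priv(mnstatus: int) -> int:
    """Extract the interrupted privilege mode from mnstatus[12:11]."""
    return (mnstatus & MNSTATUS_MPP_MASK) >> MNSTATUS_MPP_SHIFT


def make_mncause(cause: int, xlen: int = XLEN) -> int:
    """Set bit XLEN-1 and place the cause in the low bits (0 if unsupported)."""
    top = 1 << (xlen - 1)
    return top | (cause & (top - 1))


def legalize_mnepc(value: int, xlen: int = XLEN) -> int:
    """Model a write to mnepc: XLEN bits wide, lowest bit hardwired to zero."""
    return value & ((1 << xlen) - 1) & ~1
```

For example, `make_mnstatus(PRIV_S)` places the supervisor encoding in
bits 12:11, and `legalize_mnepc` drops bit 0 of whatever is written.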

| == New MNRET instruction

| This new M-mode-only instruction uses the values in `mnepc` and
| `mnstatus` to return to the program counter and privilege mode of the
| interrupted context respectively.  This instruction also sets the
| `rnmie` state bit.

| The MNRET instruction encoding is the same as MRET except with bit 30
| set (i.e., `funct7`=`0111000`).
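
As a quick check of the encoding rule above, here is a small sketch that
derives MNRET from the MRET encoding in the privileged spec (0x30200073).
The `funct7` helper name is just for illustration.

```python
MRET = 0x30200073  # MRET encoding from the RISC-V privileged spec


def funct7(insn: int) -> int:
    """Return bits 31:25 of a 32-bit instruction word."""
    return (insn >> 25) & 0x7F


# Per the draft: same encoding as MRET, except with bit 30 set.
MNRET = MRET | (1 << 30)

print(f"MRET  = {MRET:#010x}, funct7 = {funct7(MRET):07b}")
print(f"MNRET = {MNRET:#010x}, funct7 = {funct7(MNRET):07b}")
```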

| == RNMI Operation

| When an RNMI interrupt is detected, the interrupted PC is written to
| the `mnepc` CSR, the type of RNMI to the `mncause` CSR, and the
| privilege mode of the interrupted context to the `mnstatus` CSR.  An
| internal microarchitectural state bit `rnmie` is cleared to indicate
| that the processor is in an RNMI handler and cannot take a new RNMI
| interrupt.  The internal `rnmie` bit when clear also disables all
| other interrupts.

| NOTE: These interrupts are called non-maskable because software cannot
| mask the interrupts, but for correct operation other instances of the
| same interrupt must be held off until the handler is completed, hence
| the internal state bit.

| The core then enters machine-mode and jumps to the RNMI trap handler
| address.

| The RNMI handler can resume original execution using the new MNRET
| instruction, which restores the PC from `mnepc`, the privilege mode
| from `mnstatus`, and also sets the internal `rnmie` state bit, which
| reenables other interrupts.

| If the hart encounters an exception while the `rnmie` bit is clear, the
| exception state is written to `mepc` and `mcause`, `mstatus.mpp` is
| set to M-mode, and the hart jumps to the RNMI exception handler
| address.

| NOTE: Traps in the RNMI handler can only be resumed if they occur while
| the handler is servicing an interrupt that occurred outside of
| machine-mode.
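
The sequence above can be sketched as a small behavioral model.  This is
only an illustration under stated assumptions: the `Hart` class, its
method names, and the two handler addresses are made up (the draft leaves
the addresses implementation-defined), and the normal trap path taken
when `rnmie` is set is not modeled.

```python
from dataclasses import dataclass

PRIV_U, PRIV_S, PRIV_M = 0b00, 0b01, 0b11  # same encoding as mstatus.mpp


@dataclass
class Hart:
    xlen: int = 64
    pc: int = 0
    priv: int = PRIV_M
    rnmie: bool = True          # internal state bit; clear while in handler
    mnepc: int = 0
    mncause: int = 0
    mnstatus: int = 0
    mepc: int = 0
    mcause: int = 0
    mstatus_mpp: int = PRIV_U
    # Hypothetical handler addresses (implementation-defined in the draft).
    rnmi_irq_vector: int = 0x1000
    rnmi_exc_vector: int = 0x2000

    def take_rnmi(self, cause: int) -> bool:
        """RNMI entry; returns False if held off because rnmie is clear."""
        if not self.rnmie:
            return False
        self.mnepc = self.pc & ~1                      # interrupted PC
        self.mncause = (1 << (self.xlen - 1)) | cause  # bit XLEN-1 set
        self.mnstatus = self.priv << 11                # mnstatus[12:11]
        self.rnmie = False
        self.priv = PRIV_M
        self.pc = self.rnmi_irq_vector
        return True

    def mnret(self) -> None:
        """MNRET: restore PC and privilege mode, then set rnmie again."""
        self.pc = self.mnepc
        self.priv = (self.mnstatus >> 11) & 0b11
        self.rnmie = True

    def take_exception(self, cause: int) -> None:
        """Exception taken while rnmie is clear (inside the RNMI handler)."""
        if self.rnmie:
            raise NotImplementedError("normal trap path is not modeled")
        self.mepc = self.pc
        self.mcause = cause
        self.mstatus_mpp = PRIV_M
        self.pc = self.rnmi_exc_vector


hart = Hart(pc=0x8000_0004, priv=PRIV_S)
hart.take_rnmi(cause=1)   # enter the RNMI handler from S-mode
hart.mnret()              # resume the interrupted S-mode code
```

Note that the model keeps `mnepc`/`mncause`/`mnstatus` separate from
`mepc`/`mcause`, which is what lets the interrupted context be resumed.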

| == Interaction with debugger

| The debugger can be configured such that an RNMI event drops the
| system into the debugger.

