Resumable NMI proposal


Paul Donahue
 

Thank you for pointing out that there are two RNMI vectors: one for NMI and one for exceptions.  As you surmised, I missed that but I think that addresses my concern.


-Paul

On Thu, Jan 28, 2021 at 2:34 PM John Hauser <jh.riscv@...> wrote:
1.
I think some of the questions and discussion concerning this proposal
have arisen because the proposal is a little too terse.  It would
be helpful to have more about the specific use cases that the RNMI
hardware is intended to support, and not support, and why, including
about nested NMIs (whether supported or not).  There seems to be an
assumption that the author and readers are already in agreement about
these matters, and that's much too optimistic, I think.

Some of the feedback has concerned the intersection of NMIs with
synchronous exceptions, such as what happens if a synchronous exception
occurs while in an NMI handler.  The proposal has this sentence:

    RNMI also has an associated exception trap handler address, which
    is implementation defined.

and then:

    If the hart encounters an exception while the `rnmie` bit is
    clear, the exception state is written to `mepc` and `mcause`,
    `mstatus.mpp` is set to M-mode, and the hart jumps to the RNMI
    exception handler address.

But that all went by too quickly for some reviewers, it seems.  (It may
also not be adequate in conjunction with nested NMIs.)

Not every reader will immediately intuit the clear distinction you
intend between _interrupt_ and _exception_, especially as even the
RISC-V Privileged Architecture sometimes muddies the two terms.
For example, note that an *interrupt trap* saves the interrupted
instruction address to one of the *Exception* Program Counter CSRs
such as mepc or mnepc, implying that the term _exception_ includes
interrupts.

2.
I urge that we not commit to CSR addresses 0x350-0x353 for the RNMI
CSRs.

As part of the planning for a possible future extension to improve
support for nested hypervisors, a pattern has been tentatively
established for M, S, U, VS, and V2S variants of CSRs, such as mstatus,
sstatus, ustatus, vsstatus, and a future v2sstatus.  The M-level
CSR addresses in the range 0x340-0x39F would be among those that can
support the full set of M, S, U, VS, and V2S variants of the same CSR.
But RNMI would never use the VS and V2S slots, and probably not the
U slot either.  It would be better to relocate the RNMI CSRs to other,
more constrained addresses, and leave the more flexible locations open
for CSRs that may need them.

To best manage the remaining CSR address space, I propose that the main
responsibility for assigning CSR addresses be given to the new Opcode
and Consistency Review group.

3.
Greg Favor wrote:
> - In mnstatus, shouldn't there also be a bit like the mstatus.MPV bit
>   (for when the H extension is implemented and enabled)?

Krste:
> I'll let hypervisor authors address this.

Greg is correct.  When the hypervisor extension is implemented, the
existing 2-bit mstatus.MPP is extended with another bit, mstatus.MPV.
The saved operating mode is fully represented by 3 bits, MPV and MPP
together.

I would suggest allocating mnstatus bit 13 for the MPV-like bit.

(I'd expect these two fields in mnstatus to be named either MPV and
MPP, or MNPV and MNPP.)

In anticipation of a possible future extension for nested hypervisors,
you should also keep bit 14 free for another bit (MPV2).

4.
Krste wrote:
> rnmie should be settable but not clearable in M-mode to support
> nesting NMIs.

I think, when you lay out exactly, step by step, how nested NMIs would
work, you'll discover that's inadequate.  (Or, if I'm wrong, then you
have different plans for how it will work than I can guess, so, again,
it would be good to have this in writing as part of the proposal.)

    - John Hauser






Richard Trauben
 

1) Is anyone using non-maskable interrupts for anything other than
fatal conditions?

a) If so, why not just increase the range of interrupt priority levels,
relocate the maskable interrupt priority handlers to the lower portion,
and convert non-fatal NMI sources into maskable interrupts in the upper
portion of the range?

2) If handling fatal interrupts is rare (it had better be), the
probability of two arriving back to back in close proximity, or of
actually having a hope of even fielding the subsequent fatal interrupt,
is small.

You could reserve the top, say, 8 priority fatal sources to ignore
the subsequent instances.

3) If worst comes to worst, you might partition the M-mode privilege
ring into two rings:
a) an administrative M1 (aka red mode) for handling fatal NMI IRQ ICU
events gracefully, and
b) a business-as-usual M2 privilege ring for typical usage.

4) If you are worried about losing NMIs, perhaps a sorted elastic queue
of pending NMIs might be an option.

At some point recovery is no longer an option.






John Hauser
 

1.
I think some of the questions and discussion concerning this proposal
have arisen because the proposal is a little too terse. It would
be helpful to have more about the specific use cases that the RNMI
hardware is intended to support, and not support, and why, including
about nested NMIs (whether supported or not). There seems to be an
assumption that the author and readers are already in agreement about
these matters, and that's much too optimistic, I think.

Some of the feedback has concerned the intersection of NMIs with
synchronous exceptions, such as what happens if a synchronous exception
occurs while in an NMI handler. The proposal has this sentence:

    RNMI also has an associated exception trap handler address, which
    is implementation defined.

and then:

    If the hart encounters an exception while the `rnmie` bit is
    clear, the exception state is written to `mepc` and `mcause`,
    `mstatus.mpp` is set to M-mode, and the hart jumps to the RNMI
    exception handler address.

But that all went by too quickly for some reviewers, it seems. (It may
also not be adequate in conjunction with nested NMIs.)

Not every reader will immediately intuit the clear distinction you
intend between _interrupt_ and _exception_, especially as even the
RISC-V Privileged Architecture sometimes muddies the two terms.
For example, note that an *interrupt trap* saves the interrupted
instruction address to one of the *Exception* Program Counter CSRs
such as mepc or mnepc, implying that the term _exception_ includes
interrupts.
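
To make the quoted rule easier to discuss, here is a minimal sketch in C
of how I read it. It is only a model, not text from the proposal: the
hart_t structure, its field names, and the take_exception() function are
hypothetical, trap delegation is ignored, and mtvec is assumed to be in
direct mode. The only part taken from the proposal is the branch for
`rnmie` being clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* Hypothetical, simplified hart state; names are illustrative only. */
typedef struct {
    uint64_t pc;
    uint64_t mepc;
    uint64_t mcause;
    unsigned mpp;             /* models mstatus.mpp */
    bool     rnmie;           /* clear while an RNMI handler is running */
    uint64_t mtvec;           /* ordinary M-mode trap vector (direct mode assumed) */
    uint64_t rnmi_exc_vector; /* implementation-defined RNMI exception handler */
} hart_t;

/* Take a synchronous exception raised at h->pc while running in `priv`. */
static void take_exception(hart_t *h, unsigned priv, uint64_t cause)
{
    assert(priv == PRIV_U || priv == PRIV_S || priv == PRIV_M);

    /* In both cases the exception state goes to mepc and mcause. */
    h->mepc   = h->pc;
    h->mcause = cause;

    if (!h->rnmie) {
        /* Quoted rule: mpp is set to M-mode and the hart jumps to the
         * RNMI exception handler address. */
        h->mpp = PRIV_M;
        h->pc  = h->rnmi_exc_vector;
    } else {
        /* Ordinary M-mode trap (assumption: no delegation). */
        h->mpp = priv;
        h->pc  = h->mtvec;
    }
}
```

Note that the sketch says nothing about what happens to mnepc or to
`rnmie` itself, which is exactly where the nested-NMI questions arise.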

2.
I urge that we not commit to CSR addresses 0x350-0x353 for the RNMI
CSRs.

As part of the planning for a possible future extension to improve
support for nested hypervisors, a pattern has been tentatively
established for M, S, U, VS, and V2S variants of CSRs, such as mstatus,
sstatus, ustatus, vsstatus, and a future v2sstatus. The M-level
CSR addresses in the range 0x340-0x39F would be among those that can
support the full set of M, S, U, VS, and V2S variants of the same CSR.
But RNMI would never use the VS and V2S slots, and probably not the
U slot either. It would be better to relocate the RNMI CSRs to other,
more constrained addresses, and leave the more flexible locations open
for CSRs that may need them.
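
As a rough illustration of the pattern, the existing variants mstatus
(0x300), sstatus (0x100), ustatus (0x000), and vsstatus (0x200) share
their low 8 address bits and differ only in bits 9:8. The C sketch below
derives the sibling addresses from an M-level address on that basis. The
csr_variant() helper and the enum are hypothetical names, and no V2S
slot is modeled because no address for it is given here.

```c
#include <assert.h>
#include <stdint.h>

/* Value of CSR address bits 9:8 for each existing variant. */
enum csr_level { CSR_U = 0, CSR_S = 1, CSR_VS = 2, CSR_M = 3 };

/* Given an M-level read/write CSR address (0x3xx), return the address the
 * same CSR would have at another level under the shared-low-bits pattern. */
static uint16_t csr_variant(uint16_t m_addr, enum csr_level level)
{
    assert((m_addr & 0xF00) == 0x300); /* must be an M-level 0x3xx address */
    return (uint16_t)((m_addr & 0x0FF) | ((unsigned)level << 8));
}
```

Under this pattern, placing an RNMI CSR at 0x350 would implicitly tie up
0x150, 0x050, and 0x250 as well, even though RNMI has no use for them.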

To best manage the remaining CSR address space, I propose that the main
responsibility for assigning CSR addresses be given to the new Opcode
and Consistency Review group.

3.
Greg Favor wrote:
> - In mnstatus, shouldn't there also be a bit like the mstatus.MPV bit
>   (for when the H extension is implemented and enabled)?

Krste:
> I'll let hypervisor authors address this.

Greg is correct. When the hypervisor extension is implemented, the
existing 2-bit mstatus.MPP is extended with another bit, mstatus.MPV.
The saved operating mode is fully represented by 3 bits, MPV and MPP
together.

I would suggest allocating mnstatus bit 13 for the MPV-like bit.

(I'd expect these two fields in mnstatus to be named either MPV and
MPP, or MNPV and MNPP.)

In anticipation of a possible future extension for nested hypervisors,
you should also keep bit 14 free for another bit (MPV2).
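
A small C sketch of the suggested layout may help. It assumes the
MPP-like field sits at mnstatus bits 12:11, mirroring mstatus.MPP (an
assumption, not something stated above), with the MPV-like bit at bit 13
and bit 14 kept free. The macro names, the saved_mode_t enum, and
decode_saved_mode() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MNSTATUS_MPP_SHIFT 11u
#define MNSTATUS_MPP_MASK  (UINT64_C(3) << MNSTATUS_MPP_SHIFT) /* assumed bits 12:11 */
#define MNSTATUS_MPV       (UINT64_C(1) << 13)                 /* suggested bit 13 */
#define MNSTATUS_RSVD_MPV2 (UINT64_C(1) << 14)                 /* kept free for MPV2 */

static_assert((MNSTATUS_MPV & MNSTATUS_MPP_MASK) == 0, "fields must not overlap");

typedef enum { MODE_U, MODE_S, MODE_M, MODE_VU, MODE_VS, MODE_INVALID } saved_mode_t;

/* Decode the saved operating mode from the 3 bits MPV and MPP together. */
static saved_mode_t decode_saved_mode(uint64_t mnstatus)
{
    unsigned mpp = (unsigned)((mnstatus & MNSTATUS_MPP_MASK) >> MNSTATUS_MPP_SHIFT);
    int mpv = (mnstatus & MNSTATUS_MPV) != 0;

    switch (mpp) {
    case 0:  return mpv ? MODE_VU : MODE_U;
    case 1:  return mpv ? MODE_VS : MODE_S;
    case 3:  return mpv ? MODE_INVALID : MODE_M; /* M-mode is never virtualized */
    default: return MODE_INVALID;                /* MPP encoding 2 is reserved */
    }
}
```

Treating MPV=1 with MPP=M as invalid is a choice made for this sketch;
the point is only that the saved mode cannot be recovered without the
extra bit.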

4.
Krste wrote:
> rnmie should be settable but not clearable in M-mode to support
> nesting NMIs.

I think, when you lay out exactly, step by step, how nested NMIs would
work, you'll discover that's inadequate. (Or, if I'm wrong, then you
have different plans for how it will work than I can guess, so, again,
it would be good to have this in writing as part of the proposal.)

- John Hauser