Re: Watchdog Spec Questions


Greg Favor
 

On Thu, Oct 28, 2021 at 12:44 PM Jonathan Behrens <behrensj@...> wrote:
Even the first-stage timeout would probably be more useful to the OS if it was a "non-maskable interrupt" or otherwise able to arrive even with sstatus.SIE bit unset.

Some system designers will feel that way and others won't.  The spec doesn't try to get into the business of telling a system designer how to incorporate and use the watchdog in their system.  These choices are implementation and/or platform specific.

In architectures like ARM (which doesn't have an NMI) and RISC-V (which barely architects anything, and doesn't require anything, about NMI functionality), treating the signal as an NMI is not, or may not be, an option.  Also note that what little "NMI" architecture RISC-V currently has defines a nonrecoverable NMI.  This means that even if an OS "recovers" from the first-stage timeout and refreshes the watchdog, it might still be screwed, since an NMI is not guaranteed to be recoverable.

This obviously is an area where RISC-V can stand to develop a more-than-very-minimal NMI architecture.  (Support for recoverable NMIs being one part of such.)
 
If the system has locked up to the point that the watchdog timer is expiring, that probably means that normal timer interrupts aren't arriving to the OS. And that could be caused either by stimecmp being unset/configured wrong (unlikely) or because the OS has interrupts disabled and is either blocking or has entered an infinite loop.

That's the point of the second-stage timeout: it fires when the OS didn't successfully respond to the first timeout, so something pretty bad is going on.  In that case one ideally wants the timeout signal to go to something other than a hart (e.g. a platform microcontroller or a hardware reset control block or ...).
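
As a rough illustration of the two-stage behavior described above, a minimal C model might look like the following.  This is only a sketch under stated assumptions: all names (wdt, wdt_tick, wdt_refresh, WDT_STAGE1_IRQ, WDT_STAGE2_PLATFORM) are hypothetical and not taken from the watchdog spec, and using the same timeout for both stages is an assumption made for simplicity.  Where each signal is actually routed remains an implementation and/or platform choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a two-stage watchdog (not from the spec). */
enum wdt_signal {
    WDT_NONE,            /* nothing expired on this tick */
    WDT_STAGE1_IRQ,      /* first-stage timeout: e.g. an interrupt to the OS */
    WDT_STAGE2_PLATFORM  /* second-stage timeout: e.g. to a platform
                            microcontroller or reset control block */
};

struct wdt {
    uint32_t timeout;    /* ticks per stage (assumed equal for both stages) */
    uint32_t count;      /* ticks since the last refresh or stage expiry */
    bool stage1_fired;   /* true once the first-stage timeout has signaled */
};

void wdt_init(struct wdt *wd, uint32_t timeout_ticks)
{
    assert(wd != NULL && timeout_ticks > 0);
    wd->timeout = timeout_ticks;
    wd->count = 0;
    wd->stage1_fired = false;
}

/* The OS calls this to show it is alive; it restarts the first stage. */
void wdt_refresh(struct wdt *wd)
{
    wd->count = 0;
    wd->stage1_fired = false;
}

/* Advance the model by one tick and report which signal, if any, fires. */
enum wdt_signal wdt_tick(struct wdt *wd)
{
    if (++wd->count < wd->timeout)
        return WDT_NONE;

    wd->count = 0;
    if (!wd->stage1_fired) {
        wd->stage1_fired = true;
        return WDT_STAGE1_IRQ;
    }
    /* No refresh arrived after stage 1, so escalate past the hart. */
    return WDT_STAGE2_PLATFORM;
}
```

In this model, an OS that handles the first-stage signal and calls wdt_refresh() never sees the second stage.  An OS that is stuck with interrupts disabled never refreshes, so the next expiry produces WDT_STAGE2_PLATFORM, which is the case where the signal ideally goes somewhere other than a hart.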

Greg
