Re: [PATCH v1] System Peripherals - watchdog timer

Mayuresh Chitale

On Fri, Jul 9, 2021 at 8:54 AM Abner Chang <renba.chang@...> wrote:

Mayuresh Chitale <mchitale@...> wrote on Wed, Jul 7, 2021 at 1:42 AM:
This patch describes requirements for the watchdog timer
for the server extension.

Signed-off-by: Greg Favor <gfavor@...>
Signed-off-by: Mayuresh Chitale <mchitale@...>
 riscv-platform-spec.adoc | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/riscv-platform-spec.adoc b/riscv-platform-spec.adoc
index 87ab7f8..3b5728a 100644
--- a/riscv-platform-spec.adoc
+++ b/riscv-platform-spec.adoc
@@ -470,6 +470,28 @@[Sstc] extension.
 ** Platforms are required to delegate the supervisor timer interrupt to 'S'
 mode. If the 'H' extension is implemented then the platforms are required to
 delegate the virtual supervisor timer interrupt to 'VS' mode.
+===== Watchdog Timers
+Implementation of a two-stage watchdog timer, as defined in the WatchDog Timer
+appendix footnote:[Watchdog Timer Appendix (TBD)] ,is required. Software must
+periodically refresh the watchdog timer, otherwise a first-stage watchdog
+timeout occurs. If the watchdog timer remains un-refreshed for a second period,
+then a second-stage watchdog timeout occurs.
Does this mean the second-stage watchdog timeout would occur 1 second after the first-stage timeout if the watchdog timer has still not been refreshed?
No, it just means that if the watchdog timer is still not refreshed after the first-stage timeout, then a second-stage timeout occurs.
The mechanism to configure the timeout value is not specified in this patch but will be specified in the watchdog timer appendix which is TBD.
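
To make the refresh and two-stage behaviour concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not the interface the appendix will define: the class name `TwoStageWatchdog`, the `period` parameter, the use of the same period for both stages, and the choice that a refresh clears a pending first-stage timeout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TwoStageWatchdog:
    """Toy model of a two-stage watchdog. All names are hypothetical."""

    period: int      # timeout period in ticks (assumed equal for both stages)
    elapsed: int = 0  # ticks since the last refresh or stage timeout
    stage: int = 0    # 0 = running, 1 = first-stage timeout, 2 = second-stage timeout
    events: list = field(default_factory=list)

    def refresh(self) -> None:
        """Software refresh: restart the count.

        Assumption: a refresh also clears a pending first-stage timeout,
        but cannot undo a second-stage timeout.
        """
        self.elapsed = 0
        if self.stage < 2:
            self.stage = 0

    def tick(self) -> None:
        """Advance time by one tick and record any timeout that occurs."""
        if self.stage == 2:
            return  # second stage already reached; the platform acts on it
        self.elapsed += 1
        if self.elapsed >= self.period:
            self.elapsed = 0
            self.stage += 1
            if self.stage == 1:
                # Per the patch: Supervisor-level interrupt request.
                self.events.append("first-stage: S-level interrupt request")
            else:
                # Per the patch: request to a component more privileged than S-mode.
                self.events.append("second-stage: system-level request")


if __name__ == "__main__":
    wd = TwoStageWatchdog(period=3)
    for _ in range(6):  # never refreshed: both stages time out in turn
        wd.tick()
    print(wd.events)
```

In this model the second-stage timeout is not tied to any fixed interval such as one second; it simply follows one further un-refreshed period after the first-stage timeout.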

On server platforms, firmware usually refreshes the watchdog timer in the background, even while the OS is running. When the watchdog interrupt is triggered, the firmware takes over, sends the log to the BMC, and asks the BMC to either shut down or reset the system, depending on the scenario. We would not like the OS or hypervisor to sit in the middle and delay system recovery, for example when the OS or processor has somehow halted and the temperature in the box is rising rapidly.
Is 1 second too long for crisis recovery? Should the first-stage watchdog timeout be configurable as either an M-mode or an S-mode interrupt, so that firmware can handle the crisis immediately?


+If a first-stage watchdog timeout occurs, a Supervisor-level interrupt request
+is generated and sent to the system interrupt controller, targeting a specific
+hart.
+If a second-stage watchdog timeout occurs, a system-level interrupt request is
+generated and sent to a system component more privileged than Supervisor-mode
+such as:
+- The system interrupt controller, with a Machine-level interrupt request
+targeting a specific hart
+- A platform management processor
+- Dedicated reset control logic
+The resultant action taken is platform-specific.
 * PCI-E

 ==== Secure Boot
