Watchdog Spec Questions
Aaron Durbin
Hi,
I have some questions related to the Watchdog spec found here: https://github.com/riscv-non-isa/riscv-watchdog/blob/main/riscv-watchdog.adoc
1. The spec goes to great lengths to describe the watchdog tick frequency in terms of MTIME frequency and a bit position within MTIME serving as a divider. However:
"The choice of MTIME resolution and MTIME bit position for watchdog tick is platform specific and these parameters should be discoverable by software via platform-specific means. It is recommended that these parameters are chosen so as to provide a watchdog tick resolution between 0.1 sec and 1 sec, ensuring a maximum timeout period (WTOCNT=0x3FF) greater than 100 seconds."
If the effective watchdog tick frequency is platform specific then it is my opinion that the only thing that should be specified as parameters for the watchdog block is its tick frequency. Why complicate it with an assumption of the backing clock when all that matters is the effective watchdog frequency. Guidance is already provided w.r.t. expectations. I suggest we simplify the spec. Existing implementations can provide their own determinism of the frequency by exposing some bit and mtime frequency, but that does not need to be true for all implementations.
2. We have the following statements w.r.t. WTOCNT:
"The 10-bit WTOCNT value initializes a 10-bit timeout counter."
"If timeout counter is now zero then if S1WTO=1 then set S2WTO else set S1WTO and re-initialize the timeout counter with WTOCNT"
There's a 'timeout counter' term being used. Is the intention that this timeout counter value actually be hidden? i.e. there's no way to read the timeout counter itself? If so, why is this timeout counter not a part of the register set?
Thanks.
-Aaron
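To make the arithmetic behind question 1 concrete, here is a minimal Python sketch. It is not taken from the spec. It assumes, hypothetically, that the watchdog ticks once per toggle of the selected MTIME bit, so the tick period is 2^bit_pos / mtime_hz; a design that ticks only on rising edges of that bit would see twice that period. The function names and the 10 MHz example frequency are illustrative assumptions.

```python
WTOCNT_MAX = 0x3FF  # largest 10-bit WTOCNT value, as quoted from the spec


def tick_period_s(mtime_hz: int, bit_pos: int) -> float:
    """Watchdog tick period in seconds.

    Assumption: one tick per toggle of MTIME bit `bit_pos`,
    i.e. one tick every 2**bit_pos MTIME counts.
    """
    return (1 << bit_pos) / mtime_hz


def max_timeout_s(tick_s: float, wtocnt: int = WTOCNT_MAX) -> float:
    """Longest single-stage timeout for a given tick period."""
    return tick_s * wtocnt


def meets_recommendation(tick_s: float) -> bool:
    """Check the quoted guidance: tick between 0.1 s and 1 s,
    and a maximum timeout greater than 100 s."""
    return 0.1 <= tick_s <= 1.0 and max_timeout_s(tick_s) > 100.0


# Hypothetical platform: 10 MHz MTIME, watchdog driven by bit 20.
example_tick = tick_period_s(10_000_000, 20)
print(f"tick = {example_tick:.7f} s, max timeout = {max_timeout_s(example_tick):.1f} s")
```

Either pair of parameters (MTIME frequency plus bit position, or the effective tick frequency alone) yields the same `tick_s`, which is the only value the helper functions after the first one consume.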
|
Aaron Durbin
Do people have any responses to my comments/questions? I can send a pull request to remove assumptions about implementation details. Please let me know. Lastly, where is this spec getting ratified/approved? It was my understanding that this list was the forum for conducting such discussions.
Thank you.
-Aaron
On Thu, Oct 7, 2021 at 8:34 PM Aaron Durbin <adurbin@...> wrote:
|
|
Anup Patel
Apologies for the delay in response, I am traveling currently.
From: <tech-unixplatformspec@...> on behalf of Aaron Durbin <adurbin@...>
Hi,
I have some questions related to the Watchdog spec found here: https://github.com/riscv-non-isa/riscv-watchdog/blob/main/riscv-watchdog.adoc
1. The spec goes to great lengths to describe the watchdog tick frequency in terms of MTIME frequency and a bit position within MTIME serving as a divider. However:
"The choice of MTIME resolution and MTIME bit position for watchdog tick is platform specific and these parameters should be discoverable by software via platform-specific means. It is recommended that these parameters are chosen so as to provide a watchdog tick resolution between 0.1 sec and 1 sec, ensuring a maximum timeout period (WTOCNT=0x3FF) greater than 100 seconds."
If the effective watchdog tick frequency is platform specific then it is my opinion that the only thing that should be specified as parameters for the watchdog block is its tick frequency. Why complicate it with an assumption of the backing clock when all that matters is the effective watchdog frequency. Guidance is already provided w.r.t. expectations. I suggest we simplify the spec. Existing implementations can provide their own determinism of the frequency by exposing some bit and mtime frequency, but that does not need to be true for all implementations.
[Anup] The MTIME frequency is already available to software (via DT or ACPI) so software only needs to know the MTIME bit position used for watchdog tick. This means we can either provide MTIME bit position OR effective watchdog tick frequency in DT or ACPI. We choose MTIME bit position to be available in DT or ACPI because:
2. We have the following statements w.r.t. WTOCNT:
"The 10-bit WTOCNT value initializes a 10-bit timeout counter."
"If timeout counter is now zero then if S1WTO=1 then set S2WTO else set S1WTO and re-initialize the timeout counter with WTOCNT"
There's a 'timeout counter' term being used. Is the intention that this timeout counter value actually be hidden? i.e. there's no way to read the timeout counter itself? If so, why is this timeout counter not a part of the register set?
[Anup] We did not see any use-case for exposing “timeout counter” as separate registers from Linux watchdog framework perspective. If it is useful then we can certainly add read-only “timeout counter” register.
Regards, Anup
Thanks.
-Aaron
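The two WTOCNT statements quoted in question 2 can be read as a small state machine. The Python sketch below is one interpretation under stated assumptions, not the spec's definition: it assumes the timeout counter decrements by one per watchdog tick, it models no enable bit or refresh mechanism, and the class and attribute names are hypothetical. The `counter` attribute is the internal value that the thread discusses exposing as a read-only register.

```python
class TwoStageWatchdog:
    """Toy model of the quoted two-stage timeout behavior."""

    def __init__(self, wtocnt: int) -> None:
        self.wtocnt = wtocnt & 0x3FF   # WTOCNT is a 10-bit field
        self.counter = self.wtocnt     # the "hidden" timeout counter
        self.s1wto = False             # first-stage timeout flag
        self.s2wto = False             # second-stage timeout flag

    def tick(self) -> None:
        """Advance the model by one watchdog tick."""
        if self.s2wto:
            return                     # assumption: nothing left to do after stage 2
        if self.counter > 0:
            self.counter -= 1          # assumption: decrement by one per tick
        if self.counter == 0:
            if self.s1wto:
                self.s2wto = True      # second expiry: stage-2 timeout
            else:
                self.s1wto = True      # first expiry: stage-1 timeout
                self.counter = self.wtocnt  # re-initialize from WTOCNT


wd = TwoStageWatchdog(wtocnt=3)
for _ in range(3):
    wd.tick()
stage1 = (wd.s1wto, wd.s2wto, wd.counter)  # state after the first expiry

for _ in range(3):
    wd.tick()
stage2 = (wd.s1wto, wd.s2wto)              # state after the second expiry
print(stage1, stage2)
```

In this model, reading `wd.counter` between ticks is what "observing it tick" would look like if the counter were part of the register set.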
|
|
Greg Favor
On Thu, Oct 7, 2021 at 7:34 PM Aaron Durbin <adurbin@...> wrote:
I agree that all that needs to be discoverable is the watchdog tick period or frequency. Further, if a platform spec requires a specific frequency, then even that doesn't need to be discoverable.
There is no specific intention to hide the timeout counter. There just hasn't appeared to be a justifiable need yet to support reading of it.
Greg
|
Aaron Durbin
On Thu, Oct 14, 2021 at 1:07 AM Greg Favor <gfavor@...> wrote:
OK. I'll send a pull request to remove implementation assumptions.
I think it's informative in that one can read the current timeout counter value and observe it tick. That has been useful in the past from my experience.
|
|
Aaron Durbin
On Thu, Oct 14, 2021 at 12:26 AM Anup Patel <Anup.Patel@...> wrote:
Yes, but that's assuming implementation. I don't see the need. The watchdog spec has expectations for timer tick frequency. Moreover, one needs to assemble information that is not directly discoverable in the register set in the spec so one has to defer to DT or ACPI for the necessary parameters anyway. It's simple to expose watchdog tick frequency while also not assuming the backing implementation.
It is informative in that one can observe it tick and stop according to the control register.
|
|
Greg Favor
On Thu, Oct 14, 2021 at 5:36 AM Aaron Durbin <adurbin@...> wrote:
This raises the question of how to represent or virtualize the current timeout counter value (with, of course, trap and emulate by a hypervisor) - as well as now needing to do this. I've heard some people argue for a clean "one-way" (write-only) interface by an OS to a watchdog for this reason. Greg |
|
Jonathan Behrens <behrensj@...>
I don't think doing trap-and-emulate to handle reads of the watchdog timer would be much of a problem for a hypervisor. That functionality would be needed for emulating almost any other common device. To be somewhat flippant, if someone is thinking of emulating this watchdog timer and it is anywhere close to the most complex device they're emulating, then something is probably going seriously wrong. Jonathan
|
|
Greg Favor
On Thu, Oct 14, 2021 at 10:14 AM Jonathan Behrens <behrensj@...> wrote:
I agree that M-mode software could emulate the idea of a counter and its continual changes in value, but that's just added software complexity that doesn't seem warranted. Also, T&E to M-mode raises concerns about a dependency on M-mode still being healthy while S-mode is not. In some cases, M-mode will still be healthy; in other cases M and S modes have become unhealthy and the watchdog functionality becomes disabled for the very cases that it is intended to be functional for. Greg |
|
Ved Shanbhogue
On 10/14/21 2:22 PM, Greg Favor wrote:
I am missing why M-mode needs to trap and emulate a virtual watchdog device access from a guest OS. That should be a hypervisor function. A hypervisor that passes through a virtual watchdog device to its guest has to trap and emulate the virtual watchdog device anyway - including having a virtual watchdog timer that is emulated by the hypervisor.
regards
ved
|
Greg Favor
On Thu, Oct 14, 2021 at 12:32 PM Vedvyas Shanbhogue <ved@...> wrote:
> I am missing why M-mode needs to trap and emulate a virtual watchdog device access from a guest OS. That should be a hypervisor function.
I was referring to T&E of a host watchdog (from S/HS-mode to M-mode).
> A hypervisor that passes through a virtual watchdog device to its guest has to trap and emulate the virtual watchdog device anyway - including having a virtual watchdog timer that is emulated by the hypervisor.
That isn't so clear as being the case in some or all hypervisors (i.e. that the watchdog device will be faithfully emulated). Just like all the power management stuff that a guest OS does is not simply 1-for-1 emulated literally by a hypervisor.
Greg
|
Ved Shanbhogue
On 10/14/21 2:47 PM, Greg Favor wrote:
Thanks. Sorry, I missed that your response was orthogonal and was not a response to Jonathan's comment that it's not a problem for the hypervisor to emulate. I agree with you that I don't see under what circumstance M-mode would want to trap and emulate a host watchdog, so M-mode complexity to T&E should be orthogonal to this topic.
> > A hypervisor that passes through a virtual watchdog device to its guest has to trap and emulate the virtual watchdog device anyway - including having a virtual watchdog timer that is emulated by the hypervisor.
> That isn't so clear as being the case in some or all hypervisors (i.e. that the watchdog device will be faithfully emulated). Just like all the power management stuff that a guest OS does is not simply 1-for-1 emulated literally by a hypervisor.
I agree. It's an emulation and is only as good as the code written for emulation. It's not a hard device to emulate, however, considering everything else a hypervisor may need to emulate, as Jonathan said earlier.
regards
ved
|
Andrew Jones <drjones@...>
On Thu, Oct 14, 2021 at 03:00:30PM -0500, Vedvyas Shanbhogue wrote:
Will the watchdog timer have hardware support for scaling and offsetting the virtual watchdog timer? Guest timekeeping is quite complex when one considers a guest may be paused at any time and migrated to other hosts which have different clock frequencies.
Thanks,
drew
|
|
Phil McCoy <pnm@...>
Is it intended/required that S1WTO and S2WTO be literal interrupts? In particular, it might be desirable for S2WTO to actually be an NMI or reset to recover a system that is not healthy enough to handle the S1WTO.
Thanks, Phil |
|
Greg Favor
> Is it intended/required that S1WTO and S2WTO be literal interrupts? In particular, it might be desirable for S2WTO to actually be an NMI or reset to recover a system that is not healthy enough to handle the S1WTO.
I agree. The text needs to describe the two timeout signals in more generic terms as simply signals indicating the occurrence of a timeout. It is up to a platform or implementation to decide what it does with those signals. (Typically the first-stage timeout would be a hart interrupt request directed to the OS to give it a chance to recover or gracefully react to the timeout condition, and the second-stage timeout would go somewhere else (whether as an "interrupt request", NMI, or whatever), e.g. to M-mode on a hart, or a platform microcontroller, or a hardware system block, or a BMC, or ...).
Greg
|
Ved Shanbhogue
On Thu, Oct 28, 2021 at 2:29 PM Greg Favor <gfavor@...> wrote:
The watchdog spec does not provide a means to configure an MSI destination for the S1WTO. Is that planned/discussed? regards ved |
|
Jonathan Behrens <behrensj@...>
Even the first-stage timeout would probably be more useful to the OS if it was a "non-maskable interrupt" or otherwise able to arrive even with sstatus.SIE bit unset. If the system has locked up to the point that the watchdog timer is expiring, that probably means that normal timer interrupts aren't arriving to the OS. And that could be caused either by stimecmp being unset/configured wrong (unlikely) or because the OS has interrupts disabled and is either blocking or entered an infinite loop. Jonathan
|
|
Greg Favor
On Thu, Oct 28, 2021 at 12:39 PM Vedvyas Shanbhogue <ved@...> wrote:
> The watchdog spec does not provide a means to configure an MSI destination for the S1WTO. Is that planned/discussed?
The output signal from the watchdog can be (but doesn't have to be) treated as an interrupt request signal. That choice (and more generally where the signal goes to) is implementation and/or platform specific. The spec doesn't try to get into the business of telling a system designer how to incorporate and use it in their system. If this signal is treated as a wired interrupt request signal, then, depending on the interrupt controller architecture being used, it can be connected to an input of a PLIC, a CLIC, or an AIA APLIC. In a full AIA-based system where MSIs are used to send around interrupt requests, the APLIC converts wired interrupts into message-based interrupts.
Greg
|
Greg Favor
On Thu, Oct 28, 2021 at 12:44 PM Jonathan Behrens <behrensj@...> wrote:
Some system designers will feel that way and others won't. The spec doesn't try to get into the business of telling a system designer how to incorporate and use it in their system. These choices are implementation and/or platform specific. In architectures like ARM (which doesn't have an NMI) and RISC-V (which barely architects anything, and doesn't require anything, about NMI functionality), treating the signal as an NMI is not or may not be an option. Also note that what little "NMI" architecture RISC-V currently has defines a nonrecoverable NMI. This means that an OS, even if it "recovers" from the first-stage timeout and refreshes the watchdog, might still be screwed since an NMI is not guaranteed to be recoverable. This obviously is an area where RISC-V can stand to develop a more-than-very-minimal NMI architecture. (Support for recoverable NMIs being one part of such.)
That's the point of the second-stage timeout, i.e. the OS didn't successfully respond to the first timeout and so something pretty bad is going on. In which case one ideally wants that timeout signal to go to something other than a hart (e.g. a platform microcontroller or a hardware reset control block or ...). Greg |
|
Anup Patel
On 15/10/21, 2:34 PM, "tech-unixplatformspec@... on behalf of Andrew Jones" <tech-unixplatformspec@... on behalf of drjones@...> wrote:
On Thu, Oct 14, 2021 at 03:00:30PM -0500, Vedvyas Shanbhogue wrote:
> On 10/14/21 2:47 PM, Greg Favor wrote:
> > On Thu, Oct 14, 2021 at 12:32 PM Vedvyas Shanbhogue <ved@...> wrote:
> > > I am missing why M-mode needs to trap and emulate a virtual watchdog device access from a guest OS. That should be a hypervisor function.
> > I was referring T&E of a host watchdog (from S/HS-mode to M-mode).
> Thanks. Sorry, I missed that your response was orthogonal and was not a response to Jonathan's comment that it's not a problem for the hypervisor to emulate.
> I agree with you that I don't see under what circumstance M-mode would want to trap and emulate a host watchdog so M-mode complexity to T&E should be orthogonal to this topic.
> > > A hypervisor that passes through a virtual watchdog device to its guest has to trap and emulate the virtual watchdog device anyway - including having a virtual watchdog timer that is emulated by the hypervisor.
> > That isn't so clear as being the case in some or all hypervisors (i.e. that the watchdog device will be faithfully emulated). Just like all the power management stuff that a guest OS does is not simply 1-for-1 emulated literally by a hypervisor.
> I agree. It's an emulation and is only as good as the code written for emulation. It's not a hard device to emulate however considering everything else a hypervisor may need to emulate as Jonathan said earlier.
Will the watchdog timer have hardware support for scaling and offsetting the virtual watchdog timer? Guest timekeeping is quite complex when one considers a guest may be paused at any time and migrated to other hosts which have different clock frequencies.
[Anup] Apologies for missing this question.
[Anup] The watchdog timer does not define any scaling and offsetting support for virtualization because the current expectation is that hypervisors will emulate the Guest/VM watchdog totally in software.
[Anup] As suggested in this email thread, it's better to expose only the effective watchdog tick frequency via DT/ACPI. This will further simplify Guest/VM migration because hypervisors can emulate a fixed watchdog tick frequency on both source and destination hosts irrespective of the MTIME frequency on these hosts.
Regards,
Anup
Thanks,
drew
> regards
> ved
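As a rough illustration of the migration point above, the following Python sketch shows how a software-emulated watchdog could derive a fixed guest tick rate from whatever MTIME frequency the current host has. It is a sketch under assumptions, not a description of any existing hypervisor: the class name, the 10 Hz guest tick rate, and both host frequencies are hypothetical. Exact fractions are used so that a partial tick accumulated on the source host is carried over to the destination host.

```python
from fractions import Fraction


class VirtualTickSource:
    """Converts host MTIME deltas into guest watchdog ticks at a fixed rate."""

    def __init__(self, guest_tick_hz: int) -> None:
        self.guest_tick_hz = guest_tick_hz
        self._partial = Fraction(0)  # fraction of a guest tick not yet delivered

    def advance(self, mtime_delta: int, host_mtime_hz: int) -> int:
        """Account for `mtime_delta` host MTIME counts while the guest ran.

        Returns the number of whole guest watchdog ticks that elapsed.
        Time during which the guest is paused is simply never passed in.
        """
        elapsed = self._partial + Fraction(mtime_delta * self.guest_tick_hz,
                                           host_mtime_hz)
        whole = elapsed.numerator // elapsed.denominator
        self._partial = elapsed - whole
        return whole


src = VirtualTickSource(guest_tick_hz=10)          # guest sees a 0.1 s tick

# Source host (hypothetical 10 MHz MTIME): guest runs for 0.15 s.
ticks_on_source = src.advance(1_500_000, 10_000_000)

# After migration (hypothetical 25 MHz MTIME): guest runs for another 0.05 s.
ticks_on_destination = src.advance(1_250_000, 25_000_000)

print(ticks_on_source, ticks_on_destination)
```

The guest-visible tick frequency stays at the value advertised via DT/ACPI, and only the hypervisor needs to know the MTIME frequency of the host it is currently running on.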
|