Re: Watchdog timer per hart?
Allen Baum
That's a bit looser a definition than I'd expect, but that explains your comments, certainly. Thx.
On Wed, Mar 2, 2022 at 5:14 PM Greg Favor <gfavor@...> wrote:
Re: Watchdog timer per hart?
Greg Favor
On Wed, Mar 2, 2022 at 4:54 PM Allen Baum <allen.baum@...> wrote:
Since the suitable response to a first or second stage timeout is rather system-specific, ARM didn't try to ordain exactly where the timeout signals go and what happens as a result. In SBSA they just described the general expected possibilities (which my previous remarks were based on). But here's what a 2020 version of BSA says (which is roughly similar to SBSA but a bit narrower in the possibilities it describes):

"The basic function of the Generic Watchdog is to count for a fixed period of time, during which it expects to be refreshed by the system indicating normal operation. If a refresh occurs within the watch period, the period is refreshed to the start. If the refresh does not occur then the watch period expires, and a signal is raised and a second watch period is begun. The initial signal is typically wired to an interrupt and alerts the system. The system can attempt to take corrective action that includes refreshing the watchdog within the second watch period. If the refresh is successful, the system returns to the previous normal operation. If it fails, then the second watch period expires and a second signal is generated. The signal is fed to a higher agent as an interrupt or reset for it to take executive action."

Greg
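For concreteness, the BSA behavior quoted above can be modeled as a tiny state machine. This is only an illustrative sketch: the names, the tick-driven structure, and the WS0/WS1-style staging are this sketch's own paraphrase of the "initial signal" and "second signal" in the quoted text, not BSA's definitions.

```c
/* Illustrative model of the two-stage Generic Watchdog behavior quoted
 * above.  WS0/WS1 loosely correspond to the "initial signal" and
 * "second signal" in the quoted text. */
enum wdog_stage { WDOG_NORMAL, WDOG_WS0, WDOG_WS1 };

struct wdog {
    unsigned watch_period;   /* ticks in one watch period */
    unsigned count;          /* ticks since last refresh or expiry */
    enum wdog_stage stage;
};

/* Software signals normal operation by refreshing the watchdog. */
void wdog_refresh(struct wdog *w)
{
    if (w->stage != WDOG_WS1) {   /* after WS1, only a higher agent acts */
        w->count = 0;
        w->stage = WDOG_NORMAL;
    }
}

/* Advance time by one tick and return the resulting stage. */
enum wdog_stage wdog_tick(struct wdog *w)
{
    if (w->stage == WDOG_WS1)
        return WDOG_WS1;          /* latched until executive action */
    if (++w->count >= w->watch_period) {
        w->count = 0;
        w->stage = (w->stage == WDOG_NORMAL) ? WDOG_WS0 : WDOG_WS1;
    }
    return w->stage;
}
```

A refresh within the second watch period returns the model to normal operation, exactly as the quoted text describes; a second expiry latches WS1.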
Re: Watchdog timer per hart?
Allen Baum
Don't they even define whether restartability is required or not?
On Wed, Mar 2, 2022 at 4:00 PM Greg Favor <gfavor@...> wrote:
Re: Watchdog timer per hart?
Greg Favor
Even ARM SBSA allowed a lot of flexibility as to where the first-stage and second-stage timeout "signals" went (which ultimately then placed the handling in the hands of software somewhere). In other words, SBSA didn't prescribe the details of the overall watchdog handling picture. Greg
On Wed, Mar 2, 2022 at 2:35 PM Allen Baum <allen.baum@...> wrote:
Re: Watchdog timer per hart?
Allen Baum
Now we're starting to drill down appropriately. There is a wide range. This is me thinking out loud and trying desperately to avoid the real work I should be doing:

- A watchdog timer event can cause an interrupt (as opposed to a HW reset)
-- Maskable or non-maskable?
-- Using xTVEC to vector, or a platform-defined vector? (e.g. the reset vector)
-- A new cause type, or reuse an existing one? (e.g. using the reset cause)
-- Restartable or non-restartable or both? (both implies - to me at least - the 2-stage watchdog concept, "pulling the emergency cord")
-- If the watchdog timer is restartable, it must either
--- be maskable, or
--- implement something like the restartable-NMI spec to be able to save state.
-- What does "pulling the emergency cord" do? e.g.
--- some kind of HW reset (we had a light reset at Intel that cleared as little as possible so that a post-mortem dump could identify what was going on)
--- just vector to a SW handler (obviously this should depend on why the watchdog timer was activated, e.g. waiting for a HW event or SW event)
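One way to make that option space concrete is to write it down as a configuration record. This is purely hypothetical - none of these names come from any RISC-V specification - and the validity check just encodes the maskable-or-save-state constraint noted in the list above.

```c
#include <stdbool.h>

/* Hypothetical encoding of the option space above; none of these names
 * come from any RISC-V specification. */
enum wdog_event   { WDOG_EVT_INTERRUPT, WDOG_EVT_HW_RESET };
enum wdog_mask    { WDOG_MASKABLE, WDOG_NONMASKABLE };
enum wdog_vector  { WDOG_VEC_XTVEC, WDOG_VEC_PLATFORM };   /* e.g. reset vector */
enum wdog_cause   { WDOG_CAUSE_NEW, WDOG_CAUSE_REUSE };    /* e.g. reset cause */
enum wdog_restart { WDOG_RESTARTABLE, WDOG_NONRESTARTABLE };

struct wdog_options {
    enum wdog_event   event;
    enum wdog_mask    mask;
    enum wdog_vector  vector;
    enum wdog_cause   cause;
    enum wdog_restart restart;
    bool has_rnmi_state_save;   /* restartable-NMI-style state saving */
};

/* The constraint noted above: a restartable watchdog must either be
 * maskable or be able to save state (restartable-NMI style). */
bool wdog_options_valid(const struct wdog_options *o)
{
    if (o->restart == WDOG_RESTARTABLE)
        return o->mask == WDOG_MASKABLE || o->has_rnmi_state_save;
    return true;
}
```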
On Wed, Mar 2, 2022 at 12:41 PM Kumar Sankaran <ksankaran@...> wrote:
Re: Watchdog timer per hart?
Kumar Sankaran
From a platform standpoint, the intent was to have a single platform-level watchdog that is shared across the entire platform. This platform watchdog could be the 2-level watchdog as described below by Greg. Whether S-mode software or M-mode software would handle the tickling of this watchdog and handle timeouts is a subject for further discussion.
On Wed, Mar 2, 2022 at 12:34 PM Greg Favor <gfavor@...> wrote:
--
Regards Kumar
Re: Watchdog timer per hart?
Greg Favor
On Wed, Mar 2, 2022 at 12:23 PM Aaron Durbin <adurbin@...> wrote:
One comment - for when any concrete discussion about having a system-level watchdog occurs: One can have a one-stage or a two-stage watchdog. The former yanks the emergency cord on the system upon timeout. The latter (which is what ARM defined in SBSA and the subsequent BSA) interrupts the OS on the first timeout and gives it a chance to take remedial actions (and refresh the watchdog). Then, if a second timeout occurs (without a refresh after the first timeout), the emergency cord is yanked.

ARM also defined separate Secure and Non-Secure watchdogs (akin to what one might call M-mode and S-mode watchdogs, respectively). The OS has its own watchdog to tickle, and an emergency situation results in reboot of the OS (for example). And the Secure Monitor has its own watchdog, and an emergency situation results in reboot of the system (for example).

Greg
Re: Watchdog timer per hart?
Aaron Durbin
On Wed, Mar 2, 2022 at 1:19 PM Greg Favor <gfavor@...> wrote:
Yes. Greg articulated what I was getting at better than I did. I apologize for muddying the waters. From a platform standpoint, one system-level watchdog should suffice, as it's typically the last resort for restarting a system before sending a tech out.
Re: Watchdog timer per hart?
Greg Favor
A core-level watchdog can mean quite different things to different people and their core designs. In some cases this "watchdog" would be a micro-architectural thing that, for example, recognizes that the core is not making forward progress and temporarily invokes some low-performance uarch mechanism that guarantees forward progress (out of the circumstances currently causing livelock) - although the details of that very much depend on what types of livelock causes one is concerned about.

In other cases this "watchdog" might generate a local interrupt to take the core into a "lack of forward progress" software handler, or a global interrupt to inform someone else that this core is livelocked.

In general, there's an enormous range of possibilities as to what a core-level watchdog means, and an enormous range as to what one is trying to accomplish or defend against.

Greg
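The second flavor described above (a local "lack of forward progress" check) might be sketched like this. The structure, and the idea of sampling a retired-instruction counter such as RISC-V's instret, are illustrative assumptions, not any particular core's design.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch of "lack of forward progress" detection:
 * periodically sample a retired-instruction counter (e.g. RISC-V
 * instret) and flag the core as stuck if it has not advanced since
 * the previous sample.  Not any particular core's mechanism. */
struct progress_watchdog {
    uint64_t last_instret;
};

/* Call once per check interval with the current counter value; returns
 * true if no instruction retired since the last check (possible
 * livelock, so raise a local or global interrupt). */
bool progress_check(struct progress_watchdog *w, uint64_t instret_now)
{
    bool stuck = (instret_now == w->last_instret);
    w->last_instret = instret_now;
    return stuck;
}
```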
On Wed, Mar 2, 2022 at 12:09 PM James Robinson <jrobinson@...> wrote:
Re: Watchdog timer per hart?
James Robinson
Hi Aaron,
Thanks for the response. Would you be able to give any more details on how a core-level watchdog would differ from a platform-level one?

James
Re: Watchdog timer per hart?
Aaron Durbin
On Wed, Mar 2, 2022 at 12:35 AM James Robinson <jrobinson@...> wrote:

If one is operating the machine with 16 harts without any sharding or partitioning, I don't see why one would need a watchdog per hart. System watchdogs, or TCO timers in other architectures' parlance, are for system use. Now, a core would normally have its own watchdog for instruction-retirement forward-progress purposes, but that's a completely different use case than the intention of a system-level watchdog.

As for Greg's question about putting that in OS-A SEE or a Platform itself, I'm open to suggestions. However, my initial thinking is that it would be deferred to a Platform. The thinking is that OS-A SEE is about targeting SW expectations for the kernel. Kernels are really good about runtime binding of drivers based on the presence of hardware, so I'm not overly inclined to mandate such things. That said, I'd be open to hearing other opinions.
Re: Watchdog timer per hart?
James Robinson
Hi Greg,
Thanks for your response. I'm not sure if I'm missing something about there being a connection between having a supervisor-level watchdog timer and having a timer per hart, but I wasn't particularly imagining a distinction between machine-mode and supervisor-mode watchdog timers. I'll re-pose the question I was thinking about:

Suppose I have a system containing 16 harts. Should I have a separate WDCSR memory-mapped register and associated counter for each of the 16 harts, with each counter directing an interrupt to its associated hart if it is not reset before the timeout expires? Or should I have one WDCSR memory-mapped register and associated counter for the whole system, with the interrupt directed to one specific hart, and that hart being responsible for responding to a lack of timer update?

Thanks,
James
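For illustration only, the per-hart option James describes might look like an array of small per-hart register frames. The base address, frame layout, and field names here are all invented, not from any spec.

```c
#include <stdint.h>

/* Purely illustrative layout for the per-hart option: one small
 * register frame (a WDCSR plus a countdown value) per hart, arranged
 * as an array.  Base address, frame layout, and names are invented. */
struct wdog_frame {
    uint32_t wdcsr;   /* control/status: enable, pending, etc. */
    uint32_t count;   /* countdown; the hart refreshes by rewriting it */
};

#define WDOG_BASE 0x02000000UL   /* made-up MMIO base address */

/* Address of hart N's frame.  In the single-watchdog alternative there
 * is just one frame at WDOG_BASE, with the timeout interrupt routed to
 * one designated hart. */
static inline unsigned long wdog_frame_addr(unsigned hartid)
{
    return WDOG_BASE + hartid * sizeof(struct wdog_frame);
}
```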
Re: Watchdog timer per hart?
Greg Favor
On Mon, Feb 28, 2022 at 6:18 PM James Robinson <jrobinson@...> wrote:
For now (this year) RVI is focusing on standardizing an initial OS-A SEE (Supervisor Execution Environment) and an OS-A Platform standardizing Supervisor- and User-level functionality, i.e. not Machine-level functionality. While that doesn't rule out incorporating some form of Supervisor-level watchdog standardization into these specs, I think (?) the current thoughts are not focused on doing so.

FYI - Last year there was an initial proposal for standard hardware watchdog functionality, and then later a proposal instead for an SBI API (e.g. a call to tickle the supervisor watchdog, and a callback on a first-stage timeout). But certainly speak up with your own arguments or justifications for having and standardizing supervisor watchdog functionality. (Note: ARM SBSA - for server and high-end embedded class systems - defined and required the equivalent of S-mode (aka Non-Secure) and M-mode (aka Secure) two-stage watchdog functionality.)

Aaron (acting chair of the OS-A SEE TG) and others in the OS-A SEE group, what do you think? Should some form of support for Supervisor software tickling a watchdog through some form of standardized hardware (e.g. memory-mapped registers) or software (e.g. SBI) interface be included in the OS-A SEE spec?

Greg
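To make the SBI-API idea mentioned above concrete, here is a sketch of what a kernel-side "tickle" wrapper could look like. No such SBI extension has been standardized: the extension ID, function ID, and interface shape are all invented, and the ecall is stubbed out so the shape can be shown (and exercised) on any host.

```c
#include <stdint.h>

/* Sketch of the SBI-API idea mentioned above.  The extension ID,
 * function ID, and interface are hypothetical; the ecall is a stub
 * that records the request so the wrapper can be exercised anywhere. */
#define SBI_EXT_WDOG_HYPOTHETICAL 0x0A000001u  /* made-up extension ID */
#define SBI_WDOG_FN_TICKLE        0u           /* made-up function ID  */

struct sbiret { long error; long value; };

/* On real hardware this would be an ECALL into M-mode firmware. */
static uint32_t last_eid, last_fid;
static struct sbiret sbi_ecall_stub(uint32_t eid, uint32_t fid)
{
    last_eid = eid;
    last_fid = fid;
    return (struct sbiret){ .error = 0, .value = 0 };
}

/* What a kernel-side "tickle the supervisor watchdog" helper might
 * look like; returns the SBI error code. */
long sbi_watchdog_tickle(void)
{
    return sbi_ecall_stub(SBI_EXT_WDOG_HYPOTHETICAL, SBI_WDOG_FN_TICKLE).error;
}
```

The first-stage-timeout callback mentioned above would be the inverse direction (firmware into the kernel) and is not sketched here.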
Watchdog timer per hart?
James Robinson
Is it expected that there should be a watchdog timer and timeout signal per hart in the system, or is it okay for there to be one timer in the system and for the timeout signal to be delivered to a specific hart?
Next Platform HSC Meeting on Wed Feb 23rd 2022 9AM PST
Hi All,
The next platform HSC meeting is scheduled on Wed Feb 23rd 2022 at 9AM PST. This meeting is moved to Wed as Monday Feb 21st is a holiday for President's Day in the US. Here are the details:

Agenda and minutes kept on the github wiki: https://github.com/riscv/riscv-platform-specs/wiki

Meeting info:
Zoom meeting: https://zoom.us/j/2786028446
Passcode: 901897
Or iPhone one-tap: US: +16465588656,,2786028466# or +16699006833,,2786028466#
Or Telephone: Dial (for higher quality, dial a number based on your current location): US: +1 646 558 8656 or +1 669 900 6833
Meeting ID: 278 602 8446
International numbers available: https://zoom.us/zoomconference?m=_R0jyyScMETN7-xDLLRkUFxRAP07A-_

Regards,
Kumar
Re: Possible progress on M Platform?
Philipp Tomsich <philipp.tomsich@...>
Chris,

The Platforms effort is being reorganized and we'll spin up a task group for RVM-CSI (a source-level abstraction framework) in the near future. RVM-CSI is very much in focus (for the Software HC's ecosystem efforts), and the goal is to sprint towards a first draft late in the year. The discussions have been going on for a while and led to Alibaba donating their documentation and sources.

The plan is to draw inspiration from Alibaba's donated abstraction layer (as well as what our other members bring to the table), from what exists in competing ecosystems, and from the abstractions standardized in C17 a.k.a. ISO/IEC 9899:2018 (e.g., atomics, mutexes, conditions, threads, memory management) to specify a thin (as in "flexible" and "unintrusive"), best-in-class abstraction with C-language bindings for a self-contained small subset (enough to bring up a platform and handle interrupts) of the platform as phase 1. Follow-on efforts will then add abstractions for additional features of common platforms (e.g. common peripherals, networking, data processing, AI/ML, …) and standardize language bindings for C++ and Rust.

Philipp.
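As a taste of the C17-standardized facilities mentioned above (atomics, in this case), here is a small self-contained example. RVM-CSI has no published API, so nothing here is from that effort; it only illustrates the kind of portable primitive such an abstraction layer can build on.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Example of a C17 <stdatomic.h> primitive of the kind an abstraction
 * layer could standardize around: an interrupt handler publishing a
 * count to a main loop. */
static atomic_uint_fast32_t pending_irqs;   /* zero-initialized */

/* Called from (for example) an interrupt handler. */
void note_irq(void)
{
    atomic_fetch_add_explicit(&pending_irqs, 1, memory_order_release);
}

/* Called from the main loop: take and clear the pending count. */
uint_fast32_t drain_irqs(void)
{
    return atomic_exchange_explicit(&pending_irqs, 0, memory_order_acq_rel);
}
```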
On Mon, 7 Feb 2022 at 21:38, Greg Favor <gfavor@...> wrote:
Re: Possible progress on M Platform?
Greg Favor
Chris, I'm cc'ing the chairs of the Software HC and the Platforms HSC. All platform efforts are being re-organized a bit as we speak (thus far one group was trying to address a number of needs at both the HSC and TG levels but, not surprisingly, was only able to focus on one TG effort). There will be a separate TG created to focus on the "M" platform (under a new name). Philipp and Kumar will be driving setting that up and will be interested to talk with you.

Greg

P.S. This email list also needs to be replaced by a set of lists (for the HSC and for each new SIG/TG), with appropriate new naming of each list.
On Mon, Feb 7, 2022 at 10:26 AM Chris Owen <Chris.Owen@...> wrote: Hi all,
Possible progress on M Platform?
Chris Owen
Hi all,
I lead the CPU software / SDK team at Imagination Technologies; we are entering the RISC-V space but I'm still quite new around here. At present we are most interested in embedded applications, and I am particularly interested in standardising platform aspects in this area - for example, a Hardware Adaptation Layer for bare-metal apps. I realise, though, that the M platform is rather on hold and all the focus is on OS-A.

Is this the only mailing list for the Platforms HSC? Or is there another one which could be used for discussions around the M platform? I realise the title for this list says it's for unix-class, but I didn't see any other...

I was just wondering if we could start progressing the M platform in parallel with OS-A. My team and I are ready and willing to engage in this area. I believe people like SiFive and Alibaba have done lots of work in this area, and it would be great to bring them together and standardise rather than allowing things to fragment further.

Thanks for any help and advice,
Chris Owen
Re: [PATCH] UEFI: Add RISCV_EFI_BOOT_PROTOCOL requirement
Heinrich Schuchardt
On 1/31/22 08:43, Sunil V L wrote:
> RISC-V UEFI systems need to support new RISCV_BOOT_PROTOCOL.

nits: %s/RISCV_BOOT_PROTOCOL/RISCV_EFI_BOOT_PROTOCOL/

> This protocol is required to communicate the boot hart ID

This new protocol is needed because ACPI cannot make use of the current device-tree based approach to transfer the boot hart ID to the next boot stage. The protocol has been implemented in upstream U-Boot (v2022.04-rc1, patch by Sunil).

Acked-by: Heinrich Schuchardt <heinrich.schuchardt@...>
---
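For readers unfamiliar with the protocol, an OS loader consumes it roughly as follows. This sketch uses simplified stand-in types for the EFI ones (EFI_STATUS, UINTN), and the member name follows one reading of the RISCV_EFI_BOOT_PROTOCOL draft - check the published spec before relying on any of this.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the EFI types; the member name follows one
 * reading of the RISCV_EFI_BOOT_PROTOCOL draft and should be verified
 * against the published specification. */
typedef long   efi_status_t;   /* stand-in for EFI_STATUS */
typedef size_t uintn_t;        /* stand-in for UINTN */
#define EFI_SUCCESS 0

typedef struct riscv_efi_boot_protocol {
    uint64_t revision;
    efi_status_t (*get_boot_hartid)(struct riscv_efi_boot_protocol *self,
                                    uintn_t *boot_hartid);
} riscv_efi_boot_protocol_t;

/* Loader-side helper: ask firmware which hart entered the loader. */
efi_status_t query_boot_hart(riscv_efi_boot_protocol_t *proto, uintn_t *out)
{
    return proto->get_boot_hartid(proto, out);
}

/* Demonstration-only firmware-side implementation. */
static efi_status_t demo_get_boot_hartid(struct riscv_efi_boot_protocol *self,
                                         uintn_t *out)
{
    (void)self;
    *out = 3;   /* pretend hart 3 performed the boot */
    return EFI_SUCCESS;
}

static riscv_efi_boot_protocol_t demo_proto = { 1, demo_get_boot_hartid };
```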
Configuration Structure Review
Tim Newsome
Hi all! I just sent this to tech-chairs, but due to the nature of your work Stephano suggested getting feedback here as well.

The Configuration Structure task group has been working on how software can determine the capabilities/configuration of the hardware it is running on. At long last we have something that is ready for wider review. (Few people spend time in tech-config regularly, so just a few people have really read this spec so far. There might still be some glaring holes.)

Please take some time to read and review https://github.com/riscv/configuration-structure/blob/master/riscv-configuration-structure-draft.adoc (3200 words). I'm looking forward to your feedback so we can make the spec better, and hopefully freeze it in a few weeks.

Thank you,
Tim