
Handoff between secure firmware and non-secure firmware via HOB lists

Heinrich Schuchardt
 

Currently, the SBI specification defines how to hand device trees over from the SEE to the S-mode firmware.

In the context of Trusted Firmware-A, a document has been developed that describes what a more generic handover structure could look like. Such a structure would also encompass ACPI tables and additional information such as TPM measurements.

https://developer.arm.com/documentation/den0135/a

As EDK II and U-Boot will probably adopt parsing of this structure, it would make sense to discuss whether the same structure can be used in the RISC-V world too.

Best regards

Heinrich


Next Platform HSC Meeting on Mon Apr 4th 2022 8AM PST

Kumar Sankaran
 

Hi All,
The next platform HSC meeting is scheduled on Mon Apr 4th 2022 at 8AM PST.

Here are the details:

Agenda and minutes are kept on the GitHub wiki:
https://github.com/riscv/riscv-platform-specs/wiki

Here are the slides:
https://docs.google.com/presentation/d/1yRfVWjIqKK0QvjAx-oaFYWjxATegQAqB1zxsOZhPbxM/edit#slide=id.g120b4f4f100_0_0

Meeting info
Zoom meeting: https://zoom.us/j/2786028446
Passcode: 901897

Or iPhone one-tap:
US: +16465588656,,2786028466# or +16699006833,,2786028466#
Or Telephone:
Dial (for higher quality, dial a number based on your current location):
US: +1 646 558 8656 or +1 669 900 6833
Meeting ID: 278 602 8446
International numbers available:
https://zoom.us/zoomconference?m=_R0jyyScMETN7-xDLLRkUFxRAP07A-_

Regards
Kumar


[PATCH] Fix typos in introduction for RISCV_EFI_BOOT_PROTOCOL

Heinrich Schuchardt
 

UEFI speaks of configuration tables, not of firmware tables.

Add missing 'the', 'and'.

Enhance readability of sentence concerning ExitBootServices().

Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@...>
---
boot_protocol.adoc | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/boot_protocol.adoc b/boot_protocol.adoc
index 12846b6..5c56edd 100644
--- a/boot_protocol.adoc
+++ b/boot_protocol.adoc
@@ -1,7 +1,7 @@
 [[boot_protocol]]
 == RISCV_EFI_BOOT_PROTOCOL
 Either Device Tree (DT) or Advanced Configuration and Power Interface (ACPI)
-firmware tables are used to convey the information about hardware to the
+configuration tables are used to convey the information about hardware to the
 Operating Systems. Some of the information are known only at boot time and
 needed very early before the Operating Systems/boot loaders can parse the
 firmware tables.
@@ -9,16 +9,17 @@ firmware tables.
 One example is the boot hartid on RISC-V systems. On non-UEFI systems, this is
 typically passed as an argument to the kernel (in a0). However, UEFI systems need
 to follow UEFI application calling conventions and hence it can not be passed in
-a0. There is an existing solution which uses /chosen node in DT based systems to
-pass this information. However, this solution doesn't work for ACPI based
+a0. There is an existing solution which uses the /chosen node in DT based systems
+to pass this information. However, this solution doesn't work for ACPI based
 systems. Hence, a UEFI protocol is preferred for both DT and ACPI based systems.
 
 This UEFI protocol for RISC-V systems provides early information to the
-bootloaders or Operating Systems. Firmwares like EDK2/u-boot need to implement
-this protocol on RISC-V UEFI systems.
+bootloaders or Operating Systems. Firmwares like EDK2 and u-boot need to
+implement this protocol on RISC-V UEFI systems.
 
-This protocol is typically used by the bootloaders before *ExitBootServices()*
-call and pass the information to the Operating Systems.
+This protocol is typically called by the bootloaders before invoking
+*ExitBootServices()*. They then pass the information to the Operating
+Systems.
 
 The version of RISCV_EFI_BOOT_PROTOCOL specified by this specification is
 0x00010000. All future revisions must be backwards compatible. If a new version
-- 
2.34.1


Re: Public review of RISC-V UEFI Protocol Specification

Anup Patel
 

(Resending for RISC-V ISA-DEV and RISC-V SW-DEV because previous email
was not received on these lists.)

On Wed, Mar 23, 2022 at 9:46 PM Sunil V L <sunilvl@...> wrote:

This is to announce the start of the public review period for
the RISC-V UEFI Protocol specification. This specification is
now considered frozen as per RISC-V International policies.

The review period begins today, Wednesday March 23, and ends on Friday
May 6 (inclusive).

The specification can be found here
https://github.com/riscv-non-isa/riscv-uefi/releases/download/1.0-rc3/RISCV_UEFI_PROTOCOL-spec.pdf

which was generated from the source available in the following GitHub
repo:
https://github.com/riscv-non-isa/riscv-uefi

The specification is also attached in this email.

To respond to the public review, please either reply to this email or
send comments to the platform mailing list[1] or add issues to the
GitHub repo[2]. We welcome all input and appreciate your time and
effort in helping us by reviewing the specification.

During the public review period, corrections, comments, and
suggestions will be gathered for review by the Platform HSC members.
Any minor corrections and/or uncontroversial changes will be
incorporated into the specification. Any remaining issues or proposed changes
will be addressed in the public review summary report. If there are no
issues that require incompatible changes to the public review
specification, the platform HSC will recommend the updated
specifications be approved and ratified by the RISC-V Technical
Steering Committee and the RISC-V Board of Directors.

Thanks to all the contributors for all their hard work.

[1] tech-unixplatformspec@...
[2] https://github.com/riscv-non-isa/riscv-uefi/issues

Regards
Sunil






Next Platform HSC Meeting

Kumar Sankaran
 

Hi All,

Due to lack of a full agenda, I am canceling the next platform HSC meeting on Monday Mar 21st 2022. This way, people can use this time to attend other RISC-V meetings.

 

In terms of the discussion topics, the following is the status of the OS-A SEE TG from Aaron Durbin who is the acting Chair of this group.

    1. Proposed Charter agreement
    2. Charter approval
    3. Call for Chairs
    4. Flesh out spec.

 

I will add the above to the meeting minutes for Mar 21st 2022.

 

Agenda and minutes are kept on the GitHub wiki:

https://github.com/riscv/riscv-platform-specs/wiki

 

Regards

Kumar


OS-A SEE TG Infrastructure

Aaron Durbin
 

Hi All,

I wanted to point out that we have GitHub repositories and a mailing list for OS-A SEE (Supervisor Execution Environment) TG. Please join if you are interested.

GitHub Spec: https://github.com/riscv-non-isa/riscv-os-a-see

Nothing has yet been seeded from an OS-A SEE perspective in those repos. The OS-A SEE TG is still in the Inception phase. As such, we need to nail down a charter. My plan is to submit a preliminary charter to the admin repository as a starting point for us to work on. We will also need to call for chairs for the OS-A SEE TG. Those are the first steps. After that, we can pull pieces from the existing OS-A platform spec (https://github.com/riscv/riscv-platform-specs/blob/main/riscv-platform-spec.adoc) that adhere to the approved charter for the OS-A SEE TG to get the spec rolling. The OS-A Platform will then depend on the OS-A SEE spec (which will transitively depend on the RVA22S64 Profile).

For those that missed the memo, this TG is a part of a broader Platform reorg as detailed here: https://docs.google.com/presentation/d/1gldII0Gziyz2ajwgT8z5vPhw_HBuUmCJWHOiiQ4CAVs/edit#slide=id.g116d4f1a24e_0_678

I look forward to working with all of you. And, please feel free to use this thread to provide any feedback or thoughts on direction for the OS-A SEE TG.

-Aaron


Next Platform HSC Meeting on Mon Mar 7th 2022 8AM PST

Kumar Sankaran
 

Hi All,
The next platform HSC meeting is scheduled on Mon Mar 7th 2022 at 8AM PST.

Here are the details:

Agenda and minutes are kept on the GitHub wiki:
https://github.com/riscv/riscv-platform-specs/wiki

Here are the slides:
https://docs.google.com/presentation/d/1gldII0Gziyz2ajwgT8z5vPhw_HBuUmCJWHOiiQ4CAVs/edit#slide=id.g116d4f1a24e_0_685

Meeting info
Zoom meeting: https://zoom.us/j/2786028446
Passcode: 901897

Or iPhone one-tap:
US: +16465588656,,2786028466# or +16699006833,,2786028466#
Or Telephone:
Dial (for higher quality, dial a number based on your current location):
US: +1 646 558 8656 or +1 669 900 6833
Meeting ID: 278 602 8446
International numbers available:
https://zoom.us/zoomconference?m=_R0jyyScMETN7-xDLLRkUFxRAP07A-_

Regards
Kumar


Re: Watchdog timer per hart?

Allen Baum
 

That's a bit looser a definition than I'd expect, but that explains your comments, certainly. Thx.

On Wed, Mar 2, 2022 at 5:14 PM Greg Favor <gfavor@...> wrote:
On Wed, Mar 2, 2022 at 4:54 PM Allen Baum <allen.baum@...> wrote:
Don't they even define whether restartability is required or not?

Since the suitable response to a first or second stage timeout is rather system-specific, ARM didn't try to ordain exactly where the timeout signals go and what happens as a result.  In SBSA they just described the general expected possibilities (which my previous remarks were based on).  But here's what a 2020 version of BSA says (which is roughly similar to SBSA but a bit narrower in the possibilities it describes):

The basic function of the Generic Watchdog is to count for a fixed period of time, during which it expects to be
refreshed by the system indicating normal operation. If a refresh occurs within the watch period, the period is
refreshed to the start. If the refresh does not occur then the watch period expires, and a signal is raised and a
second watch period is begun.

The initial signal is typically wired to an interrupt and alerts the system. The system can attempt to take
corrective action that includes refreshing the watchdog within the second watch period. If the refresh is
successful, the system returns to the previous normal operation. If it fails, then the second watch period
expires and a second signal is generated. The signal is fed to a higher agent as an interrupt or reset for it to
take executive action.

Greg
 

On Wed, Mar 2, 2022 at 4:00 PM Greg Favor <gfavor@...> wrote:
Even ARM SBSA allowed a lot of flexibility as to where the first-stage and second-stage timeout "signals" went (which ultimately then placed the handling in the hands of software somewhere).  In other words, SBSA didn't prescribe the details of the overall watchdog handling picture.

Greg

On Wed, Mar 2, 2022 at 2:35 PM Allen Baum <allen.baum@...> wrote:
Now we're starting to drill down appropriately. There is a wide range.
This is me thinking out loud and trying desperately to avoid the real work I should be doing:

 - A watchdog timer event can cause an interrupt (as opposed to a HW reset)
   -- maskable or non-maskable?
   -- Using xTVEC to vector, or a platform-defined vector (e.g. the reset vector)?
   -- A new cause type, or reuse an existing one (e.g. using the reset cause)?
   -- restartable, non-restartable, or both? (both implies, to me at least, the 2-stage watchdog concept, "pulling the emergency cord")
      If the watchdog timer is restartable, it must either
        --- be maskable, or
        --- implement something like the restartable-NMI spec to be able to save state.
   -- what does "pulling the emergency cord" do? e.g.
        --- some kind of HW reset (we had a light reset at Intel that cleared as little as possible so that a post-mortem dump could identify what was going on)
        --- just vector to a SW handler (obviously this should depend on why the watchdog timer was activated, e.g. waiting for a HW event or SW event)


On Wed, Mar 2, 2022 at 12:41 PM Kumar Sankaran <ksankaran@...> wrote:
From a platform standpoint, the intent was to have a single platform
level watchdog that is shared across the entire platform. This
platform watchdog could be the 2-level watchdog as described below by
Greg. Whether S-mode software or M-mode software would handle the
tickling of this watchdog and handle timeouts is a subject for further
discussion.

On Wed, Mar 2, 2022 at 12:34 PM Greg Favor <gfavor@...> wrote:
>
> On Wed, Mar 2, 2022 at 12:23 PM Aaron Durbin <adurbin@...> wrote:
>>
>> Yes. Greg articulated what I was getting at better than I did. I apologize for muddying the waters. From a platform standpoint one system-level watchdog should suffice as it's typically the last resort of restarting a system prior to sending a tech out.
>
>
> One comment - for when any concrete discussion about having a system-level watchdog occurs:
>
> One can have a one-stage or a two-stage watchdog.  The former yanks the emergency cord on the system upon timeout.
>
> The latter (which is what ARM defined in SBSA and the subsequent BSA) interrupts the OS on the first timeout and gives it a chance to take remedial actions (and refresh the watchdog).  Then, if a second timeout occurs (without a refresh after the first timeout), the emergency cord is yanked.
>
> ARM also defined separate Secure and Non-Secure watchdogs (akin to what one might call S-mode and M-mode watchdogs).  The OS has its own watchdog to tickle and an emergency situation results in reboot of the OS (for example).  And the Secure Monitor has its own watchdog and an emergency situation results in reboot of the system (for example).
>
> Greg
>
>



--
Regards
Kumar






Re: Watchdog timer per hart?

Greg Favor
 

On Wed, Mar 2, 2022 at 4:54 PM Allen Baum <allen.baum@...> wrote:
Don't they even define whether restartability is required or not?

Since the suitable response to a first or second stage timeout is rather system-specific, ARM didn't try to ordain exactly where the timeout signals go and what happens as a result.  In SBSA they just described the general expected possibilities (which my previous remarks were based on).  But here's what a 2020 version of BSA says (which is roughly similar to SBSA but a bit narrower in the possibilities it describes):

The basic function of the Generic Watchdog is to count for a fixed period of time, during which it expects to be
refreshed by the system indicating normal operation. If a refresh occurs within the watch period, the period is
refreshed to the start. If the refresh does not occur then the watch period expires, and a signal is raised and a
second watch period is begun.

The initial signal is typically wired to an interrupt and alerts the system. The system can attempt to take
corrective action that includes refreshing the watchdog within the second watch period. If the refresh is
successful, the system returns to the previous normal operation. If it fails, then the second watch period
expires and a second signal is generated. The signal is fed to a higher agent as an interrupt or reset for it to
take executive action.

Greg
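
The two-period behaviour described in the BSA excerpt above can be sketched as a small state machine. This is only an illustrative model, not ARM's register interface: the class name TwoStageWatchdog, its fields, the tick-based timing, and the choice to stop counting after the second signal until the next refresh are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TwoStageWatchdog:
    """Toy model of a two-stage watchdog. All names are hypothetical."""

    period: int            # length of one watch period, in ticks
    elapsed: int = 0       # ticks since the current watch period began
    stage: int = 0         # 0 = first period, 1 = second period, 2 = expired
    signals: list = field(default_factory=list)

    def refresh(self) -> None:
        """Software reports normal operation: restart the first period."""
        self.elapsed = 0
        self.stage = 0

    def tick(self) -> None:
        """Advance time by one tick; raise a signal when a period expires."""
        if self.stage == 2:
            return  # assumption: nothing more happens until a refresh
        self.elapsed += 1
        if self.elapsed < self.period:
            return
        self.elapsed = 0
        if self.stage == 0:
            # First timeout: typically wired to an interrupt.
            self.signals.append("first")
            self.stage = 1
        else:
            # Second timeout: interrupt or reset delivered to a higher agent.
            self.signals.append("second")
            self.stage = 2


wd = TwoStageWatchdog(period=3)
for _ in range(3):
    wd.tick()          # no refresh: the first watch period expires
wd.refresh()           # corrective action succeeds within the second period
for _ in range(6):
    wd.tick()          # no refresh at all: both periods expire
print(wd.signals)      # ['first', 'first', 'second']
```

Where the two signals are routed (S-mode, M-mode, or an external agent) is deliberately left out of the model, since that is the system-specific part discussed in this thread.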
 







Re: Watchdog timer per hart?

Allen Baum
 

Don't they even define whether restartability is required or not?







Re: Watchdog timer per hart?

Greg Favor
 

Even ARM SBSA allowed a lot of flexibility as to where the first-stage and second-stage timeout "signals" went (which ultimately then placed the handling in the hands of software somewhere).  In other words, SBSA didn't prescribe the details of the overall watchdog handling picture.

Greg








Re: Watchdog timer per hart?

Allen Baum
 

Now we're starting to drill down appropriately. There is a wide range.
This is me thinking out loud and trying desperately to avoid the real work I should be doing:

 - A watchdog timer event can cause an interrupt (as opposed to a HW reset)
   -- maskable or non-maskable?
   -- Using xTVEC to vector, or a platform-defined vector (e.g. the reset vector)?
   -- A new cause type, or reuse an existing one (e.g. using the reset cause)?
   -- restartable, non-restartable, or both? (both implies, to me at least, the 2-stage watchdog concept, "pulling the emergency cord")
      If the watchdog timer is restartable, it must either
        --- be maskable, or
        --- implement something like the restartable-NMI spec to be able to save state.
   -- what does "pulling the emergency cord" do? e.g.
        --- some kind of HW reset (we had a light reset at Intel that cleared as little as possible so that a post-mortem dump could identify what was going on)
        --- just vector to a SW handler (obviously this should depend on why the watchdog timer was activated, e.g. waiting for a HW event or SW event)








Re: Watchdog timer per hart?

Kumar Sankaran
 

From a platform standpoint, the intent was to have a single platform
level watchdog that is shared across the entire platform. This
platform watchdog could be the 2-level watchdog as described below by
Greg. Whether S-mode software or M-mode software would handle the
tickling of this watchdog and handle timeouts is a subject for further
discussion.

On Wed, Mar 2, 2022 at 12:34 PM Greg Favor <gfavor@...> wrote:

On Wed, Mar 2, 2022 at 12:23 PM Aaron Durbin <adurbin@...> wrote:

Yes. Greg articulated what I was getting at better than I did. I apologize for muddying the waters. From a platform standpoint one system-level watchdog should suffice as it's typically the last resort of restarting a system prior to sending a tech out.

One comment - for when any concrete discussion about having a system-level watchdog occurs:

One can have a one-stage or a two-stage watchdog. The former yanks the emergency cord on the system upon timeout.

The latter (which is what ARM defined in SBSA and the subsequent BSA) interrupts the OS on the first timeout and gives it a chance to take remedial actions (and refresh the watchdog). Then, if a second timeout occurs (without a refresh after the first timeout), the emergency cord is yanked.

ARM also defined separate Secure and Non-Secure watchdogs (akin to what one might call S-mode and M-mode watchdogs). The OS has its own watchdog to tickle and an emergency situation results in reboot of the OS (for example). And the Secure Monitor has its own watchdog and an emergency situation results in reboot of the system (for example).

Greg

--
Regards
Kumar


Re: Watchdog timer per hart?

Greg Favor
 

On Wed, Mar 2, 2022 at 12:23 PM Aaron Durbin <adurbin@...> wrote:
Yes. Greg articulated what I was getting at better than I did. I apologize for muddying the waters. From a platform standpoint one system-level watchdog should suffice as it's typically the last resort of restarting a system prior to sending a tech out. 

One comment - for when any concrete discussion about having a system-level watchdog occurs:

One can have a one-stage or a two-stage watchdog.  The former yanks the emergency cord on the system upon timeout.  

The latter (which is what ARM defined in SBSA and the subsequent BSA) interrupts the OS on the first timeout and gives it a chance to take remedial actions (and refresh the watchdog).  Then, if a second timeout occurs (without a refresh after the first timeout), the emergency cord is yanked.

ARM also defined separate Secure and Non-Secure watchdogs (akin to what one might call S-mode and M-mode watchdogs).  The OS has its own watchdog to tickle and an emergency situation results in reboot of the OS (for example).  And the Secure Monitor has its own watchdog and an emergency situation results in reboot of the system (for example).

Greg


Re: Watchdog timer per hart?

Aaron Durbin
 

On Wed, Mar 2, 2022 at 1:19 PM Greg Favor <gfavor@...> wrote:
A core-level watchdog can mean quite different things to different people and their core designs.  In some cases this "watchdog" would be a micro-architectural thing that, for example, recognizes that the core is not making forward progress and would temporarily invoke some low-performance uarch mechanism that guarantees forward progress (out of the circumstances currently causing livelock).  Although the details of that very much depend on what types of livelock causes one is concerned about.  In other cases this "watchdog" might generate a local interrupt to take the core into a "lack of forward progress" software handler; or a global interrupt to inform someone else that this core is livelocked.

In general, there's an enormous range of possibilities as to what a core-level watchdog means.  And an enormous range as to what one is trying to accomplish or defend against.

Yes. Greg articulated what I was getting at better than I did. I apologize for muddying the waters. From a platform standpoint one system-level watchdog should suffice as it's typically the last resort of restarting a system prior to sending a tech out. 
 

Greg


On Wed, Mar 2, 2022 at 12:09 PM James Robinson <jrobinson@...> wrote:
Hi Aaron,

Thanks for the response. Would you be able to give any more details on how a core-level watchdog would differ from a platform-level one?

James


Re: Watchdog timer per hart?

Greg Favor
 

A core-level watchdog can mean quite different things to different people and their core designs.  In some cases this "watchdog" would be a micro-architectural thing that, for example, recognizes that the core is not making forward progress and would temporarily invoke some low-performance uarch mechanism that guarantees forward progress (out of the circumstances currently causing livelock).  Although the details of that very much depend on what types of livelock causes one is concerned about.  In other cases this "watchdog" might generate a local interrupt to take the core into a "lack of forward progress" software handler; or a global interrupt to inform someone else that this core is livelocked.

In general, there's an enormous range of possibilities as to what a core-level watchdog means.  And an enormous range as to what one is trying to accomplish or defend against.

Greg


On Wed, Mar 2, 2022 at 12:09 PM James Robinson <jrobinson@...> wrote:
Hi Aaron,

Thanks for the response. Would you be able to give any more details on how a core-level watchdog would differ from a platform-level one?

James


Re: Watchdog timer per hart?

James Robinson
 

Hi Aaron,

Thanks for the response. Would you be able to give any more details on how a core-level watchdog would differ from a platform-level one?

James


Re: Watchdog timer per hart?

Aaron Durbin
 



On Wed, Mar 2, 2022 at 12:35 AM James Robinson <jrobinson@...> wrote:
Hi Greg,

Thanks for your response. I'm not sure if I'm missing something about there being a connection between having a supervisor-level watchdog timer and having a timer per hart, but I wasn't particularly imagining a distinction between machine-mode and supervisor-mode watchdog timers. I'll re-pose the question I was thinking about:

Suppose I have a system containing 16 harts. Should I have a separate WDCSR memory-mapped register and associated counter for each of the 16 harts, with each counter directing an interrupt to its associated hart if it is not reset before the timeout expires? Or should I have one WDCSR memory-mapped register and associated counter for the whole system, with the interrupt directed to one specific hart, and that hart being responsible for responding to a lack of timer update?

If one is operating the machine with 16 harts without any sharding or partitioning, I don't see why one would need a watchdog per hart. System watchdogs, or TCO timers in other architectures' parlance, are for system use. Now a core would normally have its own watchdog for instruction-retirement forward-progress purposes, but that's a completely different use case from the intention of a system-level watchdog.

As for Greg's question about putting that in OS-A SEE or a Platform itself, I'm open to suggestions. However, my initial thinking is that it would be deferred to a Platform. The thinking is that OS-A SEE is about targeting SW expectations for the kernel. Kernels are really good about runtime binding of drivers based on the presence of hardware so I'm not overly inclined to mandate such things. That said, I'd be open to hear other opinions.


Thanks,
James


Re: Watchdog timer per hart?

James Robinson
 

Hi Greg,

Thanks for your response. I'm not sure if I'm missing something about there being a connection between having a supervisor-level watchdog timer and having a timer per hart, but I wasn't particularly imagining a distinction between machine-mode and supervisor-mode watchdog timers. I'll re-pose the question I was thinking about:

Suppose I have a system containing 16 harts. Should I have a separate WDCSR memory-mapped register and associated counter for each of the 16 harts, with each counter directing an interrupt to its associated hart if it is not reset before the timeout expires? Or should I have one WDCSR memory-mapped register and associated counter for the whole system, with the interrupt directed to one specific hart, and that hart being responsible for responding to a lack of timer update?
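
The two options can be put side by side in a rough Python sketch. The thread does not describe the WDCSR register layout, so this models a watchdog only as an index and the hart its interrupt is routed to; the function names are hypothetical.

```python
# Hypothetical model: a routing table maps watchdog index -> target hart.
# It does not represent any actual WDCSR fields.

def per_hart_watchdogs(num_harts):
    """Option 1: one watchdog per hart, each interrupting its own hart."""
    return {wd: wd for wd in range(num_harts)}


def system_watchdog(target_hart=0):
    """Option 2: one watchdog for the system, interrupting one chosen hart."""
    return {0: target_hart}


def interrupted_harts(routing, expired):
    """Return the harts that take an interrupt for the expired watchdogs."""
    return sorted({routing[wd] for wd in expired})
```

With 16 harts, option 1 gives 16 counters, each of which can interrupt only its own hart. Option 2 gives one counter whose expiry always lands on the designated hart.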

Thanks,
James
