Re: OS-A platform stoptime requirement

Ved Shanbhogue
 

So there is an assumption here that somehow time, rather than the clock, is broadcast. For an implementation that broadcasts the clock, this requirement means keeping a shadow time that keeps counting while the software-visible time is frozen. All that complexity may be totally justified, but it is not obvious why. Besides, if time does not stop for a shared implementation of mtime, then this does not seem to be something fundamentally required for debug.

Regards
Ved
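A minimal Python sketch of the clock-broadcast case described above (a behavioral illustration only; `ClockBroadcastHart`, `tick`, `enter_debug` and `exit_debug` are hypothetical names, not from any implementation). The hart receives only the timer clock, so honoring stoptime=1 takes a second, shadow counter that keeps counting while the software-visible time value is frozen; the visible value is reloaded from the shadow on Debug Mode exit.

```python
class ClockBroadcastHart:
    """Hypothetical hart that receives only the timer clock, not the mtime value."""

    def __init__(self, stoptime: bool = True) -> None:
        self.stoptime = stoptime   # models dcsr.stoptime
        self.in_debug = False
        self.shadow = 0            # extra counter that never stops
        self.visible_time = 0      # value software sees when it reads the time CSR

    def tick(self) -> None:
        """One edge of the broadcast timer clock."""
        self.shadow += 1
        if not (self.in_debug and self.stoptime):
            self.visible_time += 1

    def enter_debug(self) -> None:
        self.in_debug = True

    def exit_debug(self) -> None:
        self.in_debug = False
        self.visible_time = self.shadow  # catch up with wall-clock time


hart = ClockBroadcastHart(stoptime=True)
for _ in range(5):
    hart.tick()
hart.enter_debug()
for _ in range(3):
    hart.tick()
frozen = hart.visible_time   # still 5: frozen while halted
hart.exit_debug()
print(frozen, hart.visible_time)  # 5 8
```

Without the shadow counter, a clock-broadcast hart would have no record of how many ticks it missed while halted, which is the extra state the stoptime=1 requirement forces on this kind of design.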


On Tue, Dec 21, 2021 at 4:45 AM Greg Favor <gfavor@...> wrote:
On Tue, Dec 21, 2021 at 12:22 AM Allen Baum <allen.baum@...> wrote:
What you describe sounds very implementation dependent; I had always imagined that mtime would not be broadcast, but an mtime count enable bit would be, to keep the local copy synched.
That has its own issues of course (synching at reset, whenever mtime is written, and whenever stoptime is released) - though they're all the same mechanism, and can reuse whatever is used for reading mtime.

And also re-sync'ing when coming out of deeper power management sleep states.

The mechanism for software reading mtime is memory-mapped register reads; ditto for trap-and-emulate of hardware reads of the time CSR; and "hardware broadcast" of mtime to time otherwise.  Obviously only the latter time CSR implementation has to deal with resync issues.  While some systems may be able to avoid having any and all reasons for needing occasional time resync, many systems for one or more reasons will need occasional time resync.

I've seen many people (including ARM time distribution IP) do a hybrid between just sending an "increment" pulse and broadcasting a full 64-bit value - that supports periodic full resync while using just a small number of wires to also communicate the increments.  (One can potentially squeeze this down to two wires, although designs I've seen don't go that far.)

Greg


Re: OS-A platform stoptime requirement

Greg Favor
 

On Tue, Dec 21, 2021 at 12:22 AM Allen Baum <allen.baum@...> wrote:
What you describe sounds very implementation dependent; I had always imagined that mtime would not be broadcast, but an mtime count enable bit would be, to keep the local copy synched.
That has its own issues of course (synching at reset, whenever mtime is written, and whenever stoptime is released) - though they're all the same mechanism, and can reuse whatever is used for reading mtime.

And also re-sync'ing when coming out of deeper power management sleep states.

The mechanism for software reading mtime is memory-mapped register reads; ditto for trap-and-emulate of hardware reads of the time CSR; and "hardware broadcast" of mtime to time otherwise.  Obviously only the latter time CSR implementation has to deal with resync issues.  While some systems may be able to avoid having any and all reasons for needing occasional time resync, many systems for one or more reasons will need occasional time resync.

I've seen many people (including ARM time distribution IP) do a hybrid between just sending an "increment" pulse and broadcasting a full 64-bit value - that supports periodic full resync while using just a small number of wires to also communicate the increments.  (One can potentially squeeze this down to two wires, although designs I've seen don't go that far.)

Greg
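A rough Python sketch of the hybrid scheme described above, with assumed names (`TimeReceiver`, `on_increment`, `on_full_sync`) and an arbitrary resync period; it is not a model of ARM's time distribution IP or of any particular wire encoding. The per-hart copy advances on each increment pulse, misses pulses while the hart is in a deep sleep state, and is corrected by the next full 64-bit broadcast.

```python
MASK64 = (1 << 64) - 1  # mtime and time are 64-bit counters


class TimeReceiver:
    """Hypothetical per-hart copy of mtime fed by increment pulses plus full syncs."""

    def __init__(self) -> None:
        self.time = 0
        self.powered = True

    def on_increment(self) -> None:
        # Narrow "increment" path: cheap, but a missed pulse is a lost tick.
        if self.powered:
            self.time = (self.time + 1) & MASK64

    def on_full_sync(self, mtime: int) -> None:
        # Occasional full-value broadcast: overwrites any accumulated error.
        if self.powered:
            self.time = mtime & MASK64


SYNC_PERIOD = 8  # assumed: full resync every 8 ticks
mtime = 0
rx = TimeReceiver()
lags = []
for cycle in range(1, 21):
    mtime += 1
    rx.powered = not (5 <= cycle < 10)  # hart asleep for cycles 5..9
    rx.on_increment()
    if cycle % SYNC_PERIOD == 0:
        rx.on_full_sync(mtime)
    lags.append(mtime - rx.time)
print(lags)
```

In this run the local copy is 5 ticks behind mtime after the hart wakes, stays behind through cycle 15 (the sync on cycle 8 is missed because the hart is asleep), and is exact again from the full sync on cycle 16 onward.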


Re: OS-A platform stoptime requirement

Allen Baum
 

What you describe sounds very implementation dependent; I had always imagined that mtime would not be broadcast, but an mtime count enable bit would be, to keep the local copy synched.
That has its own issues of course (synching at reset, whenever mtime is written, and whenever stoptime is released) - though they're all the same mechanism, and can reuse whatever is used for reading mtime.

On Mon, Dec 20, 2021 at 6:35 PM Greg Favor <gfavor@...> wrote:
I'm cc'ing Paul Donahue (vice-chair of the Debug TG).  He was involved in distilling, out of the enormous amount of optionality in the Debug spec, what would be suitable to require in OS-A platforms.  So he can comment on this debug-related OS-A platform requirement, and in particular the stoptime requirement (Paul, see the email thread included down below):
       dcsr.stopcount and dcsr.stoptime must be supported and the reset value of each must be 1

Btw, I don't see resync of time with mtime as more than a relatively trivial exercise on debug mode exit.  Outside of debug mode mtime is being broadcast to all harts and each hart's time CSR updates with the latest time value that it receives.  In debug mode, if stoptime=1, then the time flops are simply inhibited from updating with any new received mtime values.  Then when debug mode is exited and the inhibit goes away, the time flops naturally go back to getting updated with the latest and/or new received mtime values.

Greg

On Mon, Dec 20, 2021 at 3:19 PM Beeman Strong <beeman@...> wrote:
Thanks, I definitely misunderstood the intent.  So the expectation is that, in Debug Mode, reads to mtime will see time continue to progress, but reads to the time CSR will see a frozen value.  Reads of the time CSR by software running outside debug mode should not be impacted, and will see a value synchronized with mtime.

I suppose I can imagine usages where keeping the time CSR frozen has value to a debugger, but it does add complexity and latency in requiring a resync with mtime on debug mode exit.  Does the value really rise to the level of being a platform requirement?  Is there some important debug functionality that breaks if we keep it simple and let the time CSR keep running in debug mode?

On Mon, Dec 20, 2021 at 2:05 PM Andrew Waterman <andrew@...> wrote:


On Mon, Dec 20, 2021 at 3:42 PM Greg Favor <gfavor@...> wrote:
I think there's a little bit of confusion going on.  The 'stoptime' bit is defined as "Don’t increment any hart-local timers while in Debug Mode."  I take this to clearly not be referring to MTIME, but to the local time CSR.

I fully agree that expecting a debug action on a core to reach out to wherever MTIME may be in a system is inappropriate.  It would also affect other still-active harts, which is probably very inappropriate (i.e. debugging just one hart shouldn't inherently affect the operation of all harts).

Oops, it has been a while since I've read this spec.  I withdraw my comment, if it's indeed the case that shared implementations of mtime need not be affected by stoptime.


By contrast, stopping the local time CSR for the duration of Debug mode would be easy to implement, i.e. in_debug_mode inhibits the time CSR from advancing.  Presumably, once the hart exits Debug mode, the time CSR effectively catches back up immediately with the current time value that has been broadcast to it from MTIME.

Greg


On Mon, Dec 20, 2021 at 1:19 PM Andrew Waterman <andrew@...> wrote:


On Mon, Dec 20, 2021 at 12:11 PM Beeman Strong <beeman@...> wrote:
Hi there,

In the OS-A platform spec I see the following requirement:

• dcsr.stopcount and dcsr.stoptime must be supported and the reset value of each must be 1
◦ Rationale: The architecture has strict requirements on minstret which may be perturbed by an external debugger in a way that’s visible to software. The default should allow code that’s sensitive to these requirements to be debugged.

The rationale justifies the requirement for stopcount=1, but I don't see any rationale for stoptime=1.

The debug spec refers to stoptime=1 stopping "timers", which I interpret to mean the mtime counter.  This timer is expected to be synchronized across harts in a system ("The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock.")  In a system with multiple harts, where a subset of harts may be halted at a given time, this stoptime=1 requirement risks violating this ISA requirement and confusing software by causing wall-clock time to get out of sync.

Can we remove "and dcsr.stoptime" from this platform requirement?

FWIW, although I appreciate the motivation behind this requirement, I also support removing it.  For the case that mtime is centrally implemented, this requirement is quite onerous to implement.  For the case that mtime is decentralized, this requirement is easy to satisfy, but is differently problematic, as the spec mentions ("risks violating this ISA requirement").  I dislike disadvantaging the centralized-mtime implementations for a feature we've already admitted is problematic at the ISA level.
 

thanks,
beeman
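At the register level, the requirement quoted above concerns two bits of dcsr. A minimal Python sketch, assuming the dcsr layout of the RISC-V External Debug Support specification 0.13.2 (stoptime at bit 9, stopcount at bit 10); the helper names and the sample reset values are hypothetical.

```python
DCSR_STOPTIME = 1 << 9    # don't increment hart-local timers in Debug Mode
DCSR_STOPCOUNT = 1 << 10  # don't increment hart-local counters in Debug Mode


def stop_bits(dcsr: int) -> tuple[bool, bool]:
    """Decode (stopcount, stoptime) from a dcsr value."""
    return bool(dcsr & DCSR_STOPCOUNT), bool(dcsr & DCSR_STOPTIME)


def meets_platform_reset_rule(dcsr_reset: int, require_stoptime: bool = True) -> bool:
    """Check the OS-A rule as drafted, or with "and dcsr.stoptime" removed."""
    stopcount, stoptime = stop_bits(dcsr_reset)
    return stopcount and (stoptime or not require_stoptime)


# Hypothetical reset values: xdebugver=4 in bits 31:28, prv=3 in bits 1:0.
both_set = 0x4000_0603        # stopcount=1, stoptime=1
stopcount_only = 0x4000_0403  # stopcount=1, stoptime=0
print(meets_platform_reset_rule(both_set))                                # True
print(meets_platform_reset_rule(stopcount_only))                          # False
print(meets_platform_reset_rule(stopcount_only, require_stoptime=False))  # True
```

The `require_stoptime=False` case corresponds to the proposal in this thread: a hart whose stoptime resets to 0 would still satisfy the stopcount half of the rule.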


Re: [PATCH 1/1] Platform Spec Content Reorganization into separate sections

atishp@...
 

On Wed, Dec 15, 2021 at 11:21 PM Kumar Sankaran
<ksankaran@...> wrote:

As per the discussion and agreement during the Platform HSC meeting,
this patch splits the content of the platform spec into four sections:
an OS-A Common Requirements section, an OS-A Embedded Platform section,
an OS-A Server Platform section and an M Platform section.
This patch keeps all the content in the same single file for easier
readability. In the near future, the next patchset will split the
individual sections into separate .adoc files, one for the common
requirements and one .adoc for each specific platform.

Below are the changes:
- Added OS-A Common Requirements section for all the common requirements
- Added OS-A Embedded and OS-A Server platforms
- Cleaned up some text in the Introduction section while still keeping
  the bulk of the content as is.
- Added the Timer 100ns resolution change from Greg.
- Kept the M-Platform as is.
- Added licensing and version log to the platform adoc based on Jeff’s feedback
- Updated changelog to make it current
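The "Timer 100ns resolution change" in the diff replaces the old wording (10 ns resolution, 10 MHz to 100 MHz update clock) with a resolution of 100 ns or less, i.e. a tick frequency of at least 10 MHz. A small Python sketch of that arithmetic, including the removed implementation note's example of a 25 MHz clock adding 4 per tick at 10 ns resolution; the function names are hypothetical.

```python
from fractions import Fraction

NS_PER_SECOND = 10**9


def tick_frequency_hz(resolution_ns: int) -> Fraction:
    """Frequency at which MTIME must count by 1 to achieve a given resolution."""
    return Fraction(NS_PER_SECOND, resolution_ns)


def increment_per_clock(clock_hz: int, resolution_ns: int) -> Fraction:
    """MTIME counts added per hardware clock edge for a given counter resolution."""
    clock_period_ns = Fraction(NS_PER_SECOND, clock_hz)
    return clock_period_ns / resolution_ns


def meets_new_timer_rule(resolution_ns: int) -> bool:
    """New text: ACLINT MTIME resolution of 100 ns or less."""
    return resolution_ns <= 100


print(tick_frequency_hz(100))                # 10000000 (10 MHz)
print(increment_per_clock(25_000_000, 10))   # 4, as in the removed note
print(meets_new_timer_rule(10), meets_new_timer_rule(200))  # True False
```

Exact fractions are used so that clocks whose period is not an integer multiple of the resolution show up as non-integer increments rather than being silently rounded.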

diff --git a/changelog.adoc b/changelog.adoc
index 6181115..6c48bda 100644
--- a/changelog.adoc
+++ b/changelog.adoc
@@ -7,20 +7,13 @@
[preface]
## Change Log

+### version 0.3-draft
+* 2021-12-13:
+** Restructure document into OS-A Common, OS-A Embedded and OS-A Server
+
### version 0.2-draft
* 2021-09-01:
** Draft version for internal reviews
-* 2021-05-20:
-** Platform requirements for Debug
-* 2021-05-19:
-** Base boot and runtime requirements - Initial commit
-* 2021-04-08:
-** Initial commit of server firmware requirements
-* 2021-03-25:
-** Initial commit of Embedded-2022 specification
-* 2021-03-16:
-** Added 2022 platforms
-** Added individual sections and sub-sections for the content

### version 0.1-draft
* 2020-10-07:
diff --git a/riscv-platform-spec.adoc b/riscv-platform-spec.adoc
index 6321683..858bd0b 100644
--- a/riscv-platform-spec.adoc
+++ b/riscv-platform-spec.adoc
@@ -20,6 +20,12 @@
// table of contents
toc::[]

+// document copyright and licensing information
+include::licensing.adoc[]
+
+// changelog for the document
+include::changelog.adoc[]
+
[preface]
== Terminology
[cols="1,4", width=80%, align="left", options="header"]
@@ -78,20 +84,26 @@ specification has to be self certified by the platform compatibility test
suite (PCT). More details about the PCT are available in the platform policy
specification.

-Platforms are augmented with extensions for industry specific target
-market verticals like “server”, “mobile”, “edge computing”, “machine-learning”
-and “automotive”.
-
-The platform specification currently defines two platforms:
-
-* *OS-A Platform*: This specifies a rich-OS platform for
-Linux/FreeBSD/Windows - flavors that run on enterprise and embedded class
-application processors. The OS-A platform has a base feature set and extensions
-as shown below: +
-** *Base*
-** *Server Extension*
-
-* *M Platform*: This specifies an RTOS platform for bare-metal applications and
+The platform specification currently defines two platforms as shown below.
+Additional platforms are expected to be defined in the future for
+industry-specific target market verticals like “mobile”, “edge computing”,
+“machine-learning”, “desktop”, “automotive” and more.
+
Not sure why these are not rendered correctly in gmail. The deleted text seems
to be garbage as well which indicates a rendering issue in gmail.
Please make sure that
it is correctly displayed at your end.

+* *OS-A Platform*: The OS-A platform specifies a category of rich-OS platforms
+that support operating systems like Linux, FreeBSD, Windows and more;
+flavors that run on enterprise and embedded class application processors.
+Each OS-A platform that is defined below is independent in its representation
+and is not dependent on any other platform for its features or specifications.
+Requirements common across multiple platforms are bundled together in the OS-A
+Common Requirements section in order to prevent duplication of content. The
+specific platform can include all or some of the requirements in the common
+section and add or modify these as per the specific requirements.
+The OS-A platforms that are currently defined are the following: +
+** *OS-A Embedded Platform*
+** *OS-A Server Platform*
+
+* *M Platform*: The M platform specifies an RTOS platform for bare-metal
+applications and
small operating systems running on a microcontroller. The M platform has a base
feature set and extensions as shown below: +
** *Base*
@@ -102,13 +114,11 @@ functionality available in S, U, VS and VU modes, and the standardization of
the SBI (Supervisory Binary Interface as defined in <<spec_sbi>>) between
Supervisor level (S-mode/VS-mode) and M-mode/HS-mode respectively.

-// OS-A Platform
-== OS-A Platform
+// OS-A Platform Common requirements
+== OS-A Common Requirements

-// Base feature set for OS-A Platform
-=== Base
-==== ISA Requirements
-===== General
+=== ISA Requirements
+==== General

* This OS-A platform must comply with the RVA22U and RVA22S ISA profiles as
defined in the RISC-V ISA Profiles specification [11].
@@ -121,10 +131,8 @@ if the standard extension is not required.
* All hart PMA regions for main memory must be marked as coherent.
* Memory accesses by I/O masters can be coherent or non-coherent with respect
to all hart-related caches.
-[sidebar]
----

-===== Supervisor mode
+==== Supervisor mode
* sstatus
** sstatus.UBE must support the same access attribute (read-only or writable)
as mstatus.UBE.
@@ -145,7 +153,7 @@ non-zero and zero values as architecturally defined.
** For RV32, Bare and Sv32 translation modes must be supported.
** For RV64, Bare and Sv39 translation modes must be supported.

-===== Hypervisor extension
+==== Hypervisor extension
* hstatus
** VTW bit must not be hardwired to 0.
** VTVM bit must not be hardwired to 0.
@@ -178,23 +186,8 @@ non-zero and zero values as architecturally defined.
** For RV32, Bare and Sv32 translation modes must be supported.
** For RV64, Bare and Sv39 translation modes must be supported.

-==== PMU
-
-The RVA22 profile defines 32 PMU counters out-of-which first three counters are
-defined by the privilege specification while other 29 counters are programmable.
-The SBI PMU extension defines a set of hardware events that can be monitored
-using these programmable counters. This section defines the minimum number of
-programmable counters and hardware events required for an OS-A compatible
-platform.
-
-* Counters
-** The platform does not require to implement any of the programmable counters.
-* Events
-** The platform does not require to implement any of the hardware events defined
-in SBI PMU extensions.
-
-==== Debug
-The OS-A base platform requirements are the following:
+=== Debug
+The OS-A platform common requirements are the following:

- Implement resethaltreq
* Rationale: Debugging immediately out of reset is a useful debug tool.
@@ -275,26 +268,12 @@ each must be 1
The default should allow code that's sensitive to these requirements to be
debugged.

-==== Interrupts and Timer
-
-===== Timer support
-
+=== Timers
* One or more ACLINT MTIMER devices are required for the OS-A platform.
-* Platform must support a default ACLINT MTIME counter resolution of 10ns
- (i.e. an increment by 1 represents 10 ns).
-* The ACLINT MTIME update frequency (i.e. hardware clock) must be between
- 10 MHz and 100 MHz, and updates must be strictly monotonic.
-
-[sidebar]
---
-[underline]*_Implementation Note:_*
-For example, if the MTIME counter update frequency (i.e. hardware clock) is
-25 MHz then the MTIME counter would increment by 4 upon every hardware clock
-tick for MTIME counter resolution of 10ns.
---
-
-===== Interrupts Support
+* Platform must support an ACLINT MTIME counter resolution of 100ns or less
+(corresponding to a clock tick frequency of at least 10 MHz).

+=== Interrupts
The OS-A platform must comply with one of the four interrupt support
categories described in following sub-sections. The hardware must support at
least one of the four interrupt categories while software must support all of
@@ -302,7 +281,7 @@ the interrupt categories described below. Any hardware requirement for a specifi
privilege mode is only applicable for platforms supporting that privilege mode.

[#legacy_wired_irqs]
-====== Legacy wired IRQs - DEPRECATED
+==== Legacy wired IRQs - DEPRECATED
** One or more PLIC devices are required to support wired interrupts.
** One or more ACLINT MSWI devices are required to support M-mode software
interrupts.
@@ -314,7 +293,7 @@ devices.
** MSI virtualization is not supported.

[#only_wired_irqs]
-====== Only Wired IRQs
+==== Only Wired IRQs
** One or more AIA APLIC devices are required to support wired interrupts.
** One or more ACLINT MSWI devices are required to support M-mode software interrupts.
** One or more ACLINT SSWI devices are required to support S/HS-mode software interrupts.
@@ -323,7 +302,7 @@ devices.
** MSI virtualization is not supported.

[#msis_and_wired_irqs]
-====== MSIs and Wired IRQs
+==== MSIs and Wired IRQs
** AIA local interrupt CSRs must be supported by each hart.
*** `siselect` CSR must support holding 9-bit value.
*** `vsiselect` CSR must support holding 9-bit value if H-extension is
@@ -342,7 +321,7 @@ support wired irqs.
** MSI virtualization is not supported.

[#msis_virtual_msis_and_wired_irqs]
-====== MSIs, Virtual MSIs, and Wired IRQs
+==== MSIs, Virtual MSIs, and Wired IRQs
** To support virtual MSIs, the H-extension must be implemented.
*** GEILEN must be 3 or more.
** AIA local interrupt CSRs must be supported by each hart.
@@ -361,7 +340,7 @@ platform support wired irqs.
AIA IMSIC devices.
** MSI virtualization is supported.

-===== Summary
+==== Summary

The <<table_interrutps_and_timer_osa_platforms>> below summarizes the four
categories of interrupt support and timer support allowed on an OS-A platorm.
/s/platorm/platform

@@ -445,8 +424,8 @@ categories of interrupt support and timer support allowed on an OS-A platorm.
/s/platorm/platform

These are not part of your change. But if you can fix them in the next
version we don't have to spin a patch
just for this.


|+++<color rgb="#e69138"><font size=".6em">Priv Sstc</font></color>+++
|===

-==== System Peripherals
-===== UART/Serial Console
+=== System Peripherals
+==== UART/Serial Console

In order to facilitate the bring-up and debug of the low level initial
platform, hardware is required to implement a UART port that confirms to the
@@ -460,33 +439,10 @@ of the following:
** UART 16550 - MANDATORY
** UART 8250 - DEPRECATED

-==== Boot Process
-- The base specification defines the interface between the firmware and the
-operating system suitable for the RISC-V platforms with rich operating
-systems.
-- These requirements specify the required boot and runtime services, device
-discovery mechanism, etc.
-- The requirements are operating system agnostic, specific firmware/bootloader
-implementation agnostic.
-- For the generic mandatory requirements this base specification will refer to
-the EBBR specification <<spec_ebbr>>. Any deviation from the EBBR will be
-explicitly mentioned in the requirements.
-
-
-===== Firmware
-====== Storage and Partitioning
-- GPT partitioning required for shared storage.
-- MBR support is not required.
-
-===== Hardware Discovery Mechanisms
-- Device Tree (DT) is the required mechanism for system description.
-- Platforms must support the Unified Discovery specification for all pre-boot
-information population <<spec_unified_discovery>>.
-

-==== Runtime Services
+=== Runtime Services

-===== SBI
+==== SBI

* The M-mode runtime must implement SBI specification <<spec_sbi>> or higher.
* Required SBI extensions include:
@@ -497,7 +453,7 @@ information population <<spec_unified_discovery>>.
** SBI SRST
** SBI PMU

-===== UEFI
+==== UEFI

* Wherever applicable UEFI firmware must implement UEFI interfaces over
similar interfaces and services present in the SBI specification. For
@@ -506,7 +462,7 @@ information population <<spec_unified_discovery>>.
* The operating system should prioritize calling the UEFI interfaces before
the SBI or platform specific mechanisms.

-==== Software and ABIs
+=== Software and ABIs
The platform specification mandates the following requirements for
software components:

@@ -541,17 +497,54 @@ transactions that precisely traps if violated.
*** Platform must provide a protection mechanism from I/O agents manipulating
or accessing machine mode assets.

-// Server extension for OS-A Platform
-=== Server Extension
-The server extension specifies additional requirements for server class
-platforms. The server extension includes all of the requirements for the
-base with the additional requirements as below. The server extension, besides
-placing additional requirements on top of the underlying base specification,
-can also restrict the options allowed in the underlying base specification for
-satisfying a requirement.
-
-==== ISA Requirements
-===== General
+// OS-A Embedded Platform
+== OS-A Embedded Platform
+The OS-A Embedded Platform targets embedded class applications. The OS-A
+Embedded Platform inherits all the requirements as defined in the OS-A Platform
+Common Requirements section. Additional requirements are detailed in the
+following sections.
+
+=== PMU
+The RVA22 profile defines 32 PMU counters, of which the first three counters are
+defined by the privilege specification while the other 29 counters are programmable.
+The SBI PMU extension defines a set of hardware events that can be monitored
+using these programmable counters. This section defines the minimum number of
+programmable counters and hardware events required for an OS-A Embedded
+compatible platform.
+
+* Counters
+** The platform is not required to implement any of the programmable counters.
+* Events
+** The platform is not required to implement any of the hardware events defined
+in the SBI PMU extension.
+
+=== Boot Process
+- The OS-A Embedded Platform must comply with the EBBR specification
+<<spec_ebbr>>. Any deviation from the EBBR will be explicitly mentioned in
+the requirements in this section.
+
+==== Firmware
+===== Storage and Partitioning
+- GPT partitioning required for shared storage.
+- MBR support is not required.
+
+==== Hardware Discovery Mechanisms
+- Platforms must support the Unified Discovery specification for all pre-boot
+information population <<spec_unified_discovery>>.
+
+===== Device Tree (DT)
+- Device Tree (DT) is the required mechanism for the hardware discovery and
+configuration.
+
+// OS-A Server Platform
+== OS-A Server Platform
+The OS-A Server Platform targets server class applications. The OS-A
+Server Platform inherits all the requirements as defined in the OS-A Platform
+Common Requirements section. Additional requirements are detailed in the
+following sections.
+
+=== ISA Requirements
+==== General
* The hypervisor H-extension must be supported.
* The Zam extension must be supported for misaligned addresses within
at least aligned 16B regions.
* The `time` CSR must be implemented in hardware.
@@ -561,12 +554,12 @@ satisfying a requirement.
There should be hardware support for all misaligned accesses; misaligned
accesses should not take address misaligned exceptions.

-===== Supervisor mode
+==== Supervisor mode
* satp
** For RV64, Sv48 translation mode must be supported.
** At least 8 ASID bits must be supported and not hardwired to 0.

-===== Hypervisor extension
+==== Hypervisor extension
* hgatp
** For RV64, Sv48x4 translation mode must be supported.
** At least 8 VMID bits must be supported and not hardwired to 0.
@@ -575,7 +568,13 @@ accesses should not take address misaligned exceptions.
** For RV64, Sv48 translation mode must be supported.
** At least 8 ASID bits must be supported and not hardwired to 0.

-==== PMU
+=== PMU
+The RVA22 profile defines 32 PMU counters, of which the first three counters are
+defined by the privilege specification while the other 29 counters are programmable.
+The SBI PMU extension defines a set of hardware events that can be monitored
+using these programmable counters. This section defines the minimum number of
+programmable counters and hardware events required for an OS-A Server
+compatible platform.

* Counters
** The platform must implement at least 8 programmable counters.
@@ -597,9 +596,9 @@ Any platform that does not implement the micro-architectural features related to
a hardware event may hardwire the event value to zero.
--

-==== Debug
-The server extension requirements are all of the base specification
-requirements plus:
+=== Debug
+The OS-A Server platform includes all the requirements as specified in the
+OS-A Common Requirements section plus the following:

- Implement at least six mcontrol6 triggers that can support matching on PC
(select=0, execute=1, match=0) with timing=0 and full support for mode
@@ -611,13 +610,10 @@ above
respect to all harts connected to the DM
* Rationale: Debuggers must be able to view memory coherently.

-==== Interrupts and Timer
-
-===== Interrupts support
-
-The server extension must comply with interrupt support described in
-<<msis_virtual_msis_and_wired_irqs>> with the following additional
-requirements:
+=== Interrupts
+The OS-A Server platform must support the interrupt requirements as specified
+in the OS-A Common Requirements Interrupts section
+<<msis_virtual_msis_and_wired_irqs>> plus the following:

* The H-extension implemented by each hart must support GEILEN = 5 or more.
* Per-hart AIA IMSIC devices.
@@ -630,20 +626,20 @@ requirements:
Platforms should implement at least 5 guest interrupt files. More guest
interrupt files allow for better VM oversubscription on the same hart.

-==== Boot Process
-===== Firmware
+=== Boot Process
+==== Firmware
The boot and system firmware for the server platforms must support UEFI as
defined in the section 2.6.1 of the UEFI Specification <<spec_uefi>> with some
additional requirements described in following sub-sections.

-====== UEFI Configuration Tables
+===== UEFI Configuration Tables
The platforms are required to provide following tables:

* *EFI_ACPI_20_TABLE_GUID* ACPI configuration table which is at version 6.4+ or
newer with HW-Reduced ACPI model.
* *SMBIOS3_TABLE_GUID* SMBIOS table which conforms to version 3.4 or later.

-====== UEFI Protocol Support
+===== UEFI Protocol Support
The UEFI protocols listed below are required to be implemented.

.Additional UEFI Protocols
@@ -654,15 +650,17 @@ The UEFI protocols listed below are required to be implemented.
|EFI_PCI_IO_PROTOCOL | 14.4 | For PCIe support
|===

-===== Hardware Discovery Mechanisms
+==== Hardware Discovery Mechanisms
+- Platforms must support the Unified Discovery specification for all pre-boot
+information population <<spec_unified_discovery>>.

-====== ACPI
+===== ACPI
ACPI is the required mechanism for the hardware discovery and configuration.
Server platforms are required to adhere to the RISC-V ACPI Platform Requirements
Specification <<spec_riscv_acpi>>. Platform firmware must support ACPI and
the runtime OS environment must use ACPI for device discovery and configuration.

-====== SMBIOS
+===== SMBIOS
The System Management BIOS (SMBIOS) table is required for the platform
conforming to server extension. The SMBIOS records provide basic hardware and
firmware configuration information used widely by the platform management
@@ -687,9 +685,12 @@ characteristics and HART hardware features discovered during the firmware boot
process.
|===

-==== Runtime services
+=== Runtime services
+The OS-A Server platform includes all the runtime services requirements as
+specified in the OS-A Common Requirements Runtime Services section plus the
+following.

-===== UEFI
+==== UEFI
The UEFI run time services listed below are required to be implemented.

.Required UEFI Runtime Services
@@ -723,9 +724,12 @@ implemented but it can return EFI_UNSUPPORTED.
implemented but it can return EFI_UNSUPPORTED.
|===

-==== System Peripherals
+=== System Peripherals
+The OS-A Server platform includes all the system peripheral requirements as
+specified in the OS-A Common Requirements System Peripherals section plus
+the added requirements in this section.

-===== Watchdog Timers
+==== Watchdog Timers
Implementation of a two-stage watchdog timer, as defined in the RISC-V Watchdog
Timer Specification<<spec_riscv_watchdog>> is required. Software must
periodically refresh the watchdog timer, otherwise a first-stage watchdog
@@ -747,7 +751,7 @@ targeting a specific hart.

The resultant action taken is platform-specific.

-===== System Date/Time[[SystemDateTime]]
+==== System Date/Time[[SystemDateTime]]
In order to facilitate server manageability, server extension platform is
required to provide the mechanism to maintain system date/time for UEFI
runtime Time service. +
@@ -761,11 +765,11 @@ runtime Time service. +
EFI_UNSUPPORTED if the platform doesn't require the features or the system
date/time mechanism doesn’t have the capabilities.

-===== PCIe
+==== PCIe
Platforms are required to support at least PCIe Base Specification Revision 1.1
<<spec_pcie_sig>>.

-====== PCIe Config Space
+===== PCIe Config Space
* Platforms must support access to the PCIe config space via ECAM as described
in the PCIe Base specification.
* The entire config space for a single PCIe domain must be accessible via a
@@ -777,7 +781,7 @@ supported PCIe domains and map the ECAM I/O region for each domain.
memory attributes are that of a PMA I/O region (i.e. strongly-ordered,
non-cacheable, non-idempotent).

-====== PCIe Memory Space
+===== PCIe Memory Space
Platforms are required to map PCIe address space directly in the system address
space and not have any address translation for outbound accesses from harts or
for inbound accesses to any component in the system address space.
@@ -811,7 +815,7 @@ Such an access control mechanism could be analogous to the per-hart PMP
as described in the RISC-V Privileged Architectures specification.
--

-====== PCIe Interrupts
+===== PCIe Interrupts
* Platforms must support both INTx and MSI/MSI-x interrupts.
* Following are the requirements for INTx:
** For each root port in the system, the platform must map all the INTx
@@ -833,13 +837,13 @@ requests 16 MSI vectors the minimum MSI data value assigned by the platform
software can be 0x10 so that the function can use lower 4 bits to assert each
of the 16 vectors.

-====== PCIe cache coherency
+===== PCIe cache coherency
Memory that is cacheable by harts is not kept coherent by hardware when PCIe
transactions to that memory are marked with a No_Snoop bit of zero. In this
case, software must manage coherency on such memory; otherwise, software
coherency management is not required.

-====== PCIe Topology
+===== PCIe Topology
Platforms are required to implement at least one of the following topologies
and the components required in that topology.

@@ -899,17 +903,16 @@ implemented. RCEC is required to terminate the AER and PME messages from RCiEP.
must be implemented in a separate PCIe domain and must be addressable via a
separate ECAM I/O region.

-===== PCIe Device Firmware Requirement
-PCI expansion ROM code type 3 (UEFI) image must be provided by PCIe device for
-OS/A server extension platform according to PCI Firmware
-Specification <<spec_pci_firmware>> if that PCIe device is utilized during
-UEFI firmware boot process. The image stored in PCI expansion ROM is a UEFI
-driver that must be compliant with UEFI specification <<spec_uefi>> 14.4.2
-PCI Option ROMs.
+===== PCIe Device Firmware
+PCI expansion ROM code type 3 (UEFI) image must be provided by PCIe device
+platform according to PCI Firmware Specification <<spec_pci_firmware>> if that
+PCIe device is utilized during UEFI firmware boot process. The image stored in
+PCI expansion ROM is a UEFI driver that must be compliant with UEFI
+specification <<spec_uefi>> 14.4.2 PCI Option ROMs.

-
-==== Security
-Platforms must implement the following security features:
+=== Security
+The OS-A Server platform includes all the security requirements as
+specified in the OS-A Common Requirements security section plus the following:

* Support for some form of Secure Boot, as a means to ensure the integrity of
platform firmware and software, is required. Flexibility is provided as
@@ -942,7 +945,7 @@ transactions that precisely traps if violated.
*** Platform must provide a protection mechanism from I/O agents manipulating
or accessing machine mode assets.

-==== RAS
+=== RAS
All the below mentioned RAS features are required for the OS-A platform server
extension:
Other than that, it looks good to me.

Reviewed-by: Atish Patra <atishp@...>

--
Regards,
Atish


--
Regards
Kumar
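The MSI note in the PCIe Interrupts hunk quoted above (16 requested vectors, a base MSI data value of 0x10, and the low 4 bits selecting the vector) follows from how PCI multiple-message MSI works: the function changes only the low log2(N) bits of the data value that software programmed, so software must give it a base aligned to N. A minimal sketch; `msi_data_for_vector` is a hypothetical helper, not part of any spec or library:

```python
def msi_data_for_vector(base_data: int, vectors: int, vector: int) -> int:
    """Hypothetical helper: MSI data value a function sends for `vector`.

    With multiple-message MSI the function may only change the low
    log2(vectors) bits of the programmed data value, so those bits of
    the base must be clear.
    """
    if vectors not in (1, 2, 4, 8, 16, 32):
        raise ValueError("MSI allows 1, 2, 4, 8, 16 or 32 vectors")
    if base_data & (vectors - 1):
        raise ValueError(f"base data {base_data:#x} is not aligned to {vectors}")
    if not 0 <= vector < vectors:
        raise ValueError("vector number out of range")
    # The function simply ORs the vector number into the low bits.
    return base_data | vector


# A function granted 16 vectors with base data 0x10 uses data 0x10..0x1f.
print([hex(msi_data_for_vector(0x10, 16, v)) for v in (0, 1, 15)])  # ['0x10', '0x11', '0x1f']
```

A base such as 0x18 would be rejected for 16 vectors, because bit 3 would collide with the vector number.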





Re: OS-A platform stoptime requirement

Greg Favor
 

I'm cc'ing Paul Donahue (vice-chair of the Debug TG).  He was involved in distilling, from the enormous amount of optionality in the Debug spec, what would be suitable to require in OS-A platforms, so he can comment on this debug-related OS-A platform requirement, and in particular the stoptime requirement (Paul, see the email thread included below):
       dcsr.stopcount and dcsr.stoptime must be supported and the reset value of each must be 1

Btw, I don't see resync of time with mtime as more than a relatively trivial exercise on debug mode exit.  Outside of debug mode mtime is being broadcast to all harts and each hart's time CSR updates with the latest time value that it receives.  In debug mode, if stoptime=1, then the time flops are simply inhibited from updating with any new received mtime values.  Then when debug mode is exited and the inhibit goes away, the time flops naturally go back to getting updated with the latest and/or new received mtime values.

Greg
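A minimal behavioral sketch of the mechanism described above, assuming a value-broadcast design in which each hart's time CSR latches whatever mtime value it last received. `HartTimeCsr` and `simulate` are hypothetical names, not taken from the Debug spec or any RTL. The simulation also shows the concern raised at the start of this thread: while one hart is halted with stoptime=1, its time value falls behind the running harts by far more than one tick, then snaps back on the first broadcast after Debug Mode exit.

```python
class HartTimeCsr:
    """Hypothetical model of a hart-local time CSR fed by a broadcast mtime value."""

    def __init__(self, stoptime: bool = True) -> None:
        self.stoptime = stoptime      # models dcsr.stoptime
        self.in_debug_mode = False
        self.time = 0                 # software-visible time CSR value

    def on_broadcast(self, mtime: int) -> None:
        # Normally latch every new value; in Debug Mode with stoptime=1
        # the update is simply inhibited.
        if not (self.in_debug_mode and self.stoptime):
            self.time = mtime

    def enter_debug(self) -> None:
        self.in_debug_mode = True

    def exit_debug(self) -> None:
        # No explicit resync: the next broadcast value is latched as usual.
        self.in_debug_mode = False


def simulate(ticks_before: int, ticks_halted: int, ticks_after: int):
    """Two harts share one broadcast mtime; one of them is halted for a while."""
    halted, running = HartTimeCsr(), HartTimeCsr()
    mtime = 0
    max_skew = 0

    def tick() -> None:
        nonlocal mtime, max_skew
        mtime += 1
        halted.on_broadcast(mtime)
        running.on_broadcast(mtime)
        max_skew = max(max_skew, running.time - halted.time)

    for _ in range(ticks_before):
        tick()
    halted.enter_debug()
    for _ in range(ticks_halted):
        tick()
    halted.exit_debug()
    for _ in range(ticks_after):
        tick()
    return halted.time, running.time, max_skew


print(simulate(5, 10, 3))  # (18, 18, 10): 10 ticks of skew while halted, none after exit
```

The design choice here is that Debug Mode exit needs no separate resync step: releasing the inhibit is enough, and the next broadcast brings the halted hart back in line.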

On Mon, Dec 20, 2021 at 3:19 PM Beeman Strong <beeman@...> wrote:
Thanks, I definitely misunderstood the intent.  So the expectation is that, in Debug Mode, reads to mtime will see time continue to progress, but reads to the time CSR will see a frozen value.  Reads of the time CSR by software running outside debug mode should not be impacted, and will see a value synchronized with mtime.

I suppose I can imagine usages where keeping the time CSR frozen has value to a debugger, but it does add complexity and latency in requiring a resync with mtime on debug mode exit.  Does the value really rise to the level of being a platform requirement?  Is there some important debug functionality that breaks if we keep it simple and let the time CSR keep running in debug mode?

On Mon, Dec 20, 2021 at 2:05 PM Andrew Waterman <andrew@...> wrote:


On Mon, Dec 20, 2021 at 3:42 PM Greg Favor <gfavor@...> wrote:
I think there's a little bit of confusion going on.  The 'stoptime' bit is defined as "Don’t increment any hart-local timers while in Debug Mode."  I take this to clearly not be referring to MTIME, but to the local time CSR.

I fully agree that expecting a debug action on a core to have to reach out to wherever in a system MTIME may be, is inappropriate.  Which also affects other still active harts - which is probably very inappropriate (i.e. debugging just one hart shouldn't inherently affect operation of all harts).

Oops, it has been a while since I've read this spec.  I withdraw my comment, if it's indeed the case that shared implementations of mtime need not be affected by stoptime.


Whereas stopping the local time CSR for the duration of being in Debug mode would be easy to implement, i.e. in_debug_mode inhibits the time CSR from advancing.  Presumably, once the hart exits Debug mode, the time CSR effectively immediately catches back up with the current time value that has been broadcast to it from MTIME.

Greg


On Mon, Dec 20, 2021 at 1:19 PM Andrew Waterman <andrew@...> wrote:


On Mon, Dec 20, 2021 at 12:11 PM Beeman Strong <beeman@...> wrote:
Hi there,

In the OS-A platform spec I see the following requirement:

• dcsr.stopcount and dcsr.stoptime must be supported and the reset value of each must be 1
◦ Rationale: The architecture has strict requirements on minstret which may be perturbed by an external debugger in a way that’s visible to software. The default should allow code that’s sensitive to these requirements to be debugged.

The rationale justifies the requirement for stopcount=1, but I don't see any rationale for stoptime=1.

The debug spec refers to stoptime=1 stopping "timers", which I interpret to mean the mtime counter.  This timer is expected to be synchronized across harts in a system ("The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock.")  In a system with multiple harts, where a subset of harts may be halted at a given time, this stoptime=1 requirement risks violating this ISA requirement and confusing software by causing wall-clock time to get out of sync.

Can we remove "and dcsr.stoptime" from this platform requirement?

FWIW, although I appreciate the motivation behind this requirement, I also support removing it.  For the case that mtime is centrally implemented, this requirement is quite onerous to implement.  For the case that mtime is decentralized, this requirement is easy to satisfy, but is differently problematic, as the spec mentions ("risks violating this ISA requirement").  I dislike disadvantaging the centralized-mtime implementations for a feature we've already admitted is problematic at the ISA level.
 

thanks,
beeman


Re: OS-A platform stoptime requirement

Beeman Strong
 

Thanks, I definitely misunderstood the intent.  So the expectation is that, in Debug Mode, reads to mtime will see time continue to progress, but reads to the time CSR will see a frozen value.  Reads of the time CSR by software running outside debug mode should not be impacted, and will see a value synchronized with mtime.

I suppose I can imagine usages where keeping the time CSR frozen has value to a debugger, but it does add complexity and latency in requiring a resync with mtime on debug mode exit.  Does the value really rise to the level of being a platform requirement?  Is there some important debug functionality that breaks if we keep it simple and let the time CSR keep running in debug mode?
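The resync cost mentioned above depends on how time reaches each hart. Where harts receive only an increment pulse and keep their own count (the clock-broadcast case raised in Ved's reply in this thread), freezing the time CSR without losing track of mtime needs a second counter that keeps running while the visible value is held. A hypothetical sketch of that shadow-counter approach, assuming one pulse per mtime increment and a 64-bit wrap; the class name is illustrative only:

```python
MASK64 = (1 << 64) - 1


class PulseDrivenTime:
    """Hypothetical hart that rebuilds mtime locally from increment pulses."""

    def __init__(self) -> None:
        self.shadow = 0       # keeps counting even while halted
        self.visible = 0      # what a read of the time CSR returns
        self.frozen = False   # True in Debug Mode when dcsr.stoptime=1

    def on_increment_pulse(self) -> None:
        self.shadow = (self.shadow + 1) & MASK64
        if not self.frozen:
            self.visible = self.shadow

    def enter_debug(self, stoptime: bool = True) -> None:
        self.frozen = stoptime

    def exit_debug(self) -> None:
        # Catch up from the shadow copy; no full-value resync is needed.
        self.frozen = False
        self.visible = self.shadow


hart = PulseDrivenTime()
for _ in range(4):
    hart.on_increment_pulse()
hart.enter_debug()
for _ in range(6):
    hart.on_increment_pulse()
print(hart.visible, hart.shadow)  # 4 10 while halted
hart.exit_debug()
print(hart.visible)               # 10
```

Without the shadow counter, the same hart would have to wait for (or request) a full mtime value before software could see correct time again.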



Re: OS-A platform stoptime requirement

Andrew Waterman
 



Oops, it has been a while since I've read this spec.  I withdraw my comment, if it's indeed the case that shared implementations of mtime need not be affected by stoptime.


Re: OS-A platform stoptime requirement

Greg Favor
 

I think there's a little bit of confusion going on.  The 'stoptime' bit is defined as "Don’t increment any hart-local timers while in Debug Mode."  I take this to clearly not be referring to MTIME, but to the local time CSR.

I fully agree that expecting a debug action on a core to have to reach out to wherever in a system MTIME may be, is inappropriate.  Which also affects other still active harts - which is probably very inappropriate (i.e. debugging just one hart shouldn't inherently affect operation of all harts).

Whereas stopping the local time CSR for the duration of being in Debug mode would be easy to implement, i.e. in_debug_mode inhibits the time CSR from advancing.  Presumably, once the hart exits Debug mode, the time CSR effectively immediately catches back up with the current time value that has been broadcast to it from MTIME.

Greg




Re: OS-A platform stoptime requirement

Andrew Waterman
 



FWIW, although I appreciate the motivation behind this requirement, I also support removing it.  For the case that mtime is centrally implemented, this requirement is quite onerous to implement.  For the case that mtime is decentralized, this requirement is easy to satisfy, but is differently problematic, as the spec mentions ("risks violating this ISA requirement").  I dislike disadvantaging the centralized-mtime implementations for a feature we've already admitted is problematic at the ISA level.


OS-A platform stoptime requirement

Beeman Strong
 

Hi there,

In the OS-A platform spec I see the following requirement:

• dcsr.stopcount and dcsr.stoptime must be supported and the reset value of each must be 1
◦ Rationale: The architecture has strict requirements on minstret which may be perturbed by an external debugger in a way that’s visible to software. The default should allow code that’s sensitive to these requirements to be debugged.

The rationale justifies the requirement for stopcount=1, but I don't see any rationale for stoptime=1.

The debug spec refers to stoptime=1 stopping "timers", which I interpret to mean the mtime counter.  This timer is expected to be synchronized across harts in a system ("The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock.")  In a system with multiple harts, where a subset of harts may be halted at a given time, this stoptime=1 requirement risks violating this ISA requirement and confusing software by causing wall-clock time to get out of sync.

Can we remove "and dcsr.stoptime" from this platform requirement?

thanks,
beeman


Re: Platform specification questions

Ved Shanbhogue
 

Hi Greg,

On Tue, Dec 14, 2021 at 05:32:08PM -0800, Greg Favor wrote:
The following two items in Ved's email didn't get any response, so I offer
my own below ...

On Sun, Dec 12, 2021 at 4:15 PM Vedvyas Shanbhogue <ved@...> wrote:

Section 2.3.7.3.2 - PCIe memory space:
The requirement to not have any address translation for inbound accesses
to any component in system address space is restrictive. If direct
assignment of devices is supported then the IOMMU would be required to do
the address translation for inbound accesses. Further for hart originated
accesses where the PCIe memory is mapped into virtual address space there
needs to be a translation through the first and/or second level page
tables. Please help clarify why PCIe memory must not be mapped into
virtual address space and why use of IOMMU to do translation is disallowed
by the specification.
I think where this came from is learnings in the ARM "server" ecosystem (as
then got captured in SBSA). In particular, one wants devices and software
on harts to have the same view of system physical address space so that,
for example, pointers can be easily passed around. Which doesn't conflict
with having address translation by IOMMUs. Maybe the current text needs to
be better worded, but I think the ideas to be expressed are:

For inbound PCIe transactions:

- There should be no hardware modifications of PCIe addresses outside of an
IOMMU (as some vendors way back in early ARM SBSA days were wont to do).

- If there is not an IOMMU associated with the PCIe interface, then PCIe
devices will have the same view of PA space as the harts.

- If there is an IOMMU associated with the PCIe interface, then system
software can trust that all address modifications are under its control via
hart page tables and IOMMU page tables.

For outbound PCIe transactions, system software is free to set up VA-to-PA
translations in hart page tables. I think the mandate against outbound
address translation was accidentally mistaken. The key point is that there
is one common view of system physical address space. Hart and IOMMU page
tables may translate from hart VAs and device addresses to system physical
address space, but the above ensures that "standard" system software has
full control over this and doesn't have non-standard address
transformations happening that it isn't aware of and doesn't know how to
control.
Thanks. I think this is very clear.
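The inbound-address rules above can be modeled in a few lines. This is a sketch under stated assumptions, not platform-spec text: 4 KiB pages, a plain dictionary standing in for software-managed IOMMU page tables, and a hypothetical function name. What it illustrates is that the only transformation between a device address and a system physical address is one that system software controls.

```python
from typing import Dict, Optional

PAGE_SHIFT = 12                   # assume 4 KiB pages for illustration
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def inbound_to_system_pa(device_addr: int,
                         iommu_table: Optional[Dict[int, int]]) -> int:
    """Map an inbound PCIe address to a system physical address.

    iommu_table is None when no IOMMU sits in front of the PCIe interface;
    otherwise it maps device page numbers to system physical page numbers
    (a stand-in for IOMMU page tables owned by system software).
    """
    if iommu_table is None:
        # No hidden rewriting: devices see the same PA space as the harts.
        return device_addr
    page = device_addr >> PAGE_SHIFT
    if page not in iommu_table:
        raise PermissionError(f"IOMMU fault at device address {device_addr:#x}")
    return (iommu_table[page] << PAGE_SHIFT) | (device_addr & PAGE_MASK)


print(hex(inbound_to_system_pa(0x80001234, None)))            # 0x80001234
print(hex(inbound_to_system_pa(0x1234, {0x1: 0x80000})))      # 0x80000234
```

Any address modification outside these two paths (for example a fixed offset applied by the root complex) is exactly the kind of non-standard transformation the text argues against.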



Section 2.3.7.3.3 - PCIe interrupts:
It seems unnecessary to require platforms built for the '22 version of the
platform to have to support running software that is not MSI aware. Please
clarify why supporting INTx emulation for legacy/pre-PCIe software
compatibility is a required rather than an optional capability for RISC-V
platforms.

This one seems questionable to me as well, although I'm not the expert to
reliably proclaim that INTx support is no longer a necessity in some
server-class systems. I can imagine that back in earlier ARM "server" days
this legacy issue was a bigger deal and hence was mandated in SBSA. But
maybe that is no longer an issue? Or at least for 2022+ systems - to the
point where mandating this legacy support is an unnecessary burden on many
or the majority of such systems.

If this is a fair view going forward, then the INTx requirements should
just become recommendations for systems that do feel the need to care about
INTx support.
I think the recommendation could be changed to require MSI and make supporting INTx emulation optional. I am hoping to hear from BIOS and OS experts whether we would need to support OS/BIOS that are '22-platform compatible but not MSI capable.

regards
ved


[PATCH 1/1] Platform Spec Content Reorganization into separate sections

Kumar Sankaran
 

As per the discussion and agreement during the Platform HSC meeting,
this patch splits the content of the platform spec into separate
sections: an OS-A Common Requirements section, an OS-A Embedded Platform
section, an OS-A Server Platform section and an M Platform section.
This patch keeps all the content in the same single file for easier
readability. In the near future, the next patchset will split the
individual sections into separate .adoc files, one for the common
requirements and one .adoc for each specific platform.

Below are the changes.
Added OS-A Common Requirements section for all the common requirements
Added OS-A Embedded and OS-A Server platforms
Cleaned up some text in the Introduction section while still keeping
the bulk of the content as is.
Added the Timer 100ns resolution change from Greg.
Kept the M-Platform as is.
Added licensing and version log to the platform adoc based on Jeff’s feedback
Updated changelog to make it current
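The Timer 100ns resolution change listed above replaces the old 10 ns default resolution and 10 MHz to 100 MHz update-frequency window with a single bound, visible in the Timers hunk of the diff: an MTIME resolution of 100 ns or less, i.e. a tick frequency of at least 10 MHz. A quick arithmetic check, assuming MTIME advances by 1 per clock tick under the new rule; the helper names are hypothetical:

```python
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


def tick_period_ns(freq_hz: int) -> Fraction:
    """Length of one hardware clock tick in nanoseconds (exact)."""
    return Fraction(NS_PER_SECOND, freq_hz)


def meets_100ns_resolution(freq_hz: int) -> bool:
    """New rule: resolution <= 100 ns, assuming MTIME advances by 1 per tick."""
    return tick_period_ns(freq_hz) <= 100


def increment_per_tick(freq_hz: int, resolution_ns: int) -> Fraction:
    """Old rule: how much MTIME must advance per tick to keep `resolution_ns`."""
    return tick_period_ns(freq_hz) / resolution_ns


print(meets_100ns_resolution(10_000_000))    # True: 100 ns per tick, exactly at the bound
print(meets_100ns_resolution(1_000_000))     # False: 1000 ns per tick
print(increment_per_tick(25_000_000, 10))    # 4, the removed implementation note's example
```

Exact `Fraction` arithmetic avoids floating-point rounding right at the 100 ns boundary.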

diff --git a/changelog.adoc b/changelog.adoc
index 6181115..6c48bda 100644
--- a/changelog.adoc
+++ b/changelog.adoc
@@ -7,20 +7,13 @@
[preface]
## Change Log

+### version 0.3-draft
+* 2021-12-13:
+** Restructure document into OS-A Common, OS-A Embedded and OS-A Server
+
### version 0.2-draft
* 2021-09-01:
** Draft version for internal reviews
-* 2021-05-20:
-** Platform requirements for Debug
-* 2021-05-19:
-** Base boot and runtime requirements - Initial commit
-* 2021-04-08:
-** Initial commit of server firmware requirements
-* 2021-03-25:
-** Initial commit of Embedded-2022 specification
-* 2021-03-16:
-** Added 2022 platforms
-** Added individual sections and sub-sections for the content

### version 0.1-draft
* 2020-10-07:
diff --git a/riscv-platform-spec.adoc b/riscv-platform-spec.adoc
index 6321683..858bd0b 100644
--- a/riscv-platform-spec.adoc
+++ b/riscv-platform-spec.adoc
@@ -20,6 +20,12 @@
// table of contents
toc::[]

+// document copyright and licensing information
+include::licensing.adoc[]
+
+// changelog for the document
+include::changelog.adoc[]
+
[preface]
== Terminology
[cols="1,4", width=80%, align="left", options="header"]
@@ -78,20 +84,26 @@ specification has to be self certified by the platform compatibility test
suite (PCT). More details about the PCT are available in the platform policy
specification.

-Platforms are augmented with extensions for industry specific target
-market verticals like “server”, “mobile”, “edge computing”, “machine-learning”
-and “automotive”.
-
-The platform specification currently defines two platforms:
-
-* *OS-A Platform*: This specifies a rich-OS platform for
-Linux/FreeBSD/Windows - flavors that run on enterprise and embedded class
-application processors. The OS-A platform has a base feature set and extensions
-as shown below: +
-** *Base*
-** *Server Extension*
-
-* *M Platform*: This specifies an RTOS platform for bare-metal applications and
+The platform specification currently defines two platforms as shown below.
+Additional platforms are expected to be defined in the future for industry
+specific target market verticals like “mobile”, “edge computing”,
+“machine-learning”, "desktop", “automotive” and more.
+
+* *OS-A Platform*: The OS-A platform specifies a category of rich-OS platforms
+that support operating systems like Linux, FreeBSD, Windows and more;
+flavors that run on enterprise and embedded class application processors.
+Each OS-A platform that is defined below is independent in its representation
+and is not dependent on any other platform for its features or specifications.
+Requirements common across multiple platforms are bundled together in the OS-A
+Common Requirements section in order to prevent duplication of content. The
+specific platform can include all or some of the requirements in the common
+section and add or modify these as per the specific requirements.
+The OS-A platforms that are currently defined are the following: +
+** *OS-A Embedded Platform*
+** *OS-A Server Platform*
+
+* *M Platform*: The M platform specifies an RTOS platform for bare-metal
+applications and
small operating systems running on a microcontroller. The M platform has a base
feature set and extensions as shown below: +
** *Base*
@@ -102,13 +114,11 @@ functionality available in S, U, VS and VU modes, and the standardization of
the SBI (Supervisory Binary Interface as defined in <<spec_sbi>>) between
Supervisor level (S-mode/VS-mode) and M-mode/HS-mode respectively.

-// OS-A Platform
-== OS-A Platform
+// OS-A Platform Common requirements
+== OS-A Common Requirements

-// Base feature set for OS-A Platform
-=== Base
-==== ISA Requirements
-===== General
+=== ISA Requirements
+==== General

* This OS-A platform must comply with the RVA22U and RVA22S ISA profiles as
defined in the RISC-V ISA Profiles specification [11].
@@ -121,10 +131,8 @@ if the standard extension is not required.
* All hart PMA regions for main memory must be marked as coherent.
* Memory accesses by I/O masters can be coherent or non-coherent with respect
to all hart-related caches.
-[sidebar]
----

-===== Supervisor mode
+==== Supervisor mode
* sstatus
** sstatus.UBE must support the same access attribute (read-only or writable)
as mstatus.UBE.
@@ -145,7 +153,7 @@ non-zero and zero values as architecturally defined.
** For RV32, Bare and Sv32 translation modes must be supported.
** For RV64, Bare and Sv39 translation modes must be supported.

-===== Hypervisor extension
+==== Hypervisor extension
* hstatus
** VTW bit must not be hardwired to 0.
** VTVM bit must not be hardwired to 0.
@@ -178,23 +186,8 @@ non-zero and zero values as architecturally defined.
** For RV32, Bare and Sv32 translation modes must be supported.
** For RV64, Bare and Sv39 translation modes must be supported.

-==== PMU
-
-The RVA22 profile defines 32 PMU counters out-of-which first three counters are
-defined by the privilege specification while other 29 counters are programmable.
-The SBI PMU extension defines a set of hardware events that can be monitored
-using these programmable counters. This section defines the minimum number of
-programmable counters and hardware events required for an OS-A compatible
-platform.
-
-* Counters
-** The platform does not require to implement any of the programmable counters.
-* Events
-** The platform does not require to implement any of the hardware events defined
-in SBI PMU extensions.
-
-==== Debug
-The OS-A base platform requirements are the following:
+=== Debug
+The OS-A platform common requirements are the following:

- Implement resethaltreq
* Rationale: Debugging immediately out of reset is a useful debug tool.
@@ -275,26 +268,12 @@ each must be 1
The default should allow code that's sensitive to these requirements to be
debugged.

-==== Interrupts and Timer
-
-===== Timer support
-
+=== Timers
* One or more ACLINT MTIMER devices are required for the OS-A platform.
-* Platform must support a default ACLINT MTIME counter resolution of 10ns
- (i.e. an increment by 1 represents 10 ns).
-* The ACLINT MTIME update frequency (i.e. hardware clock) must be between
- 10 MHz and 100 MHz, and updates must be strictly monotonic.
-
-[sidebar]
---
-[underline]*_Implementation Note:_*
-For example, if the MTIME counter update frequency (i.e. hardware clock) is
-25 MHz then the MTIME counter would increment by 4 upon every hardware clock
-tick for MTIME counter resolution of 10ns.
---
-
-===== Interrupts Support
+* Platform must support an ACLINT MTIME counter resolution of 100ns or less
+(corresponding to a clock tick frequency of at least 10 MHz).

+=== Interrupts
The OS-A platform must comply with one of the four interrupt support
categories described in following sub-sections. The hardware must support at
least one of the four interrupt categories while software must support all of
@@ -302,7 +281,7 @@ the interrupt categories described below. Any hardware requirement for a specifi
privilege mode is only applicable for platforms supporting that privilege mode.

[#legacy_wired_irqs]
-====== Legacy wired IRQs - DEPRECATED
+==== Legacy wired IRQs - DEPRECATED
** One or more PLIC devices are required to support wired interrupts.
** One or more ACLINT MSWI devices are required to support M-mode software
interrupts.
@@ -314,7 +293,7 @@ devices.
** MSI virtualization is not supported.

[#only_wired_irqs]
-====== Only Wired IRQs
+==== Only Wired IRQs
** One or more AIA APLIC devices are required to support wired interrupts.
** One or more ACLINT MSWI devices are required to support M-mode software interrupts.
** One or more ACLINT SSWI devices are required to support S/HS-mode software interrupts.
@@ -323,7 +302,7 @@ devices.
** MSI virtualization is not supported.

[#msis_and_wired_irqs]
-====== MSIs and Wired IRQs
+==== MSIs and Wired IRQs
** AIA local interrupt CSRs must be supported by each hart.
*** `siselect` CSR must support holding 9-bit value.
*** `vsiselect` CSR must support holding 9-bit value if H-extension is
@@ -342,7 +321,7 @@ support wired irqs.
** MSI virtualization is not supported.

[#msis_virtual_msis_and_wired_irqs]
-====== MSIs, Virtual MSIs, and Wired IRQs
+==== MSIs, Virtual MSIs, and Wired IRQs
** To support virtual MSIs, the H-extension must be implemented.
*** GEILEN must be 3 or more.
** AIA local interrupt CSRs must be supported by each hart.
@@ -361,7 +340,7 @@ platform support wired irqs.
AIA IMSIC devices.
** MSI virtualization is supported.

-===== Summary
+==== Summary

The <<table_interrutps_and_timer_osa_platforms>> below summarizes the four
categories of interrupt support and timer support allowed on an OS-A platorm.
@@ -445,8 +424,8 @@ categories of interrupt support and timer support allowed on an OS-A platorm.
|+++<color rgb="#e69138"><font size=".6em">Priv Sstc</font></color>+++
|===

-==== System Peripherals
-===== UART/Serial Console
+=== System Peripherals
+==== UART/Serial Console

In order to facilitate the bring-up and debug of the low level initial
platform, hardware is required to implement a UART port that confirms to the
@@ -460,33 +439,10 @@ of the following:
** UART 16550 - MANDATORY
** UART 8250 - DEPRECATED

-==== Boot Process
-- The base specification defines the interface between the firmware and the
-operating system suitable for the RISC-V platforms with rich operating
-systems.
-- These requirements specify the required boot and runtime services, device
-discovery mechanism, etc.
-- The requirements are operating system agnostic, specific firmware/bootloader
-implementation agnostic.
-- For the generic mandatory requirements this base specification will refer to
-the EBBR specification <<spec_ebbr>>. Any deviation from the EBBR will be
-explicitly mentioned in the requirements.
-
-
-===== Firmware
-====== Storage and Partitioning
-- GPT partitioning required for shared storage.
-- MBR support is not required.
-
-===== Hardware Discovery Mechanisms
-- Device Tree (DT) is the required mechanism for system description.
-- Platforms must support the Unified Discovery specification for all pre-boot
-information population <<spec_unified_discovery>>.
-

-==== Runtime Services
+=== Runtime Services

-===== SBI
+==== SBI

* The M-mode runtime must implement SBI specification <<spec_sbi>> or higher.
* Required SBI extensions include:
@@ -497,7 +453,7 @@ information population <<spec_unified_discovery>>.
** SBI SRST
** SBI PMU

-===== UEFI
+==== UEFI

* Wherever applicable UEFI firmware must implement UEFI interfaces over
similar interfaces and services present in the SBI specification. For
@@ -506,7 +462,7 @@ information population <<spec_unified_discovery>>.
* The operating system should prioritize calling the UEFI interfaces before
the SBI or platform specific mechanisms.

-==== Software and ABIs
+=== Software and ABIs
The platform specification mandates the following requirements for
software components:

@@ -541,17 +497,54 @@ transactions that precisely traps if violated.
*** Platform must provide a protection mechanism from I/O agents manipulating
or accessing machine mode assets.

-// Server extension for OS-A Platform
-=== Server Extension
-The server extension specifies additional requirements for server class
-platforms. The server extension includes all of the requirements for the
-base with the additional requirements as below. The server extension, besides
-placing additional requirements on top of the underlying base specification,
-can also restrict the options allowed in the underlying base specification for
-satisfying a requirement.
-
-==== ISA Requirements
-===== General
+// OS-A Embedded Platform
+== OS-A Embedded Platform
+The OS-A Embedded Platform targets embedded class applications. The OS-A
+Embedded Platform inherits all the requirements as defined in the OS-A Platform
+Common Requirements section. Additional requirements are detailed in the
+following sections.
+
+=== PMU
+The RVA22 profile defines 32 PMU counters, out of which the first three counters are
+defined by the Privileged specification while the other 29 counters are programmable.
+The SBI PMU extension defines a set of hardware events that can be monitored
+using these programmable counters. This section defines the minimum number of
+programmable counters and hardware events required for an OS-A Embedded
+compatible platform.
+
+* Counters
+** The platform is not required to implement any of the programmable counters.
+* Events
+** The platform is not required to implement any of the hardware events defined
+in the SBI PMU extension.
+
+=== Boot Process
+- The OS-A Embedded Platform must comply with the EBBR specification
+<<spec_ebbr>>. Any deviation from the EBBR will be explicitly mentioned in
+the requirements in this section.
+
+==== Firmware
+===== Storage and Partitioning
+- GPT partitioning required for shared storage.
+- MBR support is not required.
+
+==== Hardware Discovery Mechanisms
+- Platforms must support the Unified Discovery specification for all pre-boot
+information population <<spec_unified_discovery>>.
+
+===== Device Tree (DT)
+- Device Tree (DT) is the required mechanism for the hardware discovery and
+configuration.
+
+// OS-A Server Platform
+== OS-A Server Platform
+The OS-A Server Platform targets server class applications. The OS-A
+Server Platform inherits all the requirements as defined in the OS-A Platform
+Common Requirements section. Additional requirements are detailed in the
+following sections.
+
+=== ISA Requirements
+==== General
* The hypervisor H-extension must be supported.
* The Zam extension must be supported for misaligned addresses within
at least aligned 16B regions.
* The `time` CSR must be implemented in hardware.
@@ -561,12 +554,12 @@ satisfying a requirement.
There should be hardware support for all misaligned accesses; misaligned
accesses should not take address misaligned exceptions.

-===== Supervisor mode
+==== Supervisor mode
* satp
** For RV64, Sv48 translation mode must be supported.
** At least 8 ASID bits must be supported and not hardwired to 0.

-===== Hypervisor extension
+==== Hypervisor extension
* hgatp
** For RV64, Sv48x4 translation mode must be supported.
** At least 8 VMID bits must be supported and not hardwired to 0.
@@ -575,7 +568,13 @@ accesses should not take address misaligned exceptions.
** For RV64, Sv48 translation mode must be supported.
** At least 8 ASID bits must be supported and not hardwired to 0.

-==== PMU
+=== PMU
+The RVA22 profile defines 32 PMU counters, out of which the first three counters are
+defined by the Privileged specification while the other 29 counters are programmable.
+The SBI PMU extension defines a set of hardware events that can be monitored
+using these programmable counters. This section defines the minimum number of
+programmable counters and hardware events required for an OS-A Server
+compatible platform.

* Counters
** The platform must implement at least 8 programmable counters.
@@ -597,9 +596,9 @@ Any platform that does not implement the
micro-architectural features related to
a hardware event may hardwire the event value to zero.
--

-==== Debug
-The server extension requirements are all of the base specification
-requirements plus:
+=== Debug
+The OS-A Server platform includes all the requirements as specified in the
+OS-A Common Requirements section plus the following:

- Implement at least six mcontrol6 triggers that can support matching on PC
(select=0, execute=1, match=0) with timing=0 and full support for mode
@@ -611,13 +610,10 @@ above
respect to all harts connected to the DM
* Rationale: Debuggers must be able to view memory coherently.

-==== Interrupts and Timer
-
-===== Interrupts support
-
-The server extension must comply with interrupt support described in
-<<msis_virtual_msis_and_wired_irqs>> with the following additional
-requirements:
+=== Interrupts
+The OS-A Server platform must support the interrupt requirements as specified
+in the OS-A Common Requirements Interrupts section
+<<msis_virtual_msis_and_wired_irqs>> plus the following:

* The H-extension implemented by each hart must support GEILEN = 5 or more.
* Per-hart AIA IMSIC devices.
@@ -630,20 +626,20 @@ requirements:
Platforms should implement at least 5 guest interrupt files. More guest
interrupt files allow for better VM oversubscription on the same hart.

-==== Boot Process
-===== Firmware
+=== Boot Process
+==== Firmware
The boot and system firmware for the server platforms must support UEFI as
defined in the section 2.6.1 of the UEFI Specification <<spec_uefi>> with some
additional requirements described in following sub-sections.

-====== UEFI Configuration Tables
+===== UEFI Configuration Tables
The platforms are required to provide following tables:

* *EFI_ACPI_20_TABLE_GUID* ACPI configuration table which is at version 6.4+ or
newer with HW-Reduced ACPI model.
* *SMBIOS3_TABLE_GUID* SMBIOS table which conforms to version 3.4 or later.

-====== UEFI Protocol Support
+===== UEFI Protocol Support
The UEFI protocols listed below are required to be implemented.

.Additional UEFI Protocols
@@ -654,15 +650,17 @@ The UEFI protocols listed below are required to be implemented.
|EFI_PCI_IO_PROTOCOL | 14.4 | For PCIe support
|===

-===== Hardware Discovery Mechanisms
+==== Hardware Discovery Mechanisms
+- Platforms must support the Unified Discovery specification for all pre-boot
+information population <<spec_unified_discovery>>.

-====== ACPI
+===== ACPI
ACPI is the required mechanism for the hardware discovery and configuration.
Server platforms are required to adhere to the RISC-V ACPI Platform Requirements
Specification <<spec_riscv_acpi>>. Platform firmware must support ACPI and
the runtime OS environment must use ACPI for device discovery and
configuration.

-====== SMBIOS
+===== SMBIOS
The System Management BIOS (SMBIOS) table is required for the platform
conforming to server extension. The SMBIOS records provide basic hardware and
firmware configuration information used widely by the platform management
@@ -687,9 +685,12 @@ characteristics and HART hardware features discovered during the firmware boot
process.
|===

-==== Runtime services
+=== Runtime services
+The OS-A Server platform includes all the runtime services requirements as
+specified in the OS-A Common Requirements Runtime Services section plus the
+following.

-===== UEFI
+==== UEFI
The UEFI run time services listed below are required to be implemented.

.Required UEFI Runtime Services
@@ -723,9 +724,12 @@ implemented but it can return EFI_UNSUPPORTED.
implemented but it can return EFI_UNSUPPORTED.
|===

-==== System Peripherals
+=== System Peripherals
+The OS-A Server platform includes all the system peripheral requirements as
+specified in the OS-A Common Requirements System Peripherals section plus
+the added requirements in this section.

-===== Watchdog Timers
+==== Watchdog Timers
Implementation of a two-stage watchdog timer, as defined in the RISC-V Watchdog
Timer Specification<<spec_riscv_watchdog>> is required. Software must
periodically refresh the watchdog timer, otherwise a first-stage watchdog
@@ -747,7 +751,7 @@ targeting a specific hart.

The resultant action taken is platform-specific.

-===== System Date/Time[[SystemDateTime]]
+==== System Date/Time[[SystemDateTime]]
In order to facilitate server manageability, server extension platform is
required to provide the mechanism to maintain system date/time for UEFI
runtime Time service. +
@@ -761,11 +765,11 @@ runtime Time service. +
EFI_UNSUPPORTED if the platform doesn't require the features or the system
date/time mechanism doesn’t have the capabilities.

-===== PCIe
+==== PCIe
Platforms are required to support at least PCIe Base Specification Revision 1.1
<<spec_pcie_sig>>.

-====== PCIe Config Space
+===== PCIe Config Space
* Platforms must support access to the PCIe config space via ECAM as described
in the PCIe Base specification.
* The entire config space for a single PCIe domain must be accessible via a
@@ -777,7 +781,7 @@ supported PCIe domains and map the ECAM I/O region for each domain.
memory attributes are that of a PMA I/O region (i.e. strongly-ordered,
non-cacheable, non-idempotent).

-====== PCIe Memory Space
+===== PCIe Memory Space
Platforms are required to map PCIe address space directly in the system address
space and not have any address translation for outbound accesses from harts or
for inbound accesses to any component in the system address space.
@@ -811,7 +815,7 @@ Such an access control mechanism could be analogous to the per-hart PMP
as described in the RISC-V Privileged Architectures specification.
--

-====== PCIe Interrupts
+===== PCIe Interrupts
* Platforms must support both INTx and MSI/MSI-x interrupts.
* Following are the requirements for INTx:
** For each root port in the system, the platform must map all the INTx
@@ -833,13 +837,13 @@ requests 16 MSI vectors the minimum MSI data value assigned by the platform
software can be 0x10 so that the function can use lower 4 bits to assert each
of the 16 vectors.

-====== PCIe cache coherency
+===== PCIe cache coherency
Memory that is cacheable by harts is not kept coherent by hardware when PCIe
transactions to that memory are marked with a No_Snoop bit of zero. In this
case, software must manage coherency on such memory; otherwise, software
coherency management is not required.

-====== PCIe Topology
+===== PCIe Topology
Platforms are required to implement at least one of the following topologies
and the components required in that topology.

@@ -899,17 +903,16 @@ implemented. RCEC is required to terminate the AER and PME messages from RCiEP.
must be implemented in a separate PCIe domain and must be addressable via a
separate ECAM I/O region.

-===== PCIe Device Firmware Requirement
-PCI expansion ROM code type 3 (UEFI) image must be provided by PCIe device for
-OS/A server extension platform according to PCI Firmware
-Specification <<spec_pci_firmware>> if that PCIe device is utilized during
-UEFI firmware boot process. The image stored in PCI expansion ROM is a UEFI
-driver that must be compliant with UEFI specification <<spec_uefi>> 14.4.2
-PCI Option ROMs.
+===== PCIe Device Firmware
+PCI expansion ROM code type 3 (UEFI) image must be provided by the PCIe device
+according to the PCI Firmware Specification <<spec_pci_firmware>> if that
+PCIe device is utilized during the UEFI firmware boot process. The image stored
+in the PCI expansion ROM is a UEFI driver that must be compliant with the UEFI
+specification <<spec_uefi>> section 14.4.2 PCI Option ROMs.

-
-==== Security
-Platforms must implement the following security features:
+=== Security
+The OS-A Server platform includes all the security requirements as
+specified in the OS-A Common Requirements Security section plus the following:

* Support for some form of Secure Boot, as a means to ensure the integrity of
platform firmware and software, is required. Flexibility is provided as
@@ -942,7 +945,7 @@ transactions that precisely traps if violated.
*** Platform must provide a protection mechanism from I/O agents manipulating
or accessing machine mode assets.

-==== RAS
+=== RAS
All the below mentioned RAS features are required for the OS-A platform server
extension:

--
Regards
Kumar
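The PCIe Interrupts requirement in the patch above (a function that requests 16 MSI vectors may be assigned a minimum MSI data value of 0x10) follows from the multi-message MSI rule that a function may only modify the low log2(N) bits of the MSI data. A minimal Python sketch of that arithmetic; the helper names are hypothetical, and treating data value 0 as unusable is an assumption inferred from the example (in the AIA IMSIC, interrupt identity 0 is not a valid identity).

```python
def aligned_base(first_free: int, num_vectors: int) -> int:
    """Smallest MSI data value >= first_free that is aligned to num_vectors.
    Data value 0 is treated as unusable (assumption, see lead-in)."""
    if num_vectors not in (1, 2, 4, 8, 16, 32):
        raise ValueError("multi-message MSI supports 1, 2, 4, 8, 16 or 32 vectors")
    start = max(first_free, 1)
    return -(-start // num_vectors) * num_vectors   # round up to a multiple


def msi_data(base: int, num_vectors: int, vector: int) -> int:
    """Data the function writes to signal `vector`. It may change only the
    low log2(num_vectors) bits, so OR-ing the vector into an aligned base works."""
    if base % num_vectors:
        raise ValueError("base MSI data must be aligned to num_vectors")
    if not 0 <= vector < num_vectors:
        raise ValueError("vector out of range")
    return base | vector


# Usage: a function requesting 16 vectors, as in the spec text
base = aligned_base(1, 16)
print(hex(base), hex(msi_data(base, 16, 15)))   # 0x10 0x1f
```

Aligning the base is what lets the device use the low 4 bits as a vector index without carrying into bits that software assigned to other functions.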


Re: Platform specification questions

Greg Favor
 

The following two items in Ved's email didn't get any response, so I offer my own below ...

On Sun, Dec 12, 2021 at 4:15 PM Vedvyas Shanbhogue <ved@...> wrote:
Section 2.3.7.3.2 - PCIe memory space:
The requirement to not have any address translation for inbound accesses to any component in system address space is restrictive. If direct assignment of devices is supported, then the IOMMU would be required to do the address translation for inbound accesses. Further, for hart-originated accesses where the PCIe memory is mapped into virtual address space, there needs to be a translation through the first- and/or second-level page tables. Please help clarify why PCIe memory must not be mapped into virtual address space and why use of an IOMMU to do translation is disallowed by the specification.

I think this came from lessons learned in the ARM "server" ecosystem (as later captured in SBSA).  In particular, one wants devices and software on harts to have the same view of system physical address space so that, for example, pointers can be easily passed around.  That doesn't conflict with having address translation by IOMMUs.  Maybe the current text needs to be better worded, but I think the ideas to be expressed are:

For inbound PCIe transactions:

- There should be no hardware modifications of PCIe addresses outside of an IOMMU (as some vendors way back in early ARM SBSA days were wont to do).

- If there is not an IOMMU associated with the PCIe interface, then PCIe devices will have the same view of PA space as the harts.

- If there is an IOMMU associated with the PCIe interface, then system software can trust that all address modifications are under its control via hart page tables and IOMMU page tables.

For outbound PCIe transactions, system software is free to set up VA-to-PA translations in hart page tables.  I think the mandate against outbound address translation was a mistake.  The key point is that there is one common view of system physical address space.  Hart and IOMMU page tables may translate from hart VAs and device addresses to system physical address space, but the above ensures that "standard" system software has full control over this and doesn't have non-standard address transformations happening that it isn't aware of and doesn't know how to control.
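A minimal Python sketch of the inbound rule described above, for illustration only: the `Iommu` class, its single-level table and the 4 KiB page size are hypothetical simplifications, not a model of any RISC-V IOMMU specification. What it captures is that the only place a device address may differ from a system physical address is a translation table that system software programs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

PAGE_SHIFT = 12                      # assumed 4 KiB pages for this sketch
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class Iommu:
    """Hypothetical single-level IOMMU owned by system software:
    device page number -> system physical page number."""
    table: Dict[int, int] = field(default_factory=dict)

    def map(self, iova: int, pa: int) -> None:
        # Only system software installs translations.
        self.table[iova >> PAGE_SHIFT] = pa >> PAGE_SHIFT

    def translate(self, iova: int) -> int:
        ppn = self.table.get(iova >> PAGE_SHIFT)
        if ppn is None:
            raise PermissionError(f"IOMMU fault at {iova:#x}")
        return (ppn << PAGE_SHIFT) | (iova & PAGE_MASK)


def inbound_to_pa(pcie_addr: int, iommu: Optional[Iommu] = None) -> int:
    """System physical address targeted by an inbound PCIe transaction.
    No hardware rewrites the address outside the IOMMU, so without an
    IOMMU the device sees exactly the harts' physical address space."""
    if iommu is None:
        return pcie_addr               # identity: device address == system PA
    return iommu.translate(pcie_addr)  # every change comes from software-owned tables


# Usage
io = Iommu()
io.map(0x1000, 0x8000_0000)
print(hex(inbound_to_pa(0x8000_0040)))   # 0x80000040 (no IOMMU)
print(hex(inbound_to_pa(0x1040, io)))    # 0x80000040 (via IOMMU)
```

Both paths land on the same physical address, which is the "one common view" property: any remapping is visible to, and controlled by, system software.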

 
Section 2.3.7.3.3 - PCIe interrupts:
It seems unnecessary to require platforms built for the '22 version of the platform to support running software that is not MSI-aware. Please clarify why supporting INTx emulation for legacy/pre-PCIe software compatibility is a required rather than an optional capability for RISC-V platforms.

This one seems questionable to me as well, although I'm not the expert to reliably proclaim that INTx support is no longer a necessity in some server-class systems.  I can imagine that back in earlier ARM "server" days this legacy issue was a bigger deal and hence was mandated in SBSA.  But maybe that is no longer an issue?  Or at least not for 2022+ systems, to the point where mandating this legacy support is an unnecessary burden on many, or the majority, of such systems.

If this is a fair view going forward, then the INTx requirements should just become recommendations for systems that do feel the need to care about INTx support.

Greg
 


Re: Platform specification questions

Kumar Sankaran
 

On Tue, Dec 14, 2021 at 7:14 AM Ved Shanbhogue <ved@...> wrote:

On Mon, Dec 13, 2021 at 08:47:51PM -0800, Kumar Sankaran wrote:

So one suggestion is we remove specific errors like single-bit errors,
multi-bit errors and such and limit the features to error handling,
detection and logging/reporting.
So we could drop these statements:
"
- Main memory must be protected with SECDED-ECC.
- All cache structures must be protected.
- single-bit errors must be detected and corrected.
- multi-bit errors can be detected and reported.
"

And change this statement to drop the restriction to "these protected structures":
"There must be memory-mapped RAS registers to log detected errors with information about the type and location of the error"

regards
ved

Yes, fine by me. We can make the changes you have suggested above and
leave the remaining content as is.


--
Regards
Kumar


Re: Platform specification questions

Philipp Tomsich
 

Kumar & Greg,

On Tue, Dec 14, 2021 at 5:48 AM Kumar Sankaran <ksankaran@...> wrote:
On Mon, Dec 13, 2021 at 6:56 PM Greg Favor <gfavor@...> wrote:
>
> On Mon, Dec 13, 2021 at 5:38 PM Ved Shanbhogue <ved@...> wrote:
>>
>> This was one of the sources of my questions. If the platform specification's intent is to specify the SEE, ISA and non-ISA hardware (the hardware/software contract) as visible to software so that a shrink-wrapped operating system can load, then I would say it's not the platform specification's role to teach how to design resilient hardware. If the goal of the platform specification is to teach hardware designers how to design resilient hardware, then I think the specification falls short in many ways... I think you also hit upon that in the next statement.
>
>
> I wouldn't view platform mandates of this sort as teaching, but as establishing a baseline that system integrators can depend on - by guiding the hardware developers as to what that expected baseline is.  (But I get your point.)
>
>>
>> My understanding was the former i.e. establishing the standard for hardware-software interoperability. Specifically in areas of RAS I think where the interoperability is required - e.g. standardized logging/reporting, redirecting reporting to firmware-first, etc. I think should be in the purview.
>
>
> Agreed.
>
> The fundamental question is whether the goal of the platform spec is solely to ensure hardware-software interoperability and not to go further in ensuring other minimum capabilities that compliant platforms will provide.  What should be said and not said about RAS follows from that.
>
> Given that people are leaning towards the more limited scope or goal for the OS-A platforms, then that directly implies that there should be no requirements about what RAS features/coverage/etc. are actually implemented by compliant platforms.
>
> Greg

The intent of the platform spec is hardware-software interoperability.
I agree that dictating RAS hardware features is not within the scope
of the platform spec. However, we do want standards for RAS error
handling, error detection, logging/reporting and such. For example
using APEI to convey error information to OSPM is needed for software
interop.
So one suggestion is we remove specific errors like single-bit errors,
multi-bit errors and such and limit the features to error handling,
detection and logging/reporting.

If the content is worthwhile, please consider putting it in an informative section.  Content, such as discussed, might either become an (inline) application note—or go into a separate informative appendix that dives into the relationship between OS-A and RAS features. 

Philipp.


Re: Platform specification questions

Ved Shanbhogue
 

On Mon, Dec 13, 2021 at 08:47:51PM -0800, Kumar Sankaran wrote:

So one suggestion is we remove specific errors like single-bit errors,
multi-bit errors and such and limit the features to error handling,
detection and logging/reporting.
So we could drop these statements:
"
- Main memory must be protected with SECDED-ECC.
- All cache structures must be protected.
- single-bit errors must be detected and corrected.
- multi-bit errors can be detected and reported.
"

And change this statement to drop the restriction to "these protected structures":
"There must be memory-mapped RAS registers to log detected errors with information about the type and location of the error"

regards
ved
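To illustrate what a standardized logging interface could look like from the software side, here is a minimal Python sketch that decodes one error record. The 24-byte layout, field names and error classes are hypothetical (no RISC-V RAS register format is defined in this thread); the point is that the OS parses the same fields regardless of whether the hardware handled the error with parity, ECC or a refetch.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ErrClass(IntEnum):
    CORRECTED = 0                 # hardware fixed it; log for trend analysis
    UNCORRECTED_CONTAINED = 1     # data lost, but the damage is contained
    UNCORRECTED_FATAL = 2         # system state cannot be trusted


# Hypothetical little-endian record: status u32, class u32, address u64, source id u64
RECORD_FMT = "<IIQQ"
RECORD_SIZE = struct.calcsize(RECORD_FMT)   # 24 bytes, no padding with "<"
STATUS_VALID = 1 << 0


@dataclass
class ErrorRecord:
    err_class: ErrClass
    address: int      # location of the error (e.g. a physical address)
    source_id: int    # which structure reported it (cache, DRAM channel, ...)


def decode_record(raw: bytes) -> Optional[ErrorRecord]:
    """Decode one record read from the (hypothetical) memory-mapped RAS registers."""
    status, cls, addr, src = struct.unpack(RECORD_FMT, raw[:RECORD_SIZE])
    if not status & STATUS_VALID:
        return None                           # nothing logged in this slot
    return ErrorRecord(ErrClass(cls), addr, src)


# Usage: a corrected error logged by structure 3 at PA 0x8000_1000
raw = struct.pack(RECORD_FMT, STATUS_VALID, int(ErrClass.CORRECTED), 0x8000_1000, 3)
print(decode_record(raw))
```

Nothing in the record says how the error was detected or corrected, which matches the proposal to standardize logging and reporting without mandating specific protection schemes.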


Re: Platform specification questions

Kumar Sankaran
 

On Mon, Dec 13, 2021 at 6:56 PM Greg Favor <gfavor@...> wrote:

On Mon, Dec 13, 2021 at 5:38 PM Ved Shanbhogue <ved@...> wrote:

This was one of the sources of my questions. If the platform specification's intent is to specify the SEE, ISA and non-ISA hardware (the hardware/software contract) as visible to software so that a shrink-wrapped operating system can load, then I would say it's not the platform specification's role to teach how to design resilient hardware. If the goal of the platform specification is to teach hardware designers how to design resilient hardware, then I think the specification falls short in many ways... I think you also hit upon that in the next statement.

I wouldn't view platform mandates of this sort as teaching, but as establishing a baseline that system integrators can depend on - by guiding the hardware developers as to what that expected baseline is. (But I get your point.)


My understanding was the former i.e. establishing the standard for hardware-software interoperability. Specifically in areas of RAS I think where the interoperability is required - e.g. standardized logging/reporting, redirecting reporting to firmware-first, etc. I think should be in the purview.

Agreed.

The fundamental question is whether the goal of the platform spec is solely to ensure hardware-software interoperability and not to go further in ensuring other minimum capabilities that compliant platforms will provide. What should be said and not said about RAS follows from that.

Given that people are leaning towards the more limited scope or goal for the OS-A platforms, then that directly implies that there should be no requirements about what RAS features/coverage/etc. are actually implemented by compliant platforms.

Greg

The intent of the platform spec is hardware-software interoperability.
I agree that dictating RAS hardware features is not within the scope
of the platform spec. However, we do want standards for RAS error
handling, error detection, logging/reporting and such. For example
using APEI to convey error information to OSPM is needed for software
interop.
So one suggestion is we remove specific errors like single-bit errors,
multi-bit errors and such and limit the features to error handling,
detection and logging/reporting.

--
Regards
Kumar


Re: Platform specification questions

Greg Favor
 

On Mon, Dec 13, 2021 at 5:38 PM Ved Shanbhogue <ved@...> wrote:
This was one of the sources of my questions. If the platform specification's intent is to specify the SEE, ISA and non-ISA hardware (the hardware/software contract) as visible to software so that a shrink-wrapped operating system can load, then I would say it's not the platform specification's role to teach how to design resilient hardware. If the goal of the platform specification is to teach hardware designers how to design resilient hardware, then I think the specification falls short in many ways... I think you also hit upon that in the next statement.

I wouldn't view platform mandates of this sort as teaching, but as establishing a baseline that system integrators can depend on - by guiding the hardware developers as to what that expected baseline is.  (But I get your point.)
 
My understanding was the former i.e. establishing the standard for hardware-software interoperability. Specifically in areas of RAS I think where the interoperability is required - e.g. standardized logging/reporting, redirecting reporting to firmware-first, etc. I think should be in the purview.

Agreed.

The fundamental question is whether the goal of the platform spec is solely to ensure hardware-software interoperability and not to go further in ensuring other minimum capabilities that compliant platforms will provide.  What should be said and not said about RAS follows from that.

Given that people are leaning towards the more limited scope or goal for the OS-A platforms, then that directly implies that there should be no requirements about what RAS features/coverage/etc. are actually implemented by compliant platforms.

Greg


Re: Platform specification questions

Ved Shanbhogue
 

On Mon, Dec 13, 2021 at 05:11:38PM -0800, Greg Favor wrote:
I think this whole RAS-related topic in the current platform draft was to
establish some form of modest RAS requirement (versus no requirement) until
a proper RAS arch spec exists. Although even then (assuming that arch spec
is like x86 and ARM RAS specs that are just concerned with standardizing
RAS registers for logging and the mechanisms for reporting errors), there
still won't be any minimum requirement for actual error detection and
correction.
I agree. I think the RAS ISA would want to be about standardized error logging and reporting but not mandate what errors are detected/corrected and how they are corrected or contained. For example, even in x86 and ARM space there are many product segments which have varying degrees of resilience but the RAS architecture flexibly covers the full spectrum of implementations between multiple x86 and ARM vendors.

Fundamentally, should the Server platform spec mandate ANY error
detection/correction requirements, or just leave it as a wild west among
hardware developers to individually and eventually figure out where the
line exists as far as the basic needs for RAS in *Server*-compliant
platforms? And leave it for system integrators to discover that some
Server-compliant hardware has less than "basic" RAS?
This was one of the sources of my questions. If the platform specification's intent is to specify the SEE, ISA and non-ISA hardware (the hardware/software contract) as visible to software so that a shrink-wrapped operating system can load, then I would say it's not the platform specification's role to teach how to design resilient hardware. If the goal of the platform specification is to teach hardware designers how to design resilient hardware, then I think the specification falls short in many ways... I think you also hit upon that in the next statement.

BUT if the platform spec is ONLY trying to establish hardware/software
interoperability, and not also match up hardware and software expectations
regarding other areas of functionality such as RAS, then that answers the
question. My own leaning is towards trying to address the latter versus
the narrower view that the only concern is software interoperability. But
I understand the arguments both ways.
My understanding was the former i.e. establishing the standard for hardware-software interoperability. Specifically in areas of RAS I think where the interoperability is required - e.g. standardized logging/reporting, redirecting reporting to firmware-first, etc. I think should be in the purview. Aspects like "every cache must have single-bit error correction" or "must implement SECDED-ECC" may not be necessary to achieve this objective. For example, an implementation may have two levels of caches where instructions may be cached; for the lowest level the implementation may only implement parity, but on an error it refetches from a higher-level cache or DDR where there might be ECC. So for such an implementation, requiring ECC in its instruction cache seems unnecessary: the machine is meeting its FIT rate objectives through other means.
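The parity-plus-refetch scheme in the example above can be sketched in a few lines of Python. This is a toy model under stated assumptions (one 32-bit word per line, a dictionary standing in for an ECC-protected L2 or DDR); the class and method names are hypothetical.

```python
from typing import Dict, Tuple


def parity(word: int) -> int:
    """Even-parity bit of a 32-bit word (1 if the number of set bits is odd)."""
    return bin(word & 0xFFFF_FFFF).count("1") & 1


class ParityICache:
    """Toy instruction cache: each line keeps (word, parity). Lines are never
    dirty, so a detected error is repaired by dropping the line and refetching
    from the next level, which is assumed to be ECC-protected."""

    def __init__(self, backing: Dict[int, int]):
        self.backing = backing                  # stands in for L2/DRAM with ECC
        self.lines: Dict[int, Tuple[int, int]] = {}
        self.refetches = 0                      # parity errors recovered by refetch

    def fetch(self, addr: int) -> int:
        entry = self.lines.get(addr)
        if entry is not None:
            word, p = entry
            if parity(word) == p:
                return word                     # clean hit
            self.refetches += 1                 # parity mismatch: discard the copy
        word = self.backing[addr]               # miss or error: refill from backing store
        self.lines[addr] = (word, parity(word))
        return word

    def inject_bit_flip(self, addr: int, bit: int) -> None:
        """Simulate a single-event upset in a cached word."""
        word, p = self.lines[addr]
        self.lines[addr] = (word ^ (1 << bit), p)


# Usage
cache = ParityICache({0x100: 0x0000_0013})      # 0x00000013 is the RISC-V NOP encoding
cache.fetch(0x100)
cache.inject_bit_flip(0x100, 5)
print(hex(cache.fetch(0x100)), cache.refetches)  # 0x13 1
```

This only works because instruction cache lines are clean copies; parity detects an odd number of flipped bits, and a write-back data cache holding the only copy of modified data could not recover this way.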

regards
ved


Re: Platform specification questions

Greg Favor
 

On Mon, Dec 13, 2021 at 2:22 PM Ved Shanbhogue <ved@...> wrote:
>Mandate:  *At a minimum, caching structures must be protected such that
>single-bit errors are detected and corrected by hardware.*
>
Would a mandate be overreaching, and why limit it to caches then?

This was just trying to mandate a basic requirement and not go as far as requiring protection of all RAM-based structures - which some may view as overreach.  Conversely I can understand that some people can view that "all caching structures" is already an overreach.  
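To make "single-bit errors are detected and corrected" concrete, here is a minimal Python sketch of a SECDED code (an extended Hamming code) over 8 data bits, using 4 Hamming check bits plus 1 overall parity bit. It is illustrative only; DRAM ECC commonly uses a wider variant of the same idea (64 data bits with 8 check bits), and the function names here are hypothetical.

```python
from typing import List, Tuple


def _data_positions(k: int) -> List[int]:
    """1-based codeword positions that carry data: every non-power-of-two."""
    out, pos = [], 1
    while len(out) < k:
        if pos & (pos - 1):          # pos is not a power of two
            out.append(pos)
        pos += 1
    return out


def encode(data: int, k: int = 8) -> List[int]:
    """Extended Hamming (SECDED) codeword as a bit list; index 0 is overall parity."""
    dpos = _data_positions(k)
    n = dpos[-1]
    code = [0] * (n + 1)
    for i, p in enumerate(dpos):
        code[p] = (data >> i) & 1
    r = 1
    while r <= n:                    # check bit r covers every position with bit r set
        code[r] = sum(code[j] for j in range(1, n + 1) if j & r and j != r) & 1
        r <<= 1
    code[0] = sum(code[1:]) & 1      # makes the whole codeword even parity
    return code


def decode(code: List[int], k: int = 8) -> Tuple[str, int]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for j in range(1, n + 1):
        if code[j]:
            syndrome ^= j            # XOR of the positions of all set bits
    overall = sum(code) & 1
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        code[syndrome] ^= 1          # single-bit error at `syndrome` (0 = parity bit)
        status = "corrected"
    else:
        return "uncorrectable", -1   # double-bit error detected (or beyond SECDED)
    data = 0
    for i, p in enumerate(_data_positions(k)):
        data |= code[p] << i
    return status, data


# Usage
word = encode(0xA5)
one = list(word); one[6] ^= 1
two = list(word); two[3] ^= 1; two[10] ^= 1
print(decode(word), decode(one), decode(two))
# ('ok', 165) ('corrected', 165) ('uncorrectable', -1)
```

The overall parity bit is what separates the two cases: an odd number of flips points at a correctable position, while an even number with a non-zero syndrome can only be reported.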

A product may define its reliability goals and may reason that a certain cache need not be protected due to various reasons like the technology in which the product is built, the altitude at which it is supposed to be used, the architectural vulnerability factor computed for that structure, etc.

I fail to understand how we would be adding to or removing from the OS-A platform compatibility goal, which is to be able to boot a shrink-wrapped server operating system, by mandating how a platform implements reliability.

I think this whole RAS-related topic in the current platform draft was to establish some form of modest RAS requirement (versus no requirement) until a proper RAS arch spec exists.  Although even then (assuming that arch spec is like x86 and ARM RAS specs that are just concerned with standardizing RAS registers for logging and the mechanisms for reporting errors), there still won't be any minimum requirement for actual error detection and correction.

Fundamentally, should the Server platform spec mandate ANY error detection/correction requirements, or just leave it as a wild west among hardware developers to individually and eventually figure out where the line exists as far as the basic needs for RAS in Server-compliant platforms?  And leave it for system integrators to discover that some Server-compliant hardware has less than "basic" RAS?

BUT if the platform spec is ONLY trying to establish hardware/software interoperability, and not also match up hardware and software expectations regarding other areas of functionality such as RAS, then that answers the question.  My own leaning is towards trying to address the latter versus the narrower view that the only concern is software interoperability.  But I understand the arguments both ways.

Greg
