Re: [PATCH 1/1] RAS features for OS-A platform server extension


Abner Chang
 



Kumar Sankaran <ksankaran@...> 於 2021年6月16日 週三 上午8:17寫道:
Signed-off-by: Kumar Sankaran <ksankaran@...>
---
 riscv-platform-spec.adoc | 42 ++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/riscv-platform-spec.adoc b/riscv-platform-spec.adoc
index 4c356b8..d779452 100644
--- a/riscv-platform-spec.adoc
+++ b/riscv-platform-spec.adoc
@@ -19,18 +19,6 @@
 // table of contents
 toc::[]

-// document copyright and licensing information
-include::licensing.adoc[]
-
-// changelog for the document
-include::changelog.adoc[]
-
-// Introduction: describe the intent and purpose of the document
-include::introduction.adoc[]
-
-// Profiles: (NB: content from very first version)
-include::profiles.adoc[]
-
 == Introduction
 The platform specification defines a set of platforms that specify requirements
 for interoperability between software and hardware. The platform policy
@@ -68,11 +56,13 @@ The M platform has the following extensions:
 |SBI       | Supervisor Binary Interface
 |UEFI      | Unified Extensible Firmware Interface
 |ACPI      | Advanced Configuration and Power Interface
+|APEI      | ACPI Platform Error Interfaces
 |SMBIOS    | System Management Basic I/O System
 |DTS       | Devicetree source file
 |DTB       | Devicetree binary
 |RVA22     | RISC-V Application 2022
 |EE        | Execution Environment
+|OSPM      | Operating System Power Management
 |RV32GC    | RISC-V 32-bit general purpose ISA described as RV32IMAFDC.
 |RV64GC    | RISC-V 64-bit general purpose ISA described as RV64IMAFDC.
 |===
@@ -87,6 +77,7 @@ The M platform has the following extensions:
 |link:[RVA22 Specification]
                                        | TBD
 |link:https://arm-software.github.io/ebbr/[EBBR Specification]
                                        | v2.0.0-pre1
 |link:https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf[ACPI
Specification]              | v6.4
+|link:https://uefi.org/specs/ACPI/6.4/18_ACPI_Platform_Error_Interfaces/ACPI_PLatform_Error_Interfaces.html[APEI
Specification]              | v6.4
 |link:https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.4.0.pdf[SMBIOS
Specification]    | v3.4.0
 |link:[Platform Policy]
                                        | TBD
 |===
@@ -504,6 +495,30 @@ delegate the virtual supervisor timer interrupt
to 'VS' mode.
 * IOMMU

 ==== RAS
+All the below mentioned RAS features are required for the OS-A platform server
+extension
+
+*  Main memory must be protected with SECDED-ECC +
+*  All cache structures must be protected +
+** single-bit errors must be detected and corrected +
+** multi-bit errors can be detected and reported +
+* There must be memory-mapped RAS registers associated with these protected
+structures to log detected errors with information about the type and location
+of the error +
+* The platform must support the APEI specification to convey all error
+information to OSPM +
+* Correctable errors must be reported by hardware and either be corrected or
+recovered by hardware, transparent to system operation and to software +
+* Hardware must provide status of these correctable errors via RAS registers +
+* Uncorrectable errors must be reported by the hardware via RAS error
+registers for system software to take the needed corrective action +
+* Attempted use of corrupted (uncorrectable) data must result in a precise
+exception on that instruction with a distinguishing custom exception cause
+code +
+* Errors logged in RAS registers must be able to generate an interrupt request
+to the system interrupt controller that may be directed to either M-mode or
+S/HS-mode for firmware-first versus OS-first error reporting +
+* PCIe AER capability is required +

Hi Kumar,
I would like to add something.
In order to support the OEM RAS policy,
- The platform should provide the capability to configure each RAS error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able to choose to expose the error to S/HS mode for further processes or just hide the error from S/HS software. This requires some mechanisms provided by the platform and the mechanism should be protected by M-mode.
- Each RAS error should be able to mask through RAS configuration registers.
- We should also consider triggering RAS error interrupt to TEE which is where the firmware management mode resides.

For PCIe RAS,
- The baseline PCIe error or AER interrupt is able to be morphed to firmware-first interrupt before delivering to H/HS software. This gives firmware a chance to log the error, correct the error or hide the error from S/HS software according to OEM RAS policy.
 
Besides memory and PCIe RAS, do we have RAS errors for the processor/HART? such as IPI error or some CE/UC/UCR to HART locally?

Regards,
Abner

 // M Platform
 == M Platform
@@ -593,6 +608,3 @@ also implement PMP support.
 When PMP is supported it is recommended to include at least 4 regions, although
 if possible more should be supported to allow more flexibility. Hardware
 implementations should aim for supporting at least 16 PMP regions.
-
-// acknowledge all of the contributors
-include::contributors.adoc[]
--
2.21.0





Join tech-unixplatformspec@lists.riscv.org to automatically receive all group messages.