[PATCH 1/1] Cache Coherency and ASID Requirements for OS-A platform

Kumar Sankaran
 

This patch adds the following:
- Cache coherency and ASID requirements
- Interrupt Controller and PMU chapter sub-sections for OS-A base and
  Server Extension

diff --git a/riscv-platform-spec.adoc b/riscv-platform-spec.adoc
index 87ea6d5..b985f50 100644
--- a/riscv-platform-spec.adoc
+++ b/riscv-platform-spec.adoc
@@ -88,8 +88,12 @@ The M platform has the following extensions:
* ISA Profile
** The OS-A platform is required to comply with the RVA22 profile.
* Cache Coherency
-* PMU
-* ASID
+** All HART related caches must be hardware coherent and must appear to
+software as Physically Indexed, Physically Tagged (PIPT) caches
+** Memory accesses by I/O masters can be coherent or non-coherent with respect
+to the HART related caches
+
+==== PMU

==== Debug
The OS-A base platform requirements are -
@@ -287,10 +291,12 @@ base with the additional requirements as below.
==== Architecture
The platforms which conform to server extension are required to implement +

-- RISC-V Hypervisor-level H Instruction-Set Extensions
-- IOMMU with support for memory resident interrupt files
-- PMU
-- ASID
+- RV64 support
+- RISC-V H ISA extension
+- ASID support
+- VMID support
+
+==== PMU

==== Debug
The OS-A server platform requirements are all of the base above plus:
@@ -305,6 +311,8 @@ above.
respect to all harts connected to the DM
* Rationale: Debuggers must be able to view memory coherently

+==== Interrupt Controller
+
==== Boot and Runtime Requirements
===== Firmware
The boot and system firmware for the RV64I server platforms required to be

--
Regards
Kumar


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Kumar Sankaran
 

Patch merged with all the changes requested.

 

Regards

Kumar



Re: [PATCH 1/1] Initial commit of PLIC

Alistair Francis
 

On Sun, 2021-06-20 at 21:32 +0800, Abner Chang wrote:
From: Abner Chang <abner.chang@...>

This is the commit that creates the patch for
wide review in the Platform Spec HSC task group

Signed-off-by: Abner Chang <abner.chang@...>
Reviewed-by: Alistair Francis <alistair.francis@...>

Alistair

---
 riscv-plic.adoc | 306 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 306 insertions(+)
 create mode 100644 riscv-plic.adoc

diff --git a/riscv-plic.adoc b/riscv-plic.adoc
new file mode 100644
index 0000000..b770e0e
--- /dev/null
+++ b/riscv-plic.adoc
@@ -0,0 +1,306 @@
+= *RISC-V Platform-Level Interrupt Controller Specification*
+
+== Copyright and license information
+
+This RISC-V PLIC specification is
+
+[%hardbreaks]
+(C) 2017 Drew Barbier <drew@...>
+(C) 2018-2019 Palmer Dabbelt <palmer@...>
+(C) 2019 Abner Chang, Hewlett Packard Enterprise <abner.chang@...>
+
+It is licensed under the Creative Commons Attribution 4.0 International
+License (CC-BY 4.0).  The full license text is available at
+https://creativecommons.org/licenses/by/4.0/.
+
+== Introduction
+
+This document contains the RISC-V platform-level interrupt controller (PLIC)
+specification, which defines an interrupt controller specifically designed to
+work in the context of RISC-V systems.  The PLIC multiplexes various device
+interrupts onto the external interrupt lines of Hart contexts, with
+hardware support for interrupt priorities. +
+This specification defines the general PLIC architecture and operation parameters.
+A PLIC that claims to be a PLIC-compliant standard PLIC should follow the
+implementations mentioned in the sections below.
+
+.Figure 1 RISC-V PLIC Interrupt Architecture Block Diagram
+image::Images/PLIC.jpg[GitHub,1000,643, link=https://github.com/riscv/riscv-plic-spec/blob/master/Images/PLIC.jpg]
+
+== RISC-V PLIC Operation Parameters
+
+General PLIC operation parameter register blocks are defined in this spec; they are: +
+
+- *Interrupt Priorities registers:* +
+   The interrupt priority for each interrupt source. +
+
+- *Interrupt Pending Bits registers:* +
+   The interrupt pending status of each interrupt source. +
+
+- *Interrupt Enables registers:* +
+   The enablement of each interrupt source for each context. +
+
+- *Priority Thresholds registers:* +
+   The interrupt priority threshold of each context. +
+
+- *Interrupt Claim registers:* +
+   The register used to acquire the interrupt source ID for each context. +
+
+- *Interrupt Completion registers:* +
+   The register used to send an interrupt completion message to the associated gateway. +
+
++
+
+Below is the PLIC Operation Parameter Block Diagram.
+
+.Figure 2 PLIC Operation Parameter Block Diagram
+image::Images/PLICArch.jpg[GitHub, link=https://github.com/riscv/riscv-plic-spec/blob/master/Images/PLICArch.jpg]
+
+== Memory Map
+
+The `base address of PLIC Memory Map` is platform implementation-specific.
+
+*PLIC Memory Map*
+
+       base + 0x000000: Reserved (interrupt source 0 does not exist)
+       base + 0x000004: Interrupt source 1 priority
+       base + 0x000008: Interrupt source 2 priority
+       ...
+       base + 0x000FFC: Interrupt source 1023 priority
+       base + 0x001000: Interrupt Pending bit 0-31
+       ...
+       base + 0x00107C: Interrupt Pending bit 992-1023
+       base + 0x002000: Enable bits for sources 0-31 on context 0
+       base + 0x002004: Enable bits for sources 32-63 on context 0
+       ...
+       base + 0x00207C: Enable bits for sources 992-1023 on context 0
+       base + 0x002080: Enable bits for sources 0-31 on context 1
+       base + 0x002084: Enable bits for sources 32-63 on context 1
+       ...
+       base + 0x0020FC: Enable bits for sources 992-1023 on context 1
+       base + 0x002100: Enable bits for sources 0-31 on context 2
+       base + 0x002104: Enable bits for sources 32-63 on context 2
+       ...
+       base + 0x00217C: Enable bits for sources 992-1023 on context 2
+       ...
+       base + 0x1F1F80: Enable bits for sources 0-31 on context 15871
+       base + 0x1F1F84: Enable bits for sources 32-63 on context 15871
+       ...
+       base + 0x1F1FFC: Enable bits for sources 992-1023 on context 15871
+       ...
+       base + 0x1FFFFC: Reserved
+       base + 0x200000: Priority threshold for context 0
+       base + 0x200004: Claim/complete for context 0
+       base + 0x200008: Reserved
+       ...
+       base + 0x200FFC: Reserved
+       base + 0x201000: Priority threshold for context 1
+       base + 0x201004: Claim/complete for context 1
+       ...
+       base + 0x3FFF000: Priority threshold for context 15871
+       base + 0x3FFF004: Claim/complete for context 15871
+       base + 0x3FFF008: Reserved
+       ...
+       base + 0x3FFFFFC: Reserved
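As an illustration, the memory map above can be expressed as C offset macros; `PLIC_BASE` here is a hypothetical platform-specific base address, not something this spec defines:

[source,c]
----
#include <stdint.h>

#define PLIC_BASE              0x0C000000UL /* assumed; platform-specific */

/* Offsets follow the memory map above. */
#define PLIC_PRIORITY(id)      (PLIC_BASE + 4u * (id))             /* id = 1..1023 */
#define PLIC_PENDING(word)     (PLIC_BASE + 0x1000u + 4u * (word)) /* word = 0..31 */
#define PLIC_ENABLE(ctx, word) (PLIC_BASE + 0x2000u + 0x80u * (ctx) + 4u * (word))
#define PLIC_THRESHOLD(ctx)    (PLIC_BASE + 0x200000u + 0x1000u * (ctx))
#define PLIC_CLAIM(ctx)        (PLIC_BASE + 0x200000u + 0x1000u * (ctx) + 4u)

#define PLIC_REG(addr)         (*(volatile uint32_t *)(uintptr_t)(addr))

/* Example usage: PLIC_REG(PLIC_PRIORITY(7)) = 1; sets source 7 to priority 1. */
----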
+Sections below describe the control register blocks of PLIC operation parameters.
+
+== Register Width
+
+The memory map register width is 32 bits.
+
+== Interrupt Priorities
+
+If the PLIC supports Interrupt Priorities, then each PLIC interrupt source can
+be assigned a priority by writing to its 32-bit memory-mapped `priority`
+register.  A priority value of 0 is reserved to mean "never interrupt" and
+effectively disables the interrupt. Priority 1 is the lowest active priority
+while the maximum level of priority depends on the PLIC implementation. Ties
+between global interrupts of the same priority are broken by the Interrupt ID;
+interrupts with the lowest ID have the highest effective priority. +
+ +
+The base address of the Interrupt Source Priority block within the PLIC Memory
+Map region is fixed at 0x000000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name*| *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Source Priority
+|Interrupt Source Priority #0 to #1023
+|1024 * 4 = 4096(0x1000) bytes
+|This is a contiguous memory block which contains the PLIC Interrupt Source
+Priorities. There are 1024 Interrupt Source Priorities in this memory block.
+Interrupt Source Priority #0 is reserved, which indicates that it does not
+exist.
+|===
+
+*PLIC Interrupt Source Priority Memory Map* +
+
+       0x000000: Reserved (interrupt source 0 does not exist)
+       0x000004: Interrupt source 1 priority
+       0x000008: Interrupt source 2 priority
+       ...
+       0x000FFC: Interrupt source 1023 priority
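For example, a minimal C sketch of assigning a source priority, assuming a hypothetical platform-specific `PLIC_BASE`:

[source,c]
----
#include <stdint.h>

#define PLIC_BASE 0x0C000000UL /* assumed; platform-specific */

/* Set the priority of interrupt source `id` (1..1023).
 * Writing 0 means "never interrupt" and disables the source. */
static void plic_set_priority(uint32_t id, uint32_t priority)
{
    *(volatile uint32_t *)(PLIC_BASE + 4u * id) = priority;
}
----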
+
+== Interrupt Pending Bits
+
+The current status of the interrupt source pending bits in the PLIC core can be
+read from the pending array, organized as 32-bit registers.  The pending bit
+for interrupt ID N is stored in bit (N mod 32) of word (N/32).  Bit 0
+of word 0, which represents the non-existent interrupt source 0, is hardwired
+to zero.
+
+A pending bit in the PLIC core can be cleared by setting the associated enable
+bit then performing a claim. +
+ +
+The base address of the Interrupt Pending Bits block within the PLIC Memory
+Map region is fixed at 0x001000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Pending Bits
+|Interrupt Pending Bit of Interrupt Source #0 to #1023
+|1024 / 8 = 128(0x80) bytes
+|This is a contiguous memory block which contains the PLIC Interrupt Pending
+Bits. Each Interrupt Pending Bit occupies 1 bit of this register block.
+|===
+
+*PLIC Interrupt Pending Bits Memory Map* +
+
+       0x001000: Interrupt Source #0 to #31 Pending Bits
+       ...
+       0x00107C: Interrupt Source #992 to #1023 Pending Bits
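A minimal sketch of the word/bit indexing described above (the pending bit for ID N is bit (N mod 32) of word (N/32)); `PLIC_BASE` is an assumed platform-specific base:

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

#define PLIC_BASE    0x0C000000UL        /* assumed; platform-specific */
#define PLIC_PENDING (PLIC_BASE + 0x1000u)

/* Return the pending status of interrupt source `id`. */
static bool plic_is_pending(uint32_t id)
{
    volatile uint32_t *pending = (volatile uint32_t *)PLIC_PENDING;
    return (pending[id / 32u] >> (id % 32u)) & 1u;
}
----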
+
+
+== Interrupt Enables
+
+Each global interrupt can be enabled by setting the corresponding bit in the
+`enables` register. The `enables` registers are accessed as a contiguous array
+of 32-bit registers, packed the same way as the `pending` bits. Bit 0 of enable
+register 0 represents the non-existent interrupt ID 0 and is hardwired to 0.
+The PLIC has 15872 Interrupt Enable blocks for the contexts. A `context` refers
+to a specific privilege mode in a specific Hart of a specific RISC-V processor
+instance. How the PLIC organizes interrupts for the contexts (Hart and
+privilege mode) is out of the scope of the RISC-V PLIC specification; however,
+it must be specified in the vendor's PLIC specification. +
+ +
+The base address of the Interrupt Enable Bits block within the PLIC Memory Map
+region is fixed at 0x002000. +
+ +
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Enable Bits
+|Interrupt Enable Bit of Interrupt Source #0 to #1023 for 15872 contexts
+|(1024 / 8) * 15872 = 2031616(0x1f0000) bytes
+|This is a contiguous memory block which contains the PLIC Interrupt Enable
+Bits for 15872 contexts. Each Interrupt Enable Bit occupies 1 bit of this
+register block, and there are 15872 Interrupt Enable Bit blocks in total.
+|===
+
+*PLIC Interrupt Enable Bits Memory Map* +
+
+       0x002000: Interrupt Source #0 to #31 Enable Bits on context 0
+       ...
+       0x00207C: Interrupt Source #992 to #1023 Enable Bits on context 0
+       0x002080: Interrupt Source #0 to #31 Enable Bits on context 1
+       ...
+       0x0020FC: Interrupt Source #992 to #1023 Enable Bits on context 1
+       0x002100: Interrupt Source #0 to #31 Enable Bits on context 2
+       ...
+       0x00217C: Interrupt Source #992 to #1023 Enable Bits on context 2
+       0x002180: Interrupt Source #0 to #31 Enable Bits on context 3
+       ...
+       0x0021FC: Interrupt Source #992 to #1023 Enable Bits on context 3
+       ...
+       ...
+       ...
+       0x1F1F80: Interrupt Source #0 to #31 Enable Bits on context 15871
+       ...
+       0x1F1FFC: Interrupt Source #992 to #1023 Enable Bits on context 15871
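A sketch of enabling one source for one context, following the 0x80-byte per-context stride above; `PLIC_BASE` is an assumed platform-specific base:

[source,c]
----
#include <stdint.h>

#define PLIC_BASE   0x0C000000UL          /* assumed; platform-specific */
#define PLIC_ENABLE (PLIC_BASE + 0x2000u) /* enable bits start here     */

/* Enable interrupt source `id` for `context`; each context owns a
 * 0x80-byte block of 32-bit enable words. */
static void plic_enable_source(uint32_t context, uint32_t id)
{
    volatile uint32_t *enables =
        (volatile uint32_t *)(PLIC_ENABLE + 0x80u * context);
    enables[id / 32u] |= 1u << (id % 32u);
}
----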
+== Priority Thresholds
+
+The PLIC provides a context-based `threshold` register for the setting of an
+interrupt priority threshold for each context. The `threshold` register is a
+WARL field. The PLIC will mask all PLIC interrupts with a priority less than
+or equal to `threshold`.  For example, a `threshold` value of zero permits all
+interrupts with non-zero priority. +
+ +
+The base address of the Priority Thresholds register block is located at a 4K
+alignment starting from offset 0x200000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Priority Threshold
+|Priority Threshold for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register of the Priority Threshold setting for each context
+|===
+
+*PLIC Interrupt Priority Thresholds Memory Map* +
+
+       0x200000: Priority threshold for context 0
+       0x201000: Priority threshold for context 1
+       0x202000: Priority threshold for context 2
+       0x203000: Priority threshold for context 3
+       ...
+       ...
+       ...
+       0x3FFF000: Priority threshold for context 15871
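A sketch of programming a context's threshold (interrupts with priority less than or equal to the threshold are masked for that context); `PLIC_BASE` is an assumed platform-specific base:

[source,c]
----
#include <stdint.h>

#define PLIC_BASE           0x0C000000UL /* assumed; platform-specific */
#define PLIC_THRESHOLD(ctx) (PLIC_BASE + 0x200000u + 0x1000u * (ctx))

/* Mask all interrupts with priority <= `threshold` for `context`.
 * A threshold of 0 permits all interrupts with non-zero priority. */
static void plic_set_threshold(uint32_t context, uint32_t threshold)
{
    *(volatile uint32_t *)PLIC_THRESHOLD(context) = threshold;
}
----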
+== Interrupt Claim Process
+
+A target (Hart context) can perform an interrupt claim by reading the
+`claim/complete` register, which returns the ID of the highest priority pending
+interrupt or zero if there is no pending interrupt.  A successful claim will
+also atomically clear the corresponding pending bit on the interrupt source. +
+A claim can be performed at any time, and the claim operation is not affected
+by the setting of the priority threshold register. +
+The Interrupt Claim Process register is context based and is located at
+(4K alignment + 4) starting from offset 0x200000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Claim Register
+|Interrupt Claim Process for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register used to acquire the interrupt ID for each context
+|===
+
+*PLIC Interrupt Claim Process Memory Map* +
+
+       0x200004: Interrupt Claim Process for context 0
+       0x201004: Interrupt Claim Process for context 1
+       0x202004: Interrupt Claim Process for context 2
+       0x203004: Interrupt Claim Process for context 3
+       ...
+       ...
+       ...
+       0x3FFF004: Interrupt Claim Process for context 15871
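A sketch of the claim read, which returns 0 when nothing is pending; `PLIC_BASE` is an assumed platform-specific base:

[source,c]
----
#include <stdint.h>

#define PLIC_BASE       0x0C000000UL /* assumed; platform-specific */
#define PLIC_CLAIM(ctx) (PLIC_BASE + 0x200000u + 0x1000u * (ctx) + 4u)

/* Claim the highest-priority pending interrupt for `context`.
 * The read atomically clears the source's pending bit; 0 means
 * no interrupt was pending. */
static uint32_t plic_claim(uint32_t context)
{
    return *(volatile uint32_t *)PLIC_CLAIM(context);
}
----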
+== Interrupt Completion
+
+A target signals it has completed executing an interrupt handler by writing
+the interrupt ID it received from the claim to the `claim/complete` register.
+The PLIC does not check whether the completion ID is the same as the last
+claim ID for that target.  If the completion ID does not match an interrupt
+source that is currently enabled for the target, the completion is silently
+ignored. +
+The Interrupt Completion registers are context based and located at the same
+address as the Interrupt Claim Process register, which is at (4K alignment + 4)
+starting from offset 0x200000.
+ +
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Completion Register
+|Interrupt Completion for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register written to complete the interrupt process
+|===
+
+*PLIC Interrupt Completion Memory Map* +
+
+       0x200004: Interrupt Completion for context 0
+       0x201004: Interrupt Completion for context 1
+       0x202004: Interrupt Completion for context 2
+       0x203004: Interrupt Completion for context 3
+       ...
+       ...
+       ...
+       0x3FFF004: Interrupt Completion for context 15871
+
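Putting the claim and completion steps together, a minimal per-context handler loop might look like the sketch below; `PLIC_BASE` and `handle_source()` are assumptions for illustration:

[source,c]
----
#include <stdint.h>

#define PLIC_BASE                0x0C000000UL /* assumed; platform-specific */
#define PLIC_CLAIM_COMPLETE(ctx) \
    (*(volatile uint32_t *)(PLIC_BASE + 0x200000u + 0x1000u * (ctx) + 4u))

extern void handle_source(uint32_t id); /* hypothetical device handler */

/* Drain and service all pending interrupts for `context`. */
void plic_external_interrupt_handler(uint32_t context)
{
    uint32_t id;

    while ((id = PLIC_CLAIM_COMPLETE(context)) != 0) {
        handle_source(id);
        /* Completion: write the claimed ID back to the same register. */
        PLIC_CLAIM_COMPLETE(context) = id;
    }
}
----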


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor
 

On Wed, Jun 23, 2021 at 8:25 PM Abner Chang <renba.chang@...> wrote:
Please review the below sentence. 
If the RAS event is configured as the firmware-first model, the platform should be able to trigger the highest-priority M-mode interrupt to all HARTs in the physical RV processor. This prevents subsequent RAS errors from being propagated by other HARTs that access the problematic hardware (PCIe, memory, I/O, etc.)

Note that the priority of any RAS interrupts would be software configurable in the interrupt controller.  Also note that there are other common techniques for preventing the propagation of errors and for isolating the impact of errors (e.g. precise hart exceptions on attempted use of corrupted data, data poisoning, I/O flow termination, ...).
 
One question:
Besides those RAS events that come from the interrupt controller,

In a typical enterprise-class RAS architecture, "error events" are logged in RAS registers, which then optionally generate RAS interrupt requests.  These then go to the system interrupt controller, which prioritizes and routes requests to appropriate harts.  
 
how about the HART or Memory RAS events?

One would typically have RAS registers (for logging and reporting errors) spread around the system, ideally at all points in the system where errors can be detected and at all points where corrupted data can be consumed.  
 
Are those RAS events in the scope of exceptions, or would they also be routed to the interrupt controller?

RAS errors generally result in RAS interrupts, but when a hart tries to consume corrupted data, the ideal RAS behavior is for the hart to take a precise exception on the load instruction that is trying to consume corrupted data.
 
Or do we not have to worry about this because the RAS TG will have the solution?

All this would be covered by a proper RAS architecture (to hopefully be developed by a TG next year).

Greg 


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor
 

On Wed, Jun 23, 2021 at 7:17 PM Abner Chang <renba.chang@...> wrote:
Do we need to define the RAS error signals output to the interrupt controller? (The signals could be classified by error severity, such as CE, UC_FATAL, or UC_NONFATAL, or by RAS error category, such as RAS_MEM_ERROR, RAS_IO_ERROR, etc.)

This just starts down the path of defining a small bit of a RAS architecture - which we shouldn't do until a full RAS architecture is developed (next year).
 
I think we can just leave it to the RAS TG because we just define what the server platform needs for RAS, right?

Exactly.
 
Without the hardware signal to trigger TEE, the alternative would be triggering the M-mode exception and jumping to TEE in the M-mode exception handler?
So the scenario of triggering TEE would be,
For the software management mode interface:
     S-mode -> sbi ecall to M-mode -> TEE jump vector -> TEE

Effectively the same as with ARM.
 
For the hardware management mode interface:
Hardware interrupt -> M-mode handler -> TEE jump vector -> TEE
What firmware or software resides in TEE is implementation-specific. For example on edk2, we will load the management mode core into TEE.
I am just trying to get more understanding of the future design of TEE on RV.

I think the tech-tee TG has done some work around TEE, but I'm not sure what (and certainly there isn't anything heading to ratification this year).

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Abner Chang
 



Kumar Sankaran <ksankaran@...> wrote on Thursday, June 24, 2021 at 5:11 AM:
On Wed, Jun 23, 2021 at 9:00 AM Greg Favor <gfavor@...> wrote:
>
> On Wed, Jun 23, 2021 at 7:59 AM Abner Chang <renba.chang@...> wrote:
>>>
>>> Yes.  Which is just a software matter of configuring the interrupt controller accordingly.
>>
>> Does this mean the interrupt controller would integrate all RAS events (HART, PCI, I/O, memory and etc.)?
>> Or there would be a separate hardware box that manages all RAS error events, and maybe some error signals output from that box and connected to the interrupt controller? The interrupt controller just provides the mechanism to morph those error signals to FFM or OSF interrupt?
>
>
> To the extent that "RAS interrupts" are literally that, i.e. interrupt request signals, then they go to the system interrupt controller just like all other interrupt request signals.  (Some system designs might also have a "platform microcontroller" that has its own local interrupt controller and may receive some of these interrupt request signals.)
>
> Maybe part of what you're trying to get at is that RAS error events in many architectures get logged in and reported from hardware RAS registers.  RAS registers "report" errors by outputting RAS interrupt request signals.  Software then comes back around and reads the RAS registers to gather info about logged errors.
>
>>>>
>>>> Can we summarize the requirement to
>>>>
>>>> - RAS errors should be capable of interrupting TEE.
>>
>> This is ok for now because there is no hardware signal defined for triggering TEE right? I have more comments on this below.
>
>
> I expect RV will have similarities to ARM in this matter - and ARM doesn't have a hardware signal defined for triggering TEE either (and hasn't felt the need to define such).
>
>>>
>>>
>>> This implies a requirement to have a TEE - and defining what constitutes a compliant TEE in the platform spec.  Btw, what distinguishes the TEE from "firmware"?
>>
>> Please correct me on ARM part if I am wrong.
>> The equivalent mechanism to TEE is SMM on X86 and TZ on ARM. I don't quite understand how ARM TZ works, however on X86 system, all cores are brought to SMM environment when SMI is triggered. ARM has the equivalent event which is SMC, right?
>
>
> Neither ARM nor RISC-V has a direct equivalent of SMM.  So I'll pick on what ARM has - which is rather like RV.  At a hardware level ARM has EL3 and Secure ELx, and RV as M-mode and secure partitions of S/U-mode (using PMP).  At a software level one has a Secure monitor running in EL3/M-mode and tbd whether other parts run in SELx/partitions.  TZ as a TEE is a combination of these hardware features and the secure software that runs on it.  ARM TZ doesn't specify the actual software TEE, it just provides the hardware architectural features and framework for creating and running a TEE.  There is no one standard ARM TEE (although ARM has developed their ATF as a reference secure boot flow; although maybe it has expanded in scope in recent years?).
>
> In short, RV first needs to define, develop, and specify a software TEE.  The hardware components are falling into place (e.g. PMP, ePMP, Zkr), and OpenSBI is working towards supporting secure partitions.  So, until there is a concrete RISC-V TEE standard (or even a standard framework), we shouldn't be stating requirements tied with having a TEE.  Also keep in mind that things like secure boot will be required in the Server extension - which is part of the overall topic of TEE.
>
>>
>> The above is called management mode (MM) which is defined in the UEFI PI spec. MM has a higher privilege than CR0 on X86 and EL3 on ARM. The MM is OS agnostic and the MM event halts any processes and gets the core into management mode to run the firmware code. The environment of MM (data and code) can only be accessed when the core is in MM. Firmware always uses this for the secure stuff, power management, and of course the RAS.
>
>
> What you describe, for RV, is M-mode - a pretty direct analog of ARM EL3.
>
>>
>>
>> I would like to add one more thing to the RAS requirement but I don't know how to describe it properly because it seems we don't have an MM event on RISC-V, such as SMI or SMC, which can bring the system to MM.
>
>
> RV has ECALL, just like ARM has SMC.
>
>>
>> So there are two scenarios for RAS on the firmware first model.
>> - If the platform doesn't have TEE and the hardware event to trigger TEE:
>>   If the RAS event is configured to firmware-first mode, the platform should be able to trigger an M-mode exception to all harts in the physical processor. This prevents subsequent RAS errors from being propagated by other harts that access the problematic hardware (PCI, memory, etc.)
>>
>> - If the platform has TEE and the hardware event to trigger TEE:
>>     If the RAS event is configured to firmware-first mode, the platform should be able to trigger a TEE event to all harts in the physical processor and bring all harts into TEE. This prevents subsequent RAS errors from being propagated by other cores that access the problematic hardware (PCI, memory, etc.)
>
>
> I think part of what complicates this discussion is the nebulous nature of what exactly is the "TEE" in any given architecture.  At a hardware level x86/ARM/RV have SMM/EL3/M-mode and they have ways to "call" into that secure environment.  The software TEE architecture is what is rather nebulous.  There isn't a standard software TEE architecture for x86; RV doesn't have something (yet), and ARM has just ATF (which one may or may not fully equate to being a "TEE").
>
> Greg
>

Given where we are currently with the lack of a proper definition for
TEE, I suggest we simply remove the requirement for TEE for now and
add it later when the TEE spec is finalized.
Suggest we remove the line "RAS errors should be capable of
interrupting TEE" and leave it at that.
I agree with this Kumar.

Please review the below sentence. 
If the RAS event is configured as the firmware-first model, the platform should be able to trigger the highest-priority M-mode interrupt to all HARTs in the physical RV processor. This prevents subsequent RAS errors from being propagated by other HARTs that access the problematic hardware (PCIe, memory, I/O, etc.)

One question:
Besides those RAS events that come from the interrupt controller, how about the HART or Memory RAS events? Are those RAS events in the scope of exceptions, or would they also be routed to the interrupt controller? Or do we not have to worry about this because the RAS TG will have the solution?

Abner
 

--
Regards
Kumar


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Abner Chang
 



Greg Favor <gfavor@...> wrote on Thursday, June 24, 2021 at 12:00 AM:
On Wed, Jun 23, 2021 at 7:59 AM Abner Chang <renba.chang@...> wrote:
Yes.  Which is just a software matter of configuring the interrupt controller accordingly.
Does this mean the interrupt controller would integrate all RAS events (HART, PCI, I/O, memory and etc.)? 
Or there would be a separate hardware box that manages all RAS error events, and maybe some error signals output from that box and connected to the interrupt controller? The interrupt controller just provides the mechanism to morph those error signals to FFM or OSF interrupt?

To the extent that "RAS interrupts" are literally that, i.e. interrupt request signals, then they go to the system interrupt controller just like all other interrupt request signals.  (Some system designs might also have a "platform microcontroller" that has its own local interrupt controller and may receive some of these interrupt request signals.)

Maybe part of what you're trying to get at is that RAS error events in many architectures get logged in and reported from hardware RAS registers.  RAS registers "report" errors by outputting RAS interrupt request signals.  Software then comes back around and reads the RAS registers to gather info about logged errors.
Yes, something like that.

Do we need to define the RAS error signals output to the interrupt controller? (The signals could be classified by error severity, such as CE, UC_FATAL, or UC_NONFATAL, or by RAS error category, such as RAS_MEM_ERROR, RAS_IO_ERROR, etc.)
I think we can just leave it to the RAS TG because we just define what the server platform needs for RAS, right?
  
Can we summarize the requirement to
- RAS errors should be capable of interrupting TEE.
This is ok for now because there is no hardware signal defined for triggering TEE right? I have more comments on this below. 

I expect RV will have similarities to ARM in this matter - and ARM doesn't have a hardware signal defined for triggering TEE either (and hasn't felt the need to define such).
Ok, I thought there was a similar hardware signal.

Without the hardware signal to trigger TEE, the alternative would be triggering the M-mode exception and jumping to TEE in the M-mode exception handler?
So the scenario of triggering TEE would be,
For the software management mode interface:
     S-mode -> sbi ecall to M-mode -> TEE jump vector -> TEE
For the hardware management mode interface:
Hardware interrupt -> M-mode handler -> TEE jump vector -> TEE
What firmware or software resides in TEE is implementation-specific. For example on edk2, we will load the management mode core into TEE.
I am just trying to get more understanding of the future design of TEE on RV.

 

This implies a requirement to have a TEE - and defining what constitutes a compliant TEE in the platform spec.  Btw, what distinguishes the TEE from "firmware"?
Please correct me on ARM part if I am wrong.
The equivalent mechanism to TEE is SMM on X86 and TZ on ARM. I don't quite understand how ARM TZ works, however on X86 system, all cores are brought to SMM environment when SMI is triggered. ARM has the equivalent event which is SMC, right?

Neither ARM nor RISC-V has a direct equivalent of SMM.  So I'll pick on what ARM has - which is rather like RV.  At a hardware level ARM has EL3 and Secure ELx, and RV as M-mode and secure partitions of S/U-mode (using PMP).  At a software level one has a Secure monitor running in EL3/M-mode and tbd whether other parts run in SELx/partitions.  TZ as a TEE is a combination of these hardware features and the secure software that runs on it.  ARM TZ doesn't specify the actual software TEE, it just provides the hardware architectural features and framework for creating and running a TEE.  There is no one standard ARM TEE (although ARM has developed their ATF as a reference secure boot flow; although maybe it has expanded in scope in recent years?).

In short, RV first needs to define, develop, and specify a software TEE.  The hardware components are falling into place (e.g. PMP, ePMP, Zkr), and OpenSBI is working towards supporting secure partitions.  So, until there is a concrete RISC-V TEE standard (or even a standard framework), we shouldn't be stating requirements tied with having a TEE.  Also keep in mind that things like secure boot will be required in the Server extension - which is part of the overall topic of TEE.
Thanks for the above explanation. 
 
The above is called management mode (MM) which is defined in the UEFI PI spec. MM has a higher privilege than CR0 on X86 and EL3 on ARM. The MM is OS agnostic and the MM event halts any processes and gets the core into management mode to run the firmware code. The environment of MM (data and code) can only be accessed when the core is in MM. Firmware always uses this for the secure stuff, power management, and of course the RAS.

What you describe, for RV, is M-mode - a pretty direct analog of ARM EL3.
 

I would like to add one more thing to the RAS requirement but I don't know how to describe it properly because it seems we don't have an MM event on RISC-V, such as SMI or SMC, which can bring the system to MM.

RV has ECALL, just like ARM has SMC.
Thanks for the correction. I thought SMC was the hardware signal.
 
So there are two scenarios for RAS on the firmware-first model.
- If the platform doesn't have TEE and the hardware event to trigger TEE:
  If the RAS event is configured to firmware-first mode, the platform should be able to trigger an M-mode exception to all harts in the physical processor. This prevents subsequent RAS errors from being propagated by other harts that access the problematic hardware (PCI, memory, etc.)

- If the platform has TEE and the hardware event to trigger TEE:
    If the RAS event is configured to firmware-first mode, the platform should be able to trigger a TEE event to all harts in the physical processor and bring all harts into TEE. This prevents subsequent RAS errors from being propagated by other cores that access the problematic hardware (PCI, memory, etc.)

I think part of what complicates this discussion is the nebulous nature of what exactly is the "TEE" in any given architecture.  At a hardware level x86/ARM/RV have SMM/EL3/M-mode and they have ways to "call" into that secure environment.  The software TEE architecture is what is rather nebulous.  There isn't a standard software TEE architecture for x86; RV doesn't have something (yet), and ARM has just ATF (which one may or may not fully equate to being a "TEE").
Agreed. 

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Kumar Sankaran
 

On Wed, Jun 23, 2021 at 9:00 AM Greg Favor <gfavor@...> wrote:

On Wed, Jun 23, 2021 at 7:59 AM Abner Chang <renba.chang@...> wrote:

Yes. Which is just a software matter of configuring the interrupt controller accordingly.
Does this mean the interrupt controller would integrate all RAS events (HART, PCI, I/O, memory and etc.)?
Or there would be a separate hardware box that manages all RAS error events, and maybe some error signals output from that box and connected to the interrupt controller? The interrupt controller just provides the mechanism to morph those error signals to FFM or OSF interrupt?

To the extent that "RAS interrupts" are literally that, i.e. interrupt request signals, then they go to the system interrupt controller just like all other interrupt request signals. (Some system designs might also have a "platform microcontroller" that has its own local interrupt controller and may receive some of these interrupt request signals.)

Maybe part of what you're trying to get at is that RAS error events in many architectures get logged in and reported from hardware RAS registers. RAS registers "report" errors by outputting RAS interrupt request signals. Software then comes back around and reads the RAS registers to gather info about logged errors.


Can we summarize the requirement to

- RAS errors should be capable of interrupting TEE.
This is ok for now because there is no hardware signal defined for triggering TEE right? I have more comments on this below.

I expect RV will have similarities to ARM in this matter - and ARM doesn't have a hardware signal defined for triggering TEE either (and hasn't felt the need to define such).



This implies a requirement to have a TEE - and defining what constitutes a compliant TEE in the platform spec. Btw, what distinguishes the TEE from "firmware"?
Please correct me on ARM part if I am wrong.
The equivalent mechanism to TEE is SMM on X86 and TZ on ARM. I don't quite understand how ARM TZ works, however on X86 system, all cores are brought to SMM environment when SMI is triggered. ARM has the equivalent event which is SMC, right?

Neither ARM nor RISC-V has a direct equivalent of SMM. So I'll pick on what ARM has - which is rather like RV. At a hardware level ARM has EL3 and Secure ELx, and RV as M-mode and secure partitions of S/U-mode (using PMP). At a software level one has a Secure monitor running in EL3/M-mode and tbd whether other parts run in SELx/partitions. TZ as a TEE is a combination of these hardware features and the secure software that runs on it. ARM TZ doesn't specify the actual software TEE, it just provides the hardware architectural features and framework for creating and running a TEE. There is no one standard ARM TEE (although ARM has developed their ATF as a reference secure boot flow; although maybe it has expanded in scope in recent years?).

In short, RV first needs to define, develop, and specify a software TEE. The hardware components are falling into place (e.g. PMP, ePMP, Zkr), and OpenSBI is working towards supporting secure partitions. So, until there is a concrete RISC-V TEE standard (or even a standard framework), we shouldn't be stating requirements tied with having a TEE. Also keep in mind that things like secure boot will be required in the Server extension - which is part of the overall topic of TEE.


The above is called management mode (MM) which is defined in the UEFI PI spec. MM has a higher privilege than CR0 on X86 and EL3 on ARM. The MM is OS agnostic and the MM event halts any processes and gets the core into management mode to run the firmware code. The environment of MM (data and code) can only be accessed when the core is in MM. Firmware always uses this for the secure stuff, power management, and of course the RAS.

What you describe, for RV, is M-mode - a pretty direct analog of ARM EL3.



I would like to add one more thing to the RAS requirement but I don't know how to describe it properly because it seems we don't have an MM event on RISC-V, such as SMI or SMC, which can bring the system to MM.

RV has ECALL, just like ARM has SMC.


So there are two scenarios for RAS on the firmware-first model.
- If the platform doesn't have TEE and the hardware event to trigger TEE:
If the RAS event is configured to firmware-first mode, the platform should be able to trigger an M-mode exception to all harts in the physical processor. This prevents subsequent RAS errors from being propagated by other harts that access the problematic hardware (PCI, memory, etc.)

- If the platform has TEE and the hardware event to trigger TEE:
If the RAS event is configured to firmware-first mode, the platform should be able to trigger a TEE event to all harts in the physical processor and bring all harts into TEE. This prevents subsequent RAS errors from being propagated by other cores that access the problematic hardware (PCI, memory, etc.)

I think part of what complicates this discussion is the nebulous nature of what exactly is the "TEE" in any given architecture. At a hardware level x86/ARM/RV have SMM/EL3/M-mode and they have ways to "call" into that secure environment. The software TEE architecture is what is rather nebulous. There isn't a standard software TEE architecture for x86; RV doesn't have something (yet), and ARM has just ATF (which one may or may not fully equate to being a "TEE").

Greg
Given where we are currently with the lack of a proper definition for
TEE, I suggest we simply remove the requirement for TEE for now and
add it later when the TEE spec is finalized.
Suggest we remove the line "RAS errors should be capable of
interrupting TEE" and leave it at that.

--
Regards
Kumar


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor
 

On Wed, Jun 23, 2021 at 7:59 AM Abner Chang <renba.chang@...> wrote:
Yes.  Which is just a software matter of configuring the interrupt controller accordingly.
Does this mean the interrupt controller would integrate all RAS events (HART, PCI, I/O, memory and etc.)? 
Or there would be a separate hardware box that manages all RAS error events, and maybe some error signals output from that box and connected to the interrupt controller? The interrupt controller just provides the mechanism to morph those error signals to FFM or OSF interrupt?

To the extent that "RAS interrupts" are literally that, i.e. interrupt request signals, then they go to the system interrupt controller just like all other interrupt request signals.  (Some system designs might also have a "platform microcontroller" that has its own local interrupt controller and may receive some of these interrupt request signals.)

Maybe part of what you're trying to get at is that RAS error events in many architectures get logged in and reported from hardware RAS registers.  RAS registers "report" errors by outputting RAS interrupt request signals.  Software then comes back around and reads the RAS registers to gather info about logged errors.
 
Can we summarize the requirement to
- RAS errors should be capable of interrupting TEE.
This is ok for now because there is no hardware signal defined for triggering TEE right? I have more comments on this below. 

I expect RV will have similarities to ARM in this matter - and ARM doesn't have a hardware signal defined for triggering TEE either (and hasn't felt the need to define such).
 

This implies a requirement to have a TEE - and defining what constitutes a compliant TEE in the platform spec.  Btw, what distinguishes the TEE from "firmware"?
Please correct me on ARM part if I am wrong.
The equivalent mechanism to TEE is SMM on X86 and TZ on ARM. I don't quite understand how ARM TZ works, however on X86 system, all cores are brought to SMM environment when SMI is triggered. ARM has the equivalent event which is SMC, right?

Neither ARM nor RISC-V has a direct equivalent of SMM.  So I'll pick on what ARM has - which is rather like RV.  At a hardware level ARM has EL3 and Secure ELx, and RV as M-mode and secure partitions of S/U-mode (using PMP).  At a software level one has a Secure monitor running in EL3/M-mode and tbd whether other parts run in SELx/partitions.  TZ as a TEE is a combination of these hardware features and the secure software that runs on it.  ARM TZ doesn't specify the actual software TEE, it just provides the hardware architectural features and framework for creating and running a TEE.  There is no one standard ARM TEE (although ARM has developed their ATF as a reference secure boot flow; although maybe it has expanded in scope in recent years?).

In short, RV first needs to define, develop, and specify a software TEE.  The hardware components are falling into place (e.g. PMP, ePMP, Zkr), and OpenSBI is working towards supporting secure partitions.  So, until there is a concrete RISC-V TEE standard (or even a standard framework), we shouldn't be stating requirements tied with having a TEE.  Also keep in mind that things like secure boot will be required in the Server extension - which is part of the overall topic of TEE.
 
The above is called management mode (MM) which is defined in the UEFI PI spec. MM has a higher privilege than CR0 on X86 and EL3 on ARM. The MM is OS agnostic and the MM event halts any processes and gets the core into management mode to run the firmware code. The environment of MM (data and code) can only be accessed when the core is in MM. Firmware always uses this for the secure stuff, power management, and of course the RAS.

What you describe, for RV, is M-mode - a pretty direct analog of ARM EL3.
 

I would like to add one more thing to the RAS requirement but I don't know how to describe it properly because it seems we don't have an MM event on RISC-V, such as SMI or SMC, which can bring the system to MM.

RV has ECALL, just like ARM has SMC.
 
So there are two scenarios for RAS on the firmware-first model.
- If the platform doesn't have TEE and the hardware event to trigger TEE:
  If the RAS event is configured to firmware-first mode, the platform should be able to trigger an M-mode exception to all harts in the physical processor. This prevents subsequent RAS errors from being propagated by other harts that access the problematic hardware (PCI, memory, etc.)

- If the platform has TEE and the hardware event to trigger TEE:
    If the RAS event is configured to firmware-first mode, the platform should be able to trigger a TEE event to all harts in the physical processor and bring all harts into TEE. This prevents subsequent RAS errors from being propagated by other cores that access the problematic hardware (PCI, memory, etc.)

I think part of what complicates this discussion is the nebulous nature of what exactly is the "TEE" in any given architecture.  At a hardware level x86/ARM/RV have SMM/EL3/M-mode and they have ways to "call" into that secure environment.  The software TEE architecture is what is rather nebulous.  There isn't a standard software TEE architecture for x86; RV doesn't have something (yet), and ARM has just ATF (which one may or may not fully equate to being a "TEE").

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Abner Chang
 



Greg Favor <gfavor@...> wrote on Wednesday, June 23, 2021 at 9:51 AM:
On Tue, Jun 22, 2021 at 5:34 PM Kumar Sankaran <ksankaran@...> wrote:
I think the primary requirements here are the following:
- The platform should provide the capability to configure each RAS
error to trigger firmware-first or OS-first error interrupt.
Agreed. 

Yes.  Which is just a software matter of configuring the interrupt controller accordingly.
Does this mean the interrupt controller would integrate all RAS events (HART, PCI, I/O, memory and etc.)? 
Or there would be a separate hardware box that manages all RAS error events, and maybe some error signals output from that box and connected to the interrupt controller? The interrupt controller just provides the mechanism to morph those error signals to FFM or OSF interrupt?
 
- If the RAS error is handled by firmware, the firmware should be able
to choose to expose the error to S/HS mode for further processes or
just hide the error from S/HS software.
Is there a need to provide all the other details?

Agreed.  The details and mechanics don't need to be discussed (unless they are mandating specific mechanics - which I don't believe is the case). 
Agreed. 

> Yes, to mask the RAS error interrupt, or even not to create the log (in RAS status registers or a CSR), for errors that the OEM doesn't consider useful or important to the product.

This is fine

Maybe just say that "Logging and/or reporting of errors can be masked".
Agreed.

 
Can we summarize the requirement to
- RAS errors should be capable of interrupting TEE.
This is ok for now because there is no hardware signal defined for triggering TEE right? I have more comments on this below. 

This implies a requirement to have a TEE - and defining what constitutes a compliant TEE in the platform spec.  Btw, what distinguishes the TEE from "firmware"?
Please correct me on ARM part if I am wrong.
The equivalent mechanism to TEE is SMM on X86 and TZ on ARM. I don't quite understand how ARM TZ works; however, on X86 systems, all cores are brought into the SMM environment when an SMI is triggered. ARM has the equivalent event, which is SMC, right?
The above is called management mode (MM) which is defined in the UEFI PI spec. MM has a higher privilege than CR0 on X86 and EL3 on ARM. The MM is OS agnostic and the MM event halts any processes and gets the core into management mode to run the firmware code. The environment of MM (data and code) can only be accessed when the core is in MM. Firmware always uses this for the secure stuff, power management, and of course the RAS.

I would like to add one more thing to the RAS requirement but I don't know how to describe it properly because it seems we don't have an MM event on RISC-V, such as SMI or SMC, which can bring the system to MM. So there are two scenarios for RAS on the firmware-first model.
- If the platform doesn't have TEE and the hardware event to trigger TEE:
  If the RAS event is configured to firmware-first mode, the platform should be able to trigger an M-mode exception to all harts in the physical processor. This prevents subsequent RAS errors from being propagated by other harts that access the problematic hardware (PCI, memory, etc.)

- If the platform has TEE and the hardware event to trigger TEE:
    If the RAS event is configured to firmware-first mode, the platform should be able to trigger a TEE event to all harts in the physical processor and bring all harts into TEE. This prevents subsequent RAS errors from being propagated by other cores that access the problematic hardware (PCI, memory, etc.)

 
 
The PCIe AER errors have been handled OS first on X86 systems. If I
recall correct, ARM64 initially made PCIe AER errors firmware first
and then later changed to OS first to be compliant with what's already
out there.
The exact manner of handling these PCIe AER errors is also OEM
dependent. Some OEMs will handle it OS first while making a call to
the firmware to take additional corrective action of notifying the BMC
and such. Some ARM64 implementations handle this firmware first and
notify the BMC and then notify the OS.
From a RISC-V platforms requirements perspective, my suggestion is we
simply mention the capability of all errors to have support for
firmware first and OS first and leave it at that.

Agreed all around.
Agreed.

Abner 

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor
 

On Tue, Jun 22, 2021 at 5:34 PM Kumar Sankaran <ksankaran@...> wrote:
I think the primary requirements here are the following:
- The platform should provide the capability to configure each RAS
error to trigger firmware-first or OS-first error interrupt.

Yes.  Which is just a software matter of configuring the interrupt controller accordingly.
 
- If the RAS error is handled by firmware, the firmware should be able
to choose to expose the error to S/HS mode for further processes or
just hide the error from S/HS software.
Is there a need to provide all the other details?

Agreed.  The details and mechanics don't need to be discussed (unless they are mandating specific mechanics - which I don't believe is the case). 

> Yes, to mask the RAS error interrupt, or even not to create the log (in RAS status registers or a CSR), for errors that the OEM doesn't consider useful or important to the product.

This is fine

Maybe just say that "Logging and/or reporting of errors can be masked".
 
Can we summarize the requirement to
- RAS errors should be capable of interrupting TEE.

This implies a requirement to have a TEE - and defining what constitutes a compliant TEE in the platform spec.  Btw, what distinguishes the TEE from "firmware"?
 
The PCIe AER errors have been handled OS first on X86 systems. If I
recall correctly, ARM64 initially made PCIe AER errors firmware first
and then later changed to OS first to be compliant with what's already
out there.
The exact manner of handling these PCIe AER errors is also OEM
dependent. Some OEMs will handle it OS first while making a call to
the firmware to take additional corrective action of notifying the BMC
and such. Some ARM64 implementations handle this firmware first and
notify the BMC and then notify the OS.
From a RISC-V platforms requirements perspective, my suggestion is we
simply mention the capability of all errors to have support for
firmware first and OS first and leave it at that.

Agreed all around.

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Kumar Sankaran
 

Greg - Do you have any further comments/responses to Abner's comments below?
Abner - my comments inline below.

On Fri, Jun 18, 2021 at 9:01 AM Abner Chang <renba.chang@...> wrote:



Greg Favor <gfavor@...> wrote on Friday, June 18, 2021 at 2:03 AM:

On Thu, Jun 17, 2021 at 8:56 AM Abner Chang <renba.chang@...> wrote:

- The platform should provide the capability to configure each RAS error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able to choose to expose the error to S/HS mode for further processes or just hide the error from S/HS software. This requires some mechanisms provided by the platform and the mechanism should be protected by M-mode.

I would have thought that this is just a software issue. What kind of hardware mechanism do you picture being needed?
That could be,
- If RAS error triggers M-mode (FFM) and firmware decides to expose the error to OS (could be configured through CSR or RAS registers), then the RAS OS interrupt can be triggered when the system exits M-mode.
- or if the RAS error triggers Management Mode in TEE, then the RAS OS interrupt can be triggered when the system exits TEE.
The knob of exposing RAS errors to OS could go with each RAS error configuration register or just one centralized RAS register or CSR for all RAS errors.
Suppose the event that brings the system to TEE has the highest priority even when the system is executing in M-mode. This makes sure firmware can address the RAS error immediately when it happens at any privilege level.
I think the primary requirements here are the following:
- The platform should provide the capability to configure each RAS
error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able
to choose to expose the error to S/HS mode for further processes or
just hide the error from S/HS software.
Is there a need to provide all the other details?




- Each RAS error should be able to be masked through RAS configuration registers.

By "mask" do you mean masking of generation of an error interrupt?
Yes, to mask the RAS error interrupt, or even not to create the log (in RAS status registers or a CSR), for errors that the OEM doesn't consider useful or important to the product.

This is fine


- We should also consider triggering RAS error interrupt to TEE which is where the firmware management mode resides.

Wouldn't the TEE be running in M-mode? Or where is it expected to be running?
Yes, TEE would be running in M-mode if memory serves me right from the spec. My expectation of TEE is that there would be an event, triggered by either hardware or software, that brings the system to TEE no matter which mode the HART is currently running in; I am not sure if this is how TEE would be implemented.

Can we summarize the requirement to
- RAS errors should be capable of interrupting TEE.


For PCIe RAS,
- The baseline PCIe error or AER interrupt can be morphed into a firmware-first interrupt before being delivered to S/HS software. This gives firmware a chance to log the error, correct the error, or hide the error from S/HS software according to OEM RAS policy.

In x86 and ARM platforms, doesn't the OS pretty much always handle PCIe AER errors (i.e. OS-first for this class of errors)? (I was reading an Intel overview doc recently that essentially said that - irrespective of whether other classes of errors are OS-first or firmware-first.)
Besides correcting the error in firmware, the firmware also logs the necessary PCIe error events to the BMC before the OS handles them. The firmware RAS logs can be retrieved out-of-band even when the system is shut down or the OS crashes. This increases diagnosability and decreases the cost of customer service in the field.

Abner
The PCIe AER errors have been handled OS first on X86 systems. If I
recall correctly, ARM64 initially made PCIe AER errors firmware first
and then later changed to OS first to be compliant with what's already
out there.
The exact manner of handling these PCIe AER errors is also OEM
dependent. Some OEMs will handle it OS first while making a call to
the firmware to take additional corrective action of notifying the BMC
and such. Some ARM64 implementations handle this firmware first and
notify the BMC and then notify the OS.
From a RISC-V platforms requirements perspective, my suggestion is we
simply mention the capability of all errors to have support for
firmware first and OS first and leave it at that.



Besides memory and PCIe RAS, do we have RAS errors for the processor/HART, such as an IPI error or some CE/UC/UCR local to the HART?

Definitely there will be processor/hart errors. Presumably each hart would output one or more RAS interrupt request signals.

Greg
Yes, there will be more RAS errors. For the initial spec, we are only
making the bare minimum set of RAS features mandatory for the server
extension for 2022. We can add more RAS features as things solidify.

--
Regards
Kumar


Re: [PATCH 1/1] Initial commit of PLIC

guoren@...
 

On Sun, Jun 20, 2021 at 9:36 PM Abner Chang <renba.chang@...> wrote:

From: Abner Chang <abner.chang@...>

This is the commit that creates the patches for
wide review in the Platform Spec HSC task group

Signed-off-by: Abner Chang <abner.chang@...>
---
riscv-plic.adoc | 306 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 306 insertions(+)
create mode 100644 riscv-plic.adoc

diff --git a/riscv-plic.adoc b/riscv-plic.adoc
new file mode 100644
index 0000000..b770e0e
--- /dev/null
+++ b/riscv-plic.adoc
@@ -0,0 +1,306 @@
+= *RISC-V Platform-Level Interrupt Controller Specification*
+
+== Copyright and license information
+
+This RISC-V PLIC specification is
+
+[%hardbreaks]
+(C) 2017 Drew Barbier <drew@...>
+(C) 2018-2019 Palmer Dabbelt <palmer@...>
+(C) 2019 Abner Chang, Hewlett Packard Enterprise <abner.chang@...>
+
+It is licensed under the Creative Commons Attribution 4.0 International
+License (CC-BY 4.0). The full license text is available at
+https://creativecommons.org/licenses/by/4.0/.
+
+== Introduction
+
+This document contains the RISC-V platform-level interrupt controller (PLIC)
+specification, which defines an interrupt controller specifically designed to
+work in the context of RISC-V systems. The PLIC multiplexes various device
+interrupts onto the external interrupt lines of Hart contexts, with
+hardware support for interrupt priorities. +
+This specification defines the general PLIC architecture and its operation parameters.
+A PLIC that claims to be a PLIC-compliant standard PLIC must follow the
+implementation described in the sections below.
+
+.Figure 1 RISC-V PLIC Interrupt Architecture Block Diagram
+image::Images/PLIC.jpg[GitHub,1000,643, link=https://github.com/riscv/riscv-plic-spec/blob/master/Images/PLIC.jpg]
+
+== RISC-V PLIC Operation Parameters
+
+This spec defines the following general PLIC operation parameter register blocks: +
+
+- *Interrupt Priorities registers:* +
+ The interrupt priority of each interrupt source. +
+
+- *Interrupt Pending Bits registers:* +
+ The interrupt pending status of each interrupt source. +
+
+- *Interrupt Enables registers:* +
+ The per-context enable bit of each interrupt source. +
+
+- *Priority Thresholds registers:* +
+ The interrupt priority threshold of each context. +
+
+- *Interrupt Claim registers:* +
+ The register through which each context acquires an interrupt source ID. +
+
+- *Interrupt Completion registers:* +
+ The register through which each context sends an interrupt completion message to the associated gateway. +
+
+
+The figure below shows the PLIC Operation Parameter block diagram.
+
+.Figure 2 PLIC Operation Parameter Block Diagram
+image::Images/PLICArch.jpg[GitHub, link=https://github.com/riscv/riscv-plic-spec/blob/master/Images/PLICArch.jpg]
+
+== Memory Map
+
+The `base address of PLIC Memory Map` is platform implementation-specific.
+
+*PLIC Memory Map*
+
+ base + 0x000000: Reserved (interrupt source 0 does not exist)
+ base + 0x000004: Interrupt source 1 priority
+ base + 0x000008: Interrupt source 2 priority
+ ...
+ base + 0x000FFC: Interrupt source 1023 priority
+ base + 0x001000: Interrupt Pending bit 0-31
+ ...
+ base + 0x00107C: Interrupt Pending bit 992-1023
+ base + 0x002000: Enable bits for sources 0-31 on context 0
+ base + 0x002004: Enable bits for sources 32-63 on context 0
+ ...
+ base + 0x00207C: Enable bits for sources 992-1023 on context 0
+ base + 0x002080: Enable bits for sources 0-31 on context 1
+ base + 0x002084: Enable bits for sources 32-63 on context 1
+ ...
+ base + 0x0020FC: Enable bits for sources 992-1023 on context 1
+ base + 0x002100: Enable bits for sources 0-31 on context 2
+ base + 0x002104: Enable bits for sources 32-63 on context 2
+ ...
+ base + 0x00217C: Enable bits for sources 992-1023 on context 2
+ ...
+ base + 0x1F1F80: Enable bits for sources 0-31 on context 15871
+ base + 0x1F1F84: Enable bits for sources 32-63 on context 15871
+ ...
+ base + 0x1F1FFC: Enable bits for sources 992-1023 on context 15871
+ ...
+ base + 0x1FFFFC: Reserved
+ base + 0x200000: Priority threshold for context 0
+ base + 0x200004: Claim/complete for context 0
+ base + 0x200008: Reserved
+ ...
+ base + 0x200FFC: Reserved
+ base + 0x201000: Priority threshold for context 1
+ base + 0x201004: Claim/complete for context 1
+ ...
+ base + 0x3FFF000: Priority threshold for context 15871
+ base + 0x3FFF004: Claim/complete for context 15871
+ base + 0x3FFF008: Reserved
+ ...
+ base + 0x3FFFFFC: Reserved
+
+Sections below describe the control register blocks of PLIC operation parameters.
+
+== Register Width
+
+Each register in the memory map is 32 bits wide.
+
+== Interrupt Priorities
+
+If the PLIC supports Interrupt Priorities, then each PLIC interrupt source can be assigned a priority by writing its 32-bit
+memory-mapped `priority` register. A priority value of 0 is reserved to mean "never interrupt" and effectively
+disables the interrupt. Priority 1 is the lowest active priority, while the maximum priority level is
+implementation-specific. Ties between global interrupts of the same priority are broken by the Interrupt ID; interrupts
+with the lowest ID have the highest
+effective priority. +
+ +
+The base address of the Interrupt Source Priority block within the PLIC Memory Map region is fixed at 0x000000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name*| *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Source Priority
+|Interrupt Source Priority #0 to #1023
+|1024 * 4 = 4096(0x1000) bytes
+|This is a contiguous memory block that contains the PLIC Interrupt Source Priority registers, 1024 in total.
+Interrupt Source Priority #0 is reserved, indicating that interrupt source 0 does not exist.
+|===
+
+*PLIC Interrupt Source Priority Memory Map* +
+
+ 0x000000: Reserved (interrupt source 0 does not exist)
+ 0x000004: Interrupt source 1 priority
+ 0x000008: Interrupt source 2 priority
+ ...
+ 0x000FFC: Interrupt source 1023 priority
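+
+A non-normative C sketch of programming a source's priority register follows;
+`PLIC_BASE` is a hypothetical, platform-specific base address.
+
+[source,c]
+----
+#include <stdint.h>
+
+#define PLIC_BASE 0x0c000000UL /* hypothetical; the real base is platform-specific */
+
+/* Set the priority of interrupt source `irq` (1..1023); source 0 is reserved. */
+static inline void plic_set_priority(uint32_t irq, uint32_t prio)
+{
+    volatile uint32_t *prio_regs = (volatile uint32_t *)PLIC_BASE;
+    prio_regs[irq] = prio; /* register offset is 4 * irq */
+}
+----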
+
+== Interrupt Pending Bits
+
+The current status of the interrupt source pending bits in the PLIC core can be
+read from the pending array, organized as an array of 32-bit registers. The pending bit
+for interrupt ID N is stored in bit (N mod 32) of word (N/32). Bit 0
+of word 0, which represents the non-existent interrupt source 0, is hardwired
+to zero.
+
+A pending bit in the PLIC core can be cleared by setting the associated enable
+bit then performing a claim. +
I suggest adding PENDING_SET and PENDING_CLR registers here, to implement a
software-triggered PLIC interrupt mechanism.

PENDING_SET: only the '1' bits of the written value are set in the register;
the '0' bits are ignored.
PENDING_CLR: only the '0' bits of the written value are cleared in the register;
the '1' bits are ignored.

How?
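For reference, a sketch of the proposed semantics (the register offsets are
hypothetical and not part of this spec; `PLIC_BASE` is as in the priority
example above):

    #include <stdint.h>

    /* Hypothetical offsets for the proposed registers. */
    #define PLIC_PENDING_SET_OFF 0x1080
    #define PLIC_PENDING_CLR_OFF 0x1100

    /* Software-inject interrupt source `irq`; written '0' bits are ignored. */
    static inline void plic_pending_set(uint32_t irq)
    {
        volatile uint32_t *set =
            (volatile uint32_t *)(PLIC_BASE + PLIC_PENDING_SET_OFF);
        set[irq / 32] = 1u << (irq % 32);
    }

    /* Clear the pending bit of `irq`; written '1' bits are ignored. */
    static inline void plic_pending_clear(uint32_t irq)
    {
        volatile uint32_t *clr =
            (volatile uint32_t *)(PLIC_BASE + PLIC_PENDING_CLR_OFF);
        clr[irq / 32] = ~(1u << (irq % 32));
    }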

+ +
+The base address of the Interrupt Pending Bits block within the PLIC Memory Map region is fixed at 0x001000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Pending Bits
+|Interrupt Pending Bit of Interrupt Source #0 to #1023
+|1024 / 8 = 128(0x80) bytes
+|This is a contiguous memory block that contains the PLIC Interrupt Pending Bits. Each Interrupt Pending Bit occupies one bit of this register block.
+|===
+
+*PLIC Interrupt Pending Bits Memory Map* +
+
+ 0x001000: Interrupt Source #0 to #31 Pending Bits
+ ...
+ 0x00107C: Interrupt Source #992 to #1023 Pending Bits
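+
+A non-normative sketch of reading a source's pending bit, continuing the earlier
+example (hypothetical `PLIC_BASE`):
+
+[source,c]
+----
+/* Return the pending bit of source `irq`: bit (irq % 32) of word (irq / 32). */
+static inline int plic_is_pending(uint32_t irq)
+{
+    volatile uint32_t *pend = (volatile uint32_t *)(PLIC_BASE + 0x1000);
+    return (pend[irq / 32] >> (irq % 32)) & 1;
+}
+----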
+
+
+== Interrupt Enables
+
+Each global interrupt can be enabled by setting the corresponding bit in the
+`enables` register. The `enables` registers are accessed as a contiguous array
+of 32-bit registers, packed the same way as the `pending` bits. Bit 0 of enable
+register 0 represents the non-existent interrupt ID 0 and is hardwired to 0.
+The PLIC has 15872 Interrupt Enable blocks, one per context. A `context` refers
+to a specific privilege mode on a specific hart of a specific RISC-V processor
+instance. How the PLIC organizes interrupts into contexts (hart and privilege mode)
+is outside the scope of the RISC-V PLIC specification; however, it must be
+specified in the vendor's PLIC specification. +
+ +
+The base address of the Interrupt Enable Bits block within the PLIC Memory Map region is fixed at 0x002000. +
+ +
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Enable Bits
+|Interrupt Enable Bit of Interrupt Source #0 to #1023 for 15872 contexts
+|(1024 / 8) * 15872 = 2031616(0x1f0000) bytes
+|This is a contiguous memory block that contains the PLIC Interrupt Enable Bits of 15872 contexts.
+Each Interrupt Enable Bit occupies one bit of this register block, and there are 15872
+Interrupt Enable Bit blocks in total.
+|===
+
+*PLIC Interrupt Enable Bits Memory Map* +
+
+ 0x002000: Interrupt Source #0 to #31 Enable Bits on context 0
+ ...
+ 0x00207C: Interrupt Source #992 to #1023 Enable Bits on context 0
+ 0x002080: Interrupt Source #0 to #31 Enable Bits on context 1
+ ...
+ 0x0020FC: Interrupt Source #992 to #1023 Enable Bits on context 1
+ 0x002100: Interrupt Source #0 to #31 Enable Bits on context 2
+ ...
+ 0x00217C: Interrupt Source #992 to #1023 Enable Bits on context 2
+ 0x002180: Interrupt Source #0 to #31 Enable Bits on context 3
+ ...
+ 0x0021FC: Interrupt Source #992 to #1023 Enable Bits on context 3
+ ...
+ ...
+ ...
+ 0x1F1F80: Interrupt Source #0 to #31 Enable Bits on context 15871
+ ...
+ 0x1F1FFC: Interrupt Source #992 to #1023 Enable Bits on context 15871
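+
+A non-normative sketch of enabling one source for one context (hypothetical
+`PLIC_BASE` as before). Note the read-modify-write is not atomic; software may
+need a lock if multiple harts program the same enable word.
+
+[source,c]
+----
+/* Enable source `irq` for `context`; each context owns 0x80 bytes of enables. */
+static inline void plic_enable(uint32_t context, uint32_t irq)
+{
+    volatile uint32_t *en =
+        (volatile uint32_t *)(PLIC_BASE + 0x2000 + 0x80 * context);
+    en[irq / 32] |= 1u << (irq % 32);
+}
+----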
+
+== Priority Thresholds
+
+The PLIC provides a context-based `threshold` register for setting the interrupt
+priority threshold of each context. The `threshold` register is a WARL field. The PLIC
+masks all PLIC interrupts with a priority less than or equal to `threshold`. For example,
+a `threshold` value of zero permits all interrupts with non-zero priority. +
+ +
+The Priority Thresholds register block starts at offset 0x200000, with each context's
+register placed at a 4 KiB-aligned offset.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Priority Threshold
+|Priority Threshold for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register holding the Priority Threshold setting for each context
+|===
+
+*PLIC Interrupt Priority Thresholds Memory Map* +
+
+ 0x200000: Priority threshold for context 0
+ 0x201000: Priority threshold for context 1
+ 0x202000: Priority threshold for context 2
+ 0x203000: Priority threshold for context 3
+ ...
+ ...
+ ...
+ 0x3FFF000: Priority threshold for context 15871
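+
+A non-normative sketch of setting a context's threshold (hypothetical
+`PLIC_BASE` as before):
+
+[source,c]
+----
+/* Mask all interrupts with priority <= `threshold` for `context`. */
+static inline void plic_set_threshold(uint32_t context, uint32_t threshold)
+{
+    *(volatile uint32_t *)(PLIC_BASE + 0x200000 + 0x1000 * context) = threshold;
+}
+----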
+
+== Interrupt Claim Process
+
+A target (hart context) can perform an interrupt claim by reading the `claim/complete`
+register, which returns the ID of the highest priority pending interrupt or
+zero if there is no pending interrupt. A successful claim will also atomically
+clear the corresponding pending bit on the interrupt source. +
+A target can perform a claim at any time, and the claim operation is not affected
+by the setting of the priority threshold register. +
+The Interrupt Claim register is context-based and is located at a 4 KiB-aligned
+offset plus 4, starting from offset 0x200000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Claim Register
+|Interrupt Claim Process for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register used to acquire the interrupt ID for each context
+|===
+
+*PLIC Interrupt Claim Process Memory Map* +
+
+ 0x200004: Interrupt Claim Process for context 0
+ 0x201004: Interrupt Claim Process for context 1
+ 0x202004: Interrupt Claim Process for context 2
+ 0x203004: Interrupt Claim Process for context 3
+ ...
+ ...
+ ...
+ 0x3FFF004: Interrupt Claim Process for context 15871
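+
+A non-normative sketch of the claim read (hypothetical `PLIC_BASE` as before):
+
+[source,c]
+----
+/* Claim the highest-priority pending interrupt for `context`; 0 means none. */
+static inline uint32_t plic_claim(uint32_t context)
+{
+    return *(volatile uint32_t *)(PLIC_BASE + 0x200004 + 0x1000 * context);
+}
+----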
+
+== Interrupt Completion
+
+A target signals it has completed executing an interrupt handler by writing the
+interrupt ID it received from the claim to the `claim/complete` register. The
+PLIC does not check whether the completion ID is the same as the last claim ID
+for that target. If the completion ID does not match an interrupt source that
+is currently enabled for the target, the completion is silently ignored. +
+The Interrupt Completion registers are context-based and located at the same
+addresses as the Interrupt Claim registers, i.e. at a 4 KiB-aligned offset plus 4,
+starting from offset 0x200000.
+ +
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Completion Register
+|Interrupt Completion for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register written to complete the interrupt handling process
+|===
+
+*PLIC Interrupt Completion Memory Map* +
+
+ 0x200004: Interrupt Completion for context 0
+ 0x201004: Interrupt Completion for context 1
+ 0x202004: Interrupt Completion for context 2
+ 0x203004: Interrupt Completion for context 3
+ ...
+ ...
+ ...
+ 0x3FFF004: Interrupt Completion for context 15871
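+
+A non-normative sketch of a full claim/handle/complete loop, reusing `plic_claim`
+from the previous section; `handle_device_irq` is a hypothetical dispatch routine.
+
+[source,c]
+----
+extern void handle_device_irq(uint32_t irq); /* hypothetical device dispatch */
+
+/* Signal completion of `irq` back to the associated gateway for `context`. */
+static inline void plic_complete(uint32_t context, uint32_t irq)
+{
+    *(volatile uint32_t *)(PLIC_BASE + 0x200004 + 0x1000 * context) = irq;
+}
+
+void external_interrupt_handler(uint32_t context)
+{
+    uint32_t irq;
+
+    /* Drain all interrupts currently claimable by this context. */
+    while ((irq = plic_claim(context)) != 0) {
+        handle_device_irq(irq);
+        plic_complete(context, irq);
+    }
+}
+----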
+
--
2.19.0.windows.1






--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/


[PATCH 0/1] Initial commit of PLIC

Abner Chang
 

From: Abner Chang <abner.chang@...>

As Atish mentioned in the meeting, resending the patch to this task
group for wide review, because this document is referenced in the
RISC-V platform spec.

Abner Chang (1):
Initial commit of PLIC

--
2.19.0.windows.1


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor
 

On Fri, Jun 18, 2021 at 9:01 AM Abner Chang <renba.chang@...> wrote:
On Fri, Jun 18, 2021 at 2:03 AM Greg Favor <gfavor@...> wrote:
On Thu, Jun 17, 2021 at 8:56 AM Abner Chang <renba.chang@...> wrote:
- The platform should provide the capability to configure each RAS error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able to choose to expose the error to S/HS mode for further processes or just hide the error from S/HS software. This requires some mechanisms provided by the platform and the mechanism should be protected by M-mode.

I would have thought that this is just a software issue.  What kind of hardware mechanism do you picture being needed?
That could be,
- If RAS error triggers M-mode (FFM) and firmware decides to expose the error to OS (could be configured through CSR or RAS registers), then the RAS OS interrupt can be triggered when the system exits M-mode.
- or If RAS error  triggers Management mode in TEE, then  the RAS OS interrupt to can be triggered when the system exits TEE.
The knob of exposing RAS errors to OS could go with each RAS error configuration register or just one centralized RAS register or CSR for all RAS errors.
Suppose the event to bring the system to TEE has the most priority even the system is executing in M-Mode. This makes sure firmware can address the RAS error immediately when it happens in any privilege.

Thanks.  This does seem to be all a matter of software configuring and handling things appropriately.
 

- We should also consider triggering RAS error interrupt to TEE which is where the firmware management mode resides.

Wouldn't the TEE be running in M-mode?  Or where is it expected to be running?
yes,TEE is be running in M-mode if the memory serves me right from the spec. My expectation of TEE is there would be an event that can be triggered by either hardware or software to bring the system to TEE no matter which mode the HART is currently running, I am not sure if this is how TEE would be implemented.

Then this just becomes a matter of software configuring the interrupt controller to direct a given interrupt source to a given privilege mode.

 
For PCIe RAS,
- The baseline PCIe error or AER interrupt is able to be morphed to firmware-first interrupt before delivering to H/HS software. This gives firmware a chance to log the error, correct the error or hide the error from S/HS software according to OEM RAS policy.

In x86 and ARM platforms, doesn't the OS pretty much always handle PCIe AER errors (i.e. OS-first for this class of errors)?  (I was reading an Intel overview doc recently that essentially said that - irrespective of whether other classes of errors are OS-first or firmware-first).)
Besides correcting the error in firmware, firmware also logs the necessary PCIe error events to BMC before OS handling that. The firmware RAS logs are retrieved in out-of-band even the system is shut down or the OS crashes. This increases the diagnosability and decreases the cost of customer service in the field.

Just fyi, this paper discusses use of both models in the x86 world: a-tour-beyond-bios-implementing-the-acpi-platform-error-interface-with-the-uefi.  As a number of us will remember from the ARMv8 days, there were big (as in religious) arguments over which model was the right one to adopt.  Ultimately it was accepted that both need to be supported by the architecture.  The point being that the OS/A platform spec should support both and not presume one as the one and only answer.
 
Greg


Re: Non-coherent I/O

Greg Favor
 

On Mon, Jun 14, 2021 at 1:04 PM Greg Favor <gfavor@...> wrote:
I have already sent questions to Andrew to get the official view as to the intent of this aspect of the Priv spec and what is the proper way or perspective with which to be reading the ISA specs.  That then may result in the need for clarifying text to be added to the spec.  And once it is clear as to the scope and bounds of the ISA specs and what they require and allow, then it is left to profile and platform specs to specify tighter requirements.
Here's the results of my Q&A with Andrew: 

- The Priv (and Unpriv) ISA specs are just that.  They are CPU architecture specs and should be read with that limited scope in mind.  They may touch on system-level issues, but they are not trying to constrain the flexibility in how these issues are handled across a wide range of system designs.  (I'll personally add on that RVI now makes an official distinction between ISA (Unpriv and Priv) and Non-ISA (aka system-related) arch specs.  The former apply inside of a hart; the latter apply outside of a hart.)

- Per above, PMAs and the PMA coherency attribute are CPU-specific and only apply to memory accesses by harts.  (One can choose to apply these ideas to accesses by other master agents in a system, but that's not officially a Priv spec matter.)

- The PMA coherency attribute only applies to that hart's accesses.  It is up to software to configure the PMAs in all harts to be the same, or not, as desired.  What is done for non-hart accesses (i.e. by I/O devices) is not specified by the Priv spec.  Hence there are no implications on I/O coherency, one way or another, by the Priv spec.

Naturally many if not most system designs will extend these ideas in some manner across the system and to other masters.  And platform specs may choose to specify and mandate some or all of this.  But that's not the business of the ISA specs.

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Allen Baum
 

Good answers all around; I didn't pick up on the difference between OS-A base and OS-A server.

It makes sense in hindsight for the manufacturers to set the MTBF goal and design to meet it. I was concerned about whether that could be met without the complexity of single-bit-correcting L1D caches, but if those are typically base platforms rather than server platforms anyway, it's not a significant concern in any case.

On Thu, Jun 17, 2021 at 12:01 PM Kumar Sankaran <ksankaran@...> wrote:

To add to what Greg mentioned below, the RAS features mentioned in the patch are required only for the OS-A platform server extension. We are not mandating any RAS requirements for OS-A base platform compatibility.

 

Regards

Kumar

From: Greg Favor <gfavor@...>
Sent: Thursday, June 17, 2021 11:54 AM
To: Allen Baum <allen.baum@...>
Cc: Abner Chang <renba.chang@...>; Kumar Sankaran <ksankaran@...>; tech-unixplatformspec@...
Subject: Re: [RISC-V] [tech-unixplatformspec] [PATCH 1/1] RAS features for OS-A platform server extension

 

On Thu, Jun 17, 2021 at 11:13 AM Allen Baum <allen.baum@...> wrote:

Is it acceptable to everyone that all single bit errors on all caches must be correctable?

 

Nowadays single-bit errors are far from rare.  There will always be people that run Linux and are willing to accept occasional silent corruptions and whatever mysterious application/data corruptions occur as a result.  But for a standardized server-class platform spec, this is a rather low "table stakes" bar to set.  Virtually no customer of a "server-class" platform will be comfortable without that (especially since the x86 and ARM alternatives provide at least that).

 

That really affects designs in fundamental ways for L1 caches (as opposed to simply detecting).

 

Parity (and invalidate on error detection) suffices for I and WT D caches, and ECC is used on WB D caches - even L1 D caches (which is one argument for doing a WT L1 D cache with parity, but the majority of people still do WB L1 D caches with ECC).

 

Understandably some people don't want to deal with ECC on a WB DL1, and parity or nothing may be fine for less-than server-class systems.

 

Not as big a concern for L2 and above.

Speaking from my Intel experience, the rule was expressed as failures per year - and if an L1 cache was small enough that it didn't exceed that number, then it didn't need correction.

 

Somewhat analogous, TSMC imposes similarly expressed requirements wrt having redundancy in all the RAMs.  Even just one non-redundant 64 KiB cache can pretty much use up what is allowed to not have redundancy.

 

In any case, the Base platform spec should allow people to make whatever choice they want (and live with the consequences).  But to be competitive and to meet customer expectations (especially in a multi-core world), the Server spec needs to require a higher-than-nothing bar.

 

So, it might be useful to have a measurement baseline like that, rather than an absolute requirement.

 

A functional requirement is simple to specify and aligns with standard industry practices.  The alternatives get more involved and in practice won't provide much of any value over the functional requirement (for server-class systems).

 

The argument is: why are you requiring ECC correction on this and not on the register file or CSRs?

 

This is a baseline requirement - aligned with common/dominant industry practice.  Conversely it is not a dominant industry practice to protect flop-based register files (or flop-based storage structures in general).  (Latch-based register files, depending on whether the bitcell is more SRAM-like or flop-like, fall in one category or the other.)

 

The reason is that they're small enough that failures are unlikely - and that's how your rationale should be stated.

 

Nowadays even the aggregate error rate or MTBF due to flop soft errors is not small.  But thankfully for most designs that MTBF component is acceptable within typical MTBF budgets.

 

As far as instead specifying an MTBF requirement, one then gets into system-wide issues and overall MTBF budgets, where it gets spent, what about the technology dependence of all this, and ....  Plus that effectively would provide little guidance to CPU designers as to what is their individual MTBF budget.  Or, conversely, one can probably have long discussions/arguments about what is the right MTBF number to require at the level of a single CPU core.

 

But at the end of the day, virtually no customer of a server-class system is going to accept a product that doesn't even have single-bit error protection on the cache hierarchy.

 

Greg

 

