
Re: [PATCH 1/1] RAS features for OS-A platform server extension

Abner Chang
 



On Wed, Jun 23, 2021 at 9:51 AM, Greg Favor <gfavor@...> wrote:
On Tue, Jun 22, 2021 at 5:34 PM Kumar Sankaran <ksankaran@...> wrote:
I think the primary requirements here are the following:
- The platform should provide the capability to configure each RAS
error to trigger firmware-first or OS-first error interrupt.
Agreed. 

Yes.  Which is just a software matter of configuring the interrupt controller accordingly.
Does this mean the interrupt controller would integrate all RAS events (hart, PCI, I/O, memory, etc.)?
Or would there be a separate hardware block that manages all RAS error events, with some error signals output from that block connected to the interrupt controller, where the interrupt controller just provides the mechanism to morph those error signals into firmware-first or OS-first interrupts?
 
- If the RAS error is handled by firmware, the firmware should be able
to choose to expose the error to S/HS mode for further processing or
just hide the error from S/HS software.
Is there a need to provide all the other details?

Agreed.  The details and mechanics don't need to be discussed (unless they are mandating specific mechanics - which I don't believe is the case). 
Agreed. 

> Yes, to mask the RAS error interrupt, or even to suppress creating the log (in RAS status registers or CSRs) for errors that the OEM doesn't consider useful or important to the product.

This is fine.

Maybe just say that "Logging and/or reporting of errors can be masked".
Agreed.

 
Can we summarize the requirement as:
- RAS errors should be capable of interrupting the TEE.
This is OK for now because there is no hardware signal defined for triggering the TEE, right? I have more comments on this below.

This implies a requirement to have a TEE - and defining what constitutes a compliant TEE in the platform spec.  Btw, what distinguishes the TEE from "firmware"?
Please correct me on the ARM part if I am wrong.
The equivalent mechanisms to a TEE are SMM on x86 and TrustZone (TZ) on ARM. I don't quite understand how ARM TZ works; however, on x86 systems, all cores are brought into the SMM environment when an SMI is triggered. ARM has an equivalent event, the SMC, right?
The above is called Management Mode (MM), which is defined in the UEFI PI spec. MM has higher privilege than ring 0 on x86; on ARM it corresponds to EL3. MM is OS-agnostic, and the MM event halts whatever is running and brings the core into management mode to run firmware code. The MM environment (data and code) can only be accessed while the core is in MM. Firmware always uses this for security-sensitive work, power management, and of course RAS.

I would like to add one more thing to the RAS requirements, but I don't know how to describe it properly, because it seems we don't have an MM event on RISC-V, such as SMI or SMC, that can bring the system into MM. So there are two scenarios for RAS in the firmware-first model:
- If the platform doesn't have a TEE and a hardware event to trigger it:
  If the RAS event is configured to firmware-first mode, the platform should be able to trigger an M-mode exception on all harts in the physical processor. This prevents subsequent RAS errors from being propagated by other harts that access the problematic hardware (PCIe, memory, etc.).

- If the platform has a TEE and a hardware event to trigger it:
    If the RAS event is configured to firmware-first mode, the platform should be able to trigger the TEE event on all harts in the physical processor and bring all harts into the TEE. This prevents subsequent RAS errors from being propagated by other harts that access the problematic hardware (PCIe, memory, etc.).

 
 
The PCIe AER errors have been handled OS first on X86 systems. If I
recall correctly, ARM64 initially made PCIe AER errors firmware first
and then later changed to OS first to be compliant with what's already
out there.
The exact manner of handling these PCIe AER errors is also OEM
dependent. Some OEMs will handle it OS first while making a call to
the firmware to take additional corrective action of notifying the BMC
and such. Some ARM64 implementations handle this firmware first and
notify the BMC and then notify the OS.
From a RISC-V platforms requirements perspective, my suggestion is we
simply mention the capability of all errors to have support for
firmware first and OS first and leave it at that.

Agreed all around.
Agreed.

Abner 

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor
 

On Tue, Jun 22, 2021 at 5:34 PM Kumar Sankaran <ksankaran@...> wrote:
I think the primary requirements here are the following:
- The platform should provide the capability to configure each RAS
error to trigger firmware-first or OS-first error interrupt.

Yes.  Which is just a software matter of configuring the interrupt controller accordingly.
 
- If the RAS error is handled by firmware, the firmware should be able
to choose to expose the error to S/HS mode for further processing or
just hide the error from S/HS software.
Is there a need to provide all the other details?

Agreed.  The details and mechanics don't need to be discussed (unless they are mandating specific mechanics - which I don't believe is the case). 

> Yes, to mask the RAS error interrupt, or even to suppress creating the log (in RAS status registers or CSRs) for errors that the OEM doesn't consider useful or important to the product.

This is fine.

Maybe just say that "Logging and/or reporting of errors can be masked".
 
Can we summarize the requirement as:
- RAS errors should be capable of interrupting the TEE.

This implies a requirement to have a TEE - and defining what constitutes a compliant TEE in the platform spec.  Btw, what distinguishes the TEE from "firmware"?
 
The PCIe AER errors have been handled OS first on X86 systems. If I
recall correctly, ARM64 initially made PCIe AER errors firmware first
and then later changed to OS first to be compliant with what's already
out there.
The exact manner of handling these PCIe AER errors is also OEM
dependent. Some OEMs will handle it OS first while making a call to
the firmware to take additional corrective action of notifying the BMC
and such. Some ARM64 implementations handle this firmware first and
notify the BMC and then notify the OS.
From a RISC-V platforms requirements perspective, my suggestion is we
simply mention the capability of all errors to have support for
firmware first and OS first and leave it at that.

Agreed all around.

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Kumar Sankaran
 

Greg - Do you have any further comments/responses to Abner's comments below?
Abner - my comments inline below.

On Fri, Jun 18, 2021 at 9:01 AM Abner Chang <renba.chang@...> wrote:



On Fri, Jun 18, 2021 at 2:03 AM, Greg Favor <gfavor@...> wrote:

On Thu, Jun 17, 2021 at 8:56 AM Abner Chang <renba.chang@...> wrote:

- The platform should provide the capability to configure each RAS error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able to choose to expose the error to S/HS mode for further processing or just hide the error from S/HS software. This requires some mechanism provided by the platform, and the mechanism should be protected by M-mode.

I would have thought that this is just a software issue. What kind of hardware mechanism do you picture being needed?
That could be:
- If the RAS error triggers M-mode (FFM) and firmware decides to expose the error to the OS (which could be configured through a CSR or RAS registers), then the RAS OS interrupt can be triggered when the system exits M-mode.
- Or, if the RAS error triggers management mode in the TEE, then the RAS OS interrupt can be triggered when the system exits the TEE.
The knob for exposing RAS errors to the OS could go with each RAS error configuration register, or be one centralized RAS register or CSR for all RAS errors.
Suppose the event that brings the system into the TEE has the highest priority, even when the system is executing in M-mode. This makes sure firmware can address the RAS error immediately, no matter which privilege level it occurs in.
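For illustration only (RISC-V defines no standard RAS register layout today), here is one hypothetical shape such a per-error knob could take, as a C sketch; every name and field below is invented, not taken from any spec:

    /* Hypothetical per-error RAS configuration record; nothing here is
       defined by any RISC-V specification. */
    #include <stdint.h>

    struct ras_err_cfg {
        uint32_t log_enable   : 1;  /* 0 = don't record the error at all     */
        uint32_t irq_mask     : 1;  /* 1 = suppress the error interrupt      */
        uint32_t fw_first     : 1;  /* 1 = route to M-mode/TEE first (FFM)   */
        uint32_t expose_to_os : 1;  /* after FFM handling, raise S-mode IRQ  */
        uint32_t reserved     : 28;
    };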
I think the primary requirements here are the following:
- The platform should provide the capability to configure each RAS
error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able
to choose to expose the error to S/HS mode for further processing or
just hide the error from S/HS software.
Is there a need to provide all the other details?




- Each RAS error should be maskable through RAS configuration registers.

By "mask" do you mean masking of generation of an error interrupt?
Yes, to mask the RAS error interrupt, or even to suppress creating the log (in RAS status registers or CSRs) for errors that the OEM doesn't consider useful or important to the product.

This is fine.


- We should also consider triggering the RAS error interrupt to the TEE, which is where the firmware management mode resides.

Wouldn't the TEE be running in M-mode? Or where is it expected to be running?
Yes, the TEE runs in M-mode, if memory serves me right from the spec. My expectation of the TEE is that there would be an event, triggered by either hardware or software, that brings the system into the TEE no matter which mode the hart is currently running in. I am not sure if this is how the TEE would be implemented.

Can we summarize the requirement as:
- RAS errors should be capable of interrupting the TEE.


For PCIe RAS,
- The baseline PCIe error or AER interrupt can be morphed into a firmware-first interrupt before being delivered to S/HS software. This gives firmware a chance to log the error, correct the error, or hide the error from S/HS software according to OEM RAS policy.

In x86 and ARM platforms, doesn't the OS pretty much always handle PCIe AER errors (i.e. OS-first for this class of errors)? (I was reading an Intel overview doc recently that essentially said that - irrespective of whether other classes of errors are OS-first or firmware-first.)
Besides correcting the error in firmware, firmware also logs the necessary PCIe error events to the BMC before the OS handles them. The firmware RAS logs can be retrieved out-of-band even if the system is shut down or the OS crashes. This increases diagnosability and decreases the cost of customer service in the field.

Abner
The PCIe AER errors have been handled OS first on X86 systems. If I
recall correctly, ARM64 initially made PCIe AER errors firmware first
and then later changed to OS first to be compliant with what's already
out there.
The exact manner of handling these PCIe AER errors is also OEM
dependent. Some OEMs will handle it OS first while making a call to
the firmware to take additional corrective action of notifying the BMC
and such. Some ARM64 implementations handle this firmware first and
notify the BMC and then notify the OS.
From a RISC-V platforms requirements perspective, my suggestion is we
simply mention the capability of all errors to have support for
firmware first and OS first and leave it at that.



Besides memory and PCIe RAS, do we have RAS errors for the processor/hart, such as IPI errors or some CE/UC/UCR errors local to a hart?

Definitely there will be processor/hart errors. Presumably each hart would output one or more RAS interrupt request signals.

Greg
Yes, there will be more RAS errors. For the initial spec, we are only
making the bare minimum set of RAS features mandatory for the server
extension for 2022. We can add more RAS features as things solidify.

--
Regards
Kumar


Re: [PATCH 1/1] Initial commit of PLIC

@guoren
 

On Sun, Jun 20, 2021 at 9:36 PM Abner Chang <renba.chang@...> wrote:

From: Abner Chang <abner.chang@...>

This is the commit creating the patches for
wider review in the Platform Spec HSC task group

Signed-off-by: Abner Chang <abner.chang@...>
---
riscv-plic.adoc | 306 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 306 insertions(+)
create mode 100644 riscv-plic.adoc

diff --git a/riscv-plic.adoc b/riscv-plic.adoc
new file mode 100644
index 0000000..b770e0e
--- /dev/null
+++ b/riscv-plic.adoc
@@ -0,0 +1,306 @@
+= *RISC-V Platform-Level Interrupt Controller Specification*
+
+== Copyright and license information
+
+This RISC-V PLIC specification is
+
+[%hardbreaks]
+(C) 2017 Drew Barbier <drew@...>
+(C) 2018-2019 Palmer Dabbelt <palmer@...>
+(C) 2019 Abner Chang, Hewlett Packard Enterprise <abner.chang@...>
+
+It is licensed under the Creative Commons Attribution 4.0 International
+License (CC-BY 4.0). The full license text is available at
+https://creativecommons.org/licenses/by/4.0/.
+
+== Introduction
+
+This document contains the RISC-V platform-level interrupt controller (PLIC)
+specification, which defines an interrupt controller specifically designed to
+work in the context of RISC-V systems. The PLIC multiplexes various device
+interrupts onto the external interrupt lines of Hart contexts, with
+hardware support for interrupt priorities. +
+This specification defines the general PLIC architecture and operation parameters.
+A PLIC that claims to be a PLIC-compliant standard PLIC should follow the
+implementations described in the sections below.
+
+.Figure 1 RISC-V PLIC Interrupt Architecture Block Diagram
+image::Images/PLIC.jpg[GitHub,1000,643, link=https://github.com/riscv/riscv-plic-spec/blob/master/Images/PLIC.jpg]
+
+== RISC-V PLIC Operation Parameters
+
+The general PLIC operation parameter register blocks defined in this spec are: +
+
+- *Interrupt Priorities registers:* +
+ The interrupt priority for each interrupt source. +
+
+- *Interrupt Pending Bits registers:* +
+ The interrupt pending status of each interrupt source. +
+
+- *Interrupt Enables registers:* +
+ The enablement of each interrupt source for each context. +
+
+- *Priority Thresholds registers:* +
+ The interrupt priority threshold of each context. +
+
+- *Interrupt Claim registers:* +
+ The register used to acquire the interrupt source ID for each context. +
+
+- *Interrupt Completion registers:* +
+ The register used to send an interrupt completion message to the associated gateway. +
+
++
+
+Below is the PLIC Operation Parameter Block Diagram:
+
+.Figure 2 PLIC Operation Parameter Block Diagram
+image::Images/PLICArch.jpg[GitHub, link=https://github.com/riscv/riscv-plic-spec/blob/master/Images/PLICArch.jpg]
+
+== Memory Map
+
+The `base address of PLIC Memory Map` is platform implementation-specific.
+
+*PLIC Memory Map*
+
+ base + 0x000000: Reserved (interrupt source 0 does not exist)
+ base + 0x000004: Interrupt source 1 priority
+ base + 0x000008: Interrupt source 2 priority
+ ...
+ base + 0x000FFC: Interrupt source 1023 priority
+ base + 0x001000: Interrupt Pending bit 0-31
+ ...
+ base + 0x00107C: Interrupt Pending bit 992-1023
+ base + 0x002000: Enable bits for sources 0-31 on context 0
+ base + 0x002004: Enable bits for sources 32-63 on context 0
+ ...
+ base + 0x00207C: Enable bits for sources 992-1023 on context 0
+ base + 0x002080: Enable bits for sources 0-31 on context 1
+ base + 0x002084: Enable bits for sources 32-63 on context 1
+ ...
+ base + 0x0020FC: Enable bits for sources 992-1023 on context 1
+ base + 0x002100: Enable bits for sources 0-31 on context 2
+ base + 0x002104: Enable bits for sources 32-63 on context 2
+ ...
+ base + 0x00217C: Enable bits for sources 992-1023 on context 2
+ ...
+ base + 0x1F1F80: Enable bits for sources 0-31 on context 15871
+ base + 0x1F1F84: Enable bits for sources 32-63 on context 15871
+ base + 0x1F1FFC: Enable bits for sources 992-1023 on context 15871
+ ...
+ base + 0x1FFFFC: Reserved
+ base + 0x200000: Priority threshold for context 0
+ base + 0x200004: Claim/complete for context 0
+ base + 0x200008: Reserved
+ ...
+ base + 0x200FFC: Reserved
+ base + 0x201000: Priority threshold for context 1
+ base + 0x201004: Claim/complete for context 1
+ ...
+ base + 0x3FFF000: Priority threshold for context 15871
+ base + 0x3FFF004: Claim/complete for context 15871
+ base + 0x3FFF008: Reserved
+ ...
+ base + 0x3FFFFFC: Reserved
+
+Sections below describe the control register blocks of PLIC operation parameters.
+
+== Register Width
+
+Each register in the memory map is 32 bits wide.
+
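For illustration, a minimal C sketch of the address arithmetic this memory map implies; PLIC_BASE and the helper names are assumptions, not part of the spec:

    #include <stdint.h>

    #define PLIC_BASE  0x0C000000UL  /* assumption: the base address is platform-specific */

    /* 32-bit register addresses, per the memory map above. */
    #define PLIC_PRIORITY(src)      (PLIC_BASE + 0x000000 + 4 * (src))   /* src  = 1..1023 */
    #define PLIC_PENDING(word)      (PLIC_BASE + 0x001000 + 4 * (word))  /* word = 0..31   */
    #define PLIC_ENABLE(ctx, word)  (PLIC_BASE + 0x002000 + 0x80 * (ctx) + 4 * (word))
    #define PLIC_THRESHOLD(ctx)     (PLIC_BASE + 0x200000 + 0x1000 * (ctx))
    #define PLIC_CLAIM(ctx)         (PLIC_THRESHOLD(ctx) + 4)            /* claim/complete */

    static inline uint32_t plic_read(uintptr_t a)          { return *(volatile uint32_t *)a; }
    static inline void plic_write(uintptr_t a, uint32_t v) { *(volatile uint32_t *)a = v; }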
+== Interrupt Priorities
+
+If the PLIC supports Interrupt Priorities, then each PLIC interrupt source can be assigned a priority by writing to its 32-bit
+memory-mapped `priority` register. A priority value of 0 is reserved to mean "never interrupt" and effectively
+disables the interrupt. Priority 1 is the lowest active priority, while the maximum priority level depends on the
+PLIC implementation. Ties between global interrupts of the same priority are broken by the interrupt ID; interrupts
+with the lowest ID have the highest effective priority. +
+ +
+The base address of Interrupt Source Priority block within PLIC Memory Map region is fixed at 0x000000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name*| *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Source Priority
+|Interrupt Source Priority #0 to #1023
+|1024 * 4 = 4096(0x1000) bytes
+|This is a contiguous memory block which contains the PLIC Interrupt Source Priority registers. There are 1024 Interrupt Source Priority registers
+in this memory block. Interrupt Source Priority #0 is reserved, which indicates it does not exist.
+|===
+
+*PLIC Interrupt Source Priority Memory Map* +
+
+ 0x000000: Reserved (interrupt source 0 does not exist)
+ 0x000004: Interrupt source 1 priority
+ 0x000008: Interrupt source 2 priority
+ ...
+ 0x000FFC: Interrupt source 1023 priority
+
+== Interrupt Pending Bits
+
+The current status of the interrupt source pending bits in the PLIC core can be
+read from the pending array, organized as 32-bit registers. The pending bit
+for interrupt ID N is stored in bit (N mod 32) of word (N/32). Bit 0
+of word 0, which represents the non-existent interrupt source 0, is hardwired
+to zero.
+
+A pending bit in the PLIC core can be cleared by setting the associated enable
+bit then performing a claim. +
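For illustration, reading a pending bit under the word/bit layout described above; a sketch reusing the hypothetical helpers from the Register Width section:

    /* The pending bit for interrupt ID n is bit (n % 32) of word (n / 32). */
    static inline int plic_is_pending(uint32_t n)
    {
        return (plic_read(PLIC_PENDING(n / 32)) >> (n % 32)) & 1;
    }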
I suggest adding PENDING_SET and PENDING_CLR registers here, to implement a
software PLIC IRQ mechanism.

PENDING_SET: only the '1' bits of the written value are set in the register;
the '0' bits are ignored.
PENDING_CLR: only the '0' bits of the written value are written into the
register (clearing those bits); the '1' bits are ignored.

How?
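For illustration, a sketch of the semantics proposed above; the register names and offsets are hypothetical (this spec defines no such registers) and the helpers come from the earlier sketch:

    /* Hypothetical registers; the offsets are invented for illustration. */
    #define PLIC_PENDING_SET(word)  (PLIC_BASE + 0x001080 + 4 * (word))
    #define PLIC_PENDING_CLR(word)  (PLIC_BASE + 0x001100 + 4 * (word))

    /* SET: '1' bits are set, '0' bits ignored. */
    static inline void plic_sw_raise(uint32_t n)
    {
        plic_write(PLIC_PENDING_SET(n / 32), 1u << (n % 32));
    }

    /* CLR: '0' bits are cleared, '1' bits ignored (per the proposal above). */
    static inline void plic_sw_clear(uint32_t n)
    {
        plic_write(PLIC_PENDING_CLR(n / 32), ~(1u << (n % 32)));
    }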

+ +
+The base address of Interrupt Pending Bits block within PLIC Memory Map region is fixed at 0x001000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Pending Bits
+|Interrupt Pending Bit of Interrupt Source #0 to #1023
+|1024 / 8 = 128(0x80) bytes
+|This is a contiguous memory block which contains the PLIC Interrupt Pending Bits. Each Interrupt Pending Bit occupies 1 bit of this register block.
+|===
+
+*PLIC Interrupt Pending Bits Memory Map* +
+
+ 0x001000: Interrupt Source #0 to #31 Pending Bits
+ ...
+ 0x00107C: Interrupt Source #992 to #1023 Pending Bits
+
+
+== Interrupt Enables
+
+Each global interrupt can be enabled by setting the corresponding bit in the
+`enables` register. The `enables` registers are accessed as a contiguous array
+of 32-bit registers, packed the same way as the `pending` bits. Bit 0 of enable
+register 0 represents the non-existent interrupt ID 0 and is hardwired to 0.
+The PLIC has 15872 Interrupt Enable blocks for the contexts. A `context` refers
+to a specific privilege mode on a specific hart of a specific RISC-V processor
+instance. How the PLIC organizes interrupts for the contexts (hart and privilege mode)
+is out of scope for the RISC-V PLIC specification; however, it must be specified in the
+vendor's PLIC specification. +
+ +
+The base address of Interrupt Enable Bits block within PLIC Memory Map region is fixed at 0x002000. +
+ +
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Enable Bits
+|Interrupt Enable Bit of Interrupt Source #0 to #1023 for 15872 contexts
+|(1024 / 8) * 15872 = 2031616(0x1f0000) bytes
+|This is a contiguous memory block which contains the PLIC Interrupt Enable Bits of 15872 contexts.
+Each Interrupt Enable Bit occupies 1 bit of this register block, and there are 15872 Interrupt
+Enable Bit blocks in total.
+|===
+
+*PLIC Interrupt Enable Bits Memory Map* +
+
+ 0x002000: Interrupt Source #0 to #31 Enable Bits on context 0
+ ...
+ 0x00207C: Interrupt Source #992 to #1023 Enable Bits on context 0
+ 0x002080: Interrupt Source #0 to #31 Enable Bits on context 1
+ ...
+ 0x0020FC: Interrupt Source #992 to #1023 Enable Bits on context 1
+ 0x002100: Interrupt Source #0 to #31 Enable Bits on context 2
+ ...
+ 0x00217C: Interrupt Source #992 to #1023 Enable Bits on context 2
+ 0x002180: Interrupt Source #0 to #31 Enable Bits on context 3
+ ...
+ 0x0021FC: Interrupt Source #992 to #1023 Enable Bits on context 3
+ ...
+ ...
+ ...
+ 0x1F1F80: Interrupt Source #0 to #31 Enable Bits on context 15871
+ ...
+ 0x1F1FFC: Interrupt Source #992 to #1023 Enable Bits on context 15871
+
+== Priority Thresholds
+
+The PLIC provides a context-based `threshold` register for setting the interrupt priority
+threshold of each context. The `threshold` register is a WARL field. The PLIC masks all
+PLIC interrupts with a priority less than or equal to `threshold`. For example,
+a `threshold` value of zero permits all interrupts with non-zero priority. +
+ +
+The Priority Threshold registers are located at 4K-aligned offsets starting
+from offset 0x200000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Priority Threshold
+|Priority Threshold for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register for the Priority Threshold setting of each context
+|===
+
+*PLIC Interrupt Priority Thresholds Memory Map* +
+
+ 0x200000: Priority threshold for context 0
+ 0x201000: Priority threshold for context 1
+ 0x202000: Priority threshold for context 2
+ 0x203000: Priority threshold for context 3
+ ...
+ ...
+ ...
+ 0x3FFF000: Priority threshold for context 15871
+
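For illustration, setting a context's threshold with the hypothetical helpers sketched earlier:

    static inline void plic_set_threshold(uint32_t ctx, uint32_t t)
    {
        /* Interrupts with priority <= t are masked for this context;
           t = 0 permits every interrupt with non-zero priority. */
        plic_write(PLIC_THRESHOLD(ctx), t);
    }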
+== Interrupt Claim Process
+
+A hart can perform an interrupt claim by reading its `claim/complete`
+register, which returns the ID of the highest-priority pending interrupt, or
+zero if there is no pending interrupt. A successful claim also atomically
+clears the corresponding pending bit on the interrupt source. +
+A hart can perform a claim at any time, and the claim operation is not affected
+by the setting of the priority threshold register. +
+The Interrupt Claim register is context based and is located at
+(4K-aligned offset + 4) starting from offset 0x200000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Claim Register
+|Interrupt Claim Process for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register used to acquire the interrupt ID for each context
+|===
+
+*PLIC Interrupt Claim Process Memory Map* +
+
+ 0x200004: Interrupt Claim Process for context 0
+ 0x201004: Interrupt Claim Process for context 1
+ 0x202004: Interrupt Claim Process for context 2
+ 0x203004: Interrupt Claim Process for context 3
+ ...
+ ...
+ ...
+ 0x3FFF004: Interrupt Claim Process for context 15871
+
+== Interrupt Completion
+
+A hart signals it has completed executing an interrupt handler by writing the
+interrupt ID it received from the claim to the `claim/complete` register. The
+PLIC does not check whether the completion ID is the same as the last claim ID
+for that target. If the completion ID does not match an interrupt source that
+is currently enabled for the target, the completion is silently ignored. +
+The Interrupt Completion registers are context based and located at the same addresses
+as the Interrupt Claim registers, at (4K-aligned offset + 4) starting from
+offset 0x200000.
+ +
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Registers*|*Register Block Size in Byte*| *Description*
+|Interrupt Completion Register
+|Interrupt Completion for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register written to complete the interrupt handling process
+|===
+
+*PLIC Interrupt Completion Memory Map* +
+
+ 0x200004: Interrupt Completion for context 0
+ 0x201004: Interrupt Completion for context 1
+ 0x202004: Interrupt Completion for context 2
+ 0x203004: Interrupt Completion for context 3
+ ...
+ ...
+ ...
+ 0x3FFF004: Interrupt Completion for context 15871
+
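For illustration, the claim/complete flow an external-interrupt handler could follow, reusing the hypothetical helpers sketched earlier; handle_device_irq is a placeholder, not part of this spec:

    extern void handle_device_irq(uint32_t id);  /* placeholder device dispatch */

    void plic_handle_external_interrupt(uint32_t ctx)
    {
        uint32_t id;
        /* Claim: the read returns the highest-priority pending ID (0 if none)
           and atomically clears that source's pending bit. */
        while ((id = plic_read(PLIC_CLAIM(ctx))) != 0) {
            handle_device_irq(id);
            plic_write(PLIC_CLAIM(ctx), id);  /* complete: write back the claimed ID */
        }
    }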
--
2.19.0.windows.1






--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/


[PATCH 1/1] Initial commit of PLIC

Abner Chang
 

From: Abner Chang <abner.chang@...>

This is the commit creating the patches for
wider review in the Platform Spec HSC task group

Signed-off-by: Abner Chang <abner.chang@...>
---
riscv-plic.adoc | 306 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 306 insertions(+)
create mode 100644 riscv-plic.adoc

diff --git a/riscv-plic.adoc b/riscv-plic.adoc
new file mode 100644
index 0000000..b770e0e
--- /dev/null
+++ b/riscv-plic.adoc
@@ -0,0 +1,306 @@
+= *RISC-V Platform-Level Interrupt Controller Specification*
+
+== Copyright and license information
+
+This RISC-V PLIC specification is
+
+[%hardbreaks]
+(C) 2017 Drew Barbier <drew@...>
+(C) 2018-2019 Palmer Dabbelt <palmer@...>
+(C) 2019 Abner Chang, Hewlett Packard Enterprise <abner.chang@...>
+
+It is licensed under the Creative Commons Attribution 4.0 International
+License (CC-BY 4.0). The full license text is available at
+https://creativecommons.org/licenses/by/4.0/.
+
+== Introduction
+
+This document contains the RISC-V platform-level interrupt controller (PLIC)
+specification, which defines an interrupt controller specifically designed to
+work in the context of RISC-V systems. The PLIC multiplexes various device
+interrupts onto the external interrupt lines of Hart contexts, with
+hardware support for interrupt priorities. +
+This specification defines the general PLIC architecture and operation parameters.
+A PLIC that claims to be a PLIC-compliant standard PLIC should follow the
+implementations described in the sections below.
+
+.Figure 1 RISC-V PLIC Interrupt Architecture Block Diagram
+image::Images/PLIC.jpg[GitHub,1000,643, link=https://github.com/riscv/riscv-plic-spec/blob/master/Images/PLIC.jpg]
+
+== RISC-V PLIC Operation Parameters
+
+The general PLIC operation parameter register blocks defined in this spec are: +
+
+- *Interrupt Priorities registers:* +
+ The interrupt priority for each interrupt source. +
+
+- *Interrupt Pending Bits registers:* +
+ The interrupt pending status of each interrupt source. +
+
+- *Interrupt Enables registers:* +
+ The enablement of each interrupt source for each context. +
+
+- *Priority Thresholds registers:* +
+ The interrupt priority threshold of each context. +
+
+- *Interrupt Claim registers:* +
+ The register used to acquire the interrupt source ID for each context. +
+
+- *Interrupt Completion registers:* +
+ The register used to send an interrupt completion message to the associated gateway. +
+
++
+
+Below is the PLIC Operation Parameter Block Diagram:
+
+.Figure 2 PLIC Operation Parameter Block Diagram
+image::Images/PLICArch.jpg[GitHub, link=https://github.com/riscv/riscv-plic-spec/blob/master/Images/PLICArch.jpg]
+
+== Memory Map
+
+The `base address of PLIC Memory Map` is platform implementation-specific.
+
+*PLIC Memory Map*
+
+ base + 0x000000: Reserved (interrupt source 0 does not exist)
+ base + 0x000004: Interrupt source 1 priority
+ base + 0x000008: Interrupt source 2 priority
+ ...
+ base + 0x000FFC: Interrupt source 1023 priority
+ base + 0x001000: Interrupt Pending bit 0-31
+ ...
+ base + 0x00107C: Interrupt Pending bit 992-1023
+ base + 0x002000: Enable bits for sources 0-31 on context 0
+ base + 0x002004: Enable bits for sources 32-63 on context 0
+ ...
+ base + 0x00207C: Enable bits for sources 992-1023 on context 0
+ base + 0x002080: Enable bits for sources 0-31 on context 1
+ base + 0x002084: Enable bits for sources 32-63 on context 1
+ ...
+ base + 0x0020FC: Enable bits for sources 992-1023 on context 1
+ base + 0x002100: Enable bits for sources 0-31 on context 2
+ base + 0x002104: Enable bits for sources 32-63 on context 2
+ ...
+ base + 0x00217C: Enable bits for sources 992-1023 on context 2
+ ...
+ base + 0x1F1F80: Enable bits for sources 0-31 on context 15871
+ base + 0x1F1F84: Enable bits for sources 32-63 on context 15871
+ base + 0x1F1FFC: Enable bits for sources 992-1023 on context 15871
+ ...
+ base + 0x1FFFFC: Reserved
+ base + 0x200000: Priority threshold for context 0
+ base + 0x200004: Claim/complete for context 0
+ base + 0x200008: Reserved
+ ...
+ base + 0x200FFC: Reserved
+ base + 0x201000: Priority threshold for context 1
+ base + 0x201004: Claim/complete for context 1
+ ...
+ base + 0x3FFF000: Priority threshold for context 15871
+ base + 0x3FFF004: Claim/complete for context 15871
+ base + 0x3FFF008: Reserved
+ ...
+ base + 0x3FFFFFC: Reserved
+
+Sections below describe the control register blocks of PLIC operation parameters.
+
+== Register Width
+
+Each register in the memory map is 32 bits wide.
+
+== Interrupt Priorities
+
+If the PLIC supports Interrupt Priorities, then each PLIC interrupt source can be assigned a priority by writing to its 32-bit
+memory-mapped `priority` register. A priority value of 0 is reserved to mean "never interrupt" and effectively
+disables the interrupt. Priority 1 is the lowest active priority, while the maximum priority level depends on the
+PLIC implementation. Ties between global interrupts of the same priority are broken by the interrupt ID; interrupts
+with the lowest ID have the highest effective priority. +
+ +
+The base address of Interrupt Source Priority block within PLIC Memory Map region is fixed at 0x000000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name*| *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Source Priority
+|Interrupt Source Priority #0 to #1023
+|1024 * 4 = 4096(0x1000) bytes
+|This is a contiguous memory block which contains the PLIC Interrupt Source Priority registers. There are 1024 Interrupt Source Priority registers
+in this memory block. Interrupt Source Priority #0 is reserved, which indicates it does not exist.
+|===
+
+*PLIC Interrupt Source Priority Memory Map* +
+
+ 0x000000: Reserved (interrupt source 0 does not exist)
+ 0x000004: Interrupt source 1 priority
+ 0x000008: Interrupt source 2 priority
+ ...
+ 0x000FFC: Interrupt source 1023 priority
+
+== Interrupt Pending Bits
+
+The current status of the interrupt source pending bits in the PLIC core can be
+read from the pending array, organized as 32-bit registers. The pending bit
+for interrupt ID N is stored in bit (N mod 32) of word (N/32). Bit 0
+of word 0, which represents the non-existent interrupt source 0, is hardwired
+to zero.
+
+A pending bit in the PLIC core can be cleared by setting the associated enable
+bit then performing a claim. +
+ +
+The base address of Interrupt Pending Bits block within PLIC Memory Map region is fixed at 0x001000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Pending Bits
+|Interrupt Pending Bit of Interrupt Source #0 to #1023
+|1024 / 8 = 128(0x80) bytes
+|This is a contiguous memory block which contains the PLIC Interrupt Pending Bits. Each Interrupt Pending Bit occupies 1 bit of this register block.
+|===
+
+*PLIC Interrupt Pending Bits Memory Map* +
+
+ 0x001000: Interrupt Source #0 to #31 Pending Bits
+ ...
+ 0x00107C: Interrupt Source #992 to #1023 Pending Bits
+
+
+== Interrupt Enables
+
+Each global interrupt can be enabled by setting the corresponding bit in the
+`enables` register. The `enables` registers are accessed as a contiguous array
+of 32-bit registers, packed the same way as the `pending` bits. Bit 0 of enable
+register 0 represents the non-existent interrupt ID 0 and is hardwired to 0.
+The PLIC has 15872 Interrupt Enable blocks for the contexts. A `context` refers
+to a specific privilege mode on a specific hart of a specific RISC-V processor
+instance. How the PLIC organizes interrupts for the contexts (hart and privilege mode)
+is out of scope for the RISC-V PLIC specification; however, it must be specified in the
+vendor's PLIC specification. +
+ +
+The base address of Interrupt Enable Bits block within PLIC Memory Map region is fixed at 0x002000. +
+ +
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Enable Bits
+|Interrupt Enable Bit of Interrupt Source #0 to #1023 for 15872 contexts
+|(1024 / 8) * 15872 = 2031616(0x1f0000) bytes
+|This is a contiguous memory block which contains the PLIC Interrupt Enable Bits of 15872 contexts.
+Each Interrupt Enable Bit occupies 1 bit of this register block, and there are 15872 Interrupt
+Enable Bit blocks in total.
+|===
+
+*PLIC Interrupt Enable Bits Memory Map* +
+
+ 0x002000: Interrupt Source #0 to #31 Enable Bits on context 0
+ ...
+ 0x00207C: Interrupt Source #992 to #1023 Enable Bits on context 0
+ 0x002080: Interrupt Source #0 to #31 Enable Bits on context 1
+ ...
+ 0x0020FC: Interrupt Source #992 to #1023 Enable Bits on context 1
+ 0x002100: Interrupt Source #0 to #31 Enable Bits on context 2
+ ...
+ 0x00217C: Interrupt Source #992 to #1023 Enable Bits on context 2
+ 0x002180: Interrupt Source #0 to #31 Enable Bits on context 3
+ ...
+ 0x0021FC: Interrupt Source #992 to #1023 Enable Bits on context 3
+ ...
+ ...
+ ...
+ 0x1F1F80: Interrupt Source #0 to #31 Enable Bits on context 15871
+ ...
+ 0x1F1FFC: Interrupt Source #992 to #1023 Enable Bits on context 15871
+
+== Priority Thresholds
+
+The PLIC provides a context-based `threshold` register for setting the interrupt priority
+threshold of each context. The `threshold` register is a WARL field. The PLIC masks all
+PLIC interrupts with a priority less than or equal to `threshold`. For example,
+a `threshold` value of zero permits all interrupts with non-zero priority. +
+ +
+The Priority Threshold registers are located at 4K-aligned offsets starting
+from offset 0x200000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Priority Threshold
+|Priority Threshold for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register for the Priority Threshold setting of each context
+|===
+
+*PLIC Interrupt Priority Thresholds Memory Map* +
+
+ 0x200000: Priority threshold for context 0
+ 0x201000: Priority threshold for context 1
+ 0x202000: Priority threshold for context 2
+ 0x203000: Priority threshold for context 3
+ ...
+ ...
+ ...
+ 0x3FFF000: Priority threshold for context 15871
+
+== Interrupt Claim Process
+
+A hart can perform an interrupt claim by reading its `claim/complete`
+register, which returns the ID of the highest-priority pending interrupt, or
+zero if there is no pending interrupt. A successful claim also atomically
+clears the corresponding pending bit on the interrupt source. +
+A hart can perform a claim at any time, and the claim operation is not affected
+by the setting of the priority threshold register. +
+The Interrupt Claim register is context based and is located at
+(4K-aligned offset + 4) starting from offset 0x200000.
+
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Function*|*Register Block Size in Byte*| *Description*
+|Interrupt Claim Register
+|Interrupt Claim Process for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register used to acquire the interrupt ID for each context
+|===
+
+*PLIC Interrupt Claim Process Memory Map* +
+
+ 0x200004: Interrupt Claim Process for context 0
+ 0x201004: Interrupt Claim Process for context 1
+ 0x202004: Interrupt Claim Process for context 2
+ 0x203004: Interrupt Claim Process for context 3
+ ...
+ ...
+ ...
+ 0x3FFF004: Interrupt Claim Process for context 15871
+
+== Interrupt Completion
+
+A hart signals it has completed executing an interrupt handler by writing the
+interrupt ID it received from the claim to the `claim/complete` register. The
+PLIC does not check whether the completion ID is the same as the last claim ID
+for that target. If the completion ID does not match an interrupt source that
+is currently enabled for the target, the completion is silently ignored. +
+The Interrupt Completion registers are context based and located at the same addresses
+as the Interrupt Claim registers, at (4K-aligned offset + 4) starting from
+offset 0x200000.
+ +
+[cols="15%,20%,20%,45%"]
+|===
+| *PLIC Register Block Name* | *Registers*|*Register Block Size in Byte*| *Description*
+|Interrupt Completion Register
+|Interrupt Completion for 15872 contexts
+|4096 * 15872 = 65011712(0x3e00000) bytes
+|This is the register written to complete the interrupt handling process
+|===
+
+*PLIC Interrupt Completion Memory Map* +
+
+ 0x200004: Interrupt Completion for context 0
+ 0x201004: Interrupt Completion for context 1
+ 0x202004: Interrupt Completion for context 2
+ 0x203004: Interrupt Completion for context 3
+ ...
+ ...
+ ...
+ 0x3FFF004: Interrupt Completion for context 15871
+
--
2.19.0.windows.1


[PATCH 0/1] Initial commit of PLIC

Abner Chang
 

From: Abner Chang <abner.chang@...>

As Atish mentioned in the meeting, resending the patch to this task
group for wider review because this document is referenced in the
RISC-V platform spec.

Abner Chang (1):
Initial commit of PLIC

--
2.19.0.windows.1


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor
 

On Fri, Jun 18, 2021 at 9:01 AM Abner Chang <renba.chang@...> wrote:
On Fri, Jun 18, 2021 at 2:03 AM, Greg Favor <gfavor@...> wrote:
On Thu, Jun 17, 2021 at 8:56 AM Abner Chang <renba.chang@...> wrote:
- The platform should provide the capability to configure each RAS error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able to choose to expose the error to S/HS mode for further processing or just hide the error from S/HS software. This requires some mechanism provided by the platform, and the mechanism should be protected by M-mode.

I would have thought that this is just a software issue.  What kind of hardware mechanism do you picture being needed?
That could be:
- If the RAS error triggers M-mode (FFM) and firmware decides to expose the error to the OS (which could be configured through a CSR or RAS registers), then the RAS OS interrupt can be triggered when the system exits M-mode.
- Or, if the RAS error triggers management mode in the TEE, then the RAS OS interrupt can be triggered when the system exits the TEE.
The knob for exposing RAS errors to the OS could go with each RAS error configuration register, or be one centralized RAS register or CSR for all RAS errors.
Suppose the event that brings the system into the TEE has the highest priority, even when the system is executing in M-mode. This makes sure firmware can address the RAS error immediately, no matter which privilege level it occurs in.

Thanks.  This does seem to be all a matter of software configuring and handling things appropriately.
 

- We should also consider triggering RAS error interrupt to TEE which is where the firmware management mode resides.

Wouldn't the TEE be running in M-mode?  Or where is it expected to be running?
Yes, the TEE runs in M-mode, if memory serves me right from the spec. My expectation of the TEE is that there would be an event, triggered by either hardware or software, that brings the system into the TEE no matter which mode the hart is currently running in. I am not sure if this is how the TEE would be implemented.

Then this just becomes a matter of software configuring the interrupt controller to direct a given interrupt source to a given privilege mode.

 
For PCIe RAS,
- The baseline PCIe error or AER interrupt can be morphed into a firmware-first interrupt before being delivered to S/HS software. This gives firmware a chance to log the error, correct the error, or hide the error from S/HS software according to OEM RAS policy.

In x86 and ARM platforms, doesn't the OS pretty much always handle PCIe AER errors (i.e. OS-first for this class of errors)?  (I was reading an Intel overview doc recently that essentially said that - irrespective of whether other classes of errors are OS-first or firmware-first.)
Besides correcting the error in firmware, firmware also logs the necessary PCIe error events to the BMC before the OS handles them. The firmware RAS logs can be retrieved out-of-band even if the system is shut down or the OS crashes. This increases diagnosability and decreases the cost of customer service in the field.

Just fyi, this paper discusses use of both models in the x86 world: a-tour-beyond-bios-implementing-the-acpi-platform-error-interface-with-the-uefi.  As a number of us will remember from the ARMv8 days, there were big (as in religious) arguments over which model was the right one to adopt.  Ultimately it was accepted that both need to be supported by the architecture.  The point being that the OS/A platform spec should support both and not presume one as the one and only answer.
 
Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Abner Chang
 



On Fri, Jun 18, 2021 at 2:03 AM, Greg Favor <gfavor@...> wrote:
On Thu, Jun 17, 2021 at 8:56 AM Abner Chang <renba.chang@...> wrote:
- The platform should provide the capability to configure each RAS error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able to choose to expose the error to S/HS mode for further processing or just hide the error from S/HS software. This requires some mechanism provided by the platform, and the mechanism should be protected by M-mode.

I would have thought that this is just a software issue.  What kind of hardware mechanism do you picture being needed?
That could be:
- If the RAS error triggers M-mode (FFM) and firmware decides to expose the error to the OS (which could be configured through a CSR or RAS registers), then the RAS OS interrupt can be triggered when the system exits M-mode.
- Or, if the RAS error triggers management mode in the TEE, then the RAS OS interrupt can be triggered when the system exits the TEE.
The knob for exposing RAS errors to the OS could go with each RAS error configuration register, or be one centralized RAS register or CSR for all RAS errors.
Suppose the event that brings the system into the TEE has the highest priority, even when the system is executing in M-mode. This makes sure firmware can address the RAS error immediately, no matter which privilege level it occurs in.
 
- Each RAS error should be maskable through RAS configuration registers.

By "mask" do you mean masking of generation of an error interrupt?
Yes, to mask the RAS error interrupt, or even to suppress creating the log (in RAS status registers or CSRs) for errors that the OEM doesn't consider useful or important to the product.
 
- We should also consider triggering the RAS error interrupt to the TEE, which is where the firmware management mode resides.

Wouldn't the TEE be running in M-mode?  Or where is it expected to be running?
Yes, the TEE runs in M-mode, if memory serves me right from the spec. My expectation of the TEE is that there would be an event, triggered by either hardware or software, that brings the system into the TEE no matter which mode the hart is currently running in. I am not sure if this is how the TEE would be implemented.
 
For PCIe RAS,
- The baseline PCIe error or AER interrupt can be morphed into a firmware-first interrupt before being delivered to S/HS software. This gives firmware a chance to log the error, correct the error, or hide the error from S/HS software according to OEM RAS policy.

In x86 and ARM platforms, doesn't the OS pretty much always handle PCIe AER errors (i.e. OS-first for this class of errors)?  (I was reading an Intel overview doc recently that essentially said that - irrespective of whether other classes of errors are OS-first or firmware-first.)
Besides correcting the error in firmware, firmware also logs the necessary PCIe error events to the BMC before the OS handles them. The firmware RAS logs can be retrieved out-of-band even if the system is shut down or the OS crashes. This increases diagnosability and decreases the cost of customer service in the field.

Abner


Besides memory and PCIe RAS, do we have RAS errors for the processor/hart, such as IPI errors or some CE/UC/UCR errors local to a hart?

Definitely there will be processor/hart errors.  Presumably each hart would output one or more RAS interrupt request signals.

Greg


Re: Non-coherent I/O

Greg Favor
 

On Mon, Jun 14, 2021 at 1:04 PM Greg Favor <gfavor@...> wrote:
I have already sent questions to Andrew to get the official view as to the intent of this aspect of the Priv spec and what is the proper way or perspective with which to be reading the ISA specs.  That then may result in the need for clarifying text to be added to the spec.  And once it is clear as to the scope and bounds of the ISA specs and what they require and allow, then it is left to profile and platform specs to specify tighter requirements.
Here's the results of my Q&A with Andrew: 

- The Priv (and Unpriv) ISA specs are just that.  They are CPU architecture specs and should be read with that limited scope in mind.  They may touch on system-level issues, but they are not trying to constrain the flexibility in how these issues are handled across a wide range of system designs.  (I'll personally add on that RVI now makes an official distinction between ISA (Unpriv and Priv) and Non-ISA (aka system-related) arch specs.  The former apply inside of a hart; the latter apply outside of a hart.)

- Per above, PMAs and the PMA coherency attribute are CPU-specific and only apply to memory accesses by harts.  (One can choose to apply these ideas to accesses by other master agents in a system, but that's not officially a Priv spec matter.)

- The PMA coherency attribute only applies to that hart's accesses.  It is up to software to configure the PMAs in all harts to be the same, or not, as desired.  What is done for non-hart accesses (i.e. by I/O devices) is not specified by the Priv spec.  Hence there are no implications on I/O coherency, one way or another, by the Priv spec.

Naturally many if not most system designs will extend these ideas in some manner across the system and to other masters.  And platform specs may choose to specify and mandate some or all of this.  But that's not the business of the ISA specs.

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Allen Baum
 

Good answers all around; I didn't pick up on the difference between the OS-A base and OS-A server platforms.

It makes sense in hindsight for the manufacturers to set the MTBF goal and design to meet it. I was concerned about whether that could be met without the complexity of single-bit correction in L1 D-caches, but if those are typical of base platforms rather than server platforms anyway, it's not a significant concern in any case.

On Thu, Jun 17, 2021 at 12:01 PM Kumar Sankaran <ksankaran@...> wrote:

To add to what Greg mentioned below, the RAS features mentioned in the patch are required only for the OS-A platform server extension. We are not mandating any RAS requirements for OS-A base platform compatibility.

 

Regards

Kumar

From: Greg Favor <gfavor@...>
Sent: Thursday, June 17, 2021 11:54 AM
To: Allen Baum <allen.baum@...>
Cc: Abner Chang <renba.chang@...>; Kumar Sankaran <ksankaran@...>; tech-unixplatformspec@...
Subject: Re: [RISC-V] [tech-unixplatformspec] [PATCH 1/1] RAS features for OS-A platform server extension

 

On Thu, Jun 17, 2021 at 11:13 AM Allen Baum <allen.baum@...> wrote:

Is it acceptable to everyone that all single bit errors on all caches must be correctable?

 

Nowadays single-bit errors are far from rare.  There will always be people that run Linux and are willing to accept occasional silent corruptions and whatever mysterious application/data corruptions occur as a result.  But for a standardized server-class platform spec, this is a rather low "table stakes" bar to set.  Virtually no customer of a "server-class" platform will be comfortable without that (especially since the x86 and ARM alternatives provide at least that).

 

That really affects designs in fundamental ways for L1 caches (as opposed to simply detecting).

 

Parity (and invalidate on error detection) suffices for I and WT D caches; and ECC is used on WB D caches.  Even L1 D caches (which is one argument for doing a WT L1 D cache with parity, but the majority of people still do WB L1 D caches with ECC).

 

Understandably some people don't want to deal with ECC on a WB DL1, and parity or nothing may be fine for less-than server-class systems.

 

Not as big a concern for L2 and above.

Speaking from my Intel experience, the rule was expressed as failures per year - and if an L1 cache was small enough not to exceed that number, then it didn't need correction.

 

Somewhat analogous, TSMC imposes similarly expressed requirements wrt having redundancy in all the RAMs.  Even just one non-redundant 64 KiB cache can pretty much use up what is allowed to not have redundancy.

 

In any case, the Base platform spec should allow people to make whatever choice they want (and live with the consequences).  But to be competitive and to meet customer expectations (especially in a multi-core world), the Server spec needs to require a higher-than-nothing bar.

 

So, it might be useful to have a measurement baseline like that, rather than an absolute requirement.

 

A functional requirement is simple to specify and aligns with standard industry practices.  The alternatives get more involved and in practice won't provide much of any value over the functional requirement (for server-class systems).

 

The argument is why are you requiring ECC correction on this - and not the register file, or CSRs?

 

This is a baseline requirement - aligned with common/dominant industry practice.  Conversely it is not a dominant industry practice to protect flop-based register files (or flop-based storage structures in general).  (Latch-based register files, depending on whether the bitcell is more SRAM-like or flop-like, fall in one category or the other.)

 

The reason is they're small enough that failures are unlikely - and that's how your rationale should be stated.

 

Nowadays even the aggregate error rate or MTBF due to flop soft errors is not small.  But thankfully for most designs that MTBF component is acceptable within typical MTBF budgets.

 

As far as instead specifying an MTBF requirement, one then gets into system-wide issues and overall MTBF budgets, where it gets spent, what about the technology dependence of all this, and ....  Plus that effectively would provide little guidance to CPU designers as to what is their individual MTBF budget.  Or, conversely, one can probably have long discussions/arguments about what is the right MTBF number to require at the level of a single CPU core.

 

But at the end of the day very few or virtually no customer of a server-class system is going to accept a product that doesn't even have single-bit error protection on the cache hierarchy.

 

Greg

 


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Kumar Sankaran
 

To add to what Greg mentioned below, the RAS features mentioned in the patch are required only for the OS-A platform server extension. We are not mandating any RAS requirements for OS-A base platform compatibility.

 

Regards

Kumar

From: Greg Favor <gfavor@...>
Sent: Thursday, June 17, 2021 11:54 AM
To: Allen Baum <allen.baum@...>
Cc: Abner Chang <renba.chang@...>; Kumar Sankaran <ksankaran@...>; tech-unixplatformspec@...
Subject: Re: [RISC-V] [tech-unixplatformspec] [PATCH 1/1] RAS features for OS-A platform server extension

 

On Thu, Jun 17, 2021 at 11:13 AM Allen Baum <allen.baum@...> wrote:

Is it acceptable to everyone that all single bit errors on all caches must be correctable?

 

Nowadays single-bit errors are far from rare.  There will always be people that run Linux and are willing to accept occasional silent corruptions and whatever mysterious application/data corruptions occur as a result.  But for a standardized server-class platform spec, this is a rather low "table stakes" bar to set.  Virtually no customer of a "server-class" platform will be comfortable without that (especially since the x86 and ARM alternatives provide at least that).

 

That really affects designs in fundamental ways for L1 caches (as opposed to simply detecting).

 

Parity (and invalidate on error detection) suffices for I and WT D caches; and ECC is used on WB D caches.  Even L1 D caches (which is one argument for doing a WT L1 D cache with parity, but the majority of people still do WB L1 D caches with ECC).

 

Understandably some people don't want to deal with ECC on a WB DL1, and parity or nothing may be fine for less-than server-class systems.

 

Not as big a concern for L2 and above.

Speaking from my Intel experience, the rule was expressed as failures per year - and if an L1 cache was small enough not to exceed that number, then it didn't need correction.

 

Somewhat analogous, TSMC imposes similarly expressed requirements wrt having redundancy in all the RAMs.  Even just one non-redundant 64 KiB cache can pretty much use up what is allowed to not have redundancy.

 

In any case, the Base platform spec should allow people to make whatever choice they want (and live with the consequences).  But to be competitive and to meet customer expectations (especially in a multi-core world), the Server spec needs to require a higher-than-nothing bar.

 

So, it might be useful to have a measurement baseline like that, rather than an absolute requirement.

 

A functional requirement is simple to specify and aligns with standard industry practices.  The alternatives get more involved and in practice won't provide much of any value over the functional requirement (for server-class systems).

 

The argument is why are you requiring ECC correction on this - and not the register file, or CSRs?

 

This is a baseline requirement - aligned with common/dominant industry practice.  Conversely it is not a dominant industry practice to protect flop-based register files (or flop-based storage structures in general).  (Latch-based register files, depending on whether the bitcell is more SRAM-like or flop-like, fall in one category or the other.)

 

The reason is they're small enough that failures are unlikely - and that's how your rationale should be stated.

 

Nowadays even the aggregate error rate or MTBF due to flop soft errors is not small.  But thankfully for most designs that MTBF component is acceptable within typical MTBF budgets.

 

As far as instead specifying an MTBF requirement, one then gets into system-wide issues and overall MTBF budgets, where it gets spent, what about the technology dependence of all this, and ....  Plus that effectively would provide little guidance to CPU designers as to what is their individual MTBF budget.  Or, conversely, one can probably have long discussions/arguments about what is the right MTBF number to require at the level of a single CPU core.

 

But at the end of the day very few or virtually no customer of a server-class system is going to accept a product that doesn't even have single-bit error protection on the cache hierarchy.

 

Greg

 


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor
 

On Thu, Jun 17, 2021 at 11:13 AM Allen Baum <allen.baum@...> wrote:
Is it acceptable to everyone that all single bit errors on all caches must be correctable?

Nowadays single-bit errors are far from rare.  There will always be people that run Linux and are willing to accept occasional silent corruptions and whatever mysterious application/data corruptions occur as a result.  But for a standardized server-class platform spec, this is a rather low "table stakes" bar to set.  Virtually no customer of a "server-class" platform will be comfortable without that (especially since the x86 and ARM alternatives provide at least that).
 
That really affects designs in fundamental ways for L1 caches (as opposed to simply detecting).

Parity (and invalidate on error detection) suffices for I and WT D caches; and ECC is used on WB D caches.  Even L1 D caches (which is one argument for doing a WT L1 D cache with parity, but the majority of people still do WB L1 D caches with ECC).

Understandably some people don't want to deal with ECC on a WB DL1, and parity or nothing may be fine for less-than server-class systems.
 
Not as big a concern for L2 and above.
Speaking from my Intel experience, the rule was expressed as failures per year - and if an L1 cache was small enough not to exceed that number, then it didn't need correction.

Somewhat analogously, TSMC imposes similarly expressed requirements regarding redundancy in all the RAMs.  Even just one non-redundant 64 KiB cache can pretty much use up what is allowed to go without redundancy.

In any case, the Base platform spec should allow people to make whatever choice they want (and live with the consequences).  But to be competitive and to meet customer expectations (especially in a multi-core world), the Server spec needs to require a higher-than-nothing bar.
 
So, it might be useful to have a measurement baseline like that, rather than an absolute requirement.

A functional requirement is simple to specify and aligns with standard industry practices.  The alternatives get more involved and in practice won't provide much of any value over the functional requirement (for server-class systems).

The argument is: why are you requiring ECC correction on this, and not on the register file or CSRs?

This is a baseline requirement - aligned with common/dominant industry practice.  Conversely it is not a dominant industry practice to protect flop-based register files (or flop-based storage structures in general).  (Latch-based register files, depending on whether the bitcell is more SRAM-like or flop-like, fall in one category or the other.)

The reason is they're small enough that failures are unlikely - and that's how the rationale should be stated.

Nowadays even the aggregate error rate or MTBF due to flop soft errors is not small.  But thankfully for most designs that MTBF component is acceptable within typical MTBF budgets.

As far as instead specifying an MTBF requirement, one then gets into system-wide issues and overall MTBF budgets: where the budget gets spent, the technology dependence of all this, and so on.  Plus that would effectively provide little guidance to CPU designers as to their individual MTBF budget.  Or, conversely, one can probably have long discussions/arguments about what is the right MTBF number to require at the level of a single CPU core.
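As a back-of-envelope illustration of why that budget math matters (the soft-error rate below is an assumed ballpark for raw SRAM, not a foundry or Intel figure):

#include <stdio.h>

/* FIT = failures per 1e9 device-hours, so MTBF_hours = 1e9 / total_FIT.
 * The soft-error rate here is an assumed illustrative number. */
int main(void)
{
    const double ser_fit_per_mbit = 1000.0;      /* assumed raw SRAM SER     */
    const double cache_mbit       = 32.0 * 8.0;  /* 32 MiB of cache, in Mbit */
    double total_fit  = ser_fit_per_mbit * cache_mbit;  /* 256,000 FIT       */
    double mtbf_hours = 1e9 / total_fit;                /* ~3,900 hours      */
    printf("%.0f FIT -> MTBF ~%.0f hours (~%.0f days)\n",
           total_fit, mtbf_hours, mtbf_hours / 24.0);
    return 0;
}

Roughly 160 days between silent corruptions per machine, before multiplying by a fleet, is exactly the kind of number that makes the simple functional requirement the easier conversation.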

But at the end of the day, virtually no customer of a server-class system is going to accept a product that doesn't even have single-bit error protection on the cache hierarchy.

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Allen Baum
 

Is it acceptable to everyone that all single-bit errors on all caches must be correctable?
That really affects designs in fundamental ways for L1 caches (as opposed to simply detecting).
Not as big a concern for L2 and above.
Speaking from my Intel experience, the rule was expressed as failures per year - and if an L1 cache was small enough to stay under that number, then it didn't need correction.
So, it might be useful to have a measurement baseline like that, rather than an absolute requirement.

The argument is: why are you requiring ECC correction on this, and not on the register file or CSRs?
The reason is they're small enough that failures are unlikely - and that's how the rationale should be stated.
There will be platforms that are much more demanding (safety-critical) where duplication or majority voting is required.
I didn't think we were talking about those application areas.



On Thu, Jun 17, 2021 at 8:56 AM Abner Chang <renba.chang@...> wrote:


On Wed, Jun 16, 2021 at 8:17 AM, Kumar Sankaran <ksankaran@...> wrote:
Signed-off-by: Kumar Sankaran <ksankaran@...>
---
 riscv-platform-spec.adoc | 42 ++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/riscv-platform-spec.adoc b/riscv-platform-spec.adoc
index 4c356b8..d779452 100644
--- a/riscv-platform-spec.adoc
+++ b/riscv-platform-spec.adoc
@@ -19,18 +19,6 @@
 // table of contents
 toc::[]

-// document copyright and licensing information
-include::licensing.adoc[]
-
-// changelog for the document
-include::changelog.adoc[]
-
-// Introduction: describe the intent and purpose of the document
-include::introduction.adoc[]
-
-// Profiles: (NB: content from very first version)
-include::profiles.adoc[]
-
 == Introduction
 The platform specification defines a set of platforms that specify requirements
 for interoperability between software and hardware. The platform policy
@@ -68,11 +56,13 @@ The M platform has the following extensions:
 |SBI       | Supervisor Binary Interface
 |UEFI      | Unified Extensible Firmware Interface
 |ACPI      | Advanced Configuration and Power Interface
+|APEI      | ACPI Platform Error Interfaces
 |SMBIOS    | System Management Basic I/O System
 |DTS       | Devicetree source file
 |DTB       | Devicetree binary
 |RVA22     | RISC-V Application 2022
 |EE        | Execution Environment
+|OSPM      | Operating System Power Management
 |RV32GC    | RISC-V 32-bit general purpose ISA described as RV32IMAFDC.
 |RV64GC    | RISC-V 64-bit general purpose ISA described as RV64IMAFDC.
 |===
@@ -87,6 +77,7 @@ The M platform has the following extensions:
 |link:[RVA22 Specification]
                                        | TBD
 |link:https://arm-software.github.io/ebbr/[EBBR Specification]
                                        | v2.0.0-pre1
 |link:https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf[ACPI
Specification]              | v6.4
+|link:https://uefi.org/specs/ACPI/6.4/18_ACPI_Platform_Error_Interfaces/ACPI_PLatform_Error_Interfaces.html[APEI
Specification]              | v6.4
 |link:https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.4.0.pdf[SMBIOS
Specification]    | v3.4.0
 |link:[Platform Policy]
                                        | TBD
 |===
@@ -504,6 +495,30 @@ delegate the virtual supervisor timer interrupt
to 'VS' mode.
 * IOMMU

 ==== RAS
+All the below mentioned RAS features are required for the OS-A platform server
+extension
+
+*  Main memory must be protected with SECDED-ECC +
+*  All cache structures must be protected +
+** single-bit errors must be detected and corrected +
+** multi-bit errors can be detected and reported +
+* There must be memory-mapped RAS registers associated with these protected
+structures to log detected errors with information about the type and location
+of the error +
+* The platform must support the APEI specification to convey all error
+information to OSPM +
+* Correctable errors must be reported by hardware and either be corrected or
+recovered by hardware, transparent to system operation and to software +
+* Hardware must provide status of these correctable errors via RAS registers +
+* Uncorrectable errors must be reported by the hardware via RAS error
+registers for system software to take the needed corrective action +
+* Attempted use of corrupted (uncorrectable) data must result in a precise
+exception on that instruction with a distinguishing custom exception cause
+code +
+* Errors logged in RAS registers must be able to generate an interrupt request
+to the system interrupt controller that may be directed to either M-mode or
+S/HS-mode for firmware-first versus OS-first error reporting +
+* PCIe AER capability is required +

Hi Kumar,
I would like to add something.
In order to support the OEM RAS policy,
- The platform should provide the capability to configure each RAS error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able to choose to expose the error to S/HS mode for further processing or to hide the error from S/HS software. This requires some mechanism provided by the platform, and that mechanism must be protected by M-mode.
- Each RAS error should be maskable through RAS configuration registers (a hypothetical register sketch follows below).
- We should also consider triggering the RAS error interrupt to the TEE, which is where firmware management mode resides.
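To make the ask concrete, here is a purely hypothetical sketch of what a per-error control register could look like; none of these fields, names, or layouts are defined anywhere today:

#include <stdint.h>

/* Hypothetical per-error RAS record: each error source gets a control
 * register that selects where its interrupt is routed (M-mode for
 * firmware-first, S/HS-mode for OS-first) and whether it is masked.
 * All field layouts here are invented for illustration. */
#define RAS_CTL_INT_EN      (1u << 0)  /* generate an interrupt at all     */
#define RAS_CTL_ROUTE_SMODE (1u << 1)  /* 0: M-mode (FF), 1: S/HS (OSF)    */
#define RAS_CTL_LOG_MASK    (1u << 2)  /* suppress logging of this error   */

struct ras_error_record {              /* assumed memory-mapped layout     */
    volatile uint32_t control;
    volatile uint32_t status;          /* type/location of the last error  */
    volatile uint64_t addr;            /* faulting address, if captured    */
};

/* Intended to be writable only from M-mode, so the OEM policy stays out
 * of reach of S/HS software. */
static void ras_set_policy(struct ras_error_record *rec,
                           int os_first, int masked)
{
    uint32_t c;
    if (masked) {
        c = RAS_CTL_LOG_MASK;          /* no log and no interrupt          */
    } else {
        c = RAS_CTL_INT_EN;
        if (os_first)
            c |= RAS_CTL_ROUTE_SMODE;  /* OS-first: route to S/HS-mode     */
    }
    rec->control = c;
}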

For PCIe RAS,
- The baseline PCIe error or AER interrupt should be able to be morphed into a firmware-first interrupt before being delivered to S/HS software. This gives firmware a chance to log the error, correct it, or hide it from S/HS software according to OEM RAS policy (see the sketch below).
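For what it's worth, the firmware-first handler's first step might look like the sketch below; read_cfg32() is an assumed placeholder for the platform's ECAM accessor, while the capability ID and status offsets are the standard PCIe AER ones:

#include <stdint.h>

/* Walk the PCIe extended capability list to find AER, whose status
 * registers firmware reads before deciding, per OEM policy, whether to
 * expose the event to S/HS software. */
#define PCIE_ECAP_START    0x100
#define ECAP_ID_AER        0x0001
#define AER_UNCORR_STATUS  0x04    /* offset from the AER capability    */
#define AER_CORR_STATUS    0x10

extern uint32_t read_cfg32(uint16_t bdf, uint16_t off);  /* assumed accessor */

uint16_t find_aer_cap(uint16_t bdf)
{
    uint16_t off = PCIE_ECAP_START;
    while (off) {
        uint32_t hdr = read_cfg32(bdf, off);
        if ((hdr & 0xffff) == ECAP_ID_AER)
            return off;
        off = (hdr >> 20) & 0xffc;   /* next-capability pointer, dword-aligned */
    }
    return 0;                        /* device has no AER capability      */
}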
 
Besides memory and PCIe RAS, do we have RAS errors for the processor/hart, such as IPI errors or CE/UC/UCR errors local to a hart?

Regards,
Abner

 // M Platform
 == M Platform
@@ -593,6 +608,3 @@ also implement PMP support.
 When PMP is supported it is recommended to include at least 4 regions, although
 if possible more should be supported to allow more flexibility. Hardware
 implementations should aim for supporting at least 16 PMP regions.
-
-// acknowledge all of the contributors
-include::contributors.adoc[]
--
2.21.0






Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor
 

On Thu, Jun 17, 2021 at 8:56 AM Abner Chang <renba.chang@...> wrote:
- The platform should provide the capability to configure each RAS error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able to choose to expose the error to S/HS mode for further processing or to hide the error from S/HS software. This requires some mechanism provided by the platform, and that mechanism must be protected by M-mode.

I would have thought that this is just a software issue.  What kind of hardware mechanism do you picture being needed?
 
- Each RAS error should be maskable through RAS configuration registers.

By "mask" do you mean masking of generation of an error interrupt?
 
- We should also consider triggering the RAS error interrupt to the TEE, which is where firmware management mode resides.

Wouldn't the TEE be running in M-mode?  Or where is it expected to be running?
 
For PCIe RAS,
- The baseline PCIe error or AER interrupt should be able to be morphed into a firmware-first interrupt before being delivered to S/HS software. This gives firmware a chance to log the error, correct it, or hide it from S/HS software according to OEM RAS policy.

In x86 and ARM platforms, doesn't the OS pretty much always handle PCIe AER errors (i.e., OS-first for this class of errors)?  (I was reading an Intel overview doc recently that essentially said that, irrespective of whether other classes of errors are OS-first or firmware-first.)

Besides memory and PCIe RAS, do we have RAS errors for the processor/hart, such as IPI errors or CE/UC/UCR errors local to a hart?

Definitely there will be processor/hart errors.  Presumably each hart would output one or more RAS interrupt request signals.

Greg


Re: [PATCH 1/1] RAS features for OS-A platform server extension

Abner Chang
 



On Wed, Jun 16, 2021 at 8:17 AM, Kumar Sankaran <ksankaran@...> wrote:
Signed-off-by: Kumar Sankaran <ksankaran@...>
---
 riscv-platform-spec.adoc | 42 ++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/riscv-platform-spec.adoc b/riscv-platform-spec.adoc
index 4c356b8..d779452 100644
--- a/riscv-platform-spec.adoc
+++ b/riscv-platform-spec.adoc
@@ -19,18 +19,6 @@
 // table of contents
 toc::[]

-// document copyright and licensing information
-include::licensing.adoc[]
-
-// changelog for the document
-include::changelog.adoc[]
-
-// Introduction: describe the intent and purpose of the document
-include::introduction.adoc[]
-
-// Profiles: (NB: content from very first version)
-include::profiles.adoc[]
-
 == Introduction
 The platform specification defines a set of platforms that specify requirements
 for interoperability between software and hardware. The platform policy
@@ -68,11 +56,13 @@ The M platform has the following extensions:
 |SBI       | Supervisor Binary Interface
 |UEFI      | Unified Extensible Firmware Interface
 |ACPI      | Advanced Configuration and Power Interface
+|APEI      | ACPI Platform Error Interfaces
 |SMBIOS    | System Management Basic I/O System
 |DTS       | Devicetree source file
 |DTB       | Devicetree binary
 |RVA22     | RISC-V Application 2022
 |EE        | Execution Environment
+|OSPM      | Operating System Power Management
 |RV32GC    | RISC-V 32-bit general purpose ISA described as RV32IMAFDC.
 |RV64GC    | RISC-V 64-bit general purpose ISA described as RV64IMAFDC.
 |===
@@ -87,6 +77,7 @@ The M platform has the following extensions:
 |link:[RVA22 Specification]
                                        | TBD
 |link:https://arm-software.github.io/ebbr/[EBBR Specification]
                                        | v2.0.0-pre1
 |link:https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf[ACPI
Specification]              | v6.4
+|link:https://uefi.org/specs/ACPI/6.4/18_ACPI_Platform_Error_Interfaces/ACPI_PLatform_Error_Interfaces.html[APEI
Specification]              | v6.4
 |link:https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.4.0.pdf[SMBIOS
Specification]    | v3.4.0
 |link:[Platform Policy]
                                        | TBD
 |===
@@ -504,6 +495,30 @@ delegate the virtual supervisor timer interrupt
to 'VS' mode.
 * IOMMU

 ==== RAS
+All the below mentioned RAS features are required for the OS-A platform server
+extension
+
+*  Main memory must be protected with SECDED-ECC +
+*  All cache structures must be protected +
+** single-bit errors must be detected and corrected +
+** multi-bit errors can be detected and reported +
+* There must be memory-mapped RAS registers associated with these protected
+structures to log detected errors with information about the type and location
+of the error +
+* The platform must support the APEI specification to convey all error
+information to OSPM +
+* Correctable errors must be reported by hardware and either be corrected or
+recovered by hardware, transparent to system operation and to software +
+* Hardware must provide status of these correctable errors via RAS registers +
+* Uncorrectable errors must be reported by the hardware via RAS error
+registers for system software to take the needed corrective action +
+* Attempted use of corrupted (uncorrectable) data must result in a precise
+exception on that instruction with a distinguishing custom exception cause
+code +
+* Errors logged in RAS registers must be able to generate an interrupt request
+to the system interrupt controller that may be directed to either M-mode or
+S/HS-mode for firmware-first versus OS-first error reporting +
+* PCIe AER capability is required +

Hi Kumar,
I would like to add something.
In order to support the OEM RAS policy,
- The platform should provide the capability to configure each RAS error to trigger firmware-first or OS-first error interrupt.
- If the RAS error is handled by firmware, the firmware should be able to choose to expose the error to S/HS mode for further processing or to hide the error from S/HS software. This requires some mechanism provided by the platform, and that mechanism must be protected by M-mode.
- Each RAS error should be maskable through RAS configuration registers.
- We should also consider triggering the RAS error interrupt to the TEE, which is where firmware management mode resides.

For PCIe RAS,
- The baseline PCIe error or AER interrupt should be able to be morphed into a firmware-first interrupt before being delivered to S/HS software. This gives firmware a chance to log the error, correct it, or hide it from S/HS software according to OEM RAS policy.
 
Besides memory and PCIe RAS, do we have RAS errors for the processor/hart, such as IPI errors or CE/UC/UCR errors local to a hart?

Regards,
Abner

 // M Platform
 == M Platform
@@ -593,6 +608,3 @@ also implement PMP support.
 When PMP is supported it is recommended to include at least 4 regions, although
 if possible more should be supported to allow more flexibility. Hardware
 implementations should aim for supporting at least 16 PMP regions.
-
-// acknowledge all of the contributors
-include::contributors.adoc[]
--
2.21.0






Re: [PATCH] Add direct memory access synchronize extension

Allen Baum
 

Arch-test should be involved also.
It is (more than) a bit complicated because CMOs are instructions that affect non-architectural parts of an implementation,
so it's unclear what it even means to have an architectural test, much less how to write one.
The framework and tests currently handle only deterministic architectural state.
The definition of done has an architectural test component and a proof-of-concept component;
the CMOs can only satisfy the proof-of-concept part because of the above.

On Tue, Jun 15, 2021 at 4:33 PM David Kruckemyer <dkruckemyer@...> wrote:
Hi all,

My apologies as I just got wind of this discussion (I was unable to attend the last few CMO TG meetings due to travel). I think we should sync up on the CMO TG and SBI/platform efforts since there seems to be a bit of disconnect.

Regarding the CMO TG goals, we have intended to get a basic subset of operations into the profile/platform specifications for this year. The "phase 1" status is listed here:


Though honestly, a bit of this is out of date already, so expect some clarification in the coming days (just need to do some terminology cleanup).

Please do not hesitate to reach out to me with any questions (or to post questions to the CMO TG mailing list: tech-cmo@... )

Cheers,
David


On Mon, Jun 7, 2021 at 2:35 AM Nick Kossifidis <mick@...> wrote:
On 2021-06-07 07:03, Anup Patel wrote:
>
> Let's have a simple SBI DMA sync extension in SBI v0.4 spec.
>
> The shared code pages between M-mode and S-mode will have its own
> Challenges and we will have to define more stuff in SBI spec to support
> this (see above).
>

Totally agree with you; I just thought it'd be a good opportunity to
bring this up so that we can discuss it at some point. Let's have
something that works, and we can optimize it later on.

> It seems CMO extension might freeze sooner than we think (others can
> comment on this). If CMO extension is frozen by year end then we can
> trap-n-emulate CMO instructions instead of SBI DMA sync extension. If
> it does not freeze by year end then we will have to go ahead with
> SBI DMA sync extension as stop-gap solution.
>

The CMOs TG has a meeting today, I'll try and join and ask for updates
on this.
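For concreteness, an S-mode wrapper for such a stop-gap call might look roughly like the sketch below; the extension and function IDs are invented for illustration, since nothing has been assigned or frozen:

#include <stdint.h>

/* Hypothetical SBI "DMA sync" wrapper.  The standard SBI calling
 * convention puts the extension ID in a7, the function ID in a6,
 * arguments in a0/a1, and returns error/value in a0/a1. */
#define SBI_EXT_DMA_SYNC    0x444D4153UL  /* assumed EID, never assigned  */
#define SBI_DMA_SYNC_WBACK  0             /* assumed FID: clean/writeback */
#define SBI_DMA_SYNC_INVAL  1             /* assumed FID: invalidate      */

struct sbiret { long error; long value; };

static struct sbiret sbi_ecall2(unsigned long eid, unsigned long fid,
                                unsigned long arg0, unsigned long arg1)
{
    register unsigned long a0 __asm__("a0") = arg0;
    register unsigned long a1 __asm__("a1") = arg1;
    register unsigned long a6 __asm__("a6") = fid;
    register unsigned long a7 __asm__("a7") = eid;
    __asm__ volatile ("ecall"
                      : "+r"(a0), "+r"(a1)
                      : "r"(a6), "r"(a7)
                      : "memory");
    return (struct sbiret){ (long)a0, (long)a1 };
}

/* Make a buffer visible to a non-coherent DMA master before a transfer. */
static inline long dma_sync_for_device(void *buf, unsigned long len)
{
    return sbi_ecall2(SBI_EXT_DMA_SYNC, SBI_DMA_SYNC_WBACK,
                      (unsigned long)buf, len).error;
}

Trap-and-emulate of the eventual CMO instructions would replace the ecall with an illegal-instruction trap into M-mode, but the firmware-side cache maintenance would look much the same.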






Re: [PATCH] Add direct memory access synchronize extension

David Kruckemyer
 

FWIW, our (the CMO TG's) priorities are in order as follows:

- Zicbom (maintenance)
- Zicboz (zero)
- Zicbop (prefetch)

We happen to have provisional opcodes for both Zicbom and Zicboz (mostly since they occupy the same real estate).

The primary goal now is to take our overly general spec and distill it down into the three extensions and limit it to the Phase 1 material. Volunteers to help out with that would be greatly appreciated.... :)

Cheers,
David


On Tue, Jun 15, 2021 at 7:32 PM Anup Patel <Anup.Patel@...> wrote:

Hi Paul,

 

Everyone over here is well aware of the importance of fast-tracking basic CMO instructions and getting it frozen soon. The CMO group is also aware of their priorities so we should let them tackle this instead of proposing how they should work.

 

As mentioned quite a few times in this email thread, the SBI DMA sync extension is only a stop-gap solution (or backup plan) to satisfy the Linux RISC-V patch acceptance policy if we don't get basic CMO instructions soon. We would certainly like to avoid the SBI DMA sync extension if possible. In fact, we have not included the SBI DMA sync extension in the recently frozen SBI v0.3-rc1 spec, which will be released next month.

 

It is certainly possible to have basic CMO instructions frozen by the end of 2021. If this happens, we will discard the SBI DMA sync proposal and emulate basic CMO instructions in OpenSBI for the BeagleV and Allwinner D1 boards. In fact, Atish is still figuring out ways to avoid both SBI DMA sync and CMO instructions, at least for BeagleV, if that is possible.

 

Regards,

Anup

 

From: Paul Walmsley <paul.walmsley@...>
Sent: 16 June 2021 05:29
To: David Kruckemyer <dkruckemyer@...>; Nick Kossifidis <mick@...>
Cc: Anup Patel <Anup.Patel@...>; Atish Patra <Atish.Patra@...>; tech-unixplatformspec@...; Palmer Dabbelt <palmerdabbelt@...>; Palmer Dabbelt <palmer@...>; tech-cmo@...; John Ingalls <john.ingalls@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] [PATCH] Add direct memory access synchronize extension

 

It would be ideal if the CMO group could focus on fast-tracking the Cache Block Maintenance Operations for Phase 1 and get opcodes assigned, and this part of the specification frozen.  The maintenance operations are mandatory for non-CPU-cache-coherent peripheral DMA to work correctly; that's why these should be completed first.   As far as I can tell, prefetch and zeroing are strictly optimizations, so it would be best if these could be delayed to a Phase 2 -- which could be developed in parallel while Phase 1 goes through the opcode committee, etc. 

 

Then the SBI sync extension should be superfluous. It would be ideal if we could avoid having multiple mechanisms for the same operations.

 

For this to work, though, the CMO group needs to move on the block maintenance instructions quickly. 

 

 

- Paul

 

 

On 6/15/21 4:33 PM, David Kruckemyer wrote:

Hi all,

 

My apologies as I just got wind of this discussion (I was unable to attend the last few CMO TG meetings due to travel). I think we should sync up on the CMO TG and SBI/platform efforts since there seems to be a bit of disconnect.

 

Regarding the CMO TG goals, we have intended to get a basic subset of operations into the profile/platform specifications for this year. The "phase 1" status is listed here:

 

 

Though honestly, a bit of this is out of date already, so expect some clarification in the coming days (just need to do some terminology cleanup).

 

Please do not hesitate to reach out to me with any questions (or to post questions to the CMO TG mailing list: tech-cmo@... )

 

Cheers,

David

 

 

On Mon, Jun 7, 2021 at 2:35 AM Nick Kossifidis <mick@...> wrote:

On 2021-06-07 07:03, Anup Patel wrote:
>
> Let's have a simple SBI DMA sync extension in SBI v0.4 spec.
>
> The shared code pages between M-mode and S-mode will have its own
> Challenges and we will have to define more stuff in SBI spec to support
> this (see above).
>

Totally agree with you; I just thought it'd be a good opportunity to
bring this up so that we can discuss it at some point. Let's have
something that works, and we can optimize it later on.

> It seems CMO extension might freeze sooner than we think (others can
> comment on this). If CMO extension is frozen by year end then we can
> trap-n-emulate CMO instructions instead of SBI DMA sync extension. If
> it does not freeze by year end then we will have to go ahead with
> SBI DMA sync extension as stop-gap solution.
>

The CMOs TG has a meeting today, I'll try and join and ask for updates
on this.





Re: [PATCH] Add direct memory access synchronize extension

Anup Patel
 

Hi Paul,

 

Everyone over here is well aware of the importance of fast-tracking basic CMO instructions and getting it frozen soon. The CMO group is also aware of their priorities so we should let them tackle this instead of proposing how they should work.

 

As mentioned quite a few times in this email thread, the SBI DMA sync extension is only a stop-gap solution (or backup plan) to satisfy the Linux RISC-V patch acceptance policy if we don't get basic CMO instructions soon. We would certainly like to avoid the SBI DMA sync extension if possible. In fact, we have not included the SBI DMA sync extension in the recently frozen SBI v0.3-rc1 spec, which will be released next month.

 

It is certainly possible to have basic CMO instructions frozen by the end of 2021. If this happens, we will discard the SBI DMA sync proposal and emulate basic CMO instructions in OpenSBI for the BeagleV and Allwinner D1 boards. In fact, Atish is still figuring out ways to avoid both SBI DMA sync and CMO instructions, at least for BeagleV, if that is possible.

 

Regards,

Anup

 

From: Paul Walmsley <paul.walmsley@...>
Sent: 16 June 2021 05:29
To: David Kruckemyer <dkruckemyer@...>; Nick Kossifidis <mick@...>
Cc: Anup Patel <Anup.Patel@...>; Atish Patra <Atish.Patra@...>; tech-unixplatformspec@...; Palmer Dabbelt <palmerdabbelt@...>; Palmer Dabbelt <palmer@...>; tech-cmo@...; John Ingalls <john.ingalls@...>
Subject: Re: [RISC-V] [tech-unixplatformspec] [PATCH] Add direct memory access synchronize extension

 

It would be ideal if the CMO group could focus on fast-tracking the Cache Block Maintenance Operations for Phase 1 and get opcodes assigned, and this part of the specification frozen.  The maintenance operations are mandatory for non-CPU-cache-coherent peripheral DMA to work correctly; that's why these should be completed first.   As far as I can tell, prefetch and zeroing are strictly optimizations, so it would be best if these could be delayed to a Phase 2 -- which could be developed in parallel while Phase 1 goes through the opcode committee, etc. 

 

Then the SBI sync extension should be superfluous. It would be ideal if we could avoid having multiple mechanisms for the same operations.

 

For this to work, though, the CMO group needs to move on the block maintenance instructions quickly. 

 

 

- Paul

 

 

On 6/15/21 4:33 PM, David Kruckemyer wrote:

Hi all,

 

My apologies as I just got wind of this discussion (I was unable to attend the last few CMO TG meetings due to travel). I think we should sync up on the CMO TG and SBI/platform efforts since there seems to be a bit of disconnect.

 

Regarding the CMO TG goals, we have intended to get a basic subset of operations into the profile/platform specifications for this year. The "phase 1" status is listed here:

 

 

Though honestly, a bit of this is out of date already, so expect some clarification in the coming days (just need to do some terminology cleanup).

 

Please do not hesitate to reach out to me with any questions (or to post questions to the CMO TG mailing list: tech-cmo@... )

 

Cheers,

David

 

 

On Mon, Jun 7, 2021 at 2:35 AM Nick Kossifidis <mick@...> wrote:

On 2021-06-07 07:03, Anup Patel wrote:
>
> Let's have a simple SBI DMA sync extension in SBI v0.4 spec.
>
> The shared code pages between M-mode and S-mode will have its own
> Challenges and we will have to define more stuff in SBI spec to support
> this (see above).
>

Totally agree with you; I just thought it'd be a good opportunity to
bring this up so that we can discuss it at some point. Let's have
something that works, and we can optimize it later on.

> It seems CMO extension might freeze sooner than we think (others can
> comment on this). If CMO extension is frozen by year end then we can
> trap-n-emulate CMO instructions instead of SBI DMA sync extension. If
> it does not freeze by year end then we will have to go ahead with
> SBI DMA sync extension as stop-gap solution.
>

The CMOs TG has a meeting today, I'll try and join and ask for updates
on this.





Re: PCIe requirements: Memory vs I/O

Greg Favor
 

Thanks.


On Tue, Jun 15, 2021 at 6:00 PM Josh Scheid <jscheid@...> wrote:
On Tue, Jun 15, 2021 at 5:43 PM Josh Scheid via lists.riscv.org <jscheid=ventanamicro.com@...> wrote:

I can and will do that.  The point of raising this here is to explicitly confirm that the platform intent is to enable the Memory PMA within, say, PCIe-managed regions.  With that confirmation now effectively clear, we can push on the priv spec.



-Josh


Re: PCIe requirements: Memory vs I/O

Josh Scheid
 

On Tue, Jun 15, 2021 at 5:43 PM Josh Scheid via lists.riscv.org <jscheid=ventanamicro.com@...> wrote:

I can and will do that.  The point of raising this here is to explicitly confirm that the platform intent is to enable the Memory PMA within, say, PCIe-managed regions.  With that confirmation now effectively clear, we can push on the priv spec.



-Josh
