Re: [PATCH 1/1] RAS features for OS-A platform server extension

Greg Favor

On Wed, Jun 23, 2021 at 8:25 PM Abner Chang <renba.chang@...> wrote:
Please review the sentence below.
If the RAS event is configured for the firmware-first model, the platform should be able to trigger the highest-priority M-mode interrupt to all HARTs in the physical RV processor. This prevents subsequent RAS errors from being propagated by other HARTs that access the problematic hardware (PCIe, memory, I/O, etc.).

Note that the priority of any RAS interrupts would be software configurable in the interrupt controller.  Also note that there are other common techniques for preventing the propagation of errors and for isolating the impact of errors (e.g. precise hart exceptions on attempted use of corrupted data, data poisoning, I/O flow termination, ...).
One question:
Besides those RAS events that come from the interrupt controller,

In a typical enterprise-class RAS architecture, "error events" are logged in RAS registers, which then optionally generate RAS interrupt requests.  These then go to the system interrupt controller, which prioritizes and routes requests to appropriate harts.  
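To make that flow concrete, here is a rough Python model of it: an error is logged in a RAS record, the record optionally raises an interrupt request, and the interrupt controller orders and routes pending requests using software-configured priorities. This is only a sketch under my own assumptions. `RasErrorRecord`, `InterruptController`, and all of their fields and methods are hypothetical names for illustration, not part of any RISC-V specification.

```python
from dataclasses import dataclass, field


@dataclass
class RasErrorRecord:
    """Hypothetical RAS error record for one component (not a defined register layout)."""
    source: str
    irq_enabled: bool = True
    logged: list = field(default_factory=list)

    def log(self, error, intc):
        # The error event is always logged in the record.
        self.logged.append(error)
        # Raising a RAS interrupt request is optional (software-controlled).
        if self.irq_enabled:
            intc.request(self.source)


class InterruptController:
    """Hypothetical interrupt controller with software-configurable priority and routing."""

    def __init__(self):
        self.priority = {}  # source -> priority (higher value wins in this sketch)
        self.target = {}    # source -> hart id
        self.pending = []

    def configure(self, source, priority, hart):
        self.priority[source] = priority
        self.target[source] = hart

    def request(self, source):
        if source not in self.pending:
            self.pending.append(source)

    def deliver(self):
        """Return (hart, source) pairs for pending requests, highest priority first."""
        ordered = sorted(self.pending,
                         key=lambda s: self.priority.get(s, 0),
                         reverse=True)
        self.pending = []
        return [(self.target.get(s, 0), s) for s in ordered]


intc = InterruptController()
intc.configure("memory", priority=7, hart=0)
intc.configure("pcie", priority=3, hart=1)

mem = RasErrorRecord("memory")
pcie = RasErrorRecord("pcie")
cache = RasErrorRecord("l2cache", irq_enabled=False)  # log only, no interrupt

pcie.log("corrected link error", intc)
mem.log("uncorrected ECC error", intc)
cache.log("corrected ECC error", intc)

delivered = intc.deliver()
print(delivered)
```

In this model the cache error is still recorded even though it never reaches the interrupt controller, which is the "optionally generate RAS interrupt requests" part of the description above.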
how about the HART or Memory RAS events?

One would typically have RAS registers (for logging and reporting errors) spread around the system, ideally at all points in the system where errors can be detected and at all points where corrupted data can be consumed.  
Are those RAS events in the scope of exceptions, or would they also be routed to the interrupt controller?

RAS errors generally result in RAS interrupts, but when a hart tries to consume corrupted data, the ideal RAS behavior is for the hart to take a precise exception on the load instruction that is trying to consume corrupted data.
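As a toy illustration of the "precise exception on consumption" idea, the Python sketch below models memory with per-location poison marks. A load of clean data succeeds, while a load of poisoned data raises an exception that identifies the exact consuming instruction. Everything here is an assumption made for illustration: `Memory`, `PoisonedDataError`, the addresses, and in particular the choice that a store clears the poison mark are not taken from any RISC-V specification.

```python
class PoisonedDataError(Exception):
    """Stands in for a precise exception taken on the consuming load."""

    def __init__(self, pc, addr):
        super().__init__(f"poisoned data consumed at pc={pc:#x}, addr={addr:#x}")
        self.pc = pc
        self.addr = addr


class Memory:
    """Hypothetical memory model that tracks poison per address."""

    def __init__(self):
        self.data = {}
        self.poison = set()

    def store(self, addr, value):
        self.data[addr] = value
        # Assumption of this sketch: overwriting a location clears its poison.
        self.poison.discard(addr)

    def mark_poisoned(self, addr):
        # Called when an uncorrectable error is detected for this location.
        self.poison.add(addr)

    def load(self, pc, addr):
        # The error is deferred until some hart actually consumes the data.
        if addr in self.poison:
            raise PoisonedDataError(pc, addr)
        return self.data[addr]


memory = Memory()
memory.store(0x1000, 42)
memory.store(0x1008, 7)
memory.mark_poisoned(0x1008)

value = memory.load(pc=0x80000000, addr=0x1000)  # clean data loads normally

faulting_pc = None
try:
    memory.load(pc=0x80000004, addr=0x1008)      # consuming poisoned data
except PoisonedDataError as exc:
    faulting_pc = exc.pc

print(value, hex(faulting_pc))
```

The point of the model is that the poisoned location does no harm until a load touches it, and when one does, the fault is attributed to that specific load rather than arriving as an asynchronous interrupt.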
Or do we not have to worry about this because the RAS TG will have the solution?

All this would be covered by a proper RAS architecture (to hopefully be developed by a TG next year).

