Re: [PATCH 1/1] RAS features for OS-A platform server extension
On Wed, Jun 23, 2021 at 8:25 PM Abner Chang <renba.chang@...> wrote:
Note that the priority of any RAS interrupt would be software-configurable in the interrupt controller. Also note that there are other common techniques for preventing the propagation of errors and for isolating their impact (e.g., precise hart exceptions on attempted use of corrupted data, data poisoning, I/O flow termination, ...).
In a typical enterprise-class RAS architecture, "error events" are logged in RAS registers, which then optionally generate RAS interrupt requests. These then go to the system interrupt controller, which prioritizes and routes requests to appropriate harts.
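To make the logging-then-interrupt flow concrete, here is a minimal toy model in Python. It is only an illustration under stated assumptions: no RISC-V RAS register layout exists yet, so `RasRegister`, `InterruptController`, the severity labels, and the priority/target-hart tables are all hypothetical names chosen for this sketch. It shows an error event always being logged, an interrupt request being raised only if enabled, and the controller delivering requests highest-priority first to a software-configured hart.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class ErrorRecord:
    source: str    # which (hypothetical) RAS register logged the event
    severity: str  # illustrative labels only, e.g. "corrected" / "uncorrected"


class InterruptController:
    """Toy controller: software-configured priority and target hart per source."""

    def __init__(self):
        self.priority = {}  # source -> priority (higher value wins)
        self.target = {}    # source -> hart id
        self._pending = []  # heap of (-priority, seq, source, record)
        self._seq = 0       # tie-breaker so equal priorities stay FIFO

    def configure(self, source, priority, hart):
        self.priority[source] = priority
        self.target[source] = hart

    def request(self, source, rec):
        heapq.heappush(
            self._pending, (-self.priority.get(source, 0), self._seq, source, rec)
        )
        self._seq += 1

    def dispatch(self):
        """Deliver pending requests, highest priority first, as (hart, source, severity)."""
        delivered = []
        while self._pending:
            _, _, source, rec = heapq.heappop(self._pending)
            delivered.append((self.target.get(source, 0), source, rec.severity))
        return delivered


@dataclass
class RasRegister:
    """Hypothetical RAS register bank at one error-detection point."""
    name: str
    interrupt_enable: bool = True
    log: list = field(default_factory=list)

    def record(self, severity, controller):
        rec = ErrorRecord(self.name, severity)
        self.log.append(rec)          # the error event is always logged
        if self.interrupt_enable:     # the interrupt request is optional
            controller.request(self.name, rec)
        return rec


# Usage: two detection points, one of them with interrupts disabled.
ic = InterruptController()
ic.configure("dram0", priority=7, hart=1)
ic.configure("l2cache", priority=3, hart=0)
l2, dram, pcie = RasRegister("l2cache"), RasRegister("dram0"), RasRegister("pcie0", interrupt_enable=False)
l2.record("corrected", ic)
dram.record("uncorrected", ic)
pcie.record("corrected", ic)  # logged only; software would have to poll for it
delivered = ic.dispatch()
```

Here `delivered` is `[(1, 'dram0', 'uncorrected'), (0, 'l2cache', 'corrected')]`: the higher-priority DRAM error reaches hart 1 first, and the PCIe event sits only in its log.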
One would typically have RAS registers (for logging and reporting errors) spread around the system, ideally at all points in the system where errors can be detected and at all points where corrupted data can be consumed.
RAS errors generally result in RAS interrupts, but when a hart tries to consume corrupted data, the ideal RAS behavior is for the hart to take a precise exception on the load instruction that is trying to consume corrupted data.
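A minimal Python sketch of the poisoning idea, assuming a simple per-address poison tag (the class names, the `pc`/`addr` fields, and the rule that a full overwrite clears poison are illustrative assumptions, not anything specified for RISC-V): the detected error only tags the data, and the fault is raised precisely, attributed to the load that tries to consume it.

```python
class PoisonedLoadFault(Exception):
    """Stands in for a precise hart exception taken on the consuming load."""

    def __init__(self, pc, addr):
        super().__init__(f"load at pc={pc:#x} consumed poisoned data at {addr:#x}")
        self.pc = pc      # the exact instruction that tried to consume the data
        self.addr = addr  # the poisoned location


class PoisonedMemory:
    """Hypothetical memory model with a poison tag per address."""

    def __init__(self):
        self.data = {}
        self.poison = set()

    def store(self, addr, value):
        self.data[addr] = value
        self.poison.discard(addr)  # assumption: a full overwrite replaces corrupt data

    def mark_poisoned(self, addr):
        # Error detected (e.g. uncorrectable ECC) but not yet consumed: tag it
        # rather than failing now, so propagation is stopped at the point of use.
        self.poison.add(addr)

    def load(self, pc, addr):
        if addr in self.poison:
            raise PoisonedLoadFault(pc, addr)
        return self.data.get(addr, 0)


# Usage: a clean load succeeds; a load of poisoned data faults precisely.
mem = PoisonedMemory()
mem.store(0x1000, 42)
clean_value = mem.load(pc=0x8000_0000, addr=0x1000)
mem.mark_poisoned(0x1000)
try:
    mem.load(pc=0x8000_0010, addr=0x1000)
except PoisonedLoadFault as fault:
    faulting_pc = fault.pc
```

Because the exception carries the faulting `pc`, software can contain the damage to the one consumer (e.g. kill a single process) instead of treating it as an asynchronous, system-wide error.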
All of this would be covered by a proper RAS architecture (hopefully to be developed by a TG next year).