On Mon, Dec 13, 2021 at 6:56 PM Greg Favor <gfavor@...> wrote:
On Mon, Dec 13, 2021 at 5:38 PM Ved Shanbhogue <ved@...> wrote:
This was one of the sources of my questions. If the platform specification's intent is to specify the SEE, ISA and non-ISA hardware - the hardware/software contract - as visible to software so that a shrink-wrapped operating system can load, then I would say it's not the platform specification's role to teach how to design resilient hardware. If the goal of the platform specification is to teach hardware designers how to design resilient hardware, then I think the specification falls short in many ways...I think you also hit upon that in the next statement.
I wouldn't view platform mandates of this sort as teaching, but as establishing a baseline that system integrators can depend on - by guiding the hardware developers as to what that expected baseline is. (But I get your point.)
My understanding was the former, i.e. establishing the standard for hardware-software interoperability. Specifically, I think the areas of RAS where interoperability is required (e.g. standardized logging/reporting, redirecting reporting to firmware-first, etc.) should be in the purview.
The fundamental question is whether the goal of the platform spec is solely to ensure hardware-software interoperability and not to go further in ensuring other minimum capabilities that compliant platforms will provide. What should be said and not said about RAS follows from that.
Given that people are leaning towards the more limited scope or goal for the OS-A platforms, then that directly implies that there should be no requirements about what RAS features/coverage/etc. are actually implemented by compliant platforms.
The intent of the platform spec is hardware-software interoperability.
I agree that dictating RAS hardware features is not within the scope
of the platform spec. However, we do want standards for RAS error
handling, error detection, logging/reporting and such. For example,
using APEI to convey error information to OSPM is needed for software
So one suggestion is we remove specific errors like single-bit errors,
multi-bit errors and such and limit the features to error handling,
detection and logging/reporting.