Re: Platform specification questions

Greg Favor

On Mon, Dec 13, 2021 at 2:22 PM Ved Shanbhogue <ved@...> wrote:
>Mandate:  *At a minimum, caching structures must be protected such that
>single-bit errors are detected and corrected by hardware.*
Would a mandate be overreaching, and why limit it to caches then?

This was just trying to mandate a basic requirement without going as far as requiring protection of all RAM-based structures, which some may view as overreach.  Conversely, I can understand that some people may view "all caching structures" as already an overreach.

A product may define its reliability goals and may reason that a certain cache need not be protected, for reasons such as the technology in which the product is built, the altitude at which it is intended to be used, the architectural vulnerability factor computed for that structure, etc.

I am failing to understand how we would be adding to or removing from the OS-A platform compatibility goal, which is to be able to boot a shrink-wrapped server operating system, by trying to provide a mandate on how a platform implements reliability.

I think this whole RAS-related topic in the current platform draft was to establish some form of modest RAS requirement (versus no requirement) until a proper RAS arch spec exists.  Although even then (assuming that arch spec is like x86 and ARM RAS specs that are just concerned with standardizing RAS registers for logging and the mechanisms for reporting errors), there still won't be any minimum requirement for actual error detection and correction.

Fundamentally, should the Server platform spec mandate ANY error detection/correction requirements, or just leave it as a wild west among hardware developers to individually and eventually figure out where the line exists as far as the basic needs for RAS in Server-compliant platforms?  And leave it for system integrators to discover that some Server-compliant hardware has less than "basic" RAS?

BUT if the platform spec is ONLY trying to establish hardware/software interoperability, and not also match up hardware and software expectations regarding other areas of functionality such as RAS, then that answers the question.  My own leaning is towards trying to address the latter versus the narrower view that the only concern is software interoperability.  But I understand the arguments both ways.

