Hypervisor exception priorities


Paul Donahue
 

Table 3.7 of the privileged spec lists the synchronous exception priorities. The hypervisor chapter adds guest page faults and virtual instruction exceptions but I don't see where it states how they are prioritized (either in a table or in prose). I have intuition about the priorities but it should be explicit.

The explicitly incomplete yet normative list of virtual instruction exceptions also makes it harder to understand the exact virtual/illegal dichotomy.


Thanks,

-Paul


John Hauser
 

Paul Donahue wrote:
> Table 3.7 of the privileged spec lists the synchronous exception
> priorities. The hypervisor chapter adds guest page faults and virtual
> instruction exceptions but I don't see where it states how they are
> prioritized (either in a table or in prose). I have intuition about the
> priorities but it should be explicit.

I suggest adding a GitHub issue or two here:
https://github.com/riscv/riscv-isa-manual

For virtual instruction exceptions, would anybody assume their priority
is anything other than the same as illegal instruction exceptions?

> The explicitly incomplete yet normative list of virtual instruction
> exceptions also makes it harder to understand the exact virtual/illegal
> dichotomy.

You're not suggesting we drop the list from Section 5.6.1 and let
everyone work out all the specific cases themselves based on the
general rules from that section, are you? I'm guessing not; rather,
it's the incompleteness you're objecting to.

I wrote that the list is not necessarily complete because I was
thinking of extensions such as Smstateen and the AIA (Advanced
Interrupt Architecture), plus the constant risk of additions elsewhere
within the Privileged ISA itself. But if it will make people more
comfortable, we can drop the caution about the list's possible
incompleteness, add a more conscious caveat statement that extensions
may add to the list, and rely on everyone working to keep this list
always complete and up-to-date in the Privileged ISA document.

To be honest, I have my doubts about that last part, but we can give it
a try if that's what people prefer.

If we go the route of promising a complete list of virtual instruction
exceptions, can we count on you, Paul, to study the current list
carefully and report on any cases you see that might be missing?

- John Hauser


Greg Favor
 

On Wed, Aug 11, 2021 at 1:55 PM John Hauser <jh.riscv@...> wrote:
I wrote that the list is not necessarily complete because I was
thinking of extensions such as Smstateen and the AIA (Advanced
Interrupt Architecture), plus the constant risk of additions elsewhere
within the Privileged ISA itself.  But if it will make people more
comfortable, we can drop the caution about the list's possible
incompleteness, add a more conscious caveat statement that extensions
may add to the list, and rely on everyone working to keep this list
always complete and up-to-date in the Privileged ISA document.

Myself, when I read the text in question, I find the "list may be incomplete" to be nebulous and unbounded in what that could mean.  Whereas a statement that the list will grow over time (e.g. due to new arch extensions, but not limited to that) provides some explanation of how and why the list can change over time.

With both the existing and alternative text the need for everyone to keep the list complete and up-to-date exists.  If anything, mentioning "due to new arch extensions" clues people in to what they have to worry about in maintaining the list.

Greg


andrew@...
 



On Wed, Aug 11, 2021 at 2:12 PM Greg Favor <gfavor@...> wrote:
On Wed, Aug 11, 2021 at 1:55 PM John Hauser <jh.riscv@...> wrote:
I wrote that the list is not necessarily complete because I was
thinking of extensions such as Smstateen and the AIA (Advanced
Interrupt Architecture), plus the constant risk of additions elsewhere
within the Privileged ISA itself.  But if it will make people more
comfortable, we can drop the caution about the list's possible
incompleteness, add a more conscious caveat statement that extensions
may add to the list, and rely on everyone working to keep this list
always complete and up-to-date in the Privileged ISA document.

Myself, when I read the text in question, I find the "list may be incomplete" to be nebulous and unbounded in what that could mean.  Whereas a statement that the list will grow over time (e.g. due to new arch extensions, but not limited to that) provides some explanation of how and why the list can change over time.

With both the existing and alternative text the need for everyone to keep the list complete and up-to-date exists.  If anything, mentioning "due to new arch extensions" clues people in to what they have to worry about in maintaining the list.

Greg

This minor amendment seems to me to strike the right balance.


Josh Scheid
 

On Wed, Aug 11, 2021 at 2:12 PM Greg Favor <gfavor@...> wrote:
On Wed, Aug 11, 2021 at 1:55 PM John Hauser <jh.riscv@...> wrote:
I wrote that the list is not necessarily complete because I was
thinking of extensions such as Smstateen and the AIA (Advanced
Interrupt Architecture), plus the constant risk of additions elsewhere
within the Privileged ISA itself.  But if it will make people more
comfortable, we can drop the caution about the list's possible
incompleteness, add a more conscious caveat statement that extensions
may add to the list, and rely on everyone working to keep this list
always complete and up-to-date in the Privileged ISA document.

Myself, when I read the text in question, I find the "list may be incomplete" to be nebulous and unbounded in what that could mean.  Whereas a statement that the list will grow over time (e.g. due to new arch extensions, but not limited to that) provides some explanation of how and why the list can change over time.

With both the existing and alternative text the need for everyone to keep the list complete and up-to-date exists.  If anything, mentioning "due to new arch extensions" clues people in to what they have to worry about in maintaining the list.


Yes.  Describing the bounds of the things not listed and the rules for evaluating inclusion would be sufficient.


Paul Donahue
 

On Wed, Aug 11, 2021 at 1:55 PM John Hauser <jh.riscv@...> wrote:
Paul Donahue wrote:
> Table 3.7 of the privileged spec lists the synchronous exception
> priorities. The hypervisor chapter adds guest page faults and virtual
> instruction exceptions but I don't see where it states how they are
> prioritized (either in a table or in prose). I have intuition about the
> priorities but it should be explicit.

I suggest adding a GitHub issue or two here:
https://github.com/riscv/riscv-isa-manual


For virtual instruction exceptions, would anybody assume their priority
is anything other than the same as illegal instruction exceptions?

I wouldn't.  Would anybody?  Sometimes I'm surprised...

There would need to be a relative priority between illegal and virtual if the two can exist concurrently, though I think that the definition of virtual instruction exception removes that possibility.
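
To make the mutual exclusion concrete, here is a minimal sketch. It assumes a simplified reading of the general rule in Section 5.6.1: an instruction that is blocked while V=1 raises a virtual instruction exception if it would have been valid in HS-mode, and an illegal instruction exception otherwise. The function `classify` and its three boolean inputs are hypothetical names invented for this illustration, and the sketch deliberately ignores the specific cases enumerated in the spec's list.

```python
from enum import Enum


class Trap(Enum):
    NONE = "none"
    ILLEGAL_INSTRUCTION = "illegal instruction"
    VIRTUAL_INSTRUCTION = "virtual instruction"


def classify(virtualized: bool, allowed_now: bool, valid_in_hs: bool) -> Trap:
    """Pick at most one trap for a single instruction (simplified model).

    virtualized -- True when V=1 (VS-mode or VU-mode)
    allowed_now -- True when the instruction may execute in the current state
    valid_in_hs -- True when the same instruction would be valid in HS-mode
    """
    if allowed_now:
        return Trap.NONE
    # Assumption: a blocked instruction that HS-mode could have executed
    # is reported as "virtual"; everything else is reported as "illegal".
    if virtualized and valid_in_hs:
        return Trap.VIRTUAL_INSTRUCTION
    return Trap.ILLEGAL_INSTRUCTION
```

Because the function returns exactly one value for any combination of inputs, the two exceptions never compete under this model, so no relative priority between them is needed.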

Guest page faults get messier because they can either be for VS-stage PTEs (which are presumably higher priority than page faults in VS-stage) or for the final memory access (which would be lower priority than VS-stage page faults).  And I now realize that you can get access faults on G-stage PTEs which are accessed to translate the instruction-side VS-stage table walks, hoisting that data access fault above a VS-stage instruction page fault.  I think that it's fairly intuitive how it has to work (at least for somebody who understands the big picture) but specifying it will be quite messy.

> The explicitly incomplete yet normative list of virtual instruction
> exceptions also makes it harder to understand the exact virtual/illegal
> dichotomy.

You're not suggesting we drop the list from Section 5.6.1 and let
everyone work out all the specific cases themselves based on the
general rules from that section, are you?  I'm guessing not; rather,
it's the incompleteness you're objecting to.

Yes, it's the incompleteness I'm worried about.

I wrote that the list is not necessarily complete because I was
thinking of extensions such as Smstateen and the AIA (Advanced
Interrupt Architecture), plus the constant risk of additions elsewhere
within the Privileged ISA itself.  But if it will make people more
comfortable, we can drop the caution about the list's possible
incompleteness, add a more conscious caveat statement that extensions
may add to the list, and rely on everyone working to keep this list
always complete and up-to-date in the Privileged ISA document.

To be honest, I have my doubts about that last part, but we can give it
a try if that's what people prefer.

If we go the route of promising a complete list of virtual instruction
exceptions, can we count on you, Paul, to study the current list
carefully and report on any cases you see that might be missing?

I'll do my best.


Thanks,

-Paul

 

    - John Hauser






Greg Favor
 

On Wed, Aug 11, 2021 at 2:45 PM Paul Donahue <pdonahue@...> wrote:
Guest page faults get messier because they can either be for VS-stage PTEs (which are presumably higher priority than page faults in VS-stage) or for the final memory access (which would be lower priority than VS-stage page faults).  And I now realize that you can get access faults on G-stage PTEs which are accessed to translate the instruction-side VS-stage table walks, hoisting that data access fault above a VS-stage instruction page fault.  I think that it's fairly intuitive how it has to work (at least for somebody who understands the big picture) but specifying it will be quite messy.

I haven't double-checked myself, but I believe this mixed priority between PF and GPF exceptions is resolved by the fact that the Priv spec spells out the logical sequence of steps to translate an address - including what causes of faults get checked in what step.

In essence, PF and GPF are of the same priority, but there is always a clear ordering as to whether a PF due to one specific reason or a GPF due to another specific reason should be taken.
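
A toy model of that step ordering, as a sketch only: it uses a single-level VS-stage table and a single-level G-stage table, both plain dictionaries, and it leaves out access faults, permissions, and A/D handling entirely. The names `Fault`, `g_stage`, `translate`, and the dictionary keys are all made up for this illustration and do not come from the spec.

```python
class Fault(Exception):
    """A translation fault, tagged with the step that detected it."""

    def __init__(self, kind, where):
        super().__init__(f"{kind} ({where})")
        self.kind = kind
        self.where = where


def g_stage(gpa, g_table, where):
    """Map a guest-physical page to a supervisor-physical page."""
    if gpa not in g_table:
        raise Fault("guest page fault", where)
    return g_table[gpa]


def translate(va, vs_ptes, g_table):
    """Translate `va` in the order the steps occur; the first failure wins."""
    entry = vs_ptes[va]
    # Step 1: the VS-stage PTE lives at a guest-physical address, so the
    # implicit access to it must itself pass through the G-stage.
    g_stage(entry["pte_gpa"], g_table, "VS-stage PTE access")
    # Step 2: only now can the VS-stage PTE contents be checked.
    if not entry["valid"]:
        raise Fault("page fault", "VS-stage PTE check")
    # Step 3: the resulting guest-physical address goes through the G-stage.
    return g_stage(entry["gpa"], g_table, "final access")
```

In this model a guest page fault can be reported either before or after a VS-stage page fault, yet there is never any ambiguity: whichever check fails first in the sequence determines the exception.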

Greg


Ved Shanbhogue
 

On 8/11/21 4:57 PM, Greg Favor wrote:
On Wed, Aug 11, 2021 at 2:45 PM Paul Donahue <pdonahue@...> wrote:
Guest page faults get messier because they can either be for
VS-stage PTEs (which are presumably higher priority than page faults
in VS-stage) or for the final memory access (which would be lower
priority than VS-stage page faults).  And I now realize that you can
get access faults on G-stage PTEs which are accessed to translate
the instruction-side VS-stage table walks, hoisting that data access
fault above a VS-stage instruction page fault.  I think that it's
fairly intuitive how it has to work (at least for somebody who
understands the big picture) but specifying it will be quite messy.
I haven't double-checked myself, but I believe this mixed priority between PF and GPF exceptions is resolved by the fact that the Priv spec spells out the logical sequence of steps to translate an address - including what causes of faults get checked in what step.
In essence, PF and GPF are of the same priority, but there is always a clear ordering as to whether a PF due to one specific reason or a GPF due to another specific reason should be taken.

A special case arises when setting the A or D bit in a first-stage PTE: the second-stage leaf PTE may cause a guest page fault both before and after it provides the translation to the first stage.

The walk to find the physical address of the first-stage leaf PTE is treated as a load. Once the leaf PTE is loaded, however, it may need an A or D bit update. The page walk then has to go back to the second-stage PTE that provided the translation, test it for write permission (which could cause a guest page fault), and update the D bit in that second-stage PTE.

To ensure that the D bit updates in the second-stage PTE and the first-stage PTE occur atomically, the page walk state machine must hold the reservation on the second-stage PTE line (and on the first-stage PTE) until it has determined whether a D bit update is needed.

So we may see this order of faults:
- Guest page fault on the walk that translates the address of the first-stage leaf PTE
- Access fault on the first-stage leaf PTE address
- Page faults from the first-stage leaf PTE
- Guest page fault, if a D bit update is needed as noted above
- Guest page fault for the page translation
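
The order above can be written down as an ordered enumeration, with the reported fault being the earliest check that fails. This is only a sketch of the list as stated here; the names `WalkCheck` and `reported_fault` are invented for the illustration.

```python
from enum import IntEnum


class WalkCheck(IntEnum):
    """Checks in the order listed above; lower value = detected earlier."""

    GPF_ON_LEAF_PTE_ADDRESS = 1     # G-stage walk for the leaf PTE's address
    ACCESS_FAULT_ON_LEAF_PTE = 2    # access fault on the leaf PTE address
    PAGE_FAULT_FROM_LEAF_PTE = 3    # faults found in the leaf PTE itself
    GPF_ON_D_BIT_UPDATE = 4         # write-permission recheck for a D update
    GPF_ON_FINAL_TRANSLATION = 5    # G-stage translation of the final address


def reported_fault(failing):
    """Return the earliest failing check, or None if nothing fails."""
    return min(failing, default=None)
```

For example, if both the leaf PTE check and the final translation would fail, `reported_fault` picks `PAGE_FAULT_FROM_LEAF_PTE`, because that check comes earlier in the walk.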

I believe the x86 architecture, at least, avoids this by treating second-stage walks that obtain translations for first-stage paging structures as stores instead of loads. However, this may sometimes lead to over-setting the D bit in second-stage PTEs when the first stage did not actually require an A or D bit to be set.
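
The trade-off between the two approaches can be sketched as follows. This is a toy model, not a description of any real implementation: a second-stage PTE is a plain dictionary with hypothetical `readable`, `writable`, and `dirty` keys, and the function names are invented for the illustration.

```python
def walk_as_load(g_pte, vs_needs_ad_update):
    """Treat the second-stage access as a load; recheck for write if needed."""
    if not g_pte["readable"]:
        return "guest page fault (read)"
    if vs_needs_ad_update:
        # Second visit to the same second-stage PTE, this time as a write.
        if not g_pte["writable"]:
            return "guest page fault (write for A/D update)"
        g_pte["dirty"] = True
    return None


def walk_as_store(g_pte):
    """Treat the second-stage access as a store from the start."""
    if not g_pte["writable"]:
        return "guest page fault (write)"
    # D is set unconditionally, even if the first-stage PTE ends up
    # needing no A or D update at all.
    g_pte["dirty"] = True
    return None
```

With `walk_as_load`, the second-stage PTE is marked dirty only when the first-stage PTE really is updated, at the cost of a second permission check that can fault. With `walk_as_store`, there is a single check, but the second-stage PTE is marked dirty on every walk.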

- ved