[RISC-V] [tech-virt-mem] Access faults for paging structures linked to hgatp


Daniel Lustig
 

Forwarding to tech-privileged so the hypervisor folks can weigh in...

On 6/30/2021 8:10 PM, Vedvyas shanbhogue wrote:
Dan Hi -

On Wed, Jun 30, 2021 at 3:24 PM Dan Lustig <dlustig@...> wrote:

The algorithm is as described in Section 4.3.2 and Section 5.5.
Page faults are converted into guest page faults, but access faults are
still reported as access faults even in the two-level translation case.

Hypervisor folks, please correct me if you disagree.
I did see that and that was why I asked the question. On an access
fault in the second level page table walk, the guest cannot rectify
the problem.
So is the expectation that hypervisors never delegate access faults
and are expected to analyze all access faults caused during VS/VU
execution and reflect those that are caused by VS/VU back to the
guest?

ved


Daniel Lustig
 

Forwarding Richard's response (with his permission)...see below

On 7/8/2021 1:48 PM, Richard Trauben wrote:
Daniel,

> On an access fault in the second level page table walk, the guest cannot
> rectify the problem.

I don't believe that's true. Some fraction of guest page faults can be
handled entirely within the guest, while others will require action from the
hypervisor. Examples include:
1) A guest OS can move an in-use guest page to the guest swap space by
managing its part of the table.
2) A guest OS can manage the referenced and modified bits of an active guest
PTE by periodically clearing the bits to disallow those accesses to the page
(see the sketch after this list).
3) Guest OS malloc/free can remap pages from a guest-resident memory free
list.
4) A guest OS can deliberately set up a "yellow" stack page access fault that
detects when the initially allocated guest stack space is almost exhausted,
while the guest OS still has space it can allocate (or reclaim).
5) Guest OS processes should be able to run profilers or set code or data
address breakpoints. These matching events should call back to the guest OS
without invoking the hypervisor.
6) A guest OS can invalidate (shoot down) existing TLB PTE entries when
some or all guest processes share a guest-managed page.
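
To make example (2) concrete, here is a rough sketch of a guest OS aging its
own VS-stage leaf PTEs. The bit positions follow the standard Sv39/Sv48 PTE
format; pte_t and local_sfence_vma() are illustrative names, not taken from
any existing kernel:

    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t pte_t;

    #define PTE_V (1UL << 0)   /* valid    */
    #define PTE_A (1UL << 6)   /* accessed */
    #define PTE_D (1UL << 7)   /* dirty    */

    /* Assumed to execute an SFENCE.VMA on the local hart. */
    extern void local_sfence_vma(void);

    /* Clear A/D bits in one leaf page table so that the next access to
     * each page faults back into the guest's own page-fault handler
     * (a VS-stage page fault), without involving the hypervisor. */
    void age_leaf_table(pte_t *table, size_t nents)
    {
        for (size_t i = 0; i < nents; i++) {
            if (table[i] & PTE_V)
                table[i] &= ~(PTE_A | PTE_D);
        }
        local_sfence_vma();   /* drop stale TLB copies of these PTEs */
    }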

Forcing every guest process access fault to the hypervisor increases
latency and reduces throughput.

A guest process access that passes the virtual-to-intermediate PTE checks
but fails the hypervisor's intermediate-to-physical PTE checks will trap to
the hypervisor. Data integrity (bus or memory) failures and async fatal
errors still need to be handled by the hypervisor or a more privileged
machine runtime. Whether guest process crypto authentication failures call
the guest OS, the hypervisor, or the machine runtime executive can be
configured when the session key is installed. When key installation or
seeding the random number generator is attempted by all but the most trusted
level, these may return as access faults where the hypervisor is not trusted
to field the access fault. A portion of timers can be allocated to guest
processes while others are allocated to either the guest OS, the hypervisor,
or the machine privilege level.

Any attempt to overwrite them from a lesser privilege level will create an
access fault. Some of these timers can be configured to report these errors
to an associated privilege level.

Whether a DMA IO-TLB access fault event is reported as an interrupt or
collected as an access fault to a specific DMA SW process (e.g., a PCIe
Multiple Logical Device Access Exception) is yet another case where async
faults could be configured to be reported to different guest processes,
different guest OSes (VMs), the hypervisor, or a pod orchestration runtime.

Regards,
Richard Trauben






Vedvyas shanbhogue
 

Dan, Richard Hi -

Sorry I think you misunderstood my comment. I was not asking whether a
hypervisor should always intercept page faults or timers etc. My
question was very specific to the access faults caused when the page
miss handler walks the second level paging structures on a TLB miss.
You wrote about many examples that cause page faults. You did not cover
access faults - exception codes 1, 5, and 7.

In the current specification there are page faults (codes 12, 13, and 15)
and guest-page faults (codes 20, 21, and 23) defined.

However, the access faults (codes 1, 5, and 7) do not have a "guest" variant.

So consider an implementation that causes an "access fault" on
uncorrectable ECC errors in memory.

When doing a VA->IPA->PA translation, the second level page table
established by the hypervisor is walked to perform the IPA->PA
translation. In this process, the load from the paging structure set up
by the hypervisor (this is not the guest page table) encounters this
uncorrectable ECC error. The HW now has to report an "access fault", but
since there is no guest variant defined, the hypervisor will need to
intercept all access faults, filter out the ones that were caused by
accesses to the hypervisor's memory - such as the second level paging
structures - and, if the fault was not caused by that, re-inject the
exception into the guest.

Was this intentional?

ved
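
As an illustration of the re-injection half of that filtering, here is a rough
sketch of how an HS-mode hypervisor could reflect an access fault into VS-mode
using the hypervisor-extension CSRs. The guest_cpu structure, the
csr_read/csr_write macros, and the SR_* masks are illustrative, and deciding
whether a given access fault actually belongs to the guest is exactly the hard
part discussed above; this only shows the mechanics of handing it back:

    struct guest_cpu {
        unsigned long guest_pc;   /* guest PC saved on VM-exit (sepc)     */
        int was_vs_mode;          /* guest was in VS (not VU) at the trap */
    };

    /* Rough sketch only: make the trap look as if it had been taken
     * directly in VS-mode, then resume the guest at its trap vector. */
    static void reflect_access_fault_to_guest(struct guest_cpu *vcpu,
                                              unsigned long cause, /* 1, 5, 7 */
                                              unsigned long tval)
    {
        unsigned long vsstatus = csr_read(CSR_VSSTATUS);

        /* Preserve the guest's interrupt-enable state as a real VS-mode
         * trap would: SPIE <= SIE, SIE <= 0, SPP <= previous VS/VU mode. */
        if (vsstatus & SR_SIE)
            vsstatus |= SR_SPIE;
        else
            vsstatus &= ~SR_SPIE;
        vsstatus &= ~SR_SIE;
        if (vcpu->was_vs_mode)
            vsstatus |= SR_SPP;
        else
            vsstatus &= ~SR_SPP;
        csr_write(CSR_VSSTATUS, vsstatus);

        csr_write(CSR_VSCAUSE, cause);
        csr_write(CSR_VSTVAL, tval);
        csr_write(CSR_VSEPC, vcpu->guest_pc);

        /* On the next sret into the guest, start at the VS-mode vector. */
        vcpu->guest_pc = csr_read(CSR_VSTVEC);
    }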



Jonathan Behrens <behrensj@...>
 

As I understand it, this is intentional. Access faults are likely to be substantially less frequent than page faults so the performance overhead of forwarding them from HS-mode to VS-mode shouldn't be too big a deal.

Jonathan



Anup Patel
 

I agree with Jonathan.

 

To add more information:

 

  • Any wrong memory access done by a Guest/VM (VS-level), via an explicit load/store instruction or via a VS-stage page table walk, will result in a guest page fault taken by the hypervisor. The hypervisor can handle an invalid memory access from the Guest/VM either by killing the Guest/VM or by injecting an access fault into the Guest/VM, where the latter is the generally preferred approach for hypervisors.
  • Any wrong memory access done by the hypervisor (HS-level), via an explicit load/store instruction, an S-stage page table walk, or a G-stage page table walk, will result in an access fault taken by the M-level runtime firmware (OpenSBI). The M-level runtime firmware will typically redirect the access fault back to the HS-level software (the hypervisor).

 

Access faults at any level are expected to be less frequent and so don’t require special exception codes for VS-level.

 

Regards,

Anup
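
A minimal sketch of the delegation split described in the two cases above,
written in OpenSBI-style C. The cause numbers are the standard privileged-spec
encodings; the csr_write() macro and the CSR_MEDELEG name are assumed here
rather than quoted from OpenSBI:

    #define CAUSE_FETCH_PAGE_FAULT        12
    #define CAUSE_LOAD_PAGE_FAULT         13
    #define CAUSE_STORE_PAGE_FAULT        15
    #define CAUSE_FETCH_GUEST_PAGE_FAULT  20
    #define CAUSE_LOAD_GUEST_PAGE_FAULT   21
    #define CAUSE_STORE_GUEST_PAGE_FAULT  23

    static void delegate_exceptions_to_hs(void)
    {
        unsigned long medeleg =
            (1UL << CAUSE_FETCH_PAGE_FAULT)       |
            (1UL << CAUSE_LOAD_PAGE_FAULT)        |
            (1UL << CAUSE_STORE_PAGE_FAULT)       |
            (1UL << CAUSE_FETCH_GUEST_PAGE_FAULT) |
            (1UL << CAUSE_LOAD_GUEST_PAGE_FAULT)  |
            (1UL << CAUSE_STORE_GUEST_PAGE_FAULT);

        /* Access faults (causes 1, 5, 7) are deliberately left out of
         * medeleg: they trap to M-mode first, and the firmware then
         * redirects them to HS-mode, as described in the second bullet. */
        csr_write(CSR_MEDELEG, medeleg);
    }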

 

 


Richard Trauben
 

Anup, Dan, Jonathan, All,

I agree with your recommendation. The policy to report non-translation access faults (i.e., all except PTE or access-rights faults) beyond the guest OS exception should belong to the guest OS. The VM/Guest OS can mandatorily be configured by the hypervisor to synchronously report the event via a guest-OS-to-VM out-of-band exception handler.

When should a primary CSR access fault be reported - one that avoids PTE access-rights violations, associated with either a memory-mapped or I/O-mapped CSR space, where the access privilege has been locked down by the hypervisor via a secondary CSR for hypervisor-and-above read/write access only?

Consider the case where the primary CSR access is to the security enclave, the random number generator, or the GICC subsystem that maps physical interrupts to virtual interrupts.

Richard






Richard Trauben
 

My earlier note had 2 typos. Corrections are below:

Intended sentence 2 was:
    "The policy to report a user or guest-OS privilege non-translation access fault
    (i.e., anything other than an IPA or PA PTE or its associated access-rights protection)
    should belong to the guest OS."

Intended sentence 3 was:
    "The hypervisor must be able to configure the VM/Guest OS to synchronously report
    such an access exception via an out-of-band guest-OS-to-hypervisor exception request."

The questions in sentences 4 & 5 still apply.

Sorry for any confusion.





Paolo Bonzini
 

On 12/07/21 05:30, Anup Patel wrote:
* Any wrong memory access done by a Guest/VM (VS-level), via an explicit
load/store instruction or via a VS-stage page table walk, will result
in a guest page fault taken by the hypervisor. The hypervisor can handle
an invalid memory access from the Guest/VM either by killing the
Guest/VM or by injecting an access fault into the Guest/VM, where the
latter is the generally preferred approach for hypervisors.
Why wouldn't this actually be taken by the M-level firmware, and from there forwarded twice (first to the hypervisor and second to the guest OS)? medeleg would be the same as in the case below.

Paolo

* Any wrong memory access done by the hypervisor (HS-level), via an explicit
load/store instruction, an S-stage page table walk, or a G-stage page table
walk, will result in an access fault taken by the M-level runtime firmware
(OpenSBI). The M-level runtime firmware will typically redirect the access
fault back to the HS-level software (the hypervisor).


Anup Patel
 

Hi Paolo,

On 12/07/21, 4:42 PM, "tech-virt-mem@... on behalf of Paolo Bonzini" <tech-virt-mem@... on behalf of pbonzini@...> wrote:

On 12/07/21 05:30, Anup Patel wrote:
> * Any wrong memory access done by a Guest/VM (VS-level), via an explicit
> load/store instruction or via a VS-stage page table walk, will result
> in a guest page fault taken by the hypervisor. The hypervisor can handle
> an invalid memory access from the Guest/VM either by killing the
> Guest/VM or by injecting an access fault into the Guest/VM, where the
> latter is the generally preferred approach for hypervisors.

Why wouldn't this actually be taken by the M-level firmware, and from
there forwarded twice (first to the hypervisor and second to the guest
OS)? medeleg would be the same as in the case below.

Over here, a wrong memory access by the Guest/VM means the Guest accessing
memory not mapped in G-stage (Stage2). The M-level firmware delegates
all guest page faults to HS-level, so all guest page faults are directly taken
by the hypervisor.

The slightly tricky case is when the hypervisor has created an incorrect mapping
in G-stage (Stage2) which points to a non-existent (or not-accessible) address.
In this case, an access to such non-existent (or not-accessible) memory
will result in an access fault which is first taken by the M-level firmware and
then forwarded to the hypervisor.

Regards,
Anup
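
For illustration, a much-simplified sketch of that firmware redirect, assuming
an OpenSBI-like trap frame; the real redirect code also updates
sstatus.SPP/SPIE and, when the trap came from a virtualized mode,
hstatus.SPV/SPVP plus htval/htinst:

    struct trap_frame {
        unsigned long mepc;     /* PC at the time of the M-mode trap */
        unsigned long mstatus;  /* saved mstatus, restored on mret   */
    };

    /* Make the access fault appear to HS-mode software as if it had
     * trapped there directly, then resume at the HS-mode trap vector. */
    static void redirect_access_fault_to_hs(struct trap_frame *regs,
                                            unsigned long cause, /* 1, 5, 7 */
                                            unsigned long tval)
    {
        csr_write(CSR_SEPC, regs->mepc);   /* faulting PC        */
        csr_write(CSR_SCAUSE, cause);      /* access fault code  */
        csr_write(CSR_STVAL, tval);        /* faulting address   */

        /* Return from M-mode into S-mode at stvec instead of at mepc. */
        regs->mepc = csr_read(CSR_STVEC);
        regs->mstatus &= ~MSTATUS_MPP;
        regs->mstatus |= (PRV_S << MSTATUS_MPP_SHIFT);
    }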



Anup Patel
 

+John Hauser

 

From: <tech-privileged@...> on behalf of Richard Trauben <rtrauben@...>
Date: Monday, 12 July 2021 at 11:15 AM
To: Richard Trauben <rtrauben@...>
Cc: Anup Patel <Anup.Patel@...>, Jonathan Behrens <behrensj@...>, "tech-virt-mem@..." <tech-virt-mem@...>, "Vedvyas13686@..." <Vedvyas13686@...>, Dan Lustig <dlustig@...>, "tech-privileged@..." <tech-privileged@...>
Subject: Re: [RISC-V] [tech-privileged] [RISC-V] [tech-virt-mem] Access faults for paging structures linked to hgatp

 

My earlier note had 2 typos. Corrections are below:

 

Intended sentence 2 was:

    "The policy to report a user or guest-OS privilege non-translation access fault
    (i.e., anything other than an IPA or PA PTE or its associated access-rights protection)
    should belong to the guest OS."

 

[Anup] Yes, non-translation access faults by a Guest/VM should be
forwarded/redirected back to the Guest/VM as access faults. Again, this
is more of a hypervisor software implementation choice.

 

Intended sentence 3 was:

    "The hypervisor must be able to configure the VM/Guest OS to synchronously report
    such an access exception via an out-of-band guest-OS-to-hypervisor exception request."

 

[Anup] Hypervisors (including KVM) will typically forward faults intended for the Guest/VM
immediately. Not sure what you mean by an out-of-band guest-OS-to-hypervisor exception?

 

The questions in sentences 4 & 5 still apply.

 

[Anup] I did not fully understand the terms “primary CSR” and “secondary CSR
controlling access to the primary CSR”, but I will try to explain. If we assume that
“primary CSR” here means the hcounteren CSR and “secondary CSR” here
means the hpmcounterX CSRs, then any hpmcounterX CSR access by the Guest/VM
which is not permitted by the hcounteren CSR will be taken as a virtual instruction
trap by the hypervisor. Similar CSR trapping mechanisms can be defined for other
CSRs as well. Does this answer your question?

 

Regards,

Anup
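
As a rough sketch of the mechanism Anup describes, here is how an HS-mode
hypervisor might handle the resulting virtual instruction trap (scause 22) and
emulate a guest read of an hpmcounter CSR. The decoding is simplified to
uncompressed CSR-read instructions, and the guest_cpu structure and helper
functions are illustrative only:

    #include <stdint.h>

    #define CSR_HPMCOUNTER3   0xC03
    #define CSR_HPMCOUNTER31  0xC1F

    struct guest_cpu {
        unsigned long guest_pc;   /* guest PC saved on VM-exit        */
        unsigned long gpr[32];    /* guest general-purpose registers  */
    };

    void handle_virtual_instruction_trap(struct guest_cpu *vcpu)
    {
        /* For a virtual instruction trap, stval holds the trapping
         * instruction bits, as it does for an illegal instruction trap. */
        uint32_t insn = (uint32_t)csr_read(CSR_STVAL);

        unsigned int csr_num = insn >> 20;          /* bits 31:20 */
        unsigned int rd      = (insn >> 7) & 0x1f;  /* bits 11:7  */

        if (is_csr_read(insn) &&
            csr_num >= CSR_HPMCOUNTER3 && csr_num <= CSR_HPMCOUNTER31) {
            /* Emulate the counter read with a virtualized value. */
            vcpu->gpr[rd] = read_virtual_hpmcounter(vcpu, csr_num);
            vcpu->guest_pc += 4;   /* skip the emulated (uncompressed) insn */
            return;
        }

        /* Anything the hypervisor chooses not to emulate: reflect an
         * illegal-instruction exception back into the guest. */
        reflect_illegal_insn_to_guest(vcpu, insn);
    }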

 
