Jonathan Behrens <behrensj@...>
As I understand it, this is intentional. Access faults are likely to be substantially less frequent than page faults so the performance overhead of forwarding them from HS-mode to VS-mode shouldn't be too big a deal.
Dan, Richard Hi -
Sorry, I think you misunderstood my comment. I was not asking whether a
hypervisor should always intercept page faults or timers etc. My
question was very specific to the access faults caused when the page
miss handler walks the second level paging structures on a TLB miss.
You wrote about many examples that cause page faults. You did not cover
access faults - exception codes 1, 5, and 7.
In the current specification there are page faults (codes 12, 13, and 15)
and guest-page faults (codes 20, 21, and 23) defined.
However, the access faults (codes 1, 5, and 7) do not have a "guest" variant.
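To make the gap concrete, here is a minimal Python sketch of the exception codes listed above. The enum member names and the guest_variant helper are only illustrative and are not taken from the specification text. The mapping shows that each page-fault code has a guest-page-fault counterpart, while the access-fault codes have none.

```python
from enum import IntEnum


class Cause(IntEnum):
    """Synchronous exception codes discussed in this thread."""
    INSTRUCTION_ACCESS_FAULT = 1
    LOAD_ACCESS_FAULT = 5
    STORE_AMO_ACCESS_FAULT = 7
    INSTRUCTION_PAGE_FAULT = 12
    LOAD_PAGE_FAULT = 13
    STORE_AMO_PAGE_FAULT = 15
    INSTRUCTION_GUEST_PAGE_FAULT = 20
    LOAD_GUEST_PAGE_FAULT = 21
    STORE_AMO_GUEST_PAGE_FAULT = 23


# Page-fault codes and their guest-page-fault counterparts.
# There are deliberately no entries for the access-fault codes 1, 5, and 7.
GUEST_VARIANT = {
    Cause.INSTRUCTION_PAGE_FAULT: Cause.INSTRUCTION_GUEST_PAGE_FAULT,
    Cause.LOAD_PAGE_FAULT: Cause.LOAD_GUEST_PAGE_FAULT,
    Cause.STORE_AMO_PAGE_FAULT: Cause.STORE_AMO_GUEST_PAGE_FAULT,
}


def guest_variant(cause):
    """Return the guest variant of an exception code, or None if none is defined."""
    return GUEST_VARIANT.get(Cause(cause))
```

With this sketch, guest_variant(13) gives the load guest-page fault (code 21), while guest_variant(5) gives None.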
So consider an implementation that causes an "access fault" on
uncorrectable ECC errors in memory.
When doing a VA->IPA->PA translation, the second level page table
established by the hypervisor is walked to perform the IPA->PA
translation. In this process, the load from the paging structure set up
by the hypervisor (this is not the guest page table) encounters this
uncorrectable ECC error. The HW now has to report an "access fault", but
since there is no guest variant defined, the hypervisor will need to
intercept all access faults, filter out the ones that were caused by
access to the hypervisor's memory - such as the second level paging
structures - and, if the fault was not caused by that, re-inject the
exception into the guest.
Was this intentional?
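The intercept, filter, and re-inject flow described above can be sketched roughly as follows. This is only a model in Python, not hypervisor code: the Trap record, its hit_hypervisor_memory flag, and the route_access_fault function are hypothetical names. How a real hypervisor would determine that the fault hit its own memory (for example the second level paging structures) is exactly the open part of the question, so it is assumed here as an input.

```python
from dataclasses import dataclass

# Instruction, load, and store/AMO access fault codes from the thread.
ACCESS_FAULT_CODES = {1, 5, 7}


@dataclass
class Trap:
    cause: int                   # exception code reported by the hardware
    from_guest: bool             # True if taken during VS/VU execution
    hit_hypervisor_memory: bool  # assumption: known to the hypervisor somehow


def route_access_fault(trap):
    """Decide who handles an access fault that the hypervisor intercepted."""
    if trap.cause not in ACCESS_FAULT_CODES:
        raise ValueError("not an access fault")
    if not trap.from_guest:
        # Fault taken outside VS/VU execution: nothing to reflect.
        return "hypervisor"
    if trap.hit_hypervisor_memory:
        # E.g. an ECC error while walking the second level page table:
        # the guest cannot rectify this, so the hypervisor keeps it.
        return "hypervisor"
    # Fault caused by the guest's own access: reflect it back to the guest.
    return "reinject-to-guest"
```

For example, route_access_fault(Trap(5, True, True)) stays with the hypervisor, while route_access_fault(Trap(5, True, False)) is reflected to the guest.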
On Thu, Jul 8, 2021 at 1:26 PM Daniel Lustig <dlustig@...> wrote:
> Forwarding Richard's response (with his permission)...see below
> On 7/8/2021 1:48 PM, Richard Trauben wrote:
> > Daniel,
> >> On an access fault in the second level page table walk, the guest cannot
> >> rectify the problem.
> > I don't believe that's true. Some fraction of guest pages reside in
> > hypervisor space while the
> > others will require action from the hypervisor. Examples include:
> > 1) guest OS can move an in-use guest page to the guest swap space by
> > managing its part of the table.
> > 2) guest OS can manage the referenced and modified bits of an active guest
> > PTE by periodically clearing
> > the bits to disallow those accesses to the page.
> > 3) guest OS malloc/free can remap pages from a guest resident memory free
> > list.
> > 4) guest OS can deliberately set up a "yellow" stack page access fault
> > that detects when the
> > initially allocated guest stack space is almost exhausted, while guest OS
> > still has space
> > it can allocate (or reclaim).
> > 5) guest OS processes should be able to run profilers or set code or data
> > address breakpoints.
> > These matching events should call back the guest OS without invoking the
> > hypervisor.
> > 6) guest OS can invalidate (shoot down) existing TLB PTE entries used when
> > some/all guest
> > processes share a guest managed page.
> > Forcing every guest process access fault to the hypervisor increases
> > latency and reduces throughput.
> > Guest process accesses passing virtual-to-intermediate PTE checks
> > and failing hypervisor intermediate-to-physical PTE checks will trap to VH
> > the hypervisor. Data integrity (bus or memory) failures and async fatal errors
> > still need to be handled by the hypervisor or a more privileged machine
> > runtime. Whether guest process crypto authentication failures call the guest
> > OS, hypervisor or machine runtime executive can be configured when the
> > session key is installed. When key installation or seeding the random
> > number generator is attempted by all but the most trusted level, these may
> > return as access faults where the hypervisor is not trusted to field the
> > access fault.
> > A portion of timers can be allocated to guest processes while others are
> > allocated to either the guest OS, the hypervisor or machine privilege
> > level.
> > Any attempt to overwrite them from a lesser privilege level is going to
> > create an access fault. Some of these timers can be configured to report
> > these errors to an associated privilege level.
> > Whether a DMA IO-TLB access fault event is reported as an interrupt or
> > collected as an access fault to a specific DMA SW process (e.g., PCIE
> > Multiple Logical Device Access Exception) is yet another case where
> > async faults could be configured as reported to different guest processes
> > or different guest OSes (VMs) or the hypervisor or pod orchestration runtime.
> > Regards,
> > Richard Trauben
> > On Thu, Jul 8, 2021 at 8:20 AM Daniel Lustig <dlustig@...> wrote:
> >> Forwarding to tech-privileged so the hypervisor folks can weigh in...
> >> On 6/30/2021 8:10 PM, Vedvyas shanbhogue wrote:
> >>> Dan Hi -
> >>> On Wed, Jun 30, 2021 at 3:24 PM Dan Lustig <dlustig@...> wrote:
> >>>> The algorithm is what is described in Section 4.3.2 and Section 5.5.
> >>>> Page faults are converted into guest page faults, but access faults are
> >>>> still reported as access faults even in the two-level translation case.
> >>>> Hypervisor folks, please correct me if you disagree.
> >>> I did see that and that was why I asked the question. On an access
> >>> fault in the second level page table walk, the guest cannot rectify
> >>> the problem.
> >>> So is the expectation that hypervisors never delegate access faults
> >>> and are expected to analyze all access faults caused during VS/VU
> >>> execution and reflect those that are caused by VS/VU back to the
> >>> guest?
> >>> ved