Proposal: Delegating Exceptions from VS-mode or VU-mode to U-mode
Yifei Jiang
Hi all,
In this proposal, we extend the N extension and apply it to the H extension to improve the performance of virtual I/O devices in virtualization scenarios. We propose a new mechanism to delegate exceptions from VS-mode/VU-mode to U-mode. We have also implemented a solution based on the QEMU simulator and the KVM virtualization architecture. Evaluation results show that our proposal achieves nearly 2x faster synchronous I/O processing than the original system.
The attachment is the detailed proposal. Any comments are welcome.
Regards, Yifei |
|
Phil McCoy
If I understand correctly, there is a security issue around the URET instruction. All the controls for URET are accessible from User mode. This is OK in the scenario you describe where U-Mode is trusted (the U-Mode code is the UART emulation, which is effectively just a privilege-reduced portion of the hypervisor).
There does not appear to be any way for the hypervisor to prevent untrusted user-mode code from URET'ing into any arbitrary location in the virtualized guest (VS/VU) software by programming uepc and ustatus. |
|
Hi,
The context-switching overhead for VirtIO interrupts that this proposal is trying to solve is already solved across architectures by:
1. VHost
2. In-kernel interrupt-controller emulation
Point 2 is intentionally not done for KVM RISC-V because we don’t want in-kernel PLIC emulation, given the lack of MSI support and virtualization support in the PLIC. In-kernel PLIC emulation will also become redundant once the new interrupt-controller spec is available, at which point we will go for in-kernel interrupt-controller emulation.
Is there any other advantage of this proposal over VHost + in-kernel interrupt-controller emulation?
Regards, Anup
From: tech-privileged@... <tech-privileged@...>
On Behalf Of Yifei Jiang via lists.riscv.org
Sent: 18 September 2020 13:33 To: tech-privileged@... Subject: [RISC-V] [tech-privileged] Proposal: Delegating Exceptions from VS-mode or VU-mode to U-mode
|
andrew@...
The N extension is effectively deprecated. We don’t see sufficient demand for user-level interrupts in managed/Unix-like environments to pursue that approach at this time. We do see demand for this feature in embedded systems with two levels of privilege. In this case, the proposal is to run the trusted piece in M-mode and the less-trusted piece in “bare S-mode”: i.e., S-mode without virtual memory. This approach covers the interesting use cases without additional architectural complication, and in particular, the hypervisor architecture need not care.
|
|
Gernot <gernot.heiser@...>
On 19 Sep 2020, at 15:53, Andrew Waterman <andrew@...> wrote:
That’s a real concern, and I seem to have missed any discussion about this. I suspect that this also reflects a confusion resulting from the use of “Unix-like environments”, which seems to get applied to any protected-mode OS.
User-level drivers are a core property of (well-designed) microkernels, and microkernels are pretty much the only choice for safety- and security-critical systems, and the only kind of OS that is feasible to prove correct (see seL4).
And if you run drivers in user mode, then being able to route interrupts directly to the user-level handlers without invoking the kernel would seem to pretty much eliminate the performance disadvantage microkernels have compared to Linux. In fact, my immediate reaction seeing this extension was “yeah!”.
Gernot
|
|
David Horner
On 2020-09-19 5:29 a.m., Gernot wrote:
On 19 Sep 2020, at 15:53, Andrew Waterman <andrew@...> wrote:
I don't know if these developments will address your concerns, but I see the synergy of these two task groups, Fast-interrupts and Code Size Reduction, as potentially reducing the overhead for M-mode to initiate and manage drivers running in U-mode. I invite you to join these groups and help direct their ability to support this important segment of the ecosystem.
As I have aligned aspirations, I'm hoping that your response will, along with mine, be "yeah!": that running microkernel (and similar) components in U-mode will be a first-class, supported reality obtained through these initiatives.
|
|
Jonathan Behrens <behrensj@...>
On Sat, Sep 19, 2020 at 5:29 AM Gernot via lists.riscv.org <gernot.heiser=data61.csiro.au@...> wrote:
You can achieve the same thing using the hypervisor extension, analogously to how M/U systems can avoid user-level interrupts by switching to M/S/U. Instead of running the kernel in S-mode and drivers in U-mode, run the kernel in HS-mode and the drivers in VS-mode.
Jonathan |
|
John Hauser
Gernot wrote:
User-level drivers are a core properties of (well-designed)
Aside from the fact that doing this requires some version of the quasi-deprecated N extension be implemented in addition to the hypervisor extension, the main problem with this idea is how the hardware decides to send memory access traps to U mode versus HS mode. The delegation provided by sideleg is too crude to suffice for this purpose. Instead, the choice must be encoded on a per-page basis in the G-stage page tables, which is what Huawei's proposal does, naturally enough.
But if our real goal is for virtual machines to run as fast as possible, it seems to me the more important subgoal is to minimize the number of times when memory accesses to a virtual device must be trapped and emulated, by maximizing the opportunity for a guest OS to directly control physical devices without emulation. The hardware components needed for this include the new interrupt architecture that is being developed, plus a sufficiently capable IOMMU, a proposal for which is being drafted by a different informal group of interested parties.
Even with this new hardware (in whatever form it actually becomes standard for RISC-V), we can expect that some need to trap-and-emulate for virtual devices will remain. But at the current time, I don't know how we can predict very well the performance cost of those traps that remain, or how much improvement we would get from adopting Huawei's proposal. For all I know, it may be that, once we have the new interrupt architecture and an IOMMU, the added improvement from Huawei's proposal is barely noticeable. Maybe it will be, and maybe it won't. All I'm suggesting is, it would be better to evaluate that choice after these other essential pieces are in place, and after all the relevant software has been completed and optimized, as Anup Patel spoke of.
One counterargument might be to claim that there is sufficient long-term market interest in supporting the hypervisor extension as best as possible _without_ the new interrupt architecture and IOMMU. I'll leave it to others to try to make that case.
- John Hauser |
|
Gernot <gernot.heiser@...>
OK, we had a closer look at the N extension, and it doesn’t do much for us; we should have looked closer in the first place :-(
Hence I don’t particularly care about this one – sorry for the unnecessary noise.
Gernot
On 20 Sep 2020, at 06:22, John Hauser <jh.riscv@...> wrote: |
|
Yifei Jiang
Hi,
Thanks for your comment.
To solve the security problem with the URET instruction, we further add a field, called HUR, to hstatus to control the behavior of URET. When hstatus.HUR=1, executing URET can switch the privilege mode back to VS-mode/VU-mode. Otherwise, executing URET causes an illegal instruction trap. The idea is similar to the hstatus.HU field.
The hypervisor sets hstatus.HUR only when the vCPU is loaded and clears it only when the vCPU is put. While it is set, the vCPU is regarded as a trusted task, so untrusted user-level tasks cannot switch to VS-mode/VU-mode with the URET instruction.
Regards,
Yifei |
|
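As an illustration of the hstatus.HUR gating described in the message above, here is a minimal behavioural sketch in Python. It is not taken from the proposal or from any QEMU/KVM code: the `Hart` class, the `saved_guest_mode` field (how the previous guest mode is recorded is not stated in this thread), and the `vcpu_load`/`vcpu_put` helpers are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    U = "U"
    VS = "VS"
    VU = "VU"


class IllegalInstruction(Exception):
    """Stands in for an illegal-instruction trap."""


@dataclass
class Hart:
    mode: Mode = Mode.U
    hur: bool = False                 # models the proposed hstatus.HUR field
    saved_guest_mode: Mode = Mode.VS  # assumption: guest mode to resume on URET
    uepc: int = 0
    pc: int = 0


def vcpu_load(hart: Hart) -> None:
    """Hypervisor loads a vCPU: the running task is now trusted to URET."""
    hart.hur = True


def vcpu_put(hart: Hart) -> None:
    """Hypervisor puts the vCPU: URET into the guest is disabled again."""
    hart.hur = False


def uret(hart: Hart) -> None:
    """Model URET executed in U-mode under the HUR rule."""
    if hart.mode is not Mode.U:
        raise ValueError("this sketch only models URET executed in U-mode")
    if not hart.hur:
        # HUR=0: an untrusted task cannot enter VS-mode/VU-mode.
        raise IllegalInstruction("URET with hstatus.HUR=0")
    # HUR=1: return to the guest at the address held in uepc.
    hart.mode = hart.saved_guest_mode
    hart.pc = hart.uepc
```

In this model, a task that programs uepc while no vCPU is loaded only gets an illegal-instruction trap, which corresponds to the concern Phil McCoy raised earlier in the thread.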
Yifei Jiang
Hi John,
Yes, we agree that the new interrupt architecture and an IOMMU, which as we understand it is a pass-through method, can indeed minimize the number of guest OS exits. However, this proposal aims to optimize user-level virtual devices that have to be implemented using the trap-and-emulate paradigm, such as the RTC and UART.
When these devices are accessed, the I/O path overhead caused by context switches between the guest OS and the host OS dominates the overall trap-and-emulate overhead.
Our proposal therefore improves performance by letting these I/O paths bypass the kernel. In our experiments on the RISC-V QEMU emulator and the KVM hypervisor, the UART implemented using our solution is 2x faster than the one in the original system.
Regards,
Yifei |
|
Yifei Jiang
Hi Anup,
Sorry for the confusing description of the background in this proposal. To be clear, we divide trap-and-emulate I/O devices into fully emulated I/O devices and paravirtual I/O devices. This proposal focuses only on improving the performance of fully emulated I/O devices implemented in userspace, such as the UART. We also believe that other MMIO-emulated devices that require the guest OS to trap and exit to userspace, such as the RTC, can benefit from our proposal.
Paravirtual I/O devices, which require the guest OS to trap and send in-kernel interrupts to interact with the VirtIO backend (i.e., the VHost you mentioned), are not considered in this proposal.
Regards,
Yifei |
|
Hi Yifei,
Both fully emulated I/O devices (e.g. UART, RTC, PCIe devices, etc.) and paravirtual I/O devices (e.g. VirtIO devices, XenPV devices, etc.) have MMIO registers which are trap-n-emulated by hypervisors. The paravirtual I/O devices are carefully designed so that MMIO register programming is minimal, which keeps the MMIO trap-n-emulate overhead minimal across hypervisors.
In the KVM hypervisor, the fully emulated I/O devices are usually emulated in KVM user-space, whereas paravirtual I/O devices and pass-through devices are usually provided from KVM kernel-space (see VHost, IRQFD, VFIO, etc.). The interrupt controller and timer are the only critical fully emulated I/O devices, so for most architectures we have KVM in-kernel emulation of the interrupt controller and timer. In the case of KVM RISC-V, the timer emulation is entirely in-kernel, whereas we are waiting for the new interrupt-controller spec before doing in-kernel interrupt-controller emulation.
Over decades, the focus across architectures (x86, ARM, PowerPC, etc.) and across hypervisors (KVM, Xen, HyperV, VMWare, etc.) has been to maximize the use of paravirtual I/O devices (i.e. VirtIO, XenPV, etc.) and pass-through I/O devices for performance. Modern guests/VMs usually have a mix of pass-through, paravirtual, and fully emulated I/O devices, where the fully emulated I/O devices are only used for less critical things such as UART, RTC, etc. Because of this, none of the architectures has focused on accelerating user-space emulation of fully emulated I/O devices for KVM (Type 2) hypervisors.
If the focus of this proposal is to only improve trap-n-emulate performance of less-critical full emulated I/O devices in KVM user-space then it won’t have any significant impact because:
Regards, Anup
From: tech-privileged@... <tech-privileged@...>
On Behalf Of Yifei Jiang via lists.riscv.org
Sent: 21 September 2020 19:21 To: tech-privileged@... Subject: Re: [RISC-V] [tech-privileged] Proposal: Delegating Exceptions from VS-mode or VU-mode to U-mode
|
Hi Anup,
If I understand correctly, this proposal will not cause the problem in your 2nd point, because not all MMIO accesses trap to user space under this proposal. The proposal still allows MMIO traps to the kernel.
We can use the PTE.MMIO field in this proposal only for the MMIO traps that need to be delegated to user level. Those accesses will not be translated by the G-stage.
The in-kernel MMIO traps will go through the G-stage because their PTE.MMIO is cleared by the hypervisor. They will then cause a G-stage page fault and trap to the kernel as usual.
Regards,
Jiahao |
|
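The per-page routing described in the last message can be sketched as a small decision function. This is only an illustration under stated assumptions: the bit position chosen for `PTE_MMIO` is hypothetical (the thread does not give the encoding), and giving the MMIO flag precedence over the valid bit is an assumption of this sketch rather than something the thread specifies. Only `PTE_V` at bit 0 follows the standard RISC-V PTE layout.

```python
PTE_V = 1 << 0     # valid bit, as in the standard RISC-V PTE layout
PTE_MMIO = 1 << 9  # hypothetical position for the proposed PTE.MMIO field


def route_guest_access(gstage_pte: int) -> str:
    """Return where a guest access to this page ends up.

    "U"    -> trap delegated to the user-level device emulator
    "HS"   -> G-stage page fault handled by the kernel (hypervisor)
    "none" -> page is mapped normally, no trap
    """
    if gstage_pte & PTE_MMIO:
        # Marked by the hypervisor for user-level emulation (e.g. UART).
        return "U"
    if not (gstage_pte & PTE_V):
        # PTE.MMIO clear and no valid mapping: fault to the kernel as usual.
        return "HS"
    return "none"
```

Under these assumptions, a page backing a user-space UART would carry `PTE_MMIO`, while a page backing an in-kernel device would simply be left unmapped, so the existing kernel trap path is unchanged.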