Re: RISC-V H-extension freeze consideration
Greg Favor
On Sat, May 29, 2021 at 12:44 PM John Hauser <jh.riscv@...> wrote:
> I wrote:

Note that x86 TSO ordering is not as strong as x86 I/O ordering, and ARM "Normal memory" ordering is of course even weaker. Also, on x86 one gets TSO ordering only when using cacheable memory types (i.e. WB), so there is no TSO-ordered noncacheable "I/O". TSO also allows speculative reads and (in-order) write combining, both of which are anathema to the x86 UC and ARM Device "I/O" memory types.

> If Arm has a way to declare a page as "I/O", is that different than the ...

The G-stage "I/O" bit, as I understood it, is meant to allow a hypervisor to map a guest "I/O" page onto main memory but have the guest's accesses be treated with I/O-style strong ordering. (This is something that neither x86 nor ARM supports.)

But if what is intended is for the guest accesses to be treated completely as I/O accesses, then that doesn't require this special bit: the G-stage PTE simply specifies the equivalent of the x86 UC or ARM Device memory type. (But if the guest was already specifying UC/Device in its stage 2 PTE because it thinks it is talking to I/O, then everything was fine to begin with.)

Which leads me back to what I understood this G-stage "I/O" bit to be about (and my issues with it).

Greg
|
Re: RISC-V H-extension freeze consideration
John Hauser
I wrote:
> Emulating an embedded system within a virtual machine is something we ...

Greg Favor:
> It seems like this G-stage "I/O" bit is going down a questionable rabbit ...

Doesn't the x86 architecture have "total store ordering", and wouldn't that fact make the matter moot for it? If Arm has a way to declare a page as "I/O", is that different than the "virtual I/O" bit being debated for RISC-V G-stage address translation?

> - Supports "legacy" situations that it is unclear who would actually care ...

- John Hauser
|
Re: RISC-V H-extension freeze consideration
Jonathan Behrens <behrensj@...>
"Old video cards and network cards" is a completely fair answer! If the PTE bit isn't needed otherwise, this seems like a reasonable use.

Jonathan

> What sort of device exposes regions of memory in I/O space? When I think of ...
|
Re: RISC-V H-extension freeze consideration
John Hauser
> What sort of device exposes regions of memory in I/O space? When I think of ...

Historically, video cards and network cards definitely had memory buffers in what RISC-V would consider I/O space. Yes, typical video and networking hardware may work differently today, but can we be certain there are absolutely no such devices any more, of any kind, that we need to care about? And that there never will be in the future, either? I'd be fine if the answer is "yes", but I'm sure not willing to commit to that answer solely on my own incomplete knowledge.

- John Hauser
|
Re: RISC-V H-extension freeze consideration
Jonathan Behrens <behrensj@...>
What sort of device exposes regions of memory in I/O space? When I think of hypervisors emulating devices, all their registers typically trigger some action when you write to them.

Jonathan

> Anup Patel wrote:
|
Re: RISC-V H-extension freeze consideration
John Hauser
Anup Patel wrote:
> Why do we need to re-purpose G-bit because we already ...

To repeat myself:

> The issue concerns when a hypervisor is emulating a ...

The PMAs aren't correct in this situation.

- John Hauser
|
Re: RISC-V H-extension freeze consideration
From the hypervisor perspective, the "G" bit in G-stage PTEs is not used at all. For software-emulated MMIO, the hypervisor does not create any mapping in the G-stage, which ensures that accesses always trap and lets the hypervisor trap and emulate them. For pass-through MMIO (such as IMSIC guest MSI files accessed directly by the guest), the guest physical address translates in the G-stage to the host physical address of the actual MMIO device, and the host PMAs mark all MMIO devices as I/O regions. So the G bit in the G-stage PTE is unused from the software perspective.

Why do we need to re-purpose the G bit when we already have PMAs marking all MMIO addresses as I/O regions?

Regards,
Anup
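To make the two cases Anup describes concrete, here is a minimal C sketch of how a hypervisor's G-stage fault path might tell them apart from ordinary guest RAM. The region table, the enum names, and `handle_gstage_fault` are all hypothetical (not taken from KVM or any other hypervisor), and mapping pass-through regions lazily on first fault is just one possible policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-region bookkeeping; names are illustrative only. */
typedef enum {
    REGION_RAM,           /* guest RAM, backed by host memory */
    REGION_EMULATED_MMIO, /* never mapped in the G-stage, so every access traps */
    REGION_PASSTHROUGH    /* e.g. an IMSIC guest MSI file: mapped to the real device */
} region_kind;

typedef struct {
    uint64_t gpa_base; /* first guest physical address of the region */
    uint64_t size;     /* size in bytes */
    region_kind kind;
} guest_region;

typedef enum {
    ACTION_MAP_RAM,      /* install a G-stage mapping to host memory */
    ACTION_EMULATE,      /* decode the trapped instruction and emulate the device */
    ACTION_MAP_DEVICE,   /* install a G-stage mapping to the device's host PA */
    ACTION_INJECT_FAULT  /* nothing backs this address: report it to the guest */
} fault_action;

/* Decide how to handle a G-stage (guest-page) fault at guest physical address gpa. */
fault_action handle_gstage_fault(const guest_region *regions, size_t count, uint64_t gpa)
{
    for (size_t i = 0; i < count; i++) {
        const guest_region *r = &regions[i];
        if (gpa >= r->gpa_base && gpa - r->gpa_base < r->size) {
            switch (r->kind) {
            case REGION_RAM:           return ACTION_MAP_RAM;
            case REGION_EMULATED_MMIO: return ACTION_EMULATE;
            case REGION_PASSTHROUGH:   return ACTION_MAP_DEVICE;
            }
        }
    }
    return ACTION_INJECT_FAULT;
}
```

Note that the sketch has no category for the case John raises: device memory that the guest believes is in I/O space but that the hypervisor backs with ordinary RAM, where the host PMAs say main memory.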
|
Re: RISC-V H-extension freeze consideration
John Hauser
Paolo Bonzini wrote:
> If the answer to any of the above three questions is no, what can be ...

There is at least one small but significant change to the hypervisor extension being discussed: redefining the "G" bit in G-stage address translation PTEs to indicate that a page of guest physical address space is "virtual I/O", meaning the hardware must order VM accesses to those addresses as though they were I/O accesses, not main memory accesses. Another minor change planned is to have attempts to write a strictly read-only CSR always raise an illegal instruction exception, instead of sometimes raising a virtual instruction exception as currently specified.

The reason there has been no movement on the hypervisor extension for several months is not that there is nothing at all to do, but that I've lacked the time to attend to it alongside a thousand other things. If you'd like more progress on the hypervisor extension, feel free to drive the discussion to reach agreement one way or the other on the first point, the "I/O" bit in G-stage PTEs.

The issue concerns a hypervisor emulating a device that has memory that is supposed to be in I/O space but is actually emulated using main memory. A guest OS expects accesses to that virtual device memory to be in I/O space and ordered according to the I/O rules, but that's not what currently happens.

- John Hauser
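The two proposed changes can be sketched in C as follows. This models the proposal as described in this message, not ratified behavior: `vm_access_ordering` assumes a leaf G-stage PTE with the G bit set marks a "virtual I/O" page, and `pma_is_io` is a stand-in for the host PMA check. The PTE bit positions and the rule that CSR address bits [11:10] = 0b11 mean read-only come from the existing privileged spec; the function and enum names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* RISC-V PTE bit positions (same in Sv39/Sv48 and the Sv39x4/Sv48x4 G-stage formats). */
#define PTE_V (1u << 0) /* valid */
#define PTE_R (1u << 1) /* readable */
#define PTE_W (1u << 2) /* writable */
#define PTE_U (1u << 4) /* user; must be set in G-stage leaf PTEs for guest access */
#define PTE_G (1u << 5) /* global; currently unused by G-stage translation */

typedef enum { ORDER_MAIN_MEMORY, ORDER_IO } ordering_rule;

/* Ordering rule for a VM access under the PROPOSED meaning of the G bit
 * in a G-stage leaf PTE. */
ordering_rule vm_access_ordering(uint64_t gstage_leaf_pte, bool pma_is_io)
{
    if (pma_is_io)
        return ORDER_IO;          /* genuine I/O space is always ordered as I/O */
    if (gstage_leaf_pte & PTE_G)
        return ORDER_IO;          /* "virtual I/O": main memory ordered like I/O */
    return ORDER_MAIN_MEMORY;
}

/* CSR address bits [11:10] == 0b11 mark a CSR as read-only. */
bool csr_is_read_only(uint16_t csr_addr)
{
    return ((csr_addr >> 10) & 0x3u) == 0x3u;
}

typedef enum { EXC_NONE, EXC_ILLEGAL_INSTRUCTION } csr_write_result;

/* PROPOSED rule: a write to a read-only CSR always raises an illegal
 * instruction exception in every mode, including VS and VU mode. All other
 * permission checks are deliberately omitted from this sketch. */
csr_write_result check_csr_write(uint16_t csr_addr)
{
    return csr_is_read_only(csr_addr) ? EXC_ILLEGAL_INSTRUCTION : EXC_NONE;
}
```

For example, 0xC00 (cycle) and 0xF14 (mhartid) fall in the read-only range, while 0x100 (sstatus) does not.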
|
Re: RISC-V H-extension freeze consideration
Greg Favor
Paolo,

Thanks for the prodding. It's a good reminder, as we are all caught up with pushing many things forward. I won't try to provide off-the-cuff answers to your questions. Instead, Andrew, John, and I need to meet next week, discuss where we stand on these questions, and flesh out a plan for getting from here to freeze (and then to ratification). As you note, time is quickly passing by and will soon start running short.

Lastly, I'll note that the principal factor in Andrew's and John's minds in holding off H freeze has been to see enough software support, PoCs, and stability in related arch specs that could affect H (e.g. AIA) before freezing H, so as not to discover a regrettable problem later. But that obviously can't go on forever. Hence it is time for the three of us to sort out the path from here to freeze.

Greg

On Fri, May 28, 2021 at 4:54 AM Paolo Bonzini <pbonzini@...> wrote:
> So, in my own opinion, we're getting close. Not a few weeks, but not quarters either. (I'll also say that the "pressure is on" to intelligently try and get through this period of time sooner than later.)
|
Re: RISC-V H-extension freeze consideration
Paolo Bonzini
> So, in my own opinion, we're getting close. Not a few weeks, but not quarters either. (I'll also say that the "pressure is on" to intelligently try and get through this period of time sooner than later.)
A quarter has passed, so we're 50% of the way towards talking "quarters". So please let me ask three questions:

- Has any insight been formed in the AIA specification as to whether it will require changes to the Hypervisor specification, and whether these changes can be done as part of the AIA specification (just like pointer masking is already defining vs* CSRs)?

- Has any list been made of which extensions should be frozen before the Hypervisor extension, and is there a clear path towards freezing them in a reasonable time period? Does this list include pointer masking, and if so, why (considering that pointer masking is already being specified as if the Hypervisor extension is frozen or ratified first)?

- Is there any date being set for whatever meetings are needed to freeze the Hypervisor extension after all the dependencies are frozen?

If the answer to any of the above three questions is no, what can be done to avoid the frankly ludicrous delay in the approval of a specification that has seen no significant change in one year?

Thanks,

Paolo
|
Re: proposal for stateen CSRs
John Hauser
Hi all,
Remember the stateen CSR proposal from last month? I have an updated version now that I believe addresses all concerns that were expressed. See below.

If you'd like to avoid reading all of it again, the substantive changes are these:

- I've tried to make it clear that when access to state is prevented, you get an illegal instruction exception. (Or, when executing in a virtual machine, you may get a virtual instruction exception instead, per the to-be-documented general rule for virtual instruction exceptions.)

- I've allocated bit 1 in each stateen0 CSR to control access to "all custom state", shifting up what were previously bits 1 and 2 to now be bits 2 and 3.

The earlier E-mail back-and-forth about enabling/disabling all access to custom state has been rendered moot by a combination of facts I realized. First, a write to a stateen0 CSR that changes bit 1 (the custom bit) from 0 to 1 can conceivably be taken as a trigger for initializing custom registers so that all custom features are enabled and made accessible. Second, people creating custom hardware already have the freedom to build their hardware to respond to this bit transition however they desire, including in ways that perhaps not everyone on this list would approve of. Consequently, I no longer see a need to provide a hook to trigger the initialization of custom hardware for full access; it's already inherently possible without any extra explicit mechanism.

I look forward to any new feedback, but I'm hopeful that the group can agree on this version of the spec for merging into the latest Privileged ISA draft (with numerous editorial changes for context).

Regards,
- John Hauser

----------------------------------------

The following is a proposal for an addition to the main Privileged Architecture (not a separately named extension).
--------------------
Motivation

Currently, the implementation of optional RISC-V extensions has the potential to open covert channels between separate user threads, or between separate guest OSes running under a hypervisor. The problem occurs when an extension adds processor state---usually explicit registers, but possibly other forms of state---that the main OS or hypervisor is unaware of (and hence won't context-switch) but that can be modified/written by one user thread or guest OS and perceived/examined/read by another.

Consider, for example, that the N extension is someday ratified by the RISC-V Association, and a hart implements both S mode and the N extension, with misa.N hardwired = 1. The OS in use on this hart might be oblivious to the N extension and hence might not test for the extension or pay any attention to the eight CSRs it adds to the ISA: ustatus, uie, utvec, uscratch, uepc, ucause, utval, and uip. In that case, most of these CSRs provide an obvious covert channel between user threads. Although traditional practices might consider such a communication channel harmless, the intense focus on security today argues that a means be offered to plug such channels.

The F registers of the RISC-V floating-point extensions and the V registers of the vector extension would also be potential covert channels, except for the existence of the FS and VS fields in the sstatus register. An OS that is unaware of, say, the vector extension and its V registers will unwittingly prevent access to those registers by initializing unknown fields of sstatus to zeros, which in this case will include the VS field.

Obviously, one way to prevent the use of the N extension's CSRs as a covert channel would be to add to sstatus an "NS" field for the N extension, paralleling the V extension's VS field. However, this is not considered a general solution to the problem due to the number of potential future extensions that may add small amounts of state.
Even with a 64-bit sstatus (necessitating adding sstatush for RV32), it is not certain there are enough remaining bits in sstatus to accommodate all future extensions. In any event, there is no need to strain sstatus (and add sstatush) for this purpose. The "enable" flags that are needed to plug covert channels are not generally expected to require swapping on context switches of user threads, making them a less-than-compelling candidate for inclusion in sstatus. Hence, a new place is proposed for them instead.

--------------------
Proposal

RV64 harts that conform to the RISC-V Privileged Architecture may optionally implement four new 64-bit CSRs at machine level, listed with their CSR addresses:

  0x30C  mstateen0  (Machine State Enable 0)
  0x30D  mstateen1
  0x30E  mstateen2
  0x30F  mstateen3

If supervisor mode is implemented, another four CSRs are defined at supervisor level:

  0x10C  sstateen0
  0x10D  sstateen1
  0x10E  sstateen2
  0x10F  sstateen3

And if the hypervisor extension is implemented, another set of CSRs is added:

  0x60C  hstateen0
  0x60D  hstateen1
  0x60E  hstateen2
  0x60F  hstateen3

If any stateen CSR is implemented, they must all be implemented for their respective modes.

For RV32, the registers listed above are 32-bit, and for the machine-level and hypervisor CSRs there is a corresponding set of high-half CSRs for the upper 32 bits of each register:

  0x31C  mstateen0h
  0x31D  mstateen1h
  0x31E  mstateen2h
  0x31F  mstateen3h

  0x61C  hstateen0h
  0x61D  hstateen1h
  0x61E  hstateen2h
  0x61F  hstateen3h

For the supervisor-level sstateen registers, high-half CSRs are not added at this time because it is expected the upper 32 bits of these registers will always be zeros, as explained below.

Each bit of an sstateen CSR controls lower-privilege access to an extension's state, for an extension that was not deemed "worthy" of a full XS field in sstatus like the FS and VS fields for the F and V extensions.
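The CSR numbers listed above follow a regular pattern: index n is added to a per-level base, and each RV32 high half sits 0x10 above its low half. The hypothetical helper below (not part of the proposal) encodes that pattern; the comments note that address bits [9:8] already encode the lowest privilege level that may access each CSR, per the existing CSR address convention.

```c
/* Hypothetical helper mapping (level, index, high-half) to the CSR numbers
 * listed in this proposal. Returns -1 for combinations it does not define. */
typedef enum { LEVEL_M, LEVEL_S, LEVEL_H } stateen_level;

int stateen_csr_addr(stateen_level level, unsigned n, int high_half)
{
    int base;
    if (n > 3)
        return -1;                     /* only stateen0..stateen3 are proposed */
    switch (level) {
    case LEVEL_M: base = 0x30C; break; /* address bits [9:8] = 3: machine level */
    case LEVEL_S: base = 0x10C; break; /* address bits [9:8] = 1: supervisor level */
    case LEVEL_H: base = 0x60C; break; /* address bits [9:8] = 2: hypervisor level */
    default: return -1;
    }
    if (high_half) {
        if (level == LEVEL_S)
            return -1;                 /* no sstateenNh CSRs are proposed */
        return base + 0x10 + (int)n;   /* RV32 high halves sit 0x10 above */
    }
    return base + (int)n;
}
```

For instance, `stateen_csr_addr(LEVEL_H, 2, 1)` yields 0x61E (hstateen2h), matching the table.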
The number of registers provided at each level is four because it is believed that 4 * 64 = 256 bits for machine and hypervisor levels, and 4 * 32 = 128 bits for supervisor level, will be adequate for many years to come, perhaps for as long as the RISC-V ISA is in use. The exact number four is an attempted compromise between providing too few bits on the one hand and going overboard with CSRs that will never be used on the other. A possible future doubling of the number of stateen CSRs is covered later.

The stateen registers at each level control access to state at all lower privilege levels, but not at their own level. This is analogous to how the existing counteren CSRs control access to performance counter registers. Just as with the counteren CSRs, when a stateen CSR prevents access to state by lower privilege levels, an attempt in one of those privilege modes to execute an instruction that would read or write the protected state raises an illegal instruction exception, or, if executing in VS or VU mode and the circumstances for a virtual instruction exception apply, raises a virtual instruction exception instead of an illegal instruction exception.

When a stateen CSR prevents access to state for a privilege mode, attempting to execute in that privilege mode an instruction that implicitly updates the state without reading it may or may not raise an illegal instruction or virtual instruction exception. Such cases must be disambiguated by being explicitly specified one way or the other.

In many cases, the various bits of the stateen CSRs will have a dual purpose as enables for the ISA extensions that introduce the controlled state.

Each bit of a supervisor-level sstateen CSR controls user-level access (from U mode or VU mode) to an extension's state. The intention is to allocate the bits of sstateen CSRs starting at the least-significant end, bit 0, through to bit 31, and then on to the next-higher-numbered sstateen CSR.
For every bit with a defined purpose in an sstateen CSR, the same bit is defined in the matching mstateen CSR to control access below machine level to the same state. The upper 32 bits of an mstateen CSR (or for RV32, the corresponding high-half CSR) control access to state that is inherently inaccessible to user level, so no corresponding enable bits in the supervisor-level sstateen CSR are applicable. The intention is to allocate bits for this purpose starting at the most-significant end, bit 63, through to bit 32, and then on to the next-higher mstateen CSR. If the rate that bits are being allocated from the least-significant end for sstateen CSRs is sufficiently low, allocation from the most-significant end of mstateen CSRs may be allowed to encroach on the lower 32 bits before jumping to the next-higher mstateen CSR. In that case, the bit positions of "encroaching" bits will remain forever read-only zeros in the matching sstateen CSRs.

With the hypervisor extension, the hstateen CSRs have identical encoding to the mstateen CSRs, except that they control accesses for a virtual machine (from VS and VU modes).

Bits in any stateen CSR that are defined to control state that a hart doesn't implement are read-only zeros for that hart. Likewise, all reserved bits not yet given a defined meaning are also read-only zeros. For every bit in an mstateen CSR that is zero (whether read-only zero or set to zero), the same bit appears as read-only zero in the matching hstateen and sstateen CSRs. For every bit in an hstateen CSR that is zero (whether read-only zero or set to zero), the same bit appears as read-only zero in sstateen when accessed from a virtual machine.

On reset, all mstateen bits are initialized by the hardware to zeros. If machine-level software changes these values, it is responsible for initializing the matching hstateen and sstateen CSRs to zeros too.
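The read-only-zero rules above amount to ANDing the enables level by level. The C sketch below models this for stateen0 only, with plain struct fields standing in for the CSRs. Which exception is raised when hstateen0 (or, for VU mode, sstateen0) blocks an access inside a VM is an assumption modeled loosely on the counteren rules, since the general virtual-instruction rule is described above as still to be documented; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODE_U, MODE_S, MODE_M } priv_mode; /* MODE_S with virt=false is HS */
typedef enum { ACCESS_OK, ACCESS_ILLEGAL, ACCESS_VIRTUAL } access_result;

typedef struct {
    uint64_t mstateen0, hstateen0, sstateen0; /* raw stored values */
} stateen_regs;

/* sstateen0 as read by software: bits that are zero in mstateen0 read as zero,
 * and inside a virtual machine bits that are zero in hstateen0 read as zero too. */
uint64_t sstateen0_visible(const stateen_regs *r, bool virt)
{
    uint64_t v = r->sstateen0 & r->mstateen0;
    if (virt)
        v &= r->hstateen0;
    return v;
}

/* Is an access to the state guarded by stateen0 bit `bit` (0..63) from
 * (mode, virt) allowed, and if not, which exception is (assumed to be) raised? */
access_result stateen0_check(const stateen_regs *r, priv_mode mode, bool virt, unsigned bit)
{
    uint64_t mask = (uint64_t)1 << bit;
    if (mode == MODE_M)
        return ACCESS_OK;                 /* nothing restricts machine level */
    if (!(r->mstateen0 & mask))
        return ACCESS_ILLEGAL;            /* blocked by machine level */
    if (virt && !(r->hstateen0 & mask))
        return ACCESS_VIRTUAL;            /* blocked by the hypervisor (assumed kind) */
    if (mode == MODE_U && !(r->sstateen0 & mask))
        return virt ? ACCESS_VIRTUAL : ACCESS_ILLEGAL; /* blocked by the OS; VU case assumed */
    return ACCESS_OK;
}
```

Because bit 63 of mstateen0 guards sstateen0 itself, the same function also answers whether S mode may access that CSR, e.g. `stateen0_check(&r, MODE_S, false, 63)`.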
Software at each privilege level should set its respective stateen CSRs to indicate the state it is prepared to allow lower-privilege software to access. For OSes and hypervisors, this usually means the state that the OS or hypervisor is prepared to swap on a context switch, or to manage in some other way.

Implementing the stateen CSRs is optional (though platform standards can always make them mandatory). When the stateen CSRs are not implemented, all state added by an extension is accessible as defined by that extension.

For each mstateen and hstateen CSR, bit 63 is defined to control access to the matching supervisor-level sstateen CSR. That is, bit 63 of mstateen0 and hstateen0 controls access to sstateen0; bit 63 of mstateen1 and hstateen1 controls access to sstateen1; etc. A hypervisor may need this control over accesses to the sstateen CSRs if it ever must emulate for a virtual machine an extension that is supposed to be affected by a bit in an sstateen CSR. (Even if such emulation is uncommon, it shouldn't be excluded.) Machine-level software needs identical control to be able to emulate the hypervisor extension. (That is, machine level needs control over accesses to the supervisor-level sstateen CSRs in order to emulate the hstateen CSRs, which have such control.)

If the hypervisor extension is not implemented and a supervisor-level sstateen CSR is all read-only zeros, an implementation may make bit 63 of the matching mstateen read-only zero. In that case, machine-level software should preferably emulate attempts to access the affected sstateen CSR from S mode, ignoring writes and returning zero for reads.

Initially, the following bits are proposed to be defined in sstateen0, mstateen0, and hstateen0:

  bit 0   QUERY data
  bit 1   All custom state
  bit 2   fcsr for Zfinx and related extensions (Zdinx, etc.)
  bit 3   Tentatively reserved for the N extension

As a special case, bit 0 is used to control access to the information returned by the optional QUERY instruction, even though this cannot act as a covert channel between multiple threads or guest OSes. For more, see the documentation for the QUERY instruction.

Bit 1 controls access to any and all custom state.

Bit 2 applies only for the case when floating-point instructions operate on X registers instead of F registers. Whenever misa.F = 1, bit 2 of mstateen0 is read-only zero (and hence read-only zero in hstateen0 and sstateen0 too). For convenience, when the stateen CSRs are implemented and misa.F = 0, then if bit 2 of a relevant stateen0 CSR is zero, _all_ floating-point instructions cause an illegal instruction trap (or virtual instruction trap, if relevant), as though they all touch fcsr, regardless of whether they really do.

In addition to the bits listed above for user-accessible state, the following are also proposed initially for mstateen0 and hstateen0:

  bit 61  Reserved for the RISC-V Advanced Interrupt Architecture
  bit 62  stimecmp, vstimecmp of Sstc extension
  bit 63  sstateen0

--------------------
Usage

After the machine-level mstateen CSRs are initialized to zeros on reset, machine-level software can set bits in these registers to enable lower-privilege access to the controlled state. This may be either because machine-level software knows how to swap the state or, more likely, because machine-level software isn't swapping supervisor-level environments. (Recall that the main reason the mstateen CSRs must exist is so machine level can emulate the hypervisor extension. When machine level isn't emulating the hypervisor extension, it is likely there will be no need to keep any mstateen bits zero.)

If machine level sets any mstateen bits to one, it must initialize the matching hstateen CSRs to zeros if the hypervisor extension is implemented.
And if any mstateen bits that are set to one have matching bits in the sstateen CSRs, machine-level software must initialize those sstateen CSRs to zeros. Ordinarily, machine-level software will want to set bit 63 of each mstateen CSR, necessitating that it zero all hstateen and sstateen CSRs.

An OS at supervisor level should see the sstateen CSRs initialized to zeros when the OS starts. It can set bits in these registers to enable user-level access to the controlled state, presumably because the OS knows how to context-swap the state.

For the sstateen CSRs whose access by a guest OS is permitted by bit 63 of the corresponding hstateen CSRs, a hypervisor must include the sstateen CSRs in the context it swaps for a guest OS. When it starts a new guest OS, it must ensure those sstateen CSRs are initialized to zeros, and it must emulate accesses to any other sstateen CSRs.

--------------------
Possible expansion

If a need is anticipated, the set of stateen CSRs could in the future be doubled by adding these:

  0x38C  mstateen4    0x39C  mstateen4h
  0x38D  mstateen5    0x39D  mstateen5h
  0x38E  mstateen6    0x39E  mstateen6h
  0x38F  mstateen7    0x39F  mstateen7h

  0x18C  sstateen4
  0x18D  sstateen5
  0x18E  sstateen6
  0x18F  sstateen7

  0x68C  hstateen4    0x69C  hstateen4h
  0x68D  hstateen5    0x69D  hstateen5h
  0x68E  hstateen6    0x69E  hstateen6h
  0x68F  hstateen7    0x69F  hstateen7h

These additional CSRs are not a definite part of the original proposal because it is unclear whether they will ever be needed, and it is believed the rate of consumption of bits in the first group, registers numbered 0-3, will be slow enough that any looming shortage will be perceptible many years in advance. At the moment, it is not known even how many years it may take to exhaust just mstateen0, sstateen0, and hstateen0.
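The stateen0 bit list and the Usage rules can be sketched together in C. The bit positions are copied from the proposal (and could still change); `mmode_init_stateen0` and `fp_instructions_blocked` are hypothetical helpers, with plain struct fields standing in for CSR writes. Per the proposal, reset has already left mstateen0 at zero before this step runs.

```c
#include <stdint.h>

/* stateen0 bit assignments as PROPOSED above (not ratified). */
#define STATEEN0_QUERY     ((uint64_t)1 << 0)  /* QUERY instruction data */
#define STATEEN0_CUSTOM    ((uint64_t)1 << 1)  /* all custom state */
#define STATEEN0_FCSR      ((uint64_t)1 << 2)  /* fcsr for Zfinx etc.; only when misa.F = 0 */
#define STATEEN0_N         ((uint64_t)1 << 3)  /* tentatively reserved for the N extension */
#define STATEEN0_AIA       ((uint64_t)1 << 61) /* reserved for AIA (mstateen0/hstateen0 only) */
#define STATEEN0_SSTC      ((uint64_t)1 << 62) /* stimecmp/vstimecmp (mstateen0/hstateen0 only) */
#define STATEEN0_SSTATEEN0 ((uint64_t)1 << 63) /* access to sstateen0 (mstateen0/hstateen0 only) */

/* Bits that can have counterparts in sstateen0 (its upper 32 bits stay zero). */
#define SSTATEEN_LOW_MASK  0xFFFFFFFFull

typedef struct { uint64_t mstateen0, hstateen0, sstateen0; } stateen0_csrs;

/* Hypothetical M-mode boot step following the Usage section. */
void mmode_init_stateen0(stateen0_csrs *c, uint64_t enable, int has_hypervisor)
{
    c->mstateen0 = enable;
    if (enable == 0)
        return;                    /* nothing exposed to lower levels */
    if (has_hypervisor)
        c->hstateen0 = 0;          /* hypervisor starts with everything disabled */
    if (enable & (SSTATEEN_LOW_MASK | STATEEN0_SSTATEEN0))
        c->sstateen0 = 0;          /* OS starts with everything disabled */
}

/* With misa.F = 0 (Zfinx-style), a clear FCSR bit in the relevant stateen0
 * blocks every floating-point instruction, whether or not it touches fcsr.
 * With misa.F = 1 this bit is read-only zero and FS governs instead. */
int fp_instructions_blocked(int misa_f, uint64_t relevant_stateen0)
{
    return !misa_f && !(relevant_stateen0 & STATEEN0_FCSR);
}
```

A typical call, matching the "ordinarily" case above, would pass an enable mask that includes `STATEEN0_SSTATEEN0`, which forces both hstateen0 and sstateen0 to be zeroed.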
|
Re: [RISC-V] [tech-tee] [RISC-V] [tech-privileged] Updates on the proposal of MPU (privious sPMP)
Dong Du
Hi Robin,

As far as I know, the H extension currently does allow the hypervisor to access VS memory through HLV (hypervisor virtual-machine load), HSV (hypervisor virtual-machine store), and HLVX. However, I am not sure whether there are any mechanisms for execution prevention. My thought was to stay consistent with the existing strategies in the H extension. But I do not think it's reasonable to insert entries related to VMs into the MPU, as the MPU is used in scenarios without paging/virtualization (in most cases, I believe).

All the best,
Dong

------------------ Original ------------------
Date: Thu, May 6, 2021 10:05 PM
To: "Nick Kossifidis"<mick@...>;
Cc: "bichengyang"<bichengyang@...>; "tech-privileged"<tech-privileged@...>; "tech-tee"<tech-tee@...>; "杜东"<dudong@...>; "anup.patel"<anup.patel@...>;
Subject: Re: [RISC-V] [tech-tee] [RISC-V] [tech-privileged] Updates on the proposal of MPU (privious sPMP)
|
|
Re: [RISC-V] [tech-tee] [RISC-V] [tech-privileged] Updates on the proposal of MPU (privious sPMP)
Robin Zheng <zhengwenbin.zwb@...>
Hi, Nick

It's much clearer now. Besides mpu*, we also need vsmpu* and hgmpu*.

Another concern is how to prevent an HS-mode hypervisor from accessing VS-mode memory (i.e., higher-privileged access to lower-privileged memory). The SUM bit (SUM=0) can prevent access from HS-mode to U-mode or from VS-mode to VU-mode, but it cannot prevent access from HS-mode to VS-mode, since the S bit in the mpu* table can only distinguish S-mode from U-mode. A simple way would be to insert entries for VM areas in the mpu table and disallow HS-mode access to them?

Regards,
Robin
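Robin's point can be seen in a small model of the existing U-bit/SUM rule for page-based virtual memory, which he describes the MPU S bit as mirroring. The sketch assumes an MPU check would take the same shape; the function and enum names are illustrative only, and R/W/X permissions and MXR are ignored.

```c
#include <stdbool.h>

typedef enum { PRIV_U, PRIV_S } access_priv;   /* effective privilege of the access */
typedef enum { ACC_LOAD, ACC_STORE, ACC_FETCH } access_type;

/* U-bit/SUM rule reduced to a single U-or-S ownership attribute. */
bool us_check_allows(access_priv p, access_type t, bool user_page, bool sum)
{
    if (p == PRIV_U)
        return user_page;     /* U mode may only touch user-owned memory */
    if (!user_page)
        return true;          /* S mode touching supervisor-owned memory */
    if (t == ACC_FETCH)
        return false;         /* S mode never executes from user-owned memory */
    return sum;               /* S-mode loads/stores to user memory need SUM = 1 */
}
```

Because ownership is a single U-or-S attribute, a region belonging to a VS-mode guest kernel looks like ordinary supervisor memory when checked from HS mode, so `us_check_allows(PRIV_S, ACC_LOAD, false, false)` returns true. SUM offers no way to block HS-mode accesses to VS-mode memory, which is the gap Robin describes.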
|
|
Re: [RISC-V] [tech-tee] [RISC-V] [tech-privileged] Updates on the proposal of MPU (privious sPMP)
mick@...
On 2021-05-06 08:04, Robin Zheng via lists.riscv.org wrote:
> Hi, Nick ...

The goal of SMAP/SMEP (or HSMAP/HSMEP in this case) is to prevent accidental access to, or execution of, memory controlled by a lower privilege level from a higher one (e.g. due to a bug), but for this to happen both privilege levels need to have the same view of virtual memory, or at least share some memory regions / mappings. This doesn't apply here: in the general case the host and the guest will use different sets of physical regions / page tables, and when hgatp != bare, they'll also have different views of physical memory. So for accessing the memory of the guest, even when hgatp = bare, the host would need to create shared regions / mappings, and that's not accidental (it can only be accidental when satp = vsatp = hgatp = bare, a highly unlikely scenario). If an attacker manages to compromise the hypervisor like this, it's game over. We can't do much in the scenario of a compromised/malicious host; it's still an open issue, and the way to solve it is probably to use some form of attestation to prove to the guest that the host is trustworthy (but that's another discussion).

The problem is when hgatp = bare, in which case a malicious guest will be able to access the memory of other guests and / or the host by creating mappings to the same physical memory. With the current spec it's possible to prevent this using PMP/ePMP in M-mode; my point is that it makes more sense to e.g. have a set of hmpu* registers and let the hypervisor handle it directly in HS mode.

My other point is that since the MPU is defined for S mode, it should be available to both HS and VS modes. Using a separate bit to distinguish between HS and VS wouldn't work here, because we still need VS to manage its own regions; we need a separate table. So basically I'm talking about mpu*/vmpu*/hmpu* registers, in the same way we have satp/vsatp/hgatp.

Regards,
Nick
|
Re: [RISC-V] [tech-tee] [RISC-V] [tech-privileged] Updates on the proposal of MPU (privious sPMP)
Robin Zheng <zhengwenbin.zwb@...>
Hi, Nick

> There is nothing preventing satp = bare && vsatp != bare; it's a very possible scenario, supported by the current spec, so since we are introducing an MPU, why not make it available for HS mode as well?

For a type-I hypervisor, when satp = bare && vsatp != bare, protection for the VM can be provided by second-stage translation (hgatp != bare), but in that case, since the hypervisor running with satp = bare can access VM memory freely, SMAP protection may be required in HS-mode. That may require the MPU (sPMP) to add another bit to distinguish VS-mode from HS-mode, just as the S bit does, doesn't it?

When second-stage translation is disabled (hgatp = bare), the MPU makes sense, and for SMAP protection it still requires distinguishing whether an access comes from HS-mode or VS-mode. Since there are 3 modes, i.e., U/VS/HS, should we add another bit besides the S bit to the MPU (sPMP)?

Regards,
Robin
|
|
Re: [RISC-V] [tech-tee] [RISC-V] [tech-privileged] Updates on the proposal of MPU (privious sPMP)
mick@...
On 2021-05-05 07:47, Robin Zheng via lists.riscv.org wrote:
> sPMP(MPU) is designed for the separation between U-mode and S-mode and it only make sense only when paging is not available. With H extension, there're 3 atp registers to control the translation for different stages: ...

(ccing Anup for more feedback on this)

This distinction between type 1 and type 2 hypervisors is a bit obsolete. If you consider type 1 hypervisors as bare-metal, then KVM, for example, is a type 1 hypervisor: you may check out Anup's patches for KVM on RISC-V and you'll notice that the kernel runs in HS mode, modifying h* CSRs, so basically the host OS is the hypervisor as well, and it still has userspace for monitoring, configuration, device emulation, etc. If you consider type 1 hypervisors as standalone without their own userspace, then it makes sense for them not to use paging for their own mappings, since that reduces code complexity and memory usage, in which case having an MPU to enforce memory protections makes perfect sense.

There is nothing preventing satp = bare && vsatp != bare; it's a very possible scenario, supported by the current spec, so since we are introducing an MPU, why not make it available for HS mode as well? Wouldn't leaving it out break the concept that "HS-mode acts the same as S-mode, but with additional instructions and CSRs that control the new stage of address translation and support hosting a guest OS in virtual S-mode (VS-mode). Regular S-mode operating systems can execute without modification either in HS-mode or as VS-mode guests"?

Also, in case the vendor doesn't want to implement two-stage translation (hgatp can be hardwired to 0 in the current spec), doesn't it make sense to have the option of using the MPU to manage memory protection for the guest's physical memory, providing isolation between the host and the guest and between guests? Why rely on PMP/ePMP in M-mode for that, requiring a trip through M-mode every time the hypervisor wants to switch between guests?

Regards,
Nick
|
Re: [RISC-V] [tech-tee] [RISC-V] [tech-privileged] Updates on the proposal of MPU (privious sPMP)
Robin Zheng <zhengwenbin.zwb@...>
sPMP (MPU) is designed for separation between U-mode and S-mode, and it only makes sense when paging is not available. With the H extension, there are 3 atp registers to control translation for the different stages:

- satp (normal translation for HS-mode)
- vsatp (first-stage translation for VS-mode)
- hgatp (second-stage translation for VS-mode)

and there are also 2 hypervisor types in the virtualization model:

For a Type-I hypervisor, sPMP (MPU) makes sense when paging is not available in the guest OS(s). But it makes no sense for the hypervisor itself (which runs in HS-mode), even if the hypervisor has no paging either, because a Type-I hypervisor runs only in HS-mode and has no user space.

For a Type-II hypervisor, sPMP (MPU) also makes sense when paging is not available in the guest OS(s). sPMP (MPU) also makes sense for a host OS without paging, but for most host OSes, e.g., Linux, Windows, macOS, paging is always available; I haven't seen a host OS that supports virtualization but has no paging support.

Thus, when the H extension exists, sPMP (MPU) makes more sense for VS-mode than for HS-mode, and I don't think we need to support sPMP (MPU) for both HS-mode and VS-mode.

Regards,
Robin
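The three-register arrangement Robin lists can be pictured as function composition: with V = 0 only satp applies, while with V = 1 the VS-stage result is fed through the G-stage. The C sketch below illustrates that. The constant-offset `stage` type is a toy assumption standing in for a page-table walk; real walks can fault, and the VS-stage page-table reads are themselves G-stage translated, which the sketch omits.

```c
#include <stdint.h>

/* Toy model of one translation stage. bare = 1 means Bare mode (identity);
 * the constant offset stands in for a real page-table walk. */
typedef struct {
    int bare;
    uint64_t offset;
} stage;

typedef struct {
    stage satp;  /* HS-mode / U-mode translation (V = 0) */
    stage vsatp; /* VS-stage: guest virtual -> guest physical (V = 1) */
    stage hgatp; /* G-stage: guest physical -> supervisor physical (V = 1) */
} atp_state;

static uint64_t apply_stage(const stage *s, uint64_t addr)
{
    return s->bare ? addr : addr + s->offset;
}

/* Physical address produced for an S/U-mode access (ignoring MPRV and faults). */
uint64_t translate(const atp_state *st, int virt, uint64_t va)
{
    if (!virt)
        return apply_stage(&st->satp, va);
    return apply_stage(&st->hgatp, apply_stage(&st->vsatp, va));
}
```

In this model, the satp = bare && vsatp != bare configuration discussed in this thread is expressed simply by setting `satp.bare = 1` while leaving the other two stages active.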
|
|
Re: [RISC-V] [tech-tee] [RISC-V] [tech-privileged] Updates on the proposal of MPU (privious sPMP)
Greg Favor
On Tue, May 4, 2021 at 1:51 AM Dong Du <dudong@...> wrote:
What combinations has the TEE group come up with so far that have justifying use cases? Or are you searching for combinations that have justifying use cases? If the latter (and this admittedly reflects my own biases), it seems like a stretch to have industrial use cases that implement paging and the H extension but don't want to use paging for HS-mode, yet want to use two-stage paging for VS-mode. Or do you have use cases in mind and are trying to think about how all this stuff should interact?

I guess I'm struggling with what seems like a wide-open question, and with the idea that a TG should be starting from motivating use cases that it is trying to address (versus coming up with a potential hammer and looking for some nails).

Somewhat separate from that, when I look back at the following email excerpt from Nick, I have a couple of comments/questions:

> A scenario we discussed at some point was a trusted hypervisor running on HS mode, with e.g. Linux and a trusted service running on VS mode. The trusted hypervisor is usually very small/simple and may not use paging, so hgatp will be set to bare and it'll fall back to PMP/ePMP as the current hypervisor spec mandates. With sPMP the hypervisor will be able to configure its own regions and also isolate Linux from the trusted service, without going through M-mode using PMP/ePMP, this allows for a much more flexible / clean implementation.

Setting up HS-mode page tables also provides this ability to set up protected access regions without going through M-mode. (And you can have regions of varying page sizes.)

> In other words we can use sPMP as a poor man's paging for HS mode and still use paging for VS mode, in which case when operating on VS mode both MMU and sPMP will be active.

Once one has implemented an MMU, why not use that for your "poor man's paging for HS-mode" instead of also implementing sPMP? Plus, one can then leverage existing hypervisor software (instead of working to get hypervisors to understand and use sPMP instead of page tables).

Greg
|
Re: Updates on the proposal of MPU (privious sPMP)
Dong Du
BTW, we would like to hear more opinions from the privileged group and the H-extension group on how MPU/sPMP and virtualization should be used together, e.g., are there any scenarios that should use paging, the MPU (sPMP), and G-stage translation together? Please feel free to give us any feedback to help move forward.

All the best,
Dong

------------------ Original ------------------
Date: Tue, May 4, 2021 04:44 PM
To: "tech-privileged"<tech-privileged@...>; "tech-tee"<tech-tee@...>;
Subject: [RISC-V] [tech-privileged] Updates on the proposal of MPU (privious sPMP)
|
Updates on the proposal of MPU (privious sPMP)
bichengyang@...
Hello all,
After discussion in the TEE group, we decided to rename sPMP to MPU (the RISC-V Memory Protection Unit), and to reuse page faults for MPU faults, based on our discussion and feedback from the privileged group. Therefore, we propose renaming page fault to MPU/MMU fault for clarity. We have also updated the MPU proposal: https://docs.google.com/document/d/1x7esOSBFfpcbDHaRPpe5NEWmav1_8der_nB25Hd5hqs/edit?usp=sharing

Best,
Bicheng
|