Proposal: Supervisor Timer CSR and Virtual Supervisor Timer CSR
Hi Everyone,
This is an updated version of our previous proposal on the clock source and clock event source. We have aligned our ideas with the latest hypervisor extension specs, removed the redundant parts, and uncovered some issues if we are going to implement this proposal with the current design. We give an analysis together with the proposed new CSRs in the attached document.
BTW, we have learned recently that there is an on-going work on an improved version of the PLIC which is virtualization-aware, is there any documents available?
Regards,
Siqi
"The vsip and vsie registers are VSXLEN-bit read/write registers that are VS-mode’s versions of supervisor CSRs sip and sie, formatted as shown in Figures 5.22 and 5.23 respectively. When V=1, vsip and vsie substitute for the usual sip and sie, so instructions that normally read or modify sip/sie actually access vsip/vsie instead."
Conversely, if hideleg directs the interrupt to the hypervisor, then vsie/vsip.STIP are not aliases of hie/hip.VSTIP. And hence clearing of hie.VSTIE does not not affect vsie.STIE (vsie.STIE, in fact, is read-only zeros).
Hi Everyone,
This is an updated version of our previous proposal on the clock source and clock event source. We have aligned our ideas with the latest hypervisor extension specs, removed the redundant parts, and uncovered some issues if we are going to implement this proposal with the current design. We give an analysis together with the proposed new CSRs in the attached document.
BTW, we have learned recently that there is an on-going work on an improved version of the PLIC which is virtualization-aware, is there any documents available?
Regards,
Siqi
Hi Everyone,
This is an updated version of our previous proposal on the clock source and clock event source. We have aligned our ideas with the latest hypervisor extension specs, removed the redundant parts, and uncovered some issues if we are going to implement this proposal with the current design. We give an analysis together with the proposed new CSRs in the attached document.
BTW, we have learned recently that there is an on-going work on an improved version of the PLIC which is virtualization-aware, is there any documents available?
Regards,
Siqi
BTW, we have learned recently that there is an on-going work on anThere is an informal group that is working on a proposal for a RISC-V
improved version of the PLIC which is virtualization-aware, is there
any documents available?
"Advanced Interrupt Architecture", which includes replacements for
the current PLIC. No document is available yet because the group's
proposal isn't complete yet; the document is still being written. As
soon as the authors are satisfied with it, the proposal will be shared
with everyone so it can begin receiving wider evaluation and feedback.
I expect that will happen around the end of 2020 or early in 2021.
(Yes, I know, everybody wants it to be sooner. It's already being done
as fast as time allows.)
Regards,
- John Hauser
Another alternative is for the CPU to translate the CSR write instruction into a memory write. This is impractical for a CPU IP, because the CPU designers do not know the SoC memory map (which might vary between implementations/customers, or even be programmable at runtime). There is also the overhead of handling interactions with PMP/PMA, etc.
How feasible is it to generate timer-interrupts for VS-mode software from a counter in the CPU clock domain? The hypervisor could apply an appropriate scaling factor to align with wall-clock time, perhaps using an htimescale register somewhat analogous to htimedelta.
Another option is to create memory-mapped registers for stimecmp/vstimecmp at addresses that are accessible to S-mode/VS-mode software.
Sounds like memory-mapped stimecmp/vstimecmp registers might be the best solution. I don't think anything in the existing spec would prevent this, but it would be nice to have it standardized (at least to the extent that mtime/mtimecmp are standardized...)
Cheers,
Phil
Thanks for the comments.
It seems the proposal is not explicit enough about the type of interrupt. So to answer your question, with vstimecmp, there is in fact a new type of interrupt types. The new type is triggered by vstimecmp but received when the hart is at V==0, might be called SGTI (Supervisor Guest Timer Interrupt, in the same spirit as SGEI). The proposal didn't really distinguish this interrupt with VSTI. With this new interrupt type, this is how things conceptually work: the HS-mode code first receives a SGTI triggered by vstimecmp, consequently, a VSTI is generated by the HS-mode code for VS-mode to handle.
With the current specs, hip.VSTIP is used to represent pending state for both VSTI and SGTI, which can be made to work as shown in the document. Since once SGTI is pending, then the next step is naturally to make a pending VSTI for the guest. What causes the issue is that the enable bits are also shared in existing specs. I believe your comments was caused by another shared aspect which is the delegation bits. The current specs only provide bit 2 in hideleg, which can't be used to control delegation for two types of interrupt.
A better solution might be to introduce a new interrupt type called SGTI and the corresponding pending bits and delegation bits. M-mode delegate SGTI to HS-mode. HS-mode still delegates VSTI to VS-mode. When an interrupt is triggered by vstimecmp, the hart sets SGTI to be pending and generates a trap. The hypervisor then sets VSTI to be pending, and executes the vCPU.
Hope this explains.
Regards,
Siqi
Hi Greg,
Thanks for the comments.
It seems the proposal is not explicit enough about the type of interrupt. So to answer your question, with vstimecmp, there is in fact a new type of interrupt types. The new type is triggered by vstimecmp but received when the hart is at V==0, might be called SGTI (Supervisor Guest Timer Interrupt, in the same spirit as SGEI). The proposal didn't really distinguish this interrupt with VSTI. With this new interrupt type, this is how things conceptually work: the HS-mode code first receives a SGTI triggered by vstimecmp, consequently, a VSTI is generated by the HS-mode code for VS-mode to handle.
With the current specs, hip.VSTIP is used to represent pending state for both VSTI and SGTI, which can be made to work as shown in the document. Since once SGTI is pending, then the next step is naturally to make a pending VSTI for the guest. What causes the issue is that the enable bits are also shared in existing specs. I believe your comments was caused by another shared aspect which is the delegation bits. The current specs only provide bit 2 in hideleg, which can't be used to control delegation for two types of interrupt.
A better solution might be to introduce a new interrupt type called SGTI and the corresponding pending bits and delegation bits. M-mode delegate SGTI to HS-mode. HS-mode still delegates VSTI to VS-mode. When an interrupt is triggered by vstimecmp, the hart sets SGTI to be pending and generates a trap. The hypervisor then sets VSTI to be pending, and executes the vCPU.
Hope this explains.
Regards,
Siqi
I understand your concern. In fact, the goal is not to provide more than one vstimecmp CSR.
You are right that when a VM is context switched out, it's vstimecmp CSR is also switched out. However, there exists time when the VM, or a vCPU, is not executing, however, its CSR context is still 'bound' to a hart. In other words, the vs- CSRs still contain values for that vCPU (including vstimecmp), however, the hart is executing hypervisor code. For example, when the hypervisor is handling certain exception caused by the VM such as a guest page fault, there is no need to switch the vCPU context out.
During this time, since vstimecmp still contains the value for that vCPU, it can fire interrupts. When this interrupt is received by the hart, then obviously it should be the hypervisor that handles it, thus the VGTI. Without VGTI, this interrupt will be delayed until the hypervisor returns to the vCPU. This delay can be long because the hypervisor may not immediately return to the vCPU, i.e. it may well decide that the vCPU needs to yield and schedule something else. Of course, the hypervisor can check at strategic points the value in vstimecmp and make proper decision, but that makes software more complex because those checks might be tricky to be inserted.
So VGTI is purely for the prompt handling of the vstimecmp interrupt when the vCPU is still bound to a hart but not executing.
Lastly, when the hypervisor finally decides to switch a vCPU out, then vstimecmp gets saved and hypervisor uses its own timer to track the timer set by that vCPU, which goes to stimecmp. In this way the only vstimecmp is multiplexed among vCPUs, there's no need for more than one vstimecmp.
Regards,
Siqi
Hi Greg,
I understand your concern. In fact, the goal is not to provide more than one vstimecmp CSR.
You are right that when a VM is context switched out, it's vstimecmp CSR is also switched out. However, there exists time when the VM, or a vCPU, is not executing, however, its CSR context is still 'bound' to a hart. In other words, the vs- CSRs still contain values for that vCPU (including vstimecmp), however, the hart is executing hypervisor code. For example, when the hypervisor is handling certain exception caused by the VM such as a guest page fault, there is no need to switch the vCPU context out.
During this time, since vstimecmp still contains the value for that vCPU, it can fire interrupts. When this interrupt is received by the hart, then obviously it should be the hypervisor that handles it, thus the VGTI. Without VGTI, this interrupt will be delayed until the hypervisor returns to the vCPU. This delay can be long because the hypervisor may not immediately return to the vCPU, i.e. it may well decide that the vCPU needs to yield and schedule something else. Of course, the hypervisor can check at strategic points the value in vstimecmp and make proper decision, but that makes software more complex because those checks might be tricky to be inserted.
So VGTI is purely for the prompt handling of the vstimecmp interrupt when the vCPU is still bound to a hart but not executing.
Lastly, when the hypervisor finally decides to switch a vCPU out, then vstimecmp gets saved and hypervisor uses its own timer to track the timer set by that vCPU, which goes to stimecmp. In this way the only vstimecmp is multiplexed among vCPUs, there's no need for more than one vstimecmp.
Regards,
Siqi
So VGTI is purely for the prompt handling of the vstimecmp interruptI believe a hypervisor can get the same effect by saving and clearing
when the vCPU is still bound to a hart but not executing.
bit 6 of hideleg on entry to a trap handler in HS mode. On trap exit,
restore the saved value of bit 6 of hideleg. (Usually, it should be
possible just to save and restore all of hideleg.) While bit 6 of
hideleg is zero, a VS-level timer interrupt will trap to HS mode,
assuming bit 6 (VSTIE) of hie is also set to enable the trap.
- John Hauser