RISC-V H-extension freeze consideration


Anup Patel
 

Hi All,

The RISC-V H-extension v0.6.1 draft was released almost a year back in
May 2020. There have been no changes in the H-extension specification
since then.

Meanwhile, we have RISC-V H-extension v0.6.1 implemented in QEMU,
Spike, and Rocket FPGA. We also have three different hypervisors ported
to RISC-V H-extension v0.6.1:
1. Xvisor RISC-V (Works on QEMU, Spike, and Rocket FPGA)
2. KVM RISC-V (Works on QEMU, Spike and Rocket FPGA)
3. Bao (Works on Rocket FPGA)

Unfortunately, the RISC-V H-extension not being in the frozen state is now gating
further software development because major open source projects (such
as Linux RISC-V and GCC RISC-V) have adopted a policy of accepting patches
only for frozen or ratified RISC-V extensions.

A few examples of gated software development:
1. KVM RISC-V is not merged in upstream Linux RISC-V. The KVM RISC-V
patches were already reviewed and acked by maintainers in July 2019.
Since then, we have been rebasing the KVM RISC-V patches on every kernel
release, for 1.5+ years now.
2. GCC RISC-V not accepting patches for H-extension related instructions
3. KVMTOOL RISC-V not merged because KVM RISC-V is not merged in
upstream Linux RISC-V
4. QEMU KVM RISC-V acceleration not merged because KVM RISC-V is
not merged in upstream Linux RISC-V
5. Various feature additions (such as SBI v0.2, nested virtualization, etc.) can't happen
(or can't be merged) until KVM RISC-V is merged in Linux RISC-V
6. Libvirt library blocked on QEMU KVM RISC-V acceleration being
available. The Libvirt library is a crucial piece in open-source cloud
solutions (such as OpenStack).
7. As time passes, more items (such as KVM RISC-V AIA support) will
get blocked if KVM RISC-V is not merged upstream.

We request the TSC to consider freezing the RISC-V H-extension v0.6.1
draft specification. The remaining items in the done checklist for ratification can
certainly be completed while the H-extension is in the frozen state.

Best Regards,
Anup Patel


Andrew Waterman
 

I’m not in support of freezing it yet. My concern is that development of virtualization-aware interrupt controllers and IOMMUs will lead to reconsideration of some of the details. All of these items are logically interlocking, even if physically disjoint. It’s entirely possible that we will make no changes as a result of that further development, but it’s far from certain.

Furthermore, the hypervisor extension is of substantially greater with those other items completed, so we aren’t losing out as much as it might seem by postponing the freeze.

On Tue, Feb 2, 2021 at 7:47 PM Anup Patel <Anup.Patel@...> wrote:


Andrew Waterman
 



On Tue, Feb 2, 2021 at 7:54 PM Andrew Waterman <andrew@...> wrote:
I’m not in support of freezing it yet. My concern is that development of virtualization-aware interrupt controllers and IOMMUs will lead to reconsideration of some of the details. All of these items are logically interlocking, even if physically disjoint separate. It’s entirely possible that we will make no changes as a result of that further development, but it’s far from certain.

Furthermore, the hypervisor extension is of substantially greater with those other items completed, so we aren’t losing out as much as it might seem by postponing the freeze.

* substantially greater utility

Shouldn’t write emails on phone...






Anup Patel
 

On all major architectures (x86 and ARM64), the virtualization-aware interrupt controllers and IOMMUs are totally independent of the ISA virtualization support.

We already have the required ISA support in the H-extension for a virtualization-aware interrupt controller.

The IOMMUs are totally independent of CPU virtualization support on all major architectures, and I don’t see how the H-extension needs to change for IOMMU support.

 

Regards,

Anup

 

From: Andrew Waterman <andrew@...>
Sent: 03 February 2021 09:24
To: Anup Patel <Anup.Patel@...>
Cc: Alistair Francis <Alistair.Francis@...>; Allen Baum <allen.baum@...>; Atish Patra <Atish.Patra@...>; Greg Favor <gfavor@...>; John Hauser <jh.riscv@...>; Krste Asanovic <krste@...>; tech-privileged@...; tech-unixplatformspec@...
Subject: Re: RISC-V H-extension freeze consideration


Andrew Waterman
 

In other architectures, those devices are needlessly complex in part because they weren’t co-designed with the ISA. Yes, they can be independently designed, but possibly with regrettable consequences.

On Tue, Feb 2, 2021 at 8:06 PM Anup Patel <Anup.Patel@...> wrote:


Greg Favor
 

I generally agree with Andrew.  At the same time I'll also observe that, practically speaking, the AIA is coming soon and it very much directly interacts with the H extension.  So waiting a little longer to see that at least stabilize is a good (and probably necessary) compromise.  (Past that we can then come back to arguing when to draw the line on freezing the H extension spec.)  Further, there are other extensions happening now and the next few months (from the virt-mem group, pointer masking from the J group, and a couple of fast-track extensions) that it would be good to stabilize if not freeze in conjunction with the H extension.  So, in my own opinion, we're getting close.  Not a few weeks, but not quarters either.  (I'll also say that the "pressure is on" to intelligently try and get through this period of time sooner than later.)

Having all these things that interact with virtualization being finalized together (I'm being loose for now wrt official "stable" versus "freeze" milestones, to focus on the general idea) is a good thing for the reasons Andrew mentioned.  Most important (risk-wise) to me is seeing the virt-mem extensions and AIA stuff stabilize.

Now, when it comes to the IOMMU, its architecture needs to (or strongly should) follow the CPU virtualization architecture.  But I think it is an acceptable compromise to not hold up all the preceding because of the IOMMU architecture.  I see very low risk of realizing from the IOMMU architecture that something in the H extension should have been done differently.  Maybe some extra feature will be identified, but that could be done as an extension to the H extension (and I think that is also low risk of happening).  I'll also note that the AIA will cover how an IOMMU handles virtualization of I/O interrupts (aka MSIs).  Which leaves normal translation of I/O addresses to follow the mold of the Supervisor and Hypervisor architectures.  (And, for completeness, many of the other "interesting" aspects of an IOMMU architecture I believe can and should comport with the Supervisor and Hypervisor architectures as needed.)

In short, I think a reasonable compromise is to wait a "little" bit longer for most of the above "coming soon" things, and to decouple the IOMMU timeline from freezing the H extension and related extensions.

Greg


On Tue, Feb 2, 2021 at 8:13 PM Andrew Waterman <andrew@...> wrote:


Paolo Bonzini
 

> So, in my own opinion, we're getting close.  Not a few weeks, but not quarters either.  (I'll also say that the "pressure is on" to intelligently try and get through this period of time sooner than later.)

A quarter has passed, so we're 50% of the way towards talking "quarters". So please let me ask three questions:

- Has any insight been formed in the AIA specification as to whether it will require changes to the Hypervisor specification, and whether these changes can be done as part of the AIA specification (just like pointer masking is already defining vs* CSRs)?

- Has any list been made of which extensions should be frozen before the Hypervisor extension, and is there a clear path towards freezing them in a reasonable time period? Does this list include pointer masking, and if so why (considering that pointer masking is already being specified as if the Hypervisor extension is frozen or ratified first)?

- Is there any date being set for whatever meetings are needed to freeze the Hypervisor extension after all the dependencies are frozen?

If the answer to any of the above three questions is no, what can be done to avoid the frankly ludicrous delay in the approval of a specification that has seen no significant change in one year?

Thanks,

Paolo


Greg Favor
 

Paolo,

Thanks for the prodding.  It's a good reminder as we are all caught up with pushing many things forward.

I won't try to provide off-the-cuff answers to your questions, but instead I'll say that Andrew, John, and I need to meet next week, discuss where we stand on these questions, and flesh out a plan for getting from here to freeze (and then to ratification).  As you note, time is quickly passing by and will soon start running short.

Lastly, I'll note that the principal factor in Andrew's and John's minds - in holding off H freeze - has been to see enough software support, PoCs, and stability in related arch specs that could affect H (e.g. AIA), before freezing H and then later discovering a regrettable problem.  But that obviously can't go on forever.  Hence time for the three of us to sort out the path from here to freeze.

Greg

On Fri, May 28, 2021 at 4:54 AM Paolo Bonzini <pbonzini@...> wrote:


John Hauser
 

Paolo Bonzini wrote:
> If the answer to any of the above three questions is no, what can be
> done to avoid the frankly ludicrous delay in the approval of a
> specification that has seen no significant change in one year?

There is at least one small but significant change to the hypervisor
extension being discussed, to redefine the "G" bit in G-stage address
translation PTEs to indicate that a page of guest physical address
space is "virtual I/O", meaning the hardware must order VM accesses to
those addresses as though they were I/O accesses, not main memory.

Another minor change planned is to have attempts to write a strictly
read-only CSR always raise an illegal instruction exception, instead
of sometimes raising a virtual instruction exception as currently
specified.
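
As a rough illustration of this second planned change (a sketch, not text from the specification), the rule could be written in C as below. The read-only test relies on the standard CSR address convention, in which bits [11:10] equal to 0b11 mark a read-only CSR, and the cause numbers 2 and 22 are the illegal instruction and virtual instruction exception codes. The function names and the `otherwise_virtual` parameter are hypothetical, introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    CAUSE_ILLEGAL_INSTRUCTION = 2,
    CAUSE_VIRTUAL_INSTRUCTION = 22
};

/* CSR addresses whose top two bits [11:10] are 0b11 are read-only. */
bool csr_is_read_only(uint16_t csr_addr)
{
    return ((csr_addr >> 10) & 0x3) == 0x3;
}

/*
 * Hypothetical helper: exception cause for a CSR write attempted by a
 * guest (V=1) under the planned rule.
 * otherwise_virtual: true if, ignoring read-only-ness, the access would
 * be reported as a virtual instruction exception.
 * Returns -1 when this check raises no exception.
 */
int planned_csr_write_cause(uint16_t csr_addr, bool otherwise_virtual)
{
    /* Planned change: a write to a read-only CSR is always illegal. */
    if (csr_is_read_only(csr_addr))
        return CAUSE_ILLEGAL_INSTRUCTION;
    return otherwise_virtual ? CAUSE_VIRTUAL_INSTRUCTION : -1;
}
```

For instance, a write to `cycle` (address 0xC00, read-only) would always yield cause 2 in this sketch, whichever mode the guest is in.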

The reason there has been no movement on the hypervisor extension
for several months is not because there is totally nothing to do, but
because I've lacked the time to attend to it simultaneously with a
thousand other things.

If you'd like more progress on the hypervisor extension, feel free to
drive the discussion to get agreement one way or another on the first
point, the "I/O" bit in G-stage PTEs. The issue concerns when a
hypervisor is emulating a device that has memory that is supposed to be
in I/O space but is actually being emulated using main memory. A guest
OS expects accesses to that virtual device memory to be in I/O space
and ordered according to the I/O rules, but that's not currently what
happens.
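
To make the proposal concrete, here is a minimal C sketch of how a hypervisor might tag such a page, assuming the redefinition under discussion were adopted. The bit positions follow the standard RISC-V PTE layout (V=0, R=1, W=2, X=3, U=4, G=5, A=6, D=7, PPN from bit 10); treating G as "virtual I/O" is only the idea being debated, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_U (1ULL << 4)  /* G-stage treats guest accesses as user-level */
#define PTE_G (1ULL << 5)  /* assumed meaning here: "virtual I/O" */
#define PTE_A (1ULL << 6)
#define PTE_D (1ULL << 7)
#define PTE_PPN_SHIFT 10

/*
 * Hypothetical: build a read/write leaf G-stage PTE for a 4 KiB page of
 * main memory at host_pa that backs an emulated device buffer.
 */
uint64_t gstage_leaf_pte(uint64_t host_pa, bool virtual_io)
{
    uint64_t pte = ((host_pa >> 12) << PTE_PPN_SHIFT)
                 | PTE_V | PTE_R | PTE_W | PTE_U | PTE_A | PTE_D;
    if (virtual_io)
        pte |= PTE_G;  /* ask hardware to order accesses as I/O */
    return pte;
}

/* Hypothetical: would accesses through this PTE be ordered as I/O? */
bool gstage_pte_is_virtual_io(uint64_t pte)
{
    return (pte & PTE_V) && (pte & PTE_G);
}
```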

- John Hauser


Anup Patel
 

-----Original Message-----
From: tech-privileged@lists.riscv.org <tech-privileged@lists.riscv.org> On
Behalf Of John Hauser
Sent: 28 May 2021 23:41
To: tech-privileged@lists.riscv.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [RISC-V] [tech-privileged] RISC-V H-extension freeze consideration

> Paolo Bonzini wrote:
> > If the answer to any of the above three questions is no, what can be
> > done to avoid the frankly ludicrous delay in the approval of a
> > specification that has seen no significant change in one year?
>
> There is at least one small but significant change to the hypervisor extension
> being discussed, to redefine the "G" bit in G-stage address translation PTEs to
> indicate that a page of guest physical address space is "virtual I/O", meaning
> the hardware must order VM accesses to those addresses as though they
> were I/O accesses, not main memory.

From the hypervisor's perspective, the "G" bit in G-stage PTEs is not used at all.

For software-emulated MMIO, the hypervisor does not create any mapping
in the G-stage, so every access traps, which allows the hypervisor to
trap-and-emulate it.

For pass-through MMIO (such as IMSIC guest MSI files directly accessed
by the guest), the guest physical address translates in the G-stage to the
host physical address of the actual MMIO device, and we will have host PMAs
that mark all MMIO devices as I/O regions.

At this point, the G bit in the G-stage PTE is unused from the software
perspective. Why do we need to repurpose the G bit when we already
have PMAs marking all MMIO addresses as I/O regions?
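
The two cases above can be sketched as a toy model in C. This is only an illustration under stated assumptions: the `gstage_map` structure, the function name, and the flat region table are hypothetical stand-ins for real G-stage page tables. A guest physical address with no mapping models the guest page fault that lets the hypervisor trap-and-emulate; a mapped address models pass-through to the host physical address of the device.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat description of one pass-through G-stage mapping. */
struct gstage_map {
    uint64_t gpa_base;  /* guest physical base address */
    uint64_t size;      /* length of the region in bytes */
    uint64_t hpa_base;  /* host physical base of the real MMIO device */
};

/*
 * Returns 1 and stores the host physical address in *hpa when gpa is
 * mapped (pass-through MMIO). Returns 0 when no mapping exists, which
 * models the trap the hypervisor uses to emulate the access in software.
 */
int gstage_translate(const struct gstage_map *maps, size_t n,
                     uint64_t gpa, uint64_t *hpa)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= maps[i].gpa_base &&
            gpa - maps[i].gpa_base < maps[i].size) {
            *hpa = maps[i].hpa_base + (gpa - maps[i].gpa_base);
            return 1;
        }
    }
    return 0;  /* unmapped: access traps, hypervisor emulates it */
}
```

In this model the I/O ordering for the pass-through case comes from the host PMAs of the device region, not from anything stored in the mapping itself.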





Regards,
Anup


John Hauser
 

Anup Patel wrote:
> Why do we need to re-purpose G-bit because we already
> have PMAs marking all MMIO addresses as I/O region ?

To repeat myself:

> The issue concerns when a hypervisor is emulating a
> device that has memory that is supposed to be in I/O space but is actually
> being emulated using main memory. A guest OS expects accesses to that
> virtual device memory to be in I/O space and ordered according to the I/O
> rules, but that's not currently what happens.

The PMAs aren't correct in this situation.

- John Hauser


Jonathan Behrens
 

What sort of device exposes regions of memory in I/O space? When I think of hypervisors emulating devices, all their registers typically do stuff when you write to them.

Jonathan


On Sat, May 29, 2021 at 12:47 PM John Hauser via lists.riscv.org <jh.riscv=jhauser.us@...> wrote:






John Hauser
 

> What sort of device exposes regions of memory in I/O space? When I think of
> hypervisors emulating devices, all their registers typically *do stuff*
> when you write to them.

Historically, video cards and network cards definitely had memory
buffers in what RISC-V would consider I/O space. Yes, typical video
and networking hardware may work differently today, but can we be
certain there are absolutely no such devices any more of any kind that
we need to care about? And will never be in the future, either?

I'd be fine if the answer is "yes", but I'm sure not willing to commit
to that answer solely on my own incomplete knowledge.

- John Hauser


Jonathan Behrens
 

"Old video cards and network cards" is a completely fair answer! If the PTE bit isn't needed otherwise, this seems like a reasonable use.

Jonathan


On Sat, May 29, 2021 at 1:24 PM John Hauser via lists.riscv.org <jh.riscv=jhauser.us@...> wrote:






John Hauser
 

I wrote:
> Emulating an embedded system within a virtual machine is something we
> want to support, which implies an ability to emulate unsophisticated
> hardware. Such as a video system that has a main video frame buffer
> located in I/O space, as in olden times.

Greg Favor:
> It seems like this G-stage "I/O" bit is going down a questionable rabbit
> hole that:
>
> - Provides functionality that is not provided by other architectures (x86,
> ARM). (E.g. on ARM, main memory, whether cacheable or noncacheable, is
> weakly ordered. There is no way to get strong "I/O" ordering within a page
> without declaring the page as "I/O".)

Doesn't the x86 architecture have "total store ordering", and wouldn't
that fact make the matter moot for it?

If Arm has a way to declare a page as "I/O", is that different than the
"virtual I/O" bit being debated for RISC-V G-stage address translation?

> - Supports "legacy" situations that it is unclear who would actually care
> about. (Or is this going to be the special thing that attracts people to
> RISC-V since the other architectures have apparently ignored a real need in
> the market. Sorry if that strayed too far into sarcasm.)
>
> - Burdens all H implementations with supporting functionality that is
> motivated by an uncertain "legacy" situation. At best this bit should be
> optional.

- John Hauser


Greg Favor
 

On Sat, May 29, 2021 at 12:44 PM John Hauser <jh.riscv@...> wrote:
> I wrote:
> > Emulating an embedded system within a virtual machine is something we
> > want to support, which implies an ability to emulate unsophisticated
> > hardware.  Such as a video system that has a main video frame buffer
> > located in I/O space, as in olden times.
>
> Greg Favor:
> > It seems like this G-stage "I/O" bit is going down a questionable rabbit
> > hole that:
> >
> > - Provides functionality that is not provided by other architectures (x86,
> > ARM).  (E.g. on ARM, main memory, whether cacheable or noncacheable, is
> > weakly ordered.  There is no way to get strong "I/O" ordering within a page
> > without declaring the page as "I/O".)
>
> Doesn't the x86 architecture have "total store ordering", and wouldn't
> that fact make the matter moot for it?

Note that x86 TSO ordering is not as strong as x86 I/O ordering.  And ARM "Normal memory" ordering is of course even weaker.

Also, on x86 one gets TSO ordering only when using cacheable memory types (i.e. WB), so there is no TSO-ordered noncacheable "I/O".  TSO also allows speculative reads and (in-order) write combining, both of which are anathema to the x86 UC and ARM Device "I/O" memory types.

> If Arm has a way to declare a page as "I/O", is that different than the
> "virtual I/O" bit being debated for RISC-V G-stage address translation?

The G-stage "I/O" bit, as I understood it, is meant to allow a hypervisor to map a guest "I/O" page onto main memory but have the guest's accesses be treated with I/O-style strong ordering.  (This is something that neither x86 nor ARM supports.)

But if what is intended is for the guest accesses to be treated completely as I/O accesses, then that doesn't require this special bit.  The G-stage PTE simply specifies the equivalent of x86 UC or ARM Device memory type.  (But if the guest was already specifying UC/Device in its stage 1 PTE because it thinks it is talking to I/O, then everything was fine to begin with.)

Which leads me back to what I understood this G-stage "I/O" bit to be about (and my issues with it).

Greg


John Hauser
 

Greg Favor wrote:
> It seems like this G-stage "I/O" bit is going down a questionable rabbit
> hole that:
> [...]

I may have found the formula to defuse the issue. I think we can
eliminate the need for a "virtual I/O" bit in G-stage page tables by
simply requiring device drivers to act more conservatively. To that
end, I propose inserting some version of the following two paragraphs
into the Unprivileged ISA's section 1.4, "Memory".

First:

    A naturally aligned 256-byte subregion of the address space is
    called a _paragraph_.  The minimum unit of contiguous main memory
    is a complete paragraph.  That is, if any byte within a paragraph
    is main memory, then every byte of the paragraph must be main
    memory; and conversely, if any byte of a paragraph is vacant or
    I/O, then every non-vacant byte of the paragraph must be considered
    I/O.

Second:

    If an I/O device has memory that is accessible in the address
    space, and if any paragraph of that memory has the properties
    that permit the system to label it as main memory, an execution
    environment may choose each such paragraph to be either main
    memory or I/O.  When the same type of I/O device exists in multiple
    systems, portable RISC-V software must assume that device memory
    that is considered main memory in one execution environment may be
    considered I/O in another execution environment, and vice versa.
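
To make the proposed paragraph rule concrete, here is a minimal Python sketch. It is only a model of the wording above, not spec text; the function names and the "main" / "io" / "vacant" labels are made up for illustration.

```python
PARAGRAPH_SIZE = 256  # bytes, from the proposed definition


def paragraph_base(addr):
    """Base address of the naturally aligned 256-byte paragraph holding addr."""
    return addr & ~(PARAGRAPH_SIZE - 1)


def effective_types(kinds):
    """Apply the proposed rule to one paragraph.

    kinds: list of 256 labels ("main", "io" or "vacant"), one per byte,
    describing what each byte could be on its own (hypothetical input).
    Returns the label each byte must be given under the rule.
    """
    if len(kinds) != PARAGRAPH_SIZE:
        raise ValueError("expected exactly one paragraph of labels")
    # Only a paragraph made entirely of main memory may stay main memory.
    if all(k == "main" for k in kinds):
        return list(kinds)
    # Otherwise every non-vacant byte must be considered I/O.
    return ["vacant" if k == "vacant" else "io" for k in kinds]


mixed = ["main"] * 255 + ["io"]
print(hex(paragraph_base(0x1234)))   # base of the paragraph containing 0x1234
print(set(effective_types(mixed)))   # a single I/O byte forces the whole paragraph to I/O
```

The point of the sketch is only that main memory can never share a paragraph with I/O or vacant bytes, so software can reason about ordering at 256-byte granularity.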

It may also be appropriate to add a comment to the FENCE section
reminding that software may not know whether some device memory is
considered main memory or I/O, in which case it will need to fence
conservatively for either possibility.
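
As a rough illustration of what "fence conservatively" would mean, the sketch below models the FENCE predecessor/successor sets (I, O, R, W) in Python. It is a toy model with hypothetical helper names, and it assumes that accesses to a region labelled main memory count as R/W while accesses to a region labelled I/O count as I/O.

```python
def access_class(region, is_write):
    """FENCE class of an access: r/w for main memory, i/o for I/O regions."""
    if region == "main":
        return "w" if is_write else "r"
    if region == "io":
        return "o" if is_write else "i"
    raise ValueError("unknown region type: " + region)


def fence_orders(pred, succ, earlier, later):
    """True if 'fence pred,succ' orders an earlier access before a later one."""
    return earlier in pred and later in succ


# A driver fills a buffer in device memory, then writes a control register.
# The control register is always I/O; the buffer may be either kind.
control_write = access_class("io", is_write=True)

for buffer_region in ("io", "main"):
    buffer_write = access_class(buffer_region, is_write=True)
    narrow = fence_orders("o", "o", buffer_write, control_write)
    conservative = fence_orders("iorw", "iorw", buffer_write, control_write)
    print(buffer_region, "fence o,o:", narrow, "fence iorw,iorw:", conservative)
```

In this model `fence o,o` only covers the case where the buffer is I/O, while `fence iorw,iorw` covers both, which is the conservative behaviour a portable driver would need.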

I welcome comments.

- John Hauser


John Ingalls
 

Here is another avenue to the same goal of eliminating the need for the "virtual I/O" bit in G-stage PTEs.
Rather than changing the guest device driver to add otherwise unnecessary FENCEs to Strongly Ordered IO regions, or to upgrade FENCE [IO],[IO] to RWIO,RWIO,
require the Hypervisor to not break Guest execution (by telling the Guest one thing and doing another)!

Example text:

"When the Hypervisor informs the Guest that a memory region is Main Memory, IO, or Strongly Ordered IO, then it must not remap those addresses to a memory region of a different type (Main Memory, IO, or Strongly Ordered IO), otherwise memory ordering consistency could be lost by the Guest:
  • If the Hypervisor informs the Guest that a memory region is IO but remaps it to a Main Memory region, then a FENCE [IO],[IO] executed by the guest might not order the desired accesses.
  • If the Hypervisor informs the Guest that a memory region is Main Memory but remaps it to an IO region, then a FENCE [RW],[RW] executed by the guest might not order the desired accesses.
  • If the Hypervisor informs the Guest that a memory region is Strongly Ordered but remaps it to a Weakly Ordered region, then only weak memory ordering might be provided."
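
The three cases in the example text could be checked mechanically by a hypervisor. The following Python sketch is one hypothetical way to do that; the region labels and the function name are invented here, and it assumes that plain "io" and "main" both count as weakly ordered.

```python
WEAKLY_ORDERED = {"main", "io"}  # assumption: only "strong_io" is strongly ordered


def remap_hazards(advertised, backing):
    """List the ordering hazards from the example text for one remapping.

    advertised: type the hypervisor told the guest ("main", "io", "strong_io")
    backing:    type of the region the addresses are really mapped to
    """
    hazards = []
    if advertised == "io" and backing == "main":
        hazards.append("guest FENCE [IO],[IO] might not order the accesses")
    if advertised == "main" and backing == "io":
        hazards.append("guest FENCE [RW],[RW] might not order the accesses")
    if advertised == "strong_io" and backing in WEAKLY_ORDERED:
        hazards.append("only weak memory ordering might be provided")
    return hazards


print(remap_hazards("io", "io"))         # same type: nothing to report
print(remap_hazards("io", "main"))       # emulated device memory backed by RAM
print(remap_hazards("strong_io", "io"))  # strong ordering silently lost
```

A same-type mapping reports nothing; each mismatch reports the corresponding bullet from the proposed text.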

-- John



Jonathan Behrens
 

I'd say that "must not" is too strong given that the behavior is still fully specified if that advice is ignored. This seems like the place for a non-normative note (if that) which basically just amounts to "this case hurts, so you might not want to do it".

Jonathan



Paolo Bonzini
 

On 29/05/21 19:24, John Hauser wrote:
> > What sort of device exposes regions of memory in I/O space? When I think of
> > hypervisors emulating devices, all their registers typically *do stuff*
> > when you write to them.
>
> Historically, video cards and network cards definitely had memory
> buffers in what RISC-V would consider I/O space.

Yep, typically some kind of video RAM, or a buffer for outgoing/incoming network packets. Some old SCSI controllers also had a "scripts RAM" to program the DMA engine.

In practice, this shouldn't be an issue because typically there is some kind of "doorbell" register that the driver writes to after filling in the on-board RAM. The doorbell register is where the synchronization happens between the writer (driver) and the reader (device). Likewise, interrupt injection is where the synchronization happens between the device's writes and the driver's reads.
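
A toy Python model of the doorbell pattern, purely as an illustration: the class and method names are hypothetical, and the model simply assumes that the doorbell write is the one point where the emulated device looks at the buffer.

```python
class EmulatedDevice:
    """Toy device: on-board RAM is plain memory, only the doorbell has side effects."""

    def __init__(self, ram_size=64):
        self.ram = bytearray(ram_size)  # stands in for RAM-backed device memory
        self.sent = []                  # packets the device model has consumed

    def write_ram(self, offset, data):
        # Plain store into the buffer: no side effect, nothing observes it yet.
        if offset < 0 or offset + len(data) > len(self.ram):
            raise ValueError("write outside on-board RAM")
        self.ram[offset:offset + len(data)] = data

    def ring_doorbell(self, length):
        # The doorbell write is the synchronization point: only now does the
        # device model read the buffer contents.
        self.sent.append(bytes(self.ram[:length]))


dev = EmulatedDevice()
dev.write_ram(0, b"hel")
dev.write_ram(3, b"lo")    # buffer writes can arrive in pieces
dev.ring_doorbell(5)       # the device consumes the buffer only here
print(dev.sent)
```

Because the device model never reads the buffer except on the doorbell write, the ordering among the buffer writes themselves does not matter, only their ordering with respect to the doorbell.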

All in all, this doesn't seem like an absolute necessity in the first version of the specification. There have been extensions to the S-mode page tables like Svnapot and this example is similarly niche.

Paolo
