Virtualization of "main memory" and "I/O" regions
The FENCE instruction exposes the difference between accesses to main memory and accesses to I/O devices via its predecessor and successor sets; however, the distinction itself is defined only by the physical memory attributes (PMAs), which designate "main memory" and "I/O" regions. So how does the architecture support virtualization of those regions so that the FENCE instruction behaves appropriately?
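For reference, the predecessor and successor sets are encoded in the instruction itself as combinations of R and W (memory reads/writes) and I and O (device input/output). A rough sketch of how the sets interact with the memory/I/O distinction:

```
fence rw,rw      # orders prior memory reads/writes before later memory reads/writes
fence o,o        # orders prior device (I/O) outputs before later device outputs only
fence iorw,iorw  # orders everything: memory and I/O, reads and writes
```

The question below arises precisely because which of these sets a given store falls into is determined by the PMAs of the physical address, not by what the guest believes about its (virtual) address space.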
Suppose, for example, that a hypervisor virtualizes the memory system for its guest OS, mapping some guest "I/O" regions to hypervisor "main memory" regions. The guest believes that portions of its address space are I/O, and executes the following sequence:
ST [x]   // guest: I/O region; hypervisor: main memory region
ST [y]   // guest: I/O region; hypervisor: I/O region
One could presume that, since the PMA for [x] says it is "main memory," the store to [y] may be performed before the store to [x]. If ST [y] triggers a side effect, such as a DMA read, that is expected to observe the result of ST [x], problems ensue. Is it the hypervisor's responsibility to ensure correct behavior, e.g. by trapping and emulating accesses to [x], or something else? If the former, how can a hypervisor efficiently handle all the possible ordering combinations?
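To make the hazard concrete, here is a hedged sketch in the same pseudo-assembly as above. Since the guest believes both addresses are I/O, it might reasonably separate the stores with an I/O-only fence, which the implementation is free to disregard for [x] under the host PMAs:

```
ST [x]      // guest: I/O; hypervisor: main memory -- a memory write per the PMAs
fence o,o   // orders device outputs only; does not constrain the store to [x]
ST [y]      // guest: I/O; hypervisor: I/O -- a device output per the PMAs
// the hardware may still perform ST [y] before ST [x]
```

From the guest's point of view the fence is correct; it is only the hypervisor's remapping that makes it insufficient.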
In addition, what happens when the "guest" is S-mode and the "hypervisor" is M-mode?
Perhaps there's an implied "don't do that" in the architecture; if so, shouldn't there be, at a minimum, some commentary text to that effect?
Or perhaps there should be a hardware mechanism that "strengthens" fences?
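One hedged sketch of what "strengthening" could mean, assuming a hypothetical hypervisor-controlled mode bit (no such mechanism exists in the current architecture): when virtualization is active, the implementation would treat any fence naming I or O as if the corresponding R/W bits were also set.

```
// with the hypothetical strengthening mode enabled:
fence o,o   // executes as fence ow,ow, so guest "I/O" writes that the
            // hypervisor remapped onto main memory remain ordered
```

This keeps guest code unmodified at the cost of over-ordering fences that genuinely touch only I/O.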