Small tweak to Privileged spec regarding PMP management?


Greg Favor
 

In section 3.6.2 of the Privileged spec discussing changing PMP settings, it currently says:
"If page-based virtual memory is not implemented, or when it is disabled, memory accesses check the PMP settings synchronously, so no fence is needed."

I would like to suggest removing "or when it is disabled" and just say:
"If page-based virtual memory is not implemented, memory accesses check the PMP settings synchronously, so no fence is needed."

The motivation is that high-performance implementations that support page-based virtual memory have TLBs and want to use them to handle all fetch/load/store memory accesses as they go down the fetch and load/store execution pipelines in all modes of execution - including M-mode.  In the case of M-mode, the TLBs would effectively just be caching PMA/PMP permission/access-control info (as well as identity address mappings).

For designs that implement page-based virtual memory and use their TLBs as described (which is generally true of high-performance designs), not requiring M-mode software to execute an sfence.vma after a series of PMP CSR writes means that these writes cannot be implemented as simple CSR writes; instead, each PMP CSR write must also perform a heavyweight sfence.vma operation in hardware.  This is heavily redundant (across a series of PMP writes) and is unnatural for an aggressive out-of-order RISC design, in which an sfence.vma is a very strong fencing operation as well as a TLB invalidation operation.  (Put differently, a key point of RISC architecture is to simplify hardware in ways that software can easily and efficiently support.)

Given that M-mode software runs a lot of implementation-specific code (including code related to PMA and PMP management), this spec tweak allows some implementations to simplify their hardware design and include an sfence.vma in their M-mode PMP CSR writing code (while other implementations can choose not to include an sfence.vma in their M-mode code).  But also note that M-mode software on all designs needs to at least selectively execute an sfence.vma (per section 3.6.2), so in practice the M-mode code would simply always execute an sfence.vma after a series of PMP writes, as sketched below.
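
For concreteness, here is a minimal sketch of such a PMP-writing sequence, assuming GCC-style RISC-V inline assembly and a hart that implements page-based virtual memory.  The region base, size, permissions, and the function name are illustrative only, not from the spec:

    #include <stdint.h>

    /* Program one 64 KiB NAPOT PMP region with R/W/X permissions, then
       issue a single SFENCE.VMA so that any PMA/PMP info cached in the
       TLBs is discarded.  Note that writing all of pmpcfg0 also clears
       the other pmpXcfg fields it holds. */
    void pmp_program_region0(void)
    {
        /* NAPOT encoding: pmpaddr = (base | (size/2 - 1)) >> 2 */
        uintptr_t addr = (0x80000000ul | (0x10000ul / 2 - 1)) >> 2;
        uintptr_t cfg  = 0x1f;          /* A=NAPOT (0b11<<3) | X|W|R */

        asm volatile ("csrw pmpaddr0, %0" :: "r"(addr));
        asm volatile ("csrw pmpcfg0, %0"  :: "r"(cfg));

        /* One fence after the whole series of PMP writes, instead of
           an implicit heavyweight fence inside every PMP CSR write. */
        asm volatile ("sfence.vma" ::: "memory");
    }

The single trailing sfence.vma amortizes the fencing cost across the whole series of PMP writes, which is exactly the simplification the tweak is after.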

Lastly, note that this change is backward compatible in that software that does execute an sfence.vma after PMP changes will still run correctly on "old" designs that support page-based virtual memory yet check PMAs and PMPs inline with load/store execution while in M-mode.

Any objections to this simple accommodation for high-performance CPU designs?

Greg


Allen Baum
 

Do you want to add more detail about the case where page-based virtual memory is implemented but disabled?
    (i.e., that some implementations may require sfence.vma, depending on whether they do XXX with their TLB)?
That would be non-normative, but it would alert designers to this corner case.



Greg Favor
 

One could argue that the current spec and the sentence in question (with or without the suggested modification) are clear in calling out when an sfence.vma is not required.  But I agree that adding a short non-normative note would avoid any chance of misunderstanding.

Greg




Greg Favor
 

Here's the PR for this five-word tweak to the spec (as described above):  https://github.com/riscv/riscv-isa-manual/pull/568




andrew@...
 



The potentially problematic case is M-mode-only code running on a core that supports VM: such code might not know that it needs to execute SFENCE.VMA at all.  (This is not an especially esoteric use case; sometimes people run an RTOS on one of the (symmetric) cores in a multicore complex.)

But you can argue that the M-mode-only code probably needs to be tailored at least somewhat to the system it's running on. And it is possible to handle this case portably, e.g., `if (misa.S) sfence_vma();`.  And if the RTOS has an untrusted piece that runs in U-mode, then the M-mode piece already needs to know about the existence of S-mode for other reasons: it needs to make sure to initialize the satp and m*deleg registers to zero, for example.  The SFENCE.VMA seems no worse.  So I don't object.
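
Spelled out, that portable guard might look like the following sketch.  The helper names are made up for illustration, and note the caveat that misa is allowed to read as zero, in which case software needs some other means of discovering S-mode:

    #include <stdint.h>

    static inline uintptr_t read_misa(void)
    {
        uintptr_t v;
        asm volatile ("csrr %0, misa" : "=r"(v));
        return v;
    }

    /* Issue SFENCE.VMA only if this hart implements S-mode.
       misa bit 18 is the 'S' extension bit ('S' - 'A' == 18). */
    static inline void pmp_fence_if_needed(void)
    {
        if (read_misa() & (1ul << ('S' - 'A')))
            asm volatile ("sfence.vma" ::: "memory");
    }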

