
Tariq Kurd
>For TLBs, the important simplification is PMP/PMA aren't <4KiB in granularity, as then existing TLB entries can be used to cache permissions.
Yes - this makes a lot of sense. What about the case where the software updates the PMP entries though? This would then require an sfence.vma to clear the micro-TLBs as the PMP permissions may be out-of-date. The architecture doesn't require this, so can we add this requirement? How is this typically done?
I've found this text now, so please disregard my previous email:
"Hence, when the PMP settings are modified, M-mode software must synchronize the PMP settings with the virtual memory system and any PMP or address-translation caches. This is accomplished by executing an SFENCE.VMA instruction with rs1=x0 and rs2=x0, after the PMP CSRs are written."
Thanks
Tariq
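
For anyone following along, here is a toy Python model of why that spec text matters. It is only a sketch under loud assumptions: the `ToyHart` class, its per-page permission strings, and the method names are all hypothetical stand-ins, not any real hardware or simulator API. It shows a TLB that caches PMP permissions serving a stale answer after a PMP update until the model's equivalent of SFENCE.VMA (rs1=x0, rs2=x0) runs.

```python
# Toy model (not real hardware): a TLB that caches PMP permissions per 4 KiB page.
PAGE = 4096


class ToyHart:
    def __init__(self):
        self.pmp = {}  # page number -> permission string; stands in for the PMP CSRs
        self.tlb = {}  # page number -> cached permission string

    def write_pmp(self, page, perms):
        # Models a write to the PMP CSRs; deliberately leaves the TLB untouched.
        self.pmp[page] = perms

    def sfence_vma(self):
        # Models SFENCE.VMA with rs1=x0 and rs2=x0: drop every cached entry.
        self.tlb.clear()

    def can_write(self, addr):
        page = addr // PAGE
        if page not in self.tlb:
            # TLB miss: consult the PMP and cache the result.
            self.tlb[page] = self.pmp.get(page, "")
        return "w" in self.tlb[page]


hart = ToyHart()
hart.write_pmp(0x80000, "rw")
before = hart.can_write(0x80000 * PAGE)  # fills the TLB with "rw"
hart.write_pmp(0x80000, "r")             # revoke write permission
stale = hart.can_write(0x80000 * PAGE)   # still answered from the stale TLB entry
hart.sfence_vma()
fresh = hart.can_write(0x80000 * PAGE)   # re-checked against the updated PMP
print(before, stale, fresh)              # prints: True True False
```

The middle `True` is the problem case: without the fence, the revoked write permission is still honoured by the cached entry.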
On Tue, 16 Aug 2022 at 00:41, < krste@...> wrote:
>>>>> On Mon, 15 Aug 2022 10:14:59 +0200, Tariq Kurd <tariq.kurd@...> said:
|| In particular, a portion of a misaligned store that passes the PMP check may become visible, even if another portion fails the PMP check
| I had no idea this was in the spec - so I'm glad you added that comment Allen.
| yes - between MMU pages, PMP regions and PMA regions it's all pretty complex.
| In systems with an MMU do people typically also implement the PMP? And if so why?
Yes.
To contain less-privileged (<M-mode) code running on the hart (including implicit
references such as page-table walkers).
M-mode+PMP can provide a monitor that isolates and multiplexes
multiple S-mode stacks, as in Keystone enclave work.
| As the granularity of PMA and PMP regions is implementation defined - I'm wondering if a nice simplification would be to specify them
| both with 64-byte granularity, and 64-byte alignment to match the cache-block size for the CMOs. At least then the PMAs can't cross the
| boundary of a TLB page.
For TLBs, the important simplification is PMP/PMA aren't <4KiB in
granularity, as then existing TLB entries can be used to cache
permissions. Having PMP/PMA granules larger than a page is fine, as
these would only be checked on a TLB miss. If < page, then easiest
solution is to not cache these regions in TLB, forcing a TLB
miss+check on every access, for example. Of course, other alternative
microarch schemes are possible.
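
The granularity rule above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real implementation: the function name `page_is_cacheable` and the representation of regions as half-open `(start, end)` byte ranges are assumptions made up for the example. The idea is that a 4 KiB page can have its PMP/PMA permissions cached in a TLB entry only if no region boundary falls strictly inside the page.

```python
PAGE = 4096


def page_is_cacheable(page_base, regions):
    """Return True if no PMP/PMA region boundary falls strictly inside the page.

    regions: list of (start, end) half-open byte ranges (hypothetical format).
    """
    page_end = page_base + PAGE
    for start, end in regions:
        for boundary in (start, end):
            if page_base < boundary < page_end:
                # Permissions change part-way through the page, so one cached
                # permission set cannot describe every byte in it.
                return False
    return True


# A 2 MiB region followed by a 64-byte region (addresses chosen for illustration).
regions = [(0x8000_0000, 0x8020_0000), (0x8020_0000, 0x8020_0040)]

print(page_is_cacheable(0x8000_0000, regions))  # True: page lies wholly inside the 2 MiB region
print(page_is_cacheable(0x8020_0000, regions))  # False: the 64-byte region ends inside this page
```

A page that fails this check is the "< page" case: the simplest option in the model would be to never install a TLB entry for it, so every access misses and is re-checked.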
Krste
| Tariq
| On Sat, 13 Aug 2022 at 09:02, Allen Baum <allen.baum@...> wrote:
| There are at least 3 potential boundaries: MMU pages, PMP regions, and PMA regions.
| All bytes of an access must be contained within a single PMP region. The operative word there is "access", because a misaligned load
| /store may be (and is typically) split into two separate accesses.
| Ordering of those accesses is not spec'ed, so it's possible to get various exceptions with either the lower or upper part of the load
| /store, (or both).
| When that happens on a store, the trap may occur after either the low or high half has been written. (non-deterministically even, so
| it's a bear to test).
| I don't know if that specific rule applies to PMA's or MMU page crossings,
| but if a misaligned access is split into two (or more, eventually) accesses that don't cross a boundary, then it's moot;
| you treat them individually. That split is hard to avoid.
| But an implementation isn't required to split a misaligned address, and outside of the PMP spec, I don't think that case is mentioned.
| An implementation is free to always trap on a misaligned access and perform it byte-by-byte (while ensuring no interrupt can occur in
| the middle, lest someone see a stale value)
| I believe it is also legal to handle it entirely in HW except when it crosses various boundaries (e.g. cacheline, page, etc), and
| signal a misalign exception if it does.
| Or even signal a misalign exception depending on the phase of the moon (or other non-architectural state).
| Personally, I'd be really happy if we could tighten those rules up a lot.
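
As an illustration of the split behaviour described above, here is a small Python sketch of a misaligned store that is split in two, with each part PMP-checked on its own. Everything in it is an assumption for the sake of the example: the helper names, the 8-byte split boundary, the `(start, end, perms)` region tuples, and the low-to-high ordering (which is only one of the orderings an implementation may choose). Real PMP matching is priority-based; the sketch assumes non-overlapping regions so that detail does not matter.

```python
def split_access(addr, size, boundary=8):
    """Split [addr, addr+size) at the first multiple of `boundary` it crosses."""
    cut = (addr // boundary + 1) * boundary
    if cut >= addr + size:
        return [(addr, size)]  # does not cross a boundary: a single access
    return [(addr, cut - addr), (cut, addr + size - cut)]


def pmp_allows_write(addr, size, regions):
    """All bytes of one access must lie within a single region that grants write."""
    return any(start <= addr and addr + size <= end and "w" in perms
               for start, end, perms in regions)


def misaligned_store(addr, data, regions, memory, boundary=8):
    offset = 0
    for part_addr, part_size in split_access(addr, len(data), boundary):
        # Low-to-high order here; the architecture does not fix the order.
        if not pmp_allows_write(part_addr, part_size, regions):
            return "store access fault"
        for i in range(part_size):
            memory[part_addr + i] = data[offset + i]
        offset += part_size
    return "ok"


regions = [(0x1000, 0x1008, "rw"), (0x1008, 0x1010, "r")]
memory = {}
result = misaligned_store(0x1006, b"\xaa\xbb\xcc\xdd", regions, memory)
print(result)          # store access fault
print(sorted(memory))  # [4102, 4103]: the low half was written before the fault
```

The low two bytes stay visible in `memory` even though the store as a whole faulted, which is the partial-visibility case quoted from the spec at the top of this thread.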
| On Fri, Aug 12, 2022 at 2:28 PM Greg Favor <gfavor@...> wrote:
| It would be nice if it was architecturally defined/permitted for such straddling accesses to be performed a byte at a time.
| That could be ok for accesses to idempotent memory, but would likely be problematic for a non-idempotent location (e.g. a
| memory-mapped I/O register), and byte accesses to a word MMIO register might not even be allowed by the PMAs for that location.
| --
| Tariq Kurd | Lead IP Architect | Codasip UK Design Centre | www.codasip.com