Are pages allowed to cross PMA regions?


Andres Amaya Garcia
 

Hello,

There is something unclear to me after reading the PMA section of the Privileged ISA manual (i.e. Section 3.6). Can a virtual page be mapped to addresses that cross PMA regions? For example, is it acceptable to map a 1GB page such that half its physical addresses have, e.g., the cacheable attribute but the other half are uncacheable? You could ask the same about every attribute: vacant, idempotent, etc.

This sounds odd, but the ISA does not explicitly allow or forbid it. Is it something that must be supported? If so, are there example use cases?

Thanks for the help!


Andy Glew (Gmail) <andyglew@...>
 

I cannot say what the RISC-V rule is, but I can provide example use cases for similar issues from other architectures.

(1) Legacy MMIO map

(2) non-legacy MMIO maps with huge, and ever-larger, pages

(3) device vendors that wish to pack all of their device memory into a compact region

(4) security issues

===

(1) For example, x86 has an extremely fragmented legacy MMIO map below 1 MB: some regions are 4K granular, some 16K, some 64K...; but OS vendors wanted to use a single large page, whether originally 4M/2M or eventually 1G etc., to map it, because big mappings reduced TLB pressure, and in particular because they might also want to access DRAM or ROM in that area efficiently. To deal with this, Intel x86 has the ability to "splinter" large TLB entries (2M/4M/1G/...) into smaller 4K entries, and the ability to indicate that subregions of a large TLB entry are not present. E.g. one might have a 4M TLB entry marked such that only [1M,4M) is valid, and accesses in [0,1M) must look up splintered entries in the 4K TLBs.

This was done because many, if not most or all, Intel x86 implementations cache memory types - the things you get from PMAs - in the TLB.
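
For concreteness, a minimal C sketch of that splintering scheme - a large TLB entry carrying a valid subrange plus a cached memory type. All structure and helper names here are invented for illustration, not taken from any real implementation:

```c
/* Hypothetical sketch of "splintering": a large TLB entry carries a valid
 * subrange and a cached memory type; addresses outside the subrange must
 * fall back to splintered 4K entries.  All names are invented. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t vbase, pbase;       /* base of the large (e.g. 4M) mapping  */
    uint64_t size;               /* mapping size, e.g. 4 MiB             */
    uint64_t valid_lo, valid_hi; /* offsets [valid_lo, valid_hi) covered */
    uint8_t  mem_type;           /* memory type / PMA attrs cached here  */
} large_tlb_entry_t;

/* Returns true and fills *pa / *mt if this entry covers va; otherwise the
 * caller must look up the splintered 4K TLB (not shown). */
static bool lookup_large(const large_tlb_entry_t *e, uint64_t va,
                         uint64_t *pa, uint8_t *mt)
{
    uint64_t off = va - e->vbase;
    if (off >= e->size || off < e->valid_lo || off >= e->valid_hi)
        return false;            /* miss: consult the 4K TLBs instead    */
    *pa = e->pbase + off;
    *mt = e->mem_type;           /* memory type cached in the TLB entry  */
    return true;
}
```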

BTW, more and more I wish that I had not decided to store memory types in the TLB, since there was plenty of time to do an MTRR lookup on a cache miss.

I'm not saying that RISC-V has to do this. I'm just describing a use case.

(1'): if you allow such fragmentation of memory attributes, an implementation may choose to separate the TLBs used for translation from a protection lookaside buffer used for protection and memory attributes - call that a PLB, or perhaps an APLB, an attribute and protection lookaside buffer. TLB entries are quite big, since they require both physical and virtual addresses, whereas an APLB may get away with only a few bits per granule, e.g. per 4K granule, with many such granules sharing the same APLB entry. The implementation can hide the APLB, behaving as if there is nothing except TLBs that the OS needs to manage.
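
A minimal sketch of such an APLB entry, again with invented names, showing why it can be much denser than a TLB entry - a few attribute bits per 4K granule instead of a full virtual+physical tag per granule:

```c
/* Hypothetical APLB entry: one entry covers many 4K granules, spending
 * only a byte of attributes per granule rather than a full tag each.
 * All names and sizes are invented for illustration. */
#include <stdint.h>

#define APLB_GRANULE   4096u
#define APLB_GRANULES  64u            /* one entry spans 64 * 4K = 256K  */

typedef struct {
    uint64_t pbase;                   /* 256K-aligned physical base      */
    uint8_t  attr[APLB_GRANULES];     /* PMA attribute bits per granule  */
} aplb_entry_t;

static uint8_t aplb_attr(const aplb_entry_t *e, uint64_t pa)
{
    return e->attr[(pa - e->pbase) / APLB_GRANULE];
}
```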


(2) Although the RISC-V people may deprecate legacy memory-map issues, the same issue arises even when it's not legacy...

(3) Another use case is less legacy related: I/O device vendors sometimes want to constrain all of the physical memory addresses related to their devices to a single naturally aligned power-of-2 region, but I/O device vendors often have multiple different memory types for a single device. E.g. a GPU might want to have 1 GB or 16 GB of frame buffer memory, mapped something like write-combining, and a far smaller amount of active MMIO memory. E.g. given a base address B which is a multiple of a gigabyte, the I/O device vendor might want [B,B+1G-16K) mapped write-combining, optimized for frame buffer, and [B+1G-16K,B+1G) mapped non-idempotent uncacheable.
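
Worked out in C, assuming a GiB-aligned base address B (the value used here is made up):

```c
/* The frame-buffer / MMIO split from the GPU example above. */
#include <stdint.h>

#define GiB        (1ull << 30)
#define KiB        (1ull << 10)
#define B          (4 * GiB)          /* assumed GiB-aligned base        */

#define WC_BASE    B                  /* [B, B+1G-16K): write-combining  */
#define WC_SIZE    (GiB - 16 * KiB)   /* frame buffer                    */
#define MMIO_BASE  (B + WC_SIZE)      /* [B+1G-16K, B+1G): uncacheable,  */
#define MMIO_SIZE  (16 * KiB)         /* non-idempotent MMIO registers   */
```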

There is much less need for this nowadays, since PCI now allows I/O devices to declare a list of their memory requirements, e.g. 1G WC and 16K UC in the example above. PCI then allows the physical addresses associated with the I/O device to be changed, so that the WC memory from this device and others is nicely aligned, as is the MMIO UC. However, not everybody likes the idea of physical addresses being able to change. Moreover, bus bridges between different physical address widths may prefer not to waste physical address ranges.

(4) If you wish to legislate that virtual memory translations cannot cross PMA boundaries, the question is how you enforce it.

If the operating system or hypervisor that controls the virtual memory translations is the most privileged software in the system, you can probably do this, risking mainly accidental bugs.

However, quite a few secure systems have privilege domains that are more privileged than the operating system or hypervisor, but which do not want to manage the virtual memory translations. Rather, they want to allow the operating system or hypervisor to control the page tables as much as possible, for performance reasons. But then there is a correctness problem if the operating system or hypervisor has allowed a large-page translation to cross PMA boundaries: it must at least be trapped, and possibly emulated if it is to be transparent.



Greg Favor
 


Can a virtual page be mapped to addresses that cross PMA regions? For example, is it acceptable to map a 1GB page such that half its physical addresses have, e.g., the cacheable attribute but the other half are uncacheable? You could ask the same about every attribute: vacant, idempotent, etc.

This sounds odd, but the ISA does not explicitly allow or forbid it. Is it something that must be supported? If so, are there example use cases?

The PMA architecture allows a lot of implementation flexibility - including for example having small 4B regions.  In that example one could easily have one 4KB page overlap multiple PMA regions.

Conversely, in a typical OS-A class system using demand-paged virtual memory, the implementor will probably choose to have a minimum 4KB granularity to PMA regions.  Although this still allows 2MB, 1GB, and 512GB pages to overlap multiple PMA regions.  (Which in typical TLB implementations leads to what some would call "atomization" of page mappings into smaller TLB entry mappings.)

In short, if a page overlaps multiple regions, then that needs to be handled properly.  Typically any given load/store/ifetch/implicit access that is being checked will fall in one page and in one PMA region - in which case the behavior is obvious.  But if that access straddles multiple pages and/or PMA regions, then each byte of the access must pass its MMU and PMA checks for the whole access to be allowed.
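
A minimal sketch of that per-byte rule in C, with check_mmu / check_pma / translate as assumed stand-in helpers (always-pass stubs here), not anything defined by the spec:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the real per-byte checks; trivially permissive stubs. */
static bool check_mmu(uint64_t va, int acc) { (void)va; (void)acc; return true; }
static bool check_pma(uint64_t pa, int acc) { (void)pa; (void)acc; return true; }
static uint64_t translate(uint64_t va)      { return va; /* identity map */ }

/* Every byte of a straddling access must pass both the MMU and the PMA
 * checks, or the whole access is disallowed. */
static bool access_allowed(uint64_t va, unsigned len, int acc)
{
    for (unsigned i = 0; i < len; i++)
        if (!check_mmu(va + i, acc) || !check_pma(translate(va + i), acc))
            return false;
    return true;
}
```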


Krste Asanovic
 

On Fri, 12 Aug 2022 10:35:15 -0700, "Greg Favor" <gfavor@...> said:
| Can a virtual page be mapped to addresses that cross PMA regions? For example, is it acceptable to map a 1GB page such that half its physical addresses have, e.g., the cacheable
| attribute but the other half are uncacheable? You could ask the same about every attribute: vacant, idempotent, etc.

| This sounds odd, but the ISA does not explicitly allow or forbid it. Is it something that must be supported? If so, are there example use cases?

| The PMA architecture allows a lot of implementation flexibility - including for example having small 4B regions.  In that example one could easily have one 4KB page overlap multiple
| PMA regions.

| Conversely, in a typical OS-A class system using demand-paged virtual memory, the implementor will probably choose to have a minimum 4KB granularity to PMA regions.  Although this
| still allows 2MB, 1GB, and 512GB pages to overlap multiple PMA regions.  (Which in typical TLB implementations leads to what some would call "atomization" of page mappings into
| smaller TLB entry mappings.)

Even in a RISC-V OS-A platform, the implementor might be stuck with
using IP peripherals where PMAs vary at the sub-page granularity.

| In short, if a page overlaps multiple regions, then that needs to be handled properly.  Typically any given load/store/ifetch/implicit access that is being checked will fall in one
| page and in one PMA region - in which case the behavior is obvious.  But if that access straddles multiple pages and/or PMA regions, then each byte of the access must pass its MMU
| and PMA checks for the whole access to be allowed.

Yes.

We have some text for this in some places, but these concepts should
really be factored out somewhere central.

Krste



Andy Glew (Gmail) <andyglew@...>
 

| But if that access straddles multiple pages and/or PMA regions, then each byte of the access must pass its MMU and PMA checks for the whole access to be allowed.

It would be nice if it were architecturally defined/permitted for such straddling accesses to be performed a byte at a time. That makes the trap-and-emulate handler easier to code.

If not a byte at a time, then whatever is the largest possible NAPOT size that the access can be decomposed into.

But anything coarser-grained than a byte - or whatever the finest granule of PMA is - either requires the trap-and-emulate handler to probe permissions, to guarantee that the transactions it emits are not themselves straddling, or you have to be ready to handle such trap-and-emulations nested. Or at least tail-recursive.
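
A sketch of that NAPOT decomposition - purely illustrative, not from any spec:

```c
/* Decompose a misaligned access into the largest NAPOT (naturally
 * aligned power-of-two) chunks, so each emitted transaction is itself
 * naturally aligned. */
#include <stdint.h>
#include <stdio.h>

static uint64_t napot_chunk(uint64_t addr, uint64_t remaining)
{
    /* Largest power of two respecting both the current alignment of
     * addr and the remaining length. */
    uint64_t align = addr ? (addr & -addr) : remaining;
    uint64_t chunk = 1;
    while ((chunk << 1) <= remaining && (chunk << 1) <= align)
        chunk <<= 1;
    return chunk;
}

int main(void)
{
    uint64_t addr = 0xFFE, len = 8;   /* a misaligned 8-byte access      */
    while (len) {
        uint64_t c = napot_chunk(addr, len);
        printf("emit %llu-byte access at 0x%llx\n",
               (unsigned long long)c, (unsigned long long)addr);
        addr += c;
        len  -= c;
    }
    return 0;   /* emits: 2 bytes @ 0xFFE, 4 @ 0x1000, 2 @ 0x1004 */
}
```

A handler built this way never emits a transaction that straddles a boundary of its own size, which sidesteps the nesting problem for the emitted pieces (though not the permission-probing question).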





Greg Favor
 

It would be nice if it were architecturally defined/permitted for such straddling accesses to be performed a byte at a time.

That could be ok for accesses to idempotent memory, but would likely be problematic for a non-idempotent location (e.g. a memory-mapped I/O register), and byte accesses to a word MMIO register might not even be allowed by the PMAs for that location.


Allen Baum
 

There are at least 3 potential boundaries: MMU pages, PMP regions, and PMA regions.
All bytes of an access must be contained within a single PMP region. The operative word there is "access", because a misaligned load/store may be (and typically is) split into two separate accesses.
Ordering of those accesses is not spec'ed, so it's possible to get various exceptions with either the lower or upper part of the load/store (or both).
When that happens on a store, the trap may occur after either the low or high half has been written (non-deterministically even, so it's a bear to test).

I don't know if that specific rule applies to PMAs or MMU page crossings,
but if a misaligned access is split into two (or more, eventually) accesses that don't cross a boundary, then it's moot;
you treat them individually. That split is hard to avoid.

But an implementation isn't required to split a misaligned access, and outside of the PMP spec, I don't think that case is mentioned.
An implementation is free to always trap on a misaligned access and perform it byte-by-byte (while ensuring no interrupt can occur in the middle, lest someone see a stale value).
I believe it is also legal to handle it entirely in HW except when it crosses various boundaries (e.g. cacheline, page, etc.), and signal a misaligned exception if it does.
Or even signal a misaligned exception depending on the phase of the moon (or other non-architectural state).

Personally, I'd be really happy if we could tighten those rules up a lot.



Tariq Kurd
 

>In particular, a portion of a misaligned store that passes the PMP check may become visible, even if another portion fails the PMP check

I had no idea this was in the spec - so I'm glad you added that comment, Allen.

Yes - between MMU pages, PMP regions and PMA regions it's all pretty complex.

In systems with an MMU, do people typically also implement the PMP? And if so, why?

As the granularity of PMA and PMP regions is implementation-defined, I'm wondering if a nice simplification would be to specify them both with 64-byte granularity and 64-byte alignment, to match the cache-block size for the CMOs. At least then the PMAs can't cross the boundary of a TLB page.
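
For reference, the existing pmpaddr NAPOT encoding already expresses a 64-byte region directly: pmpaddr holds address bits [XLEN-1:2], and k trailing 1s select a 2^(k+3)-byte region, so 64 bytes needs k=3. A small sketch, assuming a size-aligned base:

```c
#include <stdint.h>

/* Encode a NAPOT pmpaddr value; base must be size-aligned and size a
 * power of two >= 8 bytes. */
static uint64_t pmp_napot_addr(uint64_t base, uint64_t size)
{
    return (base >> 2) | ((size >> 3) - 1);
}
/* e.g. pmp_napot_addr(0x80001000, 64) == (0x80001000 >> 2) | 0x7 */
```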

Tariq








Krste Asanovic
 

On Mon, 15 Aug 2022 10:14:59 +0200, Tariq Kurd <tariq.kurd@...> said:
|| In particular, a portion of a misaligned store that passes the PMP check may become visible, even if another portion fails the PMP check
| I had no idea this was in the spec - so I'm glad you added that comment Allen.
| yes - between MMU pages, PMP regions and PMA regions it's all pretty complex.
| In systems with an MMU do people typically also implement the PMP? And if so why?

Yes.

To contain less-than-M-mode code running on the hart (including
implicit references such as page-table walkers).

M-mode+PMP can provide a monitor that isolates and multiplexes
multiple S-mode stacks, as in Keystone enclave work.

| As the granularity of PMA and PMP regions are implementation defined - I'm wondering if a nice simplification would be to specify them
| both with 64-byte granularity, and 64-byte alignment to match the cache-block size for the CMOs. At least then the PMAs can't cross the
| boundary of a TLB page.

For TLBs, the important simplification is that PMP/PMA granularity
isn't <4KiB, as then existing TLB entries can be used to cache
permissions. Having PMP/PMA granules larger than a page is fine, as
these would only be checked on a TLB miss. If <page, then the easiest
solution is to not cache these regions in the TLB, forcing a TLB
miss+check on every access, for example. Of course, other alternative
microarch schemes are possible.
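
A sketch of that fill-time policy, with all helper names hypothetical (the real ones would consult the PMA map, the page tables, and the PMP registers):

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical stand-ins. */
static uint64_t pma_granule_size(uint64_t pa)          { (void)pa; return 4096; }
static uint32_t merged_perms(uint64_t va, uint64_t pa) { (void)va; (void)pa; return 0x7; }
static void     tlb_fill(uint64_t va, uint64_t pa, uint32_t perms)
                { (void)va; (void)pa; (void)perms; }

/* Only cache a translation (with PMP/PMA folded into its permissions)
 * when the covering granule is at least page-sized; otherwise refuse to
 * cache it, forcing a TLB miss+check on every access. */
static bool maybe_cache_translation(uint64_t va, uint64_t pa)
{
    if (pma_granule_size(pa) < PAGE_SIZE)
        return false;
    tlb_fill(va, pa, merged_perms(va, pa));
    return true;
}
```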

Krste



Tariq Kurd
 

>For TLBs, the important simplification is PMP/PMA aren't <4KiB in
>granularity, as then existing TLB entries can be used to cache
>permissions.

Yes - this makes a lot of sense. What about the case where software updates the PMP entries, though? This would then require an SFENCE.VMA to clear the micro-TLBs, as the PMP permissions may be out of date.
The architecture doesn't require this, so can we add this requirement? How is this typically done?

Tariq







Tariq Kurd
 

>For TLBs, the important simplification is PMP/PMA aren't <4KiB in
>granularity, as then existing TLB entries can be used to cache
>permissions.

Yes - this makes a lot of sense. What about the case where software updates the PMP entries, though? This would then require an SFENCE.VMA to clear the micro-TLBs, as the PMP permissions may be out of date.
The architecture doesn't require this, so can we add this requirement? How is this typically done?

I've found this text now, so please disregard my previous email:

"Hence, when the PMP settings are modified, M-mode software must synchronize the PMP settings with the virtual memory system and any PMP or address-translation caches. This is accomplished by executing an SFENCE.VMA instruction with rs1=x0 and rs2=x0, after the PMP CSRs are written."

Thanks

Tariq





Andres Amaya Garcia
 

Thank you all for the valuable input!

In summary, it is possible to have virtual memory pages that straddle multiple PMA and PMP regions. There are simplifications or implementation decisions that can be made to deal with this situation: limiting PMA/PMP regions to be >= 4KB, caching attributes in the TLB, separate attribute caches, etc.

However, the rules regarding some of these cases (misaligned accesses across regions, straddling, etc.) appear to be rather loose in RISC-V (see Allen Baum's message). Is there any ongoing work or plan to revisit the subject and perhaps clarify some of it in the Privileged Specification? If not, is it worth tracking this somewhere, perhaps by creating a GitHub issue? (P.S. I am new to the RISC-V community, so I don't know how to go about it if there is interest in the subject.)

Once again, thanks for the help!