RISC-V Vector Extension post-public review updates - fault flagging


David Horner
 


On 2021-11-17 5:36 p.m., Krste Asanovic wrote:
The primary reason was lack of encoding space for non-unit-stride fault-on-first instructions.
However, we did discuss its merit, and whether it would trump the encoding difficulties; see below.

The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region.  It was noted that this is still present even for unit-stride in supervisor mode when using translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.

Krste


On Nov 17, 2021, at 2:21 PM, Bruce Hoult <bruce@...> wrote:

On Thu, Nov 18, 2021 at 10:33 AM Bill Huffman <huffman@...> wrote:

 

 

From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.

 

Yes, that needs to be considered as well.

 

I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vvm at the end anyway, at the expense of possibly using extra registers.

 

While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.

 

Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.

 

How does this contribute to vectorizing all loops?


Because otherwise you can't safely vectorise loops that do indirect array accesses (e.g. a[b[i]]) with data-dependent control flow.

There are two aspects here:

a) Checking that array indexes are within bounds, which should be done unless there is proof that the indexes are always in bounds.

    We have a viable mechanism for that. The index values are loaded into a vector register in any event.

    Set a mask by comparing the index values with a scalar bound.

b) Handling fault exceptions expeditiously by returning a mask of would-be faults.

    This should be the exceptional case and therefore does not need expediting.

    As Krste says, the OS level can provide a cooperative mechanism to set a corresponding mask when exceptions occur, if it is deemed that the application [or system code] needs it.

    But this feature does not mitigate the out-of-bounds array fetch. Many out-of-bounds locations can be in accessible memory.

Thus there is no need for instructions to have this feature, whether supporting all loop constructs or not.
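To make point (a) concrete, here is a minimal sketch in plain Python that models the idea rather than real RVV code. The function names `bounds_mask` and `masked_gather` are hypothetical; on hardware the first step would be something like an unsigned vector-scalar compare (e.g. vmsltu.vx) writing a mask, and the second a masked indexed load.

```python
def bounds_mask(indices, bound):
    """Return one mask bit per element: True where 0 <= index < bound."""
    return [0 <= i < bound for i in indices]


def masked_gather(a, indices, mask, fill=0):
    """Gather a[indices[k]] only where mask[k] is set.

    Masked-off elements are never dereferenced; they receive `fill`.
    """
    return [a[i] if m else fill for i, m in zip(indices, mask)]


a = [10, 20, 30, 40]
b = [2, 7, 0, -1, 3]           # 7 and -1 are out of bounds for a

mask = bounds_mask(b, len(a))   # compare index vector against a scalar bound
result = masked_gather(a, b, mask)
print(mask)
print(result)
```

With these inputs the mask is [True, False, True, False, True] and the gathered result is [30, 0, 10, 0, 40]. Note that the check catches indexes such as 7 regardless of whether the corresponding address happens to be accessible, which is the point made in (b).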

 I think this was not included for security reasons rather than ignored.


I don't think there is any additional security implication.

I could be wrong, as I'm not an expert on SVE, but I believe that even if the gather operation is done (somewhat) in parallel or in random order, the instruction doesn't actually return a mask indicating all the failed accesses. All mask bits after the first element that was inaccessible are also set to false. The following code will process all the initial elements and then invert the mask and loop back and try to process the tail starting from the first inaccessible element, which will then actually fault if the loop didn't exit or skip that element based on program logic.
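The loop structure described in the previous paragraph can be modelled roughly as follows. This is only a sketch of the behaviour as stated in this message, not a statement of what SVE actually specifies; `first_fault_gather`, `gather_until`, the dictionary-as-memory, and the sentinel-based exit are all assumptions made for illustration.

```python
def first_fault_gather(memory, addrs, active):
    """Model of a first-fault gather as described above (an assumption).

    Returns (values, ok). If the first active element is inaccessible, a
    real fault is raised. If a later element is inaccessible, its ok bit
    and the ok bits of all following elements stay False.
    """
    values = [None] * len(addrs)
    ok = [False] * len(addrs)
    first = True
    for k, (addr, act) in enumerate(zip(addrs, active)):
        if not act:
            continue
        if addr not in memory:
            if first:
                raise MemoryError(f"fault at address {addr}")
            break  # suppress the fault; later mask bits remain False
        values[k] = memory[addr]
        ok[k] = True
        first = False
    return values, ok


def gather_until(memory, addrs, sentinel):
    """Collect gathered values until `sentinel` is seen (data-dependent exit)."""
    out = []
    active = [True] * len(addrs)
    while any(active):
        values, ok = first_fault_gather(memory, addrs, active)
        for k, good in enumerate(ok):
            if good:
                if values[k] == sentinel:
                    return out  # loop exits; later elements are never needed
                out.append(values[k])
        # "Invert the mask": retry only the elements not yet loaded.
        active = [a and not o for a, o in zip(active, ok)]
    return out


memory = {100: 5, 104: 6, 108: 0}
print(gather_until(memory, [100, 104, 108, 999], sentinel=0))
```

In the example the inaccessible address 999 comes after the sentinel, so the loop exits without faulting. If the bad address were reached before the exit condition, the retry pass would start on it and raise a genuine fault.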




Jonathan Behrens <behrensj@...>
 

The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region.  It was noted that this is still present even for unit-stride in supervisor mode when using translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.

Could someone say what these control mechanisms are? In particular, it seems like a VS-mode guest operating system could probe the entire guest physical address space using fault-on-first load without triggering any intervention from HS-mode or M-mode.

Jonathan


Krste Asanovic
 

A paranoid hypervisor could restrict ff loads to always reduce to vl=1, or only after x failed probes, for example.
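As a rough illustration of what such a policy might look like, here is a small Python model. Everything in it is hypothetical: `FFLoadPolicy`, `max_trims`, and the byte-granular `mapped` set are invented for the sketch, and nothing here is claimed to exist in the vector or hypervisor specifications. It relies only on the fact that a fault-only-first load may legitimately return fewer elements than requested, as long as at least one is processed.

```python
class FFLoadPolicy:
    """Hypothetical policy: clamp ff loads to vl=1 after too many trimmed loads."""

    def __init__(self, max_trims):
        self.max_trims = max_trims  # 0 means "always clamp to vl=1"
        self.trims = 0

    def vl_limit(self, requested_vl):
        return 1 if self.trims >= self.max_trims else requested_vl

    def record(self, limit, returned_vl):
        # A load trimmed below its limit revealed an inaccessible address.
        if returned_vl < limit:
            self.trims += 1


def ff_load(mapped, base, requested_vl, policy):
    """Model a unit-stride ff load of one-byte elements; returns the new vl."""
    if base not in mapped:
        raise MemoryError("element 0 is inaccessible: trap taken")
    limit = policy.vl_limit(requested_vl)
    vl = 1
    while vl < limit and (base + vl) in mapped:
        vl += 1
    policy.record(limit, vl)
    return vl


mapped = set(range(0, 8))            # addresses 0..7 are accessible
policy = FFLoadPolicy(max_trims=2)
print(ff_load(mapped, 4, 8, policy))  # trimmed at the end of the region
print(ff_load(mapped, 6, 8, policy))  # trimmed again
print(ff_load(mapped, 0, 8, policy))  # now clamped to a single element
```

Once the clamp is active, a probing load reveals nothing beyond the accessibility of element 0, which an ordinary load would expose anyway by trapping.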

Krste


David Horner
 


On 2021-11-17 7:18 p.m., Jonathan Behrens wrote:
The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region.  It was noted that this is still present even for unit-stride in supervisor mode when using translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.

Could someone say what these control mechanisms are?
Yes, I will below.
In particular, it seems like a VS-mode guest operating system could probe the entire guest physical address space using fault-on-first load without triggering any intervention from HS-mode or M-mode.

It could, depending upon what implementation details are designed into the hart.

Control Mechanisms:

If the first address of the vector load is problematic, whether first fault or not, the instruction will trap.

So, only in the case that an instruction starts on a valid address and reads past the end of that valid range might the instruction not fault.

The hart is allowed to fault even then.

A count of fault-first events could trigger a trap, so any misbehaving applications could be identified and managed.

This appears to be distinctly different from the SVE design.
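The control mechanisms listed above can be sketched as a toy model. This is an illustration under stated assumptions only: the `Hart` class, the `trim_threshold` parameter, and the `ProbeLimitTrap` exception are hypothetical, and the event counter in particular is a proposal from this message, not an existing feature.

```python
class ProbeLimitTrap(Exception):
    """Hypothetical trap raised when too many ff loads have been trimmed."""


class Hart:
    def __init__(self, mapped, trim_threshold):
        self.mapped = mapped                  # set of accessible byte addresses
        self.trim_threshold = trim_threshold  # trimmed loads tolerated
        self.trim_events = 0

    def ff_load(self, base, vl):
        """Unit-stride fault-first load of one-byte elements; returns new vl."""
        # A problematic first address always traps, fault-first or not.
        if base not in self.mapped:
            raise MemoryError("element 0 is inaccessible: trap taken")
        # Only reading past the end of a valid range avoids a fault.
        new_vl = 1
        while new_vl < vl and (base + new_vl) in self.mapped:
            new_vl += 1
        if new_vl < vl:
            self.trim_events += 1
            if self.trim_events > self.trim_threshold:
                raise ProbeLimitTrap("too many fault-first events")
        return new_vl


hart = Hart(mapped=set(range(16)), trim_threshold=1)
print(hart.ff_load(0, 8))    # entirely inside the valid range
print(hart.ff_load(12, 8))   # runs off the end: trimmed, first event
```

A further trimmed load on this hart would exceed the threshold and raise `ProbeLimitTrap`, which is where a higher privilege level could step in and deal with the offending code.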



andrew@...
 



On Wed, Nov 17, 2021 at 4:19 PM Jonathan Behrens <behrensj@...> wrote:
The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region.  It was noted that this is still present even for unit-stride in supervisor mode when using translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.

Could someone say what these control mechanisms are? In particular, it seems like a VS-mode guest operating system could probe the entire guest physical address space using fault-on-first load without triggering any intervention from HS-mode or M-mode.

Perhaps I'm being obtuse, but I'm having trouble understanding why this specific case is a concern: it's within VS-mode's purview to know anything and everything about the guest physical address space.  (The situation is materially different than S vs. U, because those two share a VA space, whereas VS' GPA space is disjoint from HS' VA space.)



Jonathan Behrens <behrensj@...>
 



On Wed, Nov 17, 2021 at 8:24 PM Andrew Waterman via lists.riscv.org <andrew=sifive.com@...> wrote:


On Wed, Nov 17, 2021 at 4:19 PM Jonathan Behrens <behrensj@...> wrote:
The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region.  It was noted that this is still present even for unit-stride in supervisor mode when using translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.

Could someone say what these control mechanisms are? In particular, it seems like a VS-mode guest operating system could probe the entire guest physical address space using fault-on-first load without triggering any intervention from HS-mode or M-mode.

Perhaps I'm being obtuse, but I'm having trouble understanding why this specific case is a concern: it's within VS-mode's purview to know anything and everything about the guest physical address space.  (The situation is materially different than S vs. U, because those two share a VA space, whereas VS' GPA space is disjoint from HS' VA space.)

The physical address space that the hypervisor tells the guest about may not match the one installed in hgatp. For instance, some pages of the guest's memory might be marked copy-on-write or swapped out to disk. Or a particular device may supposedly be mapped into the guest VM, but actually just be an unmapped region so the host can trap-and-emulate any accesses to it. Even today it is possible for a guest VM to indirectly learn that these things might be happening, but directly being able to check whether a page is mapped adds a new level.


andrew@...
 



On Wed, Nov 17, 2021 at 5:41 PM Jonathan Behrens <behrensj@...> wrote:


The physical address space that the hypervisor tells the guest about may not match the one installed in hgatp. For instance, some pages of the guest's memory might be marked copy-on-write or swapped out to disk. Or a particular device may supposedly be mapped into the guest VM, but actually just be an unmapped region so the host can trap-and-emulate any accesses to it. Even today it is possible for a guest VM to indirectly learn that these things might be happening, but directly being able to check whether a page is mapped adds a new level.

Yeah, agreed that detecting paged-out pages is a similar information leak.  (Though I think COW is not relevant here, since we're only talking about load instructions.)


David Horner
 


On 2021-11-17 8:45 p.m., Andrew Waterman wrote:


Yeah, agreed that detecting paged-out pages is a similar information leak.

The VS having this awareness can be very beneficial.

It allows the OS to better manage its resources. It can switch to handling other supervisory actions while that data is paged/staged in.

Nevertheless, the control mechanisms I previously mentioned apply here as well. The VS is in control, it can "leak" or not as it sees fit.

(Though I think COW is not relevant here, since we're only talking about load instructions.)


David Horner
 

That should have been "The HS is in control, it can "leak" or not as it sees fit", obviously.

On 2021-11-17 9:16 p.m., David Horner via lists.riscv.org wrote:


On 2021-11-17 8:45 p.m., Andrew Waterman wrote:


On Wed, Nov 17, 2021 at 5:41 PM Jonathan Behrens <behrensj@...> wrote:


On Wed, Nov 17, 2021 at 8:24 PM Andrew Waterman via lists.riscv.org <andrew=sifive.com@...> wrote:


On Wed, Nov 17, 2021 at 4:19 PM Jonathan Behrens <behrensj@...> wrote:
The security concern was being able to probe addresses to find accessible regions without free of being killed on touching a prohibited region.  It was noted that this is still present even for unit-stride in supervisor mode when using translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels

Could someone say what these control mechanisms are? In particular, it seems like a VS-mode guest operating system could probe the entire guest physical address space using fault-on-first load without triggering any intervention from HS-mode or M-mode.

Perhaps I'm being obtuse, but I'm having trouble understanding why this specific case is a concern: it's within VS-mode's purview to know anything and everything about the guest physical address space.  (The situation is materially different than S vs. U, because those two share a VA space, whereas VS' GPA space is disjoint from HS' VA space.)

The physical address space that the hypervisor tells the guest about may not match the one installed in hgatp. For instance, some pages of the guest's memory might be marked copy-on-write or swapped out to disk. Or a particular device may supposedly be mapped into the guest VM, but actually just be an unmapped region so the host can trap-and-emulate any accesses to it. Even today it is possible for a guest VM to indirectly learn that these things might be happening, but directly being able to check whether a page is mapped adds a new level.

Yeah, agreed that detecting paged-out pages is a similar information leak.

The VS having this awareness can be very beneficial.

It allows the OS to better manage its resources. It can switch to handling other supervisory actions while that data is paged/staged in.

Nevertheless, the control mechanisms I previously mentioned apply here as well.


The HS is in control, it can "leak" or not as it sees fit.


(Though I think COW is not relevant here, since we're only talking about load instructions.)


Jonathan Behrens <behrensj@...>
 

Are the mechanisms you mentioned hypothetical future ISA extensions, or something included in the current vector extension? In particular, I don't see anything about M-mode and/or HS-mode requesting a trap if too many non-faulting fault-first-load instructions are executed which modify vl.

Jonathan


David Horner
 

Trapping when the first element's address is bad is stipulated behaviour.

The others are not specified in the vector extension, but:
1. A counter is part of the generalized performance spec.
2. Always trapping but allowing resume is an implementation option.
3. Dynamically limiting vl to 1 is a hypothetical extension that could have a CSR to manage it.

Obviously these could be defined as part of the privileged spec, and the VM guys probably want to nail them down exactly.

I prefer minimalistic S and M support, with the fewest specs, to allow implementations latitude.
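
To make the three options above concrete, here is a minimal Python sketch of a unit-stride fault-only-first load with each control bolted on. Only the base behaviour (trap if element 0 faults, otherwise truncate vl at the first faulting element) comes from the vector spec. The names `FFMonitor`, `ff_load`, `LoadTrap`, `truncations`, `always_trap` and `vl_cap` are made up for illustration and are not defined anywhere.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class LoadTrap(Exception):
    """The modelled load took a synchronous trap at the given element index."""


@dataclass
class FFMonitor:
    # All three fields are hypothetical controls, not part of any RISC-V spec.
    truncations: int = 0           # option 1: event count a hypervisor could sample
    always_trap: bool = False      # option 2: trap on any faulting element
    vl_cap: Optional[int] = None   # option 3: e.g. 1 forces one element per load


def ff_load(addrs: Sequence[int], vl: int,
            readable: Callable[[int], bool], mon: FFMonitor) -> int:
    """Return the new vl after a fault-only-first load over addrs[0:vl]."""
    if mon.vl_cap is not None:
        vl = min(vl, mon.vl_cap)
    for i in range(vl):
        if readable(addrs[i]):
            continue
        # Element 0 always traps; later elements trap only under option 2.
        if i == 0 or mon.always_trap:
            raise LoadTrap(i)
        # Otherwise the fault is suppressed and vl is truncated to i.
        mon.truncations += 1
        return i
    return vl
```

With `vl_cap=1` every probe of an inaccessible address lands on element 0 and traps, so nothing is learned silently. With the counter alone, the probing still succeeds but leaves a trace that higher-privilege software could act on.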


Bill Huffman
 

So if I set up indexes on a ff gather so that I ensure the first one is legal while the rest access pages I want to probe, how does the higher privilege software know I learned about all those pages?

 

I suppose an implementation could trap on any element if it started with vstart=0 and only complete with zero values loaded if vstart>0.  Then the higher privilege software could cooperate and restart even if the element failed for these instructions with vstart pointed at the failed element.  But I don’t think that was the expected implementation.

 

      Bill

 


David Horner
 


On 2021-11-17 10:33 p.m., Bill Huffman wrote:

> So if I set up indexes on a ff gather so that I ensure the first one is legal while the rest access pages I want to probe, how does the higher privilege software know I learned about all those pages?

First, there is currently no ff gather.

But if there were, vl would need to be truncated at the first element in sequence that faulted.

That would potentially require backtracking by the handler from the element load that faulted first, to each of the earlier loads in the list.

That is a substantial effort, essentially thrown away on the next attempt to discover the page mappings.

This was another reason that ff gather was rejected. It does not play well with the parallel load behaviour that is allowed for loads.
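
A small Python sketch of that backtracking problem, under the assumption of a hypothetical ff gather (no such instruction exists in RVV). The element loads complete in whatever order the memory system returns them, yet vl has to end up at the lowest faulting element index, so anything already loaded past that point is discarded. The function name `ff_gather_vl` and the `(index, faulted)` completion records are illustrative only.

```python
from typing import Iterable, List, Tuple


def ff_gather_vl(completions: Iterable[Tuple[int, bool]],
                 vl: int) -> Tuple[int, List[int]]:
    """completions holds (element_index, faulted) in completion order,
    which need not be element order. Returns (new_vl, discarded_indices)."""
    first_fault = vl
    loaded: List[int] = []
    for index, faulted in completions:
        if faulted:
            # The fault seen first in time is not necessarily first in sequence.
            first_fault = min(first_fault, index)
        else:
            loaded.append(index)
    if first_fault == 0:
        raise RuntimeError("element 0 faulted: the load traps")
    # Elements that loaded fine but sit beyond the truncation point are wasted.
    discarded = sorted(i for i in loaded if i > first_fault)
    return first_fault, discarded
```

For example, if elements complete in the order 5, 3 (fault), 0, 6 (fault), 1, 2, 4, 7, the result is vl=3 with elements 4, 5 and 7 thrown away.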

 

> I suppose an implementation could trap on any element if it started with vstart=0 and only complete with zero values loaded if vstart>0.
>
> Then the higher privilege software could cooperate and restart even if the element failed for these instructions with vstart pointed at the failed element.
>
> But I don’t think that was the expected implementation.

Not sure I follow this.

But note:

If FF gather were implemented, the designer would probably always trap on any fault.

If the OS determines the virtual addresses are legitimate, it could preemptively page in/allocate the requested addresses.

If any of the addresses are illegal/illegitimate, it could certainly mark this application as suspect and escalate its management to whatever security features are enabled.

This is also a legal option for unit-stride fault-only-first loads, but there we have at most 2 pages/regions to deal with.

Indeed, the OS can emulate the behaviour of setting vl to the truncated length. It would still be an optimization even in the guard page scenario.

 


Bruce Hoult
 

On Thu, Nov 18, 2021 at 5:07 PM David Horner <ds2horner@...> wrote:

> But if there were, the vl would need to be truncated to the first in sequence that faulted.
>
> That would potentially require backtracking by the handler from the element load that faulted first, to each of the earlier loads in the list.

Simple implementations could just execute it sequentially, or have the trap handler execute the loads sequentially if any of them fault.

> A substantial effort to essentially be thrown away on the next try to discover the page mappings.

We don't care how slowly malicious code runs.

> This was another reason that ff gather was rejected. It does not play well with the parallel load behaviour that is allowed for loads.

It plays just as well as any gather does, in the absence of faults.

Faulting is very much NOT expected behaviour. You're probably about to terminate the program anyway, or drop into the debugger. The main requirement is that the user can see which iteration of their loop would have failed if the code had been left as scalar instructions instead of auto-vectorised.

> If FF gather were implemented the designer would probably always trap on any fault.
>
> If the OS determines the virtual addresses are legitimate it could preemptively page-in/allocate the requested addresses.
>
> If any of the addresses are illegal/illegitimate it could certainly mark this application suspect and escalate its management to whatever security features are enabled.
>
> This is a legal option for load sequential first fault but we have at most 2 pages/regions to deal with.

One region, but it could be many page table entries, given sufficiently long vector registers -- up to 17 with 65536 bit VLEN and LMUL=8.
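
The figure of 17 can be checked with a few lines of Python. This is a back-of-envelope sketch that assumes 4 KiB pages and a worst-case start address on the last byte of a page. The helper name `max_pages_touched` is made up for illustration.

```python
def max_pages_touched(vlen_bits: int, lmul: int, page_bytes: int = 4096) -> int:
    """Worst-case number of pages spanned by one unit-stride access that
    fills a whole register group (assumes 4 KiB pages by default)."""
    nbytes = vlen_bits * lmul // 8
    # Worst case: the first byte is the last byte of a page.
    worst_offset = page_bytes - 1
    last_byte = worst_offset + nbytes - 1
    return last_byte // page_bytes + 1
```

With VLEN=65536 and LMUL=8 the register group is 65536 bytes, i.e. 16 full pages, and any misalignment pushes it onto a 17th. With a small VLEN such as 128 the same formula gives 2, which matches the "at most 2 pages" case.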