Re: RISC-V Vector Extension post-public review updates - fault flagging
On Wed, Nov 17, 2021 at 4:19 PM Jonathan Behrens < behrensj@...> wrote:
The security concern was being able to probe addresses to find
accessible regions without fear of being killed on touching a
prohibited region. It was noted that this is still present even
for unit-stride in supervisor mode when using translation to
arbitrarily probe supervisor physical space. However, I believe
these security concerns are manageable through control mechanisms
at higher privilege levels
Could someone say what these control mechanisms are? In particular, it seems like a VS-mode guest operating system could probe the entire guest physical address space using fault-on-first load without triggering any intervention from HS-mode or M-mode.
Perhaps I'm being obtuse, but I'm having trouble understanding why this specific case is a concern: it's within VS-mode's purview to know anything and everything about the guest physical address space. (The situation is materially different than S vs. U, because those two share a VA space, whereas VS' GPA space is disjoint from HS' VA space.)
On 2021-11-17 5:36 p.m., Krste Asanovic wrote:
The primary reason was lack of encoding space for
non-unit-stride fault-on-first instructions.
However, we did discuss its merit and whether it would trump the
encoding difficulties; see below.
The security concern was being able to probe addresses to find
accessible regions without fear of being killed on touching a
prohibited region. It was noted that this is still present even
for unit-stride in supervisor mode when using translation to
arbitrarily probe supervisor physical space. However, I believe
these security concerns are manageable through control mechanisms
at higher privilege levels.
Krste
On Nov 17, 2021, at 2:21 PM, Bruce Hoult <bruce@...> wrote:
On Thu, Nov 18, 2021 at 10:33 AM Bill Huffman <huffman@...> wrote:
From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
Yes, that needs to be considered as well.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vvm at the end anyway, at the expense of possibly using extra registers.
While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.
How does this contribute to vectorizing all loops?
Because otherwise you can't safely vectorise loops that do indirect array accesses (e.g. a[b[i]]) with data-dependent control flow.
There are two aspects here:
a) Checking that array indexes are within bounds, which, absent proof that the indexes are always in bounds, should be done.
We have a viable mechanism for that. The index values are loaded into a vector register in any event.
Set a mask from the compare of the index values with a scalar bound.
b) Handling fault exceptions expeditiously by returning a mask of would-be faults.
This should be the exceptional case and therefore does not need expediting.
As Krste says, the OS level can provide a cooperative mechanism to set a corresponding mask when exceptions occur, if it is deemed that the application [or system code] needs it.
But this feature does not mitigate the out-of-bounds array fetch. Many out-of-bounds locations can be in accessible memory.
Thus there is no need for instructions to have this feature, whether supporting all loop constructs or not.
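The bounds check in point (a) can be sketched with a scalar Python model. This is only an illustration of the idea, not RVV code: `masked_gather` and `fill` are hypothetical names, and indexes are assumed to be compared as non-negative values against a scalar `bound`.

```python
def masked_gather(a, idx, bound, fill=0):
    """Model of a masked indexed load: out[i] = a[idx[i]] where idx[i] is in bounds."""
    # Form the mask by comparing every index with the scalar bound.
    mask = [0 <= j < bound for j in idx]
    # Only active elements touch memory; inactive elements get the fill value.
    out = [a[j] if m else fill for j, m in zip(idx, mask)]
    return out, mask


a = [10, 20, 30, 40]
out, mask = masked_gather(a, [3, 0, 7, 2], bound=len(a))
print(out, mask)
```

The out-of-range index 7 is never dereferenced, so no fault can occur for it, and the mask records which elements were skipped.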
I think this was not included for security reasons rather than ignored.
I don't think there is any additional security implication.
I could be wrong, as I'm not an expert on SVE, but I believe that even if the gather operation is done (somewhat) in parallel or in random order, the instruction doesn't actually return a mask indicating all the failed accesses. All mask bits after the first element that was inaccessible are also set to false. The following code will process all the initial elements and then invert the mask and loop back and try to process the tail starting from the first inaccessible element, which will then actually fault if the loop didn't exit or skip that element based on program logic.
Re: RISC-V Vector Extension post-public review updates - fault flagging
On 2021-11-17 7:18 p.m., Jonathan Behrens wrote:
The security concern was being able to probe addresses to find
accessible regions without fear of being killed on touching a
prohibited region. It was noted that this is still present
even for unit-stride in supervisor mode when using translation
to arbitrarily probe supervisor physical space. However, I
believe these security concerns are manageable through control
mechanisms at higher privilege levels
Could someone say what these control mechanisms are?
Yes, I will below.
In particular, it seems like a VS-mode guest operating
system could probe the entire guest physical address space
using fault-on-first load without triggering any intervention
from HS-mode or M-mode.
It could, depending on what implementation details are designed
into the hart.
Control Mechanisms:
If the first address of the vector load is problematic, whether
fault-on-first or not, the instruction will trap.
So only when an instruction starts on a valid address and reads
past the end of that valid range might the instruction not fault.
The hart is allowed to fault even then.
A count of fault-first events could trigger a trap, so any
misbehaving application could be identified and managed.
This appears to be distinctly different from the SVE design.
Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
An earlier Intel Larrabee design had a variant that required looping around an unsuccessful gather according to the mask bits. I believe some folks on this list were responsible for that...
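A rough Python model of that style of mask-driven gather loop is given here. It is a sketch under assumptions, not a description of the actual Larrabee instruction: `gather_step`, `gather_loop` and `budget` are hypothetical, with `budget` standing in for however many elements the hardware happens to complete per issue.

```python
def gather_step(mem, idx, pending, dst, budget=2):
    """One partial gather: complete up to `budget` pending elements, clearing their mask bits."""
    done = 0
    for i, is_pending in enumerate(pending):
        if is_pending and done < budget:
            dst[i] = mem[idx[i]]
            pending[i] = False      # a cleared mask bit means "element delivered"
            done += 1
    return done


def gather_loop(mem, idx):
    """Software loop that reissues the gather until no mask bit remains set."""
    dst = [None] * len(idx)
    pending = [True] * len(idx)
    steps = 0
    while any(pending):
        gather_step(mem, idx, pending, dst)
        steps += 1
    return dst, steps


mem = [5, 6, 7, 8, 9]
result, steps = gather_loop(mem, [4, 0, 2, 2, 1])
print(result, steps)
```

The point of the model is that forward progress is tracked entirely in the mask, so the loop body can be reissued without redoing completed elements.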
On Nov 17, 2021, at 4:36 PM, Bruce Hoult < bruce@...> wrote:
At one point I thought that in the case of a gather load the FFR could return an arbitrary mask. But reading the documentation again today I think it's constrained to a (possibly empty) run of 1s followed by a (possibly empty) run of 0s, so yes even in the gather load case simply reducing vlen would do the trick.
You are of course expecting that in correct code either there will be no faulting addresses, or else something in the program logic will cause the loop to exit or skip the bad address before looping back and faulting on retrying the bad address.
On Thu, Nov 18, 2021 at 12:43 PM Krste Asanovic <krste@...> wrote:
SVE uses a special dedicated FFR register to hold these first-faulting load mask bits.
RVV just reuses the vector length register in a natural way.
Krste
On Nov 17, 2021, at 3:33 PM, Bill Huffman <huffman@...> wrote:
Yes, for safely vectorizing loops with indirect references and data dependent control flow. I didn’t see how the mask result did that. Having it be zero for all elements after the one that fails (and presumably the data is zero as well) and having the right way of using that to retry the whole loop makes sense. That just wasn’t clear from the first description.
Bill
The primary reason was lack of encoding space for non-unit-stride fault-on-first instructions.
The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region. It was noted that this is still present even for unit-stride in supervisor mode when using translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.
Re: RISC-V Vector Extension post-public review updates

Bruce Hoult
At one point I thought that in the case of a gather load the FFR could return an arbitrary mask. But reading the documentation again today I think it's constrained to a (possibly empty) run of 1s followed by a (possibly empty) run of 0s, so yes even in the gather load case simply reducing vlen would do the trick.
You are of course expecting that in correct code either there will be no faulting addresses, or else something in the program logic will cause the loop to exit or skip the bad address before looping back and faulting on retrying the bad address.
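If the mask really is constrained to a run of 1s followed by a run of 0s, it carries no more information than a length. A small Python sketch of that equivalence follows; `prefix_length` is a hypothetical helper, and the mask is modelled as a plain list of 0/1 values rather than any real register format.

```python
def prefix_length(ffr):
    """Return the number of leading 1s, checking that the mask is 1s followed by 0s."""
    n = 0
    while n < len(ffr) and ffr[n]:
        n += 1
    # Any 1 after the first 0 would mean the mask is not a simple prefix.
    if any(ffr[n:]):
        raise ValueError("mask is not a run of 1s followed by a run of 0s")
    return n


print(prefix_length([1, 1, 1, 0, 0]))
```

Under that constraint, a reduced vector length expresses the same result as the mask, which is why truncating the length is enough even for gathers.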
Re: RISC-V Vector Extension post-public review updates - fault flagging

Krste Asanovic
A paranoid hypervisor could restrict ff loads to always reduce to vl=1, or only after x failed probes, for example.
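One way to picture such a policy is the Python sketch below. It is purely illustrative: `ProbeLimiter` and its methods are hypothetical, no existing hypervisor interface is implied, and it assumes the higher privilege level can observe the requested and returned vector lengths of each ff load.

```python
class ProbeLimiter:
    """Toy policy: after `max_failed_probes` truncated ff loads, clamp ff loads to vl=1."""

    def __init__(self, max_failed_probes):
        self.max_failed_probes = max_failed_probes
        self.failed = 0

    def allowed_vl(self, requested_vl):
        # With vl=1 every inaccessible address is element 0, so it traps
        # instead of being silently skipped.
        if self.failed >= self.max_failed_probes:
            return 1
        return requested_vl

    def record(self, requested_vl, returned_vl):
        # A shortened vl is treated as a failed probe. This is conservative,
        # since a load may also be truncated for reasons other than a fault.
        if returned_vl < requested_vl:
            self.failed += 1


limiter = ProbeLimiter(max_failed_probes=2)
limiter.record(8, 3)   # truncated: counts as a failed probe
limiter.record(8, 8)   # full length: not counted
print(limiter.allowed_vl(8))
```

Setting `max_failed_probes=0` gives the strictest variant mentioned above, where ff loads always reduce to vl=1.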
Re: RISC-V Vector Extension post-public review updates - fault flagging
Jonathan Behrens <behrensj@...>
The security concern was being able to probe addresses to find
accessible regions without fear of being killed on touching a
prohibited region. It was noted that this is still present even
for unit-stride in supervisor mode when using translation to
arbitrarily probe supervisor physical space. However, I believe
these security concerns are manageable through control mechanisms
at higher privilege levels
Could someone say what these control mechanisms are? In particular, it seems like a VS-mode guest operating system could probe the entire guest physical address space using fault-on-first load without triggering any intervention from HS-mode or M-mode.
Jonathan
Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
SVE uses a special dedicated FFR register to hold these first-faulting load mask bits.
RVV just reuses the vector length register in a natural way.
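A minimal Python model of that behaviour, for a unit-stride load, is sketched here. It is a simplification under stated assumptions: memory is a dict keyed by address, an address missing from the dict is "inaccessible", and `load_ff`, `find_zero` and `Fault` are hypothetical names rather than anything defined by the specification.

```python
class Fault(Exception):
    """Stands in for the trap taken on an inaccessible address."""


def load_ff(mem, base, vl):
    """Fault-only-first unit-stride load: trap only if element 0 is inaccessible."""
    if base not in mem:
        raise Fault(base)
    data = []
    for i in range(vl):
        if base + i not in mem:
            break                   # later element: shorten vl instead of trapping
        data.append(mem[base + i])
    return data, len(data)          # len(data) is the new, possibly reduced, vl


def find_zero(mem, base, vlmax=4):
    """strlen-style search that may read ahead of the terminator."""
    pos = base
    while True:
        data, vl = load_ff(mem, pos, vlmax)
        if 0 in data:
            return pos + data.index(0) - base
        pos += vl                   # continue from the first element not loaded


mem = {addr: val for addr, val in enumerate([7, 7, 7, 7, 7, 0], start=100)}
print(find_zero(mem, 100))
```

The second load here starts two elements before the end of accessible memory and simply returns a shorter length. If no terminator existed, the retry would start on the inaccessible address and trap, matching the expectation that correct code exits the loop before that happens.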
Re: RISC-V Vector Extension post-public review updates - fault flagging
On 2021-11-17 5:36 p.m., Krste Asanovic
wrote:
The primary reason was lack of encoding space for
non-unit-stride fault-on-first instructions.
However, we did discuss its merit; if it would trump the encoding
dificulties, see below -
The security concern was being able to probe addresses to find
accessible regions without free of being killed on touching a
prohibited region. It was noted that this is still present even
for unit-stride in supervisor mode when using translation to
arbitrarily probe supervisor physical space. However, I believe
these security concerns are manageable through control mechanisms
at higher privilege levels.
Krste
On Nov 17, 2021, at 2:21 PM, Bruce Hoult <bruce@...> wrote:
On Thu, Nov 18, 2021 at 10:33 AM Bill Huffman <huffman@...> wrote:
From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
Yes, that needs to be considered as well.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vmm at the end anyway, at the expense of possibly using extra registers.
While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.
How does this contribute to vectorizing all loops?
Because otherwise you can't safely vectorise loops that do indirect array accesses (e.g. a[b[i]]) with data-dependent control flow.
There are two aspects here:
a) Checking that array indexes are within bounds, which should be done unless there is proof that the indexes are always in bounds.
We have a viable mechanism for that. The index values are loaded into a vector register in any event. Set a mask on the compare of the index values with a scalar bound.
b) Handling fault exceptions expeditiously by returning a mask of would-be faults.
This should be the exception case and therefore does not need expediting.
As Krste says, the OS level can provide a cooperative mechanism to set a corresponding mask when exceptions occur, if it is deemed the application [or system code] needs it.
But this feature does not mitigate the out-of-bounds array fetch. Many out-of-bounds locations can be in accessible memory.
Thus there is no need for instructions to have this feature, whether supporting all loop constructs or not.
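To make point (a) concrete, here is a minimal scalar sketch in Python of the "compare the indexes against a scalar bound, then gather under that mask" idea. It is only a model: the function names `bounds_mask` and `masked_gather` and the zero fill value are assumptions made for illustration, not anything from the RVV specification. In real vector code the compare would be a vector-scalar compare producing a mask, and the gather would be a masked indexed load.

```python
def bounds_mask(indices, bound):
    """Hypothetical helper: one mask bit per element, True where 0 <= index < bound."""
    return [0 <= i < bound for i in indices]


def masked_gather(a, indices, mask, fill=0):
    """Hypothetical helper: gather a[indices[k]] for active elements only.

    Inactive elements never touch memory and receive `fill` instead
    (the fill value is an assumption of this sketch).
    """
    return [a[i] if m else fill for i, m in zip(indices, mask)]


a = [10, 20, 30, 40]
b = [2, 7, 0, -1, 3]            # two of these indexes are out of bounds

mask = bounds_mask(b, len(a))   # compare index values with a scalar bound
gathered = masked_gather(a, b, mask)
```

Note that this only guards against indexes outside the array; as said above, an out-of-bounds index can still point at accessible memory, so a fault mask would not have caught it.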
I think this was not included for security reasons rather than ignored.
I don't think there is any additional security implication.
I could be wrong, as I'm not an expert on SVE, but I believe that even if the gather operation is done (somewhat) in parallel or in random order, the instruction doesn't actually return a mask indicating all the failed accesses. All mask bits after the first element that was inaccessible are also set to false. The following code will process all the initial elements and then invert the mask and loop back and try to process the tail starting from the first inaccessible element, which will then actually fault if the loop didn't exit or skip that element based on program logic.
|
|
Re: RISC-V Vector Extension post-public review updates
Yes, for safely vectorizing loops with indirect references and data dependent control flow. I didn’t see how the mask result did that. Having it be zero for all elements after the one that fails (and presumably the data is zero as well) and having the right way of using that to retry the whole loop makes sense. That just wasn’t clear from the first description.
Bill
From: Krste Asanovic <krste@...>
Sent: Wednesday, November 17, 2021 5:36 PM
To: Bruce Hoult <bruce@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
The primary reason was lack of encoding space for non-unit-stride fault-on-first instructions.
The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region. It was noted that this is still present even for unit-stride in supervisor mode when using translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.
On Nov 17, 2021, at 2:21 PM, Bruce Hoult <bruce@...> wrote:
On Thu, Nov 18, 2021 at 10:33 AM Bill Huffman <huffman@...> wrote:
From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
Yes, that needs to be considered as well.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vmm at the end anyway, at the expense of possibly using extra registers.
While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.
How does this contribute to vectorizing all loops?
Because otherwise you can't safely vectorise loops that do indirect array accesses (e.g. a[b[i]]) with data-dependent control flow.
I think this was not included for security reasons rather than ignored.
I don't think there is any additional security implication.
I could be wrong, as I'm not an expert on SVE, but I believe that even if the gather operation is done (somewhat) in parallel or in random order, the instruction doesn't actually return a mask indicating all the failed accesses. All mask bits after the first element that was inaccessible are also set to false. The following code will process all the initial elements and then invert the mask and loop back and try to process the tail starting from the first inaccessible element, which will then actually fault if the loop didn't exit or skip that element based on program logic.
|
|
Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
The primary reason was lack of encoding space for non-unit-stride fault-on-first instructions.
The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region. It was noted that this is still present even for unit-stride in supervisor mode when using translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.
On Nov 17, 2021, at 2:21 PM, Bruce Hoult < bruce@...> wrote:
On Thu, Nov 18, 2021 at 10:33 AM Bill Huffman <huffman@...> wrote:
From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
Yes, that needs to be considered as well.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vmm at the end anyway, at the expense of possibly using
extra registers.
While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version
of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.
How does this contribute to vectorizing all loops?
Because otherwise you can't safely vectorise loops that do indirect array accesses (e.g. a[b[i]]) with data-dependent control flow.
I think this was not included for security reasons rather than ignored.
I don't think there is any additional security implication.
I could be wrong, as I'm not an expert on SVE, but I believe that even if the gather operation is done (somewhat) in parallel or in random order, the instruction doesn't actually return a mask indicating all the failed accesses. All mask bits after the first element that was inaccessible are also set to false. The following code will process all the initial elements and then invert the mask and loop back and try to process the tail starting from the first inaccessible element, which will then actually fault if the loop didn't exit or skip that element based on program logic.
|
|
Re: RISC-V Vector Extension post-public review updates

Bruce Hoult
On Thu, Nov 18, 2021 at 10:33 AM Bill Huffman <huffman@...> wrote:
From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not
listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
Yes, that needs to be considered as well.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vmm at the end anyway, at the expense of possibly using
extra registers.
While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version
of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.
How does this contribute to vectorizing all loops?
Because otherwise you can't safely vectorise loops that do indirect array accesses (e.g. a[b[i]]) with data-dependent control flow.
I think this was not included for security reasons rather than ignored.
I don't think there is any additional security implication.
I could be wrong, as I'm not an expert on SVE, but I believe that even if the gather operation is done (somewhat) in parallel or in random order, the instruction doesn't actually return a mask indicating all the failed accesses. All mask bits after the first element that was inaccessible are also set to false. The following code will process all the initial elements and then invert the mask and loop back and try to process the tail starting from the first inaccessible element, which will then actually fault if the loop didn't exit or skip that element based on program logic.
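The behaviour described in the paragraph above can be modelled in a few lines of Python. This is a sketch of the idea as stated in this thread, not of any real instruction: `first_fault_gather`, `gather_until`, the dictionary standing in for accessible memory, and the use of `MemoryError` as the "real" fault are all assumptions made for illustration. The model assumes that the first active element faults for real if it is inaccessible, and that any later inaccessible element just clears its own mask bit and every bit after it.

```python
def first_fault_gather(memory, addrs, active):
    """Model of a first-fault gather (hypothetical, for illustration).

    `memory` is a dict of accessible addresses. Returns (data, ok), where
    ok[k] is True only for active elements before the first inaccessible one.
    """
    data = [0] * len(addrs)
    ok = [False] * len(addrs)
    seen_active = False
    for k, addr in enumerate(addrs):
        if not active[k]:
            continue
        if addr not in memory:
            if not seen_active:
                # The first active element is inaccessible: this is a real fault.
                raise MemoryError(f"fault at element {k}, address {addr}")
            break  # later elements are simply reported as not loaded
        seen_active = True
        data[k] = memory[addr]
        ok[k] = True
    return data, ok


def gather_until(memory, addrs, stop_value):
    """Collect gathered values until `stop_value` is seen (data-dependent exit)."""
    out = []
    pending = [True] * len(addrs)
    while any(pending):
        data, ok = first_fault_gather(memory, addrs, pending)
        for k in range(len(addrs)):
            if not ok[k]:
                continue
            if data[k] == stop_value:
                return out          # loop exits before touching the tail
            out.append(data[k])
            pending[k] = False
        # Anything still pending is retried; its first element now faults for real.
    return out


memory = {0: 5, 8: 6, 16: 99}
safe = gather_until(memory, [0, 8, 16, 1000], stop_value=99)
```

In the example, address 1000 is inaccessible, but the loop exits on the stop value first, so no fault is taken. If the program logic does need the inaccessible element, the retry makes it the first active element and the fault is delivered, which is why the mask never reveals more than the position of the first failure.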
|
|
Re: RISC-V Vector Extension post-public review updates
On 2021-11-17 4:32 p.m., Bill Huffman wrote:
From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
Yes, that needs to be considered as well.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vmm at the end anyway, at the expense of possibly using extra registers.
While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.
How does this contribute to vectorizing all loops?
I am curious as well.
It makes sense when the whole vector is participating and masking is the only means to limit processing, but we have vlen.
I think this was not included for security reasons rather than ignored.
Specifically, no First Fault variant was included so that a single instruction could not capture large swaths of the memory map information.
Of course, not faulting but merely flagging would be even worse.
On Thu, Nov 18, 2021 at 7:48 AM Krste Asanovic <krste@...> wrote:
On Nov 17, 2021, at 10:43 AM, Bill Huffman <huffman@...> wrote:
My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.
Does that mean continuing to assume fusing a mask move with an instruction where desired?
Not necessarily with a larger mask register specifier. For example, with 3b mask register specifier, we could expand to encode v0-v6 as mask sources with 111 meaning unmasked.
|
|
Re: RISC-V Vector Extension post-public review updates
From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not
listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
Yes, that needs to be considered as well.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vmm at the end anyway, at the expense of possibly using
extra registers.
While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version
of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.
How does this contribute to vectorizing all loops?
I think this was not included for security reasons rather than ignored.
Bill
On Thu, Nov 18, 2021 at 7:48 AM Krste Asanovic < krste@...> wrote:
On Nov 17, 2021, at 10:43 AM, Bill Huffman <huffman@...> wrote:
My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.
Does that mean continuing to assume fusing a mask move with an instruction where desired?
Not necessarily with a larger mask register specifier. For example, with 3b mask register specifier, we could expand to encode v0-v6 as mask sources with 111 meaning unmasked.
|
|
Re: RISC-V Vector Extension post-public review updates

Bruce Hoult
Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vmm at the end anyway, at the expense of possibly using extra registers.
While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.
On Thu, Nov 18, 2021 at 7:48 AM Krste Asanovic <krste@...> wrote:
On Nov 17, 2021, at 10:43 AM, Bill Huffman <huffman@...> wrote:
My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.
Does that mean continuing to assume fusing a mask move with an instruction where desired?
Not necessarily with a larger mask register specifier. For example, with 3b mask register specifier, we could expand to encode v0-v6 as mask sources with 111 meaning unmasked.
Krste
|
|
Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
On Nov 17, 2021, at 10:43 AM, Bill Huffman <huffman@...> wrote:
My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.
Does that mean continuing to assume fusing a mask move with an instruction where desired?
Not necessarily with a larger mask register specifier. For example, with 3b mask register specifier, we could expand to encode v0-v6 as mask sources with 111 meaning unmasked.
Krste
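For readers following along, the 3-bit mask specifier suggested above could be decoded roughly as in the Python sketch below. The field is only a suggestion in this thread, so the function name `decode_mask_specifier` and the choice to return `None` for "unmasked" are assumptions of this sketch. For contrast, `decode_vm_bit` models the single `vm` bit of the current 32-bit encoding, where 0 selects masking by v0 and 1 means unmasked.

```python
UNMASKED = 0b111  # per the suggestion above: 111 means "no mask"


def decode_mask_specifier(field):
    """Decode a hypothetical 3-bit mask specifier.

    0b000..0b110 name v0..v6 as the mask source; 0b111 means unmasked,
    which this sketch represents as None.
    """
    if not 0 <= field <= 0b111:
        raise ValueError("mask specifier must fit in 3 bits")
    if field == UNMASKED:
        return None
    return f"v{field}"


def decode_vm_bit(vm):
    """Decode the 1-bit vm field of the 32-bit encoding: 0 = masked by v0, 1 = unmasked."""
    if vm not in (0, 1):
        raise ValueError("vm is a single bit")
    return "v0" if vm == 0 else None


sources = [decode_mask_specifier(f) for f in range(8)]
```

The point of the sketch is that seven mask sources plus "unmasked" fit in three bits, so no separate mask move is needed as long as the mask lives in v0 through v6.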
|
|
Re: RISC-V Vector Extension post-public review updates
-----Original Message-----
From: krste@... <krste@...>
Sent: Wednesday, November 17, 2021 1:02 PM
To: Bill Huffman <huffman@...>
Cc: Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: RE: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
EXTERNAL MAIL
>>>>> On Tue, 16 Nov 2021 17:15:28 +0000, Bill Huffman <huffman@...> said:
| From: Grigorios Magklis <grigorios.magklis@...>
| Sent: Tuesday, November 16, 2021 12:03 PM
| What is the thinking for when we go to >32-bit encodings with respect
| to vtype and masks? I assume that the longer encoding could encode SEW
| (and LMUL?) as an override of vtype. What about masks though? If we
| enable more than one masks (m0…mN) in 48-bit/64-bit encodings, and we
| want to mix 32-bit and 48-bit /64-bit instructions in the same code,
| do we still specify that e.g. m0==v0 or do we need to explicitly copy
| v0 to e.g. m0 before it can be used with 48-bit/ 64-bit instructions
| (and vice versa when switching from 48-bit/64-bit instructions to
| 32-bit instructions)? It would be nice if we could reclaim v0
| (actually v0 through v7 for LMUL=8) from being a mask to being able to
| hold data, *and* not to have to force the whole code/loop body to use 48-bit/64-bit instructions in order to do this.
| Grigorios
My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.
Does that mean continuing to assume fusing a mask move with an instruction where desired?
| I don’t think there’s any agreement at this point on what goes into a
| longer instruction, but there are a number of candidates, including at least:
| ● LMUL
| ● SEW
| ● VMA and VTA bits
| ● Register specifier for the mask register
| ● Additional registers – perhaps 128 instead of 32
| ● Possibly a fourth register specifier (not counting mask).
| If I’m counting correctly, that’s already 28 additional bits. That’s
| in the range of the maximum that can be put into a 64-bit instruction
| set. There are probably more candidates and discussion about which
| ones to include will certainly be needed.
😊
Right, even 64 bits will seem tight if all wishes are considered.
Some more experience with actual code and compilers is needed to help tune future extensions.
Agreed.
Bill
Krste
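The message does not say how the 28 bits were counted. The Python tally below is one guess that happens to reach the same total, and every width in it is an assumption of this sketch: 3 bits each for LMUL and SEW and 1 bit each for VMA and VTA (their widths in vtype), 7-bit specifiers for a 128-entry register file, and a mask specifier that can name any of those registers.

```python
# All widths below are assumptions made for illustration; the thread gives only the total.
CANDIDATE_BITS = {
    "LMUL": 3,                                   # same width as vlmul in vtype
    "SEW": 3,                                    # same width as vsew in vtype
    "VMA and VTA": 2,                            # one bit each
    "mask register specifier": 7,                # assumes any of 128 registers
    "widen three specifiers from 5 to 7 bits": 3 * (7 - 5),
    "fourth register specifier": 7,              # assumes 128 registers
}

total_bits = sum(CANDIDATE_BITS.values())

for name, bits in CANDIDATE_BITS.items():
    print(f"{name:45s} {bits:2d}")
print(f"{'total':45s} {total_bits:2d}")
```

Other splits are possible (for example, reusing the existing mask bit would save one), so this should be read as a plausibility check on the figure rather than as the author's breakdown.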
|
|
Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
On Tue, 16 Nov 2021 17:15:28 +0000, Bill Huffman <huffman@...> said:
| From: Grigorios Magklis <grigorios.magklis@...>
| Sent: Tuesday, November 16, 2021 12:03 PM
| What is the thinking for when we go to >32-bit encodings with respect to vtype
| and masks? I assume that the longer encoding could encode SEW (and LMUL?) as
| an override of vtype. What about masks though? If we enable more than one
| masks (m0…mN) in 48-bit/64-bit encodings, and we want to mix 32-bit and 48-bit
| /64-bit instructions in the same code, do we still specify that e.g. m0==v0 or
| do we need to explicitly copy v0 to e.g. m0 before it can be used with 48-bit/
| 64-bit instructions (and vice versa when switching from 48-bit/64-bit
| instructions to 32-bit instructions)? It would be nice if we could reclaim v0
| (actually v0 through v7 for LMUL=8) from being a mask to being able to hold
| data, *and* not to have to force the whole code/loop body to use 48-bit/64-bit
| instructions in order to do this.
| Grigorios
My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.
| I don’t think there’s any agreement at this point on what goes into a longer
| instruction, but there are a number of candidates, including at least:
| ● LMUL
| ● SEW
| ● VMA and VTA bits
| ● Register specifier for the mask register
| ● Additional registers – perhaps 128 instead of 32
| ● Possibly a fourth register specifier (not counting mask).
| If I’m counting correctly, that’s already 28 additional bits. That’s in the
| range of the maximum that can be put into a 64-bit instruction set. There are
| probably more candidates and discussion about which ones to include will
| certainly be needed. 😊
Right, even 64 bits will seem tight if all wishes are considered.
Some more experience with actual code and compilers is needed to help tune future extensions.
Krste
|
|
Re: RISC-V Vector Extension post-public review updates - 32bit opcode decision
On 2021-11-16 12:15 p.m., Bill Huffman wrote:
On Nov 16, 2021, at 17:31, Bill Huffman <huffman@...> wrote:
-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: Tuesday, November 16, 2021 11:13 AM
To: ghost <ghost@...>
Cc: krste@...; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
EXTERNAL MAIL
>>>>> On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@...> said:
|| 1) Mandate all implementations raise an illegal exception in this
|| case. This is my preferred route, as this would be a minor errata
|| for existing implementations (doesn't affect software), and we would
|| not reuse this state/encoding for other purposes.
||
|| 2) Allow either correct execution or illegal exception (as with
|| misaligned).
||
|| 3) Consider "reserved", implying implementations that support it are
|| non-conforming unless we later go with 2).
||
|| I'm assuming we're going to push to ratify 1) unless I hear strong
|| objections.
| I agree that #1 is the least unfortunate of the alternatives, but I
| want to raise a flag because I think there are larger considerations.
| AFAIK, the vector extensions are unique among proposed non-privileged
| extensions in their extensive functional dependency on machine state
| other than the instruction.
Yes, absolutely. Many vector models historically have been co-processors with their own internal status.
RVV integration is also a major accomplishment.
The task group had a strong consensus
I was a part of that. However, a consensus within a TG does not make a justification nor provide a rationale.
The ARC has been tasked with that kind of architectural decision, and to date they have been silent.
We can infer that silence from the ARC is consent. [A motivation for me to speak up.]
in retaining a 32-bit encoding for the vector extension, which led to the separate control state.
The desire to stick with 32-bit encoding was not only to avoid adding a new instruction length,
Not that we should minimize the impact of a new instruction length on additional ratification issues, tool chain, alignment issues and parceling, not to mention decode complexities/cost, about which some on the ARC are hyperventilating.
but also to reduce static and dynamic code size.
Agreed. >32-bit instructions come with a substantial cost.
Usage patterns are paramount to making this decision.
The current understanding is that typical target applications will readily amortize vtype settings over multiple operations.
Explicitly providing element length information in the load/store reduces the transitions in many use cases.
It should be noted that fixed-instruction-width RISC vector architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to accommodate vector encodings, with similar concerns about intermediate control state
The TG has considered "transient" config settings in vtype to eliminate the need to explicitly flip-flop between vtype states.
It remains a post-v1.0 "feature", with the design retaining vtype as the sole state location for its information.
(variable-length ISAs just have very long vector instruction encoding).
Yet RISC-V ostensibly has variable-length encoding.
With obvious bias, I believe the RISC-V solution is cleaner than these others in this regard.
As do I, especially in encapsulating most persistent control [vs. data] information in vtype.
Where the design can be faulted is in not saving vcsr in vtype to minimize context switch concerns.
vstart is essentially transient information that well-behaved applications should ignore.
However, a common opportunity to context switch is when waiting for resource ad be part of context switch information.
| Avoiding this kind of dependency seems to have been a consistent and
| important goal (one of many, of course) in previous designs.
| For example, including a rounding mode in every floating point
| instruction, even the FMA group, multiplied the number of code points
| for these instructions by 8, even though it is not clear (at least to
| me) how important the use cases are. (IMO this might tend to support
| ds2horner's proposal to use 48- or 64-bit instructions for some of the
| vector capability, but that is off topic for the present discussion;
I am obviously making this concern a new thread.
Basically, I am hoping these points will be the salient ones for a response to the Public Review question I raised.
| and I can see a counter-argument that using machine state simplifies
| pipelining setup that might depend on that state.)
A longer 64-bit encoding was always planned for the vector extension as it is clear that the set of desired instruction types could not fit in 32 bits.
vtype is extensible, another of the reasons that this design is superb.
For example, data-type overriding to substitute for relevant integer ops complex float allowing it and real float to coexist through a section of code.
The main simplification from using the separate control state was in avoiding the longer instruction width, not in pipelining, which it actually complicates.
I think the concern might be unprivileged instructions depending on unprivileged state, which is much less common. I think the vector situation is different than, for example, round mode. The difference for vectors is that the added state is used for every vector instruction. It’s part of executing vectors that the state is set. A restart point is required to have strided or indexed memory operations and an MMU. A length is required if we wish to avoid special code to handle vector lengths that are not a multiple of the hardware lengths. We can’t avoid some of this state even with 48-/64-bit instructions. We would probably avoid SEW and LMUL with longer vector instructions, but since length has to be set for all vector instructions in some way, setting SEW and LMUL isn’t as big an issue as setting round mode for floating-point operations.
+1
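The role of the length state mentioned above is easy to see in a strip-mined loop. The Python sketch below is a scalar model only: `set_vl` stands in for a vector-length-setting instruction and simply returns min(remaining, VLMAX), which is a simplification of what real hardware is permitted to return, and `VLMAX = 4` and the function names are assumptions chosen for illustration.

```python
VLMAX = 4  # assumed number of elements per vector register group for this sketch


def set_vl(avl, vlmax=VLMAX):
    """Simplified model of setting the vector length from the remaining element count."""
    return min(avl, vlmax)


def vector_add(a, b, vlmax=VLMAX):
    """Add two equal-length lists strip by strip; returns (result, strip lengths)."""
    assert len(a) == len(b)
    out, strips = [], []
    i, n = 0, len(a)
    while i < n:
        vl = set_vl(n - i, vlmax)            # length state set once per strip
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        strips.append(vl)
        i += vl
    return out, strips


result, strips = vector_add(list(range(10)), [1] * 10)
```

With ten elements and VLMAX of four, the last strip runs with a length of two, so no separate scalar tail loop is needed. Since the length has to be set each strip anyway, setting SEW and LMUL at the same point adds little.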
What is the thinking for when we go to >32-bit encodings with respect to vtype and masks? I assume that the longer encoding could encode SEW (and LMUL?) as an override of vtype. What about masks though? If we enable more than one masks (m0…mN) in 48-bit/64-bit encodings, and we want to mix 32-bit and 48-bit/64-bit instructions in the same code, do we still specify that e.g. m0==v0 or do we need to explicitly copy v0 to e.g. m0 before it can be used with 48-bit/64-bit instructions (and vice versa when switching from 48-bit/64-bit instructions to 32-bit instructions)?
The salient point of coexistence is probably why we will expand within 32-bit opcode space for the foreseeable future.
It would be nice if we could reclaim v0 (actually v0 through v7 for LMUL=8) from being a mask to being able to hold data,
The mask designation could be in vtype while still using 32-bit instruction encoding.
*and* not to have to force the whole code/loop body to use 48-bit/64-bit instructions in order to do this.
Grigorios
I don’t think there’s any agreement at this point on what goes into a longer instruction, but there are a number of candidates, including at least:
- LMUL
- SEW
- VMA and VTA bits
- Register specifier for the mask register
- Additional registers – perhaps 128 instead of 32
Additional register designations 64 or 128 are the most likely motivator to >32bit instr.
However, I can imagine a windowing mode in which unaligned register in different LMUL>1 map above the base 32 registers.
Even without modifying vtype this is possible, and with vtype complex windowing is possible.
- Possibly a fourth register specifier (not counting mask).
If I’m counting correctly, that’s already 28 additional bits. That’s in the range of the maximum that can be put into a 64-bit instruction set. There are probably more candidates and discussion about which ones to include will certainly be needed. 😊
Bill
For me, the most compelling justification for using 32-bit opcodes is the intentional design to provide vector functionality to minimal systems.
The design is not just for the supercomputers; the vision is that such an integrated vector feature can be used to auto-vectorize standard code logic.
To be amenable to the lowest of the low.
It is this accomplishment above all others for which I am most appreciative of the TG.
Thank you all.
|
|
Re: RISC-V Vector Extension post-public review updates
From: Grigorios Magklis <grigorios.magklis@...>
Sent: Tuesday, November 16, 2021 12:03 PM
To: Bill Huffman <huffman@...>; Krste Asanovic <krste@...>; ghost <ghost@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
On Nov 16, 2021, at 17:31, Bill Huffman <huffman@...> wrote:
-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: Tuesday, November 16, 2021 11:13 AM
To: ghost <ghost@...>
Cc: krste@...; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
EXTERNAL MAIL
>>>>> On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@...> said:
|| 1) Mandate all implementations raise an illegal exception in this
|| case. This is my preferred route, as this would be a minor errata
|| for existing implementations (doesn't affect software), and we would
|| not reuse this state/encoding for other purposes.
||
|| 2) Allow either correct execution or illegal exception (as with
|| misaligned).
||
|| 3) Consider "reserved", implying implementations that support it are
|| non-conforming unless we later go with 2).
||
|| I'm assuming we're going to push to ratify 1) unless I hear strong
|| objections.
| I agree that #1 is the least unfortunate of the alternatives, but I
| want to raise a flag because I think there are larger considerations.
| AFAIK, the vector extensions are unique among proposed non-privileged
| extensions in their extensive functional dependency on machine state
| other than the instruction.
The task group had a strong consensus in retaining a 32-bit encoding for the vector extension, which led to the separate control state.
The desire to stick with 32-bit encoding was not only to avoid adding a new instruction length, but also to reduce static and dynamic code size. It should be noted that fixed-instruction-width RISC vector architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to accommodate vector encodings, with similar concerns about intermediate control state (variable-length ISAs just have very long vector instruction encoding). With obvious bias, I believe the RISC-V solution is cleaner than these others in this regard.
| Avoiding this kind of dependency seems to have been a consistent and
| important goal (one of many, of course) in previous designs.
| For example, including a rounding mode in every floating point
| instruction, even the FMA group, multiplied the number of code points
| for these instructions by 8, even though it is not clear (at least to
| me) how important the use cases are. (IMO this might tend to support
| ds2horner's proposal to use 48- or 64-bit instructions for some of the
| vector capability, but that is off topic for the present discussion;
| and I can see a counter-argument that using machine state simplifies
| pipelining setup that might depend on that state.)
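For context on the factor of 8 mentioned above: scalar floating-point arithmetic instructions carry a 3-bit rm field in bits 14:12 of the instruction word, so each operation occupies 2^3 = 8 code points. A minimal Python sketch of that field follows; the helper name `decode_rm` is hypothetical and exists only for illustration.

```python
# Rounding-mode (rm) field of RISC-V scalar floating-point instructions.
# The field is 3 bits wide and sits in bits 14:12 of the 32-bit word.
RM_NAMES = {
    0b000: "RNE",  # round to nearest, ties to even
    0b001: "RTZ",  # round towards zero
    0b010: "RDN",  # round down
    0b011: "RUP",  # round up
    0b100: "RMM",  # round to nearest, ties to max magnitude
    0b111: "DYN",  # use the dynamic rounding mode held in frm
}


def decode_rm(insn: int) -> str:
    """Return the rounding-mode name in bits 14:12 (hypothetical helper)."""
    rm = (insn >> 12) & 0b111
    return RM_NAMES.get(rm, "reserved")


# A 3-bit field means every such operation uses this many encodings.
CODE_POINTS_PER_OP = 1 << 3
```

The vector extension avoids spending instruction bits this way by keeping SEW, LMUL and related settings in vtype, which is the trade-off being discussed here.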
A longer 64-bit encoding was always planned for the vector extension, as it is clear that the set of desired instruction types could not fit in 32 bits. The main simplification from using the separate control state was in avoiding the longer instruction width, not in pipelining, which it actually complicates.
I think the concern might be unprivileged instructions depending on unprivileged state, which is much less common. I think the vector situation is different from, for example, the rounding mode. The difference for vectors is that the added state is used by every vector instruction. It’s part of executing vectors that the state is set. A restart point is required in order to have strided or indexed memory operations and an MMU. A length is required if we wish to avoid special code to handle vector lengths that are not a multiple of the hardware lengths. We can’t avoid some of this state even with 48-/64-bit instructions. We would probably avoid SEW and LMUL with longer vector instructions, but since length has to be set for all vector instructions in some way, setting SEW and LMUL isn’t as big an issue as setting the rounding mode for floating-point operations.
Bill
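To illustrate the point about a length being required: below is a Python sketch of a strip-mined loop, modelling a vsetvli-style request in which each pass handles vl = min(remaining, VLMAX) elements, with VLMAX = LMUL * VLEN / SEW. The function names and the plain min() rule are assumptions for illustration (the spec also permits other vl choices when the requested length lies between VLMAX and 2*VLMAX). The point is only that the final, shorter strip needs no separate tail code.

```python
from fractions import Fraction


def vlmax(vlen_bits: int, sew_bits: int, lmul: Fraction) -> int:
    """Maximum elements per vector register group: LMUL * VLEN / SEW."""
    return int(lmul * vlen_bits / sew_bits)


def set_vl(avl: int, vlmax_elems: int) -> int:
    """Simplest choice of vl for a requested length (assumed rule)."""
    return min(avl, vlmax_elems)


def vector_add(a, b, vlen_bits=128, sew_bits=32, lmul=Fraction(1)):
    """Add two equal-length lists in strips of vl elements."""
    n = len(a)
    limit = vlmax(vlen_bits, sew_bits, lmul)
    out = []
    i = 0
    while i < n:
        vl = set_vl(n - i, limit)  # the last strip may be shorter
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out
```

With the default parameters VLMAX is 4, so a 10-element input is processed as strips of 4, 4 and 2.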
What is the thinking for when we go to >32-bit encodings with respect to vtype and masks? I assume that the longer encoding could encode SEW (and LMUL?) as an override of vtype. What about masks though? If we enable more than one mask (m0…mN) in 48-bit/64-bit encodings, and we want to mix 32-bit and 48-bit/64-bit instructions in the same code, do we still specify that e.g. m0==v0, or do we need to explicitly copy v0 to e.g. m0 before it can be used with 48-bit/64-bit instructions (and vice versa when switching from 48-bit/64-bit instructions to 32-bit instructions)? It would be nice if we could reclaim v0 (actually v0 through v7 for LMUL=8) from being a mask to being able to hold data, *and* not have to force the whole code/loop body to use 48-bit/64-bit instructions in order to do this.
Grigorios
I don’t think there’s any agreement at this point on what goes into a longer instruction, but there are a number of candidates, including at least:
- LMUL
- SEW
- VMA and VTA bits
- Register specifier for the mask register
- Additional registers – perhaps 128 instead of 32
- Possibly a fourth register specifier (not counting mask).
If I’m counting correctly, that’s already 28 additional bits. That’s in the range of the maximum that can be put into a 64-bit instruction encoding. There are probably more candidates, and discussion about which ones to include will certainly be needed. 😊
Bill
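One way to arrive at the 28-bit figure, as a sketch only: the per-item widths below are assumptions, not something stated in the thread. They take 3 bits each for LMUL and SEW (as in vtype), 1 bit each for VMA and VTA, and 7-bit register specifiers for a 128-entry register file.

```python
# Hypothetical bit budget for a longer vector encoding.
# All widths are assumptions chosen to match the vtype field sizes
# and a register file grown from 32 to 128 entries.
REG_BITS_32 = 5    # specifier width for 32 registers
REG_BITS_128 = 7   # specifier width for 128 registers

extra_bits = {
    "LMUL": 3,
    "SEW": 3,
    "VMA and VTA": 2,
    "mask register specifier": REG_BITS_128,
    "widen three existing specifiers": 3 * (REG_BITS_128 - REG_BITS_32),
    "fourth register specifier": REG_BITS_128,
}

total_extra = sum(extra_bits.values())
print(total_extra)  # 28
```

Other accountings are possible (for example, a 5-bit mask specifier), so this is only one breakdown that matches the total.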
| Because of this dependency, it seems to me that the current issue
| creates a currently rare, and undesirable, situation where an illegal
| exception trap depends on a significantly complex interaction between
| an instruction and the machine state. Just something to bear in mind
| for the future.
In some cases, the trap is only dependent on the instruction bits (e.g., vfwadd.wv). In others, it depends on two bits of vtype plus the instruction bits.
Of course, actual hardware implementations have many cases where behavior of unprivileged instructions depends on control state settings in privileged layers in much more complex ways.
Krste
| --
| L Peter Deutsch <ghost@...> :: Aladdin Enterprises ::
| Healdsburg, CA
| Was your vote really counted? http://www.verifiedvoting.org
Re: RISC-V Vector Extension post-public review updates
On Nov 16, 2021, at 17:31, Bill Huffman <huffman@...> wrote:
-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: Tuesday, November 16, 2021 11:13 AM
To: ghost <ghost@...>
Cc: krste@...; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
>>>>> On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@...> said:
|| 1) Mandate all implementations raise an illegal exception in this
|| case. This is my preferred route, as this would be a minor errata
|| for existing implementations (doesn't affect software), and we would
|| not reuse this state/encoding for other purposes.
||
|| 2) Allow either correct execution or illegal exception (as with
|| misaligned).
||
|| 3) Consider "reserved", implying implementations that support it are
|| non-conforming unless we later go with 2).
||
|| I'm assuming we're going to push to ratify 1) unless I hear strong
|| objections.
| I agree that #1 is the least unfortunate of the alternatives, but I
| want to raise a flag because I think there are larger considerations.
| AFAIK, the vector extensions are unique among proposed non-privileged
| extensions in their extensive functional dependency on machine state
| other than the instruction.
The task group had a strong consensus on retaining a 32-bit encoding for the vector extension, which led to the separate control state.
The desire to stick with a 32-bit encoding was not only to avoid adding a new instruction length, but also to reduce static and dynamic code size. It should be noted that fixed-instruction-width RISC vector architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to accommodate vector encodings, with similar concerns about intermediate control state (variable-length ISAs just have very long vector instruction encodings). With obvious bias, I believe the RISC-V solution is cleaner than these others in this regard.
| Avoiding this kind of dependency seems to have been a consistent and
| important goal (one of many, of course) in previous designs.
| For example, including a rounding mode in every floating point
| instruction, even the FMA group, multiplied the number of code points
| for these instructions by 8, even though it is not clear (at least to
| me) how important the use cases are. (IMO this might tend to support
| ds2horner's proposal to use 48- or 64-bit instructions for some of the
| vector capability, but that is off topic for the present discussion;
| and I can see a counter-argument that using machine state simplifies
| pipelining setup that might depend on that state.)
A longer 64-bit encoding was always planned for the vector extension, as it is clear that the set of desired instruction types could not fit in 32 bits. The main simplification from using the separate control state was in avoiding the longer instruction width, not in pipelining, which it actually complicates.
I think the concern might be unprivileged instructions depending on unprivileged state, which is much less common. I think the vector situation is different from, for example, the rounding mode. The difference for vectors is that the added state is used by every vector instruction. It’s part of executing vectors that the state is set. A restart point is required in order to have strided or indexed memory operations and an MMU. A length is required if we wish to avoid special code to handle vector lengths that are not a multiple of the hardware lengths. We can’t avoid some of this state even with 48-/64-bit instructions. We would probably avoid SEW and LMUL with longer vector instructions, but since length has to be set for all vector instructions in some way, setting SEW and LMUL isn’t as big an issue as setting the rounding mode for floating-point operations.
Bill
What is the thinking for when we go to >32-bit encodings with respect to vtype and masks? I assume that the longer encoding could encode SEW (and LMUL?) as an override of vtype. What about masks though? If we enable more than one mask (m0…mN) in 48-bit/64-bit encodings, and we want to mix 32-bit and 48-bit/64-bit instructions in the same code, do we still specify that e.g. m0==v0, or do we need to explicitly copy v0 to e.g. m0 before it can be used with 48-bit/64-bit instructions (and vice versa when switching from 48-bit/64-bit instructions to 32-bit instructions)? It would be nice if we could reclaim v0 (actually v0 through v7 for LMUL=8) from being a mask to being able to hold data, *and* not have to force the whole code/loop body to use 48-bit/64-bit instructions in order to do this.
Grigorios
| Because of this dependency, it seems to me that the current issue
| creates a currently rare, and undesirable, situation where an illegal
| exception trap depends on a significantly complex interaction between
| an instruction and the machine state. Just something to bear in mind
| for the future.
In some cases, the trap is only dependent on the instruction bits (e.g., vfwadd.wv). In others, it depends on two bits of vtype plus the instruction bits.
Of course, actual hardware implementations have many cases where behavior of unprivileged instructions depends on control state settings in privileged layers in much more complex ways.
Krste
| --
| L Peter Deutsch <ghost@...> :: Aladdin Enterprises ::
| Healdsburg, CA
| Was your vote really counted? http://www.verifiedvoting.org