Re: RISC-V Vector Extension post-public review updates
Yes, for safely vectorizing loops with indirect references and data-dependent control flow. I didn’t see how the mask result achieved that. Having the mask be zero for all elements from the one that fails onward (and presumably the data zeroed as well), together with a defined way to use that mask to retry the rest of the loop, makes sense. That just wasn’t clear from the first description.
Bill
Sent: Wednesday, November 17, 2021 5:36 PM
To: Bruce Hoult <bruce@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
The primary reason was lack of encoding space for non-unit-stride fault-only-first instructions.
The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region. It was noted that this concern still exists even for unit-stride loads in supervisor mode, when using address translation to arbitrarily probe supervisor physical space. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.
Krste
On Nov 17, 2021, at 2:21 PM, Bruce Hoult <bruce@...> wrote:
On Thu, Nov 18, 2021 at 10:33 AM Bill Huffman <huffman@...> wrote:
From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
Don't forget that some code may want to use a mask in an inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
Yes, that needs to be considered as well.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vvm at the end anyway, at the expense of possibly using extra registers.
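As a rough illustration of the vmerge workaround, here is a minimal Python sketch in which lists stand in for vector registers (the function names are hypothetical, not RVV intrinsics). Rather than executing an add under an inverted mask, the add is computed for every element into a temporary, and a merge then keeps the old destination value wherever the mask bit is set. The temporary is the extra register; since a store cannot be undone by a later merge, the trick only covers non-store operations.

```python
def vmerge(mask, if_true, if_false):
    """Per-element select with vmerge.vvm semantics: vd[i] = vs1[i] if v0[i] else vs2[i]."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def add_under_inverted_mask(vd, va, vb, mask):
    """Compute vd[i] = va[i] + vb[i] only where mask[i] is 0, without an inverted mask."""
    tmp = [a + b for a, b in zip(va, vb)]  # unmasked add into an extra "register"
    return vmerge(mask, vd, tmp)           # keep the old vd where mask[i] is 1


print(add_under_inverted_mask([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40], [1, 0, 1, 0]))
# → [0, 22, 0, 44]
```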
While on the subject of future features, and somewhat related: the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a special first-fault register (FFR) which can then be moved into a predicate (mask) register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE aims to be able to vectorise all loops.
How does this contribute to vectorizing all loops?
Because otherwise you can't safely vectorise loops that do indirect array accesses (e.g. a[b[i]]) with data-dependent control flow.
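To make the problem concrete, here is a minimal Python sketch under stated assumptions: lists stand in for memory, an IndexError stands in for an access fault, and VL, scalar_search and naive_vector_search are hypothetical names. The scalar loop stops at the first negative a[b[i]] and never dereferences b[i] past that point, but a vectorised version that gathers a whole vector of a[b[i]] before testing any of them also dereferences indices the original program never touched, and can fault on them.

```python
VL = 4  # hypothetical vector length, in elements


def scalar_search(a, b):
    """Original scalar loop: return the first i with a[b[i]] < 0, else len(b)."""
    i = 0
    while i < len(b) and a[b[i]] >= 0:
        i += 1
    return i


def naive_vector_search(a, b):
    """Vectorised without fault suppression: gathers VL elements before testing any."""
    for base in range(0, len(b), VL):
        idx = b[base:base + VL]
        vals = [a[j] for j in idx]  # gather touches every index in the chunk
        for k, v in enumerate(vals):
            if v < 0:
                return base + k
    return len(b)


a = [5, 7, -1]
b = [0, 1, 2, 99]  # b[3] is out of range, but the scalar loop never reads a[b[3]]
print(scalar_search(a, b))  # → 2
try:
    naive_vector_search(a, b)
except IndexError:
    print("vectorised version faulted on a[99]")
```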
I think this was not included for security reasons rather than ignored.
I don't think there is any additional security implication.
I could be wrong, as I'm not an expert on SVE, but I believe that even if the gather operation is done (somewhat) in parallel or in random order, the instruction doesn't actually return a mask indicating exactly which accesses failed. Instead, the mask bit for the first inaccessible element and all mask bits after it are set to false. Code following the gather processes all the leading elements, then inverts the mask and loops back to process the tail starting from the first inaccessible element. On that retry, the inaccessible element is the first one in the vector, and a first-faulting load takes a normal fault on its first element, so the access will then actually fault unless program logic has already exited the loop or skipped that element.
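A minimal Python model of these semantics, under stated assumptions: lists stand in for memory, an IndexError stands in for an access fault, the vector length VL = 4 is arbitrary, and the failed elements' data is zeroed (real hardware may leave it unspecified). gather_first_fault and vector_search are hypothetical names, and the retry loop shows the idea rather than SVE's actual instruction sequence.

```python
VL = 4  # hypothetical vector length, in elements


def gather_first_fault(mem, idx):
    """Model of a first-faulting gather of mem[idx[k]].

    Element 0 faults normally (raises IndexError). For later elements, the
    first inaccessible one clears its mask bit and every mask bit after it,
    even if later indices are valid; their data is zeroed in this model.
    """
    data, mask = [], []
    faulted = False
    for k, j in enumerate(idx):
        if not faulted and 0 <= j < len(mem):
            data.append(mem[j])
            mask.append(True)
        elif k == 0:
            raise IndexError(f"fault on first element, index {j}")
        else:
            faulted = True  # suppress the fault, truncate the mask from here on
            data.append(0)
            mask.append(False)
    return data, mask


def vector_search(a, b):
    """Find the first i with a[b[i]] < 0 (else len(b)) using gather-and-retry."""
    i = 0
    while i < len(b):
        vals, mask = gather_first_fault(a, b[i:i + VL])
        for k, (v, ok) in enumerate(zip(vals, mask)):
            if not ok:
                break          # remaining elements not loaded: retry from element k
            if v < 0:
                return i + k   # exit before any later index is dereferenced
        else:
            k = len(vals)      # whole chunk loaded and no exit found
        i += k                 # k >= 1, because element 0 either loaded or faulted
    return len(b)


print(gather_first_fault([10, 20, 30], [2, 7, 0]))  # → ([30, 0, 0], [True, False, False])
print(vector_search([5, 7, -1], [0, 1, 2, 99]))     # → 2
```

With a = [5, 7, -1] and b = [0, 1, 2, 99], vector_search returns 2 without faulting on index 99, matching the scalar loop. With b = [0, 1, 99, 2], the retry places index 99 in element 0 and the fault is raised, just as the scalar code would fault on a[99].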