An earlier Intel Larrabee design had a variant that required looping around an unsuccessful gather according to the mask bits.
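Here is a rough Python model of that retry pattern. It uses a loudly hypothetical completion policy: each execution of the gather completes only the pending elements whose addresses fall in the same 64-byte line as the lowest-numbered pending element, then clears their mask bits. Software re-issues the gather until the mask is empty. The line size, the policy and the names `partial_gather` and `gather_all` are illustrative assumptions, not documented Larrabee behaviour.

```python
LINE = 64  # hypothetical cache-line size in bytes (assumption)
ELEM = 4   # element size in bytes

def partial_gather(memory, dst, idx, mask):
    """One execution of a hypothetical partial gather.

    Completes the pending elements whose byte address lies in the same
    line as the first pending element, writes them to dst, and clears
    their mask bits. Returns the updated mask.
    """
    pending = [i for i, m in enumerate(mask) if m]
    if not pending:
        return mask
    line = (idx[pending[0]] * ELEM) // LINE
    new_mask = list(mask)
    for i in pending:
        if (idx[i] * ELEM) // LINE == line:
            dst[i] = memory[idx[i]]
            new_mask[i] = False
    return new_mask

def gather_all(memory, idx):
    """Loop around the partial gather until every mask bit is clear."""
    dst = [0] * len(idx)
    mask = [True] * len(idx)
    iterations = 0
    while any(mask):
        mask = partial_gather(memory, dst, idx, mask)
        iterations += 1
    return dst, iterations

memory = list(range(100, 164))
dst, passes = gather_all(memory, [0, 1, 40, 2, 17, 41])
```

Each pass completes at least the first pending element, so the loop always terminates. The number of passes depends on how the indices spread across lines: three in this example.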
I believe some folks on this list were responsible for that...
On Nov 17, 2021, at 4:36 PM, Bruce Hoult <bruce@...
At one point I thought that in the case of a gather load the FFR could return an arbitrary mask. But reading the documentation again today, I think it's constrained to a (possibly empty) run of 1s followed by a (possibly empty) run of 0s, so, yes, even in the gather-load case simply reducing vl would do the trick.
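A small Python sketch makes the equivalence easy to check. A mask of that shape carries exactly one number: the length of its leading run of 1s, which is what a trimmed vl would hold. The function name `mask_to_vl` is illustrative, and Python booleans stand in for mask bits.

```python
def mask_to_vl(mask):
    """Convert a first-fault mask to an equivalent vector length.

    Assumes the constrained form: a run of True (elements that loaded)
    followed by a run of False. Any other shape raises ValueError,
    because it could not be expressed just by shortening vl.
    """
    n = 0
    while n < len(mask) and mask[n]:
        n += 1
    if any(mask[n:]):
        raise ValueError("mask is not a run of 1s followed by 0s")
    return n
```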
You are of course expecting that in correct code either there will be no faulting addresses, or else something in the program logic will cause the loop to exit or skip the bad address before looping back and faulting on retrying the bad address.
On Thu, Nov 18, 2021 at 12:43 PM Krste Asanovic <krste@...
SVE uses a dedicated first-fault register (FFR) to hold these first-faulting load mask bits.
RVV just reuses the vector length register (vl) in a natural way.
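Here is a Python model of the RVV behaviour for unit-stride fault-only-first loads (e.g. `vle32ff.v`). A fault on element 0 traps as usual. A fault on any later element is suppressed, and vl is reduced to that element's index, so no separate mask register is needed. Memory is modelled as a list, and any index past its end is treated as inaccessible. `load_ff` and `Fault` are illustrative names. The RVV spec also permits implementations to trim vl for other reasons, which this sketch ignores.

```python
class Fault(Exception):
    """Stands in for a memory-access trap."""

def load_ff(memory, base, vl):
    """Model of a unit-stride fault-only-first load.

    memory: list of accessible elements; any index >= len(memory) faults.
    Returns (data, new_vl). A fault on element 0 is taken; a fault on a
    later element is suppressed and vl is trimmed to that element's index.
    """
    data = []
    for i in range(vl):
        addr = base + i
        if addr >= len(memory):  # hypothetical accessibility check
            if i == 0:
                raise Fault(f"fault at element 0 (address {addr})")
            return data, i
        data.append(memory[addr])
    return data, vl
```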
On Nov 17, 2021, at 3:33 PM, Bill Huffman <huffman@...
Yes, for safely vectorizing loops with indirect references and data-dependent control flow. I didn't see how the mask result achieved that. Having it be zero for all elements after the one that fails (and presumably the data is zero as well), together with the right way of using that to retry the whole loop, makes sense. That just wasn't clear from the first description.
The primary reason was a lack of encoding space for non-unit-stride fault-on-first instructions.
The security concern was being able to probe addresses to find accessible regions without fear of being killed on touching a prohibited region. It was noted that this exposure exists even for unit-stride loads in supervisor mode, where address translation can be used to probe supervisor physical space arbitrarily. However, I believe these security concerns are manageable through control mechanisms at higher privilege levels.
On Nov 17, 2021, at 2:21 PM, Bruce Hoult <bruce@...
On Thu, Nov 18, 2021 at 10:33 AM Bill Huffman <huffman@...> wrote:
Don't forget that some code may want to use a mask in an inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.
Yes, that needs to be considered as well.
I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vvm at the end anyway, at the expense of possibly using extra registers.
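The vmerge approach for non-store operations can be checked with a small Python model. `vmerge_vvm` follows the RVV `vmerge.vvm vd, vs2, vs1, v0` semantics (vd[i] = v0.mask[i] ? vs1[i] : vs2[i]). `masked_op` and `inverted_via_merge` are hypothetical helpers. The unmasked operation writes a temporary, which corresponds to the extra register mentioned above.

```python
def vmerge_vvm(vs2, vs1, mask):
    """vd[i] = vs1[i] if mask[i] else vs2[i] (vmerge.vvm semantics)."""
    return [b if m else a for a, b, m in zip(vs2, vs1, mask)]

def masked_op(op, old, src, mask, invert=False):
    """Reference: apply op only where the (optionally inverted) mask is
    set, leaving the other elements of old undisturbed."""
    return [op(s) if (m != invert) else o for o, s, m in zip(old, src, mask)]

def inverted_via_merge(op, old, src, mask):
    """Emulate an inverted-mask op with the normal mask: compute op on
    every element into a temporary, then merge back."""
    tmp = [op(s) for s in src]         # unmasked op: needs an extra register
    return vmerge_vvm(tmp, old, mask)  # mask=1 keeps old, mask=0 takes tmp

old, src = [0, 0, 0, 0], [1, 2, 3, 4]
mask = [True, False, True, False]
```

This only works when computing the operation on the masked-off elements is harmless. That is why stores, and anything else that can fault, are excluded.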
While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.
Traditional vector code doesn't really need this, but SVE aims to be able to vectorise all loops.
How does this contribute to vectorizing all loops?
Because otherwise you can't safely vectorise loops that do indirect array accesses (e.g. a[b[i]]) with data-dependent control flow.
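Here is a Python sketch that makes the hazard concrete. The function names are hypothetical, a Python list stands in for memory, and `IndexError` stands in for a fault. The scalar loop stops at the negative sentinel and never dereferences the garbage index after it. A straightforward strip-mined version gathers a whole strip before testing the exit condition, so it faults.

```python
def scalar_sum_until_negative(a, b):
    """Sum a[b[i]] until an element is negative (data-dependent exit).
    Only indices the loop actually reaches are dereferenced."""
    total = 0
    for j in b:
        v = a[j]
        if v < 0:
            break
        total += v
    return total

def naive_vector_sum(a, b, vl=4):
    """Speculative strip-mined version: gathers vl elements before
    checking the exit condition, so it can touch indices the scalar
    loop never reaches."""
    total = 0
    for start in range(0, len(b), vl):
        strip = [a[j] for j in b[start:start + vl]]  # may raise IndexError
        for v in strip:
            if v < 0:
                return total
            total += v
    return total

a = [5, 7, -1]
b = [0, 1, 2, 99]  # 99 is garbage after the sentinel
```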
I think this was left out deliberately for security reasons, rather than overlooked.
I don't think there is any additional security implication.
I could be wrong, as I'm not an expert on SVE, but I believe that even if the gather operation is done (somewhat) in parallel or in random order, the instruction doesn't actually return a mask indicating all the failed accesses. All mask bits after the first element that was inaccessible are also set to false. Code written this way will process all the initial elements, then invert the mask and loop back to process the tail starting from the first inaccessible element. That element will then actually fault, unless the loop exited or skipped it based on program logic.
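Here is a Python sketch of that loop shape, assuming a hypothetical `gather_ff` model. It returns a mask that is a run of 1s followed by 0s, and it takes the fault only when the first element is inaccessible. Rather than inverting the mask, it advances the start index by the number of loaded elements. That is equivalent because the mask is always a prefix of 1s. A list stands in for memory and `IndexError` for a fault.

```python
def gather_ff(a, idx):
    """Model of a first-fault gather over the elements in idx.

    Returns (data, mask); every mask bit after the first inaccessible
    element is also False. A fault on the first element is taken.
    """
    data = [0] * len(idx)
    mask = [False] * len(idx)
    for k, j in enumerate(idx):
        if not 0 <= j < len(a):  # hypothetical accessibility test
            if k == 0:
                raise IndexError(f"fault on first element (index {j})")
            break                # suppress; leave the rest False
        data[k] = a[j]
        mask[k] = True
    return data, mask

def vector_sum_until_negative(a, b, vl=4):
    """Process the loaded prefix of each strip, then resume from the
    first element that did not load."""
    total = 0
    i = 0
    while i < len(b):
        data, mask = gather_ff(a, b[i:i + vl])
        for v, ok in zip(data, mask):
            if not ok:
                break            # rest of the strip was not loaded
            if v < 0:
                return total     # data-dependent exit
            total += v
        i += sum(mask)           # retry from the first unloaded element
    return total
```

If the program logic exits before reaching the bad index, nothing faults. If it does reach it, the retry puts that index first, and the fault is taken for real.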