Re: RISC-V Vector Extension post-public review updates


Bill Huffman
 

 

 

From: Grigorios Magklis <grigorios.magklis@...>
Sent: Tuesday, November 16, 2021 12:03 PM
To: Bill Huffman <huffman@...>; Krste Asanovic <krste@...>; ghost <ghost@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

EXTERNAL MAIL

 



On Nov 16, 2021, at 17:31, Bill Huffman <huffman@...> wrote:

 

 

 

-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: Tuesday, November 16, 2021 11:13 AM
To: ghost <ghost@...>
Cc: krste@...; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

EXTERNAL MAIL

 

 

>>>>> On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@...> said:

 

|| 1) Mandate all implementations raise an illegal exception in this

|| case.  This is my preferred route, as this would be a minor errata

|| for existing implementations (doesn't affect software), and we would

|| not reuse this state/encoding for other purposes.

||

|| 2) Allow either correct execution or illegal exception (as with

|| misaligned).

||

|| 3) Consider "reserved", implying implementations that support it are

|| non-conforming unless we later go with 2).

||

|| I'm assuming we're going to push to ratify 1) unless I hear strong

|| objections.

 

| I agree that #1 is the least unfortunate of the alternatives, but I

| want to raise a flag because I think there are larger considerations.

 

| AFAIK, the vector extensions are unique among proposed non-privileged

| extensions in their extensive functional dependency on machine state

| other than the instruction.

 

The task group had a strong consensus in retaining a 32-bit encoding for the vector extension, which led to the separate control state.

The desire to stick with 32-bit encoding was not only to avoid adding a new instruction length, but also to reduce static and dynamic code size.  It should be noted that fixed-instruction-width RISC vector architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to accomodate vector encodings, with similar concerns about intermediate control state (variable-length ISAs just have very long vector instruction encoding). With obvious bias, I believe the RISC-V solution is cleaner than these others in this regard.

 

| Avoiding this kind of dependency seems to have been a consistent and

| important goal (one of many, of course) in previous designs.

| For example, including a rounding mode in every floating point

| instruction, even the FMA group, multiplied the number of code points

| for these instructions by 8, even though it is not clear (at least to

| me) how important the use cases are.  (IMO this might tend to support

| ds2horner's proposal to use 48- or 64-bit instructions for some of the

| vector capability, but that is off topic for the present discussion;

| and I can see a counter-argument that using machine state simplifies

| pipelining setup that might depend on that state.)

 

A longer 64-bit encoding was always planned for the vector extension as it is clear that the set of desired instruction types could not fit in 32 bits.  The main simplification from using the separate control state was in avoiding the longer instruction width, not in pipelining, which it actually complicates.

 

I think the concern might be unprivileged instructions depending on unprivileged state, which is much less common.  I think the vector situation is different than, for example, round mode.  The difference for vectors is that the added state is used for every vector instruction.  It’s part of executing vectors that the state is set.  A restart point is required to have strided or indexed memory operations and an MMU.  A length is required if we wish to avoid special code to handle vector lengths that are not a multiple of the hardware lengths.  We can’t avoid some of this state even with 48-/64-bit instructions.  We would probably avoid SEW and LMUL with longer vector instructions, but since length has to be set for all vector instructions in some way, setting SEW and LMUL isn’t as big an issue as setting round mode for floating-point operations.

 

      Bill

 

What is the thinking for when we go to >32-bit encodings with respect to vtype and masks? I assume that the longer encoding could encode SEW (and LMUL?) as an override of vtype. What about masks though? If we enable more than one masks (m0…mN) in 48-bit/64-bit encodings, and we want to mix 32-bit and 48-bit/64-bit instructions in the same code, do we still specify that e.g. m0==v0 or do we need to explicitly copy v0 to e.g. m0 before it can be used with 48-bit/64-bit instructions (and vice versa when switching from 48-bit/64-bit instructions to 32-bit instructions)? It would be nice if we could reclaim v0 (actually v0 through v7 for LMUL=8) from being a mask to being able to hold data, *and* not to have to force the whole code/loop body to use 48-bit/64-bit instructions in order to do this.

 

Grigorios

 

I don’t think there’s any agreement at this point on what goes into a longer instruction, but there are a number of candidates, including at least:

  • LMUL
  • SEW
  • VMA and VTA bits
  • Register specifier for the mask register
  • Additional registers – perhaps 128 instead of 32
  • Possibly a fourth register specifier (not counting mask).

 

If I’m counting correctly, that’s already 28 additional bits.  That’s in the range of the maximum that can be put into a 64-bit instruction set.  There are probably more candidates and discussion about which ones to include will certainly be needed. 😊

 

     Bill



 

| Because of this dependency, it seems to me that the current issue

| creates a currently rare, and undesirable, situation where an illegal

| exception trap depends on a significantly complex interaction between

| an instruction and the machine state.  Just something to bear in mind

| for the future.

 

In some cases, the trap is only dependent on the instruction bits (e.g., vfwadd.wv).  In others, it depends on two bits of vtype plus the instruction bits.

 

Of course, actual hardware implementations have many cases where behavior of unprivileged instructions depends on control state settings in privileged layers in much more complex ways.

 

Krste

 

 

 

| --

 

| L Peter Deutsch <ghost@...> :: Aladdin Enterprises ::

| Healdsburg, CA

 

|          Was your vote really counted? 

| https://urldefense.com/v3/__http://www.verifiedvoting.org__;!!EHscmS1y

| giU1lA!SL-ZLgJX3UyHSqPHhjC86qRobWn7UC46C3Dp7NgyS3t1VZoZ-f0HHKimWz9FgSo

| $

 

 

|

 

 

 

 

 

 

 

Join {tech-vector-ext@lists.riscv.org to automatically receive all group messages.