Re: RISC-V Vector Extension post-public review updates

Bill Huffman



-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: Tuesday, November 16, 2021 11:13 AM
To: ghost <ghost@...>
Cc: krste@...; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates





>>>>> On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@...> said:


|| 1) Mandate all implementations raise an illegal exception in this

|| case.  This is my preferred route, as this would be a minor errata

|| for existing implementations (doesn't affect software), and we would

|| not reuse this state/encoding for other purposes.


|| 2) Allow either correct execution or illegal exception (as with

|| misaligned).


|| 3) Consider "reserved", implying implementations that support it are

|| non-conforming unless we later go with 2).


|| I'm assuming we're going to push to ratify 1) unless I hear strong

|| objections.


| I agree that #1 is the least unfortunate of the alternatives, but I

| want to raise a flag because I think there are larger considerations.


| AFAIK, the vector extensions are unique among proposed non-privileged

| extensions in their extensive functional dependency on machine state

| other than the instruction.


The task group had a strong consensus in retaining a 32-bit encoding for the vector extension, which led to the separate control state.

The desire to stick with 32-bit encoding was not only to avoid adding a new instruction length, but also to reduce static and dynamic code size.  It should be noted that fixed-instruction-width RISC vector architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to accomodate vector encodings, with similar concerns about intermediate control state (variable-length ISAs just have very long vector instruction encoding). With obvious bias, I believe the RISC-V solution is cleaner than these others in this regard.


| Avoiding this kind of dependency seems to have been a consistent and

| important goal (one of many, of course) in previous designs.

| For example, including a rounding mode in every floating point

| instruction, even the FMA group, multiplied the number of code points

| for these instructions by 8, even though it is not clear (at least to

| me) how important the use cases are.  (IMO this might tend to support

| ds2horner's proposal to use 48- or 64-bit instructions for some of the

| vector capability, but that is off topic for the present discussion;

| and I can see a counter-argument that using machine state simplifies

| pipelining setup that might depend on that state.)


A longer 64-bit encoding was always planned for the vector extension as it is clear that the set of desired instruction types could not fit in 32 bits.  The main simplification from using the separate control state was in avoiding the longer instruction width, not in pipelining, which it actually complicates.


I think the concern might be unprivileged instructions depending on unprivileged state, which is much less common.  I think the vector situation is different than, for example, round mode.  The difference for vectors is that the added state is used for every vector instruction.  It’s part of executing vectors that the state is set.  A restart point is required to have strided or indexed memory operations and an MMU.  A length is required if we wish to avoid special code to handle vector lengths that are not a multiple of the hardware lengths.  We can’t avoid some of this state even with 48-/64-bit instructions.  We would probably avoid SEW and LMUL with longer vector instructions, but since length has to be set for all vector instructions in some way, setting SEW and LMUL isn’t as big an issue as setting round mode for floating-point operations.




| Because of this dependency, it seems to me that the current issue

| creates a currently rare, and undesirable, situation where an illegal

| exception trap depends on a significantly complex interaction between

| an instruction and the machine state.  Just something to bear in mind

| for the future.


In some cases, the trap is only dependent on the instruction bits (e.g., vfwadd.wv).  In others, it depends on two bits of vtype plus the instruction bits.


Of course, actual hardware implementations have many cases where behavior of unprivileged instructions depends on control state settings in privileged layers in much more complex ways.






| --


| L Peter Deutsch <ghost@...> :: Aladdin Enterprises ::

| Healdsburg, CA


|          Was your vote really counted? 


| giU1lA!SL-ZLgJX3UyHSqPHhjC86qRobWn7UC46C3Dp7NgyS3t1VZoZ-f0HHKimWz9FgSo

| $










Join to automatically receive all group messages.