Re: RISC-V Vector Extension post-public review updates

David Horner
 


On 2021-11-17 4:32 p.m., Bill Huffman wrote:

 

 

From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.

 

Yes, that needs to be considered as well.

 

I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vvm at the end anyway, at the expense of possibly using extra registers.

 

While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.

 

Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.

 

How does this contribute to vectorizing all loops?

I am curious as well.

It makes sense when the whole vector is participating and masking is the only means to limit processing, but we have vlen.

 

I think this was not included for security reasons rather than ignored.

Specifically no First Fault variant was included so that a single instruction could not capture large swaths of the memory map information.

Of course, non-faulting but flagging would be even worse.

 

     Bill

 

On Thu, Nov 18, 2021 at 7:48 AM Krste Asanovic <krste@...> wrote:

 

On Nov 17, 2021, at 10:43 AM, Bill Huffman <huffman@...> wrote:

-----Original Message-----
From: krste@... <krste@...>
Sent: Wednesday, November 17, 2021 1:02 PM
To: Bill Huffman <huffman@...>
Cc: Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: RE: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.

 

Does that mean continuing to assume fusing a mask move with an instruction where desired?

 

Not necessarily with a larger mask register specifier. For example, with 3b mask register specifier, we could expand to encode v0-v6 as mask sources with 111 meaning unmasked.

 

Krste

 


Re: RISC-V Vector Extension post-public review updates

Bill Huffman
 

 

 

From: Bruce Hoult <bruce@...>
Sent: Wednesday, November 17, 2021 4:24 PM
To: Krste Asanovic <krste@...>
Cc: Bill Huffman <huffman@...>; Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.

 

Yes, that needs to be considered as well.

 

I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vvm at the end anyway, at the expense of possibly using extra registers.

 

While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.

 

Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.

 

How does this contribute to vectorizing all loops?

 

I think this was not included for security reasons rather than ignored.

 

     Bill

 

On Thu, Nov 18, 2021 at 7:48 AM Krste Asanovic <krste@...> wrote:

 

On Nov 17, 2021, at 10:43 AM, Bill Huffman <huffman@...> wrote:

-----Original Message-----
From: krste@... <krste@...>
Sent: Wednesday, November 17, 2021 1:02 PM
To: Bill Huffman <huffman@...>
Cc: Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: RE: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.

 

Does that mean continuing to assume fusing a mask move with an instruction where desired?

 

Not necessarily with a larger mask register specifier. For example, with 3b mask register specifier, we could expand to encode v0-v6 as mask sources with 111 meaning unmasked.

 

Krste

 


Re: RISC-V Vector Extension post-public review updates

Bruce Hoult
 

Don't forget some code may want to use a mask in inverted sense for individual instructions, without explicitly creating a new mask. This was not listed in the "wish list for 64 bits" below, but it was in early RVV drafts.

I'm not sure how common that really is, and non-store uses can usually just use a vmerge.vvm at the end anyway, at the expense of possibly using extra registers.
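
To illustrate the vmerge workaround, here is a minimal Python sketch (a model, not RVV code). It treats vector registers as lists and assumes mask-undisturbed behaviour; the helper names vmerge_vvm and add_under_inverted_mask are hypothetical. The add is computed unmasked into a temporary (the extra register), then merged so that only elements whose mask bit is 0 take the new value.

```python
def vmerge_vvm(vs2, vs1, mask):
    """Model of a vector-vector merge: take vs1[i] where mask[i] is set, else vs2[i]."""
    return [s1 if m else s2 for s2, s1, m in zip(vs2, vs1, mask)]


def add_under_inverted_mask(vd_old, a, b, mask):
    """Hypothetical helper: vd[i] = a[i] + b[i] only where mask[i] is 0."""
    tmp = [x + y for x, y in zip(a, b)]   # unmasked add into a temporary register
    return vmerge_vvm(tmp, vd_old, mask)  # mask set -> keep the old destination value


result = add_under_inverted_mask([10, 20, 30, 40], [1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0])
print(result)
```

A store cannot be patched up afterwards in this way, which is presumably why the remark is limited to non-store uses.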

While on the subject of future features, and somewhat related ... the one big thing I've noticed RVV lacking that SVE has is a non-faulting version of indexed loads ("gather") which creates a mask showing which elements were accessible. In SVE this goes into a CSR which can then be moved into a mask register, but of course with sufficient encoding bits you could directly put it into a normal register.

Traditional vector code doesn't really need this, but SVE has an aim to be able to vectorise all loops.

On Thu, Nov 18, 2021 at 7:48 AM Krste Asanovic <krste@...> wrote:

On Nov 17, 2021, at 10:43 AM, Bill Huffman <huffman@...> wrote:
-----Original Message-----
From: krste@... <krste@...> 
Sent: Wednesday, November 17, 2021 1:02 PM
To: Bill Huffman <huffman@...>
Cc: Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: RE: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
 
My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.
 
Does that mean continuing to assume fusing a mask move with an instruction where desired?

Not necessarily with a larger mask register specifier. For example, with 3b mask register specifier, we could expand to encode v0-v6 as mask sources with 111 meaning unmasked.

Krste


Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
 


On Nov 17, 2021, at 10:43 AM, Bill Huffman <huffman@...> wrote:
-----Original Message-----
From: krste@... <krste@...> 
Sent: Wednesday, November 17, 2021 1:02 PM
To: Bill Huffman <huffman@...>
Cc: Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: RE: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates
 
My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.
 
Does that mean continuing to assume fusing a mask move with an instruction where desired?

Not necessarily with a larger mask register specifier. For example, with 3b mask register specifier, we could expand to encode v0-v6 as mask sources with 111 meaning unmasked.
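
A minimal Python sketch of how such a 3-bit mask specifier might decode, purely as an illustration of the example above. The function name and the choice to return None for "unmasked" are assumptions; no such encoding has been specified.

```python
UNMASKED = 0b111  # assumed "no mask" encoding, taken from the example above


def decode_mask_specifier(vm3):
    """Return the mask source register name, or None when the operation is unmasked."""
    if not 0 <= vm3 <= 0b111:
        raise ValueError("mask specifier must fit in 3 bits")
    if vm3 == UNMASKED:
        return None
    return f"v{vm3}"  # 000..110 select v0..v6


decoded = [decode_mask_specifier(i) for i in range(8)]
print(decoded)
```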

Krste


Re: RISC-V Vector Extension post-public review updates

Bill Huffman
 

 

 

-----Original Message-----
From: krste@... <krste@...>
Sent: Wednesday, November 17, 2021 1:02 PM
To: Bill Huffman <huffman@...>
Cc: Grigorios Magklis <grigorios.magklis@...>; tech-vector-ext@...
Subject: RE: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

 

 

 

>>>>> On Tue, 16 Nov 2021 17:15:28 +0000, Bill Huffman <huffman@...> said:

 

| From: Grigorios Magklis <grigorios.magklis@...>
| Sent: Tuesday, November 16, 2021 12:03 PM

| What is the thinking for when we go to >32-bit encodings with respect
| to vtype and masks? I assume that the longer encoding could encode SEW
| (and LMUL?) as an override of vtype. What about masks though? If we
| enable more than one masks (m0…mN) in 48-bit/64-bit encodings, and we
| want to mix 32-bit and 48-bit/64-bit instructions in the same code,
| do we still specify that e.g. m0==v0 or do we need to explicitly copy
| v0 to e.g. m0 before it can be used with 48-bit/64-bit instructions
| (and vice versa when switching from 48-bit/64-bit instructions to
| 32-bit instructions)? It would be nice if we could reclaim v0
| (actually v0 through v7 for LMUL=8) from being a mask to being able to
| hold data, *and* not to have to force the whole code/loop body to use
| 48-bit/64-bit instructions in order to do this.

| Grigorios

 

My thinking for longer encoding is we would not add different mask registers, but instead possibly expand set of architectural vector registers.

 

Does that mean continuing to assume fusing a mask move with an instruction where desired?

 

| I don’t think there’s any agreement at this point on what goes into a
| longer instruction, but there are a number of candidates, including at least:

|   ● LMUL
|   ● SEW
|   ● VMA and VTA bits
|   ● Register specifier for the mask register
|   ● Additional registers – perhaps 128 instead of 32
|   ● Possibly a fourth register specifier (not counting mask).

| If I’m counting correctly, that’s already 28 additional bits.  That’s
| in the range of the maximum that can be put into a 64-bit instruction
| set.  There are probably more candidates and discussion about which
| ones to include will certainly be needed. 😊

 

Right, even 64 bits will seem tight if all wishes are considered.

Some more experience with actual code and compilers is needed to help tune future extensions.

 

Agreed.

 

      Bill

 

Krste


Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
 

On Tue, 16 Nov 2021 17:15:28 +0000, Bill Huffman <huffman@cadence.com> said:
| From: Grigorios Magklis <grigorios.magklis@esperantotech.com>
| Sent: Tuesday, November 16, 2021 12:03 PM

| What is the thinking for when we go to >32-bit encodings with respect to vtype
| and masks? I assume that the longer encoding could encode SEW (and LMUL?) as
| an override of vtype. What about masks though? If we enable more than one
| masks (m0…mN) in 48-bit/64-bit encodings, and we want to mix 32-bit and 48-bit
| /64-bit instructions in the same code, do we still specify that e.g. m0==v0 or
| do we need to explicitly copy v0 to e.g. m0 before it can be used with 48-bit/
| 64-bit instructions (and vice versa when switching from 48-bit/64-bit
| instructions to 32-bit instructions)? It would be nice if we could reclaim v0
| (actually v0 through v7 for LMUL=8) from being a mask to being able to hold
| data, *and* not to have to force the whole code/loop body to use 48-bit/64-bit
| instructions in order to do this.

| Grigorios

My thinking for longer encoding is we would not add different mask
registers, but instead possibly expand set of architectural vector
registers.

| I don’t think there’s any agreement at this point on what goes into a longer
| instruction, but there are a number of candidates, including at least:

| ● LMUL
| ● SEW
| ● VMA and VTA bits
| ● Register specifier for the mask register
| ● Additional registers – perhaps 128 instead of 32
| ● Possibly a fourth register specifier (not counting mask).

| If I’m counting correctly, that’s already 28 additional bits. That’s in the
| range of the maximum that can be put into a 64-bit instruction set. There are
| probably more candidates and discussion about which ones to include will
| certainly be needed. 😊

Right, even 64 bits will seem tight if all wishes are considered.
Some more experience with actual code and compilers is needed to help
tune future extensions.

Krste


Re: RISC-V Vector Extension post-public review updates - 32bit opcode decision

David Horner
 


On 2021-11-16 12:15 p.m., Bill Huffman wrote:

 

 

From: Grigorios Magklis <grigorios.magklis@...>
Sent: Tuesday, November 16, 2021 12:03 PM
To: Bill Huffman <huffman@...>; Krste Asanovic <krste@...>; ghost <ghost@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

 



On Nov 16, 2021, at 17:31, Bill Huffman <huffman@...> wrote:

 

 

 

-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: Tuesday, November 16, 2021 11:13 AM
To: ghost <ghost@...>
Cc: krste@...; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

 

 

>>>>> On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@...> said:

 

|| 1) Mandate all implementations raise an illegal exception in this
|| case.  This is my preferred route, as this would be a minor errata
|| for existing implementations (doesn't affect software), and we would
|| not reuse this state/encoding for other purposes.
||
|| 2) Allow either correct execution or illegal exception (as with
|| misaligned).
||
|| 3) Consider "reserved", implying implementations that support it are
|| non-conforming unless we later go with 2).
||
|| I'm assuming we're going to push to ratify 1) unless I hear strong
|| objections.

| I agree that #1 is the least unfortunate of the alternatives, but I
| want to raise a flag because I think there are larger considerations.

| AFAIK, the vector extensions are unique among proposed non-privileged
| extensions in their extensive functional dependency on machine state
| other than the instruction.

Yes, absolutely. Many vector models historically have been co-processors with their own internal status.

RVV integration is also a major accomplishment.

 

The task group had a strong consensus

I was a part of that. However, a consensus within a TG neither constitutes a justification nor provides a rationale.

The ARC has been tasked with that kind of architectural decision, and to date they have been silent.

We can infer that silence from the ARC is consent. [A motivation for me to speak up.]

in retaining a 32-bit encoding for the vector extension, which led to the separate control state.


The desire to stick with 32-bit encoding was not only to avoid adding a new instruction length,

Not that we should minimize the impact of a new instruction length: additional ratification issues, tool chain, alignment issues and parceling,

not to mention decode complexity and cost, about which some on the ARC are hyperventilating.

but also to reduce static and dynamic code size.

Agreed. >32-bit instructions come with a substantial cost. Usage patterns are paramount to making this decision.

The current understanding is that typical target applications will readily amortize vtype settings over multiple operations.

Explicitly providing element length information in the load/store reduces the transition in many use cases. 

It should be noted that fixed-instruction-width RISC vector architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to accommodate vector encodings, with similar concerns about intermediate control state

The TG has considered "transient" config settings in vtype to eliminate the need to explicitly flip-flop between vtype states.

It remains a post v1.0 "feature", with the design retaining vtype as the sole state location for its information.

(variable-length ISAs just have very long vector instruction encoding).

Yet RISC-V ostensibly has variable-length encoding.

With obvious bias, I believe the RISC-V solution is cleaner than these others in this regard.

As do I, especially in encapsulating most persistent control [vs. data] information in vtype.

Where the design can be faulted is in not saving vcsr in vtype to minimize context-switch concerns.

vstart is essentially transient information that well-behaved applications should ignore.

However, a common opportunity to context switch is when waiting for a resource, and vstart must then be part of the context-switch information.

 

| Avoiding this kind of dependency seems to have been a consistent and
| important goal (one of many, of course) in previous designs.
| For example, including a rounding mode in every floating point
| instruction, even the FMA group, multiplied the number of code points
| for these instructions by 8, even though it is not clear (at least to
| me) how important the use cases are.  (IMO this might tend to support
| ds2horner's proposal to use 48- or 64-bit instructions for some of the
| vector capability, but that is off topic for the present discussion;

I am obviously making this concern a new thread.

Basically, I am hoping these points will be the salient ones for a response to the Public Review question I raised.

| and I can see a counter-argument that using machine state simplifies
| pipelining setup that might depend on that state.)

 

A longer 64-bit encoding was always planned for the vector extension as it is clear that the set of desired instruction types could not fit in 32 bits.

vtype is extensible, another of the reasons that this design is superb.

For example, data-type overriding to substitute, for the relevant integer ops, complex float, allowing it and real float to coexist through a section of code.

The main simplification from using the separate control state was in avoiding the longer instruction width, not in pipelining, which it actually complicates.

 

I think the concern might be unprivileged instructions depending on unprivileged state, which is much less common.  I think the vector situation is different than, for example, round mode.  The difference for vectors is that the added state is used for every vector instruction.  It’s part of executing vectors that the state is set.  A restart point is required to have strided or indexed memory operations and an MMU.  A length is required if we wish to avoid special code to handle vector lengths that are not a multiple of the hardware lengths.  We can’t avoid some of this state even with 48-/64-bit instructions.  We would probably avoid SEW and LMUL with longer vector instructions, but since length has to be set for all vector instructions in some way, setting SEW and LMUL isn’t as big an issue as setting round mode for floating-point operations.

+1

 

      Bill

 

What is the thinking for when we go to >32-bit encodings with respect to vtype and masks? I assume that the longer encoding could encode SEW (and LMUL?) as an override of vtype. What about masks though? If we enable more than one masks (m0…mN) in 48-bit/64-bit encodings, and we want to mix 32-bit and 48-bit/64-bit instructions in the same code, do we still specify that e.g. m0==v0 or do we need to explicitly copy v0 to e.g. m0 before it can be used with 48-bit/64-bit instructions (and vice versa when switching from 48-bit/64-bit instructions to 32-bit instructions)?

The salient point of coexistence is probably why we will expand within 32-bit opcode space for the foreseeable future.


It would be nice if we could reclaim v0 (actually v0 through v7 for LMUL=8) from being a mask to being able to hold data,

The mask designation could be in vtype while still using 32bit instruction encoding.

 

*and* not to have to force the whole code/loop body to use 48-bit/64-bit instructions in order to do this.

 

Grigorios

 

I don’t think there’s any agreement at this point on what goes into a longer instruction, but there are a number of candidates, including at least:

  • LMUL
  • SEW
  • VMA and VTA bits
  • Register specifier for the mask register
  • Additional registers – perhaps 128 instead of 32

Additional register designations (64 or 128) are the most likely motivator for >32-bit instructions.

However, I can imagine a windowing mode in which unaligned registers at different LMUL>1 map above the base 32 registers.

Even without modifying vtype this is possible, and with vtype complex windowing is possible.


  • Possibly a fourth register specifier (not counting mask).

 

If I’m counting correctly, that’s already 28 additional bits.  That’s in the range of the maximum that can be put into a 64-bit instruction set.  There are probably more candidates and discussion about which ones to include will certainly be needed. 😊

 

     Bill

For me, the most compelling justification for using 32bit opcodes is the intentional design to provide vector functionality to minimal systems.

The design is not just for supercomputers; the vision is that such an integrated vector feature can be used to auto-vectorize standard code logic.

It is meant to be amenable to the lowest of the low.

It is for this accomplishment above all others that I am most appreciative of the TG.

Thank you all.





 


Re: RISC-V Vector Extension post-public review updates

Bill Huffman
 

 

 

From: Grigorios Magklis <grigorios.magklis@...>
Sent: Tuesday, November 16, 2021 12:03 PM
To: Bill Huffman <huffman@...>; Krste Asanovic <krste@...>; ghost <ghost@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

 



On Nov 16, 2021, at 17:31, Bill Huffman <huffman@...> wrote:

 

 

 

-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: Tuesday, November 16, 2021 11:13 AM
To: ghost <ghost@...>
Cc: krste@...; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

 

 

>>>>> On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@...> said:

 

|| 1) Mandate all implementations raise an illegal exception in this
|| case.  This is my preferred route, as this would be a minor errata
|| for existing implementations (doesn't affect software), and we would
|| not reuse this state/encoding for other purposes.
||
|| 2) Allow either correct execution or illegal exception (as with
|| misaligned).
||
|| 3) Consider "reserved", implying implementations that support it are
|| non-conforming unless we later go with 2).
||
|| I'm assuming we're going to push to ratify 1) unless I hear strong
|| objections.

| I agree that #1 is the least unfortunate of the alternatives, but I
| want to raise a flag because I think there are larger considerations.

| AFAIK, the vector extensions are unique among proposed non-privileged
| extensions in their extensive functional dependency on machine state
| other than the instruction.

 

The task group had a strong consensus in retaining a 32-bit encoding for the vector extension, which led to the separate control state.

The desire to stick with 32-bit encoding was not only to avoid adding a new instruction length, but also to reduce static and dynamic code size.  It should be noted that fixed-instruction-width RISC vector architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to accommodate vector encodings, with similar concerns about intermediate control state (variable-length ISAs just have very long vector instruction encoding). With obvious bias, I believe the RISC-V solution is cleaner than these others in this regard.

 

| Avoiding this kind of dependency seems to have been a consistent and
| important goal (one of many, of course) in previous designs.
| For example, including a rounding mode in every floating point
| instruction, even the FMA group, multiplied the number of code points
| for these instructions by 8, even though it is not clear (at least to
| me) how important the use cases are.  (IMO this might tend to support
| ds2horner's proposal to use 48- or 64-bit instructions for some of the
| vector capability, but that is off topic for the present discussion;
| and I can see a counter-argument that using machine state simplifies
| pipelining setup that might depend on that state.)
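
For reference, the factor of 8 in the quoted text comes from the 3-bit rm field (bits 14:12) in the standard floating-point encodings. The Python sketch below lists the eight code points and extracts the field from an instruction word; the helper name rm_field is hypothetical, and the example word is an assumed encoding of fadd.s f0, f0, f0 with dynamic rounding.

```python
# The eight values of the 3-bit rm field: five static modes,
# two reserved encodings, and "use the dynamic mode in fcsr".
RM_ENCODINGS = {
    0b000: "RNE", 0b001: "RTZ", 0b010: "RDN", 0b011: "RUP", 0b100: "RMM",
    0b101: "reserved", 0b110: "reserved", 0b111: "DYN",
}


def rm_field(insn):
    """Extract bits 14:12 of a 32-bit instruction word."""
    return (insn >> 12) & 0b111


example = 0x00007053  # assumed: fadd.s f0, f0, f0 with rm = 111
print(RM_ENCODINGS[rm_field(example)], len(RM_ENCODINGS))
```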

 

A longer 64-bit encoding was always planned for the vector extension as it is clear that the set of desired instruction types could not fit in 32 bits.  The main simplification from using the separate control state was in avoiding the longer instruction width, not in pipelining, which it actually complicates.

 

I think the concern might be unprivileged instructions depending on unprivileged state, which is much less common.  I think the vector situation is different than, for example, round mode.  The difference for vectors is that the added state is used for every vector instruction.  It’s part of executing vectors that the state is set.  A restart point is required to have strided or indexed memory operations and an MMU.  A length is required if we wish to avoid special code to handle vector lengths that are not a multiple of the hardware lengths.  We can’t avoid some of this state even with 48-/64-bit instructions.  We would probably avoid SEW and LMUL with longer vector instructions, but since length has to be set for all vector instructions in some way, setting SEW and LMUL isn’t as big an issue as setting round mode for floating-point operations.
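
The point about length can be sketched with a strip-mined loop. This is a minimal Python model, not RVV code: vsetvl here is a simplified stand-in that always grants min(avl, vlmax) elements, and vlmax=8 is an arbitrary assumption. Because the granted length shrinks on the last pass, no separate tail code is needed when the element count is not a multiple of the hardware length.

```python
def vsetvl(avl, vlmax):
    """Simplified model: grant min(avl, vlmax) elements.
    (An implementation may choose differently when vlmax < avl < 2*vlmax.)"""
    return min(avl, vlmax)


def vector_add(a, b, vlmax=8):
    """Add two equal-length lists in chunks of at most vlmax elements."""
    out = []
    i, n = 0, len(a)
    while i < n:
        vl = vsetvl(n - i, vlmax)  # final pass simply gets a shorter vl
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out


total = vector_add(list(range(19)), [1] * 19)  # 19 is not a multiple of 8
print(total)
```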

 

      Bill

 

What is the thinking for when we go to >32-bit encodings with respect to vtype and masks? I assume that the longer encoding could encode SEW (and LMUL?) as an override of vtype. What about masks though? If we enable more than one masks (m0…mN) in 48-bit/64-bit encodings, and we want to mix 32-bit and 48-bit/64-bit instructions in the same code, do we still specify that e.g. m0==v0 or do we need to explicitly copy v0 to e.g. m0 before it can be used with 48-bit/64-bit instructions (and vice versa when switching from 48-bit/64-bit instructions to 32-bit instructions)? It would be nice if we could reclaim v0 (actually v0 through v7 for LMUL=8) from being a mask to being able to hold data, *and* not to have to force the whole code/loop body to use 48-bit/64-bit instructions in order to do this.

 

Grigorios

 

I don’t think there’s any agreement at this point on what goes into a longer instruction, but there are a number of candidates, including at least:

  • LMUL
  • SEW
  • VMA and VTA bits
  • Register specifier for the mask register
  • Additional registers – perhaps 128 instead of 32
  • Possibly a fourth register specifier (not counting mask).

 

If I’m counting correctly, that’s already 28 additional bits.  That’s in the range of the maximum that can be put into a 64-bit instruction set.  There are probably more candidates and discussion about which ones to include will certainly be needed. 😊
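
The message does not give the breakdown behind the 28-bit figure. The Python sketch below shows one accounting that happens to reach it; every width is an assumption: 3 bits each for LMUL and SEW and 1 bit each for VMA and VTA (as in the current vtype fields), and 7-bit register specifiers for 128 registers.

```python
REG_BITS_NOW = 5   # current specifier width (32 registers)
REG_BITS_128 = 7   # assumed specifier width for 128 registers

# Assumed widths for each candidate in the list above.
extra_bits = {
    "LMUL": 3,
    "SEW": 3,
    "VMA and VTA": 2,
    "mask register specifier": REG_BITS_128,
    "widen three existing specifiers": 3 * (REG_BITS_128 - REG_BITS_NOW),
    "fourth register specifier": REG_BITS_128,
}

total_extra = sum(extra_bits.values())
print(total_extra)
```

Other breakdowns are possible; for instance, a narrower mask specifier would lower the total.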

 

     Bill



 

| Because of this dependency, it seems to me that the current issue
| creates a currently rare, and undesirable, situation where an illegal
| exception trap depends on a significantly complex interaction between
| an instruction and the machine state.  Just something to bear in mind
| for the future.

 

In some cases, the trap is only dependent on the instruction bits (e.g., vfwadd.wv).  In others, it depends on two bits of vtype plus the instruction bits.

 

Of course, actual hardware implementations have many cases where behavior of unprivileged instructions depends on control state settings in privileged layers in much more complex ways.

 

Krste

 

 

 

| --

| L Peter Deutsch <ghost@...> :: Aladdin Enterprises ::
| Healdsburg, CA

|          Was your vote really counted?
| http://www.verifiedvoting.org


Re: RISC-V Vector Extension post-public review updates

Mr Grigorios Magklis
 



On Nov 16, 2021, at 17:31, Bill Huffman <huffman@...> wrote:

 

 

-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: Tuesday, November 16, 2021 11:13 AM
To: ghost <ghost@...>
Cc: krste@...; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 

 

 

>>>>> On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@...> said:

 

|| 1) Mandate all implementations raise an illegal exception in this
|| case.  This is my preferred route, as this would be a minor errata
|| for existing implementations (doesn't affect software), and we would
|| not reuse this state/encoding for other purposes.
||
|| 2) Allow either correct execution or illegal exception (as with
|| misaligned).
||
|| 3) Consider "reserved", implying implementations that support it are
|| non-conforming unless we later go with 2).
||
|| I'm assuming we're going to push to ratify 1) unless I hear strong
|| objections.

| I agree that #1 is the least unfortunate of the alternatives, but I
| want to raise a flag because I think there are larger considerations.

| AFAIK, the vector extensions are unique among proposed non-privileged
| extensions in their extensive functional dependency on machine state
| other than the instruction.

 

The task group had a strong consensus in retaining a 32-bit encoding for the vector extension, which led to the separate control state.

The desire to stick with 32-bit encoding was not only to avoid adding a new instruction length, but also to reduce static and dynamic code size.  It should be noted that fixed-instruction-width RISC vector architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to accommodate vector encodings, with similar concerns about intermediate control state (variable-length ISAs just have very long vector instruction encoding). With obvious bias, I believe the RISC-V solution is cleaner than these others in this regard.

 

| Avoiding this kind of dependency seems to have been a consistent and
| important goal (one of many, of course) in previous designs.
| For example, including a rounding mode in every floating point
| instruction, even the FMA group, multiplied the number of code points
| for these instructions by 8, even though it is not clear (at least to
| me) how important the use cases are.  (IMO this might tend to support
| ds2horner's proposal to use 48- or 64-bit instructions for some of the
| vector capability, but that is off topic for the present discussion;
| and I can see a counter-argument that using machine state simplifies
| pipelining setup that might depend on that state.)

 

A longer 64-bit encoding was always planned for the vector extension as it is clear that the set of desired instruction types could not fit in 32 bits.  The main simplification from using the separate control state was in avoiding the longer instruction width, not in pipelining, which it actually complicates.

 

I think the concern might be unprivileged instructions depending on unprivileged state, which is much less common.  I think the vector situation is different than, for example, round mode.  The difference for vectors is that the added state is used for every vector instruction.  It’s part of executing vectors that the state is set.  A restart point is required to have strided or indexed memory operations and an MMU.  A length is required if we wish to avoid special code to handle vector lengths that are not a multiple of the hardware lengths.  We can’t avoid some of this state even with 48-/64-bit instructions.  We would probably avoid SEW and LMUL with longer vector instructions, but since length has to be set for all vector instructions in some way, setting SEW and LMUL isn’t as big an issue as setting round mode for floating-point operations.

 

      Bill


What is the thinking for when we go to >32-bit encodings with respect to vtype and masks? I assume that the longer encoding could encode SEW (and LMUL?) as an override of vtype. What about masks though? If we enable more than one masks (m0…mN) in 48-bit/64-bit encodings, and we want to mix 32-bit and 48-bit/64-bit instructions in the same code, do we still specify that e.g. m0==v0 or do we need to explicitly copy v0 to e.g. m0 before it can be used with 48-bit/64-bit instructions (and vice versa when switching from 48-bit/64-bit instructions to 32-bit instructions)? It would be nice if we could reclaim v0 (actually v0 through v7 for LMUL=8) from being a mask to being able to hold data, *and* not to have to force the whole code/loop body to use 48-bit/64-bit instructions in order to do this.

Grigorios

 

| Because of this dependency, it seems to me that the current issue
| creates a currently rare, and undesirable, situation where an illegal
| exception trap depends on a significantly complex interaction between
| an instruction and the machine state.  Just something to bear in mind
| for the future.

 

In some cases, the trap is only dependent on the instruction bits (e.g., vfwadd.wv).  In others, it depends on two bits of vtype plus the instruction bits.

 

Of course, actual hardware implementations have many cases where behavior of unprivileged instructions depends on control state settings in privileged layers in much more complex ways.

 

Krste

 

 

 

| --

| L Peter Deutsch <ghost@...> :: Aladdin Enterprises :: Healdsburg, CA

|          Was your vote really counted?  http://www.verifiedvoting.org

Re: RISC-V Vector Extension post-public review updates

Bill Huffman
 

 

 

-----Original Message-----
From: tech-vector-ext@... <tech-vector-ext@...> On Behalf Of Krste Asanovic
Sent: Tuesday, November 16, 2021 11:13 AM
To: ghost <ghost@...>
Cc: krste@...; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Extension post-public review updates

 


>>>>> On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@...> said:

 

|| 1) Mandate all implementations raise an illegal exception in this
|| case.  This is my preferred route, as this would be a minor errata
|| for existing implementations (doesn't affect software), and we would
|| not reuse this state/encoding for other purposes.
||
|| 2) Allow either correct execution or illegal exception (as with
|| misaligned).
||
|| 3) Consider "reserved", implying implementations that support it are
|| non-conforming unless we later go with 2).
||
|| I'm assuming we're going to push to ratify 1) unless I hear strong
|| objections.

 

| I agree that #1 is the least unfortunate of the alternatives, but I
| want to raise a flag because I think there are larger considerations.

| AFAIK, the vector extensions are unique among proposed non-privileged
| extensions in their extensive functional dependency on machine state
| other than the instruction.

 

The task group had a strong consensus in retaining a 32-bit encoding for the vector extension, which led to the separate control state.

The desire to stick with 32-bit encoding was not only to avoid adding a new instruction length, but also to reduce static and dynamic code size.  It should be noted that fixed-instruction-width RISC vector architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to accommodate vector encodings, with similar concerns about intermediate control state (variable-length ISAs just have very long vector instruction encodings). With obvious bias, I believe the RISC-V solution is cleaner than these others in this regard.

 

| Avoiding this kind of dependency seems to have been a consistent and
| important goal (one of many, of course) in previous designs.
| For example, including a rounding mode in every floating point
| instruction, even the FMA group, multiplied the number of code points
| for these instructions by 8, even though it is not clear (at least to
| me) how important the use cases are.  (IMO this might tend to support
| ds2horner's proposal to use 48- or 64-bit instructions for some of the
| vector capability, but that is off topic for the present discussion;
| and I can see a counter-argument that using machine state simplifies
| pipelining setup that might depend on that state.)

 

A longer 64-bit encoding was always planned for the vector extension as it is clear that the set of desired instruction types could not fit in 32 bits.  The main simplification from using the separate control state was in avoiding the longer instruction width, not in pipelining, which it actually complicates.

 

I think the concern might be unprivileged instructions depending on unprivileged state, which is much less common.  I think the vector situation is different than, for example, round mode.  The difference for vectors is that the added state is used for every vector instruction.  It’s part of executing vectors that the state is set.  A restart point is required to have strided or indexed memory operations and an MMU.  A length is required if we wish to avoid special code to handle vector lengths that are not a multiple of the hardware lengths.  We can’t avoid some of this state even with 48-/64-bit instructions.  We would probably avoid SEW and LMUL with longer vector instructions, but since length has to be set for all vector instructions in some way, setting SEW and LMUL isn’t as big an issue as setting round mode for floating-point operations.
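To illustrate the point about the length state, here is a minimal Python sketch of a strip-mined loop. It is only a model, not RVV code: `strip_mine` and `vector_add` are hypothetical names, `vlmax` stands in for the hardware maximum, and the choice `vl = min(remaining, vlmax)` is assumed as the simplest legal behavior of a vsetvli-style instruction. The tail of the array is handled by the same loop body, with no separate remainder code.

```python
def strip_mine(n, vlmax):
    """Split n elements into (start, vl) chunks, as a vl-driven loop would.

    Assumption: each iteration requests the remaining element count and is
    granted vl = min(remaining, vlmax).
    """
    chunks = []
    start = 0
    while start < n:
        vl = min(n - start, vlmax)  # models the vl granted for this pass
        chunks.append((start, vl))
        start += vl
    return chunks


def vector_add(a, b, vlmax=8):
    """Element-wise add processed chunk by chunk; the last chunk may be short."""
    out = [0] * len(a)
    for start, vl in strip_mine(len(a), vlmax):
        for i in range(start, start + vl):  # stands in for one vector op of length vl
            out[i] = a[i] + b[i]
    return out
```

With 19 elements and `vlmax=8` the chunks have lengths 8, 8 and 3, so the final pass simply runs with a shorter vl.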

 

      Bill

 

| Because of this dependency, it seems to me that the current issue
| creates a currently rare, and undesirable, situation where an illegal
| exception trap depends on a significantly complex interaction between
| an instruction and the machine state.  Just something to bear in mind
| for the future.

 

In some cases, the trap is only dependent on the instruction bits (e.g., vfwadd.wv).  In others, it depends on two bits of vtype plus the instruction bits.

 

Of course, actual hardware implementations have many cases where behavior of unprivileged instructions depends on control state settings in privileged layers in much more complex ways.

 

Krste

 

 

 

| --

| L Peter Deutsch <ghost@...> :: Aladdin Enterprises :: Healdsburg, CA

|          Was your vote really counted?  http://www.verifiedvoting.org

Re: RISC-V Vector Extension post-public review updates

Nick Knight
 

[...] it seems to me
that the current issue creates a currently rare, and undesirable, situation
where an illegal exception trap depends on a significantly complex
interaction between an instruction and the machine state.

Almost all vector instructions already have to check their vector operands to make sure the register numbers are compatible with (dynamic) LMUL. This is further complicated in, e.g., the case of mixed-width instructions' source-destination register overlap constraints. The (static EEW) loads/stores also have to do similar checking, since EMUL depends on (dynamic) SEW. I'd argue that the dependence on vtype of the validity of vector register numbers is pervasive, not rare. (I won't argue about un/desirability.)
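A rough Python sketch of the kind of check described above, under stated assumptions: EMUL is computed as (EEW / SEW) * LMUL, EMUL must lie between 1/8 and 8, and a register group with EMUL > 1 must start at a register number that is a multiple of EMUL. The function names (`emul`, `legal_emul`, `reg_group_ok`) are made up for illustration and are not taken from any implementation.

```python
from fractions import Fraction


def emul(eew, sew, lmul):
    """Effective LMUL of an operand whose element width is fixed by the opcode."""
    return Fraction(eew, sew) * Fraction(lmul)


def legal_emul(m):
    """EMUL outside 1/8..8 is treated as not usable in this sketch."""
    return Fraction(1, 8) <= m <= 8


def reg_group_ok(vreg, group_mul):
    """True if vreg may name a register group of the given (E)LMUL."""
    if not 0 <= vreg < 32:
        return False
    if group_mul <= 1:
        return True  # a fractional or single-register group fits anywhere
    return vreg % int(group_mul) == 0  # group must be aligned to its size
```

For example, a load with static EEW=32 targeting v2 is acceptable when SEW=32 and LMUL=1 (EMUL=1), but the same encoding fails the check when SEW=8 and LMUL=1, because EMUL becomes 4 and v2 is not a multiple of 4. The instruction bits are identical in both cases; only vtype differs.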

Best,
Nick

On Tue, Nov 16, 2021, 7:53 AM ghost <ghost@...> wrote:
>  1) Mandate all implementations raise an illegal exception in this
>  case.  This is my preferred route, as this would be a minor errata for
>  existing implementations (doesn't affect software), and we would not
>  reuse this state/encoding for other purposes.
>
>  2) Allow either correct execution or illegal exception (as with
>  misaligned). 
>
>  3) Consider "reserved", implying implementations that support it are
>  non-conforming unless we later go with 2).
>
>  I'm assuming we're going to push to ratify 1) unless I hear strong
>  objections.

I agree that #1 is the least unfortunate of the alternatives, but I want to
raise a flag because I think there are larger considerations.

AFAIK, the vector extensions are unique among proposed non-privileged
extensions in their extensive functional dependency on machine state other
than the instruction.  Avoiding this kind of dependency seems to have been a
consistent and important goal (one of many, of course) in previous designs.
For example, including a rounding mode in every floating point instruction,
even the FMA group, multiplied the number of code points for these
instructions by 8, even though it is not clear (at least to me) how
important the use cases are.  (IMO this might tend to support ds2horner's
proposal to use 48- or 64-bit instructions for some of the vector
capability, but that is off topic for the present discussion; and I can see
a counter-argument that using machine state simplifies pipelining setup that
might depend on that state.)  Because of this dependency, it seems to me
that the current issue creates a currently rare, and undesirable, situation
where an illegal exception trap depends on a significantly complex
interaction between an instruction and the machine state.  Just something to
bear in mind for the future.

--

L Peter Deutsch <ghost@...> :: Aladdin Enterprises :: Healdsburg, CA

         Was your vote really counted?  http://www.verifiedvoting.org






Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
 

On Tue, 16 Nov 2021 07:36:40 -0800 (PST), "ghost" <ghost@major2nd.com> said:
|| 1) Mandate all implementations raise an illegal exception in this
|| case. This is my preferred route, as this would be a minor errata for
|| existing implementations (doesn't affect software), and we would not
|| reuse this state/encoding for other purposes.
||
|| 2) Allow either correct execution or illegal exception (as with
|| misaligned).
||
|| 3) Consider "reserved", implying implementations that support it are
|| non-conforming unless we later go with 2).
||
|| I'm assuming we're going to push to ratify 1) unless I hear strong
|| objections.

| I agree that #1 is the least unfortunate of the alternatives, but I want to
| raise a flag because I think there are larger considerations.

| AFAIK, the vector extensions are unique among proposed non-privileged
| extensions in their extensive functional dependency on machine state other
| than the instruction.

The task group had a strong consensus in retaining a 32-bit encoding
for the vector extension, which led to the separate control state.
The desire to stick with 32-bit encoding was not only to avoid adding
a new instruction length, but also to reduce static and dynamic code
size. It should be noted that fixed-instruction-width RISC vector
architectures (ARM SVE2, IBM VMX) have had to adopt a prefix model to
accommodate vector encodings, with similar concerns about intermediate
control state (variable-length ISAs just have very long vector
instruction encoding). With obvious bias, I believe the RISC-V
solution is cleaner than these others in this regard.

| Avoiding this kind of dependency seems to have been a
| consistent and important goal (one of many, of course) in previous designs.
| For example, including a rounding mode in every floating point instruction,
| even the FMA group, multiplied the number of code points for these
| instructions by 8, even though it is not clear (at least to me) how
| important the use cases are. (IMO this might tend to support ds2horner's
| proposal to use 48- or 64-bit instructions for some of the vector
| capability, but that is off topic for the present discussion; and I can see
| a counter-argument that using machine state simplifies pipelining setup that
| might depend on that state.)

A longer 64-bit encoding was always planned for the vector extension
as it is clear that the set of desired instruction types could not fit
in 32 bits. The main simplification from using the separate control
state was in avoiding the longer instruction width, not in pipelining,
which it actually complicates.

| Because of this dependency, it seems to me
| that the current issue creates a currently rare, and undesirable, situation
| where an illegal exception trap depends on a significantly complex
| interaction between an instruction and the machine state. Just something to
| bear in mind for the future.

In some cases, the trap is only dependent on the instruction bits
(e.g., vfwadd.wv). In others, it depends on two bits of vtype plus
the instruction bits.

Of course, actual hardware implementations have many cases where
behavior of unprivileged instructions depends on control state
settings in privileged layers in much more complex ways.

Krste


| --

| L Peter Deutsch <ghost@major2nd.com> :: Aladdin Enterprises :: Healdsburg, CA

| Was your vote really counted? http://www.verifiedvoting.org


|


Re: RISC-V Vector Extension post-public review updates

ghost
 

1) Mandate all implementations raise an illegal exception in this
case. This is my preferred route, as this would be a minor errata for
existing implementations (doesn't affect software), and we would not
reuse this state/encoding for other purposes.

2) Allow either correct execution or illegal exception (as with
misaligned).

3) Consider "reserved", implying implementations that support it are
non-conforming unless we later go with 2).

I'm assuming we're going to push to ratify 1) unless I hear strong
objections.

I agree that #1 is the least unfortunate of the alternatives, but I want to
raise a flag because I think there are larger considerations.

AFAIK, the vector extensions are unique among proposed non-privileged
extensions in their extensive functional dependency on machine state other
than the instruction. Avoiding this kind of dependency seems to have been a
consistent and important goal (one of many, of course) in previous designs.
For example, including a rounding mode in every floating point instruction,
even the FMA group, multiplied the number of code points for these
instructions by 8, even though it is not clear (at least to me) how
important the use cases are. (IMO this might tend to support ds2horner's
proposal to use 48- or 64-bit instructions for some of the vector
capability, but that is off topic for the present discussion; and I can see
a counter-argument that using machine state simplifies pipelining setup that
might depend on that state.) Because of this dependency, it seems to me
that the current issue creates a currently rare, and undesirable, situation
where an illegal exception trap depends on a significantly complex
interaction between an instruction and the machine state. Just something to
bear in mind for the future.

--

L Peter Deutsch <ghost@major2nd.com> :: Aladdin Enterprises :: Healdsburg, CA

Was your vote really counted? http://www.verifiedvoting.org


Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
 

On Tue, 16 Nov 2021 10:18:41 +0100, Victor Moya <victor.moya@semidynamics.com> said:
| From an ISA definition point of view it doesn't make sense to forbid properly formed operations to benefit a
| specific hardware implementation. It's an ugly hack.

A common goal in ISA design is avoiding complex implementations for
operations that are not used by software.

| If a given hardware implementation can't handle it in an optimal way and it really doesn't have real software
| use (i.e., performance is irrelevant) it can just trigger a slow path (microcode sequence or trap to software
| emulation).

This might add the only microcode or emulation path to a design, and
for an operation that is never used.

| But given that it isn't the first case in the spec it isn't really much of a problem. Between making halfway
| hacks, I think it's better to make it completely illegal (option #1) than to add additional fragmentation
| that may affect compilers with #2 or #3.

Making it reserved allows for future definition as illegal, and is a
smaller deviation from the frozen spec. It does not cause
fragmentation as software, and especially compilers, will not use this
case.

| If the vector specification is required to be optimal for a specific hardware implementation, better make it
| explicitly so and not go in roundabout ways.

The specification is explicit in being designed to support wide
implementations that internally rearrange data, which is the class of
machines that this change is aimed at. The current design evolved to
remove the fragmentation that would arise from making data
rearrangement visible, while also not requiring a design with explicit
upper/lower or odd/even steps to handle mixed-width operations.

Krste

| Victor

| On Mon, Nov 15, 2021 at 10:05 PM Krste Asanovic <krste@sifive.com> wrote:

| Apart from requests for more instructions, which can be handled with
| later extensions, there were no real substantive updates.

| I did notice one issue at end of public review, however.

| The current specification allows some instructions to have two vector
| source operands read from the same vector register but with different
| EEW.  For example, a vector indexed store with the index vector and
| data vector overlapping, but different EEW.  Or a widening vector add
| (vwadd.wv) where the two vector sources overlap but have different
| EEW.  This complicates implementations that internally restripe the
| vector data (e.g., internal SLEN), and does not have a valid software
| use (cue folks furiously trying to construct one...).

| The proposal is to allow implementations to raise an illegal
| instruction exception in this case.  I believe this is an important
| and necessary change to accommodate internal striping.  In practice,
| this change has no impact on software.

| We do have a choice of:

| 1) Mandate all implementations raise an illegal exception in this
| case.  This is my preferred route, as this would be a minor errata for
| existing implementations (doesn't affect software), and we would not reuse
| this state/encoding for other purposes.

| 2) Allow either correct execution or illegal exception (as with
| misaligned). 

| 3) Consider "reserved", implying implementations that support it are
| non-conforming unless we later go with 2).

| I'm assuming we're going to push to ratify 1) unless I hear strong objections.

| Krste

|


Re: RISC-V Vector Extension post-public review updates

Victor Moya
 


From an ISA definition point of view it doesn't make sense to forbid properly formed operations to benefit a specific hardware implementation. It's an ugly hack.

If a given hardware implementation can't handle it in an optimal way and it really doesn't have real software use (i.e., performance is irrelevant) it can just trigger a slow path (microcode sequence or trap to software emulation).

But given that it isn't the first case in the spec it isn't really much of a problem. Between making halfway hacks, I think it's better to make it completely illegal (option #1) than to add additional fragmentation that may affect compilers with #2 or #3.

If the vector specification is required to be optimal for a specific hardware implementation, better make it explicitly so and not go in roundabout ways.

Victor


On Mon, Nov 15, 2021 at 10:05 PM Krste Asanovic <krste@...> wrote:

Apart from requests for more instructions, which can be handled with
later extensions, there were no real substantive updates.

I did notice one issue at end of public review, however.

The current specification allows some instructions to have two vector
source operands read from the same vector register but with different
EEW.  For example, a vector indexed store with the index vector and
data vector overlapping, but different EEW.  Or a widening vector add
(vwadd.wv) where the two vector sources overlap but have different
EEW.  This complicates implementations that internally restripe the
vector data (e.g., internal SLEN), and does not have a valid software
use (cue folks furiously trying to construct one...).
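The case described above can be sketched as a simple check in Python. This is an illustrative model only: each source operand is represented as a hypothetical `(base_reg, eew, emul)` tuple, and `sources_conflict` flags the situation where two source register groups share at least one register while being read with different EEW.

```python
def reg_span(base, emul):
    """Set of register numbers covered by a group starting at base."""
    n = max(1, int(emul))  # fractional EMUL still occupies one register
    return set(range(base, base + n))


def sources_conflict(src_a, src_b):
    """True if two sources overlap in registers but differ in EEW.

    Each source is a (base_reg, eew, emul) tuple (an assumed representation).
    """
    (reg_a, eew_a, emul_a), (reg_b, eew_b, emul_b) = src_a, src_b
    overlap = reg_span(reg_a, emul_a) & reg_span(reg_b, emul_b)
    return bool(overlap) and eew_a != eew_b
```

As an example, take vwadd.wv with SEW=16 and LMUL=1: the wide source has EEW=32 and EMUL=2, the narrow source has EEW=16 and EMUL=1. With the wide source at v4 (covering v4 and v5) and the narrow source at v5, `sources_conflict((4, 32, 2), (5, 16, 1))` is true; moving the narrow source to v6 removes the overlap.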

The proposal is to allow implementations to raise an illegal
instruction exception in this case.  I believe this is an important
and necessary change to accommodate internal striping.  In practice,
this change has no impact on software.

We do have a choice of:

1) Mandate all implementations raise an illegal exception in this
case.  This is my preferred route, as this would be a minor errata for
existing implementations (doesn't affect software), and we would not reuse
this state/encoding for other purposes.

2) Allow either correct execution or illegal exception (as with
misaligned). 

3) Consider "reserved", implying implementations that support it are
non-conforming unless we later go with 2).

I'm assuming we're going to push to ratify 1) unless I hear strong objections.

Krste






Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
 

In this case, the trap cause can be determined by looking at the vtype
value and the instruction encoding (most only need to look at
instruction encoding), independent of implementation. No vtype
probing is needed.

(assuming there isn't some non-conforming use of the encoding, which
is out-of-scope for any discussion of standard trap handlers)
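As a sketch of what "looking at the vtype value" could involve in a handler, the Python below decodes the low vtype fields, assuming the v1.0 layout (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7). The function name `decode_vtype` and the returned dictionary are illustrative choices; the vill bit at the top of the register is not modeled here.

```python
from fractions import Fraction

# vlmul encodings assumed from the v1.0 layout; 0b100 is reserved.
LMUL_BY_VLMUL = {
    0b000: Fraction(1), 0b001: Fraction(2), 0b010: Fraction(4), 0b011: Fraction(8),
    0b101: Fraction(1, 8), 0b110: Fraction(1, 4), 0b111: Fraction(1, 2),
}


def decode_vtype(vtype):
    """Extract SEW, LMUL, vta and vma from the low bits of a vtype value."""
    vlmul = vtype & 0b111
    vsew = (vtype >> 3) & 0b111
    return {
        "sew": 8 << vsew if vsew <= 3 else None,  # None marks a reserved vsew
        "lmul": LMUL_BY_VLMUL.get(vlmul),         # None marks a reserved vlmul
        "vta": (vtype >> 6) & 1,
        "vma": (vtype >> 7) & 1,
    }
```

A handler built along these lines would combine the decoded SEW and LMUL with the faulting instruction's register fields, and needs no implementation-specific probing.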

Krste

On Mon, 15 Nov 2021 15:49:03 -0800, Guy Lemieux <guy.lemieux@gmail.com> said:
| To determine the trap cause, without such a bit, software will have to examine many possible vtype settings that are unique for each particular
| instruction. The trap handler will be highly customized for each cpu implementation.

| This could be done more easily in a handful of logic gates, without a vastly different flow in the trap handler (which will already know to
| check vill).

| Guy

| On Mon, Nov 15, 2021 at 3:24 PM Krste Asanovic <krste@sifive.com> wrote:

| On Nov 15, 2021, at 3:13 PM, Guy Lemieux <guy.lemieux@gmail.com> wrote:

| On Mon, Nov 15, 2021 at 2:17 PM Bill Huffman <huffman@cadence.com> wrote:

| I'm glad this came up.  I certainly wouldn't want to try to make an implementation work for these cases.  😊

| I lean a bit toward #3, not so much because we might use the space as because I think we've called all the other similar
| corners of opcode space that don't make sense to implement "reserved."  Possibly that's because they might make sense someday
| and this won't.

| I think these encodings are qualitatively different from other nooks and crannies, since their availability is a function of the
| dynamic vtype setting.  So we can rationalize the departure from the normal practice of marking the state reserved.

| Ok, this makes the opcodes virtually useless for other instructions.

| Instead, shouldn't we be setting a bit similar to vill?  I realize vill is only set on illegal vset* instructions; in this case it
| would be a new bit which is only set on executing instructions that are incompatible with the current (but otherwise valid) vtype ?

| Guy

| There’s no benefit to setting vill versus just taking a trap in this case.

| Vill is there so we don’t have to add the first trap on a write of a particular data value, and also to provide a discovery mechanism.

| Krste


Re: RISC-V Vector Extension post-public review updates

Guy Lemieux
 


To determine the trap cause, without such a bit, software will have to examine many possible vtype settings that are unique for each particular instruction. The trap handler will be highly customized for each cpu implementation.

This could be done more easily in a handful of logic gates, without a vastly different flow in the trap handler (which will already know to check vill).

Guy


On Mon, Nov 15, 2021 at 3:24 PM Krste Asanovic <krste@...> wrote:


On Nov 15, 2021, at 3:13 PM, Guy Lemieux <guy.lemieux@...> wrote:

On Mon, Nov 15, 2021 at 2:17 PM Bill Huffman <huffman@...> wrote:
I'm glad this came up.  I certainly wouldn't want to try to make an implementation work for these cases.  😊

I lean a bit toward #3, not so much because we might use the space as because I think we've called all the other similar corners of opcode space that don't make sense to implement "reserved."  Possibly that's because they might make sense someday and this won't.

I think these encodings are qualitatively different from other nooks and crannies, since their availability is a function of the dynamic vtype setting.  So we can rationalize the departure from the normal practice of marking the state reserved.

Ok, this makes the opcodes virtually useless for other instructions.

Instead, shouldn't we be setting a bit similar to vill?  I realize vill is only set on illegal vset* instructions; in this case it would be a new bit which is only set on executing instructions that are incompatible with the current (but otherwise valid) vtype ?

Guy

There’s no benefit to setting vill versus just taking a trap in this case.

Vill is there so we don’t have to add the first trap on a write of a particular data value, and also to provide a discovery mechanism.

Krste



Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
 

I guess simpler examples are anytime you use v0 as mask and a data
source.

These aren't useful use-cases, so existing software shouldn't have
been doing this (except test code).

Krste

On Mon, 15 Nov 2021 15:45:14 -0800, Nick Knight <nick.knight@sifive.com> said:
| On Mon, Nov 15, 2021 at 3:40 PM Krste Asanovic <krste@sifive.com> wrote:
| I'm not sure if C intrinsics can generate this case,

| https://godbolt.org/z/qj6WzYc76
|  

| but there are
| other cases where dynamic value settings can result in illegal
| instruction traps, so the result would be the same that
| implementations will either trap or do something non-conforming.

| Krste

|||||| On Mon, 15 Nov 2021 15:28:25 -0800, Craig Topper <craig.topper@sifive.com> said:

| | On Nov 15, 2021, at 3:24 PM, Krste Asanovic <krste@sifive.com> wrote:
| |         On Nov 15, 2021, at 3:13 PM, Guy Lemieux <guy.lemieux@gmail.com> wrote:

| |             On Mon, Nov 15, 2021 at 2:17 PM Bill Huffman <huffman@cadence.com> wrote:

| |                 I'm glad this came up.  I certainly wouldn't want to try to make an implementation work for these cases.  😊

| |                 I lean a bit toward #3, not so much because we might use the space as because I think we've called all the other similar
| |                 corners of opcode space that don't make sense to implement "reserved."  Possibly that's because they might make sense someday
| |                 and this won't.

| |             I think these encodings are qualitatively different from other nooks and crannies, since their availability is a function of the
| |             dynamic vtype setting.  So we can rationalize the departure from the normal practice of marking the state reserved.

| |         Ok, this makes the opcodes virtually useless for other instructions.

| |         Instead, shouldn't we be setting a bit similar to vill?  I realize vill is only set on illegal vset* instructions; in this case it
| |         would be a new bit which is only set on executing instructions that are incompatible with the current (but otherwise valid) vtype?

| |         Guy

| |     There’s no benefit to setting vill versus just taking a trap in this case.

| |     Vill is there so we don’t have to add the first trap on a write of a particular data value, and also to provide a discovery mechanism.

| |     Krste

| | Is it possible to generate one of these cases from C with crazy uses of vreinterpret and vget/vset intrinsics? What should the compiler do for
| | such code?

| | Craig

| |     

|


Re: RISC-V Vector Extension post-public review updates

Nick Knight
 

On Mon, Nov 15, 2021 at 3:40 PM Krste Asanovic <krste@...> wrote:

I'm not sure if C intrinsics can generate this case,

https://godbolt.org/z/qj6WzYc76

but there are
other cases where dynamic value settings can result in illegal
instruction traps, so the result would be the same that
implementations will either trap or do something non-conforming.

Krste

>>>>> On Mon, 15 Nov 2021 15:28:25 -0800, Craig Topper <craig.topper@...> said:

| On Nov 15, 2021, at 3:24 PM, Krste Asanovic <krste@...> wrote:
|         On Nov 15, 2021, at 3:13 PM, Guy Lemieux <guy.lemieux@...> wrote:

|             On Mon, Nov 15, 2021 at 2:17 PM Bill Huffman <huffman@...> wrote:

|                 I'm glad this came up.  I certainly wouldn't want to try to make an implementation work for these cases.  😊

|                 I lean a bit toward #3, not so much because we might use the space as because I think we've called all the other similar
|                 corners of opcode space that don't make sense to implement "reserved."  Possibly that's because they might make sense someday
|                 and this won't.

|             I think these encodings are qualitatively different from other nooks and crannies, since their availability is a function of the
|             dynamic vtype setting.  So we can rationalize the departure from the normal practice of marking the state reserved.

|         Ok, this makes the opcodes virtually useless for other instructions.

|         Instead, shouldn't we be setting a bit similar to vill?  I realize vill is only set on illegal vset* instructions; in this case it
|         would be a new bit which is only set on executing instructions that are incompatible with the current (but otherwise valid) vtype ?

|         Guy

|     There’s no benefit to setting vill versus just taking a trap in this case.

|     Vill is there so we don’t have to add the first trap on a write of a particular data value, and also to provide a discovery mechanism.

|     Krste

| Is it possible to generate one of these cases from C with crazy uses of vreinterpret and vget/vset intrinsics? What should the compiler do for
| such code?

| Craig

|     






Re: RISC-V Vector Extension post-public review updates

Krste Asanovic
 

I'm not sure if C intrinsics can generate this case, but there are
other cases where dynamic value settings can result in illegal
instruction traps, so the result would be the same that
implementations will either trap or do something non-conforming.

Krste

On Mon, 15 Nov 2021 15:28:25 -0800, Craig Topper <craig.topper@sifive.com> said:
| On Nov 15, 2021, at 3:24 PM, Krste Asanovic <krste@sifive.com> wrote:
| On Nov 15, 2021, at 3:13 PM, Guy Lemieux <guy.lemieux@gmail.com> wrote:

| On Mon, Nov 15, 2021 at 2:17 PM Bill Huffman <huffman@cadence.com> wrote:

| I'm glad this came up. I certainly wouldn't want to try to make an implementation work for these cases. 😊

| I lean a bit toward #3, not so much because we might use the space as because I think we've called all the other similar
| corners of opcode space that don't make sense to implement "reserved." Possibly that's because they might make sense someday
| and this won't.

| I think these encodings are qualitatively different from other nooks and crannies, since their availability is a function of the
| dynamic vtype setting. So we can rationalize the departure from the normal practice of marking the state reserved.

| Ok, this makes the opcodes virtually useless for other instructions.

| Instead, shouldn't we be setting a bit similar to vill? I realize vill is only set on illegal vset* instructions; in this case it
| would be a new bit which is only set on executing instructions that are incompatible with the current (but otherwise valid) vtype ?

| Guy

| There’s no benefit to setting vill versus just taking a trap in this case.

| Vill is there so we don’t have to add the first trap on a write of a particular data value, and also to provide a discovery mechanism.

| Krste

| Is it possible to generate one of these cases from C with crazy uses of vreinterpret and vget/vset intrinsics? What should the compiler do for
| such code?

| Craig
