
Re: Decompress Instruction

lidawei14@...
 

Thanks Krste, that makes sense, but the logic is not that straightforward. People usually need "decompress" when they are using "compress"; maybe we can add a comment about this in the "vcompress" section?


Decompress Instruction

Krste Asanovic
 

If decompress is the inverse of compress, then there will be a
packed vector holding the non-zero elements and a bit mask indicating
which elements should receive the packed elements after unpacking:

7 6 5 4 3 2 1 0 # vid

e d c b a # packed vector of 5 elements
1 0 0 1 1 1 0 1 # mask vector of 8 elements

e 0 0 d c b 0 a # result of decompress

This can be synthesized using viota and a masked vrgather:

1 0 0 1 1 1 0 1 # mask vector
4 4 4 3 2 1 1 0 # viota.m
0 0 0 0 0 0 0 0 # zero result register
e 0 0 d c b 0 a # vrgather using viota.m under mask

The code is:

# v0 holds mask
# v1 holds packed data
# v11 holds decompressed data
viota.m v10, v0 # Calc iota from mask in v0
vmv.v.i v11, 0 # Clear destination
vrgather.vv v11, v1, v10, v0.t # Expand into destination

So decompress is quite fast already.
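As a cross-check, the viota.m + masked vrgather sequence above can be modeled in plain C (a behavioral sketch of the semantics, not an implementation; the function name is made up):

```c
#include <stddef.h>

/* Model of decompress: expand `packed` into `result` at the positions
 * where `mask` is set, mirroring viota.m + masked vrgather.vv. */
void decompress(const int *packed, const unsigned char *mask,
                int *result, size_t n) {
    size_t iota = 0;                 /* running count of set mask bits (viota.m) */
    for (size_t i = 0; i < n; i++) {
        result[i] = 0;               /* cleared destination (vmv.v.i) */
        if (mask[i])                 /* masked vrgather: result[i] = packed[iota] */
            result[i] = packed[iota++];
    }
}
```

With the example above (the diagrams put element 7 on the left, so mask elements 0..7 are 1,0,1,1,1,0,0,1 and the packed vector is a,b,c,d,e), this reproduces the "e 0 0 d c b 0 a" result.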

The reason there is a compress instruction is that it cannot be
synthesized from other instructions in the same way. You could
provide a "compress bit mask into packed indices" instruction, then do
a vrgather, but that is not much simpler than just doing the
compress.

Krste

On Thu, 03 Sep 2020 00:12:51 -0700, "lidawei14 via lists.riscv.org" <lidawei14=huawei.com@...> said:
| Hi all,
| For common AI workloads such as DNNs, data communication between network layers puts huge pressure
| on the capacity and bandwidth of the memory hierarchy.
| For instance, large dynamic activation or feature-map data needs to be buffered and communicated across
| multiple layers, and this data is often sparse (e.g., after ReLU).
| People use bit vectors to "compress" the buffered data and "decompress" it for the following layer's
| computations.

| Here we can see from the spec that "vcompress" has already been included, how about "vdecompress"?

| Thanks,
| Dawei
|


Decompress Instruction

lidawei14@...
 

Hi all,

For common AI workloads such as DNNs, data communication between network layers puts huge pressure on the capacity and bandwidth of the memory hierarchy.
For instance, large dynamic activation or feature-map data needs to be buffered and communicated across multiple layers, and this data is often sparse (e.g., after ReLU).
People use bit vectors to "compress" the buffered data and "decompress" it for the following layer's computations.

Here we can see from the spec that "vcompress" has already been included; how about "vdecompress"?

Thanks,
Dawei


Re: EEW and non-indexed loads/stores

Krste Asanovic
 

Correct,
Krste

On Sep 2, 2020, at 11:10 PM, Roger Ferrer Ibanez <roger.ferrer@...> wrote:

Hi all,

I understand that the EEW, as explicitly encoded in the load/store instructions, applies to the vector of indices for the indexed loads and stores. For instance, we can load a vector of "SEW=8,LMUL=1" using a vector of indices of "EEW=64,EMUL=8" by making sure vtype has "SEW=8,LMUL=1" and using v{l,s}xei64.

I'd like to confirm I'm understanding correctly the EEW for unit-stride and strided loads and stores.

Say that vtype is such that SEW=16,LMUL=1 and we execute a v{l,s}{,s}e32.v. Now the EEW of the data and address operands is EEW=32 (as encoded in the instruction) so EMUL=(EEW/SEW)*LMUL=(32/16)*1=2. So in this case we're loading/storing a vector SEW=32,LMUL=2.

Is my interpretation correct?

If it is, I assume this is useful in sequences such as the following one

# SEW=16,LMUL=1
vle16.v v1, (t0) # Load a vector of sew=16,lmul=1
vle32.v v2, (t1) # Load a vector of sew=32,lmul=2, cool, no need to change vtype
vwadd.wv v4, v2, v1 # v4_v5(32)[:] ← v2_v3(32)[:] + sign-extend(v1(16)[:])
vse32.v v4, (t1) # Store a vector of sew=32,lmul=2, no need to change vtype either

Thank you,

--
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación


http://bsc.es/disclaimer



EEW and non-indexed loads/stores

Roger Ferrer Ibanez
 

Hi all,

I understand that the EEW, as explicitly encoded in the load/store instructions, applies to the vector of indices for the indexed loads and stores. For instance, we can load a vector of "SEW=8,LMUL=1" using a vector of indices of "EEW=64,EMUL=8" by making sure vtype has "SEW=8,LMUL=1" and using v{l,s}xei64.

I'd like to confirm I'm understanding correctly the EEW for unit-stride and strided loads and stores.

Say that vtype is such that SEW=16,LMUL=1 and we execute a v{l,s}{,s}e32.v. Now the EEW of the data and address operands is EEW=32 (as encoded in the instruction) so EMUL=(EEW/SEW)*LMUL=(32/16)*1=2. So in this case we're loading/storing a vector SEW=32,LMUL=2.

Is my interpretation correct?

If it is, I assume this is useful in sequences such as the following one

# SEW=16,LMUL=1
vle16.v v1, (t0)    # Load a vector of sew=16,lmul=1
vle32.v v2, (t1)    # Load a vector of sew=32,lmul=2, cool, no need to change vtype
vwadd.wv v4, v2, v1 # v4_v5(32)[:] ← v2_v3(32)[:] + sign-extend(v1(16)[:])
vse32.v v4, (t1)    # Store a vector of sew=32,lmul=2, no need to change vtype either
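The EMUL arithmetic above (EMUL = (EEW/SEW) * LMUL) can be sanity-checked with a tiny helper; the name and the times-8 scaling (so fractional LMUL from 1/8 to 8 stays in integers) are just conventions for this sketch:

```c
/* EMUL rule for unit-stride/strided accesses: the instruction's EEW
 * replaces SEW, and EMUL is scaled so the element count is unchanged.
 * Returns EMUL*8; pass LMUL*8 the same way. (Hypothetical helper,
 * only to check the arithmetic in the message.) */
int emul_times8(int eew, int sew, int lmul_times8) {
    return eew * lmul_times8 / sew;  /* EMUL = (EEW/SEW) * LMUL */
}
```

For the example in the message, emul_times8(32, 16, 8) gives 16, i.e. EMUL=2; for the indexed-load index vector, emul_times8(64, 8, 8) gives 64, i.e. EMUL=8.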

Thank you,

--
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación


http://bsc.es/disclaimer


Re: Signed v Unsigned Immediate: vsaddu.vi

Nick Knight
 

Hi Cohen,

Thanks for your careful reading.

Hopefully this edit clarifies some of the ambiguity: https://github.com/riscv/riscv-v-spec/pull/565

Best,
Nick Knight


On Wed, Sep 2, 2020 at 2:44 PM Andrew Waterman <andrew@...> wrote:
The non-normative text you quoted should be edited to delete the words “it is signed”.

The immediate is sign-extended, but then is treated as an unsigned value. So the operation doesn’t differ based on the argument type.

(This sign-extended-but-unsigned-immediate pattern also exists for e.g. sltiu in the base ISA and vmsgtu.vi in the vector extension.)

On Wed, Sep 2, 2020 at 1:58 PM CDS <cohen.steed@...> wrote:

From chapter 11, section 1 (#3):

"The 5-bit immediate is unsigned when either providing a register index in vrgather or a count for shift, clip, or slide. In all other cases it is signed and sign-extended to SEW bits, even for bitwise and unsigned instructions, notably compare and add."

From chapter 13, section 1:
Saturating forms of integer add and subtract are provided, for both signed and unsigned integers. If the result would overflow the destination, the result is replaced with the closest representable value, and the vxsat bit is set.

This results in a conundrum:
operation  SEW   RS1   RS2
vsaddu.vv  8  0x0ff   0x01
vsaddu.vi  8  0x01f   0x01

These two operations now provide a difference of result.
Taking the maximum unsigned integer value, adding one, causes saturation. The result value for the vector-vector operation would be 0xff and the VXSAT bit would be set. This shouldn't be a surprise.

However, the immediate form is more difficult. The immediate value is sign-extended to SEW size and treated as a signed value. This means the arithmetic is now (-1) + 1 = 0. This does not create a saturation (a value outside expected return parameters). The result value from the vector-immediate operation would be 0x1f and the VXSAT bit would be clear.

This is from the specification, as written, in a strict sense.

From a use-case sense, what is trying to be accomplished, here? Two counter perspectives:
1 - from a use-case perspective, why would a programmer or compiler specifically pick an unsigned operation, only to operate on values using a signed immediate in a signed format? I'm curious what this case is.
2 - from an architecture/implementation perspective, this is the first time that an engine will have to operate on an instruction differently based on the *source* of the operand. That is, more narrowly, the arithmetic engines are given an operation encoding (usually an "onto" mapping from the opcode space) and operands, but do not care where the operands came from. In other words, the vector engine itself would receive a full bit set in RS1 for both cases, above, for a saturating unsigned (sorta) add. However, the outcome is required to be different?

I would imagine others have run into this situation, and I'd like to know both the intent of having a signed-immediate value for this unsigned operation, as well as the applicability of section 11.1 to this instruction.












Re: Signed v Unsigned Immediate: vsaddu.vi

Andrew Waterman
 

The non-normative text you quoted should be edited to delete the words "it is signed".

The immediate is sign-extended, but then is treated as an unsigned value. So the operation doesn’t differ based on the argument type.

(This sign-extended-but-unsigned-immediate pattern also exists for e.g. sltiu in the base ISA and vmsgtu.vi in the vector extension.)
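That behavior (sign-extend the 5-bit immediate to SEW bits, then treat it as unsigned) can be sketched for SEW=8; this is a model of the semantics described here, not normative text, and the function name is made up:

```c
#include <stdint.h>

/* Sketch of vsaddu.vi for SEW=8: the 5-bit immediate is first
 * sign-extended to 8 bits, then the add saturates as unsigned. */
uint8_t vsaddu_imm8(uint8_t elem, unsigned imm5, int *vxsat) {
    /* sign-extend the 5-bit immediate (bit 4 is the sign bit) */
    uint8_t imm = (imm5 & 0x10) ? (uint8_t)(imm5 | 0xE0) : (uint8_t)imm5;
    unsigned sum = (unsigned)elem + imm;
    if (sum > 0xFF) { *vxsat = 1; return 0xFF; }  /* unsigned saturation */
    return (uint8_t)sum;
}
```

So vsaddu.vi with imm=0x1f behaves exactly like vsaddu.vv with a register holding 0xFF: 0x01 + 0xFF saturates to 0xFF and vxsat is set, so the .vi and .vv forms agree.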

On Wed, Sep 2, 2020 at 1:58 PM CDS <cohen.steed@...> wrote:

From chapter 11, section 1 (#3):

"The 5-bit immediate is unsigned when either providing a register index in vrgather or a count for shift, clip, or slide. In all other cases it is signed and sign-extended to SEW bits, even for bitwise and unsigned instructions, notably compare and add."

From chapter 13, section 1:
Saturating forms of integer add and subtract are provided, for both signed and unsigned integers. If the result would overflow the destination, the result is replaced with the closest representable value, and the vxsat bit is set.

This results in a conundrum:
operation  SEW   RS1   RS2
vsaddu.vv  8  0x0ff   0x01
vsaddu.vi  8  0x01f   0x01

These two operations now provide a difference of result.
Taking the maximum unsigned integer value, adding one, causes saturation. The result value for the vector-vector operation would be 0xff and the VXSAT bit would be set. This shouldn't be a surprise.

However, the immediate form is more difficult. The immediate value is sign-extended to SEW size and treated as a signed value. This means the arithmetic is now (-1) + 1 = 0. This does not create a saturation (a value outside expected return parameters). The result value from the vector-immediate operation would be 0x1f and the VXSAT bit would be clear.

This is from the specification, as written, in a strict sense.

From a use-case sense, what is trying to be accomplished, here? Two counter perspectives:
1 - from a use-case perspective, why would a programmer or compiler specifically pick an unsigned operation, only to operate on values using a signed immediate in a signed format? I'm curious what this case is.
2 - from an architecture/implementation perspective, this is the first time that an engine will have to operate on an instruction differently based on the *source* of the operand. That is, more narrowly, the arithmetic engines are given an operation encoding (usually an "onto" mapping from the opcode space) and operands, but do not care where the operands came from. In other words, the vector engine itself would receive a full bit set in RS1 for both cases, above, for a saturating unsigned (sorta) add. However, the outcome is required to be different?

I would imagine others have run into this situation, and I'd like to know both the intent of having a signed-immediate value for this unsigned operation, as well as the applicability of section 11.1 to this instruction.












Signed v Unsigned Immediate: vsaddu.vi

CDS <cohen.steed@...>
 

From chapter 11, section 1 (#3):

"The 5-bit immediate is unsigned when either providing a register index in vrgather or a count for shift, clip, or slide. In all other cases it is signed and sign-extended to SEW bits, even for bitwise and unsigned instructions, notably compare and add."


From chapter 13, section 1:
Saturating forms of integer add and subtract are provided, for both signed and unsigned integers. If the result would overflow the destination, the result is replaced with the closest representable value, and the vxsat bit is set.

This results in a conundrum:
operation  SEW   RS1   RS2
vsaddu.vv  8  0x0ff   0x01
vsaddu.vi  8  0x01f   0x01

These two operations now provide a difference of result.
Taking the maximum unsigned integer value, adding one, causes saturation. The result value for the vector-vector operation would be 0xff and the VXSAT bit would be set. This shouldn't be a surprise.

However, the immediate form is more difficult. The immediate value is sign-extended to SEW size and treated as a signed value. This means the arithmetic is now (-1) + 1 = 0. This does not create a saturation (a value outside expected return parameters). The result value from the vector-immediate operation would be 0x1f and the VXSAT bit would be clear.

This is from the specification, as written, in a strict sense.

From a use-case sense, what is trying to be accomplished, here? Two counter perspectives:
1 - from a use-case perspective, why would a programmer or compiler specifically pick an unsigned operation, only to operate on values using a signed immediate in a signed format? I'm curious what this case is.
2 - from an architecture/implementation perspective, this is the first time that an engine will have to operate on an instruction differently based on the *source* of the operand. That is, more narrowly, the arithmetic engines are given an operation encoding (usually an "onto" mapping from the opcode space) and operands, but do not care where the operands came from. In other words, the vector engine itself would receive a full bit set in RS1 for both cases, above, for a saturating unsigned (sorta) add. However, the outcome is required to be different?

I would imagine others have run into this situation, and I'd like to know both the intent of having a signed-immediate value for this unsigned operation, as well as the applicability of section 11.1 to this instruction.


Cancelling Vector TG meeting today

Krste Asanovic
 

Sorry for late notice, but I have to cancel the vector tech meeting today,

Krste


Re: GNU toolchain with RVV intrinsic support

David Horner
 

Thank you for the clarification. 
Excellent.

On Mon, Aug 24, 2020, 17:35 Bruce Hoult, <bruce@...> wrote:
On Tue, Aug 25, 2020 at 5:34 AM David Horner <ds2horner@...> wrote:
Thank you very much for this advancement.
I have two concerns, in the body is a response.
.

On 2020-08-21 9:34 a.m., Kito Cheng wrote:
I am pleased to announce that our/SiFive's RVV intrinsic enabled GCC are open-sourced now.

We put the sources on riscv's github, and the RVV intrinsics have been integrated in the riscv-gnu-toolchain, so you can build the RVV intrinsic enabled GNU toolchain as usual.

 $ git clone git@...:riscv/riscv-gnu-toolchain.git -b rvv-intrinsic
 $ <path-to-riscv-gnu-toolchain>/configure --with-arch=rv64gcv_zfh --prefix=<INSTALL-PATH>
 $ make newlib build-qemu
 $ cat rvv_vadd.c
>
> #include <riscv_vector.h>
> #include <stdio.h>
>
> void vec_add_rvv
Shouldn't this be vec_add32_rvv ? It is not a generalized vector add.

The user can call functions anything they want. The example might be better if this was clear by calling it foo() or demo_vector_add() or something.
 
(int *a, int *b, int *c, size_t n) {
>   size_t vl;
>   vint32m2_t va, vb, vc;
>   for (;vl = vsetvl_e32m2 (n);n -= vl) {
>     vb = vle32_v_i32m2 (b);
>     vc = vle32_v_i32m2 (c);
>     va = vadd_vv_i32m2 (vb, vc);
>     vse32_v_i32m2 (a, va);
>     a += vl;
The vector pointer should be advanced by vl * 32.

The variable "a" is an "int *" pointer. When you add an integer to it, C automatically scales the integer (vl) by sizeof(int).


Re: GNU toolchain with RVV intrinsic support

Bruce Hoult
 

On Tue, Aug 25, 2020 at 5:34 AM David Horner <ds2horner@...> wrote:
Thank you very much for this advancement.
I have two concerns, in the body is a response.
.

On 2020-08-21 9:34 a.m., Kito Cheng wrote:
I am pleased to announce that our/SiFive's RVV intrinsic enabled GCC are open-sourced now.

We put the sources on riscv's github, and the RVV intrinsics have been integrated in the riscv-gnu-toolchain, so you can build the RVV intrinsic enabled GNU toolchain as usual.

 $ git clone git@...:riscv/riscv-gnu-toolchain.git -b rvv-intrinsic
 $ <path-to-riscv-gnu-toolchain>/configure --with-arch=rv64gcv_zfh --prefix=<INSTALL-PATH>
 $ make newlib build-qemu
 $ cat rvv_vadd.c
>
> #include <riscv_vector.h>
> #include <stdio.h>
>
> void vec_add_rvv
Shouldn't this be vec_add32_rvv ? It is not a generalized vector add.

The user can call functions anything they want. The example might be better if this was clear by calling it foo() or demo_vector_add() or something.
 
(int *a, int *b, int *c, size_t n) {
>   size_t vl;
>   vint32m2_t va, vb, vc;
>   for (;vl = vsetvl_e32m2 (n);n -= vl) {
>     vb = vle32_v_i32m2 (b);
>     vc = vle32_v_i32m2 (c);
>     va = vadd_vv_i32m2 (vb, vc);
>     vse32_v_i32m2 (a, va);
>     a += vl;
The vector pointer should be advanced by vl * 32.

The variable "a" is an "int *" pointer. When you add an integer to it, C automatically scales the integer (vl) by sizeof(int).


Re: GNU toolchain with RVV intrinsic support

David Horner
 

Thank you very much for this advancement.
I have two concerns, in the body is a response.
.

On 2020-08-21 9:34 a.m., Kito Cheng wrote:
I am pleased to announce that our/SiFive's RVV intrinsic enabled GCC are open-sourced now.

We put the sources on riscv's github, and the RVV intrinsics have been integrated in the riscv-gnu-toolchain, so you can build the RVV intrinsic enabled GNU toolchain as usual.

 $ git clone git@...:riscv/riscv-gnu-toolchain.git -b rvv-intrinsic
 $ <path-to-riscv-gnu-toolchain>/configure --with-arch=rv64gcv_zfh --prefix=<INSTALL-PATH>
 $ make newlib build-qemu
 $ cat rvv_vadd.c
>
> #include <riscv_vector.h>
> #include <stdio.h>
>
> void vec_add_rvv
Shouldn't this be vec_add32_rvv ? It is not a generalized vector add.
(int *a, int *b, int *c, size_t n) {
>   size_t vl;
>   vint32m2_t va, vb, vc;
>   for (;vl = vsetvl_e32m2 (n);n -= vl) {
>     vb = vle32_v_i32m2 (b);
>     vc = vle32_v_i32m2 (c);
>     va = vadd_vv_i32m2 (vb, vc);
>     vse32_v_i32m2 (a, va);
>     a += vl;
The vector pointer should be advanced by vl * 32.
(I originally thought the vl = vsetvl may have done the scaling by 32, and that n was in bytes,
but I have now convinced myself that the problem is likely the pointer advance,
and that VLEN is at least 256, so there is only one pass of the loop for the test case below.)
>     b += vl;
>     c += vl;
>   }
> }
>
> int x[10] = {1,2,3,4,5,6,7,8,9,0};
> int y[10] = {0,9,8,7,6,5,4,3,2,1};
> int z[10];
>
> int main()
> {
>   int i;
>   vec_add_rvv(z, x, y, 10);

>   for (i=0; i<10; i++)
>     printf ("%d ", z[i]);
>   printf("\n");
>   return 0;
> }

 $ riscv64-unknown-elf-gcc rvv_vadd.c -O2
 $ qemu-riscv64 -cpu rv64,x-v=true,vlen=256,elen=64,vext_spec=v1.0 a.out

It is verified with our internal testsuite and several internal projects, however this project is still a work in progress, and we intend to improve the work continually. Feedback and bug reports are welcome, as well as contributions and pull-requests.

Current status:
- Implemented ~95% of the RVV intrinsic functions listed in the intrinsic spec (https://github.com/riscv/rvv-intrinsic-doc)
- FP16 supported for both vector and scalar.
  - fp16 uses __fp16 temporarily; this might change in the future.
- Fractional LMUL is not implemented yet.
- RV32 is not well supported for scalar-vector operations with SEW=64.
- Function calls with vector types are not well supported yet; arguments will be passed/returned in memory in the current implementation.
- *NO* auto vectorization support.
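The rvv_vadd.c example above uses SiFive's early intrinsics, so it only builds with that toolchain. For reference, here is a portable scalar sketch of the same strip-mining pattern (process vl = min(n, VLMAX) elements per trip, bump the pointers by vl elements, repeat); VLMAX=8 is an arbitrary stand-in for whatever vsetvl would return:

```c
#include <stddef.h>

#define VLMAX 8  /* stand-in for the hardware vector length */

/* Scalar sketch of the strip-mined loop: each trip handles a chunk
 * of vl = min(n, VLMAX) elements, like vsetvl_e32m2(n). */
void vec_add(int *a, const int *b, const int *c, size_t n) {
    while (n > 0) {
        size_t vl = n < VLMAX ? n : VLMAX;  /* vsetvl */
        for (size_t i = 0; i < vl; i++)     /* vle32 / vadd / vse32 */
            a[i] = b[i] + c[i];
        a += vl; b += vl; c += vl;          /* advance by vl *elements*:
                                               C scales by sizeof(int) */
        n -= vl;
    }
}
```

The pointer bump is the point Bruce makes in the thread: "a += vl" already advances by vl ints, not vl bytes, because C scales pointer arithmetic by the pointed-to type.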


Re: V extension groups analogue to the standard groups

mark
 

Just a reminder that we will differentiate between branding (i.e. what we trademark and what members can advertise) and internal use (like uname in linux vs. splash screen, etc.).

The proposed policy is under review in the policies/proposed folder.

On Sun, Aug 23, 2020 at 3:26 PM Simon Davidmann Imperas <simond@...> wrote:
thanks - I am OK with whichever you choose.

On Sat, Aug 22, 2020 at 12:30 AM Andrew Waterman <andrew@...> wrote:


On Fri, Aug 21, 2020 at 2:43 PM Simon Davidmann <simond@...> wrote:
A question to clarify. You state:
      RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.  

This means if I understand you that we need to add F to get F hardware in the vector unit - so RV32IV means V with no F hardware, and RV32IFV includes F hardware.

So for consistency...

What does RV32IV means for M hardware multiply - do I need to RV32IMV to get scalar and vector hardware multiply?

I don’t believe the spec explicitly addresses this question, but I agree it makes sense. Alternatively, V could require M, since it doesn’t make much sense to pay for a vector unit but be too stingy to pay for a multiplier. But that might be less consistent. (My recommendation is that RV32IV continue to mean “no multiplier”, even though it’s a silly configuration.)


RV32IV means no F and no M hardware? - so I need to explicitly include the extensions I need as V assumes nothing but I?

My recommendation is to clarify in the spec that RV32IV is a valid config with no FPU in the vector unit, and RV32IFV is also a valid config with an FPU in both scalar and vector.


Or is something assumed for M?

If we choose to define that V implies M, RV32IV and RV32IMV would be synonyms.


thanks

On Thu, Aug 20, 2020 at 8:48 PM Andrew Waterman <andrew@...> wrote:
Quad-widening ops have been moved to a separate extension, Zvqmac.

I believe the intent is that the capital-V V extension supports the same FP datatypes as the scalar ISA, so e.g., RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.

I’m surprised all those hashtags made it past the spam filter!

On Thu, Aug 20, 2020 at 11:42 AM Strauch, Tobias (HENSOLDT Cyber GmbH) <tobias.strauch@...> wrote:

Apologies if this is old stuff already dismissed. But I give it a try anyway.


Wouldn't it make sense to separate more complex vector instructions from more trivial ones? Already with the very first base release ? Vector instructions can also be helpful in small devices #IOT #Edge #GAP8 #RISCY without the need to fully support floating point instructions or without the need for a quad multiply.


The suggestion would be to basically group vector extensions analogue to the standard instructions (I, M, F, D, Q, …), instead of having an already complex base and then subtract or re-define subsets of instructions again ?


Wouldn't that be in-line with the RISC-V philosophy of modularity and simplicity ? The beauty would be that you have a non-vector and a vector group version.


Possible nomenclature based on order:


M: Standard Multiply Divide Instructions (MUL, ...)

V: Very Basic Vector Instructions (VSETVL, ...)

MV: Standard Multiply Divide Instructions and Very Basic Vector Instructions (MUL, VSETVL, ...

VM: Standard Multiply Divide Instructions, Very Basic Vector Instructions and Vector Integer Multiply\Divide Instructions (MUL, VSETVL, VMUL, ...)


F, D, Q analogue to M as suggested.


The V version will not be a 1:1 match with the standard version and will cover additional aspects. But it can be argued, that when you implement the V version (of M, F, D, Q, ...), then you most likely will have the relevant standard counterparts implemented as well anyway.


Kind Regards, Tobias



--
====================================================================
The information contained in this electronic mail message and any attachments hereto
is privileged and confidential information intended only for the use of the individual or 
entity named above or their designee.  If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited.  If you have received this communication in error 
please immediately notify us by return  message or by telephone and delete the 
original message from your mail system.  Thank you.
====================================================================




Re: V extension groups analogue to the standard groups

Simon Davidmann Imperas
 

thanks - I am OK with whichever you choose.


On Sat, Aug 22, 2020 at 12:30 AM Andrew Waterman <andrew@...> wrote:


On Fri, Aug 21, 2020 at 2:43 PM Simon Davidmann <simond@...> wrote:
A question to clarify. You state:
      RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.  

This means if I understand you that we need to add F to get F hardware in the vector unit - so RV32IV means V with no F hardware, and RV32IFV includes F hardware.

So for consistency...

What does RV32IV means for M hardware multiply - do I need to RV32IMV to get scalar and vector hardware multiply?

I don’t believe the spec explicitly addresses this question, but I agree it makes sense. Alternatively, V could require M, since it doesn’t make much sense to pay for a vector unit but be too stingy to pay for a multiplier. But that might be less consistent. (My recommendation is that RV32IV continue to mean “no multiplier”, even though it’s a silly configuration.)


RV32IV means no F and no M hardware? - so I need to explicitly include the extensions I need as V assumes nothing but I?

My recommendation is to clarify in the spec that RV32IV is a valid config with no FPU in the vector unit, and RV32IFV is also a valid config with an FPU in both scalar and vector.


Or is something assumed for M?

If we choose to define that V implies M, RV32IV and RV32IMV would be synonyms.


thanks

On Thu, Aug 20, 2020 at 8:48 PM Andrew Waterman <andrew@...> wrote:
Quad-widening ops have been moved to a separate extension, Zvqmac.

I believe the intent is that the capital-V V extension supports the same FP datatypes as the scalar ISA, so e.g., RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.

I’m surprised all those hashtags made it past the spam filter!

On Thu, Aug 20, 2020 at 11:42 AM Strauch, Tobias (HENSOLDT Cyber GmbH) <tobias.strauch@...> wrote:

Apologies if this is old stuff already dismissed. But I give it a try anyway.


Wouldn't it make sense to separate more complex vector instructions from more trivial ones? Already with the very first base release ? Vector instructions can also be helpful in small devices #IOT #Edge #GAP8 #RISCY without the need to fully support floating point instructions or without the need for a quad multiply.


The suggestion would be to basically group vector extensions analogue to the standard instructions (I, M, F, D, Q, …), instead of having an already complex base and then subtract or re-define subsets of instructions again ?


Wouldn't that be in-line with the RISC-V philosophy of modularity and simplicity ? The beauty would be that you have a non-vector and a vector group version.


Possible nomenclature based on order:


M: Standard Multiply Divide Instructions (MUL, ...)

V: Very Basic Vector Instructions (VSETVL, ...)

MV: Standard Multiply Divide Instructions and Very Basic Vector Instructions (MUL, VSETVL, ...

VM: Standard Multiply Divide Instructions, Very Basic Vector Instructions and Vector Integer Multiply\Divide Instructions (MUL, VSETVL, VMUL, ...)


F, D, Q analogue to M as suggested.


The V version will not be a 1:1 match with the standard version and will cover additional aspects. But it can be argued, that when you implement the V version (of M, F, D, Q, ...), then you most likely will have the relevant standard counterparts implemented as well anyway.


Kind Regards, Tobias





Re: V extension groups analogue to the standard groups

Krste Asanovic
 

Anybody is free to use any subset of supported instructions and
element widths/types. The Z names can be extended down to individual
instructions/width if necessary.

However, we have to guide the software ecosystem where to spend the
available finite effort. So we choose and name some common
combinations to inform software/tool providers what to support, and to
enable compliance testing of those combinations.

We can always add new Z names later for subsets that prove popular.
This can happen after the instruction spec itself is ratified, in a
much lighter-weight process.

Krste


On Sat, 22 Aug 2020 00:32:56 -0700, "Allen Baum" <allen.baum@...> said:
| Works for me.
| -Allen

| On Aug 21, 2020, at 11:41 PM, Andrew Waterman <andrew@...> wrote:

| It's OK for esoteric combinations to require long ISA strings, I think.

|


Re: V extension groups analogue to the standard groups

Allen Baum
 

Works for me.

-Allen

On Aug 21, 2020, at 11:41 PM, Andrew Waterman <andrew@...> wrote:

  It's OK for esoteric combinations to require long ISA strings, I think.


Re: V extension groups analogue to the standard groups

Andrew Waterman
 



On Fri, Aug 21, 2020 at 11:32 PM Allen Baum <allen.baum@...> wrote:
For layout reasons, I can easily imagine a vector unit that has multiply HW for vector registers but can't easily use it to implement scalar multiply/divide. Whether someone would ever want to implement a system that implements vector multiply/divide but not scalar multiply/divide is, and should be, a completely separate issue; I see no reason why they need to be tied together. If there are no profiles that have vector mul without scalar mul, then no one will be implementing that configuration and this is a moot point - but there is no harm in allowing it. There are probably hundreds of configurations that won't be covered by profiles, and we can't obsess about them either. (Hundreds is likely a very, very conservative estimate.)

Under the scheme I'm promulgating, it's true that you couldn't describe your hypothetical machine as implementing capital-letter "V".  Perhaps it could be an RV32I_Zvbase_Zvm machine or something?  It's OK for esoteric combinations to require long ISA strings, I think.
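The trade-off above can be mimicked in a few lines. This is purely illustrative: Zvbase and Zvm are candidate names floated in this thread, not ratified extensions, and the composition of "V" below is an assumption:

```python
# Illustrative only: choose the short name "v" when a (hypothetical)
# full set of vector sub-extensions is present, otherwise fall back
# to the longer underscore-joined Z-name spelling.
FULL_V = {"zvbase", "zvm", "zvf", "zvqmac"}  # assumed composition of V

def vector_isa_suffix(features):
    feats = set(features)
    if FULL_V <= feats:
        return "v"
    return "_".join(sorted(feats))

print(vector_isa_suffix({"zvbase", "zvm"}))  # -> zvbase_zvm
```

The esoteric subset gets the long spelling; the common full configuration keeps the short letter.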


On Fri, Aug 21, 2020 at 4:30 PM Andrew Waterman <andrew@...> wrote:


On Fri, Aug 21, 2020 at 2:43 PM Simon Davidmann <simond@...> wrote:
A question to clarify. You state:
      RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.  

This means, if I understand you, that we need to add F to get F hardware in the vector unit - so RV32IV means V with no F hardware, and RV32IFV includes F hardware.

So for consistency...

What does RV32IV mean for M hardware multiply - do I need RV32IMV to get scalar and vector hardware multiply?

I don’t believe the spec explicitly addresses this question, but I agree it makes sense. Alternatively, V could require M, since it doesn’t make much sense to pay for a vector unit but be too stingy to pay for a multiplier. But that might be less consistent. (My recommendation is that RV32IV continue to mean “no multiplier”, even though it’s a silly configuration.)


RV32IV means no F and no M hardware? - so I need to explicitly include the extensions I need as V assumes nothing but I?

My recommendation is to clarify in the spec that RV32IV is a valid config with no FPU in the vector unit, and RV32IFV is also a valid config with an FPU in both scalar and vector.


Or is something assumed for M?

If we choose to define that V implies M, RV32IV and RV32IMV would be synonyms.
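The "V implies M" alternative amounts to closing the extension set under an implication rule. A small sketch, with a hypothetical rule table chosen only to show why the two strings would collapse to the same machine:

```python
# Hypothetical implication rules; "v" -> "m" is the option discussed
# above, and "d" -> "f" mirrors the existing scalar convention.
IMPLIES = {"v": {"m"}, "d": {"f"}}

def closure(exts):
    """Expand an extension set until all implied extensions are present."""
    out = set(exts)
    work = list(out)
    while work:
        for dep in IMPLIES.get(work.pop(), ()):
            if dep not in out:
                out.add(dep)
                work.append(dep)
    return out
```

Under this rule set, closure({"i", "v"}) equals closure({"i", "m", "v"}), which is exactly what makes RV32IV and RV32IMV synonyms.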


thanks

On Thu, Aug 20, 2020 at 8:48 PM Andrew Waterman <andrew@...> wrote:
Quad-widening ops have been moved to a separate extension, Zvqmac.

I believe the intent is that the capital-V V extension supports the same FP datatypes as the scalar ISA, so e.g., RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.

I’m surprised all those hashtags made it past the spam filter!

On Thu, Aug 20, 2020 at 11:42 AM Strauch, Tobias (HENSOLDT Cyber GmbH) <tobias.strauch@...> wrote:

Apologies if this is old stuff already dismissed. But I give it a try anyway.

Wouldn't it make sense to separate more complex vector instructions from more trivial ones, already with the very first base release? Vector instructions can also be helpful in small devices #IOT #Edge #GAP8 #RISCY without the need to fully support floating-point instructions or a quad multiply.

The suggestion would be to group vector extensions analogous to the standard instructions (I, M, F, D, Q, …), instead of having an already complex base and then subtracting or re-defining subsets of instructions again.

Wouldn't that be in line with the RISC-V philosophy of modularity and simplicity? The beauty would be that you have a non-vector and a vector group version.

Possible nomenclature based on order:

M: Standard Multiply/Divide Instructions (MUL, ...)

V: Very Basic Vector Instructions (VSETVL, ...)

MV: Standard Multiply/Divide Instructions and Very Basic Vector Instructions (MUL, VSETVL, ...)

VM: Standard Multiply/Divide Instructions, Very Basic Vector Instructions, and Vector Integer Multiply/Divide Instructions (MUL, VSETVL, VMUL, ...)

F, D, Q analogous to M as suggested.

The V version will not be a 1:1 match with the standard version and will cover additional aspects. But it can be argued that when you implement the V version (of M, F, D, Q, ...), you will most likely have the relevant standard counterparts implemented as well anyway.

Kind Regards, Tobias



--
====================================================================
The information contained in this electronic mail message and any attachments hereto
is privileged and confidential information intended only for the use of the individual or 
entity named above or their designee.  If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited.  If you have received this communication in error 
please immediately notify us by return  message or by telephone and delete the 
original message from your mail system.  Thank you.
====================================================================


Re: V extension groups analogue to the standard groups

Allen Baum
 

For layout reasons, I can easily imagine a vector unit that has multiply HW for vector registers, but can't easily use it to implement scalar multiply/divide. Whether someone would ever want to implement a system that has vector multiply/divide but not scalar multiply/divide is, and should be, a completely separate issue; I see no reason why they need to be tied together. If there are no profiles that have vector mul without scalar mul, then no one will implement that configuration and this is a moot point - but there is no harm in allowing it. There are probably hundreds of configurations that won't be covered by profiles, and we can't obsess about them either. (Hundreds is likely a very, very conservative estimate.)

On Fri, Aug 21, 2020 at 4:30 PM Andrew Waterman <andrew@...> wrote:


On Fri, Aug 21, 2020 at 2:43 PM Simon Davidmann <simond@...> wrote:
A question to clarify. You state:
      RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.  

This means, if I understand you, that we need to add F to get F hardware in the vector unit - so RV32IV means V with no F hardware, and RV32IFV includes F hardware.

So for consistency...

What does RV32IV mean for M hardware multiply - do I need RV32IMV to get scalar and vector hardware multiply?

I don’t believe the spec explicitly addresses this question, but I agree it makes sense. Alternatively, V could require M, since it doesn’t make much sense to pay for a vector unit but be too stingy to pay for a multiplier. But that might be less consistent. (My recommendation is that RV32IV continue to mean “no multiplier”, even though it’s a silly configuration.)


RV32IV means no F and no M hardware? - so I need to explicitly include the extensions I need as V assumes nothing but I?

My recommendation is to clarify in the spec that RV32IV is a valid config with no FPU in the vector unit, and RV32IFV is also a valid config with an FPU in both scalar and vector.


Or is something assumed for M?

If we choose to define that V implies M, RV32IV and RV32IMV would be synonyms.


thanks

On Thu, Aug 20, 2020 at 8:48 PM Andrew Waterman <andrew@...> wrote:
Quad-widening ops have been moved to a separate extension, Zvqmac.

I believe the intent is that the capital-V V extension supports the same FP datatypes as the scalar ISA, so e.g., RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.

I’m surprised all those hashtags made it past the spam filter!

On Thu, Aug 20, 2020 at 11:42 AM Strauch, Tobias (HENSOLDT Cyber GmbH) <tobias.strauch@...> wrote:

Apologies if this is old stuff already dismissed. But I give it a try anyway.

Wouldn't it make sense to separate more complex vector instructions from more trivial ones, already with the very first base release? Vector instructions can also be helpful in small devices #IOT #Edge #GAP8 #RISCY without the need to fully support floating-point instructions or a quad multiply.

The suggestion would be to group vector extensions analogous to the standard instructions (I, M, F, D, Q, …), instead of having an already complex base and then subtracting or re-defining subsets of instructions again.

Wouldn't that be in line with the RISC-V philosophy of modularity and simplicity? The beauty would be that you have a non-vector and a vector group version.

Possible nomenclature based on order:

M: Standard Multiply/Divide Instructions (MUL, ...)

V: Very Basic Vector Instructions (VSETVL, ...)

MV: Standard Multiply/Divide Instructions and Very Basic Vector Instructions (MUL, VSETVL, ...)

VM: Standard Multiply/Divide Instructions, Very Basic Vector Instructions, and Vector Integer Multiply/Divide Instructions (MUL, VSETVL, VMUL, ...)

F, D, Q analogous to M as suggested.

The V version will not be a 1:1 match with the standard version and will cover additional aspects. But it can be argued that when you implement the V version (of M, F, D, Q, ...), you will most likely have the relevant standard counterparts implemented as well anyway.

Kind Regards, Tobias





Re: V extension groups analogue to the standard groups

Colin Schmidt
 

We had a long discussion about how to name different portions of the vector ISA at one of the recent meetings.

You can see the issue here: https://github.com/riscv/riscv-v-spec/issues/550
And the meeting minutes here: https://github.com/riscv/riscv-v-spec/blob/master/minutes/20200807-V-minutes.txt

Thanks,
Colin

On Fri, Aug 21, 2020 at 5:49 PM Andrew Waterman <andrew@...> wrote:


On Fri, Aug 21, 2020 at 4:51 PM Guy Lemieux <glemieux@...> wrote:
I think a common embedded and FPGA scenario will be F on the scalar side but no F on the vector side. Adding F to V is nontrivial in area, particularly for FPGAs that lack FPUs, yet an integer-only V makes a lot of sense for pixel processing etc. F on the scalar side is nice to have for code size and to calculate scalar parameters, e.g. in OpenCV and OpenVX.

The current nomenclature assumptions don’t allow this, but I think that they should do so.

We definitely want to sanction configurations that have different datatype support on scalar and vector.  The current thinking is that the letter V means "whatever apps-profile processors want", just like what "G" means on the scalar side.  Perhaps the "vector-with-fewer-datatypes-than-scalar" case can be described as Zvbase instead of V?


Guy

On Fri, Aug 21, 2020 at 4:30 PM Andrew Waterman <andrew@...> wrote:


On Fri, Aug 21, 2020 at 2:43 PM Simon Davidmann <simond@...> wrote:
A question to clarify. You state:
      RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.  

This means, if I understand you, that we need to add F to get F hardware in the vector unit - so RV32IV means V with no F hardware, and RV32IFV includes F hardware.

So for consistency...

What does RV32IV mean for M hardware multiply - do I need RV32IMV to get scalar and vector hardware multiply?

I don’t believe the spec explicitly addresses this question, but I agree it makes sense. Alternatively, V could require M, since it doesn’t make much sense to pay for a vector unit but be too stingy to pay for a multiplier. But that might be less consistent. (My recommendation is that RV32IV continue to mean “no multiplier”, even though it’s a silly configuration.)


RV32IV means no F and no M hardware? - so I need to explicitly include the extensions I need as V assumes nothing but I?

My recommendation is to clarify in the spec that RV32IV is a valid config with no FPU in the vector unit, and RV32IFV is also a valid config with an FPU in both scalar and vector.


Or is something assumed for M?

If we choose to define that V implies M, RV32IV and RV32IMV would be synonyms.


thanks

On Thu, Aug 20, 2020 at 8:48 PM Andrew Waterman <andrew@...> wrote:
Quad-widening ops have been moved to a separate extension, Zvqmac.

I believe the intent is that the capital-V V extension supports the same FP datatypes as the scalar ISA, so e.g., RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.

I’m surprised all those hashtags made it past the spam filter!

On Thu, Aug 20, 2020 at 11:42 AM Strauch, Tobias (HENSOLDT Cyber GmbH) <tobias.strauch@...> wrote:

Apologies if this is old stuff already dismissed. But I give it a try anyway.

Wouldn't it make sense to separate more complex vector instructions from more trivial ones, already with the very first base release? Vector instructions can also be helpful in small devices #IOT #Edge #GAP8 #RISCY without the need to fully support floating-point instructions or a quad multiply.

The suggestion would be to group vector extensions analogous to the standard instructions (I, M, F, D, Q, …), instead of having an already complex base and then subtracting or re-defining subsets of instructions again.

Wouldn't that be in line with the RISC-V philosophy of modularity and simplicity? The beauty would be that you have a non-vector and a vector group version.

Possible nomenclature based on order:

M: Standard Multiply/Divide Instructions (MUL, ...)

V: Very Basic Vector Instructions (VSETVL, ...)

MV: Standard Multiply/Divide Instructions and Very Basic Vector Instructions (MUL, VSETVL, ...)

VM: Standard Multiply/Divide Instructions, Very Basic Vector Instructions, and Vector Integer Multiply/Divide Instructions (MUL, VSETVL, VMUL, ...)

F, D, Q analogous to M as suggested.

The V version will not be a 1:1 match with the standard version and will cover additional aspects. But it can be argued that when you implement the V version (of M, F, D, Q, ...), you will most likely have the relevant standard counterparts implemented as well anyway.

Kind Regards, Tobias

Re: V extension groups analogue to the standard groups

Andrew Waterman
 



On Fri, Aug 21, 2020 at 4:51 PM Guy Lemieux <glemieux@...> wrote:
I think a common embedded and FPGA scenario will be F on the scalar side but no F on the vector side. Adding F to V is nontrivial in area, particularly for FPGAs that lack FPUs, yet an integer-only V makes a lot of sense for pixel processing etc. F on the scalar side is nice to have for code size and to calculate scalar parameters, e.g. in OpenCV and OpenVX.

The current nomenclature assumptions don’t allow this, but I think that they should do so.

We definitely want to sanction configurations that have different datatype support on scalar and vector.  The current thinking is that the letter V means "whatever apps-profile processors want", just like what "G" means on the scalar side.  Perhaps the "vector-with-fewer-datatypes-than-scalar" case can be described as Zvbase instead of V?


Guy

On Fri, Aug 21, 2020 at 4:30 PM Andrew Waterman <andrew@...> wrote:


On Fri, Aug 21, 2020 at 2:43 PM Simon Davidmann <simond@...> wrote:
A question to clarify. You state:
      RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.  

This means, if I understand you, that we need to add F to get F hardware in the vector unit - so RV32IV means V with no F hardware, and RV32IFV includes F hardware.

So for consistency...

What does RV32IV mean for M hardware multiply - do I need RV32IMV to get scalar and vector hardware multiply?

I don’t believe the spec explicitly addresses this question, but I agree it makes sense. Alternatively, V could require M, since it doesn’t make much sense to pay for a vector unit but be too stingy to pay for a multiplier. But that might be less consistent. (My recommendation is that RV32IV continue to mean “no multiplier”, even though it’s a silly configuration.)


RV32IV means no F and no M hardware? - so I need to explicitly include the extensions I need as V assumes nothing but I?

My recommendation is to clarify in the spec that RV32IV is a valid config with no FPU in the vector unit, and RV32IFV is also a valid config with an FPU in both scalar and vector.


Or is something assumed for M?

If we choose to define that V implies M, RV32IV and RV32IMV would be synonyms.


thanks

On Thu, Aug 20, 2020 at 8:48 PM Andrew Waterman <andrew@...> wrote:
Quad-widening ops have been moved to a separate extension, Zvqmac.

I believe the intent is that the capital-V V extension supports the same FP datatypes as the scalar ISA, so e.g., RV32IV doesn’t mandate any FP hardware in the vector unit, whereas RV32IFV means both scalar and vector support single-precision, etc.

I’m surprised all those hashtags made it past the spam filter!

On Thu, Aug 20, 2020 at 11:42 AM Strauch, Tobias (HENSOLDT Cyber GmbH) <tobias.strauch@...> wrote:

Apologies if this is old stuff already dismissed. But I give it a try anyway.

Wouldn't it make sense to separate more complex vector instructions from more trivial ones, already with the very first base release? Vector instructions can also be helpful in small devices #IOT #Edge #GAP8 #RISCY without the need to fully support floating-point instructions or a quad multiply.

The suggestion would be to group vector extensions analogous to the standard instructions (I, M, F, D, Q, …), instead of having an already complex base and then subtracting or re-defining subsets of instructions again.

Wouldn't that be in line with the RISC-V philosophy of modularity and simplicity? The beauty would be that you have a non-vector and a vector group version.

Possible nomenclature based on order:

M: Standard Multiply/Divide Instructions (MUL, ...)

V: Very Basic Vector Instructions (VSETVL, ...)

MV: Standard Multiply/Divide Instructions and Very Basic Vector Instructions (MUL, VSETVL, ...)

VM: Standard Multiply/Divide Instructions, Very Basic Vector Instructions, and Vector Integer Multiply/Divide Instructions (MUL, VSETVL, VMUL, ...)

F, D, Q analogous to M as suggested.

The V version will not be a 1:1 match with the standard version and will cover additional aspects. But it can be argued that when you implement the V version (of M, F, D, Q, ...), you will most likely have the relevant standard counterparts implemented as well anyway.

Kind Regards, Tobias
