|
Re: Vector TG minutes for 2020/12/18 meeting
for vsetivli, with the uimm=00000 encoding, rather than setting vl to 32, how about giving it some other meaning?
one option is to set vl=VLMAX. i have some concerns about software using this safely
|
By
Guy Lemieux
·
#547
·
|
|
Vector TG minutes for 2020/12/18 meeting
Date: 2020/12/18
Task Group: Vector Extension
Chair: Krste Asanovic
Co-Chair: Roger Espasa
Number of Attendees: ~10
Current issues on github: https://github.com/riscv/riscv-v-spec
Note: No more
|
By
Krste Asanovic
·
#546
·
|
|
Last vector TG meeting of 2020, usual time, Friday Dec 17
Agenda is hopefully clearing up any remaining major issues before 1.0 draft can go out,
Krste
|
By
Krste Asanovic
·
#545
·
|
|
Re: Vector Task Group minutes 2020/12/04
I am totally in agreement with Krste. Adding the mask load/store is an improvement, but adding the new mask registers is too disruptive and increases area.
Thanks, Thang
|
By
Thang Tran
·
#544
·
|
|
Re: Vector Task Group minutes 2020/12/04
I'm not contemplating changing mask design (yet again) at this point
in the process. I don't see any great advantage to any of this last
round of proposals, as they all have significant downsides for
|
By
Krste Asanovic
·
#543
·
|
|
Re: Vector Task Group minutes 2020/12/04
imho, since we are trying to address both the embedded market and the hpc market, we have conflicts wrt logic, power, and cost
addressing the hpc market, 8 extra registers for VM, appropriately
|
By
swallach
·
#542
·
|
|
Re: Vector Task Group minutes 2020/12/04
In some cases where we have widening computations with large LMUL settings, we will quickly run out of v0-v31 if we also have to keep masks in these registers.
|
By
lidawei14@...
·
#541
·
|
|
Re: Vector Task Group minutes 2020/12/04
Having been a silent observer of this group for what seems like a very long time, but now recently liberated from previous constraints, I will observe that I have seen the use in DSPs of both
|
By
Grant Martin
·
#540
·
|
|
Re: Vector Task Group minutes 2020/12/04
One option is to allow mask generating instructions (compares) to write either to regular vector regs or to vmask and to provide move instructions between vector regs and vmask.
But mask consuming
|
By
Alex Solomatnikov
·
#539
·
|
|
Re: Vector Task Group minutes 2020/12/04
i guess i am looking at the wrong set of apps.
in any case, VM registers NOT in the vector registers permit robust and performance-optimized operations under mask.
wrt extra instructions. i am
|
By
swallach
·
#538
·
|
|
Re: Vector Task Group minutes 2020/12/04
8 mask registers are quite needed in modern outer-vectorized loops. Also in graphics shaders. I would say 16 is overkill.
Now, and I am not defending this, if we had to go this route, I would
|
By
Roger Espasa
·
#537
·
|
|
Re: Vector Task Group minutes 2020/12/04
i would also add that if 8 or 16 registers are needed, why do we only have one register, V0? if this were true we would need to multiplex between various vector registers and V0
i believe i
|
By
swallach
·
#536
·
|
|
Re: Vector Task Group minutes 2020/12/04
in my experience only one, maybe two, vm registers are needed
nested loops under if statements are rare.
|
By
swallach
·
#535
·
|
|
Re: Vector Task Group minutes 2020/12/04
I don’t think a separate mask register will do at all. It would take a mask register file with at least 8 and maybe 16 registers. Lots of compare results need to be kept and operations need to be
|
By
Bill Huffman
·
#534
·
|
|
Re: Vector Task Group minutes 2020/12/04
i totally agree. if this is done, then instructions like count bits, etc. can directly apply to the mask register.
also, from a hardware implementation, the VM register can be implemented
|
By
swallach
·
#533
·
|
|
Re: Vector Task Group minutes 2020/12/04
If separate loads and stores are introduced for mask, then separate vmask register can be introduced to avoid dual use of v0 (as a regular vector register and as a mask register) and its
|
By
Alex Solomatnikov
·
#532
·
|
|
Re: Reply: [RISC-V] [tech-vector-ext] The scenarios of GEMM for u/int8 data
Hi, David
Can we see the git of your work?
My code has not been uploaded to git, so I will show it in the mail.
Does this mean the 32 vector registers are not enough,
or that the
|
By
Linjie Yu
·
#531
·
|
|
Re: The scenarios of GEMM for u/int8 data
Linjie Yu <linjie.ylj@...> wrote on Fri, Dec 11, 2020 at 4:34 PM:
Have you considered using fractional LMUL?
|
By
Zakk Chen
·
#530
·
|
|
Re: The scenarios of GEMM for u/int8 data
On 2020-12-11 3:34 a.m., Linjie Yu wrote:
Can we see the git of your work?
Does this mean the 32 vector registers are not enough,
or that the number of elements for
|
By
David Horner
·
#529
·
|
|
The scenarios of GEMM for u/int8 data
Hi, all
Recently, I optimized the kernel of GEMM for int8 data. I found that there was no good way to do it using the present vector ISA.
The main difficulty I met is: The
|
By
Linjie Yu
·
#528
·
|