Re: VFRECIP/VFRSQRT instructions
Do we improve accuracy a bit if the step is:
t = 1.0 - r*x; x = x + t*x
instead of:
t = 2.0 - r*x; x = t*x
Bill
On 7/14/20 2:58 PM, Bill Huffman wrote:
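The two step forms being compared are algebraically the same update, since x + (1 - r*x)*x = (2 - r*x)*x, but they can round differently in floating point. A minimal sketch for experimenting with both, assuming r is the operand and x the current estimate of 1/r as in the message above; the function names and the starting estimate 0.3 are hypothetical, and Python floats (doubles) stand in for whatever element width the instructions would use:

```python
def nr_step_two_minus(r, x):
    """One Newton-Raphson step for 1/r written as t = 2 - r*x; x = t*x."""
    t = 2.0 - r * x
    return t * x


def nr_step_residual(r, x):
    """The same step written as t = 1 - r*x; x = x + t*x."""
    t = 1.0 - r * x
    return x + t * x


def refine(step, r, x0, iterations=4):
    """Apply the given step function repeatedly, starting from estimate x0."""
    x = x0
    for _ in range(iterations):
        x = step(r, x)
    return x


r = 3.0
x0 = 0.3  # hypothetical coarse estimate of 1/r, not the proposed instruction's output
a = refine(nr_step_two_minus, r, x0)
b = refine(nr_step_residual, r, x0)
print(a, b, 1.0 / r)
```

Comparing `a` and `b` against a higher-precision reference over many inputs is one way to explore the accuracy question; this sketch does not settle it.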
By Bill Huffman · #274

Re: VFRECIP/VFRSQRT instructions
On 7/14/20 2:54 PM, Andrew Waterman wrote:
Sorry, I had scalar on the mind....
Bill
By Bill Huffman · #273

Re: VFRECIP/VFRSQRT instructions
Actually, none of the vector instructions use the rs3 field (the vector FMAs are destructive to save encoding space).
There are still several R-type code points left in the vector opcode, but it has
By Andrew Waterman · #272

Re: VFRECIP/VFRSQRT instructions
On 7/14/20 2:30 PM, Andrew Waterman wrote:
Seems like they shouldn't be so big as they don't specify rs3 at all. Are we tight on two register input opcodes?
Bill
By Bill Huffman · #271

Re: VFRECIP/VFRSQRT instructions
FWIW, one of my concerns with adding the "step" instructions is opcode space, since we are already very tight. I suppose a compromise might be to make them destructive. This would have no perf.
By Andrew Waterman · #270

Re: VFRECIP/VFRSQRT instructions
I forgot to mention that I added sample vector code for estimating square root: https://github.com/riscv/riscv-v-spec/blob/vfrecip/vector-examples.adoc#square-root-approximation-example
Handling the
By Andrew Waterman · #269

Re: VFRECIP/VFRSQRT instructions
Hi Andrew et al,
Thank you for sending the code. I am attaching an updated version of recip.cc, implementing the complete Newton-Raphson sequence, using the proposed reciprocal estimate instructions
By Mr Grigorios Magklis · #268

Re: VFRECIP/VFRSQRT instructions
I've updated the proposal to describe the corner cases:
https://github.com/riscv/riscv-v-spec/blob/vfrecip/v-spec.adoc#149-vector-floating-point-reciprocal-estimate-instruction
By Andrew Waterman · #267

Vector TG meeting
We’ll have our regular TG meeting in a few hours per member calendar.
We’ll continue to clean up remaining issues for v1.0,
Krste
By Krste Asanovic · #266

Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
For the code segment given, Blelloch's loop raking approach would be
worth exploring for the V extension. This approach involves large
constant stride accesses to A[] and col[j] array but will keep
By Krste Asanovic · #265

Re: decide on V1.0 merit - Minutes of 2020/7/3 meeting
I messed up the links: the list of open unlabeled issues is here:
https://github.com/riscv/riscv-v-spec/issues?q=is%3Aissue+is%3Aopen+no%3Alabel
On 2020-07-09 6:28 p.m.,
By David Horner · #264

Re: decide on V1.0 merit - Minutes of 2020/7/3 meeting
There are 19 open issues that aren't yet labeled.
Does it make sense that those who will be on the call review them with an idea to categorize as for or after V1.0?
That should also
By David Horner · #263

Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
Here is Dongarra's take on HPCG. Hope this helps.
——————————
I believe that the (rough) idea I sketched earlier in this thread (May 8) still works with the latest version
By swallach · #262

Re: VFRECIP/VFRSQRT instructions
I'm following up with detailed semantics in the form of a self-contained C++ program. The `recip` and `rsqrt` functions model the proposed instructions. When the program is invoked with the
By Andrew Waterman · #261

Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
I believe that the (rough) idea I sketched earlier in this thread (May 8) still works with the latest version of the spec --- please correct me if I'm wrong --- what I called "sketchy type-punning"
By Nick Knight · #260

Re: Duplicate Counting Instruction
Hi Krste,
Just would like to continue Roger's question on hardware implementation, as you said it can be done with a parallel-prefix-style OR-reduction tree, so can you please explain how we can avoid
By lidawei14@... · #259

Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
Please share the asm for spmv, the key kernel(s).
In any case, the execution time for operations using a mask is very implementation/machine dependent.
it is a function on how aggressive, in
By swallach · #258

Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
| I am now investigating how to efficiently implement sparse matrix X (dense) vector multiplications (spMV) using RISCV vectors using bit-vector format of
| compressing the sparse matrix. The inner
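For readers unfamiliar with the idea, here is a scalar sketch of one possible bit-vector layout for spMV. The layout is an assumption made for illustration (one integer bitmask per row, with the nonzero values packed in row-major order), and the names `spmv_bitvector`, `masks`, and `values` are hypothetical; the truncated message above does not show the actual format or inner loop under discussion.

```python
def spmv_bitvector(masks, values, x, ncols):
    """Compute y = A @ x for a bit-vector-compressed matrix A.

    Assumed layout: bit j of masks[i] is set iff A[i][j] is nonzero,
    and values holds the nonzeros packed in row-major order.
    """
    y = []
    k = 0  # index of the next packed nonzero
    for mask in masks:
        acc = 0.0
        for j in range(ncols):
            if (mask >> j) & 1:
                acc += values[k] * x[j]
                k += 1
        y.append(acc)
    return y


# A = [[1, 0, 2],
#      [0, 0, 3]]
masks = [0b101, 0b100]
values = [1.0, 2.0, 3.0]
x = [1.0, 10.0, 100.0]
y = spmv_bitvector(masks, values, x, ncols=3)
print(y)
```

In a vector implementation the inner loop over `j` would presumably be replaced by masked or expanding loads, which is where the mask-handling cost raised elsewhere in the thread comes in.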
By Krste Asanovic · #257

Re: Duplicate Counting Instruction
vmhash should be cheap relative to the work you're doing on each loop.
redoing vmhash in each stripmine could lead to better performance as
you find longer non-conflicting index runs, rather than
By Krste Asanovic · #256

Re: Duplicate Counting Instruction
Hi Krste,
I read through your code and thanks for correcting my errors, 'or' is a good idea for multiple duplicates.
Here I'd like to explain why I made things a bit more complicated in my code.
In
By lidawei14@... · #255