
Re: VFRECIP/VFRSQRT instructions
Do we improve accuracy a bit if the step is:
t = 1.0 - r*x; x = x + t*x
instead of:
t = 2.0 - r*x; x = t*x
Bill
On 7/14/20 2:58 PM, Bill Huffman wrote:

By
Bill Huffman
·
#274
·
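
A minimal sketch for comparing the two step forms from #274 in software. It assumes unfused binary32 arithmetic, with each multiply and add rounded separately, which may not match what a hardware step instruction or an FMA-based sequence would do. The helper names (`f32`, `step_residual`, `step_two_minus`, `refine`, `relative_error`) are hypothetical and are not taken from the thread's recip.cc.

```python
import struct


def f32(v: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def step_residual(r: float, x: float) -> float:
    """t = 1.0 - r*x; x = x + t*x, rounding every operation to binary32."""
    t = f32(1.0 - f32(r * x))
    return f32(x + f32(t * x))


def step_two_minus(r: float, x: float) -> float:
    """t = 2.0 - r*x; x = t*x, rounding every operation to binary32."""
    t = f32(2.0 - f32(r * x))
    return f32(t * x)


def refine(step, r: float, x0: float, iterations: int = 4) -> float:
    """Apply one of the step forms repeatedly, starting from estimate x0 of 1/r."""
    x = f32(x0)
    for _ in range(iterations):
        x = step(r, x)
    return x


def relative_error(r: float, x: float) -> float:
    """|r*x - 1|, evaluated in binary64 so the measurement adds little error."""
    return abs(r * x - 1.0)


if __name__ == "__main__":
    r = f32(3.0)
    for step in (step_residual, step_two_minus):
        x = refine(step, r, 0.3)
        print(step.__name__, x, relative_error(r, x))
```

Sweeping `r` over many inputs and comparing `relative_error` for the two forms is one way to look for a difference; the sketch itself makes no claim about which form is more accurate.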


Re: VFRECIP/VFRSQRT instructions
On 7/14/20 2:54 PM, Andrew Waterman wrote:
Sorry, I had scalar on the mind....
Bill

By
Bill Huffman
·
#273
·


Re: VFRECIP/VFRSQRT instructions
Actually, none of the vector instructions use the rs3 field (the vector FMAs are destructive to save encoding space).
There are still several R-type code points left in the vector opcode, but it has

By
Andrew Waterman
·
#272
·


Re: VFRECIP/VFRSQRT instructions
On 7/14/20 2:30 PM, Andrew Waterman wrote:
Seems like they shouldn't be so big as they don't specify rs3 at all. Are we tight on two register input opcodes?
Bill

By
Bill Huffman
·
#271
·


Re: VFRECIP/VFRSQRT instructions
FWIW, one of my concerns with adding the "step" instructions is opcode space, since we are already very tight. I suppose a compromise might be to make them destructive. This would have no perf.

By
Andrew Waterman
·
#270
·


Re: VFRECIP/VFRSQRT instructions
I forgot to mention that I added sample vector code for estimating square root: https://github.com/riscv/riscvvspec/blob/vfrecip/vectorexamples.adoc#squarerootapproximationexample
Handling the

By
Andrew Waterman
·
#269
·
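
For readers without access to the linked sample, here is a minimal sketch of the general technique of approximating a square root from a reciprocal-square-root estimate plus Newton-Raphson refinement. It is not the vector code from the spec branch. The `crude_estimate` helper is a hypothetical stand-in for a low-precision hardware estimate (about 7 mantissa bits is an assumption), and the sketch only handles zero and positive finite inputs.

```python
import math


def crude_estimate(v: float, bits: int = 7) -> float:
    """Hypothetical stand-in for a hardware estimate: keep ~`bits` mantissa bits."""
    m, e = math.frexp(v)            # v == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)


def rsqrt_step(a: float, y: float) -> float:
    """One Newton-Raphson step for y ~= 1/sqrt(a): y <- y * (1.5 - 0.5*a*y*y)."""
    return y * (1.5 - 0.5 * a * y * y)


def approx_sqrt(a: float, iterations: int = 3) -> float:
    """Approximate sqrt(a) as a * rsqrt(a), refining a crude rsqrt estimate."""
    if a == 0.0:
        return 0.0
    if not (a > 0.0 and math.isfinite(a)):
        raise ValueError("this sketch only handles zero and positive finite inputs")
    # The exact rsqrt is computed only to feed the crude stand-in estimate.
    y = crude_estimate(1.0 / math.sqrt(a))
    for _ in range(iterations):
        y = rsqrt_step(a, y)
    return a * y


if __name__ == "__main__":
    for a in (2.0, 10.0, 1.0e6):
        print(a, approx_sqrt(a), math.sqrt(a))
```

Each step roughly squares the relative error, so the number of iterations needed depends on the precision of the initial estimate and on the target format.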


Re: VFRECIP/VFRSQRT instructions
Hi Andrew et al,
Thank you for sending the code. I am attaching an updated version of recip.cc, implementing the complete Newton-Raphson sequence, using the proposed reciprocal estimate instructions

By
Mr Grigorios Magklis
·
#268
·


Re: VFRECIP/VFRSQRT instructions
I've updated the proposal to describe the corner cases:
https://github.com/riscv/riscvvspec/blob/vfrecip/vspec.adoc#149vectorfloatingpointreciprocalestimateinstruction

By
Andrew Waterman
·
#267
·


Vector TG meeting
We’ll have our regular TG meeting in a few hours per member calendar.
We’ll continue to clean up remaining issues for v1.0,
Krste

By
Krste Asanovic
·
#266
·


Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
For the code segment given, Blelloch's loop raking approach would be
worth exploring for the V extension. This approach involves large
constant stride accesses to A[] and col[j] array but will keep

By
Krste Asanovic
·
#265
·
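
A rough sketch of the access pattern behind loop raking, shown on a prefix sum rather than on the SpMV loop discussed in the thread (the code segment itself is not included in this preview). Each of `lanes` virtual lanes owns one contiguous chunk, so every step touches elements a constant stride of `chunk` apart. The function name and the requirement that the length be a multiple of `lanes` are assumptions made for brevity.

```python
from itertools import accumulate


def raked_prefix_sum(data, lanes=4):
    """Inclusive prefix sum where each lane walks its own contiguous chunk."""
    n = len(data)
    if n % lanes != 0:
        raise ValueError("this sketch assumes len(data) is a multiple of lanes")
    chunk = n // lanes  # chunk length, and the constant stride between lanes

    # Phase 1: per-lane totals. Step s reads indices s, s+chunk, s+2*chunk, ...
    # which corresponds to one constant-stride vector load per step.
    totals = [0] * lanes
    for s in range(chunk):
        for lane in range(lanes):
            totals[lane] += data[lane * chunk + s]

    # Phase 2: exclusive scan over the lane totals gives each lane's start offset.
    offsets = [0] * lanes
    for lane in range(1, lanes):
        offsets[lane] = offsets[lane - 1] + totals[lane - 1]

    # Phase 3: each lane walks its chunk again, carrying its running sum.
    out = [0] * n
    running = list(offsets)
    for s in range(chunk):
        for lane in range(lanes):
            i = lane * chunk + s
            running[lane] += data[i]
            out[i] = running[lane]
    return out


if __name__ == "__main__":
    data = list(range(1, 13))
    print(raked_prefix_sum(data, lanes=4))
    print(list(accumulate(data)))
```

The inner `for lane` loops are the part a vector unit would execute as a single strided operation; the outer loops over `s` stay sequential.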


Re: decide on V1.0 merit - Minutes of 2020/7/3 meeting
I messed up the links: the list of open unlabeled issues is here:
https://github.com/riscv/riscvvspec/issues?q=is%3Aissue+is%3Aopen+no%3Alabel
On 2020-07-09 6:28 p.m.,

By
David Horner
·
#264
·


Re: decide on V1.0 merit - Minutes of 2020/7/3 meeting
There are 19 open issues that aren't yet labeled.
Does it make sense that those who will be on the call review them with an idea to categorize as for or after V1.0?
That should also

By
David Horner
·
#263
·


Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
here is dongarra’s take on HPCG. hope this helps.
——————————
I believe that the (rough) idea I sketched earlier in this thread (May 8) still works with the latest version

By
swallach
·
#262
·


Re: VFRECIP/VFRSQRT instructions
I'm following up with detailed semantics in the form of a self-contained C++ program. The `recip` and `rsqrt` functions model the proposed instructions. When the program is invoked with the

By
Andrew Waterman
·
#261
·


Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
I believe that the (rough) idea I sketched earlier in this thread (May 8) still works with the latest version of the spec - please correct me if I'm wrong - what I called "sketchy type-punning"

By
Nick Knight
·
#260
·


Re: Duplicate Counting Instruction
Hi Krste,
Just would like to continue Roger's question on hardware implementation, as you said it can be done with a parallel-prefix-style OR-reduction tree, so can you please explain how we can avoid

By
lidawei14@...
·
#259
·


Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
please share the asm for spmv, the key kernel(s),
in any case, the execution time for operations using a mask, is very implementation/machine dependent
it is a function on how aggressive, in

By
swallach
·
#258
·


Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
I am now investigating how to efficiently implement sparse matrix x (dense) vector multiplication (SpMV) with RISC-V vectors, using a bit-vector format for
compressing the sparse matrix. The inner

By
Krste Asanovic
·
#257
·
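
As a scalar reference for the kind of kernel being discussed, here is a minimal sketch of SpMV over a bitmap-compressed matrix. The exact layout used in the thread is not shown in this preview, so the format here is an assumption: one integer bit mask per row marking the non-zero columns, plus a single packed list of the non-zero values in row-major order. The function names are hypothetical.

```python
def compress_bitvector(dense_rows):
    """Return (masks, values): a per-row bit mask and the packed non-zero values."""
    masks, values = [], []
    for row in dense_rows:
        mask = 0
        for j, v in enumerate(row):
            if v != 0:
                mask |= 1 << j      # bit j set means column j is non-zero
                values.append(v)
        masks.append(mask)
    return masks, values


def spmv_bitvector(masks, values, x):
    """Compute y = A @ x for a matrix stored as (masks, values)."""
    y = []
    k = 0                            # position in the packed value list
    for mask in masks:
        acc = 0
        j = 0
        while mask:
            if mask & 1:
                acc += values[k] * x[j]
                k += 1
            mask >>= 1
            j += 1
        y.append(acc)
    return y


if __name__ == "__main__":
    dense = [[1, 0, 2],
             [0, 0, 0],
             [0, 3, 0]]
    masks, values = compress_bitvector(dense)
    print(masks, values)
    print(spmv_bitvector(masks, values, [1, 2, 3]))
```

In a vector implementation the bit mask would more likely drive masked or expanding loads than a bit-by-bit loop; the loop here only pins down the intended result.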


Re: Duplicate Counting Instruction
vmhash should be cheap relative to the work you're doing on each loop.
redoing vmhash in each strip-mine could lead to better performance as
you find longer non-conflicting index runs, rather than

By
Krste Asanovic
·
#256
·


Re: Duplicate Counting Instruction
Hi Krste,
I read through your code and thanks for correcting my errors, 'or' is a good idea for multiple duplicates.
Here I'd like to explain why I made things a bit more complicated in my code.
In

By
lidawei14@...
·
#255
·
