Re: VFRECIP/VFRSQRT instructions
David Horner
This is the program Andrew wrote.
https://github.com/riscv/riscv-v-spec/blob/vfrecip/recip.cc
On 2020-07-31 4:46 p.m., Bill Huffman wrote:
David,
Because of the errors you get, I'm assuming your "output width" and "input width" do not include the hidden bit. Right?
That is correct; Andrew's approach assumes the implicit high (hidden) bit.
It's interesting. I did a similar exercise a number of years ago and got a few hundredths of a bit better accuracy from 7/7 tables. It's possible I did it wrong. It's also possible that there's a slight improvement available.
Andrew chose the range [xn, (x+1)n); perhaps (xn, (x+1)n] will work better.
I will give it a try.
If you want to send me the tables I can compare. Mine are in decimal numbers from 128 to 255. I could send you tables as well.
As mentioned, Andrew's --verilog option creates a table with both input and output in the range 0 to 127.
I wouldn't expect the bias to make any significant difference.
I'd be happy to see your tables.
If you want, I will send the output of Andrew's program for 7x7,
and any of the other listed combinations from my modifications to his program.
I will post my modifications even though I still get that segfault with no (or a single) command-line argument.
Then you could run your own.
Bill
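For readers following the hidden-bit discussion, here is a minimal C++ sketch of one way to fill a reciprocal table with both leading 1 bits implicit. It is not Andrew's recip.cc; make_recip_lut, decode_recip and recip_interval_error are hypothetical names. It assumes entry i covers the significand interval [1 + i/2^in, 1 + (i+1)/2^in) and that a stored value o means (2^out + o)/2^(out+1), so a 7/7 table holds 0..127 (Bill's 128..255 once the hidden bit is added back). Because the relative error c*x - 1 is linear in x, only the interval endpoints need to be checked.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not Andrew's recip.cc. Both leading 1 bits are implicit:
//   entry i covers the significand interval [1 + i/2^in_bits, 1 + (i+1)/2^in_bits)
//   stored value o in [0, 2^out_bits) means 1/x ~= (2^out_bits + o) / 2^(out_bits+1)

// Worst relative error |c*x - 1| of a constant estimate c over [a, b].
// c*x - 1 is linear in x, so the maximum sits at an endpoint.
double recip_interval_error(double c, double a, double b) {
  return std::max(std::fabs(c * a - 1.0), std::fabs(c * b - 1.0));
}

// Value represented by a stored output o (hidden bit added back).
double decode_recip(std::uint32_t o, int out_bits) {
  return std::ldexp(double((1u << out_bits) + o), -(out_bits + 1));
}

std::vector<std::uint32_t> make_recip_lut(int in_bits, int out_bits) {
  assert(in_bits > 0 && in_bits < 16 && out_bits > 0 && out_bits < 16);
  const double max_o = double((1u << out_bits) - 1);
  std::vector<std::uint32_t> lut(std::size_t(1) << in_bits);
  for (std::uint32_t i = 0; i < lut.size(); ++i) {
    const double a = 1.0 + std::ldexp(double(i), -in_bits);
    const double b = 1.0 + std::ldexp(double(i + 1), -in_bits);
    // Unquantized best constant: equal and opposite errors at a and b.
    const double ideal = 2.0 / (a + b);
    // Where the ideal value falls on the stored-output grid.
    const double pos = std::ldexp(ideal, out_bits + 1) - double(1u << out_bits);
    // Try the grid points on either side and keep the one with smaller max error.
    const auto lo = static_cast<std::uint32_t>(std::clamp(std::floor(pos), 0.0, max_o));
    const auto hi = static_cast<std::uint32_t>(std::clamp(std::ceil(pos), 0.0, max_o));
    const double err_lo = recip_interval_error(decode_recip(lo, out_bits), a, b);
    const double err_hi = recip_interval_error(decode_recip(hi, out_bits), a, b);
    lut[i] = (err_lo <= err_hi) ? lo : hi;
  }
  return lut;
}
```

With these assumptions, make_recip_lut(7, 7) returns 128 entries; entry 0 (x just above 1.0) is 127, i.e. 255/256, and entry 127 (x just below 2.0) is 0, i.e. 0.5. Picking between the two neighbouring grid points by their max error, rather than simply rounding the ideal value, is one possible design choice.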
On 7/31/20 12:24 PM, David Horner wrote:
The error is relative error. The calculation is unchanged from Andrew's original (although I explicitly force double even where it shouldn't matter).
The test range is from 0.5 to 1 inclusive; again, I left Andrew's choices unchanged. As you point out, the 1 case should not contribute to the max error for the reciprocal, as its error should be zero. For rsqrt, the 1 case with an odd power-of-2 exponent is nonzero by definition, since sqrt(2) is irrational.
Andrew provides test_long, which tests all single-precision values that are not NaN.
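A relative-error sweep of the kind described (relative error, computed in double, upper end included) might look like the sketch below. max_rel_error is a hypothetical helper, not Andrew's test or test_long; it steps through every float in [lo, hi] with std::nextafter and compares an estimate against an exact reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical helper: visit every single-precision value in [lo, hi]
// (both ends included) and return the largest |estimate - exact| / |exact|,
// with the comparison done in double.
double max_rel_error(float lo, float hi,
                     const std::function<double(float)>& estimate,
                     const std::function<double(double)>& exact) {
  assert(lo > 0.0f && lo <= hi);
  double worst = 0.0;
  for (float x = lo;; x = std::nextafter(x, 2.0f * hi)) {
    const double ref = exact(double(x));
    worst = std::max(worst, std::fabs(estimate(x) - ref) / std::fabs(ref));
    if (x == hi) break;  // inclusive upper bound
  }
  return worst;
}
```

For the recip case one would call max_rel_error(0.5f, 1.0f, est, [](double x) { return 1.0 / x; }), and for rsqrt max_rel_error(0.25f, 1.0f, est, [](double x) { return 1.0 / std::sqrt(x); }), where est wraps whatever table lookup is being measured. Including hi means x = 1.0 is tested, where the exact reciprocal is 1.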
I hope to post my code soon. I get a segfault when no parameters are provided on the command line. Argh. I believe it relates to handling argv as a vector of values. It appears the supporting construct moves some code out of conditional scope and thus frees memory even when no explicit allocation is made.
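One common way to avoid a crash when no (or only one) argument is given is to copy only the arguments that exist into a vector and fall back to defaults for the rest. This is a hypothetical sketch; the actual cause in the modified recip.cc is not known here, and LutWidths, parse_widths, the argument order (index bits, then output bits) and the defaults of 7 are all assumptions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical option handling, not the actual recip.cc interface.
// Assumed usage: prog [in_bits [out_bits]], both defaulting to 7.
struct LutWidths {
  int in_bits = 7;   // index (input) bits, hidden bit excluded
  int out_bits = 7;  // stored estimate (output) bits, hidden bit excluded
};

LutWidths parse_widths(int argc, char* argv[]) {
  assert(argc == 0 || argv != nullptr);
  // Copy only the arguments that actually exist; argv[0] is the program name.
  std::vector<std::string> args;
  for (int k = 1; k < argc; ++k) args.emplace_back(argv[k]);

  LutWidths w;  // defaults apply when arguments are missing
  if (args.size() >= 1) w.in_bits = std::stoi(args[0]);
  if (args.size() >= 2) w.out_bits = std::stoi(args[1]);
  if (w.in_bits < 1 || w.in_bits > 16 || w.out_bits < 1 || w.out_bits > 16)
    throw std::invalid_argument("widths must be between 1 and 16");
  return w;
}
```

Because nothing indexes past args.size(), running the program with zero or one argument simply keeps the default widths.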
On Fri, Jul 31, 2020, 13:29 Bill Huffman, <huffman@...> wrote:
David,
Are the max errors absolute? Or relative to the recip or rsqrt, which
is presumably in the range (1.0, 2.0]?
That you use [0.5, 1] when you might have meant [0.5, 1) leaves some
question about what is happening with the powers of two (even powers for
rsqrt). Hopefully, they're always precise and there's no issue of error
there.
Bill
On 7/31/20 4:33 AM, David Horner wrote:
> The current LUT generator assumes an N-by-N lookup table.
>
> I will upload to my GitHub Andrew's program, modified to take input (index
> size) and output (number of estimate bits) arguments.
> (--verilog still generates the LUT, and test generates the values below.)
>
> Of course, how well the table can be synthesized is more important than
> the LUT dimensions per se.
> I have used yosys with varying results.
>
> I continue to try to profile the accuracy within input segments rather
> than over the total float range.
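Profiling within input segments, rather than over the whole range, could be sketched with the hypothetical helper below. It sweeps every float significand in [1, 2) and keeps a separate worst-case relative error for each of 2^seg_bits equal segments; the estimator is passed in as a function so it can wrap any table lookup. The name and interface are assumptions, not part of Andrew's program.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: per-segment accuracy profile of a reciprocal estimate.
// The significand range [1, 2) is split into 2^seg_bits equal segments and
// the worst relative error |estimate - 1/x| / (1/x) is kept for each one.
std::vector<double> recip_error_profile(
    int seg_bits, const std::function<double(float)>& estimate) {
  assert(seg_bits >= 0 && seg_bits <= 23);
  std::vector<double> worst(std::size_t(1) << seg_bits, 0.0);
  // Visit every float in [1, 2); there are 2^23 of them.
  for (float x = 1.0f; x < 2.0f; x = std::nextafter(x, 2.0f)) {
    // Segment = top seg_bits fraction bits (x - 1 and the scaling are exact).
    const auto seg =
        static_cast<std::size_t>((double(x) - 1.0) * double(worst.size()));
    const double exact = 1.0 / double(x);
    worst[seg] = std::max(worst[seg], std::fabs(estimate(x) - exact) / exact);
  }
  return worst;
}
```

Choosing seg_bits equal to the table's index width gives one worst-case figure per table entry, which shows where within the range the overall maximum comes from.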
>
>
> Other dimensions are possible, with a resulting increase or decrease in
> accuracy. Columns: output width / input width (bits), array size in bits
> (output width x 2^input width), the "by" value where one was given, and
> the max relative error for recip on [0.5, 1] and rsqrt on [0.25, 1]:
>
> out/in  size  by      recip       rsqrt
> 7/7     896   -       2^-7.36951  2^-7.2998
> 8/7     1024  -       2^-7.78083  2^-7.69943
> 9/7     1152  -       2^-7.89148  2^-7.8783
> 10/7    1280  -       2^-7.94879  2^-7.91063
> 11/7    1408  1       2^-7.97629  2^-8
> 11/7    1408  0x1000  2^-8        2^-8
> 6/8     1536  -       2^-6.90724  2^-6.88438
> 7/8     1792  -       2^-7.78083  2^-7.73897
> 8/8     2048  0x1000  2^-8.45311  2^-8.36032
> 10/8    2560  0x1000  2^-8.88626  2^-8.8425
> 10/8    2560  0x1     2^-8.88626  2^-8.83164
> 12/8    -     -       2^-8.98041  2^-8.9616
> 13/8    3328  -       2^-9        2^-9
> 9/9     -     -       2^-9.47252  2^-9.31953
>
>
> Observation:
> with 7 input bits, the best max error reached is 2^-8;
> with 8 input bits, the best max error reached is 2^-9.
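A short calculation suggests why the observation holds, at least for the reciprocal. If each table entry is a single constant c for a segment [a, b], the relative error |c*x - 1| is smallest for c = 2/(a + b), which leaves (b - a)/(a + b) even with unlimited output bits. With n index bits the first segment is [1, 1 + 2^-n], so no such table can beat 1/(2^(n+1) + 1), just under 2^-(n+1). The sketch assumes constant (non-interpolated) entries; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Smallest possible worst-case relative error when one constant c stands in
// for 1/x on the whole segment [a, b] (unlimited output precision).
// c*x - 1 is linear in x, so the best c equalizes the endpoint errors:
//   c = 2/(a + b)  ->  max |c*x - 1| = (b - a)/(a + b).
double recip_segment_floor(double a, double b) {
  assert(0.0 < a && a < b);
  return (b - a) / (a + b);
}

// log2 of that floor for a table indexed by the top in_bits fraction bits of
// a significand in [1, 2). The first segment, [1, 1 + 2^-in_bits], is the
// worst because the segment width is fixed while a grows.
double recip_table_floor_log2(int in_bits) {
  const double b = 1.0 + std::ldexp(1.0, -in_bits);
  return std::log2(recip_segment_floor(1.0, b));
}
```

recip_table_floor_log2(7) is about -8.0056 and recip_table_floor_log2(8) is about -9.0028, which fits the measured best cases of 2^-8 and 2^-9 for 7 and 8 input bits.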