#### Re: VFRECIP/VFRSQRT instructions

Bill Huffman

David,

Here is a series of statements leading to my worst-case answer:

• For the mantissa range 0xF5_0000 to 0xF5_FFFF, the reciprocal estimate is 0x85_0000
• The largest error is for 0xF5_0000
• The reciprocal of 0xF5_0000, to "infinite" precision is 0x85_BF37.612D
• The relative error, then, is (0x85_0000 - 0x85_BF37.612D)/0x85_BF37.612D => -0x0.016E_0000_0022
• The log2 of the absolute value of that error is: -7.484300
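
The arithmetic in the list above can be checked with exact rationals. This is a minimal Python sketch, assuming both values are 24-bit significands with the hidden bit included, so that an input times its exact reciprocal equals 2^47 in this fixed-point convention. The function name `bucket_worst_case` is made up for illustration.

```python
from fractions import Fraction
from math import log2

def bucket_worst_case(x, est):
    """Relative error of the table estimate `est` at input significand `x`.

    Assumption: both are 24-bit significands with the hidden bit set,
    scaled so that x * (exact reciprocal) == 2**47.
    """
    exact = Fraction(1 << 47, x)      # exact reciprocal as a rational
    rel = (est - exact) / exact       # signed relative error
    return exact, rel, log2(abs(rel))

exact, rel, bits = bucket_worst_case(0xF5_0000, 0x85_0000)
print(hex(int(exact)))   # 0x85bf37 (integer part of the exact reciprocal)
print(rel)               # -183/32768
print(round(bits, 4))    # -7.4843
```

The relative error comes out as exactly -183/32768, which is -0x0.016E in hex, in line with the figure quoted above.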

I don't see any errors large enough to have a log2 of -7.36951.  Where did that error come from for you?

Bill

On 7/31/20 2:25 PM, DSHORNER wrote:

EXTERNAL MAIL

This is the program Andrew wrote.
https://github.com/riscv/riscv-v-spec/blob/vfrecip/recip.cc

On 2020-07-31 4:46 p.m., Bill Huffman wrote:

David,

Because of the errors you get, I'm assuming your "output width" and "input width" do not include the hidden bit.  Right?

That is correct; Andrew's approach assumes the implicit (hidden) high bit.

It's interesting.  I did a similar exercise a number of years ago and got a few hundredths of a bit better accuracy from 7/7 tables.  It's possible I did it wrong.  It's also possible that there's a slight improvement available.

Andrew chose a range of [xn, (x+1)n); perhaps (xn, (x+1)n] will work better.
I will give it a try.

If you want to send me the tables I can compare.  Mine are in decimal numbers from 128 to 255.  I could send you tables as well.

As mentioned, Andrew's --verilog directive creates a table; both input and output are in the range 0 to 127.
I wouldn't expect the bias to make any significant difference.

I'd be happy to see your tables.
If you want, I will send the output of Andrew's program for 7x7,
and for any of the other listed combinations from my mods to his program.
I will post my mods even though I still get that seg fault with no (or a single) command-line argument.
Then you could run your own.
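
For anyone comparing tables in the two forms mentioned here (0 to 127 without the hidden bit, 128 to 255 with it), the following is a rough Python sketch of one way to build a 7-in/7-out reciprocal table. It is not Andrew's recip.cc algorithm and not Bill's table: the minimax choice of each entry, the closed interval ends, and the names `build_recip_table` and `max_rel_err` are all assumptions made for illustration.

```python
from fractions import Fraction
from math import log2

IN_BITS = 7    # table index width, hidden bit not counted
OUT_BITS = 7   # table entry width, hidden bit not counted

def max_rel_err(est, a, b):
    # est * x - 1 is linear in x, so its extremes sit at the interval ends.
    return max(abs(est * a - 1), abs(est * b - 1))

def build_recip_table():
    n_in, n_out = 1 << IN_BITS, 1 << OUT_BITS
    table, worst = [], Fraction(0)
    for i in range(n_in):
        # Input significand interval [a, b] inside [1, 2]; treating the
        # upper end as closed gives the supremum of the half-open case.
        a = Fraction(n_in + i, n_in)
        b = Fraction(n_in + i + 1, n_in)
        # Candidate estimates (n_out + o) / (2 * n_out) lie in [0.5, 1).
        errs = [max_rel_err(Fraction(n_out + o, 2 * n_out), a, b)
                for o in range(n_out)]
        best = min(range(n_out), key=errs.__getitem__)
        table.append(best)              # entry in the 0..127 form
        worst = max(worst, errs[best])
    return table, worst

table, worst = build_recip_table()
biased = [128 + o for o in table]       # same table in the 128..255 form
print(hex(biased[0xF5 - 128]))          # 0x85
print(log2(worst))                      # log2 of the max relative error
```

Converting between the two forms is only the addition or removal of 128, so the choice of form should not affect the error figures.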

Bill

On 7/31/20 12:24 PM, David Horner wrote:

The error is relative error.
The calculation is unchanged from Andrew's original. (Although I explicitly force double even when it shouldn't matter).
The test range is from 0.5 to 1 inclusive.
Again I left Andrew's choices unchanged.
As you point out, the 1 case should not contribute to the max error in the reciprocal case, as the error there should be zero. For rsqrt, a power of 2 with an odd exponent has a nonzero error by definition, as sqrt 2 is irrational.
Andrew provides test_long which tests all single precision values that are not NaN.
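
A small Python illustration of the point about powers of two: the reciprocal of 1.0 is exactly representable, while 1/sqrt(2) cannot be represented in any finite number of fraction bits. The helper `quantized_rel_err` and the round-to-nearest quantization are assumptions for illustration, not taken from Andrew's program.

```python
from math import sqrt

def quantized_rel_err(value, frac_bits):
    """Relative error after rounding `value` to `frac_bits` fraction bits."""
    scale = 2 ** frac_bits
    q = round(value * scale) / scale
    return (q - value) / value

# Reciprocal of a power of two (here 1.0): exactly representable.
print(quantized_rel_err(1.0, 7))               # 0.0
# rsqrt of 2.0 (odd exponent): 1/sqrt(2) never lands on a grid point.
print(quantized_rel_err(1.0 / sqrt(2.0), 7))
```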

I hope to post my code soon.
I get a seg fault when no parameters are provided on the command line.
Argh. I believe it relates to handling argv as a vector of values. It appears the support construct moves some code out of conditional scope and thus frees even when no explicit allocation is made.


On Fri, Jul 31, 2020, 13:29 Bill Huffman, <huffman@...> wrote:
David,

Are the max errors absolute?  Or relative to the recip or rsqrt, which
is presumably in the range (1.0, 2.0]?

That you use [0.5, 1] when you might have meant [0.5, 1) leaves some
question about what is happening with the powers of two (even powers for
rsqrt).  Hopefully, they're always precise and there's no issue of error
there.

Bill

On 7/31/20 4:33 AM, David Horner wrote:
> The current LUT generator assumes an N-by-N lookup table.
>
> I will upload to my GitHub Andrew's program, modified to take input (index
> size) and output (estimate number of bits) arguments.
>    (--verilog still generates the LUT, and test generates the values below.)
>
> Of course, how well the table can be synthesized is more important than
> the LUT dimensions per se.
> I have used yosys with varying results.
>
> I continue to try to profile the accuracy within input segments rather
> than over total float range.
>
>
> Other dimensions are possible, with a resultant increase or decrease in
> accuracy.
> Parameters are: output width / input width, then array size in bits
>
> 7/7 896
> max recip error on [0.5, 1]: 2^-7.36951
> max rsqrt error on [0.25, 1]: 2^-7.2998
> 8/7 1024
> max recip error on [0.5, 1]: 2^-7.78083
> max rsqrt error on [0.25, 1]: 2^-7.69943
> 9/7 1152
> max recip error on [0.5, 1]: 2^-7.89148
> max rsqrt error on [0.25, 1]: 2^-7.8783
> 10/7 1280
> max recip error on [0.5, 1]: 2^-7.94879
> max rsqrt error on [0.25, 1]: 2^-7.91063
> 11/7 1408 by 1
> max recip error on [0.5, 1]: 2^-7.97629
> max rsqrt error on [0.25, 1]: 2^-8
> 11/7 1408 by 0x1000
> max recip error on [0.5, 1]: 2^-8
> max rsqrt error on [0.25, 1]: 2^-8
> 6/8 1536
> max recip error on [0.5, 1]: 2^-6.90724
> max rsqrt error on [0.25, 1]: 2^-6.88438
> 7/8 1792
> max recip error on [0.5, 1]: 2^-7.78083
> max rsqrt error on [0.25, 1]: 2^-7.73897
> 8/8 2048 by 0X1000
> max recip error on [0.5, 1]: 2^-8.45311
> max rsqrt error on [0.25, 1]: 2^-8.36032
> 10/8 2560 by 0X1000
> max recip error on [0.5, 1]: 2^-8.88626
> max rsqrt error on [0.25, 1]: 2^-8.8425
> 10/8 2560 by 0X1
> max recip error on [0.5, 1]: 2^-8.88626
> max rsqrt error on [0.25, 1]: 2^-8.83164
> 12/8
> max recip error on [0.5, 1]: 2^-8.98041
> max rsqrt error on [0.25, 1]: 2^-8.9616
> 13/8 3328
> max recip error on [0.5, 1]: 2^-9
> max rsqrt error on [0.25, 1]: 2^-9
> 9/9
> max recip error on [0.5, 1]: 2^-9.47252
> max rsqrt error on [0.25, 1]: 2^-9.31953
>
>
> observation:
> 7 input bits has a minimum max error of 2^-8.
> 8 input bits has a minimum max error of 2^-9.
>
>
>
>
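
The array sizes listed in the quoted message match output width times 2^(input width), and the closing observation about 7 and 8 input bits can be reproduced for the reciprocal case with a short calculation. This is a Python sketch under stated assumptions: the input significand is taken in [1, 2), the table entry is given unlimited precision, and the names `lut_size_bits` and `recip_error_floor` are hypothetical. It does not cover rsqrt.

```python
from fractions import Fraction
from math import log2

def lut_size_bits(out_bits, in_bits):
    # One out_bits-wide entry per index value.
    return out_bits * (1 << in_bits)

def recip_error_floor(in_bits):
    """Smallest achievable max relative error for a reciprocal table.

    For an interval [a, b], the estimate 2 / (a + b) balances the error
    est * x - 1 at both ends, giving (b - a) / (a + b). The widest
    relative interval is the first one, [1, 1 + 2**-in_bits].
    """
    a = Fraction(1)
    b = a + Fraction(1, 1 << in_bits)
    return (b - a) / (a + b)

print(lut_size_bits(7, 7), lut_size_bits(10, 8))   # 896 2560
for n in (7, 8):
    # Slightly below -(n + 1): the floor is 1 / (2**(n + 1) + 1).
    print(n, log2(recip_error_floor(n)))
```

With 7 input bits the floor is 1/257 and with 8 it is 1/513, so widening the output beyond a few extra bits cannot push the max error meaningfully past 2^-8 or 2^-9 respectively.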
