#### Re: VFRECIP/VFRSQRT instructions

https://github.com/David-Horner/recip/blob/master/vrecip.cc

What I initially posted came from a compilation made days earlier, and I pulled in some bogus test results.

Here is a fresh run:

./a.out 7 5 ;./a.out 7 6 ;./a.out 7 7 ;./a.out 7 8 ;./a.out 7 9 ;./a.out 7 10 ;./a.out 7 11 ;./a.out 8 7 ;./a.out 8 ;./a.out 8 9 ;./a.out 9 ;

ip 7 op 5 LUT #bits 640 verilog 0 test/test-long 1

max recip 7x5 error: 2^-5.89148

max rsqrt 7x5 error: 2^-5.98208

ip 7 op 6 LUT #bits 768 verilog 0 test/test-long 1

max recip 7x6 error: 2^-6.79055

max rsqrt 7x6 error: 2^-6.73312

ip 7 op 7 LUT #bits 896 verilog 0 test/test-long 1

max recip 7x7 error: 2^-7.4843

max rsqrt 7x7 error: 2^-7.31422

ip 7 op 8 LUT #bits 1024 verilog 0 test/test-long 1

max recip 7x8 error: 2^-7.77603

max rsqrt 7x8 error: 2^-7.6318

ip 7 op 9 LUT #bits 1152 verilog 0 test/test-long 1

max recip 7x9 error: 2^-7.8889

max rsqrt 7x9 error: 2^-7.87831

ip 7 op 10 LUT #bits 1280 verilog 0 test/test-long 1

max recip 7x10 error: 2^-7.94879

max rsqrt 7x10 error: 2^-7.89712

ip 7 op 11 LUT #bits 1408 verilog 0 test/test-long 1

max recip 7x11 error: 2^-7.97629

max rsqrt 7x11 error: 2^-8

ip 8 op 7 LUT #bits 1792 verilog 0 test/test-long 1

max recip 8x7 error: 2^-7.77602

max rsqrt 8x7 error: 2^-7.72555

estimate width, op=0, out of range reset to default

ip 8 op 8 LUT #bits 2048 verilog 0 test/test-long 1

max recip 8x8 error: 2^-8.45311

max rsqrt 8x8 error: 2^-8.25349

ip 8 op 9 LUT #bits 2304 verilog 0 test/test-long 1

max recip 8x9 error: 2^-8.71923

max rsqrt 8x9 error: 2^-8.67807

estimate width, op=0, out of range reset to default

ip 9 op 9 LUT #bits 4608 verilog 0 test/test-long 1

max recip 9x9 error: 2^-9.43021

max rsqrt 9x9 error: 2^-9.28082
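The "LUT #bits" figures in the run above all look consistent with a table of 2^ip entries, each op bits wide. That size model is an inference from the printed numbers, not something confirmed from the program source. A small Python sketch that checks it against the reported values (`lut_bits` and `mismatches` are hypothetical helper names):

```python
# (input bits, output bits, reported LUT bits), copied from the run above.
REPORTED = [
    (7, 5, 640), (7, 6, 768), (7, 7, 896), (7, 8, 1024),
    (7, 9, 1152), (7, 10, 1280), (7, 11, 1408),
    (8, 7, 1792), (8, 8, 2048), (8, 9, 2304), (9, 9, 4608),
]


def lut_bits(ip: int, op: int) -> int:
    """Assumed size model: 2**ip entries, each op bits wide."""
    return (1 << ip) * op


def mismatches(rows=REPORTED):
    """Return the rows whose reported size disagrees with the assumed model."""
    return [(ip, op, bits) for ip, op, bits in rows if lut_bits(ip, op) != bits]


if __name__ == "__main__":
    for ip, op, bits in REPORTED:
        print(f"{ip}x{op}: model {lut_bits(ip, op)}, reported {bits}")
    print("mismatches:", mismatches())
```

Under that model each extra input bit doubles the table, while each extra output bit adds only 2^ip bits, which is worth keeping in mind when comparing 7x8 (1024 bits) against 8x7 (1792 bits) at nearly the same reported recip error.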

On 2020-08-01 12:27 a.m., Bill Huffman wrote:

It came from some testing I was performing on adjacent index values.

David,

Here are a series of statements leading to my worst case answer:

- For the mantissa range 0xF5_0000 to 0xF5_FFFF, the reciprocal estimate is 0x85_0000
- The largest error is for 0xF5_0000
- The reciprocal of 0xF5_0000, to "infinite" precision is 0x85_BF37.612D
- The relative error, then, is (0x85_0000 - 0x85_BF37.612D)/0x85_BF37.612D => -0x0.016E_0000_0022
- The log2 of the absolute value of that error is: -7.484300

I don't have any errors large enough to give a log2 of -7.36951. Where did that error come from for you?

Completely bogus as I mentioned above.

I will instrument the code for more details, but I suspect this code has exactly the same worst case situation (for 7x7).
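As a sanity check on the figures in Bill's list above, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: `hex_fixed` is a hypothetical helper, and the 2^47 scaling (24-bit mantissa in, 24-bit reciprocal out) is an assumption inferred from the hex values quoted, not taken from either program.

```python
import math


def hex_fixed(text: str) -> float:
    """Parse a hex fixed-point string such as '85_BF37.612D' into a float."""
    text = text.replace("_", "")
    whole, _, frac = text.partition(".")
    value = float(int(whole, 16))
    if frac:
        value += int(frac, 16) / 16 ** len(frac)
    return value


estimate = hex_fixed("85_0000")       # table output for the 0xF5 bucket
exact = hex_fixed("85_BF37.612D")     # quoted "infinite" precision reciprocal

# Assumed scaling: reciprocal of a 24-bit mantissa, scaled by 2**47.
exact_from_division = 2 ** 47 / 0xF50000

rel_err = (estimate - exact) / exact
log2_err = math.log2(abs(rel_err))

if __name__ == "__main__":
    print(f"quoted exact value   : {exact:.4f}")
    print(f"2^47 / 0xF5_0000     : {exact_from_division:.4f}")
    print(f"relative error       : {rel_err:.8f}")
    print(f"log2(|relative err|) : {log2_err:.4f}")
```

The log2 value agrees with the -7.4843 reported for the 7x7 recip case in the fresh run.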

Bill

On 7/31/20 2:25 PM, DSHORNER wrote:

This is the program Andrew wrote.

https://github.com/riscv/riscv-v-spec/blob/vfrecip/recip.cc

On 2020-07-31 4:46 p.m., Bill Huffman wrote:

That is correct, Andrew's approach assumes the implicit high hidden bit.

David,

Because of the errors you get, I'm assuming your "output width" and "input width" do not include the hidden bit. Right?

Andrew chose a range from [xn, (x+1)n); perhaps (xn, (x+1)n] will work better.

It's interesting. I did a similar exercise a number of years ago and got a few hundredths of a bit better accuracy from 7/7 tables. It's possible I did it wrong. It's also possible that there's a slight improvement available.

I will give it a try.

If you want to send me the tables I can compare. Mine are in decimal numbers from 128 to 255. I could send you tables as well.

As mentioned, Andrew's --verilog directive creates a table; both input and output range from 0 to 127.

I wouldn't expect the bias to make any significant difference.
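The two table conventions mentioned here (entries from 128 to 255 versus 0 to 127) appear to differ only by the hidden bit, which is an offset of 128 for 7-bit values. A minimal Python sketch of converting between them, assuming that reading is right; the function names are hypothetical and come from neither program:

```python
def add_hidden_bit(table, bits=7):
    """Map entries stored without the hidden bit (0..2**bits - 1)
    to the form that includes it (2**bits..2**(bits + 1) - 1)."""
    hidden = 1 << bits
    for value in table:
        if not 0 <= value < hidden:
            raise ValueError(f"entry {value} does not fit in {bits} bits")
    return [hidden + value for value in table]


def drop_hidden_bit(table, bits=7):
    """Inverse mapping: strip the hidden bit from each entry."""
    hidden = 1 << bits
    for value in table:
        if not hidden <= value < 2 * hidden:
            raise ValueError(f"entry {value} has no hidden bit at position {bits}")
    return [value - hidden for value in table]


if __name__ == "__main__":
    # 0x85 == 133 is the estimate quoted earlier in the thread for the
    # 0xF5 bucket; without the hidden bit it is stored as 5.
    print(drop_hidden_bit([0x85]))
    print(add_hidden_bit([5]))
```

Since the mapping is a fixed offset applied to every entry, two tables can be compared entry by entry once both are in the same form.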

I'd be happy to see your tables.

If you want I will send Andrew's program's output for 7x7.

And any of the other listed combinations from my mods to his program.

I will post my mods even though I still get that seg fault with null (or single) command line args.

Then you could run your own.