Re: VFRECIP/VFRSQRT instructions
Here is a fresh run:
./a.out 7 5 ;./a.out 7 6 ;./a.out 7 7 ;./a.out 7 8 ;./a.out 7 9 ;./a.out 7 10 ;./a.out 7 11 ;./a.out 8 7 ;./a.out 8 ;./a.out 8 9 ;./a.out 9 ;
ip 7 op 5 LUT #bits 640 verilog 0 test/test-long 1
max recip 7x5 error: 2^-5.89148
max rsqrt 7x5 error: 2^-5.98208
ip 7 op 6 LUT #bits 768 verilog 0 test/test-long 1
max recip 7x6 error: 2^-6.79055
max rsqrt 7x6 error: 2^-6.73312
ip 7 op 7 LUT #bits 896 verilog 0 test/test-long 1
max recip 7x7 error: 2^-7.4843
max rsqrt 7x7 error: 2^-7.31422
ip 7 op 8 LUT #bits 1024 verilog 0 test/test-long 1
max recip 7x8 error: 2^-7.77603
max rsqrt 7x8 error: 2^-7.6318
ip 7 op 9 LUT #bits 1152 verilog 0 test/test-long 1
max recip 7x9 error: 2^-7.8889
max rsqrt 7x9 error: 2^-7.87831
ip 7 op 10 LUT #bits 1280 verilog 0 test/test-long 1
max recip 7x10 error: 2^-7.94879
max rsqrt 7x10 error: 2^-7.89712
ip 7 op 11 LUT #bits 1408 verilog 0 test/test-long 1
max recip 7x11 error: 2^-7.97629
max rsqrt 7x11 error: 2^-8
ip 8 op 7 LUT #bits 1792 verilog 0 test/test-long 1
max recip 8x7 error: 2^-7.77602
max rsqrt 8x7 error: 2^-7.72555
estimate width, op=0, out of range reset to default
ip 8 op 8 LUT #bits 2048 verilog 0 test/test-long 1
max recip 8x8 error: 2^-8.45311
max rsqrt 8x8 error: 2^-8.25349
ip 8 op 9 LUT #bits 2304 verilog 0 test/test-long 1
max recip 8x9 error: 2^-8.71923
max rsqrt 8x9 error: 2^-8.67807
estimate width, op=0, out of range reset to default
ip 9 op 9 LUT #bits 4608 verilog 0 test/test-long 1
max recip 9x9 error: 2^-9.43021
max rsqrt 9x9 error: 2^-9.28082
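For reference, the "LUT #bits" figures in this run are consistent with a table of 2^ip entries, each op bits wide. A minimal Python sketch of that check follows; the helper name lut_bits is made up for illustration, and the formula is inferred from the printed numbers, not taken from recip.cc.

```python
def lut_bits(ip: int, op: int) -> int:
    """Table size in bits, ASSUMING 2**ip entries of op bits each."""
    return (1 << ip) * op

# (ip, op, reported "LUT #bits") copied from the run above
reported = [
    (7, 5, 640), (7, 6, 768), (7, 7, 896), (7, 8, 1024), (7, 9, 1152),
    (7, 10, 1280), (7, 11, 1408), (8, 7, 1792), (8, 8, 2048), (8, 9, 2304),
    (9, 9, 4608),
]

# Collect any row where the assumed formula disagrees with the printed size.
mismatches = [(ip, op, bits) for ip, op, bits in reported
              if lut_bits(ip, op) != bits]
print("mismatches:", mismatches)
```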
It came from some testing I was performing on adjacent index values.
David,
Here are a series of statements leading to my worst case answer:
- For the mantissa range 0xF5_0000 to 0xF5_FFFF, the reciprocal estimate is 0x85_0000
- The largest error is for 0xF5_0000
- The reciprocal of 0xF5_0000, to "infinite" precision is 0x85_BF37.612D
- The relative error, then, is (0x85_0000 - 0x85_BF37.612D)/0x85_BF37.612D => -0x0.016E_0000_0022
- The log2 of the absolute value of that error is: -7.484300
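The arithmetic in the list above can be re-checked with exact rationals. This is only a sketch under the stated scaling assumption: the 24-bit mantissa is read as a value in [0.5, 1), and the reciprocal is halved so it lands on the same scale. The variable names are illustrative.

```python
from fractions import Fraction
from math import log2

mantissa = 0xF50000   # 24-bit mantissa, read as mantissa / 2**24 in [0.5, 1)
estimate = 0x850000   # table estimate on the same 24-bit scale

# 1 / (mantissa / 2**24) lies in (1, 2]; halving it puts it back in [0.5, 1),
# so on the 24-bit integer scale the exact reciprocal is 2**47 / mantissa.
exact = Fraction(1 << 47, mantissa)

rel_err = (Fraction(estimate) - exact) / exact
log2_err = log2(float(abs(rel_err)))

print(f"exact reciprocal (integer part) = 0x{int(exact):06X}")
print(f"relative error = {float(rel_err):.9f}")
print(f"log2(|relative error|) = {log2_err:.6f}")
```

The relative error reduces to -183/32768 exactly, whose log2 is about -7.4843, in agreement with the figure quoted above.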
I don't have any errors large enough to have a log2 of -7.36951. Where did that error come from for you?
Completely bogus as I mentioned above.
I will instrument the code for more details, but I suspect this code has exactly the same worst case situation (for 7x7).
Bill
On 7/31/20 2:25 PM, DSHORNER wrote:
This is the program Andrew wrote.
https://github.com/riscv/riscv-v-spec/blob/vfrecip/recip.cc
On 2020-07-31 4:46 p.m., Bill Huffman wrote:
That is correct, Andrew's approach assumes the implicit high hidden bit.
David,
Because of the errors you get, I'm assuming your "output width" and "input width" do not include the hidden bit. Right?
Andrew chose a range from [xn, (x+1)n); perhaps (xn, (x+1)n] will work better.
It's interesting. I did a similar exercise a number of years ago and got a few hundredths of a bit better accuracy from 7/7 tables. It's possible I did it wrong. It's also possible that there's a slight improvement available.
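One way to probe whether a 7/7 reciprocal table can do better is a brute-force min-max search per input interval. The Python sketch below is not the algorithm in recip.cc and not either author's table generator; it assumes inputs and outputs both carry the hidden bit (128 to 255), treats each interval as the continuous range [i, i+1), and picks the 8-bit estimate that minimizes the worst relative error at the interval ends. All names are illustrative.

```python
from math import log2

IN_BITS = 7
SCALE = 1 << (2 * IN_BITS + 1)   # 2**15: input times ideal output on this scaling

def best_entry(i):
    """Return (estimate, deviation) minimizing the worst |e*m - SCALE| for m in [i, i+1).

    The relative error of estimate e at mantissa m is e*m/SCALE - 1, which is
    linear in m, so its extremes sit at the two ends of the interval.
    """
    best = None
    for e in range(128, 256):    # 8-bit outputs with the hidden bit set
        dev = max(abs(e * i - SCALE), abs(e * (i + 1) - SCALE))
        if best is None or dev < best[1]:
            best = (e, dev)
    return best

table = {i: best_entry(i) for i in range(128, 256)}
worst_i = max(table, key=lambda i: table[i][1])
worst_e, worst_dev = table[worst_i]
worst_log2 = log2(worst_dev / SCALE)

print(f"worst interval 0x{worst_i:02X}: estimate 0x{worst_e:02X}, "
      f"log2(error) = {worst_log2:.4f}")
print(f"entry for 0xF5: 0x{table[0xF5][0]:02X}")
```

Under these assumptions the worst case is 183/2^15, about 2^-7.4843, which suggests that with 7-bit outputs any gain would have to come from a different error measure or interval convention rather than from better per-entry choices.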
I will give it a try.
If you want to send me the tables I can compare. Mine are in decimal numbers from 128 to 255. I could send you tables as well.
As mentioned, Andrew's --verilog directive creates a table; both input and output are in the range 0 to 127.
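To compare a 0-to-127 table against one written as 128 to 255, the hidden bit has to be added to or stripped from both the index and the value. A minimal Python sketch, assuming the hidden bit is the only difference between the two conventions; the function names and the list-based table layout are hypothetical and say nothing about the actual --verilog output format.

```python
HIDDEN = 128  # weight of the implicit leading bit above a 7-bit field

def with_hidden_bit(v):
    """Map a 0..127 field to its 128..255 form."""
    if not 0 <= v < HIDDEN:
        raise ValueError("expected a value in 0..127")
    return v | HIDDEN

def without_hidden_bit(v):
    """Map a 128..255 value back to its 0..127 field."""
    if not HIDDEN <= v < 2 * HIDDEN:
        raise ValueError("expected a value in 128..255")
    return v & (HIDDEN - 1)

def to_hidden_form(table):
    """Convert a 128-entry list (index = input, value = output, both 0..127)
    into a dict keyed and valued in 128..255."""
    return {with_hidden_bit(i): with_hidden_bit(o) for i, o in enumerate(table)}

# The 0xF5 -> 0x85 entry discussed in this thread, in both conventions.
print(without_hidden_bit(0xF5), without_hidden_bit(0x85))
print(hex(with_hidden_bit(117)), hex(with_hidden_bit(5)))
```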
I wouldn't expect the bias to make any significant difference.
I'd be happy to see your tables.
If you want, I will send Andrew's program's output for 7x7, and any of the other listed combinations from my mods to his program.
I will post my mods even though I still get that seg fault with null (or single) command line args.
Then you could run your own.