Re: VFRECIP/VFRSQRT instructions


Andrew Waterman
 




On Tue, Aug 11, 2020 at 3:35 PM Bill Huffman <huffman@...> wrote:


On 8/11/20 3:00 PM, Andrew Waterman wrote:
EXTERNAL MAIL



On Tue, Aug 11, 2020 at 1:56 PM Bill Huffman <huffman@...> wrote:

Hi Andrew,

I'm looking at the cases where the reciprocal is near the boundary between finite and infinite or between normal and denormal.  Are you trying to get the boundaries approximately right?  Or exactly?  For example, the point at which the reciprocal of a large positive denorm falls over the boundary between MAXPOS and +Inf is different for RUP and RNE.  The setting of OF changes at the same point.  There's yet another point at which OF changes for RDN even though the answer doesn't change.  You don't show UF set anywhere.


There are no cases where UF should be raised because there are no cases where denormalization causes loss of precision.  When the result is subnormal, it is only subnormal by either one or two positions; the denormalized 7-bit significand plus two bits of right-shift fits within all of our formats' significands.  (This property doesn't hold for bfloat16, but that point might be moot if our variant of that format always flushes subnormals to zero.)
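The fit argument above can be checked mechanically. The following is a minimal Python sketch, not taken from any specification text: it assumes the estimate is the significand 1.fffffff (hidden bit plus 7 table bits) left-aligned in the format's fraction field, and that a subnormal result shifts it right by one or two positions. The names `FRACTION_BITS`, `bits_lost`, and `denormalization_is_exact` are illustrative.

```python
# Fraction-field widths of the formats under discussion (standard values).
FRACTION_BITS = {"binary16": 10, "binary32": 23, "binary64": 52, "bfloat16": 7}

def bits_lost(fraction_bits, out7, shift):
    """Return the bits shifted out when the estimate 1.out7 is denormalized.

    The significand (hidden bit plus 7 table bits) is left-aligned in a field
    of 1 + fraction_bits bits, then shifted right by `shift` positions.
    A nonzero return value means precision was lost.
    """
    significand = ((1 << 7) | out7) << (fraction_bits - 7)
    return significand & ((1 << shift) - 1)

def denormalization_is_exact(fraction_bits, max_shift=2):
    """True if no 7-bit estimate loses bits for any shift up to max_shift."""
    return all(
        bits_lost(fraction_bits, out7, shift) == 0
        for out7 in range(128)
        for shift in range(1, max_shift + 1)
    )

for name, width in FRACTION_BITS.items():
    print(name, denormalization_is_exact(width))
```

Under these assumptions only the 7-bit fraction field of bfloat16 is too narrow, which is consistent with the parenthetical remark above.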

Ah, so you're counting the 7-bit (plus hidden bit) result as the absolutely correct answer.  There's no relationship here to the infinite precision reciprocal we're approximating.  This is an instruction that throws away 16 bits of input mantissa, does a table lookup, and gives an answer that's exactly 7 bits (plus hidden bit).  The relationship of this instruction to a reciprocal is one of motivation and not closer than that.
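For binary32, the "throws away 16 bits" description corresponds to keeping only the top 7 of the 23 fraction bits as the table index. A small Python sketch of that field split follows; the function name `split_binary32` is hypothetical, and the layout used is the standard binary32 one (1 sign bit, 8 exponent bits, 23 fraction bits).

```python
def split_binary32(bits):
    """Split a binary32 bit pattern into the fields a 7-bit lookup would use.

    Returns (sign, biased_exponent, table_index, discarded), where table_index
    is the top 7 fraction bits and discarded is the low 16 fraction bits that
    never influence the result.
    """
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction >> 16, fraction & 0xFFFF

print(split_binary32(0x3F1BFFFD))
```

Two inputs that differ only in the discarded field produce the same index, so they necessarily receive the same estimate.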

I think that's the answer to the paradigm question I had.  I'll think about that a bit and see what I think of your edge case results and flags then.

Ah, that clarifies your earlier question.  Yeah, LMK what you think. 

      Bill


As to the large positive denorm input case: the only case where this scheme and IEEE (1.0 / x) differ in the finity of the result, or differ in whether OF is raised, is for the exact input 2^-(B+1), depending on the rounding mode.  We always produce a finite result for this case, but there's an arguable reason for it: we're actually computing the reciprocal of some number near the midpoint of the interval ( 2^-(B+1), nextafter(2^-(B+1)) ), the result of which is finite, regardless of the rounding mode.
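The midpoint argument can be checked with exact rational arithmetic. The sketch below is illustrative only and assumes B is the exponent bias of the format; the function name `edge_case` and the format list are not from the thread. It compares the exact reciprocal of 2^-(B+1), and of the midpoint of the interval up to the next representable value, against MAXPOS.

```python
from fractions import Fraction

def edge_case(bias, fraction_bits):
    """Compare exact reciprocals near x = 2^-(bias+1) against MAXPOS.

    Returns (exact_exceeds_maxpos, midpoint_recip_below_maxpos).
    """
    x = Fraction(1, 2 ** (bias + 1))
    # Subnormals are evenly spaced by 2^(1 - bias - fraction_bits).
    spacing = Fraction(1, 2 ** (bias - 1 + fraction_bits))
    midpoint = x + spacing / 2  # midpoint of (x, nextafter(x))
    maxpos = (2 - Fraction(1, 2 ** fraction_bits)) * 2 ** bias
    return 1 / x > maxpos, 1 / midpoint < maxpos

for name, bias, fraction_bits in [("binary16", 15, 10),
                                  ("binary32", 127, 23),
                                  ("binary64", 1023, 52)]:
    print(name, edge_case(bias, fraction_bits))
```

The exact reciprocal 2^(B+1) lies above MAXPOS, so whether IEEE 1.0 / x returns MAXPOS or +Inf depends on the rounding mode, while the reciprocal of the midpoint lies below MAXPOS and is therefore finite in every mode.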

So, I'm wondering what your paradigm is for the edge cases.  I can see it might not be worth being too complicated since the answer isn't very exact.  The paradigm is further complicated by the idea that the answer may be refined by further steps.  :-)


Yeah... the intent was to have reasonable fidelity.  I think you can argue the 2^-(B+1) case either way, but other ISAs have resolved it the same way I did.  And it's clearly a feature that corner-case detection doesn't depend on the significand (except for its zeroness, that is).


      Bill


On 8/10/20 8:54 PM, Andrew Waterman wrote:


On Mon, Aug 3, 2020 at 5:44 PM Bill Huffman <huffman@...> wrote:


On 8/3/20 1:41 PM, Andrew Waterman wrote:



On Mon, Aug 3, 2020 at 12:40 PM Bill Huffman <huffman@...> wrote:

The recip table matches mine as does the worst case error.

I have one different entry in the square root table.  For entry 77, where you have 36, I have 37.  I'm not sure whether it matters.  Also, ages ago, I got a very small difference in worst case error of 2^-7.317 but I haven't gone back to trace anything down about that.


Thanks for validating against your table, Bill.

With my value for that entry, the worst error on the interval of interest is 2^-7.32041, for input 0x3f1a0000.  With yours, it's 2^-7.3164 for 0x3f1bfffd.
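The two figures quoted above can be reproduced with a short Python sketch. It assumes the error is measured relative to the exact 1/sqrt(x), that the estimate for an input in [0.25, 1) is 1 + out/128, and that the bit patterns are binary32; the helper names are illustrative.

```python
import math
import struct

def f32_from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def rsqrt_log2_error(bits, out7):
    """log2 of the relative error of the estimate 1 + out7/128 vs 1/sqrt(x).

    Assumes x lies in [0.25, 1), so the exact result lies in (1, 2] and the
    7-bit table output needs no extra exponent adjustment.
    """
    x = f32_from_bits(bits)
    estimate = 1 + out7 / 128
    relative_error = abs(estimate * math.sqrt(x) - 1)
    return math.log2(relative_error)

print(rsqrt_log2_error(0x3F1A0000, 36))  # entry 77 holding 36
print(rsqrt_log2_error(0x3F1BFFFD, 37))  # entry 77 holding 37
```

Here 0x3f1a0000 decodes to 0.6015625, the left edge of the interval for entry 77, and 0x3f1bfffd sits just below its right edge at 0.609375.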

I agree with your computation with a really tiny difference (I get that it just barely rounds to 2^-7.32040).  I can't say why I got 37 when I did it 8-10 years ago - and I don't think I'm going to chase that.  I'm good with 36 at that position in the table.

So, I'm good with the table values below.

     Bill


Presumably the error's slightly smaller for my scheme because I'm picking the output value that minimizes the maximum error on the interval, rather than picking the midpoint or similar.  Of course, the overall worst error is unaffected.
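One way to read "minimizes the maximum error on the interval" is sketched below in Python for the reciprocal table. This is an illustration under stated assumptions, not necessarily how either table was generated: entry i covers x in [(128+i)/256, (129+i)/256], the estimate is (128+out)/128, and the error is the relative error |estimate * x - 1|. That error is linear in x inside the absolute value, so its maximum over the interval is attained at an endpoint. All function names are hypothetical.

```python
def recip_error_num(index, out7, right):
    """Numerator of |estimate * x - 1| over 32768 at an interval endpoint.

    x = (128 + index + right) / 256 and estimate = (128 + out7) / 128, so the
    product has denominator 256 * 128 = 32768 and stays an exact integer ratio.
    """
    return abs((128 + out7) * (128 + index + right) - 32768)

def minimax_entry(index):
    """Pick the 7-bit output minimizing the worse of the two endpoint errors."""
    def worst(out7):
        return max(recip_error_num(index, out7, 0),
                   recip_error_num(index, out7, 1))
    best = min(range(128), key=worst)
    return best, worst(best)

table = [minimax_entry(i) for i in range(128)]
worst_num = max(err for _, err in table)
print([out for out, _ in table[:8]])
print(worst_num / 32768)
```

Working with integer numerators avoids any floating-point ambiguity when two candidate outputs have nearly equal worst-case errors.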

      Bill

On 8/3/20 11:38 AM, DSHORNER wrote:

Now annotated version --detail
https://github.com/David-Horner/recip/blob/master/vrecip.cc

For the 7x7 case below, notice that the biased value does not exceed 21 for recip (5 of 7 bits) and 15 for rsqrt (4 of 7 bits).
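The bound for recip can be checked against the listing with a few lines of Python. The sketch assumes, as inferred from the printed rows, that for the 7x7 case the biased value equals (127 - in) - out; the list `RECIP7` is transcribed from the recip listing in this message, and the helper name `biased` is illustrative.

```python
# Recip 7x7 outputs, transcribed from the listing (index 0..127).
RECIP7 = [
    127, 125, 123, 121, 119, 117, 116, 114, 112, 110, 109, 107, 105, 104, 102, 100,
    99, 97, 96, 94, 93, 91, 90, 88, 87, 85, 84, 83, 81, 80, 79, 77,
    76, 75, 74, 72, 71, 70, 69, 68, 66, 65, 64, 63, 62, 61, 60, 59,
    58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43,
    42, 41, 40, 40, 39, 38, 37, 36, 35, 35, 34, 33, 32, 31, 31, 30,
    29, 28, 28, 27, 26, 25, 25, 24, 23, 23, 22, 21, 21, 20, 19, 19,
    18, 17, 17, 16, 15, 15, 14, 14, 13, 12, 12, 11, 11, 10, 9, 9,
    8, 8, 7, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0,
]

def biased(index, out7):
    """Offset of the table output below the straight line 127 - index."""
    return (127 - index) - out7

offsets = [biased(i, out) for i, out in enumerate(RECIP7)]
print(max(offsets), max(offsets).bit_length())
```

Storing the offset rather than the raw output is what lets the table entry shrink from 7 bits to the 5 bits mentioned above, at the cost of one subtraction after the lookup.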

ip 7 op 7 LUT #bits 896 verilog 0  test/test-long 1
Recip7x7LUT (input [6:0] in, output reg [6:0] out);
 in[6:0]  corresponds to sig[S-1:S-6]
 out[6:0] corresponds to sig[S-1:S-6]
 biased : ((ipN-1) - in) << (op - ip) // or >> if neg
 base bias 127  left-shift 0 right-shift 0
 0: out = 127 biased 0; lerr 0.00390625 rerr 0.00387573 larg 0.5 rarg 0.503906
 1: out = 125 biased 1; lerr 0.0039978 rerr 0.00372314 larg 0.503906 rarg 0.507812
 2: out = 123 biased 2; lerr 0.00421143 rerr 0.00344849 larg 0.507812 rarg 0.511719
 3: out = 121 biased 3; lerr 0.00454712 rerr 0.00305176 larg 0.511719 rarg 0.515625
 4: out = 119 biased 4; lerr 0.00500488 rerr 0.00253296 larg 0.515625 rarg 0.519531
 5: out = 117 biased 5; lerr 0.00558472 rerr 0.00189209 larg 0.519531 rarg 0.523438
 6: out = 116 biased 5; lerr 0.00219727 rerr 0.00524902 larg 0.523438 rarg 0.527344
 7: out = 114 biased 6; lerr 0.00299072 rerr 0.00439453 larg 0.527344 rarg 0.53125
 8: out = 112 biased 7; lerr 0.00390625 rerr 0.00341797 larg 0.53125 rarg 0.535156
 9: out = 110 biased 8; lerr 0.00494385 rerr 0.00231934 larg 0.535156 rarg 0.539062
 10: out = 109 biased 8; lerr 0.00189209 rerr 0.00534058 larg 0.539062 rarg 0.542969
 11: out = 107 biased 9; lerr 0.00314331 rerr 0.00402832 larg 0.542969 rarg 0.546875
 12: out = 105 biased 10; lerr 0.0045166 rerr 0.00259399 larg 0.546875 rarg 0.550781
 13: out = 104 biased 10; lerr 0.00170898 rerr 0.00537109 larg 0.550781 rarg 0.554688
 14: out = 102 biased 11; lerr 0.0032959 rerr 0.00372314 larg 0.554688 rarg 0.558594
 15: out = 100 biased 12; lerr 0.00500488 rerr 0.00195312 larg 0.558594 rarg 0.5625
 16: out = 99 biased 12; lerr 0.00244141 rerr 0.00448608 larg 0.5625 rarg 0.566406
 17: out = 97 biased 13; lerr 0.00436401 rerr 0.00250244 larg 0.566406 rarg 0.570312
 18: out = 96 biased 13; lerr 0.00195312 rerr 0.00488281 larg 0.570312 rarg 0.574219
 19: out = 94 biased 14; lerr 0.00408936 rerr 0.00268555 larg 0.574219 rarg 0.578125
 20: out = 93 biased 14; lerr 0.00183105 rerr 0.00491333 larg 0.578125 rarg 0.582031
 21: out = 91 biased 15; lerr 0.00418091 rerr 0.00250244 larg 0.582031 rarg 0.585938
 22: out = 90 biased 15; lerr 0.0020752 rerr 0.00457764 larg 0.585938 rarg 0.589844
 23: out = 88 biased 16; lerr 0.00463867 rerr 0.00195312 larg 0.589844 rarg 0.59375
 24: out = 87 biased 16; lerr 0.00268555 rerr 0.00387573 larg 0.59375 rarg 0.597656
 25: out = 85 biased 17; lerr 0.00546265 rerr 0.0010376 larg 0.597656 rarg 0.601562
 26: out = 84 biased 17; lerr 0.00366211 rerr 0.00280762 larg 0.601562 rarg 0.605469
 27: out = 83 biased 17; lerr 0.00192261 rerr 0.0045166 larg 0.605469 rarg 0.609375
 28: out = 81 biased 18; lerr 0.00500488 rerr 0.00137329 larg 0.609375 rarg 0.613281
 29: out = 80 biased 18; lerr 0.00341797 rerr 0.00292969 larg 0.613281 rarg 0.617188
 30: out = 79 biased 18; lerr 0.00189209 rerr 0.00442505 larg 0.617188 rarg 0.621094
 31: out = 77 biased 19; lerr 0.00527954 rerr 0.000976562 larg 0.621094 rarg 0.625
 32: out = 76 biased 19; lerr 0.00390625 rerr 0.00231934 larg 0.625 rarg 0.628906
 33: out = 75 biased 19; lerr 0.00259399 rerr 0.00360107 larg 0.628906 rarg 0.632812
 34: out = 74 biased 19; lerr 0.00134277 rerr 0.00482178 larg 0.632812 rarg 0.636719
 35: out = 72 biased 20; lerr 0.00512695 rerr 0.000976562 larg 0.636719 rarg 0.640625
 36: out = 71 biased 20; lerr 0.00402832 rerr 0.00204468 larg 0.640625 rarg 0.644531
 37: out = 70 biased 20; lerr 0.00299072 rerr 0.00305176 larg 0.644531 rarg 0.648438
 38: out = 69 biased 20; lerr 0.00201416 rerr 0.0039978 larg 0.648438 rarg 0.652344
 39: out = 68 biased 20; lerr 0.00109863 rerr 0.00488281 larg 0.652344 rarg 0.65625
 40: out = 66 biased 21; lerr 0.00537109 rerr 0.000549316 larg 0.65625 rarg 0.660156
 41: out = 65 biased 21; lerr 0.00460815 rerr 0.00128174 larg 0.660156 rarg 0.664062
 42: out = 64 biased 21; lerr 0.00390625 rerr 0.00195312 larg 0.664062 rarg 0.667969
 43: out = 63 biased 21; lerr 0.00326538 rerr 0.00256348 larg 0.667969 rarg 0.671875
 44: out = 62 biased 21; lerr 0.00268555 rerr 0.00311279 larg 0.671875 rarg 0.675781
 45: out = 61 biased 21; lerr 0.00216675 rerr 0.00360107 larg 0.675781 rarg 0.679688
 46: out = 60 biased 21; lerr 0.00170898 rerr 0.00402832 larg 0.679688 rarg 0.683594
 47: out = 59 biased 21; lerr 0.00131226 rerr 0.00439453 larg 0.683594 rarg 0.6875
 48: out = 58 biased 21; lerr 0.000976562 rerr 0.00469971 larg 0.6875 rarg 0.691406
 49: out = 57 biased 21; lerr 0.000701904 rerr 0.00494385 larg 0.691406 rarg 0.695312
 50: out = 56 biased 21; lerr 0.000488281 rerr 0.00512695 larg 0.695312 rarg 0.699219
 51: out = 55 biased 21; lerr 0.000335693 rerr 0.00524902 larg 0.699219 rarg 0.703125
 52: out = 54 biased 21; lerr 0.000244141 rerr 0.00531006 larg 0.703125 rarg 0.707031
 53: out = 53 biased 21; lerr 0.000213623 rerr 0.00531006 larg 0.707031 rarg 0.710938
 54: out = 52 biased 21; lerr 0.000244141 rerr 0.00524902 larg 0.710938 rarg 0.714844
 55: out = 51 biased 21; lerr 0.000335693 rerr 0.00512695 larg 0.714844 rarg 0.71875
 56: out = 50 biased 21; lerr 0.000488281 rerr 0.00494385 larg 0.71875 rarg 0.722656
 57: out = 49 biased 21; lerr 0.000701904 rerr 0.00469971 larg 0.722656 rarg 0.726562
 58: out = 48 biased 21; lerr 0.000976562 rerr 0.00439453 larg 0.726562 rarg 0.730469
 59: out = 47 biased 21; lerr 0.00131226 rerr 0.00402832 larg 0.730469 rarg 0.734375
 60: out = 46 biased 21; lerr 0.00170898 rerr 0.00360107 larg 0.734375 rarg 0.738281
 61: out = 45 biased 21; lerr 0.00216675 rerr 0.00311279 larg 0.738281 rarg 0.742188
 62: out = 44 biased 21; lerr 0.00268555 rerr 0.00256348 larg 0.742188 rarg 0.746094
 63: out = 43 biased 21; lerr 0.00326538 rerr 0.00195312 larg 0.746094 rarg 0.75
 64: out = 42 biased 21; lerr 0.00390625 rerr 0.00128174 larg 0.75 rarg 0.753906
 65: out = 41 biased 21; lerr 0.00460815 rerr 0.000549316 larg 0.753906 rarg 0.757812
 66: out = 40 biased 21; lerr 0.00537109 rerr 0.000244141 larg 0.757812 rarg 0.761719
 67: out = 40 biased 20; lerr 0.000244141 rerr 0.00488281 larg 0.761719 rarg 0.765625
 68: out = 39 biased 20; lerr 0.00109863 rerr 0.0039978 larg 0.765625 rarg 0.769531
 69: out = 38 biased 20; lerr 0.00201416 rerr 0.00305176 larg 0.769531 rarg 0.773438
 70: out = 37 biased 20; lerr 0.00299072 rerr 0.00204468 larg 0.773438 rarg 0.777344
 71: out = 36 biased 20; lerr 0.00402832 rerr 0.000976562 larg 0.777344 rarg 0.78125
 72: out = 35 biased 20; lerr 0.00512695 rerr 0.000152588 larg 0.78125 rarg 0.785156
 73: out = 35 biased 19; lerr 0.000152588 rerr 0.00482178 larg 0.785156 rarg 0.789062
 74: out = 34 biased 19; lerr 0.00134277 rerr 0.00360107 larg 0.789062 rarg 0.792969
 75: out = 33 biased 19; lerr 0.00259399 rerr 0.00231934 larg 0.792969 rarg 0.796875
 76: out = 32 biased 19; lerr 0.00390625 rerr 0.000976562 larg 0.796875 rarg 0.800781
 77: out = 31 biased 19; lerr 0.00527954 rerr 0.000427246 larg 0.800781 rarg 0.804688
 78: out = 31 biased 18; lerr 0.000427246 rerr 0.00442505 larg 0.804688 rarg 0.808594
 79: out = 30 biased 18; lerr 0.00189209 rerr 0.00292969 larg 0.808594 rarg 0.8125
 80: out = 29 biased 18; lerr 0.00341797 rerr 0.00137329 larg 0.8125 rarg 0.816406
 81: out = 28 biased 18; lerr 0.00500488 rerr 0.000244141 larg 0.816406 rarg 0.820312
 82: out = 28 biased 17; lerr 0.000244141 rerr 0.0045166 larg 0.820312 rarg 0.824219
 83: out = 27 biased 17; lerr 0.00192261 rerr 0.00280762 larg 0.824219 rarg 0.828125
 84: out = 26 biased 17; lerr 0.00366211 rerr 0.0010376 larg 0.828125 rarg 0.832031
 85: out = 25 biased 17; lerr 0.00546265 rerr 0.000793457 larg 0.832031 rarg 0.835938
 86: out = 25 biased 16; lerr 0.000793457 rerr 0.00387573 larg 0.835938 rarg 0.839844
 87: out = 24 biased 16; lerr 0.00268555 rerr 0.00195312 larg 0.839844 rarg 0.84375
 88: out = 23 biased 16; lerr 0.00463867 rerr 3.05176E-05 larg 0.84375 rarg 0.847656
 89: out = 23 biased 15; lerr 3.05176E-05 rerr 0.00457764 larg 0.847656 rarg 0.851562
 90: out = 22 biased 15; lerr 0.0020752 rerr 0.00250244 larg 0.851562 rarg 0.855469
 91: out = 21 biased 15; lerr 0.00418091 rerr 0.000366211 larg 0.855469 rarg 0.859375
 92: out = 21 biased 14; lerr 0.000366211 rerr 0.00491333 larg 0.859375 rarg 0.863281
 93: out = 20 biased 14; lerr 0.00183105 rerr 0.00268555 larg 0.863281 rarg 0.867188
 94: out = 19 biased 14; lerr 0.00408936 rerr 0.000396729 larg 0.867188 rarg 0.871094
 95: out = 19 biased 13; lerr 0.000396729 rerr 0.00488281 larg 0.871094 rarg 0.875
 96: out = 18 biased 13; lerr 0.00195312 rerr 0.00250244 larg 0.875 rarg 0.878906
 97: out = 17 biased 13; lerr 0.00436401 rerr 6.10352E-05 larg 0.878906 rarg 0.882812
 98: out = 17 biased 12; lerr 6.10352E-05 rerr 0.00448608 larg 0.882812 rarg 0.886719
 99: out = 16 biased 12; lerr 0.00244141 rerr 0.00195312 larg 0.886719 rarg 0.890625
 100: out = 15 biased 12; lerr 0.00500488 rerr 0.000640869 larg 0.890625 rarg 0.894531
 101: out = 15 biased 11; lerr 0.000640869 rerr 0.00372314 larg 0.894531 rarg 0.898438
 102: out = 14 biased 11; lerr 0.0032959 rerr 0.0010376 larg 0.898438 rarg 0.902344
 103: out = 14 biased 10; lerr 0.0010376 rerr 0.00537109 larg 0.902344 rarg 0.90625
 104: out = 13 biased 10; lerr 0.00170898 rerr 0.00259399 larg 0.90625 rarg 0.910156
 105: out = 12 biased 10; lerr 0.0045166 rerr 0.000244141 larg 0.910156 rarg 0.914062
 106: out = 12 biased 9; lerr 0.000244141 rerr 0.00402832 larg 0.914062 rarg 0.917969
 107: out = 11 biased 9; lerr 0.00314331 rerr 0.00109863 larg 0.917969 rarg 0.921875
 108: out = 11 biased 8; lerr 0.00109863 rerr 0.00534058 larg 0.921875 rarg 0.925781
 109: out = 10 biased 8; lerr 0.00189209 rerr 0.00231934 larg 0.925781 rarg 0.929688
 110: out = 9 biased 8; lerr 0.00494385 rerr 0.000762939 larg 0.929688 rarg 0.933594
 111: out = 9 biased 7; lerr 0.000762939 rerr 0.00341797 larg 0.933594 rarg 0.9375
 112: out = 8 biased 7; lerr 0.00390625 rerr 0.000244141 larg 0.9375 rarg 0.941406
 113: out = 8 biased 6; lerr 0.000244141 rerr 0.00439453 larg 0.941406 rarg 0.945312
 114: out = 7 biased 6; lerr 0.00299072 rerr 0.00112915 larg 0.945312 rarg 0.949219
 115: out = 7 biased 5; lerr 0.00112915 rerr 0.00524902 larg 0.949219 rarg 0.953125
 116: out = 6 biased 5; lerr 0.00219727 rerr 0.00189209 larg 0.953125 rarg 0.957031
 117: out = 5 biased 5; lerr 0.00558472 rerr 0.00152588 larg 0.957031 rarg 0.960938
 118: out = 5 biased 4; lerr 0.00152588 rerr 0.00253296 larg 0.960938 rarg 0.964844
 119: out = 4 biased 4; lerr 0.00500488 rerr 0.000976562 larg 0.964844 rarg 0.96875
 120: out = 4 biased 3; lerr 0.000976562 rerr 0.00305176 larg 0.96875 rarg 0.972656
 121: out = 3 biased 3; lerr 0.00454712 rerr 0.000549316 larg 0.972656 rarg 0.976562
 122: out = 3 biased 2; lerr 0.000549316 rerr 0.00344849 larg 0.976562 rarg 0.980469
 123: out = 2 biased 2; lerr 0.00421143 rerr 0.000244141 larg 0.980469 rarg 0.984375
 124: out = 2 biased 1; lerr 0.000244141 rerr 0.00372314 larg 0.984375 rarg 0.988281
 125: out = 1 biased 1; lerr 0.0039978 rerr 6.10352E-05 larg 0.988281 rarg 0.992188
 126: out = 1 biased 0; lerr 6.10352E-05 rerr 0.00387573 larg 0.992188 rarg 0.996094
 127: out = 0 biased 0; lerr 0.00390625 rerr 0 larg 0.996094 rarg 1

 ... [removed hex data dumping]

RSqrt7x7LUT (input [6:0] in, output reg [6:0] out);
  // in[6] corresponds to exp[0]
  // in[5:0] corresponds to sig[S-1:S-5]
  // out[6:0] corresponds to sig[S-1:S-6]
  // biased : ((ipN-1) - in) << (op - ip)
 0: out 127 biased 0; lerr 0.00390625 rerr 0.00384557 larg 0.25 rarg 0.253906
 1: out 125 biased 1; lerr 0.00402773 rerr 0.00360435 larg 0.253906 rarg 0.257812
 2: out 123 biased 2; lerr 0.00432928 rerr 0.00318533 larg 0.257812 rarg 0.261719
 3: out 121 biased 3; lerr 0.00480818 rerr 0.00259111 larg 0.261719 rarg 0.265625
 4: out 119 biased 4; lerr 0.00546183 rerr 0.00182426 larg 0.265625 rarg 0.269531
 5: out 118 biased 4; lerr 0.0022317 rerr 0.00497249 larg 0.269531 rarg 0.273438
 6: out 116 biased 5; lerr 0.00319802 rerr 0.00389675 larg 0.273438 rarg 0.277344
 7: out 114 biased 6; lerr 0.00433191 rerr 0.00265532 larg 0.277344 rarg 0.28125
 8: out 113 biased 6; lerr 0.00148789 rerr 0.00542232 larg 0.28125 rarg 0.285156
 9: out 111 biased 7; lerr 0.00292144 rerr 0.00388464 larg 0.285156 rarg 0.289062
 10: out 109 biased 8; lerr 0.00451607 rerr 0.0021876 larg 0.289062 rarg 0.292969
 11: out 108 biased 8; lerr 0.00204104 rerr 0.00458999 larg 0.292969 rarg 0.296875
 12: out 106 biased 9; lerr 0.00392348 rerr 0.00260824 larg 0.296875 rarg 0.300781
 13: out 105 biased 9; lerr 0.00167641 rerr 0.00478529 larg 0.300781 rarg 0.304688
 14: out 103 biased 10; lerr 0.00383947 rerr 0.00252584 larg 0.304688 rarg 0.308594
 15: out 102 biased 10; lerr 0.0018141 rerr 0.00448366 larg 0.308594 rarg 0.3125
 16: out 100 biased 11; lerr 0.00425098 rerr 0.00195312 larg 0.3125 rarg 0.316406
 17: out 99 biased 11; lerr 0.00244141 rerr 0.00369747 larg 0.316406 rarg 0.320312
 18: out 97 biased 12; lerr 0.00514568 rerr 0.000902127 larg 0.320312 rarg 0.324219
 19: out 96 biased 12; lerr 0.00354633 rerr 0.00243843 larg 0.324219 rarg 0.328125
 20: out 95 biased 12; lerr 0.00203674 rerr 0.00388594 larg 0.328125 rarg 0.332031
 21: out 93 biased 13; lerr 0.00511752 rerr 0.000717621 larg 0.332031 rarg 0.335938
 22: out 92 biased 13; lerr 0.00381051 rerr 0.00196455 larg 0.335938 rarg 0.339844
 23: out 91 biased 13; lerr 0.00258984 rerr 0.00312603 larg 0.339844 rarg 0.34375
 24: out 90 biased 13; lerr 0.00145446 rerr 0.00420307 larg 0.34375 rarg 0.347656
 25: out 88 biased 14; lerr 0.0050098 rerr 0.000564416 larg 0.347656 rarg 0.351562
 26: out 87 biased 14; lerr 0.00406783 rerr 0.00144985 larg 0.351562 rarg 0.355469
 27: out 86 biased 14; lerr 0.00320806 rerr 0.00225385 larg 0.355469 rarg 0.359375
 28: out 85 biased 14; lerr 0.00242958 rerr 0.00297735 larg 0.359375 rarg 0.363281
 29: out 84 biased 14; lerr 0.00173146 rerr 0.00362122 larg 0.363281 rarg 0.367188
 30: out 83 biased 14; lerr 0.00111284 rerr 0.00418633 larg 0.367188 rarg 0.371094
 31: out 82 biased 14; lerr 0.000572846 rerr 0.00467353 larg 0.371094 rarg 0.375
 32: out 80 biased 15; lerr 0.00489479 rerr 0.00027462 larg 0.375 rarg 0.378906
 33: out 79 biased 15; lerr 0.00453439 rerr 0.000583717 larg 0.378906 rarg 0.382812
 34: out 78 biased 15; lerr 0.00425002 rerr 0.000817442 larg 0.382812 rarg 0.386719
 35: out 77 biased 15; lerr 0.0040409 rerr 0.000976562 larg 0.386719 rarg 0.390625
 36: out 76 biased 15; lerr 0.00390625 rerr 0.00106183 larg 0.390625 rarg 0.394531
 37: out 75 biased 15; lerr 0.00384534 rerr 0.00107398 larg 0.394531 rarg 0.398438
 38: out 74 biased 15; lerr 0.00385742 rerr 0.00101372 larg 0.398438 rarg 0.402344
 39: out 73 biased 15; lerr 0.00394179 rerr 0.00088176 larg 0.402344 rarg 0.40625
 40: out 72 biased 15; lerr 0.00409775 rerr 0.000678786 larg 0.40625 rarg 0.410156
 41: out 71 biased 15; lerr 0.00432461 rerr 0.000405468 larg 0.410156 rarg 0.414062
 42: out 70 biased 15; lerr 0.0046217 rerr 6.24637E-05 larg 0.414062 rarg 0.417969
 43: out 70 biased 14; lerr 6.24637E-05 rerr 0.00472478 larg 0.417969 rarg 0.421875
 44: out 69 biased 14; lerr 0.000349583 rerr 0.00426776 larg 0.421875 rarg 0.425781
 45: out 68 biased 14; lerr 0.000830041 rerr 0.00374284 larg 0.425781 rarg 0.429688
 46: out 67 biased 14; lerr 0.00137829 rerr 0.00315063 larg 0.429688 rarg 0.433594
 47: out 66 biased 14; lerr 0.00199374 rerr 0.00249171 larg 0.433594 rarg 0.4375
 48: out 65 biased 14; lerr 0.00267578 rerr 0.00176667 larg 0.4375 rarg 0.441406
 49: out 64 biased 14; lerr 0.00342383 rerr 0.000976086 larg 0.441406 rarg 0.445312
 50: out 63 biased 14; lerr 0.00423733 rerr 0.000120513 larg 0.445312 rarg 0.449219
 51: out 63 biased 13; lerr 0.000120513 rerr 0.00445945 larg 0.449219 rarg 0.453125
 52: out 62 biased 13; lerr 0.000799499 rerr 0.00349816 larg 0.453125 rarg 0.457031
 53: out 61 biased 13; lerr 0.00178341 rerr 0.00247339 larg 0.457031 rarg 0.460938
 54: out 60 biased 13; lerr 0.0028307 rerr 0.00138568 larg 0.460938 rarg 0.464844
 55: out 59 biased 13; lerr 0.00394084 rerr 0.00023553 larg 0.464844 rarg 0.46875
 56: out 59 biased 12; lerr 0.00023553 rerr 0.00439453 larg 0.46875 rarg 0.472656
 57: out 58 biased 12; lerr 0.000976562 rerr 0.00314314 larg 0.472656 rarg 0.476562
 58: out 57 biased 12; lerr 0.0022501 rerr 0.00183069 larg 0.476562 rarg 0.480469
 59: out 56 biased 12; lerr 0.00358461 rerr 0.000457659 larg 0.480469 rarg 0.484375
 60: out 56 biased 11; lerr 0.000457659 rerr 0.00448366 larg 0.484375 rarg 0.488281
 61: out 55 biased 11; lerr 0.000975489 rerr 0.00301265 larg 0.488281 rarg 0.492188
 62: out 54 biased 11; lerr 0.00246829 rerr 0.00148234 larg 0.492188 rarg 0.496094
 63: out 53 biased 11; lerr 0.00402031 rerr 0.000106817 larg 0.496094 rarg 0.5
 64: out 52 biased 11; lerr 0.00563109 rerr 0.00210731 larg 0.5 rarg 0.507812
 65: out 51 biased 11; lerr 0.00345996 rerr 0.00417648 larg 0.507812 rarg 0.515625
 66: out 50 biased 11; lerr 0.00143345 rerr 0.00610301 larg 0.515625 rarg 0.523438
 67: out 48 biased 12; lerr 0.00520152 rerr 0.00219486 larg 0.523438 rarg 0.53125
 68: out 47 biased 12; lerr 0.00349943 rerr 0.00380104 larg 0.53125 rarg 0.539062
 69: out 46 biased 12; lerr 0.00193497 rerr 0.00527137 larg 0.539062 rarg 0.546875
 70: out 44 biased 13; lerr 0.00628347 rerr 0.000789331 larg 0.546875 rarg 0.554688
 71: out 43 biased 13; lerr 0.00502921 rerr 0.00195312 larg 0.554688 rarg 0.5625
 72: out 42 biased 13; lerr 0.00390625 rerr 0.00298721 larg 0.5625 rarg 0.570312
 73: out 41 biased 13; lerr 0.00291271 rerr 0.00389343 larg 0.570312 rarg 0.578125
 74: out 40 biased 13; lerr 0.00204677 rerr 0.00467353 larg 0.578125 rarg 0.585938
 75: out 39 biased 13; lerr 0.00130667 rerr 0.00532924 larg 0.585938 rarg 0.59375
 76: out 38 biased 13; lerr 0.000690699 rerr 0.00586222 larg 0.59375 rarg 0.601562
 77: out 36 biased 14; lerr 0.0062566 rerr 0.000175461 larg 0.601562 rarg 0.609375
 78: out 35 biased 14; lerr 0.00592317 rerr 0.000428823 larg 0.609375 rarg 0.617188
 79: out 34 biased 14; lerr 0.00570878 rerr 0.000564416 larg 0.617188 rarg 0.625
 80: out 33 biased 14; lerr 0.00561191 rerr 0.000583717 larg 0.625 rarg 0.632812
 81: out 32 biased 14; lerr 0.00563109 rerr 0.000488162 larg 0.632812 rarg 0.640625
 82: out 31 biased 14; lerr 0.00576489 rerr 0.000279149 larg 0.640625 rarg 0.648438
 83: out 30 biased 14; lerr 0.00601191 rerr 4.19626E-05 larg 0.648438 rarg 0.65625
 84: out 30 biased 13; lerr 4.19626E-05 rerr 0.00589256 larg 0.65625 rarg 0.664062
 85: out 29 biased 13; lerr 0.00047385 rerr 0.00538852 larg 0.664062 rarg 0.671875
 86: out 28 biased 13; lerr 0.00101522 rerr 0.00477604 larg 0.671875 rarg 0.679688
 87: out 27 biased 13; lerr 0.00166483 rerr 0.00405633 larg 0.679688 rarg 0.6875
 88: out 26 biased 13; lerr 0.00242145 rerr 0.0032306 larg 0.6875 rarg 0.695312
 89: out 25 biased 13; lerr 0.00328389 rerr 0.0023 larg 0.695312 rarg 0.703125
 90: out 24 biased 13; lerr 0.00425098 rerr 0.00126568 larg 0.703125 rarg 0.710938
 91: out 23 biased 13; lerr 0.0053216 rerr 0.000128738 larg 0.710938 rarg 0.71875
 92: out 23 biased 12; lerr 0.000128738 rerr 0.00554953 larg 0.71875 rarg 0.726562
 93: out 22 biased 12; lerr 0.00110974 rerr 0.00424628 larg 0.726562 rarg 0.734375
 94: out 21 biased 12; lerr 0.0024487 rerr 0.00284339 larg 0.734375 rarg 0.742188
 95: out 20 biased 12; lerr 0.0038871 rerr 0.00134187 larg 0.742188 rarg 0.75
 96: out 19 biased 12; lerr 0.00542395 rerr 0.000257287 larg 0.75 rarg 0.757812
 97: out 19 biased 11; lerr 0.000257287 rerr 0.00488281 larg 0.757812 rarg 0.765625
 98: out 18 biased 11; lerr 0.00195312 rerr 0.00312603 larg 0.765625 rarg 0.773438
 99: out 17 biased 11; lerr 0.0037447 rerr 0.00127425 larg 0.773438 rarg 0.78125
 100: out 16 biased 11; lerr 0.00563109 rerr 0.000671612 larg 0.78125 rarg 0.789062
 101: out 16 biased 10; lerr 0.000671612 rerr 0.00426337 larg 0.789062 rarg 0.796875
 102: out 15 biased 10; lerr 0.00271068 rerr 0.00216607 larg 0.796875 rarg 0.804688
 103: out 14 biased 10; lerr 0.00484208 rerr 2.28884E-05 larg 0.804688 rarg 0.8125
 104: out 14 biased 9; lerr 2.28884E-05 rerr 0.00477319 larg 0.8125 rarg 0.820312
 105: out 13 biased 9; lerr 0.00230268 rerr 0.00243701 larg 0.820312 rarg 0.828125
 106: out 12 biased 9; lerr 0.00467248 rerr 1.1444E-05 larg 0.828125 rarg 0.835938
 107: out 12 biased 8; lerr 1.1444E-05 rerr 0.00467353 larg 0.835938 rarg 0.84375
 108: out 11 biased 8; lerr 0.00250271 rerr 0.00210469 larg 0.84375 rarg 0.851562
 109: out 10 biased 8; lerr 0.0051047 rerr 0.000551376 larg 0.851562 rarg 0.859375
 110: out 10 biased 7; lerr 0.000551376 rerr 0.00398129 larg 0.859375 rarg 0.867188
 111: out 9 biased 7; lerr 0.00329393 rerr 0.00118567 larg 0.867188 rarg 0.875
 112: out 9 biased 6; lerr 0.00118567 rerr 0.00564531 larg 0.875 rarg 0.882812
 113: out 8 biased 6; lerr 0.00169516 rerr 0.00271239 larg 0.882812 rarg 0.890625
 114: out 7 biased 6; lerr 0.0046605 rerr 0.000304507 larg 0.890625 rarg 0.898438
 115: out 7 biased 5; lerr 0.000304507 rerr 0.00403259 larg 0.898438 rarg 0.90625
 116: out 6 biased 5; lerr 0.00340469 rerr 0.00088176 larg 0.90625 rarg 0.914062
 117: out 6 biased 4; lerr 0.00088176 rerr 0.00514993 larg 0.914062 rarg 0.921875
 118: out 5 biased 4; lerr 0.00235119 rerr 0.00186722 larg 0.921875 rarg 0.929688
 119: out 4 biased 4; lerr 0.00566562 rerr 0.00149648 larg 0.929688 rarg 0.9375
 120: out 4 biased 3; lerr 0.00149648 rerr 0.00265532 larg 0.9375 rarg 0.945312
 121: out 3 biased 3; lerr 0.00494055 rerr 0.0008372 larg 0.945312 rarg 0.953125
 122: out 3 biased 2; lerr 0.0008372 rerr 0.00324937 larg 0.953125 rarg 0.960938
 123: out 2 biased 2; lerr 0.00440902 rerr 0.000370094 larg 0.960938 rarg 0.96875
 124: out 2 biased 1; lerr 0.000370094 rerr 0.00365258 larg 0.96875 rarg 0.976562
 125: out 1 biased 1; lerr 0.00406783 rerr 9.20338E-05 larg 0.976562 rarg 0.984375
 126: out 1 biased 0; lerr 9.20338E-05 rerr 0.00386801 larg 0.984375 rarg 0.992188
 127: out 0 biased 0; lerr 0.00391391 rerr 0 larg 0.992188 rarg 1

 ... [removed hex data dumping]

max recip 7x7 error at 0.519531: 0.00558472 or 2^-7.4843
max rsqrt 7x7 error at 0.546875: 0.00628347 or  2^-7.31422


On 2020-08-03 1:17 p.m., Bill Huffman wrote:

I should have said that my results are for the 7/7 case.  And it sounds like we're in agreement then.  We probably have the same table.

      Bill

On 8/2/20 9:50 AM, DSHORNER wrote:

This is the link to the revised code, which now handles an n-by-m LUT:


https://github.com/David-Horner/recip/blob/master/vrecip.cc

On 2020-08-01 4:51 p.m., David Horner via lists.riscv.org wrote:

