#### Re: VFRECIP/VFRSQRT instructions

Bill Huffman

Andrew,

I'll start at the top here... and with rsqrt since it's simpler.  I think the table and most of the commentary are fine.  I can follow the operation description.  Sort of.  But I'm trying to figure out how it can be improved. It currently says:

For the non-exceptional cases, the result is computed as follows. Let the normalized input exponent be equal to the input exponent if the input is normal, or 0 minus the number of leading zeros in the significand otherwise. If the input is subnormal, the normalized input significand is given by shifting the input significand left by 1 minus the normalized input exponent, discarding the leading 1 bit. The output exponent equals floor((3*B - 1 - the normalized input exponent) / 2). The output sign equals the input sign.

The following table gives the seven MSBs of the output significand as a function of the LSB of the normalized input exponent and the six MSBs of the normalized input significand; the other bits of the output significand are zero.
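A minimal Python sketch of the non-exceptional path quoted above, for binary32 (B = 127) and a positive, finite, nonzero input. The function name `frsqrt7_f32` is hypothetical. The table holds the 7-bit rsqrt outputs posted later in this thread, reordered so the index is (exp[0] << 6) | (six significand MSBs) with even biased exponents first; treating that as the spec's ordering is my assumption.

```python
import struct

# Hypothetical sketch: non-exceptional vfrsqrt7 path for binary32 (B = 127).
# Index = (exp[0] << 6) | sig[22:17]. Values are the 7-bit rsqrt outputs posted
# later in this thread, reordered so even biased exponents come first.
RSQRT7_TABLE = [
    52, 51, 50, 48, 47, 46, 44, 43, 42, 41, 40, 39, 38, 36, 35, 34,
    33, 32, 31, 30, 30, 29, 28, 27, 26, 25, 24, 23, 23, 22, 21, 20,
    19, 19, 18, 17, 16, 16, 15, 14, 14, 13, 12, 12, 11, 10, 10, 9,
    9, 8, 7, 7, 6, 6, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0,
    127, 125, 123, 121, 119, 118, 116, 114, 113, 111, 109, 108, 106, 105, 103, 102,
    100, 99, 97, 96, 95, 93, 92, 91, 90, 88, 87, 86, 85, 84, 83, 82,
    80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 70, 69, 68, 67, 66,
    65, 64, 63, 63, 62, 61, 60, 59, 59, 58, 57, 56, 56, 55, 54, 53,
]
B = 127  # binary32 exponent bias


def frsqrt7_f32(x: float) -> float:
    """7-bit estimate of 1/sqrt(x) for positive, finite, nonzero binary32 x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if not 0 < bits < 0x7F800000:
        raise ValueError("exceptional input (negative, zero, Inf, or NaN)")
    exp = (bits >> 23) & 0xFF
    sig = bits & 0x7FFFFF
    if exp == 0:
        # Subnormal: normalized exponent = 0 - (leading zeros of the 23-bit
        # field); shift the significand left by 1 - exp, dropping the leading 1.
        exp = -(23 - sig.bit_length())
        sig = (sig << (1 - exp)) & 0x7FFFFF
    idx = ((exp & 1) << 6) | (sig >> 17)  # exp[0] and six significand MSBs
    out_exp = (3 * B - 1 - exp) // 2      # floor, per the quoted text
    out_bits = (out_exp << 23) | (RSQRT7_TABLE[idx] << 16)  # sign stays 0
    return struct.unpack("<f", struct.pack("<I", out_bits))[0]
```

Under these assumptions 1.0 maps to 0.99609375, 2.0 to 0.703125, and the smallest subnormal, 2^-149, to 1.40625 × 2^74, which is a normal number.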

I wonder if a high level description given first might help.  For example:

For the non-exceptional cases, the low bit of the exponent and the six bits of significand (after the leading one) are concatenated and used to address the following table.  The output of the table becomes the seven bits of the result significand (after the leading one), and the remainder of the result significand is zero.  Denorm inputs are normalized and the exponent adjusted appropriately before the lookup.  The output exponent is chosen to make the result approximate the reciprocal of the square root of the argument.

More precisely, the result is computed as follows.  .... <your description>

Bill

On 8/12/20 9:19 PM, Andrew Waterman wrote:

On Wed, Aug 12, 2020 at 8:36 PM Bill Huffman <huffman@...> wrote:

On 8/12/20 7:05 PM, Andrew Waterman wrote:

On Wed, Aug 12, 2020 at 6:56 PM Bill Huffman <huffman@...> wrote:

On 8/12/20 4:21 PM, Andrew Waterman wrote:

On Wed, Aug 12, 2020 at 3:37 PM Bill Huffman <huffman@...> wrote:

On 8/12/20 3:32 PM, Andrew Waterman wrote:

On Wed, Aug 12, 2020 at 3:18 PM Bill Huffman <huffman@...> wrote:

On 8/11/20 4:11 PM, Andrew Waterman wrote:

On Tue, Aug 11, 2020 at 3:35 PM Bill Huffman <huffman@...> wrote:

On 8/11/20 3:00 PM, Andrew Waterman wrote:

On Tue, Aug 11, 2020 at 1:56 PM Bill Huffman <huffman@...> wrote:

Hi Andrew,

I'm looking at the cases where the reciprocal is near the boundary between finite and infinite or between normal and denormal.  Are you trying to get the boundaries approximately right?  Or exactly?  For example, the point at which the reciprocal of a large positive denorm falls over the boundary between MAXPOS and +Inf is different for RUP and RNE.  The setting of OF changes at the same point.  There's yet another point at which OF changes for RDN even though the answer doesn't change.  You don't show UF set anywhere.

There are no cases where UF should be raised because there are no cases where denormalization causes loss of precision.  When the result is subnormal, it is only subnormal by either one or two positions; the denormalized 7-bit significand plus two bits of right-shift fits within all of our formats' significands.  (This property doesn't hold for bfloat16, but that point might be moot if our variant of that format always flushes subnormals to zero.)
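To make the "subnormal by only one or two positions" point concrete, here is a minimal Python sketch of the non-exceptional vfrec7 path for binary32. It assumes the draft's rule that the normalized output exponent is 2B - 1 minus the normalized input exponent, and that a subnormal result is formed by restoring the leading 1 and shifting right by 1 minus that exponent. The names `frec7_f32_bits` and `frec7_f32` are hypothetical, the table is the recip table posted later in this thread, and exceptional inputs (zero, Inf, NaN, and subnormals below 2^-(B+1)) are rejected rather than modeled.

```python
import struct

# 7-bit vfrec7 outputs indexed by sig[22:16], from the recip table in this thread.
REC7_TABLE = [
    127, 125, 123, 121, 119, 117, 116, 114, 112, 110, 109, 107, 105, 104, 102, 100,
    99, 97, 96, 94, 93, 91, 90, 88, 87, 85, 84, 83, 81, 80, 79, 77,
    76, 75, 74, 72, 71, 70, 69, 68, 66, 65, 64, 63, 62, 61, 60, 59,
    58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43,
    42, 41, 40, 40, 39, 38, 37, 36, 35, 35, 34, 33, 32, 31, 31, 30,
    29, 28, 28, 27, 26, 25, 25, 24, 23, 23, 22, 21, 21, 20, 19, 19,
    18, 17, 17, 16, 15, 15, 14, 14, 13, 12, 12, 11, 11, 10, 9, 9,
    8, 8, 7, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0,
]
B = 127  # binary32 exponent bias


def frec7_f32_bits(bits: int) -> int:
    """Hypothetical sketch of the non-exceptional vfrec7 path on binary32 bits."""
    sign = bits & 0x80000000
    exp = (bits >> 23) & 0xFF
    sig = bits & 0x7FFFFF
    # Zero, Inf/NaN, and subnormals with sig[22:21] == 00 are exceptional cases.
    if exp == 0xFF or (exp == 0 and sig < (1 << 21)):
        raise ValueError("exceptional input; see the special-case table")
    if exp == 0:
        # Normalize: exp becomes 0 or -1 here, and the leading 1 is dropped.
        exp = -(23 - sig.bit_length())
        sig = (sig << (1 - exp)) & 0x7FFFFF
    out_sig = REC7_TABLE[sig >> 16] << 16
    out_exp = 2 * B - 1 - exp  # normalized output exponent, in [-1, 2B]
    if out_exp <= 0:
        # Subnormal result: restore the leading 1 and shift right by
        # 1 - out_exp (1 or 2 places). Only zero bits fall off the end.
        out_sig = ((1 << 23) | out_sig) >> (1 - out_exp)
        out_exp = 0
    return sign | (out_exp << 23) | out_sig


def frec7_f32(x: float) -> float:
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", frec7_f32_bits(bits)))[0]
```

Because the 7-bit table value sits at the top of a 23-bit field, a right shift of at most two places only discards zero bits, so nothing inexact happens on the way to a subnormal. Under these assumptions 0xFF00_0000 (-2^B) maps to 0x803F_C000, 0xFF7F_FFFF maps to 0x8020_0000, and the input 2^-(B+1) (0x0020_0000) gives the finite 0x7F7F_0000, where IEEE 1.0/x would round to +Inf under RNE.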

Ah, so you're counting the 7-bit (plus hidden bit) result as the absolutely correct answer.  There's no relationship here to the infinite precision reciprocal we're approximating.  This is an instruction that throws away 16 bits of input mantissa, does a table lookup, and gives an answer that's exactly 7 bits (plus hidden bit).  The relationship of this instruction to a reciprocal is one of motivation and not closer than that.

I think that's the answer to the paradigm question I had.  I'll think about that a bit and see what I think of your edge case results and flags then.

Ah, that clarifies your earlier question.  Yeah, LMK what you think.

With that re-orientation to what the instruction means, it looks correct.  I have a couple of comments:

• Just above the table you use the concept of the instruction's "domain."  But the idea of its domain does not seem very clear to me.  I lean toward removing the statement and depending on the table.
• In the first normative paragraph after the table, you use the number of leading zeros in the significand.  That assumes that the term "significand" does not include the "hidden" bit, which is zero in the case of interest.  I think a single-precision significand may be considered to be 23 bits by some and 24 bits by others, leading to some confusion about that sentence.   It might work to reference the leading zeros in the represented part of the significand.
• As I read farther, it's pretty confusing.  I worry that most people reading it will find it confusing too.  I wonder if there should be a second table referenced where the first table says "estimate of 1/x", dealing only with the magnitude of the argument.  The second table would have five rows labeled by operand range - as below - and detail each range with regard to exponent and denormalization:
• 2^(-B-1) ≤ x < 2^(-B)
• 2^(-B) ≤ x < 2^(-B+1)
• 2^(-B+1) ≤ x < 2^(B-1)
• 2^(B-1) ≤ x < 2^(B)
• 2^(B) ≤ x < 2^(B+1)
• The reciprocal square root would be a little different but the same idea would apply.
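Assuming the draft's vfrec7 rule that the normalized output exponent is 2B - 1 minus the normalized input exponent E (E being the biased exponent after normalizing subnormals, so it is -1 or 0 for the two subnormal rows), the five ranges above would work out roughly like this:

```latex
\begin{array}{l|c|c|l}
\text{input magnitude} & E & E_{\text{out}} = 2B - 1 - E & \text{result} \\ \hline
2^{-B-1} \le |x| < 2^{-B}  & -1 & 2B & \text{normal} \\
2^{-B} \le |x| < 2^{-B+1}  & 0 & 2B - 1 & \text{normal} \\
2^{-B+1} \le |x| < 2^{B-1} & 1, \dots, 2B-2 & 2B-2, \dots, 1 & \text{normal} \\
2^{B-1} \le |x| < 2^{B}    & 2B - 1 & 0 & \text{subnormal, shifted right 1: } \mathrm{sig}[\mathrm{MSB}] = 1 \\
2^{B} \le |x| < 2^{B+1}    & 2B & -1 & \text{subnormal, shifted right 2: } \mathrm{sig}[\mathrm{MSB}{:}\mathrm{MSB}{-}1] = 01
\end{array}
```

Inputs below 2^(-B-1) would need an output exponent above 2B, which is why they fall into the overflow rows of the exceptional-case table, and no range pushes the result more than two places into the subnormals.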

Any of that make sense?

Yeah, let me play around with the presentation a bit.  I'm not sure whether breaking it into two tables or expanding the current table will be clearer, but your suggestion holds either way.  Thanks for being my guinea pig.

I almost suggested expanding the current table.  That makes it quite a bit larger.  But then, it also means there's no need to clarify the relationship between the two tables.  Maybe that's better.  And it doesn't expand the recip sqrt table.

It is pretty big...  I'm just looking at the recip at this point.  I have a couple of thoughts:

Yeah, but big is OK, I think.
Probably so.

I didn't change the rsqrt table at all.  Since the subnormal cases are mostly uninteresting, I think the NOTE that positive subnormal and normal inputs always produce normal outputs suffices.
That's probably OK.  It's much less confusing.  I wonder if two examples for each (recip and rsqrt) would help.  One with a denormal input and the other normal?

I had been hoping that the reference C code would scratch that itch, but you're probably right.  I've added a tiny example and a huge example for each.

• In the "Output" column for the 5 new positive and negative entries, you have ... > y > ... but I think you should have ... >= y > ... because when the input is 127 the table has output 0.  So when the input is near the "left" end of the input range as expressed in the table, the output is all the way at the left end of the output range and needs the "equal."

It's actually correct as-is, because the output value is never exactly a power of 2.  When the input is exactly a power of 2, the result is always slightly smaller in magnitude than the true reciprocal.  (It's the reciprocal of some number near the midpoint of the interval ( 2^n, nextafter(2^n) )).

When the input mantissa (including hidden bit) is 0xFF0000, the output mantissa is 0x800000, if I'm reading the table correctly - 127 in leads to zero out.

The second row in the table has input:

-2^(B+1) < x ≤ -2^B (normal)

Table input 127 is near the left end of the range while table input 0 is absolutely at the right end.

The left end is not representable but is just farther from zero than 0xFF7F_FFFF single-precision.  The right end is 0xFF00_0000 single-precision.  These turn into 127 and 0 as table inputs and into 0 and 127 as table outputs.  Then they're 0x8020_0000 and 0x803F_C000 as single-precision.  So the left end is equal to -2^-(B+1).

and output is listed as:

-2^-(B+1) > y > -2^-B (subnormal, sig[MSB:MSB-1]=01)

but should allow the equal on the left, shouldn't it?

My mistake.  I was thinking of the fact that power-of-2 inputs never produce power-of-2 outputs.  You're of course right that just-smaller-than-power-of-2 inputs do produce power-of-2 outputs.  Thanks for the correction.

I also reordered the spec so that vfrsqrte7 shows up before vfrece7, since the former is so much simpler to explain.  More sanity-checking appreciated. https://github.com/riscv/riscv-v-spec/blob/vfrecip/v-spec.adoc#149-vector-floating-point-reciprocal-square-root-estimate-instruction

Bill

• The expressions for the subnormal cases are still awkward.  What about (subnormal 01...) or (subnormal 1...), and explain later what that means?  It would be easier to read (and the table would be a bit smaller).

Thanks, I was hoping someone would suggest a better way of expressing that.

Bill

As to the large positive denorm input case: the only case where this scheme and IEEE (1.0 / x) differ in the finiteness of the result, or in whether OF is raised, is the exact input 2^-(B+1), depending on the rounding mode.  We always produce a finite result for this case, but there's an arguable reason for it: we're actually computing the reciprocal of some number near the midpoint of the interval ( 2^-(B+1), nextafter(2^-(B+1)) ), the result of which is finite, regardless of the rounding mode.

So, I'm wondering what your paradigm is for the edge cases.  I can see it might not be worth being too complicated since the answer isn't very exact.  The paradigm is further complicated by the idea that the answer may be refined by further steps.  :-)

Yeah... the intent was to have reasonable fidelity.  I think you can argue the 2^-(B+1) case either way, but other ISAs have resolved it the same way I did.  And it's clearly a feature that corner-case detection doesn't depend on the significand (except for its zeroness, that is).

Bill

On 8/10/20 8:54 PM, Andrew Waterman wrote:

On Mon, Aug 3, 2020 at 5:44 PM Bill Huffman <huffman@...> wrote:

On 8/3/20 1:41 PM, Andrew Waterman wrote:

On Mon, Aug 3, 2020 at 12:40 PM Bill Huffman <huffman@...> wrote:

The recip table matches mine as does the worst case error.

I have one different entry in the square root table.  For entry 77, where you have 36, I have 37.  I'm not sure whether it matters.  Also, ages ago, I got a very small difference in worst case error of 2^-7.317 but I haven't gone back to trace anything down about that.

Thanks for validating against your table, Bill.

With my value for that entry, the worst error on the interval of interest is 2^-7.32041, for input 0x3f1a0000.  With yours, it's 2^-7.3164 for 0x3f1bfffd.

I agree with your computation with a really tiny difference (I get that it just barely rounds to 2^-7.32040).  I can't say why I got 37 when I did it 8-10 years ago - and I don't think I'm going to chase that.  I'm good with 36 at that position in the table.

So, I'm good with the table values below.

Bill

Presumably the error's slightly smaller for my scheme because I'm picking the output value that minimizes the maximum error on the interval, rather than picking the midpoint or similar.  Of course, the overall worst error is unaffected.
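A minimal Python sketch of the selection rule described above: for each table entry, try all 128 seven-bit outputs y = 1 + o/128 and keep the one with the smallest worst-case relative error over that entry's input interval. This is my reconstruction of "minimizes the maximum error on the interval", not the generator behind the spec or DSHORNER's vrecip.cc; `endpoint_error`, `build`, and the interval helpers are hypothetical names. The rsqrt table comes out in exp[0]-major order, so its index 13 corresponds to entry 77 in the rsqrt listing further down.

```python
import math


def endpoint_error(o, lo, hi, f):
    """Worst |y * f(v) - 1| over [lo, hi) for y = 1 + o/128.

    y * f(v) is monotonic in v, so the worst case sits at an endpoint.
    """
    y = 1 + o / 128
    return max(abs(y * f(lo) - 1), abs(y * f(hi) - 1))


def build(intervals, f):
    """Per interval, pick the 7-bit output minimizing the maximum error."""
    table = [min(range(128), key=lambda o: endpoint_error(o, lo, hi, f))
             for lo, hi in intervals]
    worst = max(endpoint_error(o, lo, hi, f)
                for o, (lo, hi) in zip(table, intervals))
    return table, worst


# vfrec7: index i = 7 fraction MSBs, so m is in [1 + i/128, 1 + (i+1)/128)
# and y approximates 2/m, i.e. we want y * m/2 == 1.
RECIP_INTERVALS = [(1 + i / 128, 1 + (i + 1) / 128) for i in range(128)]


def recip_f(m):
    return m / 2


# vfrsqrt7: index = (exp[0] << 6) | 6 fraction MSBs. With an odd bias, an odd
# biased exponent is an even unbiased one, so v = m; otherwise v = 2m.
# y approximates 2/sqrt(v), i.e. we want y * sqrt(v)/2 == 1.
def rsqrt_interval(idx):
    scale = 1 if idx >> 6 else 2
    s = idx & 63
    return scale * (1 + s / 64), scale * (1 + (s + 1) / 64)


RSQRT_INTERVALS = [rsqrt_interval(idx) for idx in range(128)]


def rsqrt_f(v):
    return math.sqrt(v) / 2


recip, recip_err = build(RECIP_INTERVALS, recip_f)
rsqrt, rsqrt_err = build(RSQRT_INTERVALS, rsqrt_f)
print("recip worst error 2^%.4f" % math.log2(recip_err))
print("rsqrt worst error 2^%.5f" % math.log2(rsqrt_err))

# Entry 77 of the rsqrt listing is index 13 here: compare candidates 36 and 37.
lo, hi = RSQRT_INTERVALS[13]
for o in (36, 37):
    print(o, "worst error 2^%.5f" % math.log2(endpoint_error(o, lo, hi, rsqrt_f)))
```

Measuring only the endpoints is enough here because the relative error is monotonic across each interval; the thread's own figures for the two tables are 2^-7.4843 and 2^-7.31422, and the entry-77 comparison is the 2^-7.3204 versus 2^-7.3164 question discussed above.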

Bill

On 8/3/20 11:38 AM, DSHORNER wrote:

Now annotated version --detail
https://github.com/David-Horner/recip/blob/master/vrecip.cc

For the 7x7 tables below, notice that the biased value does not exceed 21 for recip (5 of 7 bits) and 15 for rsqrt (4 of 7 bits).

ip 7 op 7 LUT #bits 896 verilog 0  test/test-long 1
Recip7x7LUT (input [6:0] in, output reg [6:0] out);
in[6:0]  corresponds to sig[S-1:S-6]
out[6:0] corresponds to sig[S-1:S-6]
biased : ((ipN-1) - in) << (op - ip) // or >> if neg
base bias 127  left-shift 0 right-shift 0
0: out = 127 biased 0; lerr 0.00390625 rerr 0.00387573 larg 0.5 rarg 0.503906
1: out = 125 biased 1; lerr 0.0039978 rerr 0.00372314 larg 0.503906 rarg 0.507812
2: out = 123 biased 2; lerr 0.00421143 rerr 0.00344849 larg 0.507812 rarg 0.511719
3: out = 121 biased 3; lerr 0.00454712 rerr 0.00305176 larg 0.511719 rarg 0.515625
4: out = 119 biased 4; lerr 0.00500488 rerr 0.00253296 larg 0.515625 rarg 0.519531
5: out = 117 biased 5; lerr 0.00558472 rerr 0.00189209 larg 0.519531 rarg 0.523438
6: out = 116 biased 5; lerr 0.00219727 rerr 0.00524902 larg 0.523438 rarg 0.527344
7: out = 114 biased 6; lerr 0.00299072 rerr 0.00439453 larg 0.527344 rarg 0.53125
8: out = 112 biased 7; lerr 0.00390625 rerr 0.00341797 larg 0.53125 rarg 0.535156
9: out = 110 biased 8; lerr 0.00494385 rerr 0.00231934 larg 0.535156 rarg 0.539062
10: out = 109 biased 8; lerr 0.00189209 rerr 0.00534058 larg 0.539062 rarg 0.542969
11: out = 107 biased 9; lerr 0.00314331 rerr 0.00402832 larg 0.542969 rarg 0.546875
12: out = 105 biased 10; lerr 0.0045166 rerr 0.00259399 larg 0.546875 rarg 0.550781
13: out = 104 biased 10; lerr 0.00170898 rerr 0.00537109 larg 0.550781 rarg 0.554688
14: out = 102 biased 11; lerr 0.0032959 rerr 0.00372314 larg 0.554688 rarg 0.558594
15: out = 100 biased 12; lerr 0.00500488 rerr 0.00195312 larg 0.558594 rarg 0.5625
16: out = 99 biased 12; lerr 0.00244141 rerr 0.00448608 larg 0.5625 rarg 0.566406
17: out = 97 biased 13; lerr 0.00436401 rerr 0.00250244 larg 0.566406 rarg 0.570312
18: out = 96 biased 13; lerr 0.00195312 rerr 0.00488281 larg 0.570312 rarg 0.574219
19: out = 94 biased 14; lerr 0.00408936 rerr 0.00268555 larg 0.574219 rarg 0.578125
20: out = 93 biased 14; lerr 0.00183105 rerr 0.00491333 larg 0.578125 rarg 0.582031
21: out = 91 biased 15; lerr 0.00418091 rerr 0.00250244 larg 0.582031 rarg 0.585938
22: out = 90 biased 15; lerr 0.0020752 rerr 0.00457764 larg 0.585938 rarg 0.589844
23: out = 88 biased 16; lerr 0.00463867 rerr 0.00195312 larg 0.589844 rarg 0.59375
24: out = 87 biased 16; lerr 0.00268555 rerr 0.00387573 larg 0.59375 rarg 0.597656
25: out = 85 biased 17; lerr 0.00546265 rerr 0.0010376 larg 0.597656 rarg 0.601562
26: out = 84 biased 17; lerr 0.00366211 rerr 0.00280762 larg 0.601562 rarg 0.605469
27: out = 83 biased 17; lerr 0.00192261 rerr 0.0045166 larg 0.605469 rarg 0.609375
28: out = 81 biased 18; lerr 0.00500488 rerr 0.00137329 larg 0.609375 rarg 0.613281
29: out = 80 biased 18; lerr 0.00341797 rerr 0.00292969 larg 0.613281 rarg 0.617188
30: out = 79 biased 18; lerr 0.00189209 rerr 0.00442505 larg 0.617188 rarg 0.621094
31: out = 77 biased 19; lerr 0.00527954 rerr 0.000976562 larg 0.621094 rarg 0.625
32: out = 76 biased 19; lerr 0.00390625 rerr 0.00231934 larg 0.625 rarg 0.628906
33: out = 75 biased 19; lerr 0.00259399 rerr 0.00360107 larg 0.628906 rarg 0.632812
34: out = 74 biased 19; lerr 0.00134277 rerr 0.00482178 larg 0.632812 rarg 0.636719
35: out = 72 biased 20; lerr 0.00512695 rerr 0.000976562 larg 0.636719 rarg 0.640625
36: out = 71 biased 20; lerr 0.00402832 rerr 0.00204468 larg 0.640625 rarg 0.644531
37: out = 70 biased 20; lerr 0.00299072 rerr 0.00305176 larg 0.644531 rarg 0.648438
38: out = 69 biased 20; lerr 0.00201416 rerr 0.0039978 larg 0.648438 rarg 0.652344
39: out = 68 biased 20; lerr 0.00109863 rerr 0.00488281 larg 0.652344 rarg 0.65625
40: out = 66 biased 21; lerr 0.00537109 rerr 0.000549316 larg 0.65625 rarg 0.660156
41: out = 65 biased 21; lerr 0.00460815 rerr 0.00128174 larg 0.660156 rarg 0.664062
42: out = 64 biased 21; lerr 0.00390625 rerr 0.00195312 larg 0.664062 rarg 0.667969
43: out = 63 biased 21; lerr 0.00326538 rerr 0.00256348 larg 0.667969 rarg 0.671875
44: out = 62 biased 21; lerr 0.00268555 rerr 0.00311279 larg 0.671875 rarg 0.675781
45: out = 61 biased 21; lerr 0.00216675 rerr 0.00360107 larg 0.675781 rarg 0.679688
46: out = 60 biased 21; lerr 0.00170898 rerr 0.00402832 larg 0.679688 rarg 0.683594
47: out = 59 biased 21; lerr 0.00131226 rerr 0.00439453 larg 0.683594 rarg 0.6875
48: out = 58 biased 21; lerr 0.000976562 rerr 0.00469971 larg 0.6875 rarg 0.691406
49: out = 57 biased 21; lerr 0.000701904 rerr 0.00494385 larg 0.691406 rarg 0.695312
50: out = 56 biased 21; lerr 0.000488281 rerr 0.00512695 larg 0.695312 rarg 0.699219
51: out = 55 biased 21; lerr 0.000335693 rerr 0.00524902 larg 0.699219 rarg 0.703125
52: out = 54 biased 21; lerr 0.000244141 rerr 0.00531006 larg 0.703125 rarg 0.707031
53: out = 53 biased 21; lerr 0.000213623 rerr 0.00531006 larg 0.707031 rarg 0.710938
54: out = 52 biased 21; lerr 0.000244141 rerr 0.00524902 larg 0.710938 rarg 0.714844
55: out = 51 biased 21; lerr 0.000335693 rerr 0.00512695 larg 0.714844 rarg 0.71875
56: out = 50 biased 21; lerr 0.000488281 rerr 0.00494385 larg 0.71875 rarg 0.722656
57: out = 49 biased 21; lerr 0.000701904 rerr 0.00469971 larg 0.722656 rarg 0.726562
58: out = 48 biased 21; lerr 0.000976562 rerr 0.00439453 larg 0.726562 rarg 0.730469
59: out = 47 biased 21; lerr 0.00131226 rerr 0.00402832 larg 0.730469 rarg 0.734375
60: out = 46 biased 21; lerr 0.00170898 rerr 0.00360107 larg 0.734375 rarg 0.738281
61: out = 45 biased 21; lerr 0.00216675 rerr 0.00311279 larg 0.738281 rarg 0.742188
62: out = 44 biased 21; lerr 0.00268555 rerr 0.00256348 larg 0.742188 rarg 0.746094
63: out = 43 biased 21; lerr 0.00326538 rerr 0.00195312 larg 0.746094 rarg 0.75
64: out = 42 biased 21; lerr 0.00390625 rerr 0.00128174 larg 0.75 rarg 0.753906
65: out = 41 biased 21; lerr 0.00460815 rerr 0.000549316 larg 0.753906 rarg 0.757812
66: out = 40 biased 21; lerr 0.00537109 rerr 0.000244141 larg 0.757812 rarg 0.761719
67: out = 40 biased 20; lerr 0.000244141 rerr 0.00488281 larg 0.761719 rarg 0.765625
68: out = 39 biased 20; lerr 0.00109863 rerr 0.0039978 larg 0.765625 rarg 0.769531
69: out = 38 biased 20; lerr 0.00201416 rerr 0.00305176 larg 0.769531 rarg 0.773438
70: out = 37 biased 20; lerr 0.00299072 rerr 0.00204468 larg 0.773438 rarg 0.777344
71: out = 36 biased 20; lerr 0.00402832 rerr 0.000976562 larg 0.777344 rarg 0.78125
72: out = 35 biased 20; lerr 0.00512695 rerr 0.000152588 larg 0.78125 rarg 0.785156
73: out = 35 biased 19; lerr 0.000152588 rerr 0.00482178 larg 0.785156 rarg 0.789062
74: out = 34 biased 19; lerr 0.00134277 rerr 0.00360107 larg 0.789062 rarg 0.792969
75: out = 33 biased 19; lerr 0.00259399 rerr 0.00231934 larg 0.792969 rarg 0.796875
76: out = 32 biased 19; lerr 0.00390625 rerr 0.000976562 larg 0.796875 rarg 0.800781
77: out = 31 biased 19; lerr 0.00527954 rerr 0.000427246 larg 0.800781 rarg 0.804688
78: out = 31 biased 18; lerr 0.000427246 rerr 0.00442505 larg 0.804688 rarg 0.808594
79: out = 30 biased 18; lerr 0.00189209 rerr 0.00292969 larg 0.808594 rarg 0.8125
80: out = 29 biased 18; lerr 0.00341797 rerr 0.00137329 larg 0.8125 rarg 0.816406
81: out = 28 biased 18; lerr 0.00500488 rerr 0.000244141 larg 0.816406 rarg 0.820312
82: out = 28 biased 17; lerr 0.000244141 rerr 0.0045166 larg 0.820312 rarg 0.824219
83: out = 27 biased 17; lerr 0.00192261 rerr 0.00280762 larg 0.824219 rarg 0.828125
84: out = 26 biased 17; lerr 0.00366211 rerr 0.0010376 larg 0.828125 rarg 0.832031
85: out = 25 biased 17; lerr 0.00546265 rerr 0.000793457 larg 0.832031 rarg 0.835938
86: out = 25 biased 16; lerr 0.000793457 rerr 0.00387573 larg 0.835938 rarg 0.839844
87: out = 24 biased 16; lerr 0.00268555 rerr 0.00195312 larg 0.839844 rarg 0.84375
88: out = 23 biased 16; lerr 0.00463867 rerr 3.05176E-05 larg 0.84375 rarg 0.847656
89: out = 23 biased 15; lerr 3.05176E-05 rerr 0.00457764 larg 0.847656 rarg 0.851562
90: out = 22 biased 15; lerr 0.0020752 rerr 0.00250244 larg 0.851562 rarg 0.855469
91: out = 21 biased 15; lerr 0.00418091 rerr 0.000366211 larg 0.855469 rarg 0.859375
92: out = 21 biased 14; lerr 0.000366211 rerr 0.00491333 larg 0.859375 rarg 0.863281
93: out = 20 biased 14; lerr 0.00183105 rerr 0.00268555 larg 0.863281 rarg 0.867188
94: out = 19 biased 14; lerr 0.00408936 rerr 0.000396729 larg 0.867188 rarg 0.871094
95: out = 19 biased 13; lerr 0.000396729 rerr 0.00488281 larg 0.871094 rarg 0.875
96: out = 18 biased 13; lerr 0.00195312 rerr 0.00250244 larg 0.875 rarg 0.878906
97: out = 17 biased 13; lerr 0.00436401 rerr 6.10352E-05 larg 0.878906 rarg 0.882812
98: out = 17 biased 12; lerr 6.10352E-05 rerr 0.00448608 larg 0.882812 rarg 0.886719
99: out = 16 biased 12; lerr 0.00244141 rerr 0.00195312 larg 0.886719 rarg 0.890625
100: out = 15 biased 12; lerr 0.00500488 rerr 0.000640869 larg 0.890625 rarg 0.894531
101: out = 15 biased 11; lerr 0.000640869 rerr 0.00372314 larg 0.894531 rarg 0.898438
102: out = 14 biased 11; lerr 0.0032959 rerr 0.0010376 larg 0.898438 rarg 0.902344
103: out = 14 biased 10; lerr 0.0010376 rerr 0.00537109 larg 0.902344 rarg 0.90625
104: out = 13 biased 10; lerr 0.00170898 rerr 0.00259399 larg 0.90625 rarg 0.910156
105: out = 12 biased 10; lerr 0.0045166 rerr 0.000244141 larg 0.910156 rarg 0.914062
106: out = 12 biased 9; lerr 0.000244141 rerr 0.00402832 larg 0.914062 rarg 0.917969
107: out = 11 biased 9; lerr 0.00314331 rerr 0.00109863 larg 0.917969 rarg 0.921875
108: out = 11 biased 8; lerr 0.00109863 rerr 0.00534058 larg 0.921875 rarg 0.925781
109: out = 10 biased 8; lerr 0.00189209 rerr 0.00231934 larg 0.925781 rarg 0.929688
110: out = 9 biased 8; lerr 0.00494385 rerr 0.000762939 larg 0.929688 rarg 0.933594
111: out = 9 biased 7; lerr 0.000762939 rerr 0.00341797 larg 0.933594 rarg 0.9375
112: out = 8 biased 7; lerr 0.00390625 rerr 0.000244141 larg 0.9375 rarg 0.941406
113: out = 8 biased 6; lerr 0.000244141 rerr 0.00439453 larg 0.941406 rarg 0.945312
114: out = 7 biased 6; lerr 0.00299072 rerr 0.00112915 larg 0.945312 rarg 0.949219
115: out = 7 biased 5; lerr 0.00112915 rerr 0.00524902 larg 0.949219 rarg 0.953125
116: out = 6 biased 5; lerr 0.00219727 rerr 0.00189209 larg 0.953125 rarg 0.957031
117: out = 5 biased 5; lerr 0.00558472 rerr 0.00152588 larg 0.957031 rarg 0.960938
118: out = 5 biased 4; lerr 0.00152588 rerr 0.00253296 larg 0.960938 rarg 0.964844
119: out = 4 biased 4; lerr 0.00500488 rerr 0.000976562 larg 0.964844 rarg 0.96875
120: out = 4 biased 3; lerr 0.000976562 rerr 0.00305176 larg 0.96875 rarg 0.972656
121: out = 3 biased 3; lerr 0.00454712 rerr 0.000549316 larg 0.972656 rarg 0.976562
122: out = 3 biased 2; lerr 0.000549316 rerr 0.00344849 larg 0.976562 rarg 0.980469
123: out = 2 biased 2; lerr 0.00421143 rerr 0.000244141 larg 0.980469 rarg 0.984375
124: out = 2 biased 1; lerr 0.000244141 rerr 0.00372314 larg 0.984375 rarg 0.988281
125: out = 1 biased 1; lerr 0.0039978 rerr 6.10352E-05 larg 0.988281 rarg 0.992188
126: out = 1 biased 0; lerr 6.10352E-05 rerr 0.00387573 larg 0.992188 rarg 0.996094
127: out = 0 biased 0; lerr 0.00390625 rerr 0 larg 0.996094 rarg 1

... [removed hex data dumping]

RSqrt7x7LUT (input [6:0] in, output reg [6:0] out);
// in[6] corresponds to exp[0]
// in[5:0] corresponds to sig[S-1:S-5]
// out[6:0] corresponds to sig[S-1:S-6]
// biased : ((ipN-1) - in) << (op - ip)
0: out 127 biased 0; lerr 0.00390625 rerr 0.00384557 larg 0.25 rarg 0.253906
1: out 125 biased 1; lerr 0.00402773 rerr 0.00360435 larg 0.253906 rarg 0.257812
2: out 123 biased 2; lerr 0.00432928 rerr 0.00318533 larg 0.257812 rarg 0.261719
3: out 121 biased 3; lerr 0.00480818 rerr 0.00259111 larg 0.261719 rarg 0.265625
4: out 119 biased 4; lerr 0.00546183 rerr 0.00182426 larg 0.265625 rarg 0.269531
5: out 118 biased 4; lerr 0.0022317 rerr 0.00497249 larg 0.269531 rarg 0.273438
6: out 116 biased 5; lerr 0.00319802 rerr 0.00389675 larg 0.273438 rarg 0.277344
7: out 114 biased 6; lerr 0.00433191 rerr 0.00265532 larg 0.277344 rarg 0.28125
8: out 113 biased 6; lerr 0.00148789 rerr 0.00542232 larg 0.28125 rarg 0.285156
9: out 111 biased 7; lerr 0.00292144 rerr 0.00388464 larg 0.285156 rarg 0.289062
10: out 109 biased 8; lerr 0.00451607 rerr 0.0021876 larg 0.289062 rarg 0.292969
11: out 108 biased 8; lerr 0.00204104 rerr 0.00458999 larg 0.292969 rarg 0.296875
12: out 106 biased 9; lerr 0.00392348 rerr 0.00260824 larg 0.296875 rarg 0.300781
13: out 105 biased 9; lerr 0.00167641 rerr 0.00478529 larg 0.300781 rarg 0.304688
14: out 103 biased 10; lerr 0.00383947 rerr 0.00252584 larg 0.304688 rarg 0.308594
15: out 102 biased 10; lerr 0.0018141 rerr 0.00448366 larg 0.308594 rarg 0.3125
16: out 100 biased 11; lerr 0.00425098 rerr 0.00195312 larg 0.3125 rarg 0.316406
17: out 99 biased 11; lerr 0.00244141 rerr 0.00369747 larg 0.316406 rarg 0.320312
18: out 97 biased 12; lerr 0.00514568 rerr 0.000902127 larg 0.320312 rarg 0.324219
19: out 96 biased 12; lerr 0.00354633 rerr 0.00243843 larg 0.324219 rarg 0.328125
20: out 95 biased 12; lerr 0.00203674 rerr 0.00388594 larg 0.328125 rarg 0.332031
21: out 93 biased 13; lerr 0.00511752 rerr 0.000717621 larg 0.332031 rarg 0.335938
22: out 92 biased 13; lerr 0.00381051 rerr 0.00196455 larg 0.335938 rarg 0.339844
23: out 91 biased 13; lerr 0.00258984 rerr 0.00312603 larg 0.339844 rarg 0.34375
24: out 90 biased 13; lerr 0.00145446 rerr 0.00420307 larg 0.34375 rarg 0.347656
25: out 88 biased 14; lerr 0.0050098 rerr 0.000564416 larg 0.347656 rarg 0.351562
26: out 87 biased 14; lerr 0.00406783 rerr 0.00144985 larg 0.351562 rarg 0.355469
27: out 86 biased 14; lerr 0.00320806 rerr 0.00225385 larg 0.355469 rarg 0.359375
28: out 85 biased 14; lerr 0.00242958 rerr 0.00297735 larg 0.359375 rarg 0.363281
29: out 84 biased 14; lerr 0.00173146 rerr 0.00362122 larg 0.363281 rarg 0.367188
30: out 83 biased 14; lerr 0.00111284 rerr 0.00418633 larg 0.367188 rarg 0.371094
31: out 82 biased 14; lerr 0.000572846 rerr 0.00467353 larg 0.371094 rarg 0.375
32: out 80 biased 15; lerr 0.00489479 rerr 0.00027462 larg 0.375 rarg 0.378906
33: out 79 biased 15; lerr 0.00453439 rerr 0.000583717 larg 0.378906 rarg 0.382812
34: out 78 biased 15; lerr 0.00425002 rerr 0.000817442 larg 0.382812 rarg 0.386719
35: out 77 biased 15; lerr 0.0040409 rerr 0.000976562 larg 0.386719 rarg 0.390625
36: out 76 biased 15; lerr 0.00390625 rerr 0.00106183 larg 0.390625 rarg 0.394531
37: out 75 biased 15; lerr 0.00384534 rerr 0.00107398 larg 0.394531 rarg 0.398438
38: out 74 biased 15; lerr 0.00385742 rerr 0.00101372 larg 0.398438 rarg 0.402344
39: out 73 biased 15; lerr 0.00394179 rerr 0.00088176 larg 0.402344 rarg 0.40625
40: out 72 biased 15; lerr 0.00409775 rerr 0.000678786 larg 0.40625 rarg 0.410156
41: out 71 biased 15; lerr 0.00432461 rerr 0.000405468 larg 0.410156 rarg 0.414062
42: out 70 biased 15; lerr 0.0046217 rerr 6.24637E-05 larg 0.414062 rarg 0.417969
43: out 70 biased 14; lerr 6.24637E-05 rerr 0.00472478 larg 0.417969 rarg 0.421875
44: out 69 biased 14; lerr 0.000349583 rerr 0.00426776 larg 0.421875 rarg 0.425781
45: out 68 biased 14; lerr 0.000830041 rerr 0.00374284 larg 0.425781 rarg 0.429688
46: out 67 biased 14; lerr 0.00137829 rerr 0.00315063 larg 0.429688 rarg 0.433594
47: out 66 biased 14; lerr 0.00199374 rerr 0.00249171 larg 0.433594 rarg 0.4375
48: out 65 biased 14; lerr 0.00267578 rerr 0.00176667 larg 0.4375 rarg 0.441406
49: out 64 biased 14; lerr 0.00342383 rerr 0.000976086 larg 0.441406 rarg 0.445312
50: out 63 biased 14; lerr 0.00423733 rerr 0.000120513 larg 0.445312 rarg 0.449219
51: out 63 biased 13; lerr 0.000120513 rerr 0.00445945 larg 0.449219 rarg 0.453125
52: out 62 biased 13; lerr 0.000799499 rerr 0.00349816 larg 0.453125 rarg 0.457031
53: out 61 biased 13; lerr 0.00178341 rerr 0.00247339 larg 0.457031 rarg 0.460938
54: out 60 biased 13; lerr 0.0028307 rerr 0.00138568 larg 0.460938 rarg 0.464844
55: out 59 biased 13; lerr 0.00394084 rerr 0.00023553 larg 0.464844 rarg 0.46875
56: out 59 biased 12; lerr 0.00023553 rerr 0.00439453 larg 0.46875 rarg 0.472656
57: out 58 biased 12; lerr 0.000976562 rerr 0.00314314 larg 0.472656 rarg 0.476562
58: out 57 biased 12; lerr 0.0022501 rerr 0.00183069 larg 0.476562 rarg 0.480469
59: out 56 biased 12; lerr 0.00358461 rerr 0.000457659 larg 0.480469 rarg 0.484375
60: out 56 biased 11; lerr 0.000457659 rerr 0.00448366 larg 0.484375 rarg 0.488281
61: out 55 biased 11; lerr 0.000975489 rerr 0.00301265 larg 0.488281 rarg 0.492188
62: out 54 biased 11; lerr 0.00246829 rerr 0.00148234 larg 0.492188 rarg 0.496094
63: out 53 biased 11; lerr 0.00402031 rerr 0.000106817 larg 0.496094 rarg 0.5
64: out 52 biased 11; lerr 0.00563109 rerr 0.00210731 larg 0.5 rarg 0.507812
65: out 51 biased 11; lerr 0.00345996 rerr 0.00417648 larg 0.507812 rarg 0.515625
66: out 50 biased 11; lerr 0.00143345 rerr 0.00610301 larg 0.515625 rarg 0.523438
67: out 48 biased 12; lerr 0.00520152 rerr 0.00219486 larg 0.523438 rarg 0.53125
68: out 47 biased 12; lerr 0.00349943 rerr 0.00380104 larg 0.53125 rarg 0.539062
69: out 46 biased 12; lerr 0.00193497 rerr 0.00527137 larg 0.539062 rarg 0.546875
70: out 44 biased 13; lerr 0.00628347 rerr 0.000789331 larg 0.546875 rarg 0.554688
71: out 43 biased 13; lerr 0.00502921 rerr 0.00195312 larg 0.554688 rarg 0.5625
72: out 42 biased 13; lerr 0.00390625 rerr 0.00298721 larg 0.5625 rarg 0.570312
73: out 41 biased 13; lerr 0.00291271 rerr 0.00389343 larg 0.570312 rarg 0.578125
74: out 40 biased 13; lerr 0.00204677 rerr 0.00467353 larg 0.578125 rarg 0.585938
75: out 39 biased 13; lerr 0.00130667 rerr 0.00532924 larg 0.585938 rarg 0.59375
76: out 38 biased 13; lerr 0.000690699 rerr 0.00586222 larg 0.59375 rarg 0.601562
77: out 36 biased 14; lerr 0.0062566 rerr 0.000175461 larg 0.601562 rarg 0.609375
78: out 35 biased 14; lerr 0.00592317 rerr 0.000428823 larg 0.609375 rarg 0.617188
79: out 34 biased 14; lerr 0.00570878 rerr 0.000564416 larg 0.617188 rarg 0.625
80: out 33 biased 14; lerr 0.00561191 rerr 0.000583717 larg 0.625 rarg 0.632812
81: out 32 biased 14; lerr 0.00563109 rerr 0.000488162 larg 0.632812 rarg 0.640625
82: out 31 biased 14; lerr 0.00576489 rerr 0.000279149 larg 0.640625 rarg 0.648438
83: out 30 biased 14; lerr 0.00601191 rerr 4.19626E-05 larg 0.648438 rarg 0.65625
84: out 30 biased 13; lerr 4.19626E-05 rerr 0.00589256 larg 0.65625 rarg 0.664062
85: out 29 biased 13; lerr 0.00047385 rerr 0.00538852 larg 0.664062 rarg 0.671875
86: out 28 biased 13; lerr 0.00101522 rerr 0.00477604 larg 0.671875 rarg 0.679688
87: out 27 biased 13; lerr 0.00166483 rerr 0.00405633 larg 0.679688 rarg 0.6875
88: out 26 biased 13; lerr 0.00242145 rerr 0.0032306 larg 0.6875 rarg 0.695312
89: out 25 biased 13; lerr 0.00328389 rerr 0.0023 larg 0.695312 rarg 0.703125
90: out 24 biased 13; lerr 0.00425098 rerr 0.00126568 larg 0.703125 rarg 0.710938
91: out 23 biased 13; lerr 0.0053216 rerr 0.000128738 larg 0.710938 rarg 0.71875
92: out 23 biased 12; lerr 0.000128738 rerr 0.00554953 larg 0.71875 rarg 0.726562
93: out 22 biased 12; lerr 0.00110974 rerr 0.00424628 larg 0.726562 rarg 0.734375
94: out 21 biased 12; lerr 0.0024487 rerr 0.00284339 larg 0.734375 rarg 0.742188
95: out 20 biased 12; lerr 0.0038871 rerr 0.00134187 larg 0.742188 rarg 0.75
96: out 19 biased 12; lerr 0.00542395 rerr 0.000257287 larg 0.75 rarg 0.757812
97: out 19 biased 11; lerr 0.000257287 rerr 0.00488281 larg 0.757812 rarg 0.765625
98: out 18 biased 11; lerr 0.00195312 rerr 0.00312603 larg 0.765625 rarg 0.773438
99: out 17 biased 11; lerr 0.0037447 rerr 0.00127425 larg 0.773438 rarg 0.78125
100: out 16 biased 11; lerr 0.00563109 rerr 0.000671612 larg 0.78125 rarg 0.789062
101: out 16 biased 10; lerr 0.000671612 rerr 0.00426337 larg 0.789062 rarg 0.796875
102: out 15 biased 10; lerr 0.00271068 rerr 0.00216607 larg 0.796875 rarg 0.804688
103: out 14 biased 10; lerr 0.00484208 rerr 2.28884E-05 larg 0.804688 rarg 0.8125
104: out 14 biased 9; lerr 2.28884E-05 rerr 0.00477319 larg 0.8125 rarg 0.820312
105: out 13 biased 9; lerr 0.00230268 rerr 0.00243701 larg 0.820312 rarg 0.828125
106: out 12 biased 9; lerr 0.00467248 rerr 1.1444E-05 larg 0.828125 rarg 0.835938
107: out 12 biased 8; lerr 1.1444E-05 rerr 0.00467353 larg 0.835938 rarg 0.84375
108: out 11 biased 8; lerr 0.00250271 rerr 0.00210469 larg 0.84375 rarg 0.851562
109: out 10 biased 8; lerr 0.0051047 rerr 0.000551376 larg 0.851562 rarg 0.859375
110: out 10 biased 7; lerr 0.000551376 rerr 0.00398129 larg 0.859375 rarg 0.867188
111: out 9 biased 7; lerr 0.00329393 rerr 0.00118567 larg 0.867188 rarg 0.875
112: out 9 biased 6; lerr 0.00118567 rerr 0.00564531 larg 0.875 rarg 0.882812
113: out 8 biased 6; lerr 0.00169516 rerr 0.00271239 larg 0.882812 rarg 0.890625
114: out 7 biased 6; lerr 0.0046605 rerr 0.000304507 larg 0.890625 rarg 0.898438
115: out 7 biased 5; lerr 0.000304507 rerr 0.00403259 larg 0.898438 rarg 0.90625
116: out 6 biased 5; lerr 0.00340469 rerr 0.00088176 larg 0.90625 rarg 0.914062
117: out 6 biased 4; lerr 0.00088176 rerr 0.00514993 larg 0.914062 rarg 0.921875
118: out 5 biased 4; lerr 0.00235119 rerr 0.00186722 larg 0.921875 rarg 0.929688
119: out 4 biased 4; lerr 0.00566562 rerr 0.00149648 larg 0.929688 rarg 0.9375
120: out 4 biased 3; lerr 0.00149648 rerr 0.00265532 larg 0.9375 rarg 0.945312
121: out 3 biased 3; lerr 0.00494055 rerr 0.0008372 larg 0.945312 rarg 0.953125
122: out 3 biased 2; lerr 0.0008372 rerr 0.00324937 larg 0.953125 rarg 0.960938
123: out 2 biased 2; lerr 0.00440902 rerr 0.000370094 larg 0.960938 rarg 0.96875
124: out 2 biased 1; lerr 0.000370094 rerr 0.00365258 larg 0.96875 rarg 0.976562
125: out 1 biased 1; lerr 0.00406783 rerr 9.20338E-05 larg 0.976562 rarg 0.984375
126: out 1 biased 0; lerr 9.20338E-05 rerr 0.00386801 larg 0.984375 rarg 0.992188
127: out 0 biased 0; lerr 0.00391391 rerr 0 larg 0.992188 rarg 1

... [removed hex data dumping]

max recip 7x7 error at 0.519531: 0.00558472 or 2^-7.4843
max rsqrt 7x7 error at 0.546875: 0.00628347 or  2^-7.31422

On 2020-08-03 1:17 p.m., Bill Huffman wrote:

I should have said that my results are for the 7/7 case.  And it sounds like we're in agreement then.  We probably have the same table.

Bill

On 8/2/20 9:50 AM, DSHORNER wrote:

This is the link to the revised code that generates n-by-m LUTs:

https://github.com/David-Horner/recip/blob/master/vrecip.cc

On 2020-08-01 4:51 p.m., David Horner via lists.riscv.org wrote:
