swallach
I have a question: if one implements square root using a non-restoring divide approach (which I did at one time), is this OK within the proposed/described approaches?
On Aug 14, 2020, at 1:46 AM, Krste Asanovic <krste@...> wrote:
 As Andrew says, verification/compliance concerns outweighed allowing more flexible definition. Also, the fixed 7b implementation was seen as being cheap to provide even if more accurate approximations are added later.
Krste
On Thu, 13 Aug 2020 17:37:01 -0700, "Andrew Waterman" <andrew@...> said:
 The task group did consider that possibility but concluded that forcing  compatibility is more important. As Bill points out, more precise (or  flexibly precise) variants can be defined in the future, since there's  beaucoup opcode space available for unary operations.
 On Thu, Aug 13, 2020 at 4:55 PM Brian Grayson <brian.grayson@...>  wrote:
 As it stands, I think the spec prevents an implementer from being more accurate than described, right? Should the spec specify "accurate to at least 7 bits" instead? I could envision an embedded implementer who would like just a few bits more accuracy and fewer (or no) Newton-Raphson iterations for their specific use case. (I've seen architectures that state a minimum accuracy, but leave the actual accuracy up to the implementer, which is enough for standard software to do the right thing.)
 Brian
 On Thu, Aug 13, 2020 at 5:11 PM Bill Huffman <huffman@...> wrote:
 On 8/13/20 2:33 PM, Andrew Waterman wrote:
 EXTERNAL MAIL
 On Thu, Aug 13, 2020 at 2:29 PM Bill Huffman <huffman@...>  wrote:
 I think maybe I'm done complaining. :)
 Hopefully because we've converged, not simply due to exhaustion :)
 Happily, yes. :)
 Except that the initial paragraph on recip operation needs the  words "concatenated and" removed.
 Thanks.
 I'm going to merge the pull request now, but additional feedback  is still welcome, of course.
 Sounds good.
 Bill
 Bill
 On 8/13/20 2:11 PM, Andrew Waterman wrote:
 EXTERNAL MAIL
 Good thinking. I've added analogous language for recip,  too.
 On Thu, Aug 13, 2020 at 12:58 PM Bill Huffman <  huffman@...> wrote:
 Andrew,
 I'll start at the top here... and with rsqrt since  it's simpler. I think the table and most of the  commentary is fine. I can follow the operation  description. Sort of. But I'm trying to figure out  how it can be improved. It currently says:
 For the non-exceptional cases, the result is computed as follows. Let the normalized input exponent be equal to the input exponent if the input is normal, or 0 minus the number of leading zeros in the significand otherwise. If the input is subnormal, the normalized input significand is given by shifting the input significand left by 1 minus the normalized input exponent, discarding the leading 1 bit. The output exponent equals floor((3*B - 1 - the normalized input exponent) / 2). The output sign equals the input sign.
 The following table gives the seven MSBs of the  output significand as a function of the LSB of the  normalized input exponent and the six MSBs of the  normalized input significand; the other bits of  the output significand are zero.
 I wonder if a high level description given first might  help. For example:
 For the non-exceptional cases, the low bit of the exponent and the six bits of significand (after the leading one) are concatenated and used to address the following table. The output of the table becomes the seven bits of the result significand (after the leading one) and the remainder of the result significand is zero. Denorm inputs are normalized and the exponent adjusted appropriately before the lookup. The output exponent is chosen to make the result approximate the reciprocal of the square root of the argument.
 More precisely, the result is computed as follows.  .... <your description>
 Bill
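The rsqrt description quoted above can be sketched for single precision as follows. This is only an illustration of the quoted text, not the reference code: the function name `rsqrt7_fields` and the choice to return the exponent LSB and the six significand MSBs separately (rather than as one table index) are assumptions made here, and only positive, finite, nonzero inputs are handled.

```python
B = 127          # float32 exponent bias
FRAC_BITS = 23   # stored significand bits in float32 (hidden bit excluded)


def rsqrt7_fields(bits: int):
    """Return (output_exponent, exponent_lsb, sig6) for a float32 bit pattern.

    Hypothetical helper: handles only positive, finite, nonzero inputs.
    """
    exp = (bits >> FRAC_BITS) & 0xFF
    sig = bits & ((1 << FRAC_BITS) - 1)
    if exp != 0:
        # Normal input: exponent and significand are used as stored.
        norm_exp, norm_sig = exp, sig
    else:
        # Subnormal input: 0 minus the number of leading zeros in the significand.
        leading_zeros = FRAC_BITS - sig.bit_length()
        norm_exp = 0 - leading_zeros
        # Shift left by (1 - norm_exp); the mask discards the leading 1 bit.
        norm_sig = (sig << (1 - norm_exp)) & ((1 << FRAC_BITS) - 1)
    out_exp = (3 * B - 1 - norm_exp) // 2      # floor division
    exp_lsb = norm_exp & 1                     # LSB of the normalized exponent
    sig6 = norm_sig >> (FRAC_BITS - 6)         # six MSBs of the significand
    return out_exp, exp_lsb, sig6
```

For example, `rsqrt7_fields(0x3F800000)` (the input 1.0) gives an output exponent of 126, so the estimate lands just below 1.0 once the table supplies a significand near all-ones.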
 On 8/12/20 9:19 PM, Andrew Waterman wrote:
 EXTERNAL MAIL
 On Wed, Aug 12, 2020 at 8:36 PM Bill Huffman <  huffman@...> wrote:
 On 8/12/20 7:05 PM, Andrew Waterman wrote:
 EXTERNAL MAIL
 On Wed, Aug 12, 2020 at 6:56 PM Bill  Huffman <huffman@...> wrote:
 On 8/12/20 4:21 PM, Andrew Waterman  wrote:
 EXTERNAL MAIL
 On Wed, Aug 12, 2020 at 3:37 PM Bill  Huffman <huffman@...> wrote:
 On 8/12/20 3:32 PM, Andrew Waterman  wrote:
 EXTERNAL MAIL
 On Wed, Aug 12, 2020 at 3:18 PM Bill  Huffman <huffman@...> wrote:
 On 8/11/20 4:11 PM, Andrew Waterman  wrote:
 EXTERNAL MAIL
 On Tue, Aug 11, 2020 at 3:35 PM Bill  Huffman <huffman@...> wrote:
 On 8/11/20 3:00 PM, Andrew Waterman  wrote:
 EXTERNAL MAIL
 On Tue, Aug 11, 2020 at 1:56 PM Bill  Huffman <huffman@...> wrote:
 Hi Andrew,
 I'm looking at the cases where the  reciprocal is near the boundary  between finite and infinite or between  normal and denormal. Are you trying  to get the boundaries approximately  right? Or exactly? For example, the  point at which the reciprocal of a  large positive denorm falls over the  boundary between MAXPOS and +Inf is  different for RUP and RNE. The  setting of OF changes at the same  point. There's yet another point at  which OF changes for RDN even though  the answer doesn't change. You don't  show UF set anywhere.
 There are no cases where UF should be raised because there are no cases where denormalization causes loss of precision. When the result is subnormal, it is only subnormal by either one or two positions; the denormalized 7-bit significand plus two bits of right-shift fits within all of our formats' significands. (This property doesn't hold for bfloat16, but that point might be moot if our variant of that format always flushes subnormals to zero.)
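The no-precision-loss argument above can be checked mechanically. The sketch below is an illustration only: `denormalization_is_exact` is a hypothetical helper, and it models "subnormal by one or two positions" as a right shift of the hidden bit plus 7 table bits by 1 or 2 within a format that stores `frac_bits` significand bits.

```python
def denormalization_is_exact(frac_bits: int, shift: int) -> bool:
    """True if right-shifting every 7-bit estimate by `shift` drops no set bits.

    Hypothetical helper; `frac_bits` is the number of stored significand bits.
    """
    if frac_bits < 7:
        raise ValueError("format cannot hold a 7-bit estimate")
    for out7 in range(128):
        # Hidden 1, then the 7 table bits, then zeros down to the format's LSB.
        full_sig = ((1 << 7) | out7) << (frac_bits - 7)
        # Bits that would fall off the bottom when denormalizing.
        if full_sig & ((1 << shift) - 1):
            return False
    return True
```

With 10, 23, or 52 stored bits (half, single, double) both shifts are exact, while with 7 stored bits (bfloat16) even a one-position shift drops a bit, which matches the caveat in the message.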
 Ah, so you're counting the 7-bit (plus hidden bit) result as the absolutely correct answer. There's no relationship here to the infinite-precision reciprocal we're approximating. This is an instruction that throws away 16 bits of input mantissa, does a table lookup, and gives an answer that's exactly 7 bits (plus hidden bit). The relationship of this instruction to a reciprocal is one of motivation and not closer than that.
 I think that's the answer to the  paradigm question I had. I'll think  about that a bit and see what I think  of your edge case results and flags  then.
 Ah, that clarifies your earlier  question. Yeah, LMK what you think.
 With that reorientation to what the  instruction means, it looks correct.  I have a couple of comments:
 * Just above the table you use the concept of the instruction's "domain." But the idea of its domain does not seem very clear to me. I lean toward removing the statement and depending on the table.
 * In the first normative paragraph after the table, you use the number of leading zeros in the significand. That assumes that the term "significand" does not include the "hidden" bit, which is zero in the case of interest. I think a single-precision significand may be considered to be 23 bits by some and 24 bits by others, leading to some confusion about that sentence. It might work to reference the leading zeros in the represented part of the significand.
 * As I read farther, it's pretty confusing. I worry for most people reading it. I wonder if there should be a second table referenced where the first table says "estimate of 1/x" and dealing only with the magnitude of the argument. The second table would have five rows labeled by operand range, as below, and detail each range with regard to exponent and denormalization:
   - 2^(-B-1) =< x < 2^(-B)
   - 2^(-B) =< x < 2^(-B+1)
   - 2^(-B+1) =< x < 2^(B-1)
   - 2^(B-1) =< x < 2^(B)
   - 2^(B) =< x < 2^(B+1)
 * The reciprocal square root would be a little different but the same idea would apply.
 Any of that make sense?
 Yeah, let me play around with the  presentation a bit. I'm not sure  whether breaking it into two tables or  expanding the current table will be  clearer, but your suggestion holds  either way. Thanks for being my  guinea pig.
 I almost suggested expanding the  current table. That makes it quite a  bit larger. But then, it also means  there's no need to clarify the  relationship between the two tables.  Maybe that's better. And it doesn't  expand the recip sqrt table.
 How about this... it's a beast, but I  think it works.  https://github.com/riscv/riscvvspec/blob/vfrecip/vspec.adoc#149vectorfloatingpointreciprocalestimateinstruction
 It is pretty big... I'm just looking  at the recip at this point. I have a  couple of thoughts:
 Yeah, but big is OK, I think.
 Probably so.
 I didn't change the rsqrt table at all.  Since the subnormal cases are mostly  uninteresting, I think the NOTE that  positive subnormal and normal inputs  always produce normal outputs suffices.
 That's probably OK. It's much less  confusing. I wonder if two examples for each  (recip and rsqrt) would help. One with a  denormal input and the other normal?
 I had been hoping that the reference C code would  scratch that itch, but you're probably right.  I've added a tiny example and a huge example for  each.
 * In the "Output" column for the 5 new positive and negative entries, you have ... > y > ... but I think you should have ... >= y > ... because when the input is 127 the table has output 0. So when the input is near the "left" end of the input range as expressed in the table, the output is all the way at the left end of the output range and needs the "equal."
 It's actually correct as-is, because the output value is never exactly a power of 2. When the input is exactly a power of 2, the result is always slightly larger than the true reciprocal. (It's the reciprocal of some number near the midpoint of the interval ( 2^n, nextafter(2^n) ).)
 When the input mantissa (including hidden bit) is 0xFF0000, the output mantissa is 0x800000, if I'm reading the table correctly: 127 in leads to zero out.
 The second row in the table has input:
 -2^(B+1) < x ≤ -2^B (normal)
 Table input 127 is near the left end of the  range while table input 0 is absolutely at the  right end.
 The left end is not representable but is just farther from zero than 0xFF7F_FFFF single-precision. The right end is 0xFF00_0000 single-precision. These turn into 127 and 0 as table inputs and into 0 and 127 as table outputs. Then they're 0x8020_0000 and 0x803F_C000 as single-precision. So the left end is equal to -2^-(B+1).
 and output is listed as:
 -2^-(B+1) > y > -2^-B (subnormal, sig[MSB:MSB-1]=01)
 but should allow the equal on the left,  shouldn't it?
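The four single-precision bit patterns in this exchange can be decoded to confirm which ends of the ranges they sit on. A small sketch using only the standard library; the helper name `f32_from_bits` is introduced here for illustration.

```python
import struct


def f32_from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Input endpoints quoted in the message (B = 127 for single precision).
largest_magnitude_input = f32_from_bits(0xFF7FFFFF)   # just inside -2^(B+1)
power_of_two_input = f32_from_bits(0xFF000000)        # exactly -2^B

# Output values quoted in the message; both are subnormal.
output_for_largest = f32_from_bits(0x80200000)        # exactly -2^-(B+1)
output_for_power_of_two = f32_from_bits(0x803FC000)   # -(255/128) * 2^-(B+1)
```

The decode shows that the just-smaller-than-power-of-2 input maps to an output that is exactly a power of 2, which is the case the "equal" on the left end has to cover.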
 My mistake. I was thinking of the fact that power-of-2 inputs never produce power-of-2 outputs. You're of course right that just-smaller-than-power-of-2 inputs do produce power-of-2 outputs. Thanks for the correction.
 I also reordered the spec so that vfrsqrte7 shows up before vfrece7, since the former is so much simpler to explain. More sanity-checking appreciated.  https://github.com/riscv/riscvvspec/blob/vfrecip/vspec.adoc#149vectorfloatingpointreciprocalsquarerootestimateinstruction
 Bill
 * The expressions of subnormal are still awkward. What about (subnormal 01...) or (subnormal 1...) and explain later what that means. It would be easier to read (and the table would be a bit smaller).
 Thanks, I was hoping someone would suggest  a better way of expressing that.
 Bill

 Bill
 Bill
 Bill
 As to the large positive denorm input case: the only case where this scheme and IEEE (1.0 / x) differ in the finiteness of the result, or differ in whether OF is raised, is for the exact input 2^-(B+1), depending on the rounding mode. We always produce a finite result for this case, but there's an arguable reason for it: we're actually computing the reciprocal of some number near the midpoint of the interval ( 2^-(B+1), nextafter(2^-(B+1)) ), the result of which is finite, regardless of the rounding mode.
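The rounding-mode dependence can be illustrated with exact rational arithmetic for single precision, where the input in question is 2^-128. This is a sketch under stated assumptions, not part of the proposal: `rounds_to_infinity` is a hypothetical helper that applies the usual IEEE 754 overflow thresholds to a positive exact value, and the mode names are just labels.

```python
from fractions import Fraction

MAXPOS = Fraction((2**24 - 1) * 2**104)   # largest finite float32, 2^128 - 2^104
HALF_ULP = Fraction(2**103)               # half an ulp at the top binade


def rounds_to_infinity(value: Fraction, mode: str) -> bool:
    """Would a positive exact value round to +Inf in float32 under `mode`?"""
    if mode == "RNE":
        # Round-to-nearest-even overflows at MAXPOS plus half an ulp.
        return value >= MAXPOS + HALF_ULP
    if mode == "RUP":
        return value > MAXPOS
    if mode in ("RDN", "RTZ"):
        # Rounding toward zero or down never produces +Inf from a finite value.
        return False
    raise ValueError(f"unknown rounding mode: {mode}")


x_exact = Fraction(1, 2**128)                 # the exact input 2^-(B+1)
x_mid = x_exact + Fraction(1, 2**150)         # midpoint to the next float32
recip_exact = 1 / x_exact                     # 2^128
recip_mid = 1 / x_mid                         # slightly below MAXPOS
```

Under this model the exact reciprocal overflows for RNE and RUP but not for RDN or RTZ, while the reciprocal of the midpoint stays finite in every mode.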
 So, I'm wondering what your paradigm  is for the edge cases. I can see it  might not be worth being too  complicated since the answer isn't  very exact. The paradigm is further  complicated by the idea that the  answer may be refined by further  steps. :)
 Yeah... the intent was to have reasonable fidelity. I think you can argue the 2^-(B+1) case either way, but other ISAs have resolved it the same way I did. And it's clearly a feature that corner-case detection doesn't depend on the significand (except for its zeroness, that is).
 Bill
 On 8/10/20 8:54 PM, Andrew Waterman  wrote:
 EXTERNAL MAIL
 I've PRed a full definition of these instructions. Please sanity-check my work:  https://github.com/riscv/riscvvspec/blob/78191da47644053d0605b21628e1f5e7961ad5bf/vspec.adoc#149vectorfloatingpointreciprocalestimateinstruction
 On Mon, Aug 3, 2020 at 5:44 PM Bill  Huffman <huffman@...> wrote:
 On 8/3/20 1:41 PM, Andrew Waterman  wrote:
 EXTERNAL MAIL
 On Mon, Aug 3, 2020 at 12:40 PM Bill  Huffman <huffman@...> wrote:
 The recip table matches mine as does  the worst case error.
 I have one different entry in the square root table. For entry 77, where you have 36, I have 37. I'm not sure whether it matters. Also, ages ago, I got a very small difference in worst case error of 2^-7.317 but I haven't gone back to trace anything down about that.
 Thanks for validating against your  table, Bill.
 With my value for that entry, the worst error on the interval of interest is 2^-7.32041, for input 0x3f1a0000. With yours, it's 2^-7.3164 for 0x3f1bfffd.
 I agree with your computation with a really tiny difference (I get that it just barely rounds to 2^-7.32040). I can't say why I got 37 when I did it 8-10 years ago, and I don't think I'm going to chase that. I'm good with 36 at that position in the table.
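The two worst-case figures traded above can be recomputed from the quoted inputs and table values. The sketch below assumes the output-exponent formula from the rsqrt description elsewhere in this thread and measures relative error as estimate * sqrt(x) - 1; the function name `rsqrt7_log2_rel_error` is made up for this illustration, and only normal inputs are handled.

```python
import math
import struct

B = 127  # float32 exponent bias


def rsqrt7_log2_rel_error(bits: int, out7: int) -> float:
    """log2 of |relative error| when `out7` is the 7-bit table output for `bits`.

    Hypothetical helper; assumes a positive normal float32 input.
    """
    x = struct.unpack(">f", struct.pack(">I", bits))[0]
    exp = (bits >> 23) & 0xFF
    out_exp = (3 * B - 1 - exp) // 2
    # Hidden bit plus 7 table bits, scaled by the unbiased output exponent.
    estimate = (1 + out7 / 128) * 2.0 ** (out_exp - B)
    # (estimate - 1/sqrt(x)) / (1/sqrt(x)) simplifies to estimate*sqrt(x) - 1.
    return math.log2(abs(estimate * math.sqrt(x) - 1))


err_with_36 = rsqrt7_log2_rel_error(0x3F1A0000, 36)
err_with_37 = rsqrt7_log2_rel_error(0x3F1BFFFD, 37)
```

Under these assumptions `err_with_36` comes out near -7.3204 and `err_with_37` near -7.3164, in line with the numbers in the two messages.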
 So, I'm good with the table values  below.
 Bill
 Presumably the error's slightly  smaller for my scheme because I'm  picking the output value that  minimizes the maximum error on the  interval, rather than picking the  midpoint or similar. Of course, the  overall worst error is unaffected.
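The minimax selection described here can be sketched for the simpler reciprocal table. The assumptions are mine: inputs are normalized to [0.5, 1) and split into 128 equal intervals (as in the Recip7x7LUT dump quoted further down), the candidate estimate is 1 + out/128, and the error is relative, evaluated at the two interval endpoints because estimate * x - 1 is monotonic in x. The name `minimax_recip_entry` is hypothetical.

```python
def minimax_recip_entry(index: int) -> int:
    """Pick the 7-bit output minimizing the worst relative error on one interval."""
    larg = 0.5 + index / 256          # left end of the input interval
    rarg = 0.5 + (index + 1) / 256    # right end of the input interval

    def worst_error(out7: int) -> float:
        estimate = 1 + out7 / 128
        # Relative error of the estimate against 1/x at each endpoint.
        return max(abs(estimate * larg - 1), abs(estimate * rarg - 1))

    return min(range(128), key=worst_error)


sample_rows = [0, 5, 6, 64, 66, 117, 127]
sample_outputs = [minimax_recip_entry(i) for i in sample_rows]
```

For the sampled rows this gives 127, 117, 116, 42, 40, 5 and 0, matching the `out` column of the dump for those entries. A midpoint-based choice would be a different selection rule and is not what this sketch implements.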
 Bill
 On 8/3/20 11:38 AM, DSHORNER wrote:
 EXTERNAL MAIL
 Now annotated version detail  https://github.com/DavidHorner/recip/blob/master/vrecip.cc
 For the 7x7 below notice the biased  value does not exceed 21 for recip (5  of 7 bits) and 15 for rsqrt (4 of 7  bits).
 ip 7 op 7 LUT #bits 896 verilog 0 test/testlong 1
 Recip7x7LUT (input [6:0] in, output reg [6:0] out);
 in[6:0] corresponds to sig[S1:S6]
 out[6:0] corresponds to sig[S1:S6]
 biased : ((ipN-1) - in) << (op - ip) // or >> if neg
 base bias 127 leftshift 0 rightshift 0
 0: out = 127 biased 0; lerr 0.00390625 rerr 0.00387573 larg 0.5 rarg 0.503906
 1: out = 125 biased 1; lerr 0.0039978 rerr 0.00372314 larg 0.503906 rarg 0.507812
 2: out = 123 biased 2; lerr 0.00421143 rerr 0.00344849 larg 0.507812 rarg 0.511719
 3: out = 121 biased 3; lerr 0.00454712 rerr 0.00305176 larg 0.511719 rarg 0.515625
 4: out = 119 biased 4; lerr 0.00500488 rerr 0.00253296 larg 0.515625 rarg 0.519531
 5: out = 117 biased 5; lerr 0.00558472 rerr 0.00189209 larg 0.519531 rarg 0.523438
 6: out = 116 biased 5; lerr 0.00219727 rerr 0.00524902 larg 0.523438 rarg 0.527344
 7: out = 114 biased 6; lerr 0.00299072 rerr 0.00439453 larg 0.527344 rarg 0.53125
 8: out = 112 biased 7; lerr 0.00390625 rerr 0.00341797 larg 0.53125 rarg 0.535156
 9: out = 110 biased 8; lerr 0.00494385 rerr 0.00231934 larg 0.535156 rarg 0.539062
 10: out = 109 biased 8; lerr 0.00189209 rerr 0.00534058 larg 0.539062 rarg 0.542969
 11: out = 107 biased 9; lerr 0.00314331 rerr 0.00402832 larg 0.542969 rarg 0.546875
 12: out = 105 biased 10; lerr 0.0045166 rerr 0.00259399 larg 0.546875 rarg 0.550781
 13: out = 104 biased 10; lerr 0.00170898 rerr 0.00537109 larg 0.550781 rarg 0.554688
 14: out = 102 biased 11; lerr 0.0032959 rerr 0.00372314 larg 0.554688 rarg 0.558594
 15: out = 100 biased 12; lerr 0.00500488 rerr 0.00195312 larg 0.558594 rarg 0.5625
 16: out = 99 biased 12; lerr 0.00244141 rerr 0.00448608 larg 0.5625 rarg 0.566406
 17: out = 97 biased 13; lerr 0.00436401 rerr 0.00250244 larg 0.566406 rarg 0.570312
 18: out = 96 biased 13; lerr 0.00195312 rerr 0.00488281 larg 0.570312 rarg 0.574219
 19: out = 94 biased 14; lerr 0.00408936 rerr 0.00268555 larg 0.574219 rarg 0.578125
 20: out = 93 biased 14; lerr 0.00183105 rerr 0.00491333 larg 0.578125 rarg 0.582031
 21: out = 91 biased 15; lerr 0.00418091 rerr 0.00250244 larg 0.582031 rarg 0.585938
 22: out = 90 biased 15; lerr 0.0020752 rerr 0.00457764 larg 0.585938 rarg 0.589844
 23: out = 88 biased 16; lerr 0.00463867 rerr 0.00195312 larg 0.589844 rarg 0.59375
 24: out = 87 biased 16; lerr 0.00268555 rerr 0.00387573 larg 0.59375 rarg 0.597656
 25: out = 85 biased 17; lerr 0.00546265 rerr 0.0010376 larg 0.597656 rarg 0.601562
 26: out = 84 biased 17; lerr 0.00366211 rerr 0.00280762 larg 0.601562 rarg 0.605469
 27: out = 83 biased 17; lerr 0.00192261 rerr 0.0045166 larg 0.605469 rarg 0.609375
 28: out = 81 biased 18; lerr 0.00500488 rerr 0.00137329 larg 0.609375 rarg 0.613281
 29: out = 80 biased 18; lerr 0.00341797 rerr 0.00292969 larg 0.613281 rarg 0.617188
 30: out = 79 biased 18; lerr 0.00189209 rerr 0.00442505 larg 0.617188 rarg 0.621094
 31: out = 77 biased 19; lerr 0.00527954 rerr 0.000976562 larg 0.621094 rarg 0.625
 32: out = 76 biased 19; lerr 0.00390625 rerr 0.00231934 larg 0.625 rarg 0.628906
 33: out = 75 biased 19; lerr 0.00259399 rerr 0.00360107 larg 0.628906 rarg 0.632812
 34: out = 74 biased 19; lerr 0.00134277 rerr 0.00482178 larg 0.632812 rarg 0.636719
 35: out = 72 biased 20; lerr 0.00512695 rerr 0.000976562 larg 0.636719 rarg 0.640625
 36: out = 71 biased 20; lerr 0.00402832 rerr 0.00204468 larg 0.640625 rarg 0.644531
 37: out = 70 biased 20; lerr 0.00299072 rerr 0.00305176 larg 0.644531 rarg 0.648438
 38: out = 69 biased 20; lerr 0.00201416 rerr 0.0039978 larg 0.648438 rarg 0.652344
 39: out = 68 biased 20; lerr 0.00109863 rerr 0.00488281 larg 0.652344 rarg 0.65625
 40: out = 66 biased 21; lerr 0.00537109 rerr 0.000549316 larg 0.65625 rarg 0.660156
 41: out = 65 biased 21; lerr 0.00460815 rerr 0.00128174 larg 0.660156 rarg 0.664062
 42: out = 64 biased 21; lerr 0.00390625 rerr 0.00195312 larg 0.664062 rarg 0.667969
 43: out = 63 biased 21; lerr 0.00326538 rerr 0.00256348 larg 0.667969 rarg 0.671875
 44: out = 62 biased 21; lerr 0.00268555 rerr 0.00311279 larg 0.671875 rarg 0.675781
 45: out = 61 biased 21; lerr 0.00216675 rerr 0.00360107 larg 0.675781 rarg 0.679688
 46: out = 60 biased 21; lerr 0.00170898 rerr 0.00402832 larg 0.679688 rarg 0.683594
 47: out = 59 biased 21; lerr 0.00131226 rerr 0.00439453 larg 0.683594 rarg 0.6875
 48: out = 58 biased 21; lerr 0.000976562 rerr 0.00469971 larg 0.6875 rarg 0.691406
 49: out = 57 biased 21; lerr 0.000701904 rerr 0.00494385 larg 0.691406 rarg 0.695312
 50: out = 56 biased 21; lerr 0.000488281 rerr 0.00512695 larg 0.695312 rarg 0.699219
 51: out = 55 biased 21; lerr 0.000335693 rerr 0.00524902 larg 0.699219 rarg 0.703125
 52: out = 54 biased 21; lerr 0.000244141 rerr 0.00531006 larg 0.703125 rarg 0.707031
 53: out = 53 biased 21; lerr 0.000213623 rerr 0.00531006 larg 0.707031 rarg 0.710938
 54: out = 52 biased 21; lerr 0.000244141 rerr 0.00524902 larg 0.710938 rarg 0.714844
 55: out = 51 biased 21; lerr 0.000335693 rerr 0.00512695 larg 0.714844 rarg 0.71875
 56: out = 50 biased 21; lerr 0.000488281 rerr 0.00494385 larg 0.71875 rarg 0.722656
 57: out = 49 biased 21; lerr 0.000701904 rerr 0.00469971 larg 0.722656 rarg 0.726562
 58: out = 48 biased 21; lerr 0.000976562 rerr 0.00439453 larg 0.726562 rarg 0.730469
 59: out = 47 biased 21; lerr 0.00131226 rerr 0.00402832 larg 0.730469 rarg 0.734375
 60: out = 46 biased 21; lerr 0.00170898 rerr 0.00360107 larg 0.734375 rarg 0.738281
 61: out = 45 biased 21; lerr 0.00216675 rerr 0.00311279 larg 0.738281 rarg 0.742188
 62: out = 44 biased 21; lerr 0.00268555 rerr 0.00256348 larg 0.742188 rarg 0.746094
 63: out = 43 biased 21; lerr 0.00326538 rerr 0.00195312 larg 0.746094 rarg 0.75
 64: out = 42 biased 21; lerr 0.00390625 rerr 0.00128174 larg 0.75 rarg 0.753906
 65: out = 41 biased 21; lerr 0.00460815 rerr 0.000549316 larg 0.753906 rarg 0.757812
 66: out = 40 biased 21; lerr 0.00537109 rerr 0.000244141 larg 0.757812 rarg 0.761719
 67: out = 40 biased 20; lerr 0.000244141 rerr 0.00488281 larg 0.761719 rarg 0.765625
 68: out = 39 biased 20; lerr 0.00109863 rerr 0.0039978 larg 0.765625 rarg 0.769531
 69: out = 38 biased 20; lerr 0.00201416 rerr 0.00305176 larg 0.769531 rarg 0.773438
 70: out = 37 biased 20; lerr 0.00299072 rerr 0.00204468 larg 0.773438 rarg 0.777344
 71: out = 36 biased 20; lerr 0.00402832 rerr 0.000976562 larg 0.777344 rarg 0.78125
 72: out = 35 biased 20; lerr 0.00512695 rerr 0.000152588 larg 0.78125 rarg 0.785156
 73: out = 35 biased 19; lerr 0.000152588 rerr 0.00482178 larg 0.785156 rarg 0.789062
 74: out = 34 biased 19; lerr 0.00134277 rerr 0.00360107 larg 0.789062 rarg 0.792969
 75: out = 33 biased 19; lerr 0.00259399 rerr 0.00231934 larg 0.792969 rarg 0.796875
 76: out = 32 biased 19; lerr 0.00390625 rerr 0.000976562 larg 0.796875 rarg 0.800781
 77: out = 31 biased 19; lerr 0.00527954 rerr 0.000427246 larg 0.800781 rarg 0.804688
 78: out = 31 biased 18; lerr 0.000427246 rerr 0.00442505 larg 0.804688 rarg 0.808594
 79: out = 30 biased 18; lerr 0.00189209 rerr 0.00292969 larg 0.808594 rarg 0.8125
 80: out = 29 biased 18; lerr 0.00341797 rerr 0.00137329 larg 0.8125 rarg 0.816406
 81: out = 28 biased 18; lerr 0.00500488 rerr 0.000244141 larg 0.816406 rarg 0.820312
 82: out = 28 biased 17; lerr 0.000244141 rerr 0.0045166 larg 0.820312 rarg 0.824219
 83: out = 27 biased 17; lerr 0.00192261 rerr 0.00280762 larg 0.824219 rarg 0.828125
 84: out = 26 biased 17; lerr 0.00366211 rerr 0.0010376 larg 0.828125 rarg 0.832031
 85: out = 25 biased 17; lerr 0.00546265 rerr 0.000793457 larg 0.832031 rarg 0.835938
 86: out = 25 biased 16; lerr 0.000793457 rerr 0.00387573 larg 0.835938 rarg 0.839844
 87: out = 24 biased 16; lerr 0.00268555 rerr 0.00195312 larg 0.839844 rarg 0.84375
 88: out = 23 biased 16; lerr 0.00463867 rerr 3.05176E-05 larg 0.84375 rarg 0.847656
 89: out = 23 biased 15; lerr 3.05176E-05 rerr 0.00457764 larg 0.847656 rarg 0.851562
 90: out = 22 biased 15; lerr 0.0020752 rerr 0.00250244 larg 0.851562 rarg 0.855469
 91: out = 21 biased 15; lerr 0.00418091 rerr 0.000366211 larg 0.855469 rarg 0.859375
 92: out = 21 biased 14; lerr 0.000366211 rerr 0.00491333 larg 0.859375 rarg 0.863281
 93: out = 20 biased 14; lerr 0.00183105 rerr 0.00268555 larg 0.863281 rarg 0.867188
 94: out = 19 biased 14; lerr 0.00408936 rerr 0.000396729 larg 0.867188 rarg 0.871094
 95: out = 19 biased 13; lerr 0.000396729 rerr 0.00488281 larg 0.871094 rarg 0.875
 96: out = 18 biased 13; lerr 0.00195312 rerr 0.00250244 larg 0.875 rarg 0.878906
 97: out = 17 biased 13; lerr 0.00436401 rerr 6.10352E-05 larg 0.878906 rarg 0.882812
 98: out = 17 biased 12; lerr 6.10352E-05 rerr 0.00448608 larg 0.882812 rarg 0.886719
 99: out = 16 biased 12; lerr 0.00244141 rerr 0.00195312 larg 0.886719 rarg 0.890625
 100: out = 15 biased 12; lerr 0.00500488 rerr 0.000640869 larg 0.890625 rarg 0.894531
 101: out = 15 biased 11; lerr 0.000640869 rerr 0.00372314 larg 0.894531 rarg 0.898438
 102: out = 14 biased 11; lerr 0.0032959 rerr 0.0010376 larg 0.898438 rarg 0.902344
 103: out = 14 biased 10; lerr 0.0010376 rerr 0.00537109 larg 0.902344 rarg 0.90625
 104: out = 13 biased 10; lerr 0.00170898 rerr 0.00259399 larg 0.90625 rarg 0.910156
 105: out = 12 biased 10; lerr 0.0045166 rerr 0.000244141 larg 0.910156 rarg 0.914062
 106: out = 12 biased 9; lerr 0.000244141 rerr 0.00402832 larg 0.914062 rarg 0.917969
 107: out = 11 biased 9; lerr 0.00314331 rerr 0.00109863 larg 0.917969 rarg 0.921875
 108: out = 11 biased 8; lerr 0.00109863 rerr 0.00534058 larg 0.921875 rarg 0.925781
 109: out = 10 biased 8; lerr 0.00189209 rerr 0.00231934 larg 0.925781 rarg 0.929688
 110: out = 9 biased 8; lerr 0.00494385 rerr 0.000762939 larg 0.929688 rarg 0.933594
 111: out = 9 biased 7; lerr 0.000762939 rerr 0.00341797 larg 0.933594 rarg 0.9375
 112: out = 8 biased 7; lerr 0.00390625 rerr 0.000244141 larg 0.9375 rarg 0.941406
 113: out = 8 biased 6; lerr 0.000244141 rerr 0.00439453 larg 0.941406 rarg 0.945312
 114: out = 7 biased 6; lerr 0.00299072 rerr 0.00112915 larg 0.945312 rarg 0.949219
 115: out = 7 biased 5; lerr 0.00112915 rerr 0.00524902 larg 0.949219 rarg 0.953125
 116: out = 6 biased 5; lerr 0.00219727 rerr 0.00189209 larg 0.953125 rarg 0.957031
 117: out = 5 biased 5; lerr 0.00558472 rerr 0.00152588 larg 0.957031 rarg 0.960938
 118: out = 5 biased 4; lerr 0.00152588 rerr 0.00253296 larg 0.960938 rarg 0.964844
 119: out = 4 biased 4; lerr 0.00500488 rerr 0.000976562 larg 0.964844 rarg 0.96875
 120: out = 4 biased 3; lerr 0.000976562 rerr 0.00305176 larg 0.96875 rarg 0.972656
 121: out = 3 biased 3; lerr 0.00454712 rerr 0.000549316 larg 0.972656 rarg 0.976562
 122: out = 3 biased 2; lerr 0.000549316 rerr 0.00344849 larg 0.976562 rarg 0.980469
 123: out = 2 biased 2; lerr 0.00421143 rerr 0.000244141 larg 0.980469 rarg 0.984375
 124: out = 2 biased 1; lerr 0.000244141 rerr 0.00372314 larg 0.984375 rarg 0.988281
 125: out = 1 biased 1; lerr 0.0039978 rerr 6.10352E-05 larg 0.988281 rarg 0.992188
 126: out = 1 biased 0; lerr 6.10352E-05 rerr 0.00387573 larg 0.992188 rarg 0.996094
 127: out = 0 biased 0; lerr 0.00390625 rerr 0 larg 0.996094 rarg 1
 ... [removed hex data dumping]
 RSqrt7x7LUT (input [6:0] in, output reg [6:0] out);
 // in[6] corresponds to exp[0]
 // in[5:0] corresponds to sig[S1:S5]
 // out[6:0] corresponds to sig[S1:S6]
 // biased : ((ipN-1) - in) << (op - ip)
 0: out 127 biased 0; lerr 0.00390625 rerr 0.00384557 larg 0.25 rarg 0.253906
 1: out 125 biased 1; lerr 0.00402773 rerr 0.00360435 larg 0.253906 rarg 0.257812
 2: out 123 biased 2; lerr 0.00432928 rerr 0.00318533 larg 0.257812 rarg 0.261719
 3: out 121 biased 3; lerr 0.00480818 rerr 0.00259111 larg 0.261719 rarg 0.265625
 4: out 119 biased 4; lerr 0.00546183 rerr 0.00182426 larg 0.265625 rarg 0.269531
 5: out 118 biased 4; lerr 0.0022317 rerr 0.00497249 larg 0.269531 rarg 0.273438
 6: out 116 biased 5; lerr 0.00319802 rerr 0.00389675 larg 0.273438 rarg 0.277344
 7: out 114 biased 6; lerr 0.00433191 rerr 0.00265532 larg 0.277344 rarg 0.28125
 8: out 113 biased 6; lerr 0.00148789 rerr 0.00542232 larg 0.28125 rarg 0.285156
 9: out 111 biased 7; lerr 0.00292144 rerr 0.00388464 larg 0.285156 rarg 0.289062
 10: out 109 biased 8; lerr 0.00451607 rerr 0.0021876 larg 0.289062 rarg 0.292969
 11: out 108 biased 8; lerr 0.00204104 rerr 0.00458999 larg 0.292969 rarg 0.296875
 12: out 106 biased 9; lerr 0.00392348 rerr 0.00260824 larg 0.296875 rarg 0.300781
 13: out 105 biased 9; lerr 0.00167641 rerr 0.00478529 larg 0.300781 rarg 0.304688
 14: out 103 biased 10; lerr 0.00383947 rerr 0.00252584 larg 0.304688 rarg 0.308594
 15: out 102 biased 10; lerr 0.0018141 rerr 0.00448366 larg 0.308594 rarg 0.3125
 16: out 100 biased 11; lerr 0.00425098 rerr 0.00195312 larg 0.3125 rarg 0.316406
 17: out 99 biased 11; lerr 0.00244141 rerr 0.00369747 larg 0.316406 rarg 0.320312
 18: out 97 biased 12; lerr 0.00514568 rerr 0.000902127 larg 0.320312 rarg 0.324219
 19: out 96 biased 12; lerr 0.00354633 rerr 0.00243843 larg 0.324219 rarg 0.328125
 20: out 95 biased 12; lerr 0.00203674 rerr 0.00388594 larg 0.328125 rarg 0.332031
 21: out 93 biased 13; lerr 0.00511752 rerr 0.000717621 larg 0.332031 rarg 0.335938
 22: out 92 biased 13; lerr 0.00381051 rerr 0.00196455 larg 0.335938 rarg 0.339844
 23: out 91 biased 13; lerr 0.00258984 rerr 0.00312603 larg 0.339844 rarg 0.34375
 24: out 90 biased 13; lerr 0.00145446 rerr 0.00420307 larg 0.34375 rarg 0.347656
 25: out 88 biased 14; lerr 0.0050098 rerr 0.000564416 larg 0.347656 rarg 0.351562
 26: out 87 biased 14; lerr 0.00406783 rerr 0.00144985 larg 0.351562 rarg 0.355469
 27: out 86 biased 14; lerr 0.00320806 rerr 0.00225385 larg 0.355469 rarg 0.359375
 28: out 85 biased 14; lerr 0.00242958 rerr 0.00297735 larg 0.359375 rarg 0.363281
 29: out 84 biased 14; lerr 0.00173146 rerr 0.00362122 larg 0.363281 rarg 0.367188
 30: out 83 biased 14; lerr 0.00111284 rerr 0.00418633 larg 0.367188 rarg 0.371094
 31: out 82 biased 14; lerr 0.000572846 rerr 0.00467353 larg 0.371094 rarg 0.375
 32: out 80 biased 15; lerr 0.00489479 rerr 0.00027462 larg 0.375 rarg 0.378906
 33: out 79 biased 15; lerr 0.00453439 rerr 0.000583717 larg 0.378906 rarg 0.382812
 34: out 78 biased 15; lerr 0.00425002 rerr 0.000817442 larg 0.382812 rarg 0.386719
 35: out 77 biased 15; lerr 0.0040409 rerr 0.000976562 larg 0.386719 rarg 0.390625
 36: out 76 biased 15; lerr 0.00390625 rerr 0.00106183 larg 0.390625 rarg 0.394531
 37: out 75 biased 15; lerr 0.00384534 rerr 0.00107398 larg 0.394531 rarg 0.398438
 38: out 74 biased 15; lerr 0.00385742 rerr 0.00101372 larg 0.398438 rarg 0.402344
 39: out 73 biased 15; lerr 0.00394179 rerr 0.00088176 larg 0.402344 rarg 0.40625
 40: out 72 biased 15; lerr 0.00409775 rerr 0.000678786 larg 0.40625 rarg 0.410156
 41: out 71 biased 15; lerr 0.00432461 rerr 0.000405468 larg 0.410156 rarg 0.414062
 42: out 70 biased 15; lerr 0.0046217 rerr 6.24637E-05 larg 0.414062 rarg 0.417969
 43: out 70 biased 14; lerr 6.24637E-05 rerr 0.00472478 larg 0.417969 rarg 0.421875
 44: out 69 biased 14; lerr 0.000349583 rerr 0.00426776 larg 0.421875 rarg 0.425781
 45: out 68 biased 14; lerr 0.000830041 rerr 0.00374284 larg 0.425781 rarg 0.429688
 46: out 67 biased 14; lerr 0.00137829 rerr 0.00315063 larg 0.429688 rarg 0.433594
 47: out 66 biased 14; lerr 0.00199374 rerr 0.00249171 larg 0.433594 rarg 0.4375
 48: out 65 biased 14; lerr 0.00267578 rerr 0.00176667 larg 0.4375 rarg 0.441406
 49: out 64 biased 14; lerr 0.00342383 rerr 0.000976086 larg 0.441406 rarg 0.445312
 50: out 63 biased 14; lerr 0.00423733 rerr 0.000120513 larg 0.445312 rarg 0.449219
 51: out 63 biased 13; lerr 0.000120513 rerr 0.00445945 larg 0.449219 rarg 0.453125
 52: out 62 biased 13; lerr 0.000799499 rerr 0.00349816 larg 0.453125 rarg 0.457031
 53: out 61 biased 13; lerr 0.00178341 rerr 0.00247339 larg 0.457031 rarg 0.460938
 54: out 60 biased 13; lerr 0.0028307 rerr 0.00138568 larg 0.460938 rarg 0.464844
 55: out 59 biased 13; lerr 0.00394084 rerr 0.00023553 larg 0.464844 rarg 0.46875
 56: out 59 biased 12; lerr 0.00023553 rerr 0.00439453 larg 0.46875 rarg 0.472656
 57: out 58 biased 12; lerr 0.000976562 rerr 0.00314314 larg 0.472656 rarg 0.476562
 58: out 57 biased 12; lerr 0.0022501 rerr 0.00183069 larg 0.476562 rarg 0.480469
 59: out 56 biased 12; lerr 0.00358461 rerr 0.000457659 larg 0.480469 rarg 0.484375
 60: out 56 biased 11; lerr 0.000457659 rerr 0.00448366 larg 0.484375 rarg 0.488281
 61: out 55 biased 11; lerr 0.000975489 rerr 0.00301265 larg 0.488281 rarg 0.492188
 62: out 54 biased 11; lerr 0.00246829 rerr 0.00148234 larg 0.492188 rarg 0.496094
 63: out 53 biased 11; lerr 0.00402031 rerr 0.000106817 larg 0.496094 rarg 0.5
 64: out 52 biased 11; lerr 0.00563109 rerr 0.00210731 larg 0.5 rarg 0.507812
 65: out 51 biased 11; lerr 0.00345996 rerr 0.00417648 larg 0.507812 rarg 0.515625
 66: out 50 biased 11; lerr 0.00143345 rerr 0.00610301 larg 0.515625 rarg 0.523438
 67: out 48 biased 12; lerr 0.00520152 rerr 0.00219486 larg 0.523438 rarg 0.53125
 68: out 47 biased 12; lerr 0.00349943 rerr 0.00380104 larg 0.53125 rarg 0.539062
 69: out 46 biased 12; lerr 0.00193497 rerr 0.00527137 larg 0.539062 rarg 0.546875
 70: out 44 biased 13; lerr 0.00628347 rerr 0.000789331 larg 0.546875 rarg 0.554688
 71: out 43 biased 13; lerr 0.00502921 rerr 0.00195312 larg 0.554688 rarg 0.5625
 72: out 42 biased 13; lerr 0.00390625 rerr 0.00298721 larg 0.5625 rarg 0.570312
 73: out 41 biased 13; lerr 0.00291271 rerr 0.00389343 larg 0.570312 rarg 0.578125
 74: out 40 biased 13; lerr 0.00204677 rerr 0.00467353 larg 0.578125 rarg 0.585938
 75: out 39 biased 13; lerr 0.00130667 rerr 0.00532924 larg 0.585938 rarg 0.59375
 76: out 38 biased 13; lerr 0.000690699 rerr 0.00586222 larg 0.59375 rarg 0.601562
 77: out 36 biased 14; lerr 0.0062566 rerr 0.000175461 larg 0.601562 rarg 0.609375
 78: out 35 biased 14; lerr 0.00592317 rerr 0.000428823 larg 0.609375 rarg 0.617188
 79: out 34 biased 14; lerr 0.00570878 rerr 0.000564416 larg 0.617188 rarg 0.625
 80: out 33 biased 14; lerr 0.00561191 rerr 0.000583717 larg 0.625 rarg 0.632812
 81: out 32 biased 14; lerr 0.00563109 rerr 0.000488162 larg 0.632812 rarg 0.640625
 82: out 31 biased 14; lerr 0.00576489 rerr 0.000279149 larg 0.640625 rarg 0.648438
 83: out 30 biased 14; lerr 0.00601191 rerr 4.19626E-05 larg 0.648438 rarg 0.65625
 84: out 30 biased 13; lerr 4.19626E-05 rerr 0.00589256 larg 0.65625 rarg 0.664062
 85: out 29 biased 13; lerr 0.00047385 rerr 0.00538852 larg 0.664062 rarg 0.671875
 86: out 28 biased 13; lerr 0.00101522 rerr 0.00477604 larg 0.671875 rarg 0.679688
 87: out 27 biased 13; lerr 0.00166483 rerr 0.00405633 larg 0.679688 rarg 0.6875
 88: out 26 biased 13; lerr 0.00242145 rerr 0.0032306 larg 0.6875 rarg 0.695312
 89: out 25 biased 13; lerr 0.00328389 rerr 0.0023 larg 0.695312 rarg 0.703125
 90: out 24 biased 13; lerr 0.00425098 rerr 0.00126568 larg 0.703125 rarg 0.710938
 91: out 23 biased 13; lerr 0.0053216 rerr 0.000128738 larg 0.710938 rarg 0.71875
 92: out 23 biased 12; lerr 0.000128738 rerr 0.00554953 larg 0.71875 rarg 0.726562
 93: out 22 biased 12; lerr 0.00110974 rerr 0.00424628 larg 0.726562 rarg 0.734375
 94: out 21 biased 12; lerr 0.0024487 rerr 0.00284339 larg 0.734375 rarg 0.742188
 95: out 20 biased 12; lerr 0.0038871 rerr 0.00134187 larg 0.742188 rarg 0.75
 96: out 19 biased 12; lerr 0.00542395 rerr 0.000257287 larg 0.75 rarg 0.757812
 97: out 19 biased 11; lerr 0.000257287 rerr 0.00488281 larg 0.757812 rarg 0.765625
 98: out 18 biased 11; lerr 0.00195312 rerr 0.00312603 larg 0.765625 rarg 0.773438
 99: out 17 biased 11; lerr 0.0037447 rerr 0.00127425 larg 0.773438 rarg 0.78125
 100: out 16 biased 11; lerr 0.00563109 rerr 0.000671612 larg 0.78125 rarg 0.789062
 101: out 16 biased 10; lerr 0.000671612 rerr 0.00426337 larg 0.789062 rarg 0.796875
 102: out 15 biased 10; lerr 0.00271068 rerr 0.00216607 larg 0.796875 rarg 0.804688
 103: out 14 biased 10; lerr 0.00484208 rerr 2.28884E-05 larg 0.804688 rarg 0.8125
 104: out 14 biased 9; lerr 2.28884E-05 rerr 0.00477319 larg 0.8125 rarg 0.820312
 105: out 13 biased 9; lerr 0.00230268 rerr 0.00243701 larg 0.820312 rarg 0.828125
 106: out 12 biased 9; lerr 0.00467248 rerr 1.1444E-05 larg 0.828125 rarg 0.835938
 107: out 12 biased 8; lerr 1.1444E-05 rerr 0.00467353 larg 0.835938 rarg 0.84375
 108: out 11 biased 8; lerr 0.00250271 rerr 0.00210469 larg 0.84375 rarg 0.851562
 109: out 10 biased 8; lerr 0.0051047 rerr 0.000551376 larg 0.851562 rarg 0.859375
 110: out 10 biased 7; lerr 0.000551376 rerr 0.00398129 larg 0.859375 rarg 0.867188
 111: out 9 biased 7; lerr 0.00329393 rerr 0.00118567 larg 0.867188 rarg 0.875
 112: out 9 biased 6; lerr 0.00118567 rerr 0.00564531 larg 0.875 rarg 0.882812
 113: out 8 biased 6; lerr 0.00169516 rerr 0.00271239 larg 0.882812 rarg 0.890625
 114: out 7 biased 6; lerr 0.0046605 rerr 0.000304507 larg 0.890625 rarg 0.898438
 115: out 7 biased 5; lerr 0.000304507 rerr 0.00403259 larg 0.898438 rarg 0.90625
 116: 
out 6 biased 5; lerr 0.00340469  rerr 0.00088176 larg 0.90625 rarg  0.914062  117: out 6 biased 4; lerr 0.00088176  rerr 0.00514993 larg 0.914062 rarg  0.921875  118: out 5 biased 4; lerr 0.00235119  rerr 0.00186722 larg 0.921875 rarg  0.929688  119: out 4 biased 4; lerr 0.00566562  rerr 0.00149648 larg 0.929688 rarg  0.9375  120: out 4 biased 3; lerr 0.00149648  rerr 0.00265532 larg 0.9375 rarg  0.945312  121: out 3 biased 3; lerr 0.00494055  rerr 0.0008372 larg 0.945312 rarg  0.953125  122: out 3 biased 2; lerr 0.0008372  rerr 0.00324937 larg 0.953125 rarg  0.960938  123: out 2 biased 2; lerr 0.00440902  rerr 0.000370094 larg 0.960938 rarg  0.96875  124: out 2 biased 1; lerr 0.000370094  rerr 0.00365258 larg 0.96875 rarg  0.976562  125: out 1 biased 1; lerr 0.00406783  rerr 9.20338E05 larg 0.976562 rarg  0.984375  126: out 1 biased 0; lerr 9.20338E05  rerr 0.00386801 larg 0.984375 rarg  0.992188  127: out 0 biased 0; lerr 0.00391391  rerr 0 larg 0.992188 rarg 1
 ... [removed hex data dumping]
 max recip 7x7 error at 0.519531: 0.00558472 or 2^-7.4843
 max rsqrt 7x7 error at 0.546875: 0.00628347 or 2^-7.31422
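 For readers who want to check individual rows, here is a minimal C++ sketch of how the lerr/rerr figures can be reproduced. It is not taken from vrecip.cc. It assumes that "out" encodes the estimate 1 + out/128 of 1/sqrt(x), that the error is the relative error |est * sqrt(x) - 1| at the two interval endpoints, and that rows 64..127 cover [0.5, 1) in steps of 1/128; row 70 above is consistent with all three. The selection rule in best_out (smallest worst-endpoint error) and all function names are hypothetical, and the table in the thread may have been chosen differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Relative error of an estimate `est` of 1/sqrt(x): |est * sqrt(x) - 1|.
double rsqrt_rel_err(double est, double x) {
    return std::fabs(est * std::sqrt(x) - 1.0);
}

// Larger of the two endpoint errors for a 7-bit output code `out`.
// Assumption: the code stands for the estimate 1 + out/128.
// est * sqrt(x) is monotonic in x, so checking the endpoints is enough.
double interval_err(int out, double larg, double rarg) {
    assert(0 <= out && out < 128 && larg < rarg);
    const double est = 1.0 + out / 128.0;
    return std::max(rsqrt_rel_err(est, larg), rsqrt_rel_err(est, rarg));
}

// Hypothetical selection rule: the code with the smallest worst-endpoint error.
int best_out(double larg, double rarg) {
    int best = 0;
    for (int out = 1; out < 128; ++out) {
        if (interval_err(out, larg, rarg) < interval_err(best, larg, rarg)) {
            best = out;
        }
    }
    return best;
}

// Scan rows 64..127 (arguments in [0.5, 1), step 1/128) and return the
// largest endpoint error; *worst_arg receives the argument where it occurs.
double max_rsqrt_err(double* worst_arg) {
    double worst = 0.0;
    for (int row = 64; row < 128; ++row) {
        const double larg = row / 128.0;
        const double rarg = (row + 1) / 128.0;
        const double est = 1.0 + best_out(larg, rarg) / 128.0;
        const double lerr = rsqrt_rel_err(est, larg);
        const double rerr = rsqrt_rel_err(est, rarg);
        if (lerr > worst) { worst = lerr; *worst_arg = larg; }
        if (rerr > worst) { worst = rerr; *worst_arg = rarg; }
    }
    return worst;
}
```

 Calling rsqrt_rel_err(1.34375, 0.546875) reproduces the lerr of row 70 (out 44), and std::log2 of the value returned by max_rsqrt_err gives the exponent in the "2^..." form used in the summary lines. The sketch covers only the [0.5, 1) half of the rsqrt table; the recip table would need its own estimate and error functions.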
 On 2020-08-03 1:17 p.m., Bill Huffman wrote:
 I should have said that my results are for the 7/7 case. And it sounds like we're in agreement then. We probably have the same table.
 Bill
 On 8/2/20 9:50 AM, DSHORNER wrote:
 This is the link to the revised code that does an n-by-m LUT:
 https://github.com/DavidHorner/recip/blob/master/vrecip.cc
 On 2020-08-01 4:51 p.m., David Horner via lists.riscv.org wrote:

