Re: VFRECIP/VFRSQRT instructions
As Andrew says, verification/compliance concerns outweighed allowing a
more flexible definition. Also, the fixed 7b implementation was seen as being cheap to provide even if more accurate approximations are added later.

Krste

On Thu, 13 Aug 2020 17:37:01 -0700, "Andrew Waterman" <andrew@...> said:

| The task group did consider that possibility but concluded that forcing compatibility is more important. As Bill points out, more precise (or flexibly precise) variants can be defined in the future, since there's beaucoup opcode space available for unary operations.
|
| On Thu, Aug 13, 2020 at 4:55 PM Brian Grayson <brian.grayson@...> wrote:
|
| As it stands, I think the spec prevents an implementer from being more accurate than described, right? Should the spec specify "accurate to at least 7 bits" instead?
|
| I could envision an embedded implementer who would like just a few bits more accuracy and fewer (or none) Newton-Raphson iterations for their specific use-case.
|
| (I've seen architectures that state a minimum accuracy, but leave the actual accuracy up to the implementer, which is enough for standard software to do the right thing.)
|
| Brian
|
| On Thu, Aug 13, 2020 at 5:11 PM Bill Huffman <huffman@...> wrote:
|
| On 8/13/20 2:33 PM, Andrew Waterman wrote:
|
| On Thu, Aug 13, 2020 at 2:29 PM Bill Huffman <huffman@...> wrote:
|
| I think maybe I'm done complaining. :-)
|
| Hopefully because we've converged, not simply due to exhaustion :)
|
| Happily, yes. :-)
|
| Except that the initial paragraph on recip operation needs the words "concatenated and" removed.
|
| Thanks.
|
| I'm going to merge the pull request now, but additional feedback is still welcome, of course.
|
| Sounds good.
|
| Bill
|
| Bill
|
| On 8/13/20 2:11 PM, Andrew Waterman wrote:
|
| Good thinking. I've added analogous language for recip, too.
|
| On Thu, Aug 13, 2020 at 12:58 PM Bill Huffman <huffman@...> wrote:
|
| Andrew,
|
| I'll start at the top here... and with rsqrt since it's simpler.
| I think the table and most of the commentary is fine. I can follow the operation description. Sort of. But I'm trying to figure out how it can be improved. It currently says:
|
|     For the non-exceptional cases, the result is computed as follows. Let the normalized input exponent be equal to the input exponent if the input is normal, or 0 minus the number of leading zeros in the significand otherwise. If the input is subnormal, the normalized input significand is given by shifting the input significand left by 1 minus the normalized input exponent, discarding the leading 1 bit. The output exponent equals floor((3*B - 1 - the normalized input exponent) / 2). The output sign equals the input sign.
|
|     The following table gives the seven MSBs of the output significand as a function of the LSB of the normalized input exponent and the six MSBs of the normalized input significand; the other bits of the output significand are zero.
|
| I wonder if a high level description given first might help. For example:
|
|     For the non-exceptional cases the low bit of exponent and the six bits of significand (after the leading one) are concatenated and used to address the following table. The output of the table becomes the seven bits of the result significand (after the leading one) and the remainder of the result significand is zero. Denorm inputs are normalized and the exponent adjusted appropriately before the lookup. The output exponent is chosen to make the result approximate the reciprocal of the square root of the argument.
|
|     More precisely, the result is computed as follows.
|
|     ....
<your description> | Bill | On 8/12/20 9:19 PM, Andrew Waterman wrote: | EXTERNAL MAIL | On Wed, Aug 12, 2020 at 8:36 PM Bill Huffman < | huffman@...> wrote: | On 8/12/20 7:05 PM, Andrew Waterman wrote: | EXTERNAL MAIL | On Wed, Aug 12, 2020 at 6:56 PM Bill | Huffman <huffman@...> wrote: | On 8/12/20 4:21 PM, Andrew Waterman | wrote: | EXTERNAL MAIL | On Wed, Aug 12, 2020 at 3:37 PM Bill | Huffman <huffman@...> wrote: | On 8/12/20 3:32 PM, Andrew Waterman | wrote: | EXTERNAL MAIL | On Wed, Aug 12, 2020 at 3:18 PM Bill | Huffman <huffman@...> wrote: | On 8/11/20 4:11 PM, Andrew Waterman | wrote: | EXTERNAL MAIL | On Tue, Aug 11, 2020 at 3:35 PM Bill | Huffman <huffman@...> wrote: | On 8/11/20 3:00 PM, Andrew Waterman | wrote: | EXTERNAL MAIL | On Tue, Aug 11, 2020 at 1:56 PM Bill | Huffman <huffman@...> wrote: | Hi Andrew, | I'm looking at the cases where the | reciprocal is near the boundary | between finite and infinite or between | normal and denormal. Are you trying | to get the boundaries approximately | right? Or exactly? For example, the | point at which the reciprocal of a | large positive denorm falls over the | boundary between MAXPOS and +Inf is | different for RUP and RNE. The | setting of OF changes at the same | point. There's yet another point at | which OF changes for RDN even though | the answer doesn't change. You don't | show UF set anywhere. | There are no cases where UF should be | raised because there are no cases | where denormalization causes loss of | precision. When the result is | subnormal, it is only subnormal by | either one or two positions; the | denormalized 7-bit significand plus | two bits of right-shift fits within | all of our formats' significands. | (This property doesn't hold for | bfloat16, but that point might be moot | if our variant of that format always | flushes subnormals to zero.) | Ah, so you're counting the 7-bit (plus | hidden bit) result as the absolutely | correct answer. 
There's no | relationship here to the infinite | precision reciprocal we're | approximating. This is an instruction | that throws away 16 bits of input | mantissa, does a table lookup, and | gives an answer that's exactly 7 bits | (plus hidden bit). The relationship | of this instruction to a reciprocal is | one of motivation and not closer than | that. | I think that's the answer to the | paradigm question I had. I'll think | about that a bit and see what I think | of your edge case results and flags | then. | Ah, that clarifies your earlier | question. Yeah, LMK what you think. | With that re-orientation to what the | instruction means, it looks correct. | I have a couple of comments: | ★ Just above the table you use the | concept of the instruction's | "domain." But the idea of its domain | does not seem very clear to me. I | lean toward removing the statement and | depending on the table. | ★ In the first normative paragraph | after the table, you use the number of | leading zeros in the significand. | That assumes that the term | "significand" does not include the | "hidden" bit, which is zero in the | case of interest. I think a | single-precision significand may be | considered to be 23 bits by some and | 24 bits by others, leading to some | confusion about that sentence. It | might work to reference the leading | zeros in the represented part of the | significand. | ★ As I read farther, it's pretty | confusing. I worry for most people | reading it. I wonder if there should | be a second table referenced where the | first table says "estimate of 1/x" and | dealing only with the magnitude of the | argument. 
The second table would have | five rows labeled by operand range - | as below - and detail each range with | regard to exponent and | denormalization: | ◎ 2^(-B-1) =< x < 2^(-B) | ◎ 2^(-B) =< x < 2^(-B+1) | ◎ 2^(-B+1) =< x < 2^(B-1) | ◎ 2^(B-1) =< x < 2^(B) | ◎ 2^(B) =< x < 2^(B+1) | ★ The reciprocal square root would | be a little different but the same | idea would apply. | Any of that make sense? | Yeah, let me play around with the | presentation a bit. I'm not sure | whether breaking it into two tables or | expanding the current table will be | clearer, but your suggestion holds | either way. Thanks for being my | guinea pig. | I almost suggested expanding the | current table. That makes it quite a | bit larger. But then, it also means | there's no need to clarify the | relationship between the two tables. | Maybe that's better. And it doesn't | expand the recip sqrt table. | How about this... it's a beast, but I | think it works. | https://github.com/riscv/riscv-v-spec/blob/vfrecip/v-spec.adoc#149-vector-floating-point-reciprocal-estimate-instruction | It is pretty big... I'm just looking | at the recip at this point. I have a | couple of thoughts: | Yeah, but big is OK, I think. | Probably so. | I didn't change the rsqrt table at all. | Since the subnormal cases are mostly | uninteresting, I think the NOTE that | positive subnormal and normal inputs | always produce normal outputs suffices. | That's probably OK. It's much less | confusing. I wonder if two examples for each | (recip and rsqrt) would help. One with a | denormal input and the other normal? | I had been hoping that the reference C code would | scratch that itch, but you're probably right. | I've added a tiny example and a huge example for | each. | □ In the "Output" column for the 5 | new positive and negative entries, you | have ... > y > ... but I think you | should have ... >= y > ... because | when the input is 127 the table has | output 0. 
| So when the input is near the "left" end of the input range as expressed in the table, the output is all the way at the left end of the output range and needs the "equal."
|
| It's actually correct as-is, because the output value is never exactly a power of 2. When the input is exactly a power of 2, the result is always slightly larger than the true reciprocal. (It's the reciprocal of some number near the midpoint of the interval ( 2^n, nextafter(2^n) )).
|
| When the input mantissa (including hidden bit) is 0xFF0000, the output mantissa is 0x800000, if I'm reading the table correctly - 127 in leads to zero out.
|
| The second row in the table has input:
|
|     -2^(B+1) < x ≤ -2^B (normal)
|
| Table input 127 is near the left end of the range while table input 0 is absolutely at the right end.
|
| The left end is not representable but is just farther from zero than 0xFF7F_FFFF single-precision. The right end is 0xFF00_0000 single-precision. These turn into 127 and 0 as table inputs and into 0 and 127 as table outputs. Then they're 0x8020_0000 and 0x803F_C000 as single-precision. So the left end is equal to -2^-(B+1).
|
| and output is listed as:
|
|     -2^-(B+1) > y > -2^-B (subnormal, sig [MSB:MSB-1]=01)
|
| but should allow the equal on the left, shouldn't it?
|
| My mistake. I was thinking of the fact that power-of-2 inputs never produce power-of-2 outputs. You're of course right that just-smaller-than-power-of-2 inputs do produce power-of-2 outputs. Thanks for the correction.
|
| I also reordered the spec so that vfrsqrte7 shows up before vfrece7, since the former is so much simpler to explain. More sanity-checking appreciated.
|
| https://github.com/riscv/riscv-v-spec/blob/vfrecip/v-spec.adoc#149-vector-floating-point-reciprocal-square-root-estimate-instruction
|
| Bill
|
| □ The expressions of subnormal are still awkward. What about (subnormal 01...) or (subnormal 1...) and explain later what that means.
It would be | easier to read (and the table would be | a bit smaller). | Thanks, I was hoping someone would suggest | a better way of expressing that. | Bill | | Bill | Bill | Bill | As to the large positive denorm input | case: the only case where this scheme | and IEEE (1.0 / x) differ in the | finity of the result, or differ in | whether OF is raised, is for the exact | input 2^-(B+1), depending on the | rounding mode. We always produce a | finite result for this case, but | there's an arguable reason for it: | we're actually computing the | reciprocal of some number near the | midpoint of the interval ( 2^-(B+1), | nextafter(2^-(B+1)) ), the result of | which is finite, regardless of the | rounding mode. | So, I'm wondering what your paradigm | is for the edge cases. I can see it | might not be worth being too | complicated since the answer isn't | very exact. The paradigm is further | complicated by the idea that the | answer may be refined by further | steps. :-) | Yeah... the intent was to have | reasonable fidelity. I think you can | argue the 2^-(B+1) case either way, | but other ISAs have resolved it the | same way I did. And it's clearly a | feature that corner-case detection | doesn't depend on the significand | (except for its zeroness, that is). | Bill | On 8/10/20 8:54 PM, Andrew Waterman | wrote: | EXTERNAL MAIL | I've PRed a full definition of these | instructions. Please sanity-check my | work: | https://github.com/riscv/riscv-v-spec/blob/78191da47644053d0605b21628e1f5e7961ad5bf/v-spec.adoc#149-vector-floating-point-reciprocal-estimate-instruction | On Mon, Aug 3, 2020 at 5:44 PM Bill | Huffman <huffman@...> wrote: | On 8/3/20 1:41 PM, Andrew Waterman | wrote: | EXTERNAL MAIL | On Mon, Aug 3, 2020 at 12:40 PM Bill | Huffman <huffman@...> wrote: | The recip table matches mine as does | the worst case error. | I have one different entry in the | square root table. For entry 77, | where you have 36, I have 37. I'm not | sure whether it matters. 
Also, ages | ago, I got a very small difference in | worst case error of 2^-7.317 but I | haven't gone back to trace anything | down about that. | Thanks for validating against your | table, Bill. | With my value for that entry, the | worst error on the interval of | interest is 2^-7.32041, for input | 0x3f1a0000. With yours, it's 2^ | -7.3164 for 0x3f1bfffd. | I agree with your computation with a | really tiny difference (I get that it | just barely rounds to 2^-7.32040). I | can't say why I got 37 when I did it | 8-10 years ago - and I don't think I'm | going to chase that. I'm good with 36 | at that position in the table. | So, I'm good with the table values | below. | Bill | Presumably the error's slightly | smaller for my scheme because I'm | picking the output value that | minimizes the maximum error on the | interval, rather than picking the | midpoint or similar. Of course, the | overall worst error is unaffected. | Bill | On 8/3/20 11:38 AM, DSHORNER wrote: | EXTERNAL MAIL | Now annotated version --detail | https://github.com/David-Horner/recip/blob/master/vrecip.cc | For the 7x7 below notice the biased | value does not exceed 21 for recip (5 | of 7 bits) and 15 for rsqrt (4 of 7 | bits). 
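The entry-77 figures exchanged above (36 versus 37 in the rsqrt table) can be re-derived with a few lines of Python. This is only a sketch: it assumes the error metric is |1 - y*sqrt(x)| with y = 1 + out/128, which is inferred from the lerr/rerr columns of the RSqrt7x7LUT listing in this message rather than taken from vrecip.cc, and the function names are made up for illustration.

```python
import math

def rsqrt_rel_error(x, out):
    # Assumed metric: y approximates 1/sqrt(x), so y*sqrt(x) should be 1.
    y = 1.0 + out / 128.0          # hidden bit plus 7 table bits
    return abs(1.0 - y * math.sqrt(x))

def worst_error(lo, hi, out):
    # y*sqrt(x) is monotonic in x, so the largest error is at an endpoint.
    return max(rsqrt_rel_error(lo, out), rsqrt_rel_error(hi, out))

# Entry 77 of the rsqrt listing covers [0.6015625, 0.609375);
# 0x3f1a0000 is 0.6015625 in single precision. The right endpoint is
# used as the supremum even though it belongs to the next entry.
LO, HI = 0.6015625, 0.609375

for out in (36, 37):
    err = worst_error(LO, HI, out)
    print(f"out={out}: worst error {err:.6g} = 2^{math.log2(err):.4f}")
```

Under these assumptions 36 comes out near 2^-7.3204 and 37 near 2^-7.3164, in line with the numbers quoted in the thread; the worst case for 36 is at the left endpoint and for 37 at the right.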
| ip 7 op 7 LUT #bits 896 verilog 0 | test/test-long 1 | Recip7x7LUT (input [6:0] in, output | reg [6:0] out); | in[6:0] corresponds to sig[S-1:S-6] | out[6:0] corresponds to sig[S-1:S-6] | biased : ((ipN-1) - in) << (op - ip) | // or >> if neg | base bias 127 left-shift 0 | right-shift 0 | 0: out = 127 biased 0; lerr | 0.00390625 rerr 0.00387573 larg 0.5 | rarg 0.503906 | 1: out = 125 biased 1; lerr 0.0039978 | rerr 0.00372314 larg 0.503906 rarg | 0.507812 | 2: out = 123 biased 2; lerr | 0.00421143 rerr 0.00344849 larg | 0.507812 rarg 0.511719 | 3: out = 121 biased 3; lerr | 0.00454712 rerr 0.00305176 larg | 0.511719 rarg 0.515625 | 4: out = 119 biased 4; lerr | 0.00500488 rerr 0.00253296 larg | 0.515625 rarg 0.519531 | 5: out = 117 biased 5; lerr | 0.00558472 rerr 0.00189209 larg | 0.519531 rarg 0.523438 | 6: out = 116 biased 5; lerr | 0.00219727 rerr 0.00524902 larg | 0.523438 rarg 0.527344 | 7: out = 114 biased 6; lerr | 0.00299072 rerr 0.00439453 larg | 0.527344 rarg 0.53125 | 8: out = 112 biased 7; lerr | 0.00390625 rerr 0.00341797 larg | 0.53125 rarg 0.535156 | 9: out = 110 biased 8; lerr | 0.00494385 rerr 0.00231934 larg | 0.535156 rarg 0.539062 | 10: out = 109 biased 8; lerr | 0.00189209 rerr 0.00534058 larg | 0.539062 rarg 0.542969 | 11: out = 107 biased 9; lerr | 0.00314331 rerr 0.00402832 larg | 0.542969 rarg 0.546875 | 12: out = 105 biased 10; lerr | 0.0045166 rerr 0.00259399 larg | 0.546875 rarg 0.550781 | 13: out = 104 biased 10; lerr | 0.00170898 rerr 0.00537109 larg | 0.550781 rarg 0.554688 | 14: out = 102 biased 11; lerr | 0.0032959 rerr 0.00372314 larg | 0.554688 rarg 0.558594 | 15: out = 100 biased 12; lerr | 0.00500488 rerr 0.00195312 larg | 0.558594 rarg 0.5625 | 16: out = 99 biased 12; lerr | 0.00244141 rerr 0.00448608 larg 0.5625 | rarg 0.566406 | 17: out = 97 biased 13; lerr | 0.00436401 rerr 0.00250244 larg | 0.566406 rarg 0.570312 | 18: out = 96 biased 13; lerr | 0.00195312 rerr 0.00488281 larg | 0.570312 rarg 0.574219 | 19: out = 94 
biased 14; lerr | 0.00408936 rerr 0.00268555 larg | 0.574219 rarg 0.578125 | 20: out = 93 biased 14; lerr | 0.00183105 rerr 0.00491333 larg | 0.578125 rarg 0.582031 | 21: out = 91 biased 15; lerr | 0.00418091 rerr 0.00250244 larg | 0.582031 rarg 0.585938 | 22: out = 90 biased 15; lerr | 0.0020752 rerr 0.00457764 larg | 0.585938 rarg 0.589844 | 23: out = 88 biased 16; lerr | 0.00463867 rerr 0.00195312 larg | 0.589844 rarg 0.59375 | 24: out = 87 biased 16; lerr | 0.00268555 rerr 0.00387573 larg | 0.59375 rarg 0.597656 | 25: out = 85 biased 17; lerr | 0.00546265 rerr 0.0010376 larg | 0.597656 rarg 0.601562 | 26: out = 84 biased 17; lerr | 0.00366211 rerr 0.00280762 larg | 0.601562 rarg 0.605469 | 27: out = 83 biased 17; lerr | 0.00192261 rerr 0.0045166 larg | 0.605469 rarg 0.609375 | 28: out = 81 biased 18; lerr | 0.00500488 rerr 0.00137329 larg | 0.609375 rarg 0.613281 | 29: out = 80 biased 18; lerr | 0.00341797 rerr 0.00292969 larg | 0.613281 rarg 0.617188 | 30: out = 79 biased 18; lerr | 0.00189209 rerr 0.00442505 larg | 0.617188 rarg 0.621094 | 31: out = 77 biased 19; lerr | 0.00527954 rerr 0.000976562 larg | 0.621094 rarg 0.625 | 32: out = 76 biased 19; lerr | 0.00390625 rerr 0.00231934 larg 0.625 | rarg 0.628906 | 33: out = 75 biased 19; lerr | 0.00259399 rerr 0.00360107 larg | 0.628906 rarg 0.632812 | 34: out = 74 biased 19; lerr | 0.00134277 rerr 0.00482178 larg | 0.632812 rarg 0.636719 | 35: out = 72 biased 20; lerr | 0.00512695 rerr 0.000976562 larg | 0.636719 rarg 0.640625 | 36: out = 71 biased 20; lerr | 0.00402832 rerr 0.00204468 larg | 0.640625 rarg 0.644531 | 37: out = 70 biased 20; lerr | 0.00299072 rerr 0.00305176 larg | 0.644531 rarg 0.648438 | 38: out = 69 biased 20; lerr | 0.00201416 rerr 0.0039978 larg | 0.648438 rarg 0.652344 | 39: out = 68 biased 20; lerr | 0.00109863 rerr 0.00488281 larg | 0.652344 rarg 0.65625 | 40: out = 66 biased 21; lerr | 0.00537109 rerr 0.000549316 larg | 0.65625 rarg 0.660156 | 41: out = 65 biased 21; lerr | 0.00460815 
rerr 0.00128174 larg | 0.660156 rarg 0.664062 | 42: out = 64 biased 21; lerr | 0.00390625 rerr 0.00195312 larg | 0.664062 rarg 0.667969 | 43: out = 63 biased 21; lerr | 0.00326538 rerr 0.00256348 larg | 0.667969 rarg 0.671875 | 44: out = 62 biased 21; lerr | 0.00268555 rerr 0.00311279 larg | 0.671875 rarg 0.675781 | 45: out = 61 biased 21; lerr | 0.00216675 rerr 0.00360107 larg | 0.675781 rarg 0.679688 | 46: out = 60 biased 21; lerr | 0.00170898 rerr 0.00402832 larg | 0.679688 rarg 0.683594 | 47: out = 59 biased 21; lerr | 0.00131226 rerr 0.00439453 larg | 0.683594 rarg 0.6875 | 48: out = 58 biased 21; lerr | 0.000976562 rerr 0.00469971 larg | 0.6875 rarg 0.691406 | 49: out = 57 biased 21; lerr | 0.000701904 rerr 0.00494385 larg | 0.691406 rarg 0.695312 | 50: out = 56 biased 21; lerr | 0.000488281 rerr 0.00512695 larg | 0.695312 rarg 0.699219 | 51: out = 55 biased 21; lerr | 0.000335693 rerr 0.00524902 larg | 0.699219 rarg 0.703125 | 52: out = 54 biased 21; lerr | 0.000244141 rerr 0.00531006 larg | 0.703125 rarg 0.707031 | 53: out = 53 biased 21; lerr | 0.000213623 rerr 0.00531006 larg | 0.707031 rarg 0.710938 | 54: out = 52 biased 21; lerr | 0.000244141 rerr 0.00524902 larg | 0.710938 rarg 0.714844 | 55: out = 51 biased 21; lerr | 0.000335693 rerr 0.00512695 larg | 0.714844 rarg 0.71875 | 56: out = 50 biased 21; lerr | 0.000488281 rerr 0.00494385 larg | 0.71875 rarg 0.722656 | 57: out = 49 biased 21; lerr | 0.000701904 rerr 0.00469971 larg | 0.722656 rarg 0.726562 | 58: out = 48 biased 21; lerr | 0.000976562 rerr 0.00439453 larg | 0.726562 rarg 0.730469 | 59: out = 47 biased 21; lerr | 0.00131226 rerr 0.00402832 larg | 0.730469 rarg 0.734375 | 60: out = 46 biased 21; lerr | 0.00170898 rerr 0.00360107 larg | 0.734375 rarg 0.738281 | 61: out = 45 biased 21; lerr | 0.00216675 rerr 0.00311279 larg | 0.738281 rarg 0.742188 | 62: out = 44 biased 21; lerr | 0.00268555 rerr 0.00256348 larg | 0.742188 rarg 0.746094 | 63: out = 43 biased 21; lerr | 0.00326538 rerr 
0.00195312 larg | 0.746094 rarg 0.75 | 64: out = 42 biased 21; lerr | 0.00390625 rerr 0.00128174 larg 0.75 | rarg 0.753906 | 65: out = 41 biased 21; lerr | 0.00460815 rerr 0.000549316 larg | 0.753906 rarg 0.757812 | 66: out = 40 biased 21; lerr | 0.00537109 rerr 0.000244141 larg | 0.757812 rarg 0.761719 | 67: out = 40 biased 20; lerr | 0.000244141 rerr 0.00488281 larg | 0.761719 rarg 0.765625 | 68: out = 39 biased 20; lerr | 0.00109863 rerr 0.0039978 larg | 0.765625 rarg 0.769531 | 69: out = 38 biased 20; lerr | 0.00201416 rerr 0.00305176 larg | 0.769531 rarg 0.773438 | 70: out = 37 biased 20; lerr | 0.00299072 rerr 0.00204468 larg | 0.773438 rarg 0.777344 | 71: out = 36 biased 20; lerr | 0.00402832 rerr 0.000976562 larg | 0.777344 rarg 0.78125 | 72: out = 35 biased 20; lerr | 0.00512695 rerr 0.000152588 larg | 0.78125 rarg 0.785156 | 73: out = 35 biased 19; lerr | 0.000152588 rerr 0.00482178 larg | 0.785156 rarg 0.789062 | 74: out = 34 biased 19; lerr | 0.00134277 rerr 0.00360107 larg | 0.789062 rarg 0.792969 | 75: out = 33 biased 19; lerr | 0.00259399 rerr 0.00231934 larg | 0.792969 rarg 0.796875 | 76: out = 32 biased 19; lerr | 0.00390625 rerr 0.000976562 larg | 0.796875 rarg 0.800781 | 77: out = 31 biased 19; lerr | 0.00527954 rerr 0.000427246 larg | 0.800781 rarg 0.804688 | 78: out = 31 biased 18; lerr | 0.000427246 rerr 0.00442505 larg | 0.804688 rarg 0.808594 | 79: out = 30 biased 18; lerr | 0.00189209 rerr 0.00292969 larg | 0.808594 rarg 0.8125 | 80: out = 29 biased 18; lerr | 0.00341797 rerr 0.00137329 larg 0.8125 | rarg 0.816406 | 81: out = 28 biased 18; lerr | 0.00500488 rerr 0.000244141 larg | 0.816406 rarg 0.820312 | 82: out = 28 biased 17; lerr | 0.000244141 rerr 0.0045166 larg | 0.820312 rarg 0.824219 | 83: out = 27 biased 17; lerr | 0.00192261 rerr 0.00280762 larg | 0.824219 rarg 0.828125 | 84: out = 26 biased 17; lerr | 0.00366211 rerr 0.0010376 larg | 0.828125 rarg 0.832031 | 85: out = 25 biased 17; lerr | 0.00546265 rerr 0.000793457 larg | 
0.832031 rarg 0.835938 | 86: out = 25 biased 16; lerr | 0.000793457 rerr 0.00387573 larg | 0.835938 rarg 0.839844 | 87: out = 24 biased 16; lerr | 0.00268555 rerr 0.00195312 larg | 0.839844 rarg 0.84375 | 88: out = 23 biased 16; lerr | 0.00463867 rerr 3.05176E-05 larg | 0.84375 rarg 0.847656 | 89: out = 23 biased 15; lerr | 3.05176E-05 rerr 0.00457764 larg | 0.847656 rarg 0.851562 | 90: out = 22 biased 15; lerr | 0.0020752 rerr 0.00250244 larg | 0.851562 rarg 0.855469 | 91: out = 21 biased 15; lerr | 0.00418091 rerr 0.000366211 larg | 0.855469 rarg 0.859375 | 92: out = 21 biased 14; lerr | 0.000366211 rerr 0.00491333 larg | 0.859375 rarg 0.863281 | 93: out = 20 biased 14; lerr | 0.00183105 rerr 0.00268555 larg | 0.863281 rarg 0.867188 | 94: out = 19 biased 14; lerr | 0.00408936 rerr 0.000396729 larg | 0.867188 rarg 0.871094 | 95: out = 19 biased 13; lerr | 0.000396729 rerr 0.00488281 larg | 0.871094 rarg 0.875 | 96: out = 18 biased 13; lerr | 0.00195312 rerr 0.00250244 larg 0.875 | rarg 0.878906 | 97: out = 17 biased 13; lerr | 0.00436401 rerr 6.10352E-05 larg | 0.878906 rarg 0.882812 | 98: out = 17 biased 12; lerr | 6.10352E-05 rerr 0.00448608 larg | 0.882812 rarg 0.886719 | 99: out = 16 biased 12; lerr | 0.00244141 rerr 0.00195312 larg | 0.886719 rarg 0.890625 | 100: out = 15 biased 12; lerr | 0.00500488 rerr 0.000640869 larg | 0.890625 rarg 0.894531 | 101: out = 15 biased 11; lerr | 0.000640869 rerr 0.00372314 larg | 0.894531 rarg 0.898438 | 102: out = 14 biased 11; lerr | 0.0032959 rerr 0.0010376 larg 0.898438 | rarg 0.902344 | 103: out = 14 biased 10; lerr | 0.0010376 rerr 0.00537109 larg | 0.902344 rarg 0.90625 | 104: out = 13 biased 10; lerr | 0.00170898 rerr 0.00259399 larg | 0.90625 rarg 0.910156 | 105: out = 12 biased 10; lerr | 0.0045166 rerr 0.000244141 larg | 0.910156 rarg 0.914062 | 106: out = 12 biased 9; lerr | 0.000244141 rerr 0.00402832 larg | 0.914062 rarg 0.917969 | 107: out = 11 biased 9; lerr | 0.00314331 rerr 0.00109863 larg | 0.917969 rarg 
0.921875 | 108: out = 11 biased 8; lerr | 0.00109863 rerr 0.00534058 larg | 0.921875 rarg 0.925781 | 109: out = 10 biased 8; lerr | 0.00189209 rerr 0.00231934 larg | 0.925781 rarg 0.929688 | 110: out = 9 biased 8; lerr | 0.00494385 rerr 0.000762939 larg | 0.929688 rarg 0.933594 | 111: out = 9 biased 7; lerr | 0.000762939 rerr 0.00341797 larg | 0.933594 rarg 0.9375 | 112: out = 8 biased 7; lerr | 0.00390625 rerr 0.000244141 larg | 0.9375 rarg 0.941406 | 113: out = 8 biased 6; lerr | 0.000244141 rerr 0.00439453 larg | 0.941406 rarg 0.945312 | 114: out = 7 biased 6; lerr | 0.00299072 rerr 0.00112915 larg | 0.945312 rarg 0.949219 | 115: out = 7 biased 5; lerr | 0.00112915 rerr 0.00524902 larg | 0.949219 rarg 0.953125 | 116: out = 6 biased 5; lerr | 0.00219727 rerr 0.00189209 larg | 0.953125 rarg 0.957031 | 117: out = 5 biased 5; lerr | 0.00558472 rerr 0.00152588 larg | 0.957031 rarg 0.960938 | 118: out = 5 biased 4; lerr | 0.00152588 rerr 0.00253296 larg | 0.960938 rarg 0.964844 | 119: out = 4 biased 4; lerr | 0.00500488 rerr 0.000976562 larg | 0.964844 rarg 0.96875 | 120: out = 4 biased 3; lerr | 0.000976562 rerr 0.00305176 larg | 0.96875 rarg 0.972656 | 121: out = 3 biased 3; lerr | 0.00454712 rerr 0.000549316 larg | 0.972656 rarg 0.976562 | 122: out = 3 biased 2; lerr | 0.000549316 rerr 0.00344849 larg | 0.976562 rarg 0.980469 | 123: out = 2 biased 2; lerr | 0.00421143 rerr 0.000244141 larg | 0.980469 rarg 0.984375 | 124: out = 2 biased 1; lerr | 0.000244141 rerr 0.00372314 larg | 0.984375 rarg 0.988281 | 125: out = 1 biased 1; lerr 0.0039978 | rerr 6.10352E-05 larg 0.988281 rarg | 0.992188 | 126: out = 1 biased 0; lerr | 6.10352E-05 rerr 0.00387573 larg | 0.992188 rarg 0.996094 | 127: out = 0 biased 0; lerr | 0.00390625 rerr 0 larg 0.996094 rarg 1 | ... 
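A few rows of the recip listing above can be spot-checked with a short Python sketch. It assumes the selection rule Andrew describes earlier in the thread (pick the output that minimizes the maximum error over the input interval), measures error as |1 - x*y| as the lerr/rerr columns appear to, and reads the "biased" column as (127 - in) - out. None of this is taken from vrecip.cc or the spec's reference code, and the function names are hypothetical.

```python
def recip_entry(index, in_bits=7, out_bits=7):
    """Pick the 7-bit output for one recip table index by minimax search."""
    n_in, n_out = 1 << in_bits, 1 << out_bits
    # Input interval inside [0.5, 1), matching the larg/rarg columns.
    lo = 0.5 + index / (2 * n_in)
    hi = 0.5 + (index + 1) / (2 * n_in)

    def worst(out):
        y = 1.0 + out / n_out      # candidate result in [1, 2)
        return max(abs(1.0 - lo * y), abs(1.0 - hi * y))

    return min(range(n_out), key=worst)

def biased(index, out, in_bits=7):
    # Assumed reading of the "biased" column: how far the output sits
    # below the straight line out = 127 - in.
    return ((1 << in_bits) - 1 - index) - out

for i in (0, 5, 66, 67, 127):
    out = recip_entry(i)
    print(i, out, biased(i, out))
```

Only the five indices shown were checked by hand against the listing; whether the same rule reproduces every one of the 128 rows is not claimed here.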
[removed hex data dumping] | RSqrt7x7LUT (input [6:0] in, output | reg [6:0] out); | // in[6] corresponds to exp[0] | // in[5:0] corresponds to sig | [S-1:S-5] | // out[6:0] corresponds to sig | [S-1:S-6] | // biased : ((ipN-1) - in) << (op - | ip) | 0: out 127 biased 0; lerr 0.00390625 | rerr 0.00384557 larg 0.25 rarg | 0.253906 | 1: out 125 biased 1; lerr 0.00402773 | rerr 0.00360435 larg 0.253906 rarg | 0.257812 | 2: out 123 biased 2; lerr 0.00432928 | rerr 0.00318533 larg 0.257812 rarg | 0.261719 | 3: out 121 biased 3; lerr 0.00480818 | rerr 0.00259111 larg 0.261719 rarg | 0.265625 | 4: out 119 biased 4; lerr 0.00546183 | rerr 0.00182426 larg 0.265625 rarg | 0.269531 | 5: out 118 biased 4; lerr 0.0022317 | rerr 0.00497249 larg 0.269531 rarg | 0.273438 | 6: out 116 biased 5; lerr 0.00319802 | rerr 0.00389675 larg 0.273438 rarg | 0.277344 | 7: out 114 biased 6; lerr 0.00433191 | rerr 0.00265532 larg 0.277344 rarg | 0.28125 | 8: out 113 biased 6; lerr 0.00148789 | rerr 0.00542232 larg 0.28125 rarg | 0.285156 | 9: out 111 biased 7; lerr 0.00292144 | rerr 0.00388464 larg 0.285156 rarg | 0.289062 | 10: out 109 biased 8; lerr 0.00451607 | rerr 0.0021876 larg 0.289062 rarg | 0.292969 | 11: out 108 biased 8; lerr 0.00204104 | rerr 0.00458999 larg 0.292969 rarg | 0.296875 | 12: out 106 biased 9; lerr 0.00392348 | rerr 0.00260824 larg 0.296875 rarg | 0.300781 | 13: out 105 biased 9; lerr 0.00167641 | rerr 0.00478529 larg 0.300781 rarg | 0.304688 | 14: out 103 biased 10; lerr | 0.00383947 rerr 0.00252584 larg | 0.304688 rarg 0.308594 | 15: out 102 biased 10; lerr 0.0018141 | rerr 0.00448366 larg 0.308594 rarg | 0.3125 | 16: out 100 biased 11; lerr | 0.00425098 rerr 0.00195312 larg 0.3125 | rarg 0.316406 | 17: out 99 biased 11; lerr 0.00244141 | rerr 0.00369747 larg 0.316406 rarg | 0.320312 | 18: out 97 biased 12; lerr 0.00514568 | rerr 0.000902127 larg 0.320312 rarg | 0.324219 | 19: out 96 biased 12; lerr 0.00354633 | rerr 0.00243843 larg 0.324219 rarg | 0.328125 | 20: out 
95 biased 12; lerr 0.00203674 rerr 0.00388594 larg 0.328125 rarg 0.332031
| 21: out 93 biased 13; lerr 0.00511752 rerr 0.000717621 larg 0.332031 rarg 0.335938
| 22: out 92 biased 13; lerr 0.00381051 rerr 0.00196455 larg 0.335938 rarg 0.339844
| 23: out 91 biased 13; lerr 0.00258984 rerr 0.00312603 larg 0.339844 rarg 0.34375
| 24: out 90 biased 13; lerr 0.00145446 rerr 0.00420307 larg 0.34375 rarg 0.347656
| 25: out 88 biased 14; lerr 0.0050098 rerr 0.000564416 larg 0.347656 rarg 0.351562
| 26: out 87 biased 14; lerr 0.00406783 rerr 0.00144985 larg 0.351562 rarg 0.355469
| 27: out 86 biased 14; lerr 0.00320806 rerr 0.00225385 larg 0.355469 rarg 0.359375
| 28: out 85 biased 14; lerr 0.00242958 rerr 0.00297735 larg 0.359375 rarg 0.363281
| 29: out 84 biased 14; lerr 0.00173146 rerr 0.00362122 larg 0.363281 rarg 0.367188
| 30: out 83 biased 14; lerr 0.00111284 rerr 0.00418633 larg 0.367188 rarg 0.371094
| 31: out 82 biased 14; lerr 0.000572846 rerr 0.00467353 larg 0.371094 rarg 0.375
| 32: out 80 biased 15; lerr 0.00489479 rerr 0.00027462 larg 0.375 rarg 0.378906
| 33: out 79 biased 15; lerr 0.00453439 rerr 0.000583717 larg 0.378906 rarg 0.382812
| 34: out 78 biased 15; lerr 0.00425002 rerr 0.000817442 larg 0.382812 rarg 0.386719
| 35: out 77 biased 15; lerr 0.0040409 rerr 0.000976562 larg 0.386719 rarg 0.390625
| 36: out 76 biased 15; lerr 0.00390625 rerr 0.00106183 larg 0.390625 rarg 0.394531
| 37: out 75 biased 15; lerr 0.00384534 rerr 0.00107398 larg 0.394531 rarg 0.398438
| 38: out 74 biased 15; lerr 0.00385742 rerr 0.00101372 larg 0.398438 rarg 0.402344
| 39: out 73 biased 15; lerr 0.00394179 rerr 0.00088176 larg 0.402344 rarg 0.40625
| 40: out 72 biased 15; lerr 0.00409775 rerr 0.000678786 larg 0.40625 rarg 0.410156
| 41: out 71 biased 15; lerr 0.00432461 rerr 0.000405468 larg 0.410156 rarg 0.414062
| 42: out 70 biased 15; lerr 0.0046217 rerr 6.24637E-05 larg 0.414062 rarg 0.417969
| 43: out 70 biased 14; lerr 6.24637E-05 rerr 0.00472478 larg 0.417969 rarg 0.421875
| 44: out 69 biased 14; lerr 0.000349583 rerr 0.00426776 larg 0.421875 rarg 0.425781
| 45: out 68 biased 14; lerr 0.000830041 rerr 0.00374284 larg 0.425781 rarg 0.429688
| 46: out 67 biased 14; lerr 0.00137829 rerr 0.00315063 larg 0.429688 rarg 0.433594
| 47: out 66 biased 14; lerr 0.00199374 rerr 0.00249171 larg 0.433594 rarg 0.4375
| 48: out 65 biased 14; lerr 0.00267578 rerr 0.00176667 larg 0.4375 rarg 0.441406
| 49: out 64 biased 14; lerr 0.00342383 rerr 0.000976086 larg 0.441406 rarg 0.445312
| 50: out 63 biased 14; lerr 0.00423733 rerr 0.000120513 larg 0.445312 rarg 0.449219
| 51: out 63 biased 13; lerr 0.000120513 rerr 0.00445945 larg 0.449219 rarg 0.453125
| 52: out 62 biased 13; lerr 0.000799499 rerr 0.00349816 larg 0.453125 rarg 0.457031
| 53: out 61 biased 13; lerr 0.00178341 rerr 0.00247339 larg 0.457031 rarg 0.460938
| 54: out 60 biased 13; lerr 0.0028307 rerr 0.00138568 larg 0.460938 rarg 0.464844
| 55: out 59 biased 13; lerr 0.00394084 rerr 0.00023553 larg 0.464844 rarg 0.46875
| 56: out 59 biased 12; lerr 0.00023553 rerr 0.00439453 larg 0.46875 rarg 0.472656
| 57: out 58 biased 12; lerr 0.000976562 rerr 0.00314314 larg 0.472656 rarg 0.476562
| 58: out 57 biased 12; lerr 0.0022501 rerr 0.00183069 larg 0.476562 rarg 0.480469
| 59: out 56 biased 12; lerr 0.00358461 rerr 0.000457659 larg 0.480469 rarg 0.484375
| 60: out 56 biased 11; lerr 0.000457659 rerr 0.00448366 larg 0.484375 rarg 0.488281
| 61: out 55 biased 11; lerr 0.000975489 rerr 0.00301265 larg 0.488281 rarg 0.492188
| 62: out 54 biased 11; lerr 0.00246829 rerr 0.00148234 larg 0.492188 rarg 0.496094
| 63: out 53 biased 11; lerr 0.00402031 rerr 0.000106817 larg 0.496094 rarg 0.5
| 64: out 52 biased 11; lerr 0.00563109 rerr 0.00210731 larg 0.5 rarg 0.507812
| 65: out 51 biased 11; lerr 0.00345996 rerr 0.00417648 larg 0.507812 rarg 0.515625
| 66: out 50 biased 11; lerr 0.00143345 rerr 0.00610301 larg 0.515625 rarg 0.523438
| 67: out 48 biased 12; lerr 0.00520152 rerr 0.00219486 larg 0.523438 rarg 0.53125
| 68: out 47 biased 12; lerr 0.00349943 rerr 0.00380104 larg 0.53125 rarg 0.539062
| 69: out 46 biased 12; lerr 0.00193497 rerr 0.00527137 larg 0.539062 rarg 0.546875
| 70: out 44 biased 13; lerr 0.00628347 rerr 0.000789331 larg 0.546875 rarg 0.554688
| 71: out 43 biased 13; lerr 0.00502921 rerr 0.00195312 larg 0.554688 rarg 0.5625
| 72: out 42 biased 13; lerr 0.00390625 rerr 0.00298721 larg 0.5625 rarg 0.570312
| 73: out 41 biased 13; lerr 0.00291271 rerr 0.00389343 larg 0.570312 rarg 0.578125
| 74: out 40 biased 13; lerr 0.00204677 rerr 0.00467353 larg 0.578125 rarg 0.585938
| 75: out 39 biased 13; lerr 0.00130667 rerr 0.00532924 larg 0.585938 rarg 0.59375
| 76: out 38 biased 13; lerr 0.000690699 rerr 0.00586222 larg 0.59375 rarg 0.601562
| 77: out 36 biased 14; lerr 0.0062566 rerr 0.000175461 larg 0.601562 rarg 0.609375
| 78: out 35 biased 14; lerr 0.00592317 rerr 0.000428823 larg 0.609375 rarg 0.617188
| 79: out 34 biased 14; lerr 0.00570878 rerr 0.000564416 larg 0.617188 rarg 0.625
| 80: out 33 biased 14; lerr 0.00561191 rerr 0.000583717 larg 0.625 rarg 0.632812
| 81: out 32 biased 14; lerr 0.00563109 rerr 0.000488162 larg 0.632812 rarg 0.640625
| 82: out 31 biased 14; lerr 0.00576489 rerr 0.000279149 larg 0.640625 rarg 0.648438
| 83: out 30 biased 14; lerr 0.00601191 rerr 4.19626E-05 larg 0.648438 rarg 0.65625
| 84: out 30 biased 13; lerr 4.19626E-05 rerr 0.00589256 larg 0.65625 rarg 0.664062
| 85: out 29 biased 13; lerr 0.00047385 rerr 0.00538852 larg 0.664062 rarg 0.671875
| 86: out 28 biased 13; lerr 0.00101522 rerr 0.00477604 larg 0.671875 rarg 0.679688
| 87: out 27 biased 13; lerr 0.00166483 rerr 0.00405633 larg 0.679688 rarg 0.6875
| 88: out 26 biased 13; lerr 0.00242145 rerr 0.0032306 larg 0.6875 rarg 0.695312
| 89: out 25 biased 13; lerr 0.00328389 rerr 0.0023 larg 0.695312 rarg 0.703125
| 90: out 24 biased 13; lerr 0.00425098 rerr 0.00126568 larg 0.703125 rarg 0.710938
| 91: out 23 biased 13; lerr 0.0053216 rerr 0.000128738 larg 0.710938 rarg 0.71875
| 92: out 23 biased 12; lerr 0.000128738 rerr 0.00554953 larg 0.71875 rarg 0.726562
| 93: out 22 biased 12; lerr 0.00110974 rerr 0.00424628 larg 0.726562 rarg 0.734375
| 94: out 21 biased 12; lerr 0.0024487 rerr 0.00284339 larg 0.734375 rarg 0.742188
| 95: out 20 biased 12; lerr 0.0038871 rerr 0.00134187 larg 0.742188 rarg 0.75
| 96: out 19 biased 12; lerr 0.00542395 rerr 0.000257287 larg 0.75 rarg 0.757812
| 97: out 19 biased 11; lerr 0.000257287 rerr 0.00488281 larg 0.757812 rarg 0.765625
| 98: out 18 biased 11; lerr 0.00195312 rerr 0.00312603 larg 0.765625 rarg 0.773438
| 99: out 17 biased 11; lerr 0.0037447 rerr 0.00127425 larg 0.773438 rarg 0.78125
| 100: out 16 biased 11; lerr 0.00563109 rerr 0.000671612 larg 0.78125 rarg 0.789062
| 101: out 16 biased 10; lerr 0.000671612 rerr 0.00426337 larg 0.789062 rarg 0.796875
| 102: out 15 biased 10; lerr 0.00271068 rerr 0.00216607 larg 0.796875 rarg 0.804688
| 103: out 14 biased 10; lerr 0.00484208 rerr 2.28884E-05 larg 0.804688 rarg 0.8125
| 104: out 14 biased 9; lerr 2.28884E-05 rerr 0.00477319 larg 0.8125 rarg 0.820312
| 105: out 13 biased 9; lerr 0.00230268 rerr 0.00243701 larg 0.820312 rarg 0.828125
| 106: out 12 biased 9; lerr 0.00467248 rerr 1.1444E-05 larg 0.828125 rarg 0.835938
| 107: out 12 biased 8; lerr 1.1444E-05 rerr 0.00467353 larg 0.835938 rarg 0.84375
| 108: out 11 biased 8; lerr 0.00250271 rerr 0.00210469 larg 0.84375 rarg 0.851562
| 109: out 10 biased 8; lerr 0.0051047 rerr 0.000551376 larg 0.851562 rarg 0.859375
| 110: out 10 biased 7; lerr 0.000551376 rerr 0.00398129 larg 0.859375 rarg 0.867188
| 111: out 9 biased 7; lerr 0.00329393 rerr 0.00118567 larg 0.867188 rarg 0.875
| 112: out 9 biased 6; lerr 0.00118567 rerr 0.00564531 larg 0.875 rarg 0.882812
| 113: out 8 biased 6; lerr 0.00169516 rerr 0.00271239 larg 0.882812 rarg 0.890625
| 114: out 7 biased 6; lerr 0.0046605 rerr 0.000304507 larg 0.890625 rarg 0.898438
| 115: out 7 biased 5; lerr 0.000304507 rerr 0.00403259 larg 0.898438 rarg 0.90625
| 116: out 6 biased 5; lerr 0.00340469 rerr 0.00088176 larg 0.90625 rarg 0.914062
| 117: out 6 biased 4; lerr 0.00088176 rerr 0.00514993 larg 0.914062 rarg 0.921875
| 118: out 5 biased 4; lerr 0.00235119 rerr 0.00186722 larg 0.921875 rarg 0.929688
| 119: out 4 biased 4; lerr 0.00566562 rerr 0.00149648 larg 0.929688 rarg 0.9375
| 120: out 4 biased 3; lerr 0.00149648 rerr 0.00265532 larg 0.9375 rarg 0.945312
| 121: out 3 biased 3; lerr 0.00494055 rerr 0.0008372 larg 0.945312 rarg 0.953125
| 122: out 3 biased 2; lerr 0.0008372 rerr 0.00324937 larg 0.953125 rarg 0.960938
| 123: out 2 biased 2; lerr 0.00440902 rerr 0.000370094 larg 0.960938 rarg 0.96875
| 124: out 2 biased 1; lerr 0.000370094 rerr 0.00365258 larg 0.96875 rarg 0.976562
| 125: out 1 biased 1; lerr 0.00406783 rerr 9.20338E-05 larg 0.976562 rarg 0.984375
| 126: out 1 biased 0; lerr 9.20338E-05 rerr 0.00386801 larg 0.984375 rarg 0.992188
| 127: out 0 biased 0; lerr 0.00391391 rerr 0 larg 0.992188 rarg 1
| ... [removed hex data dumping]
| max recip 7x7 error at 0.519531: 0.00558472 or 2^-7.4843
| max rsqrt 7x7 error at 0.546875: 0.00628347 or 2^-7.31422
| On 2020-08-03 1:17 p.m., Bill Huffman wrote:
| I should have said that my results are for the 7/7 case. And it sounds
| like we're in agreement then. We probably have the same table.
| Bill
| On 8/2/20 9:50 AM, DSHORNER wrote:
| EXTERNAL MAIL
| This is the link to the revised code that does n by m LUT
| https://github.com/David-Horner/recip/blob/master/vrecip.cc
| On 2020-08-01 4:51 p.m., David Horner via lists.riscv.org wrote:
|
|
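For anyone who wants to spot-check rows of the rsqrt listing above, here is a minimal Python sketch. It is not taken from vrecip.cc. It assumes, purely from the printed numbers, that the 7-bit "out" field is the fraction of an estimate in [1, 2), i.e. estimate = (128 + out) / 128, and that lerr/rerr are the relative errors of that estimate against 1/sqrt(x) at the left and right ends of the input interval. The function names are made up for illustration. Under those assumptions the sketch reproduces the printed values for rows 64 and 70, and the 2^-7.31422 figure quoted for the worst rsqrt case.

```python
import math

def rsqrt_estimate(out: int) -> float:
    # Assumption (inferred from the listing, not from the spec or vrecip.cc):
    # 'out' holds the 7 fraction bits of an estimate in [1, 2).
    return (128 + out) / 128.0

def rel_err(out: int, x: float) -> float:
    # Relative error of the estimate against the exact 1/sqrt(x):
    # |est - 1/sqrt(x)| / (1/sqrt(x)) == |est * sqrt(x) - 1|
    return abs(rsqrt_estimate(out) * math.sqrt(x) - 1.0)

# Row 64: out 52, larg 0.5, rarg 0.507812 (printed rounded; exact value is 65/128).
lerr_64 = rel_err(52, 0.5)
rerr_64 = rel_err(52, 65 / 128)

# Row 70: out 44, larg 0.546875; this is the reported worst-case rsqrt error.
lerr_70 = rel_err(44, 0.546875)

# Express the worst-case error as a number of accurate bits.
bits_70 = -math.log2(lerr_70)

print(f"row 64: lerr {lerr_64:.8f} rerr {rerr_64:.8f}")
print(f"row 70: lerr {lerr_70:.8f} = 2^-{bits_70:.5f}")
```

The same check can be run against any other row by plugging in its "out", "larg" and "rarg" values; note that the interval endpoints in the listing are rounded to six digits, so the exact binary fractions should be used where possible.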