
Re: Fixed Point (Chapter 13): Clarification Request

swallach
 

Perhaps I am not up to date on this topic, but addresses are fixed point (integers), and you need vector support for vector loads using the vector accumulators (indexes). The math, other than overflow, is essentially the same.

I hope this makes sense.



On Aug 10, 2020, at 10:52 AM, CDS <cohen.steed@...> wrote:









Re: Integer Overflow/Saturation Operations

CDS <cohen.steed@...>
 

Andy,

Thank you for your response.

The concern I'm raising is less about "How do I avoid overflow?" and more about "Why are we avoiding the specification of saturating instructions, or an overflow flag?"

I can always avoid overflow by using smaller numbers or a larger data path. Sometimes, though, I won't know that I've exceeded my data path limits. Sure, we design for safe margins, yet we rely on hardware signaling to inform us when we've gone beyond a hard limit. Another way of stating the question: why NOT have saturation or an overflow flag for these instructions? The alternative is silent data corruption.
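
To make the alternatives concrete, here is a minimal scalar sketch in C (add_sat32 is a hypothetical helper; __builtin_add_overflow is the GCC/Clang intrinsic): the wrapping behavior corrupts silently, while the flag or the saturated result makes the limit visible.

#include <stdint.h>
#include <stdbool.h>

static int32_t add_sat32(int32_t a, int32_t b, bool *overflow) {
    int32_t r;
    if (__builtin_add_overflow(a, b, &r)) {   /* the "overflow flag" */
        *overflow = true;
        return a < 0 ? INT32_MIN : INT32_MAX; /* the saturating alternative */
    }
    *overflow = false;
    return r;                                 /* in-range result */
}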


Re: Fixed Point (Chapter 13): Clarification Request

CDS <cohen.steed@...>
 

Perhaps it is important to understand the history of why fixed point is utilized. Historically, fixed point was the alternative to expensive floating-point implementations/operations, or was the easy option on top of an integer-only MCU. In today's chips, however, if floating point is available in a processing system, then floating point will be used over fixed point - sometimes because fixed point introduces many signal-processing challenges that are not present in floating point, and because most, if not all, signal-processing math is designed in, and for, floating point. Conversion to fixed point happens when no other choice is available or to support legacy code bases.

Consider:
1. For the vector extension, floating point is mandatory. This could mean the utilization of fixed-point math will be low; what use there is will likely involve conversion in and out of floating point before doing the actual math.

2. Fixed point may be used more frequently if there is a significant benefit over floating point:
- legacy code is less of an issue on our architecture because it's new;
- power/performance - but we're already paying the area price for floating point!

For the current vector extension definition that would, maybe, be the 1.7 (8-bit) fixed-point format, where there could be a performance benefit depending on the actual implementation of the vector engine for a given RISC-V core.

With that said, we are not opposed to supporting fixed point; we are questioning fixed point being mandatory. Fixed-point support has a significant impact on compilers and tools (no native data type and no clear definition) and a significant impact on the usage/support model (the fixed-point ISA section is missing expected fixed-point options).

Hence the suggestion to make it optional, and to work through some of the use cases for fixed point to ensure that the fixed-point ISA definition is "usable" by the intended target audience/users.

Internally we've worked through a few simple examples, like IIR and FIR filters, and it quickly becomes apparent that when using 1.(SEW-1) [the industry-standard definition] as a fixed-point definition, it quickly becomes "unmanageable" in terms of managing decimal points, overflows, and mixed precision. "Unmanageable" here should be read as: requires a lot of additional checks around all math operations to ensure we don't overflow or mix decimal points.
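
To illustrate the bookkeeping being described, a minimal scalar sketch in C of a single FIR dot product, assuming Q15 (1.15) operands (fir_q15 is a hypothetical helper, not from the spec): every multiply moves the point (Q15 x Q15 = Q30), the accumulator needs guard bits, and the result must be rescaled and saturated.

#include <stdint.h>

static int16_t fir_q15(const int16_t *x, const int16_t *h, int n) {
    int64_t acc = 0;                          /* guard bits postpone overflow */
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];          /* Q30 partial products */
    acc = (acc + (1 << 14)) >> 15;            /* realign the point: Q30 -> Q15 */
    if (acc > INT16_MAX) acc = INT16_MAX;     /* saturate to the 1.15 range */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}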


Re: Fixed Point (Chapter 13): Clarification Request

Krste Asanovic
 

On Fri, 07 Aug 2020 14:48:34 -0700, "CDS" <cohen.steed@...> said:
| Thank you for the response, Andrew.
| Given that these operations are intended to be conveniences, in the first place (hence: vector), the
| addition of a required macro for inclusion could be considered a basic element. Fixed point is almost
| always going to be used in conjunction with other data formats, and the conversion, as you say, could
| be two instructions - or it could be one.

I disagree with "almost always" unless you refer to mixing integer and
fixed-point. There are certainly many use cases with no
floating-point.

| The confusion my team and I are having with fixed point is not so much with the implementation, but the
| use-case. If we're going to have fixed-point in RISC-V, how about we look at how it's used and build
| that? Barring a (possibly necessary) overhaul, making the specification optional *entirely* -
| separating it out from the rest of vector, may be a compelling
| option.

Fixed-point is widely used, and if anything, interest is growing in
low-precision fixed-point. Fixed-point codecs continue to be
widespread also.

I'm having trouble understanding your viewpoint here.

Krste





Vector TG minutes for 2020/8/7 meeting

Krste Asanovic
 

Date: 2020/8/7
Task Group: Vector Extension
Chair: Krste Asanovic
Co-Chair: Roger Espasa
Number of Attendees: ~22
Current issues on github: https://github.com/riscv/riscv-v-spec

Issues discussed:

#549 Whole register move under big-endian memory system

Discussion was around whether we needed to add EEW to the whole-register
vector store as well, to help with big-endian register spills/fills on
wider vector machines with internal data rearrangement, which otherwise
might have to always use bytes and then rearrange on use. It was noted
that the current whole-register move instructions are asymmetric, with only
loads having EEW encoded. Extensive discussion was around how much to
accommodate big-endian in the design. The group was to review how
big-endian worked in general to see if an alternative might reduce this
impact.

#550 "V" versus embedded profiles

The group discussed a proposal on how to provide abbreviated Z instruction
subset names to accommodate common ISA configurations in embedded
platforms. The group expressed a desire to separate the integer max element
width from the floating-point max element width. Also, some
configurations might not want to support all operations at the widest
element width (e.g., no multiply/divide at the largest integer width,
only add/sub/accumulate). The group also had a general discussion about
whether a more general ISA profile format was needed.

# Categorizing issues

The group reviewed the list of outstanding issues and categorized them into
before-1.0 and after-1.0, and also tagged some as toolchain issues. The
group encourages all to review the tags, and also to help close out
issues slated for the 1.0 release.


Vector TG meeting minutes 2020/7/31 meeting

Krste Asanovic
 

Apologies for the delay in sending these out. When doing this week's
minutes, I realized I hadn't sent out the previous week's.

Krste

Date: 2020/7/31
Task Group: Vector Extension
Chair: Krste Asanovic
Co-Chair: Roger Espasa
Number of Attendees: ~12
Current issues on github: https://github.com/riscv/riscv-v-spec


Issues discussed:

# ISA document formatting

For a large part of the meeting, we discussed the pull request to add
better register diagrams. The consensus was that attendees preferred
the improved formatting and would like to see similar formatting
across RISC-V specs. We will merge the changes into the vector spec once
the table-of-contents facility is added to the HTML and PDF build
flows.

# "V" profile, #533 VLEN>=128 for V

We had extensive discussion around requiring the "V" base for application
processors to have VLEN>=128. Some argued for a larger size, 256, but
there was general agreement that 128 would match current expectations
for application processors.


Re: Integer Overflow/Saturation Operations

Andy Glew Si5
 

For extended-precision arithmetic, such as is often performed in cryptography, a 2X widening multiply-accumulate is the best that I have found. (And as far as I know, the same holds for other members of the cryptography working group - it is a topic of much discussion.) Something of the form of

vd.128[i] += vs2.64[i] * vs1.64[i]

i.e. VMACCU.VV, with SEW = 64 bits (assuming I remember correctly that SEW is the pre-widened element width).

Overflow problems are avoided by putting only 56 bits of meaningful data in each of the 64 bits => 112 bits of product => 16 guard bits in the accumulator. Every 2^16 (less a few) iterations you need to propagate carries. If 16 guard bits are not enough, then put only 48 bits of meaningful data => 32 guard bits.
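
A schematic C rendering of that scheme (my sketch, not a full bignum multiply: just the per-element accumulate and the periodic carry sweep over little-endian 56-bit limbs, using the GCC/Clang unsigned __int128 extension):

#include <stdint.h>

typedef unsigned __int128 u128;

#define LIMB_BITS 56
#define LIMB_MASK ((UINT64_C(1) << LIMB_BITS) - 1)

static void mac_then_carry(u128 *acc, const uint64_t *a, const uint64_t *b, int n) {
    for (int i = 0; i < n; i++)
        acc[i] += (u128)a[i] * b[i];   /* the VMACCU.VV step: 112-bit product,
                                          16 guard bits left in each lane */
    u128 carry = 0;                    /* every ~2^16 iterations: sweep carries */
    for (int i = 0; i < n; i++) {
        carry += acc[i];
        acc[i] = carry & LIMB_MASK;    /* back to 56 meaningful bits per limb */
        carry >>= LIMB_BITS;
    }
}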

VMACC.VV, the signed version, is similarly useful for people who are using signed redundant data rather than unsigned.

--

This approach, of course, only works well if operations such as VRGATHER or scatter/gather (indexed) memory accesses are efficient, or at least not horribly slow.

It only wants to take advantage of the widest available doubling multiply. I.e., if you can do 64x64+=128, then it doesn't need 32x32+=64, except for running code that was written for that size but has not been optimized to take advantage of the wider multiply.

People from other companies report success doing such computations with 24 bits used in every 32-bit word, and even 28 in 32 bits - although that requires bit manipulation, not just byte manipulation. In any case, we nearly always want to use the largest possible multiplier.

--

In a packed SIMD manner (aka "divided elements"), the 56r64 approach works well with cross-multiplies:

vd.128[i] += vs2.128[i].hi64 * vs1.128[i].lo64    +    vs2.128[i].lo64 * vs1.128[i].hi64     

Although, again, that is not in the current vector extension.

--

Exact (non-saturating, non-lossy) integer/fixed-point DSP, of course, really wants 4X widening operations, such as

vd.32[i] += vs2.8[i] * vs1.8[i]

As well as mixed width

vd.32[i] += vs2.16[i] * vs1.8[i]

but these are not in the current  vector proposal.

Being familiar with 4X widening operations like the above, I tried to use them for cryptography, but it's just plain more efficient to use 2X widening, if you can arrange to get 56 bits into each 64-bit vector element efficiently enough.

--

These examples show how 2X widening multiply-accumulate can be used even without saturation or overflow flags.

However, if you only provided a saturating 2X widening multiply-accumulate, extended-precision arithmetic could still use the 56r64 (and 112r128) approach above and just back off a few iterations before propagating carry.



From: Cds <cohen.steed@...>
Sent: Friday, August 07, 2020 8:51AM
To: Tech-Vector-Ext <tech-vector-ext@...>
Subject: [RISC-V] [tech-vector-ext] Integer Overflow/Saturation Operations

 

On 8/7/2020 8:51 AM, CDS wrote:

Vector-widening multiply & accumulate instructions:

  • These instructions, signed or unsigned, will quickly overflow in even simple cases.
  • Given absence of flagging (e.g. OVERFLOW), a saturating version of these instructions would prevent users from making unintentional errors.
  • The current specification leaves the user doing clunky processing to check for overflow after every iteration in a loop.

How else could these instructions be used practically? What is the expectation for utility when the operations overflow quickly?

--- Sorry: Typos (Speech-Os?) Writing Errors <= Speech Recognition <= Computeritis


Re: Fixed Point (Chapter 13): Clarification Request

CDS <cohen.steed@...>
 

Thank you for the response, Andrew.

Given that these operations are intended as conveniences in the first place (hence: vector), the addition of a required macro for inclusion could be considered a basic element. Fixed point is almost always going to be used in conjunction with other data formats, and the conversion, as you say, could be two instructions - or it could be one.

The confusion my team and I are having with fixed point is not so much with the implementation, but with the use case. If we're going to have fixed point in RISC-V, how about we look at how it's used and build that? Barring a (possibly necessary) overhaul, making the specification optional *entirely* - separating it out from the rest of vector - may be a compelling option.


Re: Fixed Point (Chapter 13): Clarification Request

Andrew Waterman
 



On Fri, Aug 7, 2020 at 8:49 AM CDS <cohen.steed@...> wrote:
The definition of the numeric range (at the beginning of Section 13) matches the definition of an integer, not of a fixed-point number. For example, the range specified is the range of an integer, not of a number in 1.X or 2.X format. This doesn't seem to be a fixed-point specification in a manner consistent with other commercially available fixed-point operations. As INTEGER-ONLY operations go, these are likely useful instructions. As a fixed-point specification, this section seems to raise a lot of concerns.
Fixed-point math, itself, is somewhat niche. It mostly sees use in legacy audio and mixed-signal applications. If it needs to be a part of RISC-V:
  • Could it be a sub-spec, or an optional consideration? The implementation requirements to support these are non-trivial and seem to target a small use-case demand.
Having implemented these instructions recently, I can say they weren't unduly onerous to provide, and the HW cost increase wasn't that great (the rounding and clipping logic is new; the rest reuses the integer datapath). But it's a nonzero cost, so your point holds.
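
For a sense of what that logic does, a minimal scalar sketch in C, assuming one particular rounding mode (round-to-nearest-up) and a 16-bit destination; this is the general shape of the operation, not the spec's exact pseudocode:

#include <stdint.h>

static int16_t round_and_clip(int64_t v, unsigned shift) {
    if (shift > 0)
        v = (v + (INT64_C(1) << (shift - 1))) >> shift;  /* rounding */
    if (v > INT16_MAX) return INT16_MAX;                 /* clipping high */
    if (v < INT16_MIN) return INT16_MIN;                 /* clipping low */
    return (int16_t)v;
}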

I agree that fixed-point could be broken out into a separate extension so that embedded vector units can exclude it for applications where integer-only or integer-and-float-only would suffice.

  • Specify a 1.X format (or some fixed, deterministic point position). The current specification has no definition of a fixed point number format. The number format is implied, as a side effect in some instructions.
  • There is a need for a fixed<->float conversion instruction (as used in signal processing applications on mixed fixed/float processing systems, or for conversion of data from e.g. ADCs/DACs).

I think this can be done in two instructions without additional loss of precision: convert from int to float, then multiply by a floating-point scalar to move the binary point (or vice versa).
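
In scalar C terms, with Q15 chosen arbitrarily as the example format (the vector version would pair a convert instruction with a vector-scalar multiply):

#include <stdint.h>

/* int -> float convert, then scale by 2^-15 to place the binary point */
static float q15_to_float(int16_t q) { return (float)q * 0x1p-15f; }

/* the reverse direction; rounding and saturation omitted for brevity */
static int16_t float_to_q15(float f) { return (int16_t)(f * 0x1p15f); }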


Re: vrsub.vi, used as negation

Andrew Waterman
 

Would be mostly redundant with vadd.vi, since the immediate operand is signed. (Same reason the scalar ISA doesn’t provide a subi instruction.)

On Fri, Aug 7, 2020 at 8:52 AM CDS <cohen.steed@...> wrote:

Is the point of vrsub.vi to provide negation? From a compiler/user perspective, completing the vsub pattern with vsub.vi (even as a virtual instruction) may be a usability enhancement to consider.

 


vrsub.vi, used as negation

CDS <cohen.steed@...>
 

Is the point of vrsub.vi to provide negation? From a compiler/user perspective, completing the vsub pattern with vsub.vi (even as a virtual instruction) may be a usability enhancement to consider.

 


Integer Overflow/Saturation Operations

CDS <cohen.steed@...>
 

Vector-widening multiply & accumulate instructions:

  • These instructions, signed or unsigned, will quickly overflow in even simple cases.
  • Given absence of flagging (e.g. OVERFLOW), a saturating version of these instructions would prevent users from making unintentional errors.
  • The current specification leaves the user doing clunky processing to check for overflow after every iteration in a loop.

How else could these instructions be used practically? What is the expectation for utility when the operations overflow quickly?


Fixed Point (Chapter 13): Clarification Request

CDS <cohen.steed@...>
 

The definition of the numeric range (at the beginning of Section 13) matches the definition of an integer, not of a fixed-point number. For example, the range specified is the range of an integer, not of a number in 1.X or 2.X format. This doesn't seem to be a fixed-point specification in a manner consistent with other commercially available fixed-point operations. As INTEGER-ONLY operations go, these are likely useful instructions. As a fixed-point specification, this section seems to raise a lot of concerns.
Fixed-point math, itself, is somewhat niche. It mostly sees use in legacy audio and mixed-signal applications. If it needs to be a part of RISC-V:
  • Could it be a sub-spec, or an optional consideration? The implementation requirements to support these are non-trivial and seem to target a small use-case demand.
  • Specify a 1.X format (or some fixed, deterministic point position). The current specification has no definition of a fixed point number format. The number format is implied, as a side effect in some instructions.
  • There is a need for a fixed<->float conversion instruction (as used in signal processing applications on mixed fixed/float processing systems, or for conversion of data from e.g. ADCs/DACs).


Re: [riscv/riscv-v-spec] For V1.0 - Make unsigned scalar integer in widening instructions 2 * SEW (#427) (and signed)

mark
 

great!

again this is meant as informational for when this goes to vote.

this should be discussable now in email with questions and comments.

please include this in the ratification materials (place in a GitHub subfolder labeled change-rationales).

this is a continual improvement process. please send Stephano and me email on how to improve the resulting content or the process at any time.

Thank you!
Mark

On Thu, Aug 6, 2020 at 9:23 PM David Horner <ds2horner@...> wrote:





--
Mark I Himelstein
CTO RISC-V International
+1-408-250-6611
twitter @mark_riscv


Re: [riscv/riscv-v-spec] For V1.0 - Make unsigned scalar integer in widening instructions 2 * SEW (#427) (and signed)

David Horner
 

I filled out the RISC-V Policy: Change and Extension Rationale as best I could for issue #427. I believe it is accessible by all, but I will also paste the contents below.

https://lists.riscv.org/g/tech-vector-ext/files/Change%20Extension%20Rationale%20Submission%20For%20riscv-v-spec%20issue%20%23427.docx

Name: Change and Extension Rationale Submission for riscv-v-spec issue #427


  1. David Horner

  2. In GitHub riscv-v-spec issue #427, originally April 21, 2020;

    Closed: July 24; reconsideration July 30;

    As Change Rationale Aug. 6, 2020

  3. Individual, as a member of the Vector TG.

  4. August 2020, prior to V1.0 submission for ratification

  5. The Kickoff and/or Freeze Milestones.

    It does not need Roadmap visibility.

    It is a refinement to a set of Integer Widening instructions.

  6. List of questions please explain your answers where appropriate (like why did you say yes):

    1. Not a functionality gap? Rather, it is an alternative formulation that can improve application performance by avoiding vtype mode switches.

    2. A horizontal attribute enhancement affecting performance.

      Twice the standard integer scalar range is available for widening integer instructions.

    3. No change to ratified ISA specification, Vector extension in progress.

    4. This request is for a completely new rendering of proposed Vector features.

    5. This can be done with already proposed instructions?

      In general it requires:

      i) executing the current widening with integer identity value

      (1 for multiply, zero otherwise)

      ii) mode switch to twice current Selected Element Width (SEW)

      iii) perform corresponding adjustment on step i) widened vector results

      (multiply by or add/subtract widened integer value, as appropriate)

      iv) mode switch to original SEW.

    6. Users/markets which benefit are restricted to V users

      in which 2*SEW integer values are handled in widening scalar ops.

    7. Not expected to affect base, derived, or custom profiles.

    8. Compliance tests and compiler generation will need to handle an enlarged integer scalar register.

      1. No changes in the number of cycles needed for any handler entry and exit, nor in the number of save/restores required.

      2. Changes required to support this extension are typical of other vector instruction tweaks.

      3. No known resources have time to implement either or both of the above.

    9. I expect the impact on logic/gates to be small. Less invasive than ordinal-based mask encoding. Much less disruptive than removing SLEN visibility. More comparable to the mixed-width vrgatherei16 instruction that is being added.

    10. It would not be optional.

    11. It is no more discoverable than any of the other base vector instructions.

    12. Concerns for the widening multiply were about leveraging the multiply units needed for the next higher SEW for the current SEW. The concern is that the SEW-level multiply unit would have to be enlarged; initial estimates were by a factor of 2.

      Given that the multiplication result widens to 2*SEW, some of the needed circuitry is already present for an expanded integer input. The expanded multiplication result will be truncated to 2*SEW, and so, for these two reasons, doubling of the circuit is not required. As a result, a mitigation that dynamically selected paths based on zero (or sign-extended) upper SEW integer bits is not required. Such a scheme was correctly rejected as inappropriate for most implementations, but it does not materially factor into the discussion, as partitioning the next-higher multiplier circuitry should be adequate for all anticipated implementations.






Proposed WG: RISC V needs CMOs, and hence a CMO Working Group

Andy Glew Si5
 

RISC V needs CMOs, and hence a CMO Working Group





All successful computer instruction sets have Cache Management Operations (CMOs).

Several RISC-V systems have already defined implementation specific CMO instructions. It is desirable to have standard CMO instructions to facilitate portable software.

CMOs do things like flushing dirty data and invalidating clean data for use cases that include non-coherent DMA I/O, security (e.g. Spectre), power management (flush to battery backed-up DRAM), persistence (flush to NVRAM), and more.

CMOs cut across several problem domains. It is desirable to have a consistent approach, rather than different idiosyncratic instructions for different problem domains. RISC-V therefore needs a CMO working group that will coordinate with any working groups in those overlapping domains.



Administrivia

2020/8/5: Email proposing this will soon be sent to the RISC-V Technical Steering Committee and other mailing lists, seeking approval of the formation of such a CMO working group.

Here linked is a wiki version of the WG proposal RISC V needs CMOs, and hence a CMO Working Group. Also a CMOs WG Draft Proposed Charter - although probably too long.

Assuming the CMO WG is approved:

Please indicate if you are interested by replying to this email (to me, Andy Glew). To facilitate scheduling of meetings, please indicate your timezone.

A riscv.org mailing list should be set up soon.

We have already set up https://github.com/riscv/riscv-CMOs, and will arrange permissions for working group members as soon as possible.

Here linked is a CMOs WG Draft Proposed Charter.

Proposals:

  • At least one CMO proposal has been developed in some detail. It is linked to from https://github.com/riscv/riscv-CMOs, and may soon be moved to this official place.
  • We welcome other proposals, and/or examples of implementation-specific CMO extensions already implemented.




I look forward to meeting other folks interested in CMOs!


--- Sorry: Typos (Speech-Os?) Writing Errors <= Speech Recognition <= Computeritis


[riscv/riscv-v-spec] For V1.0 - Make unsigned scalar integer in widening instructions 2 * SEW (#427) (and signed)

David Horner
 



I posted a comment to the closed #427. Not everyone subscribes to GitHub, so I post it below.

I am requesting that this proposal be reconsidered/re-evaluated for V1.0 inclusion in light of the posting:

Some additional comments to the post.

Increased overhead:

An extra SEW bits needs to be distributed to the execution units,
which on a large-VLEN machine could be numerous and physically dispersed across the chip.
More lines to toggle.

Yes, there is extra power, but only once: the scalar values remain resident through all successive iterations on different channels.

There is no additional distribution circuitry: the sew=XLEN case will have to be wired in anyway and
    is thus available for the sew=XLEN/2 case (which has an EEW of XLEN for rs1).

The additional power/complexity/transfer is self-limiting: once sew>=XLEN, no extra SEW bits are transferred.


Potential Usage:

It is not to save hardware (much can be reused), but to increase functionality.
We have instructions

# Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW
vwaddu.wv  vd, vs2,  vs1, vm # vector-vector
vwaddu.wx  vd, vs2,  rs1, vm # vector-scalar

of the form:

VWADDU.WV:    vd(at 2*sew)[i] := vs2(at 2*sew)[i] + zext((to 2*sew) vs1(at sew)[i])
VWADDU.WX:    vd(at 2*sew)[i] := vs2(at 2*sew)[i] + zext((to 2*sew) narrow((to sew bits) rs1))


the WX form would become:

VWADDU.WX:    vd(at 2*sew)[i] := vs2(at 2*sew)[i] + narrow((to 2*sew bits) rs1)

It effectively becomes an add of a 2*sew scalar and replaces the sequence:

vsetvli 0,0, sew2n
VADDU.VX
vsetvli 0,0, sewn

In general, using 2*sew bits of rs1 allows a scalar input range commensurate with the target rather than the source of the vector widening operation.
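
A hypothetical scalar model in C of the difference, for sew=16 on RV64 (so destination elements are 2*sew = 32 bits wide):

#include <stdint.h>

static uint32_t wx_current(uint32_t vs2_elem, uint64_t rs1) {
    return vs2_elem + (uint16_t)rs1;   /* rs1 narrowed to sew bits */
}

static uint32_t wx_proposed(uint32_t vs2_elem, uint64_t rs1) {
    return vs2_elem + (uint32_t)rs1;   /* rs1 narrowed to 2*sew bits */
}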


Github Posting: https://github.com/riscv/riscv-v-spec/issues/427#issuecomment-666487664

Comments on resolution and request to re-frame

We discussed the proposal to widen the scalar input to widening operations to
2 * SEW.

For widening multiplies, this would double the size of the
multiplier arrays required.

When SEW < XLEN, I noted that the double-size multiplier arrays (conceptually)

  1. already exist for the next (SEW) level up non-widening multiplies, and
  2. the widened output would be compatible with that next-up multiply.
    Note as well that only the vector operand needs to be distributed to the appropriate wider multiplier, as the scalar value is “constant” across all multiply operations.

This approach is quite appropriate for many micro-archs: those Uarchs that internally have SLEN=VLEN (= channel width), and of these, especially those that are register-write limited.
In these Uarchs the VLEN register 0 would be written in one cycle (or process set), register 1 in the next, and, if LMUL>1, registers 2, 3, etc. with each subsequent cycle (or process set). The throughput would be 1/2 of the discrete multiplier units per vector operand, but as the register write would be saturated there is no actual loss.

This approach does not work well for SLEN<VLEN (and perhaps multiple active channels), which might distribute both vector sources from register groups to multiplier units, and double-width results to distant register ports, possibly further complicated by renamed register segments.
These Uarchs would rather have dedicated SEWxSEW multiply units (potentially sharing segments of the same multipliers with the next (2 * SEW) level up), extended to provide a double-width result.
The benefit of such a configuration is full hardware throughput, tailored to the “normal” vector register file read-port rate. In that a channel-width (likely SLEN) slice would be generating double-width elements in the same physical register (but potentially to renamed segments of the register), the advantage seen in the simpler SLEN=VLEN design (of consecutively writing full VLEN registers) is not present.

There is further impetus to optimize the SEW=8 case. Both the vector x vector and vector x scalar uses are expected to be common. Further, 8-bit is the extreme situation for number of elements, source operand distribution, and/or widened result distribution. And lastly, the 8x8 multiplier array is relatively small, so the investment in gates pays substantial dividends at the smaller bit sizes.

The group discussed using a microarchitectural check on scalar width to select a narrower
multiplier, but ... [keep in mind that this dynamic selection is for SLEN<VLEN type Uarchs]

With the scalar 2 * SEW introduction, dynamically selecting between the two approaches would require reading the scalar value and determining whether the upper half (SEW bits) of it were zeros (or all ones for signed), in which case the optimized approach could be used. If the high SEW bits were not just the sign, then the fallback to the 2 * SEW multiplier approach would be used. This dynamic re-configuring was rightly trounced: evaluating the high SEW bits would occur much too late in the process, introducing stalls or complex read-ahead X-register circuitry that is not needed anywhere else and would likely impact cycle timing. Dead on arrival.

Group consensus was that this information should be
supplied through a different opcode.

Given that multiplies would
provide the larger benefit, and that adds would then have a
non-uniform format, the decision was made to stay with the PoR.

I believe this narrative correctly reflects the reasoning.

I fully agree with this final conclusion: uniformity argues for handling all potential 2 * SEW integer scalars the same way.

However, I believe I must take the blame for framing the issue as a duality: either leverage the next-level (2 * SEW) multiplier, or optimize with a narrower widening-multiplier circuit. And the latter would require a dynamic “macro” selection between them.

I did not present alternatives that change the narrative and basis for decision.

Firstly, a full double-width multiplier is not necessary (but certainly sufficient) for the integer SEWx(2 * SEW) case. By definition, the high SEW bits of the vector operand are zero and do not participate in the (2 * SEW)x(2 * SEW) circuitry. Further, only SEW bits of the product of the vector with the high SEW bits of the scalar are retained, and thus only those bits need be generated and summed.
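
A C sketch of that observation for SEW=32, XLEN=64 (hypothetical helper name): the low half of the scalar needs a full SEW x SEW -> 2*SEW product, while the high half's partial product is shifted up by SEW, so only its low SEW bits survive the truncation to 2*SEW.

#include <stdint.h>

static uint64_t wmul_sew_by_2sew(uint32_t v, uint64_t scalar) {
    uint32_t s_lo = (uint32_t)scalar;
    uint32_t s_hi = (uint32_t)(scalar >> 32);
    uint64_t p_lo = (uint64_t)v * s_lo;   /* full 64-bit partial product */
    uint64_t p_hi = (uint64_t)v * s_hi;   /* only its low 32 bits matter */
    return p_lo + (p_hi << 32);           /* result truncated to 2*SEW bits */
}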

Especially when ELEN > XLEN, but even at lower SEW, the widening multiply (and even the non-widening one) will likely be implemented as temporal iterations of sums of partial products, in some cases driven by the desire to keep cycle time constrained. This temporal circuitry could be utilized for conditionally summing the high-SEW scalar bits with the SEW vector in the narrow multiply. Thus zero/sign high bits of the scalar are not a selection between LARGE/narrow but rather an optimization of the narrow process.

The optimization of narrow multiplies can be incorporated independently for various sizes of SEW. The cost to add the upper X-register SEW bits is nominal at 8 bits and still small at 16 bits. For RV32 these are the only two integer widenings of concern for a 2 * SEW scalar. For RV64 the only other integer widening 2 * SEW case is 32-bit.
A tradeoff between

  1. half throughput (use next level up full multiplier), or
  2. (as above) conditional temporal, or
  3. parallel partial-product generation and fast-sum hardware
    can be chosen independent of the upper 32 bits of the X register.

Re-framing the proposal in these terms changes the question from a dichotomy to a continuum of design options that can be implemented effectively (and as efficiently as possible) on simple Uarch designs without hobbling performant designs.

The question then becomes one of worth versus complexity at V1.0.

In this context I believe it is worthy, especially, as Krste remarked, for the expanded multiply.





Re: VFRECIP/VFRSQRT instructions

Bill Huffman
 


On 8/3/20 1:41 PM, Andrew Waterman wrote:



On Mon, Aug 3, 2020 at 12:40 PM Bill Huffman <huffman@...> wrote:

The recip table matches mine as does the worst case error.

I have one different entry in the square root table.  For entry 77, where you have 36, I have 37.  I'm not sure whether it matters.  Also, ages ago, I got a very small difference in worst case error of 2^-7.317 but I haven't gone back to trace anything down about that.


Thanks for validating against your table, Bill.

With my value for that entry, the worst error on the interval of interest is 2^-7.32041, for input 0x3f1a0000.  With yours, it's 2^-7.3164 for 0x3f1bfffd.

I agree with your computation with a really tiny difference (I get that it just barely rounds to 2^-7.32040).  I can't say why I got 37 when I did it 8-10 years ago - and I don't think I'm going to chase that.  I'm good with 36 at that position in the table.

So, I'm good with the table values below.

     Bill


Presumably the error's slightly smaller for my scheme because I'm picking the output value that minimizes the maximum error on the interval, rather than picking the midpoint or similar.  Of course, the overall worst error is unaffected.

      Bill
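
For reference, a small C sketch of how such a table entry can be chosen under that rule, assuming the encoding visible in the dump below (index i covers x in [0.5*(1+i/128), 0.5*(1+(i+1)/128)], out encodes the estimate y = 1 + out/128 of 1/x, and the relative error is |y*x - 1|); it picks the out minimizing the worse of the two endpoint errors rather than taking the midpoint:

#include <math.h>
#include <stdio.h>

int main(void) {
    for (int in = 0; in < 128; in++) {
        double lo = 0.5 * (1.0 + in / 128.0);        /* interval endpoints */
        double hi = 0.5 * (1.0 + (in + 1) / 128.0);
        int best = 0;
        double best_err = 1.0;
        for (int out = 0; out < 128; out++) {
            double y = 1.0 + out / 128.0;            /* candidate estimate */
            double lerr = fabs(y * lo - 1.0);        /* error at left edge */
            double rerr = fabs(y * hi - 1.0);        /* error at right edge */
            double err = lerr > rerr ? lerr : rerr;
            if (err < best_err) { best_err = err; best = out; }
        }
        printf("%3d: out = %3d maxerr %.8f\n", in, best, best_err);
    }
    return 0;
}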

On 8/3/20 11:38 AM, DSHORNER wrote:

Now annotated version --detail
https://github.com/David-Horner/recip/blob/master/vrecip.cc

For the 7x7 below, notice that the biased value does not exceed 21 for recip (5 of 7 bits) and 15 for rsqrt (4 of 7 bits).
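
If I read the biased annotation correctly, the table stores out as the deviation from the linear ramp 127 - in, which is why so few bits suffice per entry; a small C decode sketch of that reading (mine, not the author's code):

#include <stdint.h>

static uint8_t lut_out(uint8_t in, uint8_t biased) {
    return (uint8_t)(127 - in - biased);   /* ramp minus stored deviation */
}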

ip 7 op 7 LUT #bits 896 verilog 0  test/test-long 1
Recip7x7LUT (input [6:0] in, output reg [6:0] out);
 in[6:0]  corresponds to sig[S-1:S-6]
 out[6:0] corresponds to sig[S-1:S-6]
 biased : ((ipN-1) - in) << (op - ip) // or >> if neg
 base bias 127  left-shift 0 right-shift 0
 0: out = 127 biased 0; lerr 0.00390625 rerr 0.00387573 larg 0.5 rarg 0.503906
 1: out = 125 biased 1; lerr 0.0039978 rerr 0.00372314 larg 0.503906 rarg 0.507812
 2: out = 123 biased 2; lerr 0.00421143 rerr 0.00344849 larg 0.507812 rarg 0.511719
 3: out = 121 biased 3; lerr 0.00454712 rerr 0.00305176 larg 0.511719 rarg 0.515625
 4: out = 119 biased 4; lerr 0.00500488 rerr 0.00253296 larg 0.515625 rarg 0.519531
 5: out = 117 biased 5; lerr 0.00558472 rerr 0.00189209 larg 0.519531 rarg 0.523438
 6: out = 116 biased 5; lerr 0.00219727 rerr 0.00524902 larg 0.523438 rarg 0.527344
 7: out = 114 biased 6; lerr 0.00299072 rerr 0.00439453 larg 0.527344 rarg 0.53125
 8: out = 112 biased 7; lerr 0.00390625 rerr 0.00341797 larg 0.53125 rarg 0.535156
 9: out = 110 biased 8; lerr 0.00494385 rerr 0.00231934 larg 0.535156 rarg 0.539062
 10: out = 109 biased 8; lerr 0.00189209 rerr 0.00534058 larg 0.539062 rarg 0.542969
 11: out = 107 biased 9; lerr 0.00314331 rerr 0.00402832 larg 0.542969 rarg 0.546875
 12: out = 105 biased 10; lerr 0.0045166 rerr 0.00259399 larg 0.546875 rarg 0.550781
 13: out = 104 biased 10; lerr 0.00170898 rerr 0.00537109 larg 0.550781 rarg 0.554688
 14: out = 102 biased 11; lerr 0.0032959 rerr 0.00372314 larg 0.554688 rarg 0.558594
 15: out = 100 biased 12; lerr 0.00500488 rerr 0.00195312 larg 0.558594 rarg 0.5625
 16: out = 99 biased 12; lerr 0.00244141 rerr 0.00448608 larg 0.5625 rarg 0.566406
 17: out = 97 biased 13; lerr 0.00436401 rerr 0.00250244 larg 0.566406 rarg 0.570312
 18: out = 96 biased 13; lerr 0.00195312 rerr 0.00488281 larg 0.570312 rarg 0.574219
 19: out = 94 biased 14; lerr 0.00408936 rerr 0.00268555 larg 0.574219 rarg 0.578125
 20: out = 93 biased 14; lerr 0.00183105 rerr 0.00491333 larg 0.578125 rarg 0.582031
 21: out = 91 biased 15; lerr 0.00418091 rerr 0.00250244 larg 0.582031 rarg 0.585938
 22: out = 90 biased 15; lerr 0.0020752 rerr 0.00457764 larg 0.585938 rarg 0.589844
 23: out = 88 biased 16; lerr 0.00463867 rerr 0.00195312 larg 0.589844 rarg 0.59375
 24: out = 87 biased 16; lerr 0.00268555 rerr 0.00387573 larg 0.59375 rarg 0.597656
 25: out = 85 biased 17; lerr 0.00546265 rerr 0.0010376 larg 0.597656 rarg 0.601562
 26: out = 84 biased 17; lerr 0.00366211 rerr 0.00280762 larg 0.601562 rarg 0.605469
 27: out = 83 biased 17; lerr 0.00192261 rerr 0.0045166 larg 0.605469 rarg 0.609375
 28: out = 81 biased 18; lerr 0.00500488 rerr 0.00137329 larg 0.609375 rarg 0.613281
 29: out = 80 biased 18; lerr 0.00341797 rerr 0.00292969 larg 0.613281 rarg 0.617188
 30: out = 79 biased 18; lerr 0.00189209 rerr 0.00442505 larg 0.617188 rarg 0.621094
 31: out = 77 biased 19; lerr 0.00527954 rerr 0.000976562 larg 0.621094 rarg 0.625
 32: out = 76 biased 19; lerr 0.00390625 rerr 0.00231934 larg 0.625 rarg 0.628906
 33: out = 75 biased 19; lerr 0.00259399 rerr 0.00360107 larg 0.628906 rarg 0.632812
 34: out = 74 biased 19; lerr 0.00134277 rerr 0.00482178 larg 0.632812 rarg 0.636719
 35: out = 72 biased 20; lerr 0.00512695 rerr 0.000976562 larg 0.636719 rarg 0.640625
 36: out = 71 biased 20; lerr 0.00402832 rerr 0.00204468 larg 0.640625 rarg 0.644531
 37: out = 70 biased 20; lerr 0.00299072 rerr 0.00305176 larg 0.644531 rarg 0.648438
 38: out = 69 biased 20; lerr 0.00201416 rerr 0.0039978 larg 0.648438 rarg 0.652344
 39: out = 68 biased 20; lerr 0.00109863 rerr 0.00488281 larg 0.652344 rarg 0.65625
 40: out = 66 biased 21; lerr 0.00537109 rerr 0.000549316 larg 0.65625 rarg 0.660156
 41: out = 65 biased 21; lerr 0.00460815 rerr 0.00128174 larg 0.660156 rarg 0.664062
 42: out = 64 biased 21; lerr 0.00390625 rerr 0.00195312 larg 0.664062 rarg 0.667969
 43: out = 63 biased 21; lerr 0.00326538 rerr 0.00256348 larg 0.667969 rarg 0.671875
 44: out = 62 biased 21; lerr 0.00268555 rerr 0.00311279 larg 0.671875 rarg 0.675781
 45: out = 61 biased 21; lerr 0.00216675 rerr 0.00360107 larg 0.675781 rarg 0.679688
 46: out = 60 biased 21; lerr 0.00170898 rerr 0.00402832 larg 0.679688 rarg 0.683594
 47: out = 59 biased 21; lerr 0.00131226 rerr 0.00439453 larg 0.683594 rarg 0.6875
 48: out = 58 biased 21; lerr 0.000976562 rerr 0.00469971 larg 0.6875 rarg 0.691406
 49: out = 57 biased 21; lerr 0.000701904 rerr 0.00494385 larg 0.691406 rarg 0.695312
 50: out = 56 biased 21; lerr 0.000488281 rerr 0.00512695 larg 0.695312 rarg 0.699219
 51: out = 55 biased 21; lerr 0.000335693 rerr 0.00524902 larg 0.699219 rarg 0.703125
 52: out = 54 biased 21; lerr 0.000244141 rerr 0.00531006 larg 0.703125 rarg 0.707031
 53: out = 53 biased 21; lerr 0.000213623 rerr 0.00531006 larg 0.707031 rarg 0.710938
 54: out = 52 biased 21; lerr 0.000244141 rerr 0.00524902 larg 0.710938 rarg 0.714844
 55: out = 51 biased 21; lerr 0.000335693 rerr 0.00512695 larg 0.714844 rarg 0.71875
 56: out = 50 biased 21; lerr 0.000488281 rerr 0.00494385 larg 0.71875 rarg 0.722656
 57: out = 49 biased 21; lerr 0.000701904 rerr 0.00469971 larg 0.722656 rarg 0.726562
 58: out = 48 biased 21; lerr 0.000976562 rerr 0.00439453 larg 0.726562 rarg 0.730469
 59: out = 47 biased 21; lerr 0.00131226 rerr 0.00402832 larg 0.730469 rarg 0.734375
 60: out = 46 biased 21; lerr 0.00170898 rerr 0.00360107 larg 0.734375 rarg 0.738281
 61: out = 45 biased 21; lerr 0.00216675 rerr 0.00311279 larg 0.738281 rarg 0.742188
 62: out = 44 biased 21; lerr 0.00268555 rerr 0.00256348 larg 0.742188 rarg 0.746094
 63: out = 43 biased 21; lerr 0.00326538 rerr 0.00195312 larg 0.746094 rarg 0.75
 64: out = 42 biased 21; lerr 0.00390625 rerr 0.00128174 larg 0.75 rarg 0.753906
 65: out = 41 biased 21; lerr 0.00460815 rerr 0.000549316 larg 0.753906 rarg 0.757812
 66: out = 40 biased 21; lerr 0.00537109 rerr 0.000244141 larg 0.757812 rarg 0.761719
 67: out = 40 biased 20; lerr 0.000244141 rerr 0.00488281 larg 0.761719 rarg 0.765625
 68: out = 39 biased 20; lerr 0.00109863 rerr 0.0039978 larg 0.765625 rarg 0.769531
 69: out = 38 biased 20; lerr 0.00201416 rerr 0.00305176 larg 0.769531 rarg 0.773438
 70: out = 37 biased 20; lerr 0.00299072 rerr 0.00204468 larg 0.773438 rarg 0.777344
 71: out = 36 biased 20; lerr 0.00402832 rerr 0.000976562 larg 0.777344 rarg 0.78125
 72: out = 35 biased 20; lerr 0.00512695 rerr 0.000152588 larg 0.78125 rarg 0.785156
 73: out = 35 biased 19; lerr 0.000152588 rerr 0.00482178 larg 0.785156 rarg 0.789062
 74: out = 34 biased 19; lerr 0.00134277 rerr 0.00360107 larg 0.789062 rarg 0.792969
 75: out = 33 biased 19; lerr 0.00259399 rerr 0.00231934 larg 0.792969 rarg 0.796875
 76: out = 32 biased 19; lerr 0.00390625 rerr 0.000976562 larg 0.796875 rarg 0.800781
 77: out = 31 biased 19; lerr 0.00527954 rerr 0.000427246 larg 0.800781 rarg 0.804688
 78: out = 31 biased 18; lerr 0.000427246 rerr 0.00442505 larg 0.804688 rarg 0.808594
 79: out = 30 biased 18; lerr 0.00189209 rerr 0.00292969 larg 0.808594 rarg 0.8125
 80: out = 29 biased 18; lerr 0.00341797 rerr 0.00137329 larg 0.8125 rarg 0.816406
 81: out = 28 biased 18; lerr 0.00500488 rerr 0.000244141 larg 0.816406 rarg 0.820312
 82: out = 28 biased 17; lerr 0.000244141 rerr 0.0045166 larg 0.820312 rarg 0.824219
 83: out = 27 biased 17; lerr 0.00192261 rerr 0.00280762 larg 0.824219 rarg 0.828125
 84: out = 26 biased 17; lerr 0.00366211 rerr 0.0010376 larg 0.828125 rarg 0.832031
 85: out = 25 biased 17; lerr 0.00546265 rerr 0.000793457 larg 0.832031 rarg 0.835938
 86: out = 25 biased 16; lerr 0.000793457 rerr 0.00387573 larg 0.835938 rarg 0.839844
 87: out = 24 biased 16; lerr 0.00268555 rerr 0.00195312 larg 0.839844 rarg 0.84375
 88: out = 23 biased 16; lerr 0.00463867 rerr 3.05176E-05 larg 0.84375 rarg 0.847656
 89: out = 23 biased 15; lerr 3.05176E-05 rerr 0.00457764 larg 0.847656 rarg 0.851562
 90: out = 22 biased 15; lerr 0.0020752 rerr 0.00250244 larg 0.851562 rarg 0.855469
 91: out = 21 biased 15; lerr 0.00418091 rerr 0.000366211 larg 0.855469 rarg 0.859375
 92: out = 21 biased 14; lerr 0.000366211 rerr 0.00491333 larg 0.859375 rarg 0.863281
 93: out = 20 biased 14; lerr 0.00183105 rerr 0.00268555 larg 0.863281 rarg 0.867188
 94: out = 19 biased 14; lerr 0.00408936 rerr 0.000396729 larg 0.867188 rarg 0.871094
 95: out = 19 biased 13; lerr 0.000396729 rerr 0.00488281 larg 0.871094 rarg 0.875
 96: out = 18 biased 13; lerr 0.00195312 rerr 0.00250244 larg 0.875 rarg 0.878906
 97: out = 17 biased 13; lerr 0.00436401 rerr 6.10352E-05 larg 0.878906 rarg 0.882812
 98: out = 17 biased 12; lerr 6.10352E-05 rerr 0.00448608 larg 0.882812 rarg 0.886719
 99: out = 16 biased 12; lerr 0.00244141 rerr 0.00195312 larg 0.886719 rarg 0.890625
 100: out = 15 biased 12; lerr 0.00500488 rerr 0.000640869 larg 0.890625 rarg 0.894531
 101: out = 15 biased 11; lerr 0.000640869 rerr 0.00372314 larg 0.894531 rarg 0.898438
 102: out = 14 biased 11; lerr 0.0032959 rerr 0.0010376 larg 0.898438 rarg 0.902344
 103: out = 14 biased 10; lerr 0.0010376 rerr 0.00537109 larg 0.902344 rarg 0.90625
 104: out = 13 biased 10; lerr 0.00170898 rerr 0.00259399 larg 0.90625 rarg 0.910156
 105: out = 12 biased 10; lerr 0.0045166 rerr 0.000244141 larg 0.910156 rarg 0.914062
 106: out = 12 biased 9; lerr 0.000244141 rerr 0.00402832 larg 0.914062 rarg 0.917969
 107: out = 11 biased 9; lerr 0.00314331 rerr 0.00109863 larg 0.917969 rarg 0.921875
 108: out = 11 biased 8; lerr 0.00109863 rerr 0.00534058 larg 0.921875 rarg 0.925781
 109: out = 10 biased 8; lerr 0.00189209 rerr 0.00231934 larg 0.925781 rarg 0.929688
 110: out = 9 biased 8; lerr 0.00494385 rerr 0.000762939 larg 0.929688 rarg 0.933594
 111: out = 9 biased 7; lerr 0.000762939 rerr 0.00341797 larg 0.933594 rarg 0.9375
 112: out = 8 biased 7; lerr 0.00390625 rerr 0.000244141 larg 0.9375 rarg 0.941406
 113: out = 8 biased 6; lerr 0.000244141 rerr 0.00439453 larg 0.941406 rarg 0.945312
 114: out = 7 biased 6; lerr 0.00299072 rerr 0.00112915 larg 0.945312 rarg 0.949219
 115: out = 7 biased 5; lerr 0.00112915 rerr 0.00524902 larg 0.949219 rarg 0.953125
 116: out = 6 biased 5; lerr 0.00219727 rerr 0.00189209 larg 0.953125 rarg 0.957031
 117: out = 5 biased 5; lerr 0.00558472 rerr 0.00152588 larg 0.957031 rarg 0.960938
 118: out = 5 biased 4; lerr 0.00152588 rerr 0.00253296 larg 0.960938 rarg 0.964844
 119: out = 4 biased 4; lerr 0.00500488 rerr 0.000976562 larg 0.964844 rarg 0.96875
 120: out = 4 biased 3; lerr 0.000976562 rerr 0.00305176 larg 0.96875 rarg 0.972656
 121: out = 3 biased 3; lerr 0.00454712 rerr 0.000549316 larg 0.972656 rarg 0.976562
 122: out = 3 biased 2; lerr 0.000549316 rerr 0.00344849 larg 0.976562 rarg 0.980469
 123: out = 2 biased 2; lerr 0.00421143 rerr 0.000244141 larg 0.980469 rarg 0.984375
 124: out = 2 biased 1; lerr 0.000244141 rerr 0.00372314 larg 0.984375 rarg 0.988281
 125: out = 1 biased 1; lerr 0.0039978 rerr 6.10352E-05 larg 0.988281 rarg 0.992188
 126: out = 1 biased 0; lerr 6.10352E-05 rerr 0.00387573 larg 0.992188 rarg 0.996094
 127: out = 0 biased 0; lerr 0.00390625 rerr 0 larg 0.996094 rarg 1

 ... [removed hex data dumping]

RSqrt7x7LUT (input [6:0] in, output reg [6:0] out);
  // in[6] corresponds to exp[0]
  // in[5:0] corresponds to sig[S-1:S-5]
  // out[6:0] corresponds to sig[S-1:S-6]
  // biased : ((ipN-1) - in) << (op - ip)
 0: out 127 biased 0; lerr 0.00390625 rerr 0.00384557 larg 0.25 rarg 0.253906
 1: out 125 biased 1; lerr 0.00402773 rerr 0.00360435 larg 0.253906 rarg 0.257812
 2: out 123 biased 2; lerr 0.00432928 rerr 0.00318533 larg 0.257812 rarg 0.261719
 3: out 121 biased 3; lerr 0.00480818 rerr 0.00259111 larg 0.261719 rarg 0.265625
 4: out 119 biased 4; lerr 0.00546183 rerr 0.00182426 larg 0.265625 rarg 0.269531
 5: out 118 biased 4; lerr 0.0022317 rerr 0.00497249 larg 0.269531 rarg 0.273438
 6: out 116 biased 5; lerr 0.00319802 rerr 0.00389675 larg 0.273438 rarg 0.277344
 7: out 114 biased 6; lerr 0.00433191 rerr 0.00265532 larg 0.277344 rarg 0.28125
 8: out 113 biased 6; lerr 0.00148789 rerr 0.00542232 larg 0.28125 rarg 0.285156
 9: out 111 biased 7; lerr 0.00292144 rerr 0.00388464 larg 0.285156 rarg 0.289062
 10: out 109 biased 8; lerr 0.00451607 rerr 0.0021876 larg 0.289062 rarg 0.292969
 11: out 108 biased 8; lerr 0.00204104 rerr 0.00458999 larg 0.292969 rarg 0.296875
 12: out 106 biased 9; lerr 0.00392348 rerr 0.00260824 larg 0.296875 rarg 0.300781
 13: out 105 biased 9; lerr 0.00167641 rerr 0.00478529 larg 0.300781 rarg 0.304688
 14: out 103 biased 10; lerr 0.00383947 rerr 0.00252584 larg 0.304688 rarg 0.308594
 15: out 102 biased 10; lerr 0.0018141 rerr 0.00448366 larg 0.308594 rarg 0.3125
 16: out 100 biased 11; lerr 0.00425098 rerr 0.00195312 larg 0.3125 rarg 0.316406
 17: out 99 biased 11; lerr 0.00244141 rerr 0.00369747 larg 0.316406 rarg 0.320312
 18: out 97 biased 12; lerr 0.00514568 rerr 0.000902127 larg 0.320312 rarg 0.324219
 19: out 96 biased 12; lerr 0.00354633 rerr 0.00243843 larg 0.324219 rarg 0.328125
 20: out 95 biased 12; lerr 0.00203674 rerr 0.00388594 larg 0.328125 rarg 0.332031
 21: out 93 biased 13; lerr 0.00511752 rerr 0.000717621 larg 0.332031 rarg 0.335938
 22: out 92 biased 13; lerr 0.00381051 rerr 0.00196455 larg 0.335938 rarg 0.339844
 23: out 91 biased 13; lerr 0.00258984 rerr 0.00312603 larg 0.339844 rarg 0.34375
 24: out 90 biased 13; lerr 0.00145446 rerr 0.00420307 larg 0.34375 rarg 0.347656
 25: out 88 biased 14; lerr 0.0050098 rerr 0.000564416 larg 0.347656 rarg 0.351562
 26: out 87 biased 14; lerr 0.00406783 rerr 0.00144985 larg 0.351562 rarg 0.355469
 27: out 86 biased 14; lerr 0.00320806 rerr 0.00225385 larg 0.355469 rarg 0.359375
 28: out 85 biased 14; lerr 0.00242958 rerr 0.00297735 larg 0.359375 rarg 0.363281
 29: out 84 biased 14; lerr 0.00173146 rerr 0.00362122 larg 0.363281 rarg 0.367188
 30: out 83 biased 14; lerr 0.00111284 rerr 0.00418633 larg 0.367188 rarg 0.371094
 31: out 82 biased 14; lerr 0.000572846 rerr 0.00467353 larg 0.371094 rarg 0.375
 32: out 80 biased 15; lerr 0.00489479 rerr 0.00027462 larg 0.375 rarg 0.378906
 33: out 79 biased 15; lerr 0.00453439 rerr 0.000583717 larg 0.378906 rarg 0.382812
 34: out 78 biased 15; lerr 0.00425002 rerr 0.000817442 larg 0.382812 rarg 0.386719
 35: out 77 biased 15; lerr 0.0040409 rerr 0.000976562 larg 0.386719 rarg 0.390625
 36: out 76 biased 15; lerr 0.00390625 rerr 0.00106183 larg 0.390625 rarg 0.394531
 37: out 75 biased 15; lerr 0.00384534 rerr 0.00107398 larg 0.394531 rarg 0.398438
 38: out 74 biased 15; lerr 0.00385742 rerr 0.00101372 larg 0.398438 rarg 0.402344
 39: out 73 biased 15; lerr 0.00394179 rerr 0.00088176 larg 0.402344 rarg 0.40625
 40: out 72 biased 15; lerr 0.00409775 rerr 0.000678786 larg 0.40625 rarg 0.410156
 41: out 71 biased 15; lerr 0.00432461 rerr 0.000405468 larg 0.410156 rarg 0.414062
 42: out 70 biased 15; lerr 0.0046217 rerr 6.24637E-05 larg 0.414062 rarg 0.417969
 43: out 70 biased 14; lerr 6.24637E-05 rerr 0.00472478 larg 0.417969 rarg 0.421875
 44: out 69 biased 14; lerr 0.000349583 rerr 0.00426776 larg 0.421875 rarg 0.425781
 45: out 68 biased 14; lerr 0.000830041 rerr 0.00374284 larg 0.425781 rarg 0.429688
 46: out 67 biased 14; lerr 0.00137829 rerr 0.00315063 larg 0.429688 rarg 0.433594
 47: out 66 biased 14; lerr 0.00199374 rerr 0.00249171 larg 0.433594 rarg 0.4375
 48: out 65 biased 14; lerr 0.00267578 rerr 0.00176667 larg 0.4375 rarg 0.441406
 49: out 64 biased 14; lerr 0.00342383 rerr 0.000976086 larg 0.441406 rarg 0.445312
 50: out 63 biased 14; lerr 0.00423733 rerr 0.000120513 larg 0.445312 rarg 0.449219
 51: out 63 biased 13; lerr 0.000120513 rerr 0.00445945 larg 0.449219 rarg 0.453125
 52: out 62 biased 13; lerr 0.000799499 rerr 0.00349816 larg 0.453125 rarg 0.457031
 53: out 61 biased 13; lerr 0.00178341 rerr 0.00247339 larg 0.457031 rarg 0.460938
 54: out 60 biased 13; lerr 0.0028307 rerr 0.00138568 larg 0.460938 rarg 0.464844
 55: out 59 biased 13; lerr 0.00394084 rerr 0.00023553 larg 0.464844 rarg 0.46875
 56: out 59 biased 12; lerr 0.00023553 rerr 0.00439453 larg 0.46875 rarg 0.472656
 57: out 58 biased 12; lerr 0.000976562 rerr 0.00314314 larg 0.472656 rarg 0.476562
 58: out 57 biased 12; lerr 0.0022501 rerr 0.00183069 larg 0.476562 rarg 0.480469
 59: out 56 biased 12; lerr 0.00358461 rerr 0.000457659 larg 0.480469 rarg 0.484375
 60: out 56 biased 11; lerr 0.000457659 rerr 0.00448366 larg 0.484375 rarg 0.488281
 61: out 55 biased 11; lerr 0.000975489 rerr 0.00301265 larg 0.488281 rarg 0.492188
 62: out 54 biased 11; lerr 0.00246829 rerr 0.00148234 larg 0.492188 rarg 0.496094
 63: out 53 biased 11; lerr 0.00402031 rerr 0.000106817 larg 0.496094 rarg 0.5
 64: out 52 biased 11; lerr 0.00563109 rerr 0.00210731 larg 0.5 rarg 0.507812
 65: out 51 biased 11; lerr 0.00345996 rerr 0.00417648 larg 0.507812 rarg 0.515625
 66: out 50 biased 11; lerr 0.00143345 rerr 0.00610301 larg 0.515625 rarg 0.523438
 67: out 48 biased 12; lerr 0.00520152 rerr 0.00219486 larg 0.523438 rarg 0.53125
 68: out 47 biased 12; lerr 0.00349943 rerr 0.00380104 larg 0.53125 rarg 0.539062
 69: out 46 biased 12; lerr 0.00193497 rerr 0.00527137 larg 0.539062 rarg 0.546875
 70: out 44 biased 13; lerr 0.00628347 rerr 0.000789331 larg 0.546875 rarg 0.554688
 71: out 43 biased 13; lerr 0.00502921 rerr 0.00195312 larg 0.554688 rarg 0.5625
 72: out 42 biased 13; lerr 0.00390625 rerr 0.00298721 larg 0.5625 rarg 0.570312
 73: out 41 biased 13; lerr 0.00291271 rerr 0.00389343 larg 0.570312 rarg 0.578125
 74: out 40 biased 13; lerr 0.00204677 rerr 0.00467353 larg 0.578125 rarg 0.585938
 75: out 39 biased 13; lerr 0.00130667 rerr 0.00532924 larg 0.585938 rarg 0.59375
 76: out 38 biased 13; lerr 0.000690699 rerr 0.00586222 larg 0.59375 rarg 0.601562
 77: out 36 biased 14; lerr 0.0062566 rerr 0.000175461 larg 0.601562 rarg 0.609375
 78: out 35 biased 14; lerr 0.00592317 rerr 0.000428823 larg 0.609375 rarg 0.617188
 79: out 34 biased 14; lerr 0.00570878 rerr 0.000564416 larg 0.617188 rarg 0.625
 80: out 33 biased 14; lerr 0.00561191 rerr 0.000583717 larg 0.625 rarg 0.632812
 81: out 32 biased 14; lerr 0.00563109 rerr 0.000488162 larg 0.632812 rarg 0.640625
 82: out 31 biased 14; lerr 0.00576489 rerr 0.000279149 larg 0.640625 rarg 0.648438
 83: out 30 biased 14; lerr 0.00601191 rerr 4.19626E-05 larg 0.648438 rarg 0.65625
 84: out 30 biased 13; lerr 4.19626E-05 rerr 0.00589256 larg 0.65625 rarg 0.664062
 85: out 29 biased 13; lerr 0.00047385 rerr 0.00538852 larg 0.664062 rarg 0.671875
 86: out 28 biased 13; lerr 0.00101522 rerr 0.00477604 larg 0.671875 rarg 0.679688
 87: out 27 biased 13; lerr 0.00166483 rerr 0.00405633 larg 0.679688 rarg 0.6875
 88: out 26 biased 13; lerr 0.00242145 rerr 0.0032306 larg 0.6875 rarg 0.695312
 89: out 25 biased 13; lerr 0.00328389 rerr 0.0023 larg 0.695312 rarg 0.703125
 90: out 24 biased 13; lerr 0.00425098 rerr 0.00126568 larg 0.703125 rarg 0.710938
 91: out 23 biased 13; lerr 0.0053216 rerr 0.000128738 larg 0.710938 rarg 0.71875
 92: out 23 biased 12; lerr 0.000128738 rerr 0.00554953 larg 0.71875 rarg 0.726562
 93: out 22 biased 12; lerr 0.00110974 rerr 0.00424628 larg 0.726562 rarg 0.734375
 94: out 21 biased 12; lerr 0.0024487 rerr 0.00284339 larg 0.734375 rarg 0.742188
 95: out 20 biased 12; lerr 0.0038871 rerr 0.00134187 larg 0.742188 rarg 0.75
 96: out 19 biased 12; lerr 0.00542395 rerr 0.000257287 larg 0.75 rarg 0.757812
 97: out 19 biased 11; lerr 0.000257287 rerr 0.00488281 larg 0.757812 rarg 0.765625
 98: out 18 biased 11; lerr 0.00195312 rerr 0.00312603 larg 0.765625 rarg 0.773438
 99: out 17 biased 11; lerr 0.0037447 rerr 0.00127425 larg 0.773438 rarg 0.78125
 100: out 16 biased 11; lerr 0.00563109 rerr 0.000671612 larg 0.78125 rarg 0.789062
 101: out 16 biased 10; lerr 0.000671612 rerr 0.00426337 larg 0.789062 rarg 0.796875
 102: out 15 biased 10; lerr 0.00271068 rerr 0.00216607 larg 0.796875 rarg 0.804688
 103: out 14 biased 10; lerr 0.00484208 rerr 2.28884E-05 larg 0.804688 rarg 0.8125
 104: out 14 biased 9; lerr 2.28884E-05 rerr 0.00477319 larg 0.8125 rarg 0.820312
 105: out 13 biased 9; lerr 0.00230268 rerr 0.00243701 larg 0.820312 rarg 0.828125
 106: out 12 biased 9; lerr 0.00467248 rerr 1.1444E-05 larg 0.828125 rarg 0.835938
 107: out 12 biased 8; lerr 1.1444E-05 rerr 0.00467353 larg 0.835938 rarg 0.84375
 108: out 11 biased 8; lerr 0.00250271 rerr 0.00210469 larg 0.84375 rarg 0.851562
 109: out 10 biased 8; lerr 0.0051047 rerr 0.000551376 larg 0.851562 rarg 0.859375
 110: out 10 biased 7; lerr 0.000551376 rerr 0.00398129 larg 0.859375 rarg 0.867188
 111: out 9 biased 7; lerr 0.00329393 rerr 0.00118567 larg 0.867188 rarg 0.875
 112: out 9 biased 6; lerr 0.00118567 rerr 0.00564531 larg 0.875 rarg 0.882812
 113: out 8 biased 6; lerr 0.00169516 rerr 0.00271239 larg 0.882812 rarg 0.890625
 114: out 7 biased 6; lerr 0.0046605 rerr 0.000304507 larg 0.890625 rarg 0.898438
 115: out 7 biased 5; lerr 0.000304507 rerr 0.00403259 larg 0.898438 rarg 0.90625
 116: out 6 biased 5; lerr 0.00340469 rerr 0.00088176 larg 0.90625 rarg 0.914062
 117: out 6 biased 4; lerr 0.00088176 rerr 0.00514993 larg 0.914062 rarg 0.921875
 118: out 5 biased 4; lerr 0.00235119 rerr 0.00186722 larg 0.921875 rarg 0.929688
 119: out 4 biased 4; lerr 0.00566562 rerr 0.00149648 larg 0.929688 rarg 0.9375
 120: out 4 biased 3; lerr 0.00149648 rerr 0.00265532 larg 0.9375 rarg 0.945312
 121: out 3 biased 3; lerr 0.00494055 rerr 0.0008372 larg 0.945312 rarg 0.953125
 122: out 3 biased 2; lerr 0.0008372 rerr 0.00324937 larg 0.953125 rarg 0.960938
 123: out 2 biased 2; lerr 0.00440902 rerr 0.000370094 larg 0.960938 rarg 0.96875
 124: out 2 biased 1; lerr 0.000370094 rerr 0.00365258 larg 0.96875 rarg 0.976562
 125: out 1 biased 1; lerr 0.00406783 rerr 9.20338E-05 larg 0.976562 rarg 0.984375
 126: out 1 biased 0; lerr 9.20338E-05 rerr 0.00386801 larg 0.984375 rarg 0.992188
 127: out 0 biased 0; lerr 0.00391391 rerr 0 larg 0.992188 rarg 1
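
To read one row concretely: lerr/rerr are the relative errors at the interval endpoints larg/rarg, assuming (as the numbers imply; this is inferred from the columns, not lifted from vrecip.cc) that output o encodes the estimate e = 1 + o/128 of 1/sqrt(x), so the relative error at x is |e*sqrt(x) - 1|. A quick C++ check against entry 29 above:

#include <cmath>
#include <cstdio>

int main() {
    // Entry 29 above: out 84 -> assumed estimate e = 1 + 84/128 = 1.65625
    double e = 1.0 + 84.0 / 128.0;
    // Evaluating at larg and rarg reproduces the row's lerr and rerr
    printf("lerr %g\n", std::fabs(e * std::sqrt(0.363281) - 1.0));  // ~0.00173146
    printf("rerr %g\n", std::fabs(e * std::sqrt(0.367188) - 1.0));  // ~0.00362122
    return 0;
}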

 ... [removed hex data dumping]

max recip 7x7 error at 0.519531: 0.00558472 or 2^-7.4843
max rsqrt 7x7 error at 0.546875: 0.00628347 or 2^-7.31422
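The 2^-x figures above are just -log2 of the corresponding max relative errors; a one-line check in C++:

#include <cmath>
#include <cstdio>

int main() {
    printf("%.4f\n", -std::log2(0.00558472));   // ~7.4843  (recip 7x7)
    printf("%.5f\n", -std::log2(0.00628347));   // ~7.31422 (rsqrt 7x7)
    return 0;
}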


On 2020-08-03 1:17 p.m., Bill Huffman wrote:

I should have said that my results are for the 7/7 case.  And it sounds like we're in agreement then.  We probably have the same table.

      Bill

On 8/2/20 9:50 AM, DSHORNER wrote:

This is the link to the revised code that generates an n-by-m LUT:


https://github.com/David-Horner/recip/blob/master/vrecip.cc
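Presumably "n by m" varies the input and output bit counts. A small sketch of the index-to-interval and output-to-estimate mapping that the 7x7 dumps in this thread imply (an assumption about the code's model, not an excerpt from vrecip.cc):

// n input bits split [0.5, 1) into 2^n equal intervals for recip.
struct Interval { double lo, hi; };

inline Interval recip_interval(int i, int n) {
    double step = 0.5 / (1 << n);                // interval width
    return { 0.5 + i * step, 0.5 + (i + 1) * step };
}

// An m-bit output o encodes the estimate 1 + o/2^m in [1, 2).
inline double recip_estimate(int o, int m) {
    return 1.0 + static_cast<double>(o) / (1 << m);
}

For n = m = 7 this reproduces the larg/rarg columns of the 7x7 recip table quoted in this thread.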

On 2020-08-01 4:51 p.m., David Horner via lists.riscv.org wrote:



Re: VFRECIP/VFRSQRT instructions

Andrew Waterman
 



On Mon, Aug 3, 2020 at 12:40 PM Bill Huffman <huffman@...> wrote:

The recip table matches mine, as does the worst-case error.

I have one different entry in the square root table.  For entry 77, where you have 36, I have 37.  I'm not sure whether it matters.  Also, ages ago, I got a very small difference in worst-case error of 2^-7.317, but I haven't gone back to trace anything down about that.


Thanks for validating against your table, Bill.

With my value for that entry, the worst error on the interval of interest is 2^-7.32041, for input 0x3f1a0000.  With yours, it's 2^-7.3164 for 0x3f1bfffd.
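Those hex inputs sit in rsqrt entry 77's interval; decoding them:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    uint32_t bits = 0x3f1a0000;   // the first input cited above
    float x;
    std::memcpy(&x, &bits, sizeof x);   // reinterpret the bits as a float
    printf("%g\n", x);            // 0.6015625, i.e. larg of rsqrt entry 77
    return 0;
}

0x3f1bfffd likewise decodes to just under 0.609375 (rarg of entry 77), so both worst-case inputs land in the disputed entry's interval.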

Presumably the error's slightly smaller for my scheme because I'm picking the output value that minimizes the maximum error on the interval, rather than picking the midpoint or similar.  Of course, the overall worst error is unaffected.
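A minimal C++ sketch of that minimax selection, under the model the 7x7 recip dump implies (index i covers x in [0.5 + i/256, 0.5 + (i+1)/256) and output o encodes the estimate 1 + o/128; these mappings are inferred from the table columns, not taken from vrecip.cc). Since the relative error |e*x - 1| is linear in x, its maximum over an interval is attained at an endpoint:

#include <cmath>
#include <cstdio>

// Worst-case relative error of a constant estimate e for 1/x on [lo, hi]:
// |e - 1/x| / (1/x) = |e*x - 1|, linear in x, so check the endpoints.
static double worst_err(double e, double lo, double hi) {
    return std::fmax(std::fabs(e * lo - 1.0), std::fabs(e * hi - 1.0));
}

// Choose the 7-bit output for table index i that minimizes the maximum
// relative error over the interval, rather than rounding the reciprocal
// of the interval midpoint.
static int best_recip_entry(int i) {
    double lo = 0.5 + i / 256.0;
    double hi = 0.5 + (i + 1) / 256.0;
    int best = 0;
    double best_e = 2.0;   // larger than any achievable error
    for (int o = 0; o < 128; ++o) {
        double err = worst_err(1.0 + o / 128.0, lo, hi);
        if (err < best_e) { best_e = err; best = o; }
    }
    return best;
}

int main() {
    // Entry 5 of the recip table: expect 117, worst error ~0.00558472,
    // matching both the table row and the max-error line quoted above.
    printf("entry 5 -> %d\n", best_recip_entry(5));
    return 0;
}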


On 8/3/20 11:38 AM, DSHORNER wrote:

Now annotated version --detail
https://github.com/David-Horner/recip/blob/master/vrecip.cc

For the 7x7 tables below, notice that the biased value never exceeds 21 for recip (5 of 7 bits) or 15 for rsqrt (4 of 7 bits).
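Reading that observation as a storage optimization: the dump defines biased = ((ipN-1) - in) << (op - ip) with base bias 127 and zero shift in this 7-in/7-out case, so out = 127 - in - biased. A sketch of keeping only the residual (whether vrecip.cc actually stores tables this way is an assumption):

#include <cstdint>

// biased[] holds the per-entry residuals from the 7x7 recip table; with
// a maximum of 21 they fit in 5 bits each (4 bits for rsqrt's maximum
// of 15) instead of the 7 bits needed to store out directly.
inline uint8_t recip7x7_out(uint8_t in, const uint8_t biased[128]) {
    const int kBaseBias = 127;   // "base bias 127" in the dump
    return static_cast<uint8_t>(kBaseBias - in - biased[in]);
}

// Spot checks against the recip table: entry 40 stores biased 21 and
// reconstructs 127 - 40 - 21 = 66; entry 0 stores 0 and gives 127.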

 ... [removed quoted 7x7 recip/rsqrt LUT listing; identical to the tables earlier in the thread]





Re: VFRECIP/VFRSQRT instructions

Bill Huffman
 

The recip table matches mine, as does the worst-case error.

I have one different entry in the square root table.  For entry 77, where you have 36, I have 37.  I'm not sure whether it matters.  Also, ages ago, I got a very small difference in worst-case error of 2^-7.317, but I haven't gone back to trace anything down about that.

      Bill

On 8/3/20 11:38 AM, DSHORNER wrote:

Now annotated version --detail
https://github.com/David-Horner/recip/blob/master/vrecip.cc

For the 7x7 tables below, notice that the biased value never exceeds 21 for recip (5 of 7 bits) or 15 for rsqrt (4 of 7 bits).

 ... [removed quoted 7x7 recip/rsqrt LUT listing; identical to the tables earlier in the thread]

