Re: Issue #365 vsetvl{i} x0, x0 instruction forms

Krste Asanovic

The main issue is whether the current PoR has any useful purpose when
vl changes. I don't subscribe to the "field of dreams" approach. I tried
to find some scenarios, hoping there would be some useful cases, but
struggled to come up with anything substantial with the current PoR.
There are certainly some possible alternate vl-changing behaviors that
could be useful, but those would be a different instruction. Unless
there is a clear use, the additional vl-modifying behavior in the PoR
cannot really be stated as a positive, only as a curiosity.

On the negative side, a microarchitecture will have to assume
vl will be read and written by this instruction, even if it almost
never changes. Even for simple machines, this will probably cause
some extra flops to be clocked. For machines with renaming, it can
require that a new physical vl be allocated early in the machine even if
vl rarely changes. There might be microarch techniques to recycle vl
regs more quickly once known not to change, but it would be much simpler
not to have to deal with this. The verification cost alone is a big
negative for a feature that could be rarely/never used.

These instructions will likely be common in loops dealing with
multiple element widths (a common loop will have only one vsetvli that
changes vl and potentially many that manipulate SEW/LMUL), and so
optimizing their implementation is important. Having a hardware
instruction that is "change vtype but not vl, or error" is clearly
useful I think.
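As an illustration only, the "change vtype but not vl, or error" behavior can be modeled in a few lines of Python. This is a sketch under stated assumptions, not spec pseudocode: vsew and vlmul are treated as signed log2 values (SEW = 8 << vsew, LMUL = 2**vlmul) rather than the real bit encodings, the names VType and vsetvli_x0_x0 are hypothetical, and the requested fields are kept alongside vill purely for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VType:
    vsew: int           # assumed log2(SEW / 8): 0 -> SEW=8, 1 -> SEW=16, ...
    vlmul: int          # assumed signed log2(LMUL): -1 -> 1/2, 0 -> 1, 1 -> 2
    vill: bool = False


def vsetvli_x0_x0(old: VType, new_vsew: int, new_vlmul: int) -> VType:
    """Hypothetical model of the proposed form: vl is neither read nor written."""
    # SEW/LMUL is unchanged exactly when (vsew - vlmul) is unchanged,
    # so VLMAX is unchanged and the current vl stays valid.
    ratio_changed = (new_vsew - new_vlmul) != (old.vsew - old.vlmul)
    return VType(new_vsew, new_vlmul, vill=ratio_changed)


e32m2 = VType(vsew=2, vlmul=1)
print(vsetvli_x0_x0(e32m2, 1, 0))   # e16,m1: same SEW/LMUL ratio
print(vsetvli_x0_x0(e32m2, 0, 1))   # e8,m2: ratio changes
```

Note that the function takes only the old and new vtype fields as inputs; vl does not appear anywhere, which is the property the paragraph above cares about.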

The dynamic debug aspect, I agree, is relatively minor, but given the
prevalence of "change vtype but not vl" instructions, it is only a
positive that bugs are caught, even if not always with clear
determination of the problematic instruction (though I guess it will be
very rare that the bug will be difficult to find even with only a trap
on use).

Even though I view dynamic debug as a minor benefit, I think even that
minor concrete benefit outweighs the unknown abstract benefit of
"change vl" behavior, unless there are some great use cases for the
existing PoR scheme that we've missed.

But again, the implementation saving from not having to worry about
dynamic vl changes for these instructions to me far outweighs the
other issues.


On Wed, 22 Jul 2020 09:02:03 -0400, "David Horner" <ds2horner@...> said:
| I wholeheartedly agree with resolving on the mailing list.
| This should be the rule not exception.

| On 2020-07-21 11:58 p.m., Krste Asanovic wrote:
|| I want to bring this to group's attention as I think I've convinced
|| myself that Guy's suggestion is the correct path to follow, i.e.,
|| vsetvli x0, x0, imm
|| will raise vill if the new SEW'/LMUL' ratio is not the same as the old
|| SEW/LMUL implying vl might change. Similarly for vsetvl version.
| My considerations for allowing vl to change were

| a) having a compelling reason to change PoR.
|      vsetvl[i] is extremely important to RVV success.
|      It deserves deep scrutiny.
|      Challenging each and every change,
|      as well as proposing any plausible enhancement,
|      are equally important to get this feature,
|      more so than others, right(tm).

| b) tracking assemblers and compilers could present warnings.
|      Part of my support was my bias towards encouraging vl tracking
|      support.
|      Tracking vl in code has substantial benefits beyond a replacement
|      for this x0,x0 behaviour.
|      I believe RVV success and adoption will be substantially hampered
|      without it.
|      It is however specific to RVV.
|      So marginal hardware support that appears to mitigate a need for
|      vl tracking gets a check in the negative column.

| c) A perceived simplicity of PoR for minimal designs.
|      I am biased toward ensuring simple machines can efficiently
|      support RVV.

|      Initial uptake is likely to be in the application/HPC domain, but
|      I believe that ultimately IoT machines will benefit from RVV
|          if we continue to emphasize simplicity in the design.

| d) Setting vill is excellent as a means to avoid trap behaviour.
|      However, it requires an explicit check after vtype-setting ops.
|      Opportunistic approaches will rely on the subsequent fault.
|      This situation is theoretically impossible to statically backward
|      trace.
|      A given RVV data instruction could be branched to from anywhere;
|          conditional execution could have executed any vsetvl instruction
|          with virtually any rs1 value.
|      This biases me away from setting vill; in the x0,x0 case, setting
|      vl avoids setting vill.
|      However, in practice branching into a loop will be errant
|      behaviour, and
|         RVV data instructions will be paired with a vsetvli instruction.
|      My paranoia causes me to weigh this too heavily at times.
|      (.... reweighing risks)

| e)  In the x0,x0 formulation, vsetvli cannot determine vill state from
|      immediate parsing alone.
|      We have strived to ensure the immediate format will meet virtually
|      all in-loop use cases.
|      Ideally, vsetvl is reserved for context switch (and custom)
|      situations.
|      I considered x0,x0 a punt to the vsetvl (potentially slow) path to
|      allow for the immediate form optimization
|      (i.e. no vill setting considerations after parse).
|      However, reweighing the benefit of retaining vl and requiring a
|      late setting of vill.
|      Given vill setting can always be performed on a slow path
|      with little real impact to normal code ....  reweighing risks.

|| Apart from the debugging motivation that Guy presented,
| see my point d.
|| I would add
|| that this definition effectively removes any read or write of vl from
|| the instruction, possibly removing hazards and simplifying dependency
|| tracking and relieving an OoO machine from providing a new rename
|| register for vl (might still need for vtype).
| this does not talk to my point c.
|| I could not find any non-esoteric use for the vl-trimming behavior of
|| the current PoR for larger SEW/LMUL,
| I've found coders and compiler writers collectively more ingenious than I,
|  not only more eyes in free software but a spectrum of inner-eye
| perceptions and mindsets.

| So although relevant to the discussion, in the negative it is not
| compelling as a benefit.
|| so given these benefits I move we
|| adopt the "sets vill for non-iso SEW/LMUL" meaning.

|| The circuit has
|| to calculate (vsew_new-vlmul_new)!=(vsew_old-vlmul_old) to determine
|| vill, but now never needs to read
| I disagree with this behaviour. Increasing VLMAX does not invalidate
| current vl and should not "raise an exception", even indirectly.
| If we need a warning, let assemblers/compilers do it; see note b above.

| I also disagree that we always set vill if VLMAX reduced but vl is still
| < newVLMAX.
| Only if the ratio changes do we need to read vl, so in the frequent case
| I agree vl read can be avoided.
| To avoid a vl read
|| or write vl.
| My principle is that hardware should not attempt to debug or correct
| software.
| Although hardware developers may believe a specific
| validation/verification facility will be useful to programmers (SEW/LMUL
| invariance checking),
| such "policy" should not be imposed; rather, a means to electively
| support such a policy should be provided.
| Setting vill when original vl cannot be maintained is valid; enforcing
| an invariance policy is not.
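To make the difference between the positions in this thread concrete, here is a small Python sketch comparing three readings of `vsetvl{i} x0, x0`. Everything in it is an assumption made for illustration: VLEN = 128 is arbitrary, the function names are hypothetical, the PoR is simplified to "old vl is used as AVL and clipped to the new VLMAX", the amended rule is one reading of the reply above (vill only when the old vl exceeds the new VLMAX), and whatever vl holds once vill is set is not modeled.

```python
from fractions import Fraction

VLEN = 128  # assumed vector register width in bits (illustrative only)


def vlmax(sew, lmul):
    """VLMAX = LMUL * VLEN / SEW."""
    return int(Fraction(VLEN) * Fraction(lmul) / sew)


def ratio(sew, lmul):
    """SEW/LMUL as an exact fraction."""
    return Fraction(sew) / Fraction(lmul)


def por_trim(vl, old, new):
    # Simplified PoR: old vl acts as AVL and is clipped to the new VLMAX.
    return min(vl, vlmax(*new)), False


def strict_ratio(vl, old, new):
    # Proposal: vill whenever SEW/LMUL changes; vl is never read or written.
    return vl, ratio(*old) != ratio(*new)


def amended(vl, old, new):
    # One reading of the amendment: vl is read only when the ratio changes,
    # and vill is set only if the old vl no longer fits.
    if ratio(*old) == ratio(*new):
        return vl, False
    return vl, vl > vlmax(*new)


old = (32, 2)  # SEW=32, LMUL=2 -> VLMAX = 8 with the assumed VLEN
for new in [(16, 1), (8, 2), (64, 1)]:
    print(new, por_trim(3, old, new), strict_ratio(3, old, new), amended(3, old, new))
```

Each function returns a `(vl, vill)` pair. The three rules agree when the ratio is unchanged and differ only when VLMAX grows or shrinks, which is the case the disagreement is about.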
|| ...
|| As a general optimization guide, software should endeavor to use this
|| form instead of passing in AVL to avoid the vl update when not
|| necessary.
| I agree.
| This is what was envisioned by providing x0,x0.
| Further, this encoding implies an intent which makes code clearer.
| Someone doing tricks needs to add a comment.

| I'm leaning to accepting the proposal as I amended.
|| I hope this is one we can resolve on the mailing list to save time in
|| the next meeting.
| as do I.
|| Krste

