Point of agreement #1 - x0,x0 variant should not change vl.
I believe we are also in agreement on
#2 - if vl would change because of a SEW/LMUL change, vill should be set.
#3) If vill is set should vl remain unchanged? (I vote for yes).
#4) Should a potential change of vl set vill? Currently that condition is equivalent to a SEW/LMUL ratio change.
4a) in all cases? even if vl is zero? even if vl is 1? (this rule has fringe cases).
4b) what do we do when another vtype parameter is added that also would potentially change vl?
What is the likely formulation of such an algorithm?
In general, something comparable to a simple ratio check would be inadequate.
I believe this SEW/LMUL formulation is not future proof.
#5) Why not define the x0,x0 variant that doesn't change vl as succeeding if vl doesn't change,
setting vill only if the resultant new-vl would not match the previous vl?
(Point #3 is still relevant, but there are no longer any corner cases as in 4a and 4b).
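To make the contrast concrete, here is a hedged sketch of the two rules as they would apply to the x0,x0 variant. This is my own model, not the spec's pseudocode: sew is in bits, lmul is integral (fractional LMUL and AVL-clamping details are ignored), and all names are illustrative.

```python
# Illustrative model of the x0,x0 vsetvli variant under rules #4 and #5.
# VLEN and the simplified vtype parameters are assumptions for this sketch.

VLEN = 128  # bits; the standard V minimum used throughout this note

def vlmax(sew, lmul):
    """Elements that fit under a given vtype: VLEN * LMUL / SEW."""
    return (VLEN * lmul) // sew

def x0x0_rule4(old_vl, old_sew, old_lmul, new_sew, new_lmul):
    """#4 (PoR-style): set vill iff the SEW/LMUL ratio changes.
    Returns (vl, vill); per point #3, vl is left unchanged when vill is set."""
    ratio_changed = old_sew * new_lmul != new_sew * old_lmul  # cross-multiplied
    return old_vl, ratio_changed

def x0x0_rule5(old_vl, new_sew, new_lmul):
    """#5: set vill iff vl would actually have to change, i.e. the current
    vl no longer fits under the new vtype. No ratio is consulted at all."""
    return old_vl, old_vl > vlmax(new_sew, new_lmul)
```

Note where the two rules diverge: changing SEW at the same LMUL always changes the ratio (vill under #4), yet can leave a small vl perfectly representable (no vill under #5).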
Krste below expresses some reasons that lean towards SEW/LMUL invariance, rather than vl invariance, being the determinant for setting vill.
Specifically, comparing vl to the new-vl requires reading the old vl, and that is potentially expensive; why not avoid the read of vl altogether?
One approach is based on #4.
Instead, read the previous (current?) vlmul and vsew, calculate the ratio, compare it with the new ratio, and set vill if they differ.
We can avoid the vlmul/vsew read by retaining the current SEW/LMUL values (or ratio)
(these can be stored locally; only 6 bits for vsew and vlmul)
and comparing that to the new SEW/LMUL ratio.
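A sketch of that retained-ratio check, under the assumption (mine) that vsew encodes SEW = 8 << vsew_code and vlmul encodes LMUL = 2**vlmul_code (signed, to allow fractional LMUL); with log2 encodings the ratio comparison is a small subtraction, and no CSR read is needed:

```python
def log2_ratio(vsew_code, vlmul_code):
    """log2(SEW/LMUL), assuming SEW = 8 << vsew_code and LMUL = 2**vlmul_code
    (vlmul_code may be negative for fractional LMUL)."""
    return 3 + vsew_code - vlmul_code

def ratio_changed(kept_vsew, kept_vlmul, new_vsew, new_vlmul):
    """Compare the locally retained 6 bits of vsew/vlmul against the new
    values; the x0,x0 variant would set vill iff this returns True."""
    return log2_ratio(kept_vsew, kept_vlmul) != log2_ratio(new_vsew, new_vlmul)
```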
What of #5, which I advocate - what is the overhead here?
A simplistic approach can read vl and push it through the existing circuitry as the AVL;
when the current vl exceeds the newly calculated MAXVL, set vill,
otherwise leave the current vl alone (or overwrite it with itself, whichever).
For simple designs there is a simple implementation that can further be optimized by setting vill on a slow path.
Alternatively we can use the SEW/LMUL optimization approach:
We can store the vl info locally.
This needs log2(VLEN*8) bits: for the standard V minimum, log2(128) + log2(8) = 7 + 3 = 10 bits,
with an additional bit per doubling of VLEN.
We then compare the calculated vl with that.
This compares favourably to #4 optimization.
But we can do better than that.
We only need to calculate MAXVL
(comparable computational cost to the SEW/LMUL ratio),
which is normally done anyway (so we can leverage existing circuitry),
and compare that to the locally stored vl information.
MAXVL varies from 1 (in the worst case) to VLEN*8.
As MAXVL is always a power of 2 the number of bits to store is log2(log2(VLEN*8)) or 4 bits for up to VLEN=2K.
Thus 4 bits for the locally saved vl information, which is the minimal MAXVL for the current vl.
(The V minimum is ELEN=64 and VLEN=128, which is among the cases for which 3 bits suffice.)
I'm not a circuit guru, but this "MINVL" is inexpensive to calculate from vl,
especially as it does not need to be on the critical path for the non-x0,x0 variants,
which are the only ones that need to store the vl info locally.
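The 4-bit scheme can be sketched as follows. This is a hedged model with my own names: "MINVL" is stored as ceil(log2(vl)) for vl >= 1, and because MAXVL is a power of two, vl <= MAXVL exactly when ceil(log2(vl)) <= log2(MAXVL). VLEN=128 and the vtype encodings SEW = 8 << vsew_code, LMUL = 2**vlmul_code are illustrative assumptions.

```python
VLEN = 128  # bits; illustrative

def ceil_log2(x):
    """Smallest n with 2**n >= x, for x >= 1; this "MINVL" exponent is the
    locally stored vl information (4 bits for VLEN up to 2K)."""
    return (x - 1).bit_length()

def log2_maxvl(vsew_code, vlmul_code):
    """MAXVL = VLEN * LMUL / SEW is a power of two, so only its exponent is
    needed; assumes SEW = 8 << vsew_code and LMUL = 2**vlmul_code."""
    return (VLEN.bit_length() - 1) - (3 + vsew_code) + vlmul_code

def x0x0_sets_vill(stored_minvl, new_vsew, new_vlmul):
    """#5 with only 4 bits of locally stored vl information: vill is set
    exactly when the current vl cannot fit under the new MAXVL."""
    return stored_minvl > log2_maxvl(new_vsew, new_vlmul)
```

For example, with vl=5 the stored MINVL is 3; switching to SEW=64, LMUL=1 gives log2(MAXVL)=1, so vill is set, while SEW=8, LMUL=1 gives log2(MAXVL)=4 and vl is left intact.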
It would appear that #5 is a net win for circuitry and a better formulation of the vl-unchanged requirement.
#5 now has my vote.
I provide further analysis within the replies below.
On 2020-07-22 8:21 p.m., Bill Huffman wrote:
> I agree with Krste's support for Guy's proposal here.
Thanks for the response.
> Loops with
An important and valid point that I also support.
> Bill
I disagree with characterizing this as the main issue.
I agree that it is an important consideration.
The pivotal question, as I see it, is what action the instruction should take when vl would change.
PoR says change it, as any other vsetvl variant would.
Until such a use is discovered. I don't subscribe to the "field of dreams" approach. I tried
I don't disagree that it is an important consideration, only that it is secondary.
If explicitly disallowing the "apparently useless" behaviour itself causes substantial cost, we can live with a meaningless instruction formulation.
RVI frequently allows formulations that lack a clear and compelling use case:
when a formulation is an artifact of a generally useful operation, excluding it would increase overhead (instruction decode, etc.),
e.g. bne rs1,rs2,-2, a branch to within the same instruction, which, depending upon the rs1/rs2 values, can be a C.BNEZ infinite loop if a specific register (x8 through x15) is non-zero.
The same could hold true here. In my opinion, this is substantially why the current PoR was adopted (this was the main part of my reasoning).
Bill expresses succinctly:
> Loops with multiple element widths are likely to have more non-vl-changing
It is precisely due to the nature of its expected (lack of) use
that in other situations we would disregard the low-use and esoteric case as harmless.
Consider the reluctance to reserve RVV simm5/rs1=0 formulations that match an existing simpler instruction.
However, in this case I agree that it is valuable for the x0,x0 formulation to be effectively usable, solely because vsetvli is so important.
Even as a secondary consideration, lack of usefulness is disturbing for a dominant feature.
That is the PoR, and I believe there is now general agreement (3 to zero so far) that changing vl is not the desired behaviour.
> On the negative side, a microarchitecture will have to assume
So, Point of agreement #1 - x0,x0 variant should not change vl.
Let's put this into perspective - all other vsetvl variants write vl; that is their primary purpose, it is explicitly in the name.
We are proposing an optimization for what we anticipate (reasonably) to be a commonly used case, as Bill stated.
The potential is to save some flops by avoiding the write (and the delays caused by its cascade/flow/synch effects).
> Even for simple machines, this will probably cause
Agreed. Another check for Point of agreement #1.
> For machines with renaming, it can
Agreed. Another check for Point of agreement #1.
> The (certification) verification cost alone is a big
Agreed. The above argument restated; again a check for Point of agreement #1.
> The dynamic debug aspect, I agree is relatively minor, but given the
Not expressed as persuasively, but at least a fractional check for Point of agreement #1.
But we cross a line when we believe the objective is "that bugs are caught".
What bug is it that we believe we can design hardware to catch?
As a database analyst, I told the application developers with whom I worked
that their compiled and running program was not "wrong".
It was doing just fine exactly what they directed it to do.
It was the perfect program for a problem other than the one they wanted to solve.
Ditto for bugs. Behaviour that one programmer wants to avoid, another may intend.
We cannot solve bugs in hardware; CISC attempts to do so are infamous.
All we can do is provide operations that do exactly as they are stipulated, ideally with no corner cases, with a simple conceptual definition.
Enforcing a perceived good software/"expected use" policy is rarely directly achievable or desirable.
Keeping the SEW/LMUL ratio invariant is a policy/"expected use case".
I contend there are deliberate exceptions to this policy, or, in the alternative, at minimum the policy has a limited domain.
If there are exceptions, or the domain is limited, it is not a good characteristic to enforce, even in a special instruction formulation.
Rather, a better characteristic to enforce in a special formulation is vl invariance.
It follows from the instruction formulation in which no explicit AVL is supplied (x0).
It is the underlying characteristic in the checks above and below.
> Krste
I wholeheartedly agree with resolving on the mailing list.
> On Wed, 22 Jul 2020 09:02:03 -0400, "David Horner" <ds2horner@...> said: