Re: Issue #365 vsetvl{i} x0, x0 instruction forms

David Horner

See the rest of the thread for more context.

On 2020-07-23 2:37 a.m., Andrew Waterman wrote:
>> It would appear that #5 is a net win for circuitry and a better
>> formulation of vl unchanged requirements.
>
> It's not just about the cost of the comparators; it's also about avoiding the RAW hazard on the previous value of VL.
>
> The RAW hazard on the previous value of vtype in Krste's proposal is less of a concern, since the previous vtype will usually have been supplied by an immediate operand.  Optimizing for this case, it's straightforward for renamed implementations to maintain a speculative copy of the vtype register in the decode stage.  The same doesn't work for vl, which in most cases was most recently sourced from a register operand.
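To illustrate Andrew's point, here is a minimal sketch of a decode-stage speculative vtype copy. The class and method names are hypothetical, and the model deliberately ignores recovery (for example, what happens when a vsetvl resolves after younger vsetvli instructions have already been decoded).

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

@dataclass(frozen=True)
class Vtype:
    sew: int        # selected element width in bits
    lmul: Fraction  # register-group multiplier (may be fractional)

class DecodeVtypeTracker:
    """Hypothetical decode-stage speculative copy of vtype.

    vsetvli carries vtype as an immediate, so decode can update the copy
    at once. vsetvl takes vtype from a register that may not be ready yet,
    so the copy is marked unknown.
    """

    def __init__(self, initial: Vtype) -> None:
        self.spec_vtype: Optional[Vtype] = initial

    def decode_vsetvli(self, imm_vtype: Vtype) -> None:
        # Immediate operand: the new vtype is known at decode time.
        self.spec_vtype = imm_vtype

    def decode_vsetvl(self) -> None:
        # Register operand: the new vtype is not known until execute.
        self.spec_vtype = None

    def known(self) -> bool:
        return self.spec_vtype is not None
```

There is no equivalent shortcut for vl: even vsetvli normally computes vl from the register operand x[rs1], which is why the vl hazard is singled out.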

To clarify for the list:
The RAW (read-after-write) hazard on vl already exists for all vl consumers, specifically all RVV data operations and reads of the vl CSR.
The PoR (plan of record) rules are crafted so that substantial validation can occur without knowing vl
   (e.g. register-group alignment given LMUL and vr1/vr2/vd).
Nevertheless, aggressive OoO (out-of-order) implementations will have to carry a tentative vl value for at least some sets of RVV instructions.
If that value turns out to have changed, in-flight ops may need to be rolled back or synced to a checkpoint, the new vl supplied, and execution rescheduled/resumed.
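A toy model of that tentative-vl scheme, under simplifying assumptions: every in-flight RVV op records the vl it was scheduled with, and when the real vl resolves, any op that used a different value is squashed and replayed. Names are hypothetical; a real design would track this per checkpoint rather than per op list.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InflightOp:
    name: str
    assumed_vl: int  # tentative vl this op was scheduled with

@dataclass
class VlSpeculation:
    """Hypothetical tracker: ops issue against a tentative vl; on mismatch
    the affected ops are rolled back and replayed with the resolved vl."""
    tentative_vl: int
    inflight: List[InflightOp] = field(default_factory=list)

    def issue(self, name: str) -> None:
        self.inflight.append(InflightOp(name, self.tentative_vl))

    def resolve(self, actual_vl: int) -> List[str]:
        """Return the names of ops that must be rolled back and replayed."""
        squashed = [op.name for op in self.inflight if op.assumed_vl != actual_vl]
        for op in self.inflight:
            op.assumed_vl = actual_vl  # replay with the correct vl
        self.tentative_vl = actual_vl
        return squashed
```

A correct guess costs nothing (`resolve` returns an empty list); a wrong one squashes every op scheduled with the stale value.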

A) The x0,x0 formulation potentially adds this vsetvli variant to the set of instructions that consume vl.
B) The goal is that this variant can also be eliminated as a writer of vl, since as a writer it could create the RAW hazard.
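A small sketch of how a decoder might classify the vsetvl{i} forms by their vl dependences. The rs1 != x0 and rs1 == x0, rd != x0 rows follow the usual AVL rules (AVL from x[rs1], or AVL treated as ~0 giving vl = VLMAX); the rd == rs1 == x0 row assumes the "vl unchanged" semantics under discussion in this thread. The function name and return shape are hypothetical.

```python
def vsetvl_vl_dependencies(rd: int, rs1: int) -> dict:
    """Classify a vsetvl{i} form by how it interacts with vl.

    Register numbers are plain integers; 0 stands for x0.
    """
    if rs1 != 0:
        # AVL comes from x[rs1]: a new vl is computed and written.
        return {"avl_source": "x[rs1]", "reads_vl": False, "writes_vl": True}
    if rd != 0:
        # rs1 == x0, rd != x0: AVL is treated as ~0, so vl = VLMAX.
        return {"avl_source": "VLMAX", "reads_vl": False, "writes_vl": True}
    # rd == rs1 == x0: keeps the current vl, so it is a vl consumer (A).
    # If vl is guaranteed never to change, it need not be a vl writer (B).
    return {"avl_source": "current vl", "reads_vl": True, "writes_vl": False}
```

Only the x0,x0 form reads vl, and only if its "writes_vl" entry can be guaranteed False does it avoid becoming a new source of the vl RAW hazard.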

Points of agreement #1 and #3 together guarantee B.
So, as Krste mentions, at the cost of some orthogonality we are guaranteed to avoid the vl RAW hazard.

Krste's proposal (check SEW/LMUL invariance) handles the majority of the use cases, trading a vl RAW concern for a vsew/lmul RAW concern.
Unfortunately, vsew/lmul RAW hazards also arise from vsetvl register values.
Fortunately, these are infrequent, so a brute-force stall and/or quiesce on vsetvl might be appropriate.
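Why the SEW/LMUL check is sufficient: VLMAX = LMUL * VLEN / SEW, so an unchanged SEW/LMUL ratio implies an unchanged VLMAX, and the current vl remains legal. The sketch below (function names and the VLEN value in the usage are assumptions for illustration) shows that the check reads only vtype fields, which is exactly the trade of a vl RAW for a vtype RAW.

```python
from fractions import Fraction

def vlmax(vlen: int, sew: int, lmul: Fraction) -> int:
    """Elements per register group: VLMAX = LMUL * VLEN / SEW."""
    return (lmul * vlen) // sew

def sew_lmul_invariant(old_sew: int, old_lmul: Fraction,
                       new_sew: int, new_lmul: Fraction) -> bool:
    """True when SEW/LMUL is unchanged, which implies VLMAX is unchanged.
    Only vtype fields are inspected; vl itself is never read."""
    return Fraction(old_sew) / old_lmul == Fraction(new_sew) / new_lmul
```

For example, with an assumed VLEN of 128, going from SEW=32, LMUL=1 to SEW=16, LMUL=1/2 keeps the ratio at 32 and VLMAX at 4, so vl can be kept without consulting it.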

Quiesce is not an appropriate default remedy for vsetvli x0,x0.
However, quiescing on a failed SEW/LMUL invariance check is very appropriate, as that case is anticipated to be very rare indeed.
(rare to the point, apparently, that some believe it should not be allowed).

My points on this are:
a) Exceptions in OoO machines are hardly rare; the mechanism to invoke a failsafe is well understood and is triggered in many scenarios.
       Handling this x0,x0 case could be an additional hardship, yes, but it is neither unique nor especially arduous.
b) A full quiesce is not required, and the x0,x0 stall waiting for the updated vl can be avoided in virtually all cases by the SEW/LMUL check.
c) The stall on x0,x0 can be asynchronous with other downstream processing iff #1 and #3 are both approved.
      i.e. the x0,x0 instruction will introduce no hazard beyond what is already present for concurrent operations that consume vl.
d) Setting vill also affects downstream (and perhaps concurrently in-flight) RVV data operations/instructions.
      The mechanism in a) has to be available in any case;
       it is only that the extremely rare vl-mismatch condition will be deferred,
       potentially invoking more rollback or limiting sync opportunities.
e) Even this can be mitigated by tagging vsetvli with the minimal bits required for "VLMIN" as a speculative copy.
f) Even if the recovery from an SEW/LMUL or vl mismatch is abysmal
   (i.e. full checkpoint, rollback and quiesce),
   the application can avoid it on such machines by using the standard formulation with AVL in rs1.
g) In conclusion, not setting vill when vl does not change, even if the SEW/LMUL ratio does, need not materially introduce or exacerbate any RAW hazards.
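To make point g) concrete, here is a sketch comparing the two vill rules for vsetvli x0,x0. The "vl unchanged" rule is my assumed reading of #5: vill is set only if keeping the old vl is impossible, i.e. min(old vl, new VLMAX) would differ from the old vl. VLEN = 128 and the function names are assumptions for illustration only.

```python
from fractions import Fraction

VLEN = 128  # assumed vector register width in bits, for illustration only

def vlmax(sew: int, lmul: Fraction) -> int:
    """VLMAX = LMUL * VLEN / SEW."""
    return (lmul * VLEN) // sew

def ratio_rule_sets_vill(old_vtype, new_vtype) -> bool:
    """SEW/LMUL invariance check: needs only old and new vtype, not vl."""
    (old_sew, old_lmul), (new_sew, new_lmul) = old_vtype, new_vtype
    return Fraction(old_sew) / old_lmul != Fraction(new_sew) / new_lmul

def vl_unchanged_rule_sets_vill(old_vl: int, new_vtype) -> bool:
    """Assumed reading of #5: vill only if the old vl no longer fits."""
    new_sew, new_lmul = new_vtype
    return min(old_vl, vlmax(new_sew, new_lmul)) != old_vl
```

With an old vtype of SEW=32, LMUL=1 and a tail vl of 3, switching to SEW=16, LMUL=1 changes the ratio (the ratio rule sets vill), yet VLMAX becomes 8 and vl=3 still fits, so the vl-unchanged rule keeps it. Only switching to SEW=64 (VLMAX 2) makes vl change. The price is that the second rule must read the old vl, which is the rare mismatch case discussed in d) through f).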

My vote is still with #5.
It is consistent with the past practice of deferring to architectural simplicity at the potential expense of (OoO) microarchitecture,
especially in this case, where multiple reasonable approaches to mitigating the RAW hazards, at low in-practice performance cost, are possible.
