See the rest of the thread for more context.
On 2020-07-23 2:37 a.m., Andrew Waterman wrote:
It would appear that #5 is a net win for circuitry and a better
formulation of vl unchanged requirements.
It's not just about the cost of the comparators; it's also
about avoiding the RAW hazard on the previous value of VL.
The RAW hazard on the previous value of vtype in Krste's
proposal is less of a concern, since the previous vtype will
usually have been supplied by an immediate operand. Optimizing
for this case, it's straightforward for renamed implementations
to maintain a speculative copy of the vtype register in the
decode stage. The same doesn't work for vl, which in most cases
was most recently sourced from a register operand.
To clarify for the list:
The RAW (Read-after-Write) hazard already exists for all vl
consumers, specifically all RVV data operations and vl CSR reads.
PoR rules are crafted so that substantial validation can occur
without knowing vl
(e.g. register group alignment given lmul and vr1/vr2/vd).
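
As a rough illustration of a check that needs vtype but not vl, here is a minimal Python sketch of register group alignment. The function names and the treatment of fractional LMUL are assumptions for illustration only; the rule modelled is simply that a register group's base register number must be a multiple of LMUL when LMUL > 1.

```python
from fractions import Fraction

def group_aligned(reg: int, lmul: Fraction) -> bool:
    """True if vector register number `reg` can be the base of a group for `lmul`.

    Assumption: for LMUL <= 1 a group is a single register, so any of
    v0..v31 is acceptable.
    """
    if not 0 <= reg < 32:
        return False
    if lmul <= 1:
        return True
    return reg % int(lmul) == 0

def operands_aligned(vd: int, vr1: int, vr2: int, lmul: Fraction) -> bool:
    # Only lmul (from vtype) and the register numbers encoded in the
    # instruction are consulted; vl never appears.
    return all(group_aligned(r, lmul) for r in (vd, vr1, vr2))
```

With LMUL=4, for example, v8/v4/v12 pass and v8/v2/v12 do not, whatever vl happens to be.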
Nevertheless, aggressive ooo will have to carry a tentative vl
value for at least some sets of RVV instructions.
If that value has changed, in-flight ops will potentially need to be
rolled back or synced to a checkpoint, the new vl supplied, and
execution rescheduled/resumed.
A) The x0,x0 formulation potentially adds this vsetvli variant to
those instructions that consume vl.
B) The desire is that this variant can also be eliminated as a
writer of vl, which is what could create the RAW hazard.
Points of agreement #1 and #3 together guarantee B.
So, as Krste mentions, for some loss of orthogonality we get
guaranteed avoidance of the vl RAW hazard.
Krste's proposal (check SEW/LMUL invariance) handles the majority of
the use cases, trading a vl RAW concern for a vsew/lmul RAW concern.
Unfortunately, vsew/lmul RAW hazards also arise from vsetvl register
values.
Fortunately these are infrequent, so a brute-force stall on vsetvl
and/or a quiesce might be appropriate.
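
To make the SEW/LMUL invariance check concrete, here is a minimal Python sketch. It relies on VLMAX = LMUL * VLEN / SEW; the function names and the VLEN=128 example value are assumptions for illustration, not anything proposed in this thread.

```python
from fractions import Fraction

def vlmax(vlen_bits: int, sew_bits: int, lmul: Fraction) -> int:
    # VLMAX = LMUL * VLEN / SEW
    return int(Fraction(vlen_bits) * lmul / sew_bits)

def ratio_unchanged(old_sew: int, old_lmul: Fraction,
                    new_sew: int, new_lmul: Fraction) -> bool:
    # If SEW/LMUL is the same before and after, VLMAX is the same,
    # so the existing vl stays legal without ever being read.
    return Fraction(old_sew) / old_lmul == Fraction(new_sew) / new_lmul

# Example with an assumed VLEN of 128 bits:
# SEW=32, LMUL=1 -> SEW=16, LMUL=1/2 keeps the ratio, so VLMAX is unchanged.
same = ratio_unchanged(32, Fraction(1), 16, Fraction(1, 2))
# SEW=32, LMUL=1 -> SEW=8, LMUL=1 changes the ratio, so VLMAX changes.
diff = ratio_unchanged(32, Fraction(1), 8, Fraction(1))
```

The check compares only vtype fields, which is why it trades the vl RAW concern for a vsew/lmul one.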
Quiesce is not an appropriate default remedy for vsetvli x0,x0.
However, a quiesce for a failed SEW/LMUL invariance check is very
appropriate, as it is anticipated to be very rare indeed
(rare to the point, apparently, that some believe it should not be
allowed).
My points on this are:
a) ooo exceptions are hardly rare and the mechanism to invoke a
failsafe is well understood and triggered in many scenarios.
Handling this x0,x0 case could be an additional hardship,
yes, but not uniquely so nor especially arduous.
b) A full quiesce is not required, and the x0,x0 stall waiting for
updated vl can be avoided in virtually all cases by the SEW/LMUL
check.
c) the stall on x0,x0 can be asynchronous with other downstream
processing iff #1 and #3 are both approved.
i.e. the x0,x0 instruction will not introduce any further
hazard than is already present for concurrent processes that consume
vl.
d) setting vill also affects downstream (and perhaps concurrently
in-flight) RVV data operations/instructions.
The mechanism in a) has to be available in any case;
it is only that detection of the extremely rare vl mismatch
will be deferred, potentially invoking more rollback or limiting
sync opportunities.
e) even this can be mitigated by tagging vsetvli with the minimal
bits required for "VLMIN" as a speculative copy.
f) even if the recovery from SEW/LMUL or vl mismatch is abysmal,
(i.e. full checkpoint, roll back and quiesce),
the application can avoid this by using the standard formulation
of AVL in rs1 on such machines.
g) in conclusion, not setting vill when vl does not change, even if
the SEW/LMUL ratio does, need not materially introduce or exacerbate
any RAW hazards.
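
Points b) through g) can be sketched as a small decision function. This is only one reading of the behaviour argued for here, written in Python; the function name, the returned labels, and the rule "set vill only if the retained vl exceeds the new VLMAX" are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Optional

def x0x0_action(old_sew: int, old_lmul, new_sew: int, new_lmul,
                vlen_bits: int, vl: Optional[int] = None) -> str:
    """Classify how a vsetvli x0,x0 might be handled (hypothetical model)."""
    old_ratio = Fraction(old_sew) / Fraction(old_lmul)
    new_ratio = Fraction(new_sew) / Fraction(new_lmul)
    # Fast path: ratio unchanged means VLMAX unchanged, so vl is never read.
    if old_ratio == new_ratio:
        return "keep_vl_no_wait"
    # Rare slow path: the actual vl is needed before deciding.
    if vl is None:
        return "stall_for_vl"
    new_vlmax = int(Fraction(vlen_bits) * Fraction(new_lmul) / new_sew)
    # Assumed rule: vl is retained if still legal, otherwise vill is set.
    return "keep_vl" if vl <= new_vlmax else "set_vill"
```

Under this model only the ratio-changing case ever waits on vl, and an application that wants to avoid even that can supply AVL in rs1 as in f).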
My vote is still with #5.
It is consistent with past practice of deference to simplicity of
architecture at the potential expense of (ooo) microarchitecture,
especially in this case, where multiple reasonable approaches to
mitigating the RAW hazards, at low in-practice performance cost, are
possible.