Intriguing. I especially appreciate the effort to identify
fundamental aspects of fractional LMUL.
On 2020-03-21 3:46 p.m., Bill Huffman
wrote:
Maybe this has been stated
already, and I just haven't seen it, but it seems like there's
a constraint with fractional LMUL that wasn't there before.
The compiler must ensure that: LMUL >= SEW/ELEN.
As you mention below, this
constraint is problematic when VLEN=ELEN, or a low multiple of
ELEN.
I appreciate you stating the
concern formally.
Informally, I've considered that I
would need LMUL>=4 on a minimal machine to make meaningful
use of even such operations as vslide1up/vslide1down for SEW=ELEN.
(vslideup and vslidedown are even
worse, with the variable range severely limited.)
I think a further refinement of the constraint to incorporate the
effects with other out of lane operations would be very helpful
guidance for developing code that can run across the full spectrum
of allowed designs.
I think that my concerns for lane crossing with fractional
LMUL came from mentally violating this constraint without
realizing its significance.
Bill
In more detail, a number of
constraints, and their origins, must be:
1. VLEN >= ELEN, from the spec
2. VLMAX = LMUL*VLEN/SEW, from the spec
3. VLMAX >= 1, seems obvious
4. LMUL*VLEN/SEW >= 1, from #2 and #3
5. LMUL >= SEW/VLEN, from #4
6. LMUL >= SEW/ELEN, if we are not to increase minimum VLEN in #1
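To make the chain above concrete, here is a minimal Python sketch. The function names (vlmax, lmul_is_portable) and the choice ELEN=64 are illustrative assumptions, not anything from the spec or a toolchain. It checks that on a minimal machine (VLEN=ELEN) the compiler-visible form LMUL >= SEW/ELEN is exactly the condition VLMAX >= 1.

```python
from fractions import Fraction

def vlmax(lmul, vlen, sew):
    """VLMAX = LMUL * VLEN / SEW, kept exact with Fraction."""
    return Fraction(lmul) * vlen / sew

def lmul_is_portable(lmul, sew, elen):
    """LMUL >= SEW/ELEN: stated without reference to VLEN or VLMAX."""
    return Fraction(lmul) >= Fraction(sew, elen)

ELEN = 64      # assumed for illustration
VLEN = ELEN    # the minimal machine allowed by VLEN >= ELEN

# On the minimal machine the two conditions coincide for every setting.
for lmul in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), 1, 2, 4, 8):
    for sew in (8, 16, 32, 64):
        assert lmul_is_portable(lmul, sew, ELEN) == (vlmax(lmul, VLEN, sew) >= 1)
```

For instance, vlmax(4, 64, 64) gives 4 elements for SEW=ELEN at LMUL=4 on such a machine, which is the situation described informally above.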
We may need to address this with
target classes, similar to UNIX vs embedded.
Or at least note how these
constraints affect each.
#6 is put into compiler-visible terms
(without VLEN or VLMAX). It is a new constraint, I think:
previously LMUL >= 1 always held, so it was automatically satisfied
because of the existing constraint that ELEN >= SEW.
Not only is it a constraint, but
violating it would bring no advantage. The computation would
use registers just as efficiently with a larger LMUL. For
example, if there were only ELEN/2 and ELEN elements and
widening/narrowing operations between them, LMUL < 1/2
would not reduce the number of registers used, but would
reduce VLMAX for longer vectors without providing any
advantage for shorter vectors.
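The ELEN/2 and ELEN example can be worked through numerically. The sketch below is only illustrative: ELEN=64 and the helper names are assumptions, and regs_used assumes that a fractional-LMUL group still ties up one whole register, which is the premise of the paragraph above.

```python
from fractions import Fraction
from math import ceil

ELEN = 64  # assumed for illustration

def vlmax(lmul, vlen, sew):
    """VLMAX = LMUL * VLEN / SEW, kept exact with Fraction."""
    return Fraction(lmul) * vlen / sew

def regs_used(lmul):
    """Assumption: a fractional-LMUL group still occupies one whole register."""
    return max(1, ceil(Fraction(lmul)))

def compare(vlen):
    """Narrow (SEW=ELEN/2) and wide (SEW=ELEN) operands, wide LMUL = 2 * narrow LMUL."""
    results = {}
    for name, narrow_lmul in (("satisfies", Fraction(1, 2)),
                              ("violates", Fraction(1, 4))):
        wide_lmul = 2 * narrow_lmul
        results[name] = {
            "regs": regs_used(narrow_lmul) + regs_used(wide_lmul),
            # VLMAX is the same for the narrow and the wide operand.
            "vlmax": vlmax(narrow_lmul, vlen, ELEN // 2),
        }
    return results

long_vec = compare(256)   # a longer-vector machine
minimal = compare(64)     # VLEN = ELEN
```

Under these assumptions both choices use two registers, while the smaller LMUL halves VLMAX at VLEN=256 and leaves VLMAX below 1 at VLEN=ELEN.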
I think the concern here may be
addressed by modifications to the fractional LMUL model.
In particular I have another
proposal (not surprised? ;)
I hope to ensure that it addresses these concerns.
Thank you again for your insights.