Re: Fractional LMUL Constraint
Intriguing. I especially appreciate the effort to identify fundamental aspects of fractional LMUL.
Maybe this has been stated already, and I just haven't seen it, but it seems like there's a constraint with fractional LMUL that wasn't there before. The compiler must ensure that: LMUL >= SEW/ELEN.
As you mention below, this constraint is problematic when VLEN=ELEN, or when VLEN is a low multiple of ELEN.
I appreciate you stating the concern formally.
Informally, I've considered that I would need LMUL>=4 on a minimal machine to make meaningful use of such operations as [even] vslide1up/vslide1down for SEW=ELEN. (vslideup and vslidedown are even worse, with the variable range severely limited.)
I think a further refinement of the constraint to incorporate the effects with other out of lane operations would be very helpful guidance for developing code that can run across the full spectrum of allowed designs.
I think that my concerns for lane crossing with fractional LMUL came from mentally violating this constraint without realizing its significance.
In more detail, the constraints and their origins are:
1. VLEN >= ELEN, from the spec
2. VLMAX = LMUL*VLEN/SEW, from the spec
3. VLMAX >= 1, which seems obvious
4. LMUL*VLEN/SEW >= 1, from #2 and #3
5. LMUL >= SEW/VLEN, from #4
6. LMUL >= SEW/ELEN, if we are not to increase the minimum VLEN in #1
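To make the chain above concrete, here is a rough Python sketch. The function names `vlmax` and `lmul_is_portable` are made up for illustration, and VLEN=ELEN=64 is only an example of a minimal machine, not a value taken from the spec. It checks that, when VLEN=ELEN, the compiler-visible test LMUL >= SEW/ELEN agrees with VLMAX >= 1.

```python
from fractions import Fraction


def vlmax(lmul, vlen, sew):
    """VLMAX = LMUL*VLEN/SEW, kept exact as a Fraction."""
    return Fraction(lmul) * vlen / sew


def lmul_is_portable(lmul, sew, elen):
    """The last constraint in the list: LMUL >= SEW/ELEN (no VLEN or VLMAX needed)."""
    return Fraction(lmul) >= Fraction(sew, elen)


# Illustrative minimal machine: VLEN = ELEN = 64 (assumed example values).
ELEN = VLEN = 64
for sew in (8, 16, 32, 64):
    for lmul in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
        # With VLEN = ELEN, the compiler-visible check matches VLMAX >= 1.
        assert lmul_is_portable(lmul, sew, ELEN) == (vlmax(lmul, VLEN, sew) >= 1)
```

On a machine with a larger VLEN the check is conservative: for example, LMUL=1/2 with SEW=64 fails it even though VLMAX would be 1 at VLEN=128. That is the price of code that also runs on the minimal design.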
We may need to address this with target classes, similar to UNIX vs embedded.
Or at least note how these constraints affect each.
#6 is put into compiler-visible terms (without VLEN or VLMAX). It is a new constraint, I think, since before now LMUL >= 1 meant it was automatically satisfied, given the existing constraint that ELEN >= SEW.
Not only is it a constraint, but violating it would bring no advantage. The computation would use registers just as efficiently with a larger LMUL. For example, if there were only ELEN/2 and ELEN elements and widening/narrowing operations between them, LMUL < 1/2 would not reduce the number of registers used, but would reduce VLMAX for longer vectors without providing any advantage for shorter vectors.
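A small sketch of that example, under two assumptions that go beyond the text: a fractional-LMUL group still occupies one whole register, and the wide operand uses twice the LMUL of the narrow one. The helper name `widening_pair` and the numbers ELEN=64, VLEN=64/256 are hypothetical.

```python
from fractions import Fraction
from math import ceil


def widening_pair(narrow_lmul, vlen, elen):
    """Registers used and VLMAX for SEW=ELEN/2 widened to SEW=ELEN."""
    narrow_lmul = Fraction(narrow_lmul)
    wide_lmul = 2 * narrow_lmul
    # Assumption: any group with LMUL < 1 still takes a full register.
    regs = ceil(narrow_lmul) + ceil(wide_lmul)
    # VLMAX is the same for both operands: LMUL*VLEN/SEW.
    vlmax = wide_lmul * vlen / elen
    return regs, vlmax


for vlen in (64, 256):
    at_bound = widening_pair(Fraction(1, 2), vlen, 64)  # narrow LMUL = 1/2
    below = widening_pair(Fraction(1, 4), vlen, 64)     # narrow LMUL = 1/4
    # Same register count, but half the VLMAX when LMUL drops below 1/2.
    assert at_bound[0] == below[0]
    assert below[1] == at_bound[1] / 2
```

With VLEN=64 the LMUL=1/4 case gives VLMAX=1/2, so it is not usable at all on the minimal machine, while on VLEN=256 it merely halves the vector length.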
I think the concern here may be addressed by modifications to the fractional LMUL model.
In particular I have another proposal (not surprised? ;)
I hope to ensure that the proposal addresses these concerns.
Thank you again for your insights.