On Thu, Jun 3, 2021 at 1:08 PM Zalman Stern <zalman@...> wrote:
That's pretty handy, actually. I'm not sure it should be a property of
the V spec itself; rather, software translated by this method could
require an implementation with VLEN >= 128, and otherwise fall back to
a scalar translation.
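
As a rough sketch of that fallback, assuming the translated software can learn the target's VLEN in bits (the function and constant names are hypothetical, not anything from the V spec):

```python
# Hypothetical dispatch rule; names are illustrative, not part of the V spec.
MIN_VLEN_BITS = 128  # minimum VLEN the translated vector code assumes


def choose_translation(vlen_bits: int) -> str:
    """Pick the vector translation only when VLEN meets the minimum."""
    if vlen_bits >= MIN_VLEN_BITS:
        return "vector"
    return "scalar"  # VLEN too short (or no vector unit): use scalar code


print(choose_translation(128))
print(choose_translation(64))
```
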
For RVV, I was pretty comfortable with the requirement that VLEN >= 128
before this whole thread started. It seemed like a good length
(4 x 32b words) that matched other SIMD instruction sets, as you have
noted.
With this post, Tariq indicated that he wants to reduce the amount of
state. From this, I started to think it might be better to relax the
minimum to VLEN >= 64, or perhaps VLEN >= max(XLEN,FLEN), rather than
reducing the number of named registers [*].
Regarding performance, VLEN=32 or 64 seems ridiculously small until
you consider register grouping. The RVV-lite profile that I'm
proposing would require SEW/LMUL=8 (e.g. SEW=32 with LMUL=4). Since
VLMAX = VLEN/(SEW/LMUL), that gives VLMAX=4 for VLEN=32 and VLMAX=8
for VLEN=64. These vector lengths are long enough to get a reasonable
amount of parallelism.
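
To make the arithmetic explicit: the V spec defines VLMAX = LMUL * VLEN / SEW, i.e. VLEN divided by the SEW/LMUL ratio. A minimal sketch, where the helper name is my own and only integer LMUL is handled:

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Elements per register group: VLMAX = LMUL * VLEN / SEW (integer LMUL)."""
    return (vlen_bits * lmul) // sew_bits


# SEW=32 with LMUL=4 is one setting that gives SEW/LMUL=8.
for vlen in (32, 64, 128):
    print(vlen, vlmax(vlen, sew_bits=32, lmul=4))
```

This prints a VLMAX of 4, 8, and 16 for VLEN of 32, 64, and 128 respectively.
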
[*] Why not just restrict small implementations to 16 or 8 named
registers with VLEN >= 128? The problem is a consequence of how RVV has
chosen to implement widening and narrowing instructions, which require
register grouping. In my RVV-lite profile, I considered eliminating
register groups entirely, but this would require some other way to do
widening/narrowing, which would not be compatible with RVV. With
SEW/LMUL=32/4, a common setting, there are only 8 vector register
groups available. Cutting this to just 4 groups (16 named registers) to
save register file area seems too restrictive. Instead, I think relaxing
the requirement to VLEN >= 64 achieves the same effect (halving the
required register file size) without requiring such a restriction.
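
The trade-off in this footnote can be checked with a little arithmetic. The sketch below assumes the minimum register file state is simply (named registers) x VLEN bits, and that LMUL=4 groups are built from sets of 4 named registers; the helper names are my own.

```python
def regfile_bits(num_regs: int, vlen_bits: int) -> int:
    """Minimum vector register file state in bits."""
    return num_regs * vlen_bits


def num_groups(num_regs: int, lmul: int) -> int:
    """Register groups usable at a given integer LMUL."""
    return num_regs // lmul


options = {
    "baseline: 32 regs, VLEN=128": (32, 128),
    "fewer regs: 16 regs, VLEN=128": (16, 128),
    "shorter VLEN: 32 regs, VLEN=64": (32, 64),
}
for name, (regs, vlen) in options.items():
    print(name, "->", regfile_bits(regs, vlen), "bits,",
          num_groups(regs, 4), "groups at LMUL=4")
```

Both reduced options halve the state from 4096 to 2048 bits, but only the VLEN=64 option keeps 8 groups at LMUL=4; the 16-register option drops to 4.
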