Re: Smaller embedded version of the Vector extension

Guy Lemieux

On Thu, Jun 3, 2021 at 1:08 PM Zalman Stern <zalman@...> wrote:

> If the minimum VLEN is at least 128-bits, one can translate NEON/SSE intrinsics directly without having to have every vector instruction dominated by a loop over the vector length.
That's pretty handy, actually. I'm not sure it should be a property of
the V spec itself. Rather, software translated this way could require
an implementation with VLEN >= 128 and otherwise fall back to a scalar
translation.
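Zalman's point can be made concrete with a small model of RVV strip-mining. This is a Python sketch, not RVV code: `vlmax`, `vsetvl` and `strips_needed` are hypothetical helpers, and `vsetvl` assumes the simplified policy vl = min(AVL, VLMAX). It counts how many vector instructions one translated 4 x 32-bit SIMD operation would need for a given VLEN and LMUL.

```python
def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """VLMAX = LMUL * VLEN / SEW, the most elements one vector instruction covers."""
    n = lmul * vlen // sew
    if n < 1:
        raise ValueError("SEW too wide for this VLEN/LMUL")
    return n


def vsetvl(avl: int, vlen: int, sew: int, lmul: int) -> int:
    """Hypothetical, simplified vsetvli: grant vl = min(AVL, VLMAX).

    This is one legal policy; the spec lets an implementation pick other
    values when VLMAX < AVL < 2 * VLMAX.
    """
    return min(avl, vlmax(vlen, sew, lmul))


def strips_needed(avl: int, vlen: int, sew: int, lmul: int) -> int:
    """Number of strip-mining iterations (vector ops) needed to cover avl elements."""
    remaining, strips = avl, 0
    while remaining > 0:
        remaining -= vsetvl(remaining, vlen, sew, lmul)
        strips += 1
    return strips


# One 128-bit NEON/SSE operation: 4 lanes of 32 bits (e.g. a 4 x int32 add).
LANES, SEW = 4, 32
for vlen, lmul in [(128, 1), (64, 1), (64, 2)]:
    print(f"VLEN={vlen:3} LMUL={lmul}: {strips_needed(LANES, vlen, SEW, lmul)} strip(s)")
```

With VLEN=128 at LMUL=1 the operation always fits in one strip, so no loop is needed. At VLEN=64 it takes two strips at LMUL=1, while LMUL=2 brings it back to one at the cost of consuming register pairs.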

For RVV, I was pretty comfortable with requiring VLEN >= 128 before
this whole thread started. It seemed like a good length (4 x 32-bit
words), matching other SIMD instruction sets, as you have noted.

In his post, Tariq indicated that he wants to reduce the amount of
state. From this, I started to think it might be better to lower the
minimum to VLEN >= 64, or perhaps VLEN >= max(XLEN, FLEN), rather than
reduce the number of named registers [*].
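To see what the VLEN >= max(XLEN, FLEN) option would mean in practice, here is a quick Python sketch computing the minimum VLEN and the resulting vector register state for a few example base configurations. The configuration list and `min_vlen` are illustrative assumptions; the sketch keeps all 32 named vector registers, since the point of this option is to avoid cutting them.

```python
# Illustrative configurations: (name, XLEN, FLEN); FLEN = 0 means no scalar FP.
CONFIGS = [
    ("RV32I", 32, 0),
    ("RV32IF", 32, 32),
    ("RV32IFD", 32, 64),
    ("RV64IFD", 64, 64),
]
NUM_VREGS = 32  # RVV has 32 named vector registers, v0..v31


def min_vlen(xlen: int, flen: int) -> int:
    """Hypothetical minimum VLEN under the VLEN >= max(XLEN, FLEN) idea."""
    return max(xlen, flen)


for name, xlen, flen in CONFIGS:
    vlen = min_vlen(xlen, flen)
    state_bytes = NUM_VREGS * vlen // 8
    print(f"{name:8} min VLEN={vlen:3} bits, vector state={state_bytes} bytes")
```

For comparison, the VLEN >= 128 requirement works out to 32 x 128 / 8 = 512 bytes of vector state, so these configurations need a quarter or half of that.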

Regarding performance, VLEN=32 or 64 seems ridiculously small until
you consider register grouping. The RVV-lite profile that I'm
proposing would require SEW/LMUL=8, so VLMAX=4 for VLEN=32 and
VLMAX=8 for VLEN=64. These vector lengths are long enough to expose a
reasonable amount of parallelism.
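The arithmetic behind these numbers is VLMAX = LMUL * VLEN / SEW = VLEN / (SEW/LMUL), so fixing the ratio SEW/LMUL = 8 pins VLMAX at VLEN/8 whatever the element width. A small Python check, where `vlmax` and the list of (SEW, LMUL) pairs are illustrative choices limited to SEW <= 32:

```python
def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """VLMAX = LMUL * VLEN / SEW (elements covered by one vector instruction)."""
    return lmul * vlen // sew


# Illustrative (SEW, LMUL) pairs that all have the ratio SEW/LMUL = 8.
RATIO_8 = [(8, 1), (16, 2), (32, 4)]

for vlen in (32, 64):
    for sew, lmul in RATIO_8:
        print(f"VLEN={vlen} SEW={sew:2} LMUL={lmul}: VLMAX={vlmax(vlen, sew, lmul)}")
```

Every pair gives VLMAX=4 at VLEN=32 and VLMAX=8 at VLEN=64, matching the figures above.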

[*] Why not just restrict small implementations to 16 or 8 named
registers with VLEN >= 128? Because of how RVV implements widening and
narrowing instructions, which require register grouping. In my RVV-lite
profile, I considered eliminating register groups entirely, but that
would require some other way to do widening/narrowing, which would not
be compatible with RVV. With SEW/LMUL=32/4, a common setting, there are
only 8 vector register groups available; restricting the file to 16
named registers to save area would leave just 4, which seems too
restrictive. Instead, I think relaxing the minimum to VLEN >= 64
achieves the same effect (halving the required register file size)
without requiring such a restriction.
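The trade-off in this footnote can be tallied in a few lines of Python. The helper names are hypothetical, and the sketch only counts register-file bits and LMUL=4 register groups; it says nothing about datapath width or other area costs.

```python
def regfile_bits(num_regs: int, vlen: int) -> int:
    """Total vector register state in bits."""
    return num_regs * vlen


def usable_groups(num_regs: int, lmul: int) -> int:
    """Register groups available at an integer LMUL (each group spans LMUL registers)."""
    return num_regs // lmul


LMUL = 4  # SEW=32, LMUL=4, the common setting mentioned above
options = {
    "32 regs, VLEN=128 (baseline)": (32, 128),
    "16 regs, VLEN=128 (fewer names)": (16, 128),
    "32 regs, VLEN=64 (shorter VLEN)": (32, 64),
}
for label, (regs, vlen) in options.items():
    print(f"{label}: {regfile_bits(regs, vlen)} bits, "
          f"{usable_groups(regs, LMUL)} groups at LMUL={LMUL}")
```

Both reduced options halve the state from 4096 to 2048 bits, but only the shorter-VLEN option keeps all 8 register groups at LMUL=4.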

