Re: More thoughts on Git update (8a9fbce) Added fractional LMUL

Nick Knight

Hi Krste,

On Sat, Apr 25, 2020 at 11:51 PM Krste Asanovic <krste@...> wrote:
Could consider later adding "cast" instructions that convert a vector
of N SEW=8 elements into a vector of N/2 SEW=16 elements by
concatenating the two bytes (and similar for other combinations of
source and dest SEWs).  These would be a simple move/copy on an
SLEN=VLEN machine, but would perform a permute on an SLEN<VLEN machine
with bytes crossing between SLEN sections (probably reusing the memory
pipeline crossbar in an implementation, to store the source vector in
its memory format, then load the destination vector in its register
format).  So vector is loaded once from memory as SEW=8, then cast
into appropriate type to extract other fields.  Misaligned words might
need a slide before casting.

I recently learned from the EPI/BSC folks that cryptographic routines make frequent use of such operations. As one concrete example, they need to reinterpret an e32 vector as e64 (of length VL/2) to perform 64-bit arithmetic, then reinterpret the result as e32. They currently use SLEN-dependent type-punning; this example seems to be problematic only in the extreme case SLEN == 32.
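
To make the e32/e64 pattern concrete, here is a minimal Python sketch of what I understand the routine to be doing: view pairs of 32-bit elements as 64-bit elements, do a 64-bit add, and view the result as 32-bit elements again. It models only the logical (SLEN == VLEN) behavior. The little-endian pairing (the even-indexed element is the low half) and the helper name add64_on_e32 are my assumptions for illustration, not anything taken from the EPI/BSC code.

```python
import struct

def add64_on_e32(a, b):
    """Add two vectors of 32-bit elements as if they held 64-bit elements."""
    assert len(a) == len(b) and len(a) % 2 == 0
    n = len(a)
    # Reinterpret e32 -> e64 (length VL/2); little-endian order is assumed.
    a64 = struct.unpack(f"<{n // 2}Q", struct.pack(f"<{n}I", *a))
    b64 = struct.unpack(f"<{n // 2}Q", struct.pack(f"<{n}I", *b))
    # 64-bit modular addition; carries may cross the 32-bit boundary.
    s64 = [(x + y) & 0xFFFFFFFFFFFFFFFF for x, y in zip(a64, b64)]
    # Reinterpret the 64-bit result back as e32.
    return list(struct.unpack(f"<{n}I", struct.pack(f"<{n // 2}Q", *s64)))
```

The point of the sketch is the carry: adding 1 to the pair (0xFFFFFFFF, 0) must produce (0, 1), which a plain e32 add cannot do.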

For these types of problems, it would be useful to have a "reinterpret_cast" instruction, which changes SEW and LMUL on a register group as if SLEN==VLEN. For example,

# SEW = 32, LMUL = 4
v_reinterpret v0, e64, m1

would perform "register-group fission" on [v0, v1, v2, v3], concatenating (logically) adjacent pairs of 32-bit elements into 64-bit elements (up to, or perhaps ignoring, VL). And we would perform the inverse operation, "register-group fusion", as follows:

# SEW = 64, LMUL = 1
v_reinterpret v0, e32, m4
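
The element-level effect I have in mind can be sketched in Python as below. This is only a model of the logical (SLEN == VLEN) result, under my assumption that adjacent elements are concatenated in little-endian order; LMUL and VL handling are deliberately left out, and the function name reinterpret is hypothetical.

```python
def reinterpret(elems, sew_old, sew_new):
    """Reinterpret a list of sew_old-bit elements as sew_new-bit elements."""
    total_bits = len(elems) * sew_old
    assert total_bits % sew_new == 0, "vector does not divide into new SEW"
    # Pack every element into one big integer, element 0 in the low bits.
    bits = 0
    for i, e in enumerate(elems):
        assert 0 <= e < (1 << sew_old), "element does not fit in old SEW"
        bits |= e << (i * sew_old)
    # Slice the same bits at the new element width.
    mask = (1 << sew_new) - 1
    return [(bits >> (i * sew_new)) & mask
            for i in range(total_bits // sew_new)]
```

Widening and narrowing are the same function with the SEW arguments swapped, so applying one after the other returns the original elements.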

As you suggested, this is implementable by (sequences of) stores and loads; the advantage is that it optimizes for the (common?) case of SLEN == VLEN. There are probably also optimizations for other combinations of SLEN, VLEN, SEW_{old,new}, and LMUL_{old,new}, which could likewise be hidden from the programmer. Hence, I think it would be useful for developing portable software.

Nick Knight
