Re: Vector TG Meeting tomorrow
Jan Wassenberg
A topic to discuss: a lower bound on VLEN. The upper bound is helpful, but even VL-agnostic code sometimes wants at least 128 bits. Examples: N parallel instances of AES (16 bytes each), or N 128-bit results from 64x64 normal or carryless multiplication.

We can get this already (assuming SEW_LMUL1MAX = 64) by setting LMUL=2, but it seems like a poor tradeoff that software should halve the number of registers/groups just so that hardware could theoretically have single-element vectors. Can we mandate VLEN >= 2*SEW_LMUL1MAX, perhaps in a profile? That would help software :)

BTW, are we intending to have the same binaries work on different implementations? It seems the only way to discover SEW_LMUL1MAX is to try various SEW/LMUL and check for vill. Because LMUL is baked into the intrinsic function name, software that wants portable binaries would have to recompile all vector code for LMUL=1,2,4,8, and then pick the first one that works. That's very burdensome; a profile guaranteeing SEW_LMUL1MAX = 64 or at least LMUL2MAX = 64 would also help a lot.

On Fri, Jul 9, 2021 at 6:58 AM Krste Asanovic <krste@...> wrote:
We’ll meet tomorrow to see if there are any remaining concerns before going into public review,