Re: Vector Task Group minutes 2020/5/15

Guy Lemieux

The precise data layout pattern does not matter.

What matters is that a single distribution pattern is agreed upon to
avoid fragmenting the software ecosystem.

With my additional restriction, the load/store side of an
implementation is greatly simplified, allowing for simple

The main drawback of my restriction is how to avoid the overhead of
the cast instruction in an aggressive implementation? The cast
instruction must rearrange data to translate between LMUL!=1 and
LMUL=1 data layouts; my proposal requires these casts to be executed
between any load/stores (which always assume LMUL=1) and compute
instructions which use LMUL!=1. I think this can sometimes be done for
"free" by carefully planning your compute instructions. For example, a
series of vld instructions with LMUL=1 followed by a cast to LMUL>1 to
the same register group destination can be macro-op fused. I don't
think the same thing can be done for vst instructions, unless it
macro-op fuses a longer sequence consisting of cast / vst / clear
register group (or some other operation that overwrites the cast
destination, indicating the cast is superfluous and only used by the


On Wed, May 27, 2020 at 10:13 AM David Horner <ds2horner@...> wrote:

This is v0.8 with SLEN=8.

Join { to automatically receive all group messages.