Re: Vector Task Group minutes 2020/5/15
Guy Lemieux
The precise data layout pattern does not matter.
What matters is that a single distribution pattern is agreed upon, to avoid fragmenting the software ecosystem.

With my additional restriction, the load/store side of an implementation is greatly simplified, allowing for simple implementations.

The main drawback of my restriction is the overhead of the cast instruction, and the open question is how an aggressive implementation can avoid it. The cast instruction must rearrange data to translate between the LMUL!=1 and LMUL=1 data layouts; my proposal requires a cast between any loads/stores (which always assume LMUL=1) and any compute instructions that use LMUL!=1.

I think this can sometimes be done for "free" by carefully planning your compute instructions. For example, a series of vld instructions with LMUL=1, followed by a cast to LMUL>1 with the same register group as its destination, can be macro-op fused. I don't think the same can be done for vst instructions, unless the implementation macro-op fuses a longer sequence consisting of cast / vst / clear register group (or some other operation that overwrites the cast destination, indicating that the cast is superfluous and is used only by the stores).

Guy

On Wed, May 27, 2020 at 10:13 AM David Horner <ds2horner@...> wrote: