#### Re: Vector Task Group minutes 2020/5/15

David Horner

For those not on GitHub, I posted this to #461:

I gather what was missing from this were examples.
I prefer to consider clstr a dynamic parameter, for which some implementations will use a range of values.

However, for the sake of examples we can consider the cases where CLSTR=32.

VLEN=256b, SLEN=128, vl=8, CLSTR=32

Byte     1F1E1D1C1B1A19181716151413121110 F E D C B A 9 8 7 6 5 4 3 2 1 0
                                  7 6 5 4                         3 2 1 0 SEW=8b
                            7   6   3   2                   5   4   1   0 SEW=16b
                7       5       3       1       6       4       2       0 SEW=32b

VLEN=256b, SLEN=64, vl=13, CLSTR=32

Byte     1F1E1D1C1B1A19181716151413121110 F E D C B A 9 8 7 6 5 4 3 2 1 0
                        C         B A 9 8         7 6 5 4         3 2 1 0 SEW=8b
                7       3       6       2       5       1       4       0 SEW=32b
                        B               A               9       C       8  @ reg+1
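A minimal Python sketch of the placement rule that the two diagrams above appear to follow. It assumes that elements fill a cluster of max(CLSTR, SEW) bits contiguously, that successive clusters go round-robin across the SLEN chunks, and that SLEN is at least one cluster wide. The function name `element_byte_offset` and its parameter names are illustrative only; they do not come from #461.

```python
def element_byte_offset(i, vlen, slen, clstr, sew):
    """Return (register offset, low byte offset) of element i.

    All sizes are in bits. Assumes slen >= max(clstr, sew).
    """
    per_reg = vlen // sew                 # elements held by one register
    reg, j = divmod(i, per_reg)           # which register, index within it
    unit = max(clstr, sew)                # bits in one cluster
    cluster, within = divmod(j, unit // sew)
    nchunks = vlen // slen                # SLEN chunks per register
    slot, chunk = divmod(cluster, nchunks)  # round-robin over the chunks
    bit = chunk * slen + slot * unit + within * sew
    return reg, bit // 8


# First diagram: VLEN=256b, SLEN=128, CLSTR=32
print(element_byte_offset(4, 256, 128, 32, 8))    # element 4, SEW=8b
print(element_byte_offset(4, 256, 128, 32, 16))   # element 4, SEW=16b
# Second diagram: VLEN=256b, SLEN=64, CLSTR=32
print(element_byte_offset(12, 256, 64, 32, 32))   # element C, SEW=32b
```

Under these assumptions, element 4 lands at byte 0x10 for SEW=8b and at byte 4 for SEW=16b in the first diagram, and element C with SEW=32b lands at byte 4 of reg+1 in the second.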

By inspection unary and binary single SEW operations do not affect order.
However, a widening operation has EEW=16 and 64 respectively (from SEW=8 and 32), which will yield:

VLEN=256b, SLEN=128, vl=8, CLSTR=32

Byte     1F1E1D1C1B1A19181716151413121110 F E D C B A 9 8 7 6 5 4 3 2 1 0
                            7   6   3   2                   5   4   1   0 SEW=16b
                7       5       3       1       6       4       2       0 SEW=32b
                        3               1               2               0 SEW=64b
                        7               5               6               4  @ reg+1
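The widened rows above can be reproduced with a short Python sketch. It assumes the same inferred rule (clusters of max(CLSTR, EEW) bits, dealt round-robin across the SLEN chunks) and lists each register's elements from the highest byte down, as the diagrams do. `layout_rows` is a hypothetical helper name, not part of the proposal.

```python
def layout_rows(vl, vlen, slen, clstr, eew):
    """Element indices per register, ordered from highest byte to lowest.

    All sizes are in bits. Assumes slen >= max(clstr, eew).
    """
    per_reg = vlen // eew
    unit = max(clstr, eew)                # bits in one cluster
    per_cluster = unit // eew
    nchunks = vlen // slen
    rows = []
    for base in range(0, vl, per_reg):    # one row per register used
        placed = {}
        for j in range(min(per_reg, vl - base)):
            cluster, within = divmod(j, per_cluster)
            slot, chunk = divmod(cluster, nchunks)
            bit = chunk * slen + slot * unit + within * eew
            placed[bit] = base + j
        rows.append([placed[b] for b in sorted(placed, reverse=True)])
    return rows


# VLEN=256b, SLEN=128, vl=8, CLSTR=32
for eew in (16, 32, 64):
    print(eew, layout_rows(8, 256, 128, 32, eew))
```

With EEW=64 the eight elements no longer fit in one register, so the helper returns two rows, the second corresponding to reg+1.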

Narrowing works in reverse.
When SLEN=VLEN, clstr is irrelevant and effectively infinite, as there is no other SLEN group to advance to, so the current SLEN chunk has to be used (in round-robin fashion).
Thank you for the template to use.
I don’t think SLEN = 1/4 VLEN has to be diagrammed.
And of course, store also works in reverse of load.
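The SLEN=VLEN remark above can be checked with a small Python sketch. It assumes the placement rule inferred from the diagrams (clusters of max(CLSTR, SEW) bits, dealt round-robin across SLEN chunks); with only one chunk the round-robin always returns to the same chunk, so every CLSTR value should give a plain contiguous layout. The names `byte_offset` and `is_contiguous` are illustrative.

```python
def byte_offset(j, vlen, slen, clstr, sew):
    """Low byte offset of element j within one register (sizes in bits)."""
    unit = max(clstr, sew)                # bits in one cluster
    cluster, within = divmod(j, unit // sew)
    slot, chunk = divmod(cluster, vlen // slen)
    return (chunk * slen + slot * unit + within * sew) // 8


def is_contiguous(vlen, slen, clstr, sew):
    """True if element j always sits at byte j * sew / 8."""
    return all(byte_offset(j, vlen, slen, clstr, sew) == j * sew // 8
               for j in range(vlen // sew))


# SLEN=VLEN=256: the CLSTR value should make no difference.
for clstr in (32, 64, 128):
    print(clstr, [is_contiguous(256, 256, clstr, sew) for sew in (8, 16, 32, 64)])

# SLEN=128 with CLSTR=32: striping reorders elements for SEW=8b.
print(is_contiguous(256, 128, 32, 8))
```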

@David-Horner

On 2020-05-26 11:17 a.m., David Horner via lists.riscv.org wrote:

On Tue, May 26, 2020, 04:38, <krste@...> wrote:

----------------------------------------------------------------------

I think David is trying to find a design where bytes are contiguous
within ELEN (or some other unit < ELEN) but then striped above that to
avoid casting.
Correct
I don't think this can work.

First, SLEN has to be big enough to hold ELEN/8 * ELEN words.
I don't understand the reason for this constraint.
E.g.,
when ELEN=32, you can pack four contiguous bytes in ELEN, but then
require SLEN to have space for four ELEN words to avoid either wires
crossing SLEN partitions, or requiring multiple cycles to compute
small vectors (v0.8 design).
Still not got it.

VLEN=256b, ELEN=32b, SLEN=(ELEN**2)/8,
Byte     1F1E1D1C1B1A19181716151413121110 F E D C B A 9 8 7 6 5 4 3 2 1 0
                                  7 6 5 4                         3 2 1 0 SEW=8b
                7       6       5       4       3       2       1       0 SEW=ELEN=32b
clstr is not a count but a size.
When CLSTR is 32, this last row is
                7       5       3       1       6       4       2       0 SEW=ELEN=32b
If I understood your diagram correctly.

See #461. It is effectively what SLEN was under v0.8. But potentially configurable.

I'm doing this from my cell phone. I'll work it up better when I'm at my laptop.
