Re: Vector Task Group minutes 2020/5/15

David Horner

that may be

On 2020-05-19 7:14 p.m., Bill Huffman wrote:
I believe it is provably not possible for our vectors to have more than
two of the following properties:
For me, the definitions contained in 1, 2 and 3 need to be more rigorously defined before I can agree that the constraints/behaviours described are provably inconsistent in aggregate.

1. The datapath can be sliced into multiple slices to improve wiring
such that corresponding elements of different sizes reside in the
same slice.
2. Memory accesses containing enough contiguous bytes to fill the
datapath width corresponding to one vector can spread evenly
across the slices when loading or storing a vector group of two,
four, or eight vector registers.
This one is particularly difficult for me to formalize.
When vl = vlen * lmul (for lmul 2, 4 or 8), cache lines can be requested in an order such that, when they arrive, the corresponding segments can be filled.
So I'm not sure whether the focus here is an efficiency concern?
3. The operation corresponding to storing a register group at one
element width and loading back the same number of bytes into a
register group of the same size but with a different element width
results in exactly the same register position for all bytes.
What we can definitely prove is that a specific design has specific characteristics and eliminates other characteristics.
I agree that the current design has the characteristics you describe.

However, for #3, it appears to me that a facility that clusters elements smaller than a certain size still allows behaviours 1 and 2.
Further, for element lengths up to that cluster size, in-register order matches the in-memory order.

The SLEN solution we've had for some time allows for #1 and #2. We're
discussing requiring "cast" operations in place of having property #3.
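
To make the tension between #1 and #3 concrete, here is a minimal sketch of a striped layout. It assumes a single register (LMUL=1), SEW <= SLEN, and that consecutive elements are distributed round-robin across the SLEN-wide slices; the function names are made up for illustration and none of this is text from the spec.

```python
# Illustrative model only: round-robin striping of elements across
# SLEN-wide slices for one vector register (LMUL=1), with SEW <= SLEN.
# All function names here are hypothetical.

def reg_byte_pos(mem_byte, sew_bits, vlen_bits, slen_bits):
    """Register byte position of the mem_byte-th byte of a unit-stride load."""
    ebytes = sew_bits // 8               # bytes per element
    nslices = vlen_bits // slen_bits     # number of SLEN-wide slices
    slice_bytes = slen_bits // 8         # bytes per slice
    elem, b = divmod(mem_byte, ebytes)   # element index, byte within element
    slice_idx = elem % nslices           # elements go round-robin over slices
    idx_in_slice = elem // nslices       # position of the element in its slice
    return slice_idx * slice_bytes + idx_in_slice * ebytes + b

def layout(sew_bits, vlen_bits, slen_bits):
    """For each memory byte offset, the register byte it lands in."""
    return [reg_byte_pos(m, sew_bits, vlen_bits, slen_bits)
            for m in range(vlen_bits // 8)]

def slice_of_element(elem, vlen_bits, slen_bits):
    """Slice holding element `elem`; independent of SEW in this model."""
    return elem % (vlen_bits // slen_bits)

if __name__ == "__main__":
    VLEN, SLEN = 128, 64
    for sew in (8, 16, 32, 64):
        print(sew, layout(sew, VLEN, SLEN))
```

In this model element i lands in slice i mod (VLEN/SLEN) regardless of SEW, which is property #1. However, layout(8, 128, 64) and layout(32, 128, 64) differ, so storing at one element width and loading back at another moves bytes within the register, which is why #3 is lost. With SLEN = VLEN both layouts reduce to the identity mapping.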

I wonder whether we should look again at giving up property #2 instead.
I also agree with reconsidering #2.
It would cost additional logic in wide, sliced datapaths to keep up
memory bandwidth.
Here, I believe, is where you introduce implementation efficiency.
Once implementation design considerations are introduced, the proof becomes much more complex,
compounded by objectives and technical tradeoffs, and it becomes less an issue of mathematical rigor.

But the damage might be less than requiring casts and
the potential of splitting the ecosystem?
I also agree with you that reconsidering #2 can lead to conceptually simpler designs that perhaps will result in less ecosystem fragmentation.
However, anticipating a community's response to even the smallest of changes is crystal ball material.

There are many variations and approaches still open to us to address in-register and in-memory order agreement, and to address widening approaches (in particular, interleaving or striping with generalized SLEN parameters).

I'm still waiting on the proposed casting details. If that resolves all our concerns, great.

In the interim, I believe it may be a worthwhile exercise to consider equivalences of functionality.

Specifically, vertical striping vs horizontal interleave for widening ops, and in-register vs in-memory order for element width alignment.
I hope that the more of these we identify, the easier it will be to compare them and evaluate trade-offs.

I also think it would be constructive to consider big-endian vs little-endian, with attention to granularity (inherent in big-endian and obscured in little-endian, although aligned vs unaligned is still relevant).


On 5/15/20 11:55 AM, Krste Asanovic wrote:

Date: 2020/5/15
Task Group: Vector Extension
Chair: Krste Asanovic
Co-Chair: Roger Espasa
Number of Attendees: ~20
Current issues on github:

Issues discussed:

# MLEN=1 change

The new layout of mask registers with fixed MLEN=1 was discussed. The
group was generally in favor of the change, though there is a proposal
in flight to rearrange bits to align with bytes. This might save some
wiring but could increase bits read/written for the mask in a

#434 SLEN=VLEN as optional extension

Most of the time was spent discussing the possible software
fragmentation from having code optimized for SLEN=VLEN versus
SLEN<VLEN, and how to avoid it. The group was keen to prevent possible
fragmentation, so is going to consider several options:

- providing cast instructions that are mandatory, so at least
SLEN<VLEN code runs correctly on SLEN=VLEN machines.

- consider a different data layout that could allow casting up to ELEN
(<=SLEN); however, such layouts appear to result in even greater variety
of layouts or dynamic layouts

- invent a microarchitecture that can appear as SLEN=VLEN but
internally restricts datapath communication to within SLEN width of
datapath, or prove this is impossible/expensive

# v0.9

The group agreed to declare the current version of the spec as 0.9,
representing a clear stable step for software and implementors.
