Re: Vector Task Group minutes 2020/5/15

David Horner

By the way, this is similar to the "prefix" proposal that applies the transform to selected source and/or destinations.

The cost is reduced as the "transform" is occurring while the operation is also occurring,
  -   on SLEN= machines these "prefix" are nops
  - on SLEN< micro archs the specific combination of transform and operation can be optimized - especially for high use combinations.

The cost is further reduced for some combinations as an intermediate state can be used for that op and transform rather than storing a result that has to be usable by any operation.

To the extent that these cast operations can apply in this way, the benefit of the "prefix" over cast may be limited and not warrant a "new" prefix mechanism like #423 (and a specific application #456).

On 2020-05-16 8:50 a.m., David Horner via wrote:
I would agree that "by definition" this is a sufficient condition to obtain the instructions that Krste was envisioning of instructions the also nop on SLEN=VLEN machine.
That is a sufficient condition to address the byte mismatch of SLEN < VLEN.

However, is it necessary as it is a very expensive operation for SLEN<?

Are there casting instructions that are reasonably low cost on both SLEN= and SLEN< VLEN that create an intermediate state that works for both?

And if there are such operations, do you only provide them (and NOT the heavy handed "as if written to memory and back")?
Can two such instructions do the full transition for SLEN< to SLEN=?
If so, is it sufficiently easy to recognize such a pair and fuse as a nop on SLEN= systems?
Can applications alternatively rely on a linkage editor to nop them?

 I have no good solution (yet) as the guts of the range of microarch tricks is not my forte.

But there are others who undoubtedly are mulling over such considerations.

It would not be a lose-win proposition but a limited win-win.

I look forward to Krste's proposals . I have been surprised before!!

On 2020-05-16 2:07 a.m., Bill Huffman wrote:
It seems like the function of a cast instruction the same as storing to
memory (stride-1) with one SEW and loading back the same number of bytes
with another SEW.  Is that a correct understanding?


On 5/15/20 11:55 AM, Krste Asanovic wrote:

Date: 2020/5/15
Task Group: Vector Extension
Chair: Krste Asanovic
Co-Chair: Roger Espasa
Number of Attendees: ~20
Current issues on github:;!!EHscmS1ygiU1lA!W3LXrGwuFwNIJ12NX5xQnmMbzk4zgzIDO39xVFEgrQGQSggvT8Zg9M2ElNRv61w$

Issues discussed:

# MLEN=1 change

The new layout of mask registers with fixed MLEN=1 was discussed.  The
group was generally in favor of the change, though there is a proposal
in flight to rearrange bits to align with bytes.  This might save some
wiring but could increase bits read/written for the mask in a

#434 SLEN=VLEN as optional extension

Most of the time was spent discussing the possible software
fragmentation from having code optimized for SLEN=LEN versus
SLEN<VLEN, and how to avoid.  The group was keen to prevent possible
fragmentation, so is going to consider several options:

- providing cast instructions that are mandatory, so at least
    SLEN<VLEN code runs correctly on SLEN=VLEN machines.

- consider a different data layout that could allow casting up to ELEN
    (<=SLEN), however these appear to result in even greater variety of
    layouts or dynamic layouts

- invent a microarchitecture that can appear as SLEN=VLEN but
    internally restrict datapath communication within SLEN width of
    datapath, or prove this is impossible/expensive

# v0.9

The group agreed to declare the current version of the spec as 0.9,
representing a clear stable step for software and implementors.

Join { to automatically receive all group messages.