Re: Vector Task Group minutes 2020/5/15


Bill Huffman
 

Hi David,

I wasn't at all trying to suggest that we could use the store/load
combination instead of defining cast instructions. I was only trying to
verify the required operation.

You said that it is a sufficient condition. If there's a less
constraining condition that is also sufficient, could you describe it? I'm
hoping either to prove there is not a layout or (possibly) show that
there is one.

Bill

On 5/16/20 6:06 AM, David Horner wrote:
EXTERNAL MAIL


By the way, this is similar to the "prefix" proposal that applies the
transform to selected sources and/or destinations.

The cost is reduced because the "transform" occurs while the operation
is also occurring:
  - on SLEN=VLEN machines these "prefixes" are nops
  - on SLEN<VLEN microarchitectures the specific combination of transform
    and operation can be optimized, especially for high-use combinations.

The cost is further reduced for some combinations as an intermediate
state can be used for that op and transform rather than storing a result
that has to be usable by any operation.

To the extent that these cast operations can apply in this way, the
benefit of the "prefix" over cast may be limited and not warrant a "new"
prefix mechanism like #423 (and a specific application #456).

https://github.com/riscv/riscv-v-spec/issues/423
https://github.com/riscv/riscv-v-spec/issues/456



On 2020-05-16 8:50 a.m., David Horner via lists.riscv.org wrote:
I would agree that "by definition" this is a sufficient condition to
obtain what Krste was envisioning: instructions that are also nops on
SLEN=VLEN machines.
It is a sufficient condition to address the byte mismatch when SLEN <
VLEN.

However, is it necessary, given that it is a very expensive operation
when SLEN<VLEN?

Are there casting instructions that are reasonably low cost on both
SLEN=VLEN and SLEN<VLEN machines and that create an intermediate state
that works for both?

And if there are such operations, do you provide only them (and NOT
the heavy-handed "as if written to memory and back")?
Can two such instructions do the full transition from SLEN<VLEN to
SLEN=VLEN?
If so, is it sufficiently easy to recognize such a pair and fuse it into
a nop on SLEN=VLEN systems?
Can applications alternatively rely on a linkage editor to nop them?

I have no good solution (yet), as the range of microarchitectural
tricks is not my forte.

But there are others who undoubtedly are mulling over such
considerations.

It would not be a lose-win proposition but a limited win-win.

I look forward to Krste's proposals. I have been surprised before!!

On 2020-05-16 2:07 a.m., Bill Huffman wrote:
It seems like the function of a cast instruction is the same as storing to
memory (stride-1) with one SEW and loading back the same number of bytes
with another SEW.  Is that a correct understanding?
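
The store/load reading of a cast can be sketched in a few lines of
Python. This is only a toy model, not the spec's definition: it assumes
LMUL=1 and a simplified striping rule in which element i lands in stripe
i mod (VLEN/SLEN). The helper names (element_offset, store, load, cast)
are made up for illustration, and all sizes are in bytes.

```python
def element_offset(i, sew, vlen, slen):
    """Byte offset of element i inside the register (assumed toy striping)."""
    stripes = vlen // slen          # 1 when SLEN=VLEN
    stripe = i % stripes            # which SLEN-wide stripe holds element i
    slot = i // stripes             # position of the element inside that stripe
    return stripe * slen + slot * sew


def store(reg, sew, vlen, slen):
    """Unit-stride store: elements go to memory in element order."""
    mem = bytearray()
    for i in range(vlen // sew):
        off = element_offset(i, sew, vlen, slen)
        mem += reg[off:off + sew]
    return bytes(mem)


def load(mem, sew, vlen, slen):
    """Unit-stride load: the inverse placement for the given SEW."""
    reg = bytearray(vlen)
    for i in range(vlen // sew):
        off = element_offset(i, sew, vlen, slen)
        reg[off:off + sew] = mem[i * sew:(i + 1) * sew]
    return bytes(reg)


def cast(reg, sew_from, sew_to, vlen, slen):
    """Cast modeled as: store with one SEW, load the same bytes with another."""
    return load(store(reg, sew_from, vlen, slen), sew_to, vlen, slen)


if __name__ == "__main__":
    reg = bytes(range(32))                      # VLEN = 256 bits = 32 bytes
    print(cast(reg, 4, 1, 32, 32) == reg)       # SLEN=VLEN case
    print(cast(reg, 4, 1, 32, 16) == reg)       # SLEN<VLEN case
```

Under these assumptions the cast leaves the register bytes unchanged
when SLEN=VLEN, and is a byte permutation within the register when
SLEN<VLEN.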

       Bill

On 5/15/20 11:55 AM, Krste Asanovic wrote:

Date: 2020/5/15
Task Group: Vector Extension
Chair: Krste Asanovic
Co-Chair: Roger Espasa
Number of Attendees: ~20
Current issues on github:
https://github.com/riscv/riscv-v-spec


Issues discussed:

# MLEN=1 change

The new layout of mask registers with fixed MLEN=1 was discussed.  The
group was generally in favor of the change, though there is a proposal
in flight to rearrange bits to align with bytes.  This might save some
wiring but could increase bits read/written for the mask in a
microarchitecture.
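
As a rough illustration of a fixed MLEN=1 layout, the sketch below
assumes that the mask bit for element i is simply bit i of the mask
register, independent of SEW. The function names are hypothetical and
the code only models the bit packing, nothing about the hardware.

```python
def pack_mask(flags):
    """Pack per-element booleans into bytes: element i -> bit i (assumed MLEN=1)."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def mask_bit(mask, i):
    """Read back the mask bit for element i."""
    return (mask[i // 8] >> (i % 8)) & 1


if __name__ == "__main__":
    m = pack_mask([True, False, True, True])
    print(m.hex(), [mask_bit(m, i) for i in range(4)])
```

With this packing, the number of mask bits used depends only on the
number of elements, not on the element width.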

#434 SLEN=VLEN as optional extension

Most of the time was spent discussing the possible software
fragmentation from having code optimized for SLEN=VLEN versus
SLEN<VLEN, and how to avoid it.  The group was keen to prevent possible
fragmentation, so is going to consider several options:

- providing cast instructions that are mandatory, so at least
    SLEN<VLEN code runs correctly on SLEN=VLEN machines.

- considering a different data layout that could allow casting up to ELEN
    (<=SLEN); however, such layouts appear to result in an even greater
    variety of layouts or in dynamic layouts

- inventing a microarchitecture that can appear as SLEN=VLEN but
    internally restricts datapath communication to within an SLEN width
    of the datapath, or proving this is impossible/expensive


# v0.9

The group agreed to declare the current version of the spec as 0.9,
representing a clear stable step for software and implementors.









