Re: Vector TG meeting minutes 2020/4/03

David Horner

I agree Nick.

So here is a suggestion, not made entirely facetiously:

For load byte/half/word, for example when SEW = 64:

An implementation can optimize (fuse) the sequence

strided load by 1/2/4

shift left 56/48/32

arithmetic shift right 56/48/32

but a "sign extend byte/half/word to SEW" instruction would make fusing/chaining simpler.

And all of this without widening.
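To make the shift pair concrete, here is a minimal scalar sketch in C of what the three-instruction sequence computes per element when SEW = 64. The function names `sext_by_shifts` and `load_bytes_sext64` are hypothetical, chosen only for illustration, and this models the arithmetic only, not any vector instruction encoding or strided addressing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the low `bits` bits of x to 64 bits
 * using a left shift followed by an arithmetic right shift.
 * For bits = 8/16/32 the shift amount is 56/48/32, as in the sequence above.
 * Assumption: the unsigned-to-signed conversion wraps (two's complement) and
 * >> on a negative int64_t is an arithmetic shift. Both are
 * implementation-defined in C, though this is the behavior of mainstream
 * compilers. */
static int64_t sext_by_shifts(uint64_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    unsigned sh = 64u - bits;
    uint64_t left = x << sh;        /* move the narrow value's sign bit to bit 63 */
    return (int64_t)left >> sh;     /* shift back down, replicating the sign bit */
}

/* Hypothetical scalar model of "load bytes into 64-bit elements, shift left 56,
 * arithmetic shift right 56": each source byte becomes one sign-extended
 * 64-bit element. */
static void load_bytes_sext64(const uint8_t *src, int64_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = sext_by_shifts(src[i], 8);
}
```

A dedicated sign-extend-to-SEW instruction would replace the two shifts in this model with a single operation, which is the simplification suggested above.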

For stores:

A “pack” instruction that packs the byte/half/word held in each SEW element, SLEN chunk by SLEN chunk, into the appropriate LMUL=1/8, 1/4 or 1/2 would allow the standard unit-stride store to work.

A fractional LMUL that uses interleaving (rather than right-justified SLEN chunks) would not need this pack instruction.

On 2020-04-04 8:04 p.m., Nick Knight wrote:

Hi Thang,

Can you, and anyone else who responds, please be concrete about the applications you have in mind? I tried to do so in my email.

In my opinion, concrete examples are crucial to making an informed decision. I hope you agree.

Nick Knight

On Sat, Apr 4, 2020 at 4:56 PM Thang Tran <thang@...> wrote:
There is real application code (mixed integer/FP, where the convert instruction is used) written with load/store byte/halfword/word. Adding a widening instruction has a huge performance impact in a small critical loop, where every additional instruction causes a > 10% impact on performance.

I am strongly against dropping byte/halfword/word load/store.

Thanks, Thang

-----Original Message-----
From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Krste Asanovic
Sent: Saturday, April 4, 2020 1:43 PM
To: tech-vector-ext@...
Subject: [RISC-V] [tech-vector-ext] Vector TG meeting minutes 2020/4/03

Date: 2020/4/03
Task Group: Vector Extension
Chair: Krste Asanovic
Number of Attendees: ~15
Current issues on github:

Issues discussed: #354/362

The following issues were discussed.

Closing on version v0.9. A list of proposed changes to form v0.9 was presented.  The main dispute was around dropping byte/halfword/word vector load/stores.

#354/362 Drop byte/halfword/word vector load/stores

Most of the meeting time was spent discussing this issue, which was contentious.

Participants in favor of retaining these instructions were concerned about the code size and performance impact of dropping them.
Proponents of dropping them noted that the main impact was only on integer code (floating-point code does not benefit from these instructions), that performance might be lower using these instructions rather than widening, and that there was a large benefit in reducing memory pipeline complexity.  The group was going to consider some examples to be supplied by the members, including some mixed floating-point/integer code.

Discussion to continue on mailing list.
