Re: More thoughts on Git update (8a9fbce) Added fractional LMUL


Bill Huffman
 

On 4/27/20 12:32 PM, krste@... wrote:

On Mon, 27 Apr 2020 18:14:39 +0000, Bill Huffman <huffman@...> said:
| On 4/27/20 7:02 AM, Krste Asanovic wrote:
[...]
|| I created a github issue for this, #434 - text repeated below,
|| Krste
||
|| Should SLEN=VLEN be an extension?
||
[...]

| It might be the case that the machines where SLEN=VLEN would be the same
| machines where it would be attractive to use vectors for such code -
| machines where vectors provided larger registers and some parallelism
| rather than machines where vectors usually complete in one or a few
| cycles and wouldn't deal well with irregular operations. That probably
| increases the value of an extension.

I think having vectors complete in one or a few cycles (shallow
temporal) is orthogonal to choice of SLEN=VLEN.

I think SLEN=VLEN is simply about how wide you want interactions
between arithmetic units. I'm guessing e.g. 128-256b wide datapaths
are probably OK with SLEN=VLEN, whereas 512b and up datapaths are
probably starting to see issues, independent of VLEN in either case.

Sorry, I didn't say what I meant very well. I agree that it's the width
that matters. Machines with short vector registers are likely to be
SLEN=VLEN even if they complete quickly.

In my experience 256b width is shaky and may well want SLEN=128.

In any case, I'm wondering if having cast instructions is better than an
extension. I think it avoids the potential fragmentation.


| On the other hand, adding casting operations would seem to decrease the
| value of an extension (see below).

|| A second issue either way is whether we should add "cast"
|| operations. They are primarily useful for the SLEN<VLEN machines
|| though are difficult to implement efficiently there; the SLEN=VLEN
|| implementation is just a register-register copy. We could choose to
|| add the cast operations as another optional extension, which is my
|| preference at this time.

| Where SLEN<VLEN, cast operations might be implemented as vector register
| gather operations with element index values determined by SLEN, VLEN and
| SEW.
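
A minimal sketch of how such gather indices could be derived, at byte
granularity. It rests on two assumptions that are not spelled out in this
thread: (1) when SLEN<VLEN, element i of a SEW-bit vector is striped into
section i mod (VLEN/SLEN), at slot i div (VLEN/SLEN) within that section;
and (2) a cast must leave the register as if it had been stored at the
source SEW and reloaded at the destination SEW. The function names are
hypothetical.

```python
def reg_byte(mem_byte, sew, vlen, slen):
    """Register byte position that holds in-memory byte `mem_byte`.

    ASSUMPTION: elements are striped round-robin across VLEN/SLEN sections.
    All sizes are in bits; sew must not exceed slen.
    """
    sew_bytes = sew // 8
    sections = vlen // slen
    elem, offset = divmod(mem_byte, sew_bytes)
    section = elem % sections      # which SLEN-wide section the element lands in
    slot = elem // sections        # element position inside that section
    return section * (slen // 8) + slot * sew_bytes + offset


def cast_gather_indices(vlen, slen, sew_src, sew_dst):
    """Byte gather indices: dst_reg[k] = src_reg[indices[k]]."""
    n_bytes = vlen // 8
    indices = [0] * n_bytes
    for m in range(n_bytes):
        # The same in-memory byte m sits at different register positions
        # depending on the SEW used to lay the register out.
        indices[reg_byte(m, sew_dst, vlen, slen)] = reg_byte(m, sew_src, vlen, slen)
    return indices
```

Under these assumptions, SLEN=VLEN gives the identity index list, so the
cast degenerates to a plain register move, while SLEN<VLEN gives a fixed
permutation that depends only on SLEN, VLEN and the two SEW values.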

Agree this is a sensible implementation strategy, but pattern is
simpler than general vrgather, and can also implement as a store(src
SEW)+load(dest SEW) across memory crossbar given that you need to
materialize/parse in-memory formats there anyway.

OK. That's also quite doable. Physical layout and control issues
could favor either implementation, I think.


| But where SLEN=VLEN, they would be moves. If then, we add casts,
| would an SLEN=VLEN extension still be valuable?

Casting makes it possible to have a common interface, but given that
SLEN=VLEN will be common choice and it's easy for software to figure
this out, and there is a performance/complexity advantage to not using
the casts when SLEN=VLEN, I can't see mandating everyone use the
casting model as working in practice. Also, I don't believe casting
provides an efficient solution for all the use cases.

Now, a SLEN<VLEN machine could provide a configuration switch to turn
off all but the first SLEN partition (maybe what David was alluding to)
and then support the SLEN=VLEN extension albeit at reduced
performance.

Agreed. That's feasible. It might be set by vsetvl, but unchanged by
vsetvli, and implemented by reduction of VLMAX as you suggest. That
might be a reasonable tradeoff.
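
As a rough illustration of the VLMAX reduction: VLMAX is normally
LMUL*VLEN/SEW, and turning off all but the first SLEN partition would
amount to computing it with SLEN in place of VLEN. The sketch below is
only a model of that arithmetic; the `slen_only` flag and the function
name are hypothetical and do not correspond to any defined CSR field.

```python
from fractions import Fraction


def vlmax(vlen, slen, sew, lmul=1, slen_only=False):
    """Model of VLMAX = LMUL * VLEN / SEW (all sizes in bits).

    ASSUMPTION: when the hypothetical `slen_only` switch is set, only the
    first SLEN-bit partition of each register is used, so the effective
    VLEN becomes SLEN.
    """
    effective_vlen = slen if slen_only else vlen
    # Fraction keeps fractional LMUL values such as 1/2 exact.
    return int(Fraction(lmul) * effective_vlen // sew)
```

For example, a VLEN=512, SLEN=128 machine at SEW=32 and LMUL=1 would drop
from 16 elements to 4 with the switch set, which is the performance cost
being traded for SLEN=VLEN behavior.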

Maybe there's no cast and no extension, only a bit that may reduce
performance but makes the machine behave as SLEN=VLEN.

Bill


And an SLEN=VLEN machine could implement the cast extension to run
software that used those at no penalty.

Krste


