On 4/27/20 12:32 PM, krste@... wrote:
| On 4/27/20 7:02 AM, Krste Asanovic wrote:
On Mon, 27 Apr 2020 18:14:39 +0000, Bill Huffman <huffman@...> said:
|| I created a github issue for this, #434 - text repeated below,
|| Should SLEN=VLEN be an extension?
| It might be the case that the machines where SLEN=VLEN would be the same
| machines where it would be attractive to use vectors for such code -
| machines where vectors provided larger registers and some parallelism
| rather than machines where vectors usually complete in one or a few
| cycles and wouldn't deal well with irregular operations. That probably
| increases the value of an extension.
I think having vectors complete in one or a few cycles (shallow
temporal) is orthogonal to choice of SLEN=VLEN.
I think SLEN=VLEN is simply about how wide you want interactions
between arithmetic units. I'm guessing e.g. 128-256b wide datapaths
are probably OK with SLEN=VLEN, whereas 512b and up datapaths are
probably starting to see issues, independent of VLEN in either case.
Sorry, I didn't say what I meant very well. I agree that it's the width
that matters. Machines with short vector registers are likely to be
SLEN=VLEN even if the complete quickly.
In my experience 256b width is shaky and may well want SLEN=128.
In any case, I'm wondering if having cast instructions is better than an
extension. I think it avoids the potential fragmentation.
| On the other hand, adding casting operations would seem to decrease the
| value of an extension (see below).
|| A second issue either way is whether we should add "cast"
|| operations. They are primarily useful for the SLEN<VLEN machines
|| though are difficult to implement efficiently there; the SLEN=VLEN
|| implementation is just a register-register copy. We could choose to
|| add the cast operations as another optional extension, which is my
|| preference at this time.
| Where SLEN<VLEN, cast operations might be implemented as vector register
| gather operations with element index values determined by SLEN, VLEN and
Agree this is a sensible implementation strategy, but pattern is
simpler than general vrgather, and can also implement as a store(src
SEW)+load(dest SEW) across memory crossbar given that you need to
materialize/parse in-memory formats there anyway.
OK. That's also quite do-able. Physical layout and control issues
could make for either implementation, I think.
| But where SLEN=VLEN, they would be moves. If then, we add casts,
| would an SLEN=VLEN extension still be valuable?
Casting makes it possible to have a common interface, but given that
SLEN=VLEN will be common choice and it's easy for software to figure
this out, and there is a performance/complexity advantage to not using
the casts when SLEN=VLEN, I can't see mandating everyone use the
casting model as working in practice. Also, I don't believe casting
provides an efficient solution for all the use cases.
Now, a SLEN<VLEN machine could provide a configuration switch to turn
off all but the first SLEN partition (maybe what David was alluding to)
and then support the SLEN=VLEN extension albeit at reduced
Agreed. That's feasible. It might be set by vsetvl, but unchanged by
vsetvli, and implemented by reduction of VLMAX as you suggest. That
might be a reasonable tradeoff.
Maybe there's no cast and no extension. Only a bit that may reduce
performance, but makes SLEN=VLEN.
And an SLEN=VLEN machine could implement the cast extension to run
software that used those at no penalty.