Re: More thoughts on Git update (8a9fbce) Added fractional LMUL

Krste Asanovic

On Mon, 27 Apr 2020 18:14:39 +0000, Bill Huffman <huffman@...> said:
| On 4/27/20 7:02 AM, Krste Asanovic wrote:
|| I created a github issue for this, #434 - text repeated below,
|| Krste
|| Should SLEN=VLEN be an extension?

| It might be the case that the machines where SLEN=VLEN would be the same
| machines where it would be attractive to use vectors for such code -
| machines where vectors provided larger registers and some parallelism
| rather than machines where vectors usually complete in one or a few
| cycles and wouldn't deal well with irregular operations. That probably
| increases the value of an extension.

I think having vectors complete in one or a few cycles (shallow
temporal) is orthogonal to the choice of SLEN=VLEN.

I think SLEN=VLEN is simply about how wide you want interactions
between arithmetic units. I'm guessing e.g. 128-256b wide datapaths
are probably OK with SLEN=VLEN, whereas 512b and up datapaths are
probably starting to see issues, independent of VLEN in either case.

| On the other hand, adding casting operations would seem to decrease the
| value of an extension (see below).

|| A second issue either way is whether we should add "cast"
|| operations. They are primarily useful for the SLEN<VLEN machines
|| though are difficult to implement efficiently there; the SLEN=VLEN
|| implementation is just a register-register copy. We could choose to
|| add the cast operations as another optional extension, which is my
|| preference at this time.

| Where SLEN<VLEN, cast operations might be implemented as vector register
| gather operations with element index values determined by SLEN, VLEN and
| SEW.

Agree this is a sensible implementation strategy, but the pattern is
simpler than a general vrgather. It can also be implemented as a
store (src SEW) + load (dest SEW) across the memory crossbar, given
that you need to materialize/parse in-memory formats there anyway.
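
To make the store+load view concrete, here is a minimal Python sketch of
a cast modelled at byte granularity. It assumes a simplified LMUL=1
striping model in which element i lives in SLEN section
i mod (VLEN/SLEN), at slot i div (VLEN/SLEN) within that section, and
it requires SEW <= SLEN. The function names (byte_position, store,
load, cast) are hypothetical and exist only for this illustration; the
layout is my reading of the draft spec, not a normative statement.

```python
def byte_position(elem, byte_in_elem, sew, vlen, slen):
    """Byte offset in the register of one byte of one element.

    Assumed layout (illustrative only): elements are striped round-robin
    across the VLEN/SLEN sections. All widths are in bits, SEW <= SLEN.
    """
    sections = vlen // slen
    section = elem % sections      # which SLEN-wide section holds the element
    slot = elem // sections        # position of the element inside that section
    return (section * slen + slot * sew) // 8 + byte_in_elem


def store(reg, sew, vlen, slen):
    """Register bytes -> memory bytes (element 0 first, in element order)."""
    n = sew // 8
    return [reg[byte_position(m // n, m % n, sew, vlen, slen)]
            for m in range(vlen // 8)]


def load(mem, sew, vlen, slen):
    """Memory bytes -> register bytes, placing elements of width `sew`."""
    n = sew // 8
    reg = [None] * (vlen // 8)
    for m, b in enumerate(mem):
        reg[byte_position(m // n, m % n, sew, vlen, slen)] = b
    return reg


def cast(reg, src_sew, dst_sew, vlen, slen):
    """Cast modelled as store(src SEW) followed by load(dest SEW)."""
    return load(store(reg, src_sew, vlen, slen), dst_sew, vlen, slen)


# Feeding in the byte indices themselves yields the equivalent gather
# pattern: result[d] is the source byte that lands at destination byte d.
identity = list(range(256 // 8))
same = cast(identity, 32, 8, vlen=256, slen=256)     # SLEN=VLEN case
striped = cast(identity, 32, 8, vlen=256, slen=128)  # SLEN<VLEN case
```

Under this model the SLEN=VLEN cast comes out as the identity (a plain
register-register copy), while the SLEN<VLEN cast is a fixed byte
permutation that depends only on SLEN, VLEN and the two SEW values.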

| But where SLEN=VLEN, they would be moves. If then, we add casts,
| would an SLEN=VLEN extension still be valuable?

Casting makes it possible to have a common interface. But SLEN=VLEN
will be a common choice, it's easy for software to figure this out,
and there is a performance/complexity advantage to not using the
casts when SLEN=VLEN, so I can't see mandating that everyone use the
casting model as working in practice. Also, I don't believe casting
provides an efficient solution for all the use cases.

Now, an SLEN<VLEN machine could provide a configuration switch to turn
off all but the first SLEN partition (maybe what David was alluding to)
and then support the SLEN=VLEN extension albeit at reduced

And an SLEN=VLEN machine could implement the cast extension to run
software that used those at no penalty.

