Re: More thoughts on Git update (8a9fbce) Added fractional LMUL
On 2020-04-27 3:56 p.m., Bill Huffman wrote:
Perhaps you would be willing to comment on #410 Place stabler RVV control fields in bits [30:12] of vtype.

On 4/27/20 12:32 PM, krste@... wrote:

On Mon, 27 Apr 2020 18:14:39 +0000, Bill Huffman <huffman@...> said:

| On 4/27/20 7:02 AM, Krste Asanovic wrote:
[..]
|| I created a github issue for this, #434 - text repeated below,
|| Krste
||
|| Should SLEN=VLEN be an extension?
||
[...]
| It might be the case that the machines where SLEN=VLEN would be the same
| machines where it would be attractive to use vectors for such code -
| machines where vectors provided larger registers and some parallelism
| rather than machines where vectors usually complete in one or a few
| cycles and wouldn't deal well with irregular operations. That probably
| increases the value of an extension.

I think having vectors complete in one or a few cycles (shallow temporal) is orthogonal to choice of SLEN=VLEN. I think SLEN=VLEN is simply about how wide you want interactions between arithmetic units. I'm guessing e.g. 128-256b wide datapaths are probably OK with SLEN=VLEN, whereas 512b and up datapaths are probably starting to see issues, independent of VLEN in either case.

Sorry, I didn't say what I meant very well. I agree that it's the width that matters. Machines with short vector registers are likely to be SLEN=VLEN even if they complete quickly. In my experience 256b width is shaky and may well want SLEN=128. In any case, I'm wondering if having cast instructions is better than an extension. I think it avoids the potential fragmentation.

| On the other hand, adding casting operations would seem to decrease the
| value of an extension (see below).

|| A second issue either way is whether we should add "cast"
|| operations. They are primarily useful for the SLEN<VLEN machines
|| though are difficult to implement efficiently there; the SLEN=VLEN
|| implementation is just a register-register copy. We could choose to
|| add the cast operations as another optional extension, which is my
|| preference at this time.

| Where SLEN<VLEN, cast operations might be implemented as vector register
| gather operations with element index values determined by SLEN, VLEN and
| SEW.

Agree this is a sensible implementation strategy, but pattern is simpler than general vrgather, and can also implement as a store(src SEW)+load(dest SEW) across memory crossbar given that you need to materialize/parse in-memory formats there anyway.

OK. That's also quite do-able. Physical layout and control issues could make for either implementation, I think.

| But where SLEN=VLEN, they would be moves. If then, we add casts,
| would an SLEN=VLEN extension still be valuable?

Casting makes it possible to have a common interface, but given that SLEN=VLEN will be common choice and it's easy for software to figure this out, and there is a performance/complexity advantage to not using the casts when SLEN=VLEN, I can't see mandating everyone use the casting model as working in practice. Also, I don't believe casting provides an efficient solution for all the use cases.

Now, a SLEN<VLEN machine could provide a configuration switch to turn off all but the first SLEN partition (maybe what David was alluding to) and then support the SLEN=VLEN extension albeit at reduced performance.

Agreed. That's feasible. It might be set by vsetvl, but unchanged by vsetvli,

Reduction of VLMAX is not sufficient.

and implemented by reduction of VLMAX as you suggest. That might be a reasonable tradeoff.

Within each SLEN chunk the existing data will already be "scrambled".
It would be possible to load SEW=SLEN data (or load whole register) to prep the data, avoiding scrambling.
But otherwise, the new _source_ data will need to be loaded under the new mode.
And it does not address register groups (of more than 1 physical register).
To both limit to one SLEN group AND reduce register groups to a single physical register is a double whammy.

Maybe there's no cast and no extension. Only a bit that may reduce performance, but makes SLEN=VLEN.

Bill

And an SLEN=VLEN machine could implement the cast extension to run software that used those at no penalty.

Krste
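
To make the "scrambling" and "cast" points in this thread concrete, here is a rough Python model of a single vector register (LMUL=1). It is only a sketch under a stated assumption: elements are striped round-robin across the VLEN/SLEN sections, so element i lands in section i mod (VLEN/SLEN). That rule is my reading of the draft layout under discussion, not something spelled out in this thread, and the function names (`phys_byte`, `load`, `store`, `cast`) are made up for illustration. The cast is modelled the way it is described above, as a store at the source SEW followed by a load at the destination SEW.

```python
def phys_byte(mem_byte, sew, vlen, slen):
    """Register byte position that holds in-memory byte `mem_byte`.

    ASSUMPTION (not from the thread): element i is placed in section
    i % (vlen // slen), at slot i // (vlen // slen) within that section.
    All sizes are in bits; requires sew <= slen <= vlen.
    """
    assert sew <= slen <= vlen and vlen % slen == 0 and slen % sew == 0
    ebytes = sew // 8                  # bytes per element
    sections = vlen // slen            # number of SLEN-wide sections
    per_section = slen // sew          # element slots per section
    elem, off = divmod(mem_byte, ebytes)
    section, slot = elem % sections, elem // sections
    return (section * per_section + slot) * ebytes + off


def load(mem, sew, vlen, slen):
    """Unit-stride load of vlen // 8 bytes into a modelled register."""
    reg = [None] * (vlen // 8)
    for m, byte in enumerate(mem):
        reg[phys_byte(m, sew, vlen, slen)] = byte
    return reg


def store(reg, sew, vlen, slen):
    """Unit-stride store: read the register back out in memory order."""
    return [reg[phys_byte(m, sew, vlen, slen)] for m in range(vlen // 8)]


def cast(reg, src_sew, dst_sew, vlen, slen):
    """Cast modelled as store(src SEW) followed by load(dest SEW)."""
    return load(store(reg, src_sew, vlen, slen), dst_sew, vlen, slen)


if __name__ == "__main__":
    mem = list(range(32))              # 32 bytes = one 256-bit register
    r8 = load(mem, 8, 256, 128)
    r32 = load(mem, 32, 256, 128)
    print("SEW=8 and SEW=32 layouts differ:", r8 != r32)
    print("cast 8->32 matches a SEW=32 load:", cast(r8, 8, 32, 256, 128) == r32)
    print("cast is a plain move when SLEN=VLEN:",
          cast(mem, 8, 32, 256, 256) == mem)
```

In this model, a register loaded with SEW=32 under SLEN=128 and then read back as if SLEN=VLEN does not return the original byte order, while a load with SEW=SLEN leaves the bytes in memory order. Both observations match the remarks above about existing data being scrambled and about prepping data with SEW=SLEN loads, but only within the assumed striping rule.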