Re: Vector Task Group minutes 2020/12/04
I'm not contemplating changing the mask design (yet again) at this point
in the process. I don't see any great advantage to any of this last
round of proposals, as they all have significant downsides for some
part of the implementation space. The current design, like any real
design, is not perfect, but it does balance a lot of competing concerns
coming from different design points.
@sols: The mask load instructions are being added to allow a
microarchitecture to see all common mask writes, enabling complex
microarchitectures to perform mask optimizations, in particular for
wide datapaths and for renamed registers.
Without renaming, and without deep temporal registers, having v0 be
the only mask source reduces the cost of the mask read port.
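Because a mask load is its own instruction, the hardware knows its result is a mask at decode time, rather than having to infer it from later uses of v0. As a rough illustration of what such a load computes, here is a Python sketch that assumes the packed one-bit-per-element layout (mask element i in bit i, bytes little-endian) and that only ceil(vl/8) bytes are read. `load_mask` is a hypothetical helper, not a spec mnemonic.

```python
import math

def load_mask(memory: bytes, base: int, vl: int) -> list[int]:
    """Hypothetical mask load: unpack ceil(vl/8) bytes at `base` into vl mask bits.

    Assumes mask element i comes from bit (i % 8) of byte (i // 8),
    i.e. one bit per element, least-significant bit first.
    """
    nbytes = math.ceil(vl / 8)
    chunk = memory[base:base + nbytes]
    return [(chunk[i // 8] >> (i % 8)) & 1 for i in range(vl)]

mem = bytes([0b10110001, 0b00000011])
print(load_mask(mem, 0, 10))  # [1, 0, 0, 0, 1, 1, 0, 1, 1, 1]
```

Note that only two bytes are touched for vl=10, independent of VLEN, which is part of what makes a dedicated mask load cheaper to track than a general vector load into v0.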
@swallach: The mask logical operations can be fused with masked
operations in more complex machines to reduce the software cost of only
allowing v0 to be the mask.
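To show where that software cost comes from: with v0 as the only mask source, a compound condition needs an explicit mask logical operation to build v0 before the masked operation can run. The Python sketch below uses lists as registers, assumes a mask-undisturbed policy (inactive elements keep their old value), and uses hypothetical helper names. The `mask_and` followed by `masked_add` pair is the kind of sequence a more complex machine might fuse.

```python
def compare_lt(vs1, vs2, vl):
    """Vector compare: mask bit i = 1 if vs1[i] < vs2[i], for i < vl."""
    return [1 if vs1[i] < vs2[i] else 0 for i in range(vl)]

def mask_and(ma, mb, vl):
    """Mask logical AND, producing the mask that will be placed in v0."""
    return [ma[i] & mb[i] for i in range(vl)]

def masked_add(vd, vs1, vs2, v0, vl):
    """vd[i] = vs1[i] + vs2[i] where v0[i] == 1; inactive elements keep old vd."""
    out = list(vd)
    for i in range(vl):
        if v0[i]:
            out[i] = vs1[i] + vs2[i]
    return out

# Scalar intent: if (lo < x && x < hi) x += 10;
vl = 4
x, lo, hi, ten = [1, 5, 9, 3], [2] * 4, [8] * 4, [10] * 4
m1 = compare_lt(lo, x, vl)         # [0, 1, 1, 1]
m2 = compare_lt(x, hi, vl)         # [1, 1, 0, 1]
v0 = mask_and(m1, m2, vl)          # [0, 1, 0, 1]  extra instruction to form v0
x = masked_add(x, x, ten, v0, vl)  # [1, 15, 9, 13]
```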
@sols,lidawei: Adding more dedicated mask register state increases
cost/complexity for all machines. Long LMUL needs a lot of bits to
hold a mask. Dropping longer LMUL would reduce efficiency of simple
@roger: Using x registers for masks breaks the vector-length-agnostic goal and
would limit LMUL.
@lidawei: Fractional LMUL helps with the case where you want widening
operations and lots of mask registers. If uarch utilization is low
with lower LMUL, then one solution is to increase VLEN for the same
physical datapath width.
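One way to see why a larger VLEN helps: each vector instruction occupies a datapath of width DLEN (the physical datapath width) for roughly LMUL * VLEN / DLEN cycles. Any fixed per-instruction overhead is amortized better when that number is larger. The Python sketch below is a deliberately simplified model. The fixed `overhead` of 1 cycle and DLEN = 128 are assumptions chosen only for illustration, not measurements of any real design.

```python
from fractions import Fraction

def busy_cycles(vlen, lmul, dlen):
    """Cycles one vector instruction occupies a DLEN-bit datapath: LMUL * VLEN / DLEN."""
    return Fraction(lmul) * vlen / dlen

def utilization(vlen, lmul, dlen, overhead):
    """Datapath utilization assuming a fixed per-instruction overhead (in cycles)."""
    busy = busy_cycles(vlen, lmul, dlen)
    return busy / (busy + overhead)

DLEN, OVERHEAD = 128, 1  # assumed datapath width (bits) and overhead (cycles)
for vlen, lmul in ((128, 1), (128, 4), (512, 1)):
    print(vlen, lmul, utilization(vlen, lmul, DLEN, OVERHEAD))
```

Under these assumptions, LMUL=1 with VLEN=512 keeps the 128-bit datapath as busy as LMUL=4 with VLEN=128 (4/5 in both cases), while leaving more register groups free to hold saved mask values.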
@swallach: ARM SVE uses predicates to implement vector length, so it
unsurprisingly ends up needing more mask resources. RVV vl can be
considered an additional mask that is AND-ed in with each mask.
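The "vl as an implicit mask" view can be written out directly. An element is active only if its index is below vl and its v0 bit is set. In SVE, the loop-length predicate is instead held explicitly in a predicate register (for example, one produced by WHILELT). The Python sketch below uses hypothetical helper names. It also counts active elements, in the spirit of a mask population-count instruction.

```python
def effective_mask(v0, vl, vlmax):
    """Element i is active iff i < vl AND v0[i] == 1 (vl acts as a prefix mask)."""
    vl_mask = [1 if i < vl else 0 for i in range(vlmax)]
    return [vl_mask[i] & v0[i] for i in range(vlmax)]

def mask_popcount(v0, vl, vlmax):
    """Number of active elements under both vl and the v0 mask."""
    return sum(effective_mask(v0, vl, vlmax))

v0 = [1, 1, 0, 1, 1, 0, 1, 1]
print(effective_mask(v0, vl=5, vlmax=8))  # [1, 1, 0, 1, 1, 0, 0, 0]
print(mask_popcount(v0, vl=5, vlmax=8))   # 4
```

The vl prefix never occupies a mask register here, which is why RVV needs fewer mask resources than a design that materializes the loop-length predicate.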
On Wed, 16 Dec 2020 14:08:22 -0800, Grant Martin <email@example.com> said:
| Having been a silent observer of this group for what seems like a very long time, but now recently liberated from previous
| constraints, I will observe that I have seen the use in DSPs of both dedicated mask register files and general vector
| type registers to serve this purpose, along with operations for manipulating them.
| While there are pros and cons for both, I lean to the side of not having a special mask register file and special operations,
| but instead using existing resources and operations.
| However, I have a process observation as well: it has taken the RV Vector proposal a long time to converge to a near-1.0
| specification. Would going down a different route cause enough delay and debate that it would derail the process and
| significantly delay the standardization that is desired, as opposed to more modest suggestions?
| Thanks and best regards
| Grant Martin
| On Dec 16, 2020, at 12:54 PM, swallach <firstname.lastname@example.org> wrote:
| i guess i am looking at the wrong set of apps.
| in any case, VM registers NOT in the vector registers permit robust and performance-optimized operations under mask.
| wrt extra instructions, i am neutral.
| On Dec 16, 2020, at 3:49 PM, Roger Espasa <email@example.com> wrote:
| 8 mask registers are really needed in modern outer-vectorized loops, and also in graphics shaders. I would say 16 is
| Now, and I am not defending this, if we had to go this route, I would seriously fight for masks-in-x-registers, i.e.,
| no new state, no new instructions. Only a few arch tricks to try to avoid loss of decoupling between the vector unit and
| the scalar unit. That's better than a new set of registers and instructions.
| On Wed, 16 Dec 2020 at 21:34, swallach <firstname.lastname@example.org> wrote:
| in my experience only one, maybe two, vm registers are needed.
| nested loops under if statements are rare.
|| On Dec 16, 2020, at 3:29 PM, Bill Huffman <email@example.com> wrote:
|| I don't think a separate mask register will do at all. It would take a mask register file with at least 8 and
|| maybe 16 registers. Lots of compare results need to be kept, and operations need to be done on mask registers. I
|| don't think we should have a separate mask register file.
|| -----Original Message-----
|| From: firstname.lastname@example.org <email@example.com> On Behalf Of swallach
|| Sent: Wednesday, December 16, 2020 12:26 PM
|| To: Alex Solomatnikov <firstname.lastname@example.org>
|| Cc: Krste Asanovic <email@example.com>; firstname.lastname@example.org
|| Subject: Re: [RISC-V] [tech-vector-ext] Vector Task Group minutes 2020/12/04
|| i totally agree. if this is done, then instructions like count bits, etc. can apply directly to the mask.
|| also, from a hardware implementation standpoint, the VM register can be implemented with LATCHES. this facilitates a
|| better implementation (imho) for operations under mask.
|| and yes, load and store VM are required.
|| If separate loads and stores are introduced for masks, then a separate vmask register can be introduced to avoid the
|| dual use of v0 (as a regular vector register and as a mask register) and its complications.