#### Re: Check mask all ones / all zeros

Guy Lemieux

It depends -- exactly what do you plan to do after determining if a
mask is all-0 or all-1 or other?

vpopc and vfirst can both special-case these common results via
precomputation, so they both take minimal cycles. in that regard, they are
equivalent and there is no need to add your special instruction.

the problem is that both vpopc.m and vfirst.m write to the X register
file, which forces synchronization between scalar and vector units.
this may cost extra cycles of stalling ... which may negatively affect
performance. you could introduce a new instruction or a CSR read which
checks the mask result in an asynchronous fashion (or not).

so, what exactly do you plan to do after knowing the result is all-0
or all-1 ? do you want to initiate a branch or something else? does a
precise (synchronized) result matter, or can you tolerate decoupling
delays?

for example, it could be possible to specify that a CSR contains the
result of a mask being all-0, all-1, or otherwise, and that this CSR
is asynchronously updated. hence, a scalar control loop may operate
until the all-0 result is finally true without causing any hard
synchronization with the vector unit. this sort of approach would work
for some computaitons, eg mandelbrot, which require a change in the
control flow after all units have achieved a certain status, and where
there is no harm to continuing an extra iteration or two due to
latency between vector instructions and the CSR.

g

On Wed, May 19, 2021 at 10:49 PM Roger Ferrer Ibanez
<roger.ferrer@...> wrote:

Hi all,

I could not find any instruction that immediately computes this. Apologies if I missed the obvious here.

Two options came to mind:

vpopc.m and check whether the result is 0 (all zeros) or VLMAX(SEW, LMUL). I am under the impression that population count is not a fast operation (though I guess it depends on the actual VLEN)
vfirst.m, returns -1 it the mask is all zeros. For all ones we can do vmnot.m first and then vfirst.m. Might not be much faster than vpopc.m but (at expense of vmnot.m) does not need to compute VLMAX(SEW,LMUL).

Perhaps there are other alternatives?

Thoughts on whether it'd make sense to have a specific instruction for these checks? As in one instruction that returns one of three possible results (e.g. 1 for all ones, -1 for all zeros, 0 otherwise) in a GPR.

Thank you very much,

--
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación

WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received.

http://www.bsc.es/disclaimer

Join tech-vector-ext@lists.riscv.org to automatically receive all group messages.