Re: Check mask all ones / all zeros
It depends -- exactly what do you plan to do after determining if a
mask is all-0 or all-1 or other?
vpopc.m and vfirst.m can both special-case these common results via
precomputation, so both take minimal cycles. In that regard they are
equivalent, and there is no need to add your special instruction.
The problem is that both vpopc.m and vfirst.m write to the X register
file, which forces synchronization between the scalar and vector units.
That may cost extra stall cycles and so hurt performance. You could
introduce a new instruction or a CSR read which checks the mask result
in an asynchronous fashion (or not).
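To make the equivalence concrete, here is a rough software model (Python, not RVV code) of how either instruction's scalar result can be turned into an all-0 / all-1 / other answer. The function names are made up for illustration, masking by v0 is ignored, and with vl == 0 both variants report "all-0".

```python
def vpopc(mask, vl):
    """Model of vpopc.m: number of set mask bits among the first vl elements."""
    return sum(1 for bit in mask[:vl] if bit)


def vfirst(mask, vl):
    """Model of vfirst.m: index of the first set mask bit, or -1 if none."""
    for i, bit in enumerate(mask[:vl]):
        if bit:
            return i
    return -1


def classify_with_vpopc(mask, vl):
    # One population count answers both questions.
    count = vpopc(mask, vl)
    if count == 0:
        return "all-0"
    if count == vl:
        return "all-1"
    return "other"


def classify_with_vfirst(mask, vl):
    # vfirst only says "no bit set", so all-1 is tested on the inverted mask.
    if vfirst(mask, vl) == -1:
        return "all-0"
    inverted = [not bit for bit in mask[:vl]]
    if vfirst(inverted, vl) == -1:
        return "all-1"
    return "other"
```

Either way the answer ends up in a scalar register, which is exactly where the synchronization cost comes from.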
So, what exactly do you plan to do after knowing the result is all-0
or all-1? Do you want to initiate a branch or something else? Does a
precise (synchronized) result matter, or can you tolerate decoupling?
For example, it could be possible to specify that a CSR contains the
result of a mask being all-0, all-1, or otherwise, and that this CSR
is asynchronously updated. Hence, a scalar control loop may operate
until the all-0 result is finally true without causing any hard
synchronization with the vector unit. This sort of approach would work
for some computations, e.g. mandelbrot, which require a change in the
control flow after all units have achieved a certain status, and where
there is no harm in continuing an extra iteration or two due to
latency between vector instructions and the CSR.
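A small simulation of that idea, as a sketch only: the CSR is hypothetical (nothing like it is defined in the spec as far as this message goes), and the class name, the `latency` parameter and the lane model are all assumptions made for illustration. The status the scalar loop reads lags the real mask by `latency` vector steps, and the loop simply runs a few extra iterations.

```python
from collections import deque

ALL_ZERO, ALL_ONE, OTHER = "all-0", "all-1", "other"


def classify(mask):
    if not any(mask):
        return ALL_ZERO
    if all(mask):
        return ALL_ONE
    return OTHER


class LaggedMaskStatus:
    """Hypothetical CSR: reports the mask status `latency` vector steps late."""

    def __init__(self, latency, initial=OTHER):
        self._pipe = deque([initial] * latency)
        self._value = initial

    def vector_step(self, mask):
        # The vector unit publishes a status; it becomes visible later.
        self._pipe.append(classify(mask))
        self._value = self._pipe.popleft()

    def read(self):
        # Scalar side: read whatever is visible now, never stall.
        return self._value


def mandelbrot_lanes(points, max_iter, latency):
    z = [0j] * len(points)
    active = [True] * len(points)          # mask of lanes still iterating
    escape_iter = [max_iter] * len(points)
    csr = LaggedMaskStatus(latency)
    iterations = 0
    while iterations < max_iter and csr.read() != ALL_ZERO:
        for i, c in enumerate(points):     # stands in for one masked vector op
            if active[i]:
                z[i] = z[i] * z[i] + c
                if abs(z[i]) > 2.0:
                    active[i] = False
                    escape_iter[i] = iterations + 1
        csr.vector_step(active)
        iterations += 1
    return escape_iter, iterations
```

With points `[2+2j, 1+1j, 3+0j]` the escape counts come out the same for `latency=0` and `latency=2`; only the number of loop iterations differs (2 versus 4), because the extra passes run with every lane masked off.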
On Wed, May 19, 2021 at 10:49 PM Roger Ferrer Ibanez