Hi,
Perhaps instead of using a bit vector to encode the entire matrix, we can use one to encode each sub-block.
There is a common sparse matrix format called BCSR that groups the nonzero values of CSR into blocks, which reduces col_ind[] storage and lets us reuse entries of the vector x.
The main disadvantage of BCSR is that we have to pad the blocks with zeros. Instead, we could use a bit mask to encode the nonzeros of each sub-block, as in Nagendra's bit vector implementation, so that the padding overhead is avoided.
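To make the idea concrete, here is a rough sketch in C of how one block might be stored as a mask plus only its nonzeros. The 2x2 block size, the MaskedBlock struct, and pack_block() are hypothetical names chosen for illustration and are not taken from Nagendra's implementation. Bit k of the mask stands for element k in row-major order, so bit 0 is the top-left entry.

```c
#include <stdint.h>

/* Hypothetical block dimensions, chosen only for this sketch. */
enum { BR = 2, BC = 2 };

/* Assumed layout: a bit mask plus the packed nonzeros of one block. */
typedef struct {
    uint8_t mask;          /* bit k set => element k (row-major) is nonzero */
    float   vals[BR * BC]; /* nonzeros in row-major order; only the first
                              popcount(mask) entries are meaningful */
} MaskedBlock;

/* Pack a dense row-major block; returns the number of stored nonzeros. */
static int pack_block(const float dense[BR * BC], MaskedBlock *out)
{
    int n = 0;
    out->mask = 0;
    for (int k = 0; k < BR * BC; ++k) {
        if (dense[k] != 0.0f) {
            out->mask |= (uint8_t)(1u << k);
            out->vals[n++] = dense[k];
        }
    }
    return n;
}
```

With this layout no padded zeros are stored in vals[]; the cost is one mask per block in addition to the block's column index.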
I could not find good reduction instructions for tiled matrix-vector multiplication when a block contains multiple rows.
One sub block:
A =
a b
0 d
Corresponding x:
x =
e
f
Bit vector:
1 1 0 1
Computation:
A (row-major):  a  b  0  d
x (repeated):   e  f  e  f
fmul:           ae bf 0e df
accumulate (reduction): ae+bf, 0e+df
(Note that we can skip the multiplication by zero using the bit mask.)
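The computation above could be sketched in scalar C roughly as follows. This is only an illustration under my own assumptions: a 2x2 block, a mask whose bit k stands for element k in row-major order, and nonzeros packed in that same order. The function name masked_block_spmv() is hypothetical. It does not answer the reduction question, since the per-row accumulation is a plain scalar loop here rather than a vector instruction.

```c
#include <stdint.h>

/* Hypothetical block dimensions, chosen only for this sketch. */
enum { BR = 2, BC = 2 };

/* y[r] += sum over c of A[r][c] * x[c] for one block, visiting only the
 * elements whose mask bit is set. vals holds the nonzeros packed in
 * row-major order (assumed layout). */
static void masked_block_spmv(uint8_t mask, const float *vals,
                              const float x[BC], float y[BR])
{
    int next = 0; /* index of the next packed nonzero */
    for (int r = 0; r < BR; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < BC; ++c) {
            if (mask & (1u << (r * BC + c))) {
                acc += vals[next++] * x[c]; /* zero entries are skipped */
            }
        }
        y[r] += acc; /* per-row reduction */
    }
}
```

For the block in the example, the mask would have bits 0, 1 and 3 set, vals would hold a, b, d, and the call would add ae+bf to y[0] and df to y[1].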
Thanks,
Dawei