Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
toggle quoted messageShow quoted text
The goal of the Zvediv extension is to add some support for sub-byte datatypes. So my hope is that applications that use bitvectors to encode nonzero structure will be supported via Zvediv. However, Zvediv is currently a very rough draft and likely won't be included in the base V-extension.
In the meanwhile, I suspect there are some tricks you could use. The following code fragment is a rough sketch of the idea I have in mind (I expect there are bugs.)
# a0 = number of matrix columns left in current row
# a1 = pointer to bitvector (current offset)
# a2 = pointer to matrix nonzero array (current offset)
# a3 = pointer to (dense) right-hand side array (current offset)
vsetvli t0, a0, e32, m8 # stripmine loop
addi t1, t0, 7
srli t1, t1, 3 # t1 = ceil(t0/8), the number of bitvector bytes that
# encode the nonzero structure of the next t0 columns
vsetvli x0, t1, e8, m8
vle8 v0, (a1) # load bitvector chunk for next t0 columns
vsetvli x0, t0, e8, m8
vpopc.m t2, v0 # t2 = number of nonzeros in chunk;
# **DANGER** sketchy type-punning on v0
vcompress.vm v16, v8, v0
vsetvli x0, t2, e32, m8
vle32 v0, (a2) # load matrix nonzeros
vlxei8 v8, (a3), v16 # gather RHS entries;
# **DANGER** sketchy type-punning on v16
# do dot product
# bump pointers and loop to next column chunk (or next row).
My hope was that the "sketchy type-punning" should succeed when SLEN = VLEN, but it will presumably fail when SLEN < VLEN. In the latter case, in the absence of typecast instructions, we'll need to use loads/stores (or uarch-dependent vrgathers) to permute data in the VRF.
On Thu, May 7, 2020 at 11:43 AM Nagendra Gulur <nagendra.gd@...> wrote:
I am now investigating how to efficiently implement sparse matrix X (dense) vector multiplications (spMV) using RISCV vectors using bit-vector format of compressing the sparse matrix. The inner loop of sequential spMV algorithm simply multiply-accumulates non-zeroes of a row of M with corresponding values of the dense vector (say V) to form an output element. To implement this efficiently where bit-vectors are used to compress M is where I have a question/observation. In bit-vector compression, the sparse matrix metadata is a bit map of 0s and 1s. A 0 indicates that the matrix value is a 0. A 1 indicates that the matrix value is non-zero. A values array -- say M_Vals[.] -- stores these non-zero values.