Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression

Nick Knight

I believe that the (rough) idea I sketched earlier in this thread (May 8) still works with the latest version of the spec --- please correct me if I'm wrong --- what I called "sketchy type-punning" (of the mask register) is now kosher. A potential issue in my sketch is the vector-to-scalar dependence caused by vpopc.m: I can imagine this performing poorly on implementations with decoupled scalar and vector pipes. If I correctly understand Krste's suggestion of using viota.m, it would avoid this vpopc.m and a few other mask operations at the cost of converting the unit-stride load of matrix nonzeros into a gather.

Please keep in mind this is in the context of the "bitvector" sparse matrix representation that Nagendra was considering. I'm not aware of anyone using this representation for the SpMVs appearing in the HPCG benchmark, or more generally in HPC applications. (In a private email thread, Nagendra told me he had machine learning applications in mind.) The "standard" CSR SpMV algorithm is much simpler to express in RVV. Also regarding HPCG, I think it would be more interesting (and challenging) to study the preconditioner ("SymGS") than the SpMV.


On Wed, Jul 8, 2020 at 3:11 PM swallach <steven.wallach@...> wrote:
please share the asm for  spmv,  the key kernel (s),  

in any case,  the execution time for operations  using a mask,  is very implementation/machine dependent

it is a function on how aggressive,  in hardware,    addressing the vector register elements is.  is addressing always sequential  or addresses generated out of sequence based on the bit mask (in V0 

as a result of the increasing performance of HPCG, spmv is an extremely  important computation kernel.

in my experience,  the dominant  time for HPCG, is the memory latency time (short vectors).   also node to  node networking

| I am now investigating how to efficiently implement sparse matrix X (dense) vector multiplications (spMV) using RISCV vectors using bit-vector format of
| compressing the sparse matrix. The inner loop of sequential spMV algorithm simply multiply-accumulates non-zeroes of a row of M with corresponding values of the
| dense vector (say V[]) to form an output element.  To implement this efficiently where bit-vectors are used to compress M is where I have a question/observation.
| In bit-vector compression, the sparse matrix metadata is a bit map of 0s and 1s. A 0 indicates that the matrix value is a 0. A 1 indicates that the matrix value is
| non-zero. A values array -- say M_Vals[.] -- stores these non-zero values. 

WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received.

Join to automatically receive all group messages.