Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
Nick Knight
I believe that the (rough) idea I sketched earlier in this thread (May 8) still works with the latest version of the spec (please correct me if I'm wrong): what I called "sketchy type-punning" (of the mask register) is now kosher.

A potential issue in my sketch is the vector-to-scalar dependence caused by vpopc.m: I can imagine this performing poorly on implementations with decoupled scalar and vector pipes. If I correctly understand Krste's suggestion of using viota.m, it would avoid this vpopc.m and a few other mask operations, at the cost of converting the unit-stride load of matrix nonzeros into a gather.

Please keep in mind this is in the context of the "bit-vector" sparse matrix representation that Nagendra was considering. I'm not aware of anyone using this representation for the SpMVs appearing in the HPCG benchmark, or more generally in HPC applications. (In a private email thread, Nagendra told me he had machine learning applications in mind.) The "standard" CSR SpMV algorithm is much simpler to express in RVV.

Also regarding HPCG, I think it would be more interesting (and challenging) to study the preconditioner ("SymGS") than the SpMV.

Best,
Nick
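
For readers following the thread, the indexing idea discussed above can be modeled in scalar C. This is only a rough sketch and not RVV code: the layout (one 64-bit column mask per row, nonzeros packed row by row) and the names `iota_from_mask`, `spmv_bitvector`, and `spmv_csr` are assumptions made for illustration, not taken from the thread or the spec. The helper mimics what viota.m computes (for each element, the count of set mask bits below it), and those counts are used as gather indices into the packed nonzeros. A plain CSR loop is included for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of viota.m semantics: idx[j] is the number of set mask
   bits strictly below bit j (an exclusive prefix count). */
static void iota_from_mask(uint64_t mask, size_t n, size_t *idx)
{
    size_t count = 0;
    for (size_t j = 0; j < n; j++) {
        idx[j] = count;
        count += (size_t)((mask >> j) & 1u);
    }
}

/* y = A*x for a hypothetical bit-vector layout: row_mask[i] has bit j set
   when A[i][j] is nonzero; vals holds the nonzeros packed row by row.
   Assumes ncols <= 64 so one mask word covers a whole row. */
void spmv_bitvector(size_t nrows, size_t ncols, const uint64_t *row_mask,
                    const double *vals, const double *x, double *y)
{
    assert(ncols <= 64);
    size_t idx[64];
    size_t base = 0; /* offset of the current row's first nonzero in vals */

    for (size_t i = 0; i < nrows; i++) {
        iota_from_mask(row_mask[i], ncols, idx);
        double sum = 0.0;
        size_t nnz = 0;
        for (size_t j = 0; j < ncols; j++) {
            if ((row_mask[i] >> j) & 1u) {
                /* Masked gather: the prefix count selects the packed value. */
                sum += vals[base + idx[j]] * x[j];
                nnz++;
            }
        }
        y[i] = sum;
        base += nnz; /* scalar bookkeeping only; not a model of any RVV sequence */
    }
}

/* y = A*x for standard CSR: row_ptr has nrows + 1 entries, and col_idx/vals
   hold the column index and value of each nonzero. */
void spmv_csr(size_t nrows, const size_t *row_ptr, const size_t *col_idx,
              const double *vals, const double *x, double *y)
{
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            sum += vals[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

In this model the matrix values are read through an index rather than sequentially, which corresponds to the unit-stride load becoming a gather; the CSR loop instead gathers from `x` and reads the values with unit stride.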
On Wed, Jul 8, 2020 at 3:11 PM swallach <steven.wallach@...> wrote:

