Re: Sparse Matrix-Vector Multiply (again) and Bit-Vector Compression
I believe that the (rough) idea I sketched earlier in this thread (May 8) still works with the latest version of the spec (please correct me if I'm wrong): what I called "sketchy type-punning" of the mask register is now kosher. One potential issue with my sketch is the vector-to-scalar dependence caused by vpopc.m; I can imagine this performing poorly on implementations with decoupled scalar and vector pipes.

If I correctly understand Krste's suggestion of using viota.m, it would avoid the vpopc.m and a few other mask operations, at the cost of converting the unit-stride load of the matrix nonzeros into a gather.

Please keep in mind that this is in the context of the "bitvector" sparse matrix representation that Nagendra was considering. I'm not aware of anyone using this representation for the SpMVs appearing in the HPCG benchmark, or more generally in HPC applications. (In a private email thread, Nagendra told me he had machine learning applications in mind.) The "standard" CSR SpMV algorithm is much simpler to express in RVV; rough sketches of all three kernels follow below.

Also, regarding HPCG: I think it would be more interesting (and more challenging) to study the preconditioner ("SymGS") than the SpMV.
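For concreteness, here is a minimal sketch of the bitvector row kernel as I described it, written with the RVV C intrinsics rather than assembly (the intrinsics API spells vpopc.m as vcpop). Caveats: the helper name bitvec_row_dot is mine, and I assume VLMAX at e64/m1 is a multiple of 8 so the per-strip mask chunks stay byte-aligned.

    #include <riscv_vector.h>
    #include <stddef.h>
    #include <stdint.h>

    /* One row of the bitvector format: `bits` holds one bit per column
       (1 = nonzero), `vals` holds the row's nonzeros packed densely,
       and `x` is the dense input vector. */
    static double bitvec_row_dot(const uint8_t *bits, const double *vals,
                                 const double *x, size_t n) {
      double sum = 0.0;
      for (size_t j = 0; j < n;) {
        size_t vl = __riscv_vsetvl_e64m1(n - j);
        vbool64_t m = __riscv_vlm_v_b64(bits + j / 8, vl);  /* mask bits */
        vfloat64m1_t xs = __riscv_vle64_v_f64m1(x + j, vl); /* dense x strip */
        /* Pack the x elements selected by the mask to the front. */
        vfloat64m1_t xc = __riscv_vcompress_vm_f64m1(xs, m, vl);
        /* The vector-to-scalar hop discussed above: the scalar pointer
           bump (and the next strip's load) waits on this popcount. */
        size_t cnt = __riscv_vcpop_m_b64(m, vl);
        if (cnt > 0) {
          /* Unit-stride load of the next cnt packed nonzeros. */
          vfloat64m1_t a = __riscv_vle64_v_f64m1(vals, cnt);
          vfloat64m1_t p = __riscv_vfmul_vv_f64m1(a, xc, cnt);
          vfloat64m1_t z = __riscv_vfmv_s_f_f64m1(0.0, cnt);
          vfloat64m1_t r = __riscv_vfredusum_vs_f64m1_f64m1(p, z, cnt);
          sum += __riscv_vfmv_f_s_f64m1_f64(r);
          vals += cnt;
        }
        j += vl;
      }
      return sum;
    }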
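And here is my reading of the viota.m variant; I may be misreading the suggestion, so treat this as a sketch under my own assumptions. The gather indices for the packed nonzeros come straight from viota.m, so the unit-stride load of vals becomes an indexed load. Note that I still keep the running nonzero count in a scalar, so this only illustrates the restructuring, not a tuned schedule. Headers are the same as above; the _tumu policy intrinsic keeps inactive and tail lanes of the accumulator undisturbed across strips.

    static double bitvec_row_dot_viota(const uint8_t *bits, const double *vals,
                                       const double *x, size_t n) {
      size_t vlmax = __riscv_vsetvlmax_e64m1();
      vfloat64m1_t acc = __riscv_vfmv_v_f_f64m1(0.0, vlmax);
      uint64_t base = 0; /* nonzeros consumed so far in this row */
      for (size_t j = 0; j < n;) {
        size_t vl = __riscv_vsetvl_e64m1(n - j);
        vbool64_t m = __riscv_vlm_v_b64(bits + j / 8, vl);
        vfloat64m1_t xs = __riscv_vle64_v_f64m1(x + j, vl);
        /* Rank of each set bit within the strip: 0, 1, 2, ... */
        vuint64m1_t iota = __riscv_viota_m_u64m1(m, vl);
        /* Byte offsets into vals: (base + rank) * sizeof(double). */
        vuint64m1_t off = __riscv_vsll_vx_u64m1(
            __riscv_vadd_vx_u64m1(iota, base, vl), 3, vl);
        /* Masked gather: only active lanes touch memory. */
        vfloat64m1_t a = __riscv_vluxei64_v_f64m1_m(m, vals, off, vl);
        acc = __riscv_vfmacc_vv_f64m1_tumu(m, acc, a, xs, vl);
        base += __riscv_vcpop_m_b64(m, vl);
        j += vl;
      }
      vfloat64m1_t z = __riscv_vfmv_s_f_f64m1(0.0, vlmax);
      vfloat64m1_t r = __riscv_vfredusum_vs_f64m1_f64m1(acc, z, vlmax);
      return __riscv_vfmv_f_s_f64m1_f64(r);
    }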
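For comparison, the "standard" CSR SpMV I had in mind, which needs no mask tricks at all: unit-stride loads of the values and column indices, a gather of x, and a reduction per row. The function name and the 0-based, 64-bit index format are assumptions for illustration.

    static void csr_spmv(size_t nrows, const uint64_t *rowptr,
                         const uint64_t *colidx, const double *vals,
                         const double *x, double *y) {
      for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (uint64_t k = rowptr[i]; k < rowptr[i + 1];) {
          size_t vl = __riscv_vsetvl_e64m1(rowptr[i + 1] - k);
          vfloat64m1_t a = __riscv_vle64_v_f64m1(vals + k, vl);
          /* Column indices -> byte offsets into x. */
          vuint64m1_t c = __riscv_vle64_v_u64m1(colidx + k, vl);
          vuint64m1_t off = __riscv_vsll_vx_u64m1(c, 3, vl);
          vfloat64m1_t xv = __riscv_vluxei64_v_f64m1(x, off, vl);
          vfloat64m1_t p = __riscv_vfmul_vv_f64m1(a, xv, vl);
          vfloat64m1_t z = __riscv_vfmv_s_f_f64m1(0.0, vl);
          sum += __riscv_vfmv_f_s_f64m1_f64(
              __riscv_vfredusum_vs_f64m1_f64m1(p, z, vl));
          k += vl;
        }
        y[i] = sum;
      }
    }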
Best,
Nick

On Wed, Jul 8, 2020 at 3:11 PM swallach <steven.wallach@...> wrote: