Re: Vector Task Group minutes 2020/5/15
Nick Knight
I appreciate this discussion about making things friendlier to software. I've always felt the constraints on SLEN-agnostic software to be a nuisance, albeit a mild one. However, I am concerned that removing LMUL > 1 memory operations would cause code bloat. This concern is purely subjective: I have not done any concrete analysis. But here is an example that illustrates it:

    # C code:
    # int8_t x[N]; for(int i = 0; i < N; ++i) ++x[i];
    # keep N in a0 and &x[0] in a1

    # "BEFORE" (original RVV code):
    loop:
        vsetvli t0, a0, e8,m8
        vle8.v  v0, (a1)
        vadd.vi v0, v0, 1
        vse8.v  v0, (a1)
        add     a1, a1, t0
        sub     a0, a0, t0
        bnez    a0, loop

    # "AFTER" removing LMUL > 1 loads/stores:
    loop:
        vsetvli t0, a0, e8,m8
        mv      t1, t0          # t1 = elements of this strip still to load
        mv      a2, a1          # a2 = address of the next LMUL=1 chunk
        # loads:
        vsetvli t2, t1, e8,m1   # t2 = elements in this chunk
        vle8.v  v0, (a2)
        add     a2, a2, t2      # advance by t2 bytes (SEW = 8)
        sub     t1, t1, t2      # fewer elements left to load
        vsetvli t2, t1, e8,m1
        vle8.v  v1, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vle8.v  v2, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vle8.v  v3, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vle8.v  v4, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vle8.v  v5, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vle8.v  v6, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vle8.v  v7, (a2)
        # cast instructions ...
        vsetvli x0, t0, e8,m8
        vadd.vi v0, v0, 1
        # more cast instructions ...
        mv      t1, t0
        mv      a2, a1
        # stores:
        vsetvli t2, t1, e8,m1
        vse8.v  v0, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vse8.v  v1, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vse8.v  v2, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vse8.v  v3, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vse8.v  v4, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vse8.v  v5, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vse8.v  v6, (a2)
        add     a2, a2, t2
        sub     t1, t1, t2
        vsetvli t2, t1, e8,m1
        vse8.v  v7, (a2)
        add     a1, a1, t0
        sub     a0, a0, t0
        bnez    a0, loop

On Wed, May 27, 2020 at 10:24 AM Guy Lemieux <glemieux@...> wrote:
> The precise data layout pattern does not matter.
