Re: A couple of questions about the vector spec

Guy Lemieux

1. Yes, understood about the scalar destination complication. I have to figure a better use of vd[1], vd[2] etc.
Possibly vslide1up after every reduction, producing a vector of
reductions (possibly in backwards order, unless you rearrange your
outer loop order).
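
To make the ordering point concrete, here is a rough Python model of the idea, not real vector code. The helper names (vslide1up, row_sums) and the sample rows are made up for illustration, and Python's sum() merely stands in for whatever vector reduction you are using. It assumes the slide inserts the scalar at element 0 and moves every other element up by one.

```python
def vslide1up(vec, scalar):
    """Model of a slide-up-by-one: vd[0] = scalar, vd[i + 1] = vs2[i].

    The last element of the input falls off the end.
    """
    return [scalar] + vec[:-1]


def row_sums(rows, vl):
    """Reduce each row, then slide the result into an accumulator vector."""
    acc = [0] * vl
    for row in rows:
        total = sum(row)              # stand-in for a vector reduction
        acc = vslide1up(acc, total)   # newest result lands in element 0
    return acc


rows = [[1, 2], [3, 4], [5, 6]]

# Processing rows in natural order leaves the results in backwards order.
backwards = row_sums(rows, vl=4)

# Walking the outer loop in reverse restores the natural order.
forwards = row_sums(rows[::-1], vl=4)
```

With the sample rows, backwards holds the row sums newest-first and forwards holds them oldest-first, with the unused tail element still zero.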

2. The vrgather (per 0.8 spec) does vd[i] = vs2[vs1[i]] -- I am not sure how this fixes the conversion from index to address offset. I need the bit shift to happen somewhere in the code.
I'm not suggesting that you use vrgather to convert indices to byte
offsets. I'm wondering if there is a way to handle sparse rows/columns
entirely differently that uses vrgather instead of vlx (note: I have
no idea if it's possible, as I've never tried to implement sparse
matrix code). However, vlx and vrgather are very similar (one applies
to memory byte addresses, the other applies to vector elements, so
obviously there is some difference).
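
A small Python sketch of the difference, under some assumptions: the helper names are hypothetical, elements are 4-byte little-endian values, all indices are in range, and the indexed load takes byte offsets from a base address. It only shows where the shift is needed and where it is not.

```python
def vrgather(vs2, vs1):
    """Register gather: vd[i] = vs2[vs1[i]]. Indices are element numbers."""
    return [vs2[j] for j in vs1]


def indexed_load(memory, base, byte_offsets, width):
    """Indexed load model: element i is read from memory[base + byte_offsets[i]]."""
    out = []
    for off in byte_offsets:
        start = base + off
        out.append(int.from_bytes(memory[start:start + width], "little"))
    return out


values = [10, 20, 30, 40]
width = 4                      # assumed element size in bytes
shift = 2                      # log2(width)
memory = b"".join(v.to_bytes(width, "little") for v in values)

indices = [3, 0, 2]

# Memory path: indices must first be scaled to byte offsets (the shift step).
byte_offsets = [i << shift for i in indices]
from_memory = indexed_load(memory, 0, byte_offsets, width)

# Register path: the same indices are used directly, with no scaling.
from_register = vrgather(values, indices)
```

Both paths pick out the same elements; the only extra work on the memory path is the shift.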

I am not suggesting implicit scaling but programmer specified scaling amount in the instruction (0/1/2/3 bit shift). Based on knowledge of the matrix element data type, the programmer can certainly specify a shift amount.
You are overthinking this. Well-designed vector units may be able to
fuse/chain a vsll.vv instruction with a vlx instruction. You shouldn't
think one instruction must run to completion before the next one

Note that in my sparse matrix vector multiply code, the innermost loop is 9 instructions including the scaling instruction. If this were removed, it reduces dynamic instruction count by about 10%. It seems to be a valuable saving.
Yes, it would save instruction issue bandwidth. On simple vector
implementations, it may speed things up. On complex ones, it shouldn't
make a difference as this will be an anticipated frequently occurring

