Re: 64-bit instruction encoding wish list

Claire Wolf <claire@...>

to me it seems like reading the dest element index from a scalar reg sounds like a significant microarchitectural overhead. can you describe why this is needed?

I would assume that use cases for this replace sequences of reduce operations interleaved with vector slide operations. If the reduction is a multi-cycle op and the slide is a single-cycle op then I would assume that getting rid of the single-cycle op won't change much in terms of performance. and if it's just to squeeze out the last bit of extra performance then maybe it would be an alternative to implement instruction function between reduce and vector slide (although that might cause issues with instruction throughput if some of the instructions are already 64-bit wide).

On Wed, 11 Mar 2020 at 16:15, Nagendra Gulur <> wrote:
How about if the destination element number came from a scalar register? So we will need only 5 bits to specify the x register. 

This may even work better than hard coding the destination inside the vector reduction instruction permitting software to dynamically control the destination. 

Best regards 

On Wed, Mar 11, 2020 at 9:11 AM Claire Wolf <claire@...> wrote:
regarding vector reduction destination: the V spec seems to allow for really large vector machines with thousands of vector elements. I'm not sure what the right bit width for the field with the reduction destination would be.

On Wed, 11 Mar 2020 at 14:57, Nagendra Gulur <> wrote:
It appears I can not edit the wiki. But I can clarify one item.

Regarding "Indexed memory accesses that implicitly scale the index by SEW/8":
Explanation: In scientific sparse matrix codes (and perhaps also DNN codes), sparse matrices are represented by column indices of non-zero values. In such cases, the loaded indices must be converted to element address offsets by scaling (left shifting) the indices by 0 / 1 / 2 / 3 positions. For eg: if SEW=32, then scale the indices by 4 (left shift by 2). The idea of the instruction capability is to specify this scaling behavior -- it is not always desirable to have scaling going on, so there needs to be a way for the instruction to specify if and what scaling is to be done. Note that this scaling operation is tied to vector loads that load the index data and not to the indexed vector loads that use these (now-scaled) indices.

I am not sure what Andrew had in mind regarding the other index width topic listed.

Since I can not edit the wiki, I have to raise another item for 64-bit encoding: vector reduction destination. Would it be possible to specify vector reduction destination explicitly in the instruction rather than always the implicit vd[0]? 

Join { to automatically receive all group messages.