Vector Byte Arrangement in Wide Implementations

Bill Huffman

I've been thinking through the cases where a wide implementation that wants "slices" might have to introduce a hiccup to rearrange bytes because of an EEW change (since SLEN is gone).  The ones I know of, with comments, are:

  • The programmer intended to read different size elements than were written
    • This should be extremely rare.  There are lots of things to manage - such as the vector length changing.
    • The hiccup will simply happen.
  • The compiler is spilling and filling vector registers within a compilation unit
    • The store should be a whole register store, and the load should be a whole register load with a size hint for the next use.  The hiccup will then be avoided.
    • Filling with the wrong size will be a compiler performance bug.
  • The compiler is spilling and filling vector registers across compilation units
    • This probably happens only if there are callee-saved vector registers.
    • Is IPA realistic for this case, or will it happen any time there are callee-saved vector registers, regardless of compilation units?
    • If it does happen, hardware may want to predict the correct size for the fill.
    • With the current instructions, I don't think there's a way to make that happen (see below).
  • The OS is swapping processes
    • This is rare and we will live with the hiccup.
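
The spill/fill case can be illustrated with a toy model.  This is only a sketch under stated assumptions: each register carries a tag for the EEW its bytes are currently arranged for, a read at a different EEW costs one hiccup, and a whole register store costs nothing extra.  The names `VectorRegFile` and `spill_fill_hiccups` are hypothetical and do not describe any real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VectorRegFile:
    """Toy model: each register remembers the EEW its bytes are arranged for."""
    arranged_eew: dict = field(default_factory=dict)  # reg number -> EEW in bits
    hiccups: int = 0

    def write(self, reg: int, eew: int) -> None:
        # Assumption: writing a register arranges its bytes for the write's EEW.
        self.arranged_eew[reg] = eew

    def whole_load(self, reg: int, hint_eew: int) -> None:
        # Assumption: a whole register load arranges bytes per its EEW hint.
        self.arranged_eew[reg] = hint_eew

    def read(self, reg: int, eew: int) -> None:
        # Reading at a different EEW forces a rearrangement (the hiccup).
        if self.arranged_eew.get(reg, eew) != eew:
            self.hiccups += 1
            self.arranged_eew[reg] = eew


def spill_fill_hiccups(use_eew: int, fill_hint_eew: int) -> int:
    """Count hiccups for: produce value, spill, fill with a hint, use again."""
    rf = VectorRegFile()
    rf.write(1, use_eew)             # value produced at use_eew
    # Whole register store happens here; assumed free in this model.
    rf.whole_load(1, fill_hint_eew)  # fill with the compiler's size hint
    rf.read(1, use_eew)              # next use of the register
    return rf.hiccups
```

In this model, `spill_fill_hiccups(32, 32)` gives no hiccup, while a fill with the wrong hint, such as `spill_fill_hiccups(32, 8)`, gives one.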

So, the first question is whether these are all the cases.  Are there any other cases where the EEW of a register will change?

The second question is whether to provide for spilling and filling across compilation units.  The problem is that the whole register loads all have a size hint.  If the desired load EEW is not known, the instruction must still state a load EEW hint, and the hardware has no way to know that it would be a good idea to use its own prediction on this load.

It might be nice here to have a load type indicating that the hardware ought to predict the next use type by associating it with the type in use at the time of the previous whole register store of the same register.  Alternatively, it might work to have a separate whole register store encoding indicating that the next whole register load of that register could predict from the micro-architectural value current at the store instead of using the size hint in the load.  The separate store is easier to encode.
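
A minimal sketch of the store-side variant follows, assuming a small per-register table that a "predicting" store fills in and the next whole register load of the same register consumes.  The class `FillPredictor`, its methods, and the `predict` flag are all hypothetical names for illustration; nothing here corresponds to an existing encoding or micro-architecture.

```python
class FillPredictor:
    """Toy per-register predictor for the EEW of a whole register fill."""

    def __init__(self):
        # reg number -> EEW (bits) recorded by the last predicting store
        self.saved_eew = {}

    def store(self, reg, current_eew, predict=False):
        # Hypothetical "predicting" store: remember the micro-architectural
        # EEW of the register at the time of the store.
        if predict:
            self.saved_eew[reg] = current_eew
        else:
            # An ordinary whole register store leaves no prediction behind.
            self.saved_eew.pop(reg, None)

    def load_eew(self, reg, hint_eew):
        # Use the recorded EEW once if there is one, else fall back to the
        # size hint in the load.  Nested saves of the same register are not
        # handled by this sketch.
        return self.saved_eew.pop(reg, hint_eew)


# Callee saves v8 without knowing the caller's EEW (64 here), then restores
# it with an arbitrary hint of 8.
p = FillPredictor()
p.store(8, current_eew=64, predict=True)
restored = p.load_eew(8, hint_eew=8)
```

Here `restored` is the caller's EEW rather than the hint, so the caller's next use of the register would not need a rearrangement.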

Any thoughts on how a predictor might know without adding such a load?

