Vector Byte Arrangement in Wide Implementations
I've been thinking through the cases where a wide implementation that wants "slices" might have to insert a hiccup to rearrange bytes after an EEW change (now that SLEN is gone). The ones I know of, with comments, are:
So, the first question is whether these are all the cases. Are there any other cases where the EEW of a register will change?
The second question is whether to provide for spilling and filling across compilation units. The problem is that the whole-register loads all carry an EEW hint. If the desired load EEW is not known, the instruction must still encode some EEW hint, and the hardware has no way to know that it would be better off using its own prediction for this load.
It might be nice here to have a load type that tells the hardware to predict the next use type by associating it with the type in use at the time of the previous whole-register store of the same register. It might also work to have a separate whole-register store encoding which indicates that the next whole-register load could predict from the current micro-architectural value instead of using the size hint in the load. The separate store is easier to encode.
Any thoughts on how a predictor might know without adding such a load?
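To make the store-side variant concrete, here is a toy sketch in Python. It is only an illustration under my own assumptions, not a proposal for real hardware: the class `EewTagModel`, the `remember` flag (standing in for the hypothetical separate store encoding), and the one-entry-per-register table are all invented for this example. It models a per-register EEW tag for the in-register byte layout and counts a hiccup whenever a register is read with an EEW that differs from its tag.

```python
from dataclasses import dataclass, field


@dataclass
class EewTagModel:
    """Toy model: one EEW tag per architectural vector register (hypothetical)."""
    current_eew: dict = field(default_factory=dict)  # vreg -> EEW of the in-register byte layout
    saved_eew: dict = field(default_factory=dict)    # vreg -> EEW recorded at the last "remembering" store
    hiccups: int = 0

    def write(self, vreg, eew):
        """A full write lays the bytes out for eew; no rearrangement is needed."""
        self.current_eew[vreg] = eew

    def read(self, vreg, eew):
        """A read with a different EEW than the tag forces a rearrangement hiccup."""
        if self.current_eew.get(vreg, eew) != eew:
            self.hiccups += 1
        self.current_eew[vreg] = eew

    def whole_store(self, vreg, remember):
        """Whole-register store; remember=True models the hypothetical separate encoding."""
        if remember and vreg in self.current_eew:
            self.saved_eew[vreg] = self.current_eew[vreg]

    def whole_load(self, vreg, hint_eew):
        """Whole-register load; uses the remembered EEW if any, else the instruction's hint."""
        self.current_eew[vreg] = self.saved_eew.pop(vreg, hint_eew)


def spill_fill_hiccups(remember):
    """Spill v8 (EEW=32), clobber it, fill it with a generic EEW=8 hint, then use it at EEW=32."""
    m = EewTagModel()
    m.write(8, 32)                 # producer writes v8 with 32-bit elements
    m.whole_store(8, remember)     # spill; the compiler does not know the type at the fill
    m.write(8, 8)                  # register reused for something else in between
    m.whole_load(8, hint_eew=8)    # fill with an arbitrary hint
    m.read(8, 32)                  # consumer uses v8 with 32-bit elements again
    return m.hiccups
```

In this model, `spill_fill_hiccups(True)` avoids the rearrangement that `spill_fill_hiccups(False)` incurs, because the fill inherits the layout recorded at the spill rather than the arbitrary hint. The sketch says nothing about how large such a table would need to be or how it behaves when the spill and fill use different registers.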