MLEN=1 update


Krste Asanovic
 

I've made a major update to mask encoding, pushed to repo.

The earlier change to support fractional LMUL effectively "broke" the
earlier mask encoding. The new scheme is simpler, but is different.

Please review and comment.

Krste


Nick Knight
 

Hi Krste,

It seems that many (perhaps all?) masked instructions are now allowed to overwrite the mask register. This new scheme would indeed be simpler; is that true? Can you give any high-level insight into why this is possible now when it wasn't before?

Thanks,
Nick Knight


Nick Knight
 

Never mind, I misread the git commit log. It appears that no instructions can overwrite the mask register. Sorry.

Krste Asanovic
 

Right, now there’s just a single blanket statement that no vector instruction can overwrite the mask register.

Krste (on iPhone, forgive terseness)



David Horner
 

I applaud the ordinal nature of the mask structure, which is independent of SEW and LMUL.

The problem I have is the mismatch between units of measure: bytes for element lengths versus bits for the mask. This unnecessarily compounds skew in the ordinal alignment.

Although the mask's internal structure is simpler, and simpler to express, the byte/bit mismatch means it:
- does not correspond to the reality of element units
- does not provide a usable mapping of elements across physical registers
- is not optimal for element-to-mask analysis
- does not optimize wiring for any element width; indeed, it is poor for all element widths

It is, however, good for vfirst and related mask ops.

I botched the formula in #448. I should have proposed:
bit_location_of_mask[i] = ((i * 8) % VLEN) + floor((i * 8) / VLEN)

This places each mask bit at the byte offset corresponding to its element, with each bit within the byte identifying one of the 8 physical registers.
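
As a concrete illustration, here is a minimal C sketch of the two layouts (the function names and the VLEN=128 example are mine, for illustration only), assuming SEW=8 and LMUL=8 so that up to VLEN byte elements are live:

    #include <stdio.h>

    /* Byte-aligned proposal (sketch): the mask bit for element i lands at the
     * same byte offset as the element's data byte within its physical register,
     * and the bit within that byte identifies which of the 8 physical registers
     * in the LMUL=8 group holds the element. */
    static unsigned mask_bit_byte_aligned(unsigned i, unsigned vlen)
    {
        return (i * 8) % vlen + (i * 8) / vlen;
    }

    /* Current MLEN=1 layout: the mask bit for element i is simply bit i of v0. */
    static unsigned mask_bit_mlen1(unsigned i, unsigned vlen)
    {
        (void)vlen;
        return i;
    }

    int main(void)
    {
        const unsigned vlen = 128;  /* example VLEN in bits */
        for (unsigned i = 0; i < 20; i++)
            printf("element %2u: data bit %3u (mod VLEN), MLEN=1 mask bit %3u, "
                   "byte-aligned mask bit %3u\n",
                   i, (i * 8) % vlen, mask_bit_mlen1(i, vlen),
                   mask_bit_byte_aligned(i, vlen));
        return 0;
    }

For byte elements the byte-aligned position differs from the element's own data-bit position only by the sub-byte register index, which seems to be the sense in which the horizontal cost mentioned below is effectively zero for EEW=8.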

Why choose the byte for the alignment?
Because
1) the byte is the smallest unit for data alignment, and
2) the byte has the highest cardinality for max LMUL and a given VLEN.
    For each successive element size, the correlation drops off by a factor of 4.

Granted, muxing onto a bus and muxing to the operation units will substantially mitigate the skew, but not eliminate it entirely, since the sub-byte clustering has no masking purpose.
As noted above, it does have value in scanning for the first set mask bit.
What is the appropriate trade-off from a microarchitectural perspective? I will defer to others.

But the software benefits are mentioned in the points above.

I spent an inordinate amount of time deriving a wiring metric based on
SUM over i of abs(bit_location_of_element[i] - bit_location_of_mask[i]), where bit locations are taken modulo VLEN.
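
For reference, here is a sketch of one way such a metric could be computed, again in C and again assuming SEW=8, LMUL=8, and a sum over i < VLEN (the helper names and these parameters are my assumptions, not part of the proposal):

    #include <stdio.h>
    #include <stdlib.h>

    /* Element i's data-bit location, taken modulo VLEN, for SEW=8 / LMUL=8. */
    static long elem_bit(long i, long vlen) { return (i * 8) % vlen; }

    /* Mask-bit location under the current MLEN=1 layout (bit i of v0). */
    static long mask_bit_strict(long i, long vlen) { (void)vlen; return i; }

    /* Mask-bit location under the byte-aligned proposal. */
    static long mask_bit_byte(long i, long vlen)
    {
        return (i * 8) % vlen + (i * 8) / vlen;
    }

    /* SUM over i of abs(bit location of element[i] - bit location of mask[i]),
     * with both bit locations taken modulo VLEN. */
    static long wiring_metric(long vlen, long (*mask_bit)(long, long))
    {
        long sum = 0;
        for (long i = 0; i < vlen; i++)   /* up to VLEN byte elements at LMUL=8 */
            sum += labs(elem_bit(i, vlen) % vlen - mask_bit(i, vlen) % vlen);
        return sum;
    }

    int main(void)
    {
        for (long vlen = 64; vlen <= 1024; vlen *= 2)
            printf("VLEN=%4ld: strict bit order %8ld, byte-aligned %8ld\n",
                   vlen, wiring_metric(vlen, mask_bit_strict),
                   wiring_metric(vlen, mask_bit_byte));
        return 0;
    }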

No surprise that it shows a substantial benefit for the byte alignment over strict bit order.
And no surprise that the variation is proportional to VLEN**2.
Nor that, for an EEW of one byte with the byte-optimized formula, the horizontal cost is effectively zero.

Halfword, word, etc. benefit too, but of course with lesser weighting.