A simple fractional LMUL proposal


Krste Asanovic
 

I've been wading through the fractional LMUL discussion on GitHub, but
I believe the simple solution below meets the immediate needs without
blocking possible reuse of unused register fields later. I want to put
this out there as a baseline strawman against which to compare the
other, more exotic variants.

The proposed mapping is given below.

* For machines with SLEN=VLEN, the microarchitectural modification to
support fractional LMUL is very minor. The main changes are to add
the additional bit in vtype to encode the new LMUL values, and to
have setvl take the fractional LMUL into account when calculating
VLMAX and setting vl. Instructions then execute exactly as they do
for existing LMULs, just with a shorter vl.

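* For machines with SLEN<VLEN, simply reducing vl doesn't quite
work: each SLEN-wide partition holds its own slice of the elements,
so with fractional LMUL the used portion sits at the bottom of every
partition rather than at the bottom of the whole register. Instead,
each SLEN-wide partition has to reduce vl locally. This is shown in
the figures below. Even this is not too large a change, as the
datapath wiring stays the same; it's mainly an issue of turning off
unused portions of the datapath, though in new patterns.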
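The vl arithmetic for both kinds of machine can be sketched in a few lines. This is a minimal model under stated assumptions, not part of the proposal: the function names are hypothetical, the sketch uses vl = min(AVL, VLMAX) as the simplest setvl choice, and the per-partition split assumes (as the figures below suggest) that partition p holds the contiguous element range [p*E, (p+1)*E) with E = LMUL*SLEN/SEW, i.e. each partition has its own local VLMAX.

```python
from fractions import Fraction

# Hypothetical helper names. LMUL is passed as a Fraction (or int) so that
# 1/2, 1/4 and 1/8 are represented exactly.

def vlmax(vlen: int, sew: int, lmul) -> int:
    """VLMAX = LMUL * VLEN / SEW, with LMUL now allowed to be fractional."""
    n = Fraction(lmul) * vlen / sew
    if n.denominator != 1:
        raise ValueError("LMUL*VLEN/SEW is not a whole number of elements")
    return int(n)

def setvl(avl: int, vlen: int, sew: int, lmul) -> int:
    """Simplest vl policy (an assumption here): vl = min(AVL, VLMAX)."""
    return min(avl, vlmax(vlen, sew, lmul))

def partition_active_counts(vl: int, vlen: int, slen: int, sew: int, lmul) -> list:
    """Active elements handled by each SLEN-wide partition (partition 0 first).

    Assumes partition p holds elements [p*E, (p+1)*E), E = LMUL*SLEN/SEW,
    so each partition reduces vl locally against its own VLMAX of E.
    """
    per_part = vlmax(slen, sew, lmul)  # local VLMAX of one partition
    nparts = vlen // slen
    return [max(0, min(per_part, vl - p * per_part)) for p in range(nparts)]

# VLEN=256b, SEW=8b, LMUL=1/2: VLMAX is 16, so AVL=12 gives vl=12.
vl = setvl(12, 256, 8, Fraction(1, 2))
print(vl)                                                        # 12
print(partition_active_counts(vl, 256, 256, 8, Fraction(1, 2)))  # SLEN=VLEN: [12]
print(partition_active_counts(vl, 256, 128, 8, Fraction(1, 2)))  # SLEN=128b: [8, 4]
```

With SLEN=VLEN the whole register is one partition and vl just shrinks; with SLEN=128b the same vl=12 becomes 8 elements in the low partition and 4 in the high one, which is the "reduce vl locally" behavior.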

I'm not in favor of shifting the used portion to the top of the
register to enable scalar values or short vectors to use the space
below, as this would change the way fractional LMUL vector
instructions read out values and complicate chaining and interlock
checks for simple baselines. I believe there are cleaner
register-bit-scavenging schemes possible when we have a larger number
of architectural register names available.

The unused portions would be governed by the tail-undisturbed/tail-agnostic
setting (not by the mask-undisturbed/mask-agnostic setting).
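For the SLEN=VLEN case, the bytes that fall under the tail policy can be computed directly. The sketch below is an illustration with a hypothetical function name, limited to a single destination register (LMUL <= 1): it splits the non-active bytes into ordinary tail elements (vl..VLMAX-1) and the unused portion above LMUL*VLEN bits, both of which this proposal places under the tail setting.

```python
from fractions import Fraction

def tail_policy_bytes(vl: int, vlen: int, sew: int, lmul):
    """Return (tail, unused) byte-offset ranges of one destination register.

    Sketch assumes SLEN = VLEN and LMUL <= 1. Under the proposal, both ranges
    follow the tail-undisturbed/tail-agnostic setting.
    """
    lmul = Fraction(lmul)
    if lmul > 1:
        raise ValueError("sketch only covers a single register (LMUL <= 1)")
    vlmax = lmul * vlen / sew
    if vlmax.denominator != 1 or not 0 <= vl <= vlmax:
        raise ValueError("invalid vl/SEW/LMUL combination")
    used_bytes = int(vlmax) * sew // 8          # = LMUL*VLEN/8 bytes
    tail = range(vl * sew // 8, used_bytes)     # elements vl..VLMAX-1
    unused = range(used_bytes, vlen // 8)       # bytes no element maps to
    return tail, unused

# VLEN=256b, SEW=8b, LMUL=1/2, vl=10: bytes 10..15 are tail, 16..31 unused.
print(tail_policy_bytes(10, 256, 8, Fraction(1, 2)))
```

At LMUL=1 the unused range is empty, so the rule reduces to the existing tail behavior.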

LMUL[2:0] encoding

111 LMUL=8
110 LMUL=4
101 LMUL=2
100 LMUL=1
011 LMUL=1/2
010 LMUL=1/4
001 LMUL=1/8
000 (reserved)
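The encoding follows a simple pattern: 100 is LMUL=1 and each step up or down in the 3-bit field doubles or halves LMUL, i.e. LMUL = 2^(field-4) for fields 001 through 111. A minimal decoder under that reading (the function name is hypothetical):

```python
from fractions import Fraction
from typing import Optional

def decode_lmul(field: int) -> Optional[Fraction]:
    """Decode the proposed LMUL[2:0] field; 000 is reserved (returns None)."""
    if not 0 <= field <= 7:
        raise ValueError("LMUL field is 3 bits")
    if field == 0:
        return None  # reserved encoding
    # 100 -> 1, 111 -> 8, 001 -> 1/8: LMUL = 2**(field - 4)
    return Fraction(2) ** (field - 4)

for f in range(8):
    print(format(f, "03b"), decode_lmul(f))
```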

We limit the mandatory supported SEW at each fractional LMUL to the
following values:

LMUL = 1/2, SEW <= ELEN/2
LMUL = 1/4, SEW <= ELEN/4
LMUL = 1/8, SEW <= ELEN/8

i.e., SEW <= LMUL*ELEN for LMUL<=1, where ELEN is the value at LMUL=1
(some systems can have a different ELEN for LMUL>1).
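The rule can be checked mechanically. Rearranged, SEW <= LMUL*ELEN is the same as SEW/LMUL <= ELEN, which is consistent with the figures below stopping at SEW/LMUL=64 for ELEN=64. A small sketch with hypothetical function names, covering only LMUL <= 1 as the rule does:

```python
from fractions import Fraction

def sew_is_mandatory(sew: int, lmul, elen: int) -> bool:
    """Proposed rule: SEW <= LMUL*ELEN for LMUL <= 1 (ELEN taken at LMUL=1)."""
    lmul = Fraction(lmul)
    if lmul > 1:
        raise ValueError("the rule only covers LMUL <= 1")
    return sew <= lmul * elen

def mandatory_sews(lmul, elen: int, sews=(8, 16, 32, 64)) -> list:
    """SEW values from `sews` an implementation must support at this LMUL."""
    return [s for s in sews if sew_is_mandatory(s, lmul, elen)]

for lmul in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    print(lmul, mandatory_sews(lmul, 64))
# 1/8 [8]
# 1/4 [8, 16]
# 1/2 [8, 16, 32]
# 1 [8, 16, 32, 64]
```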

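Example layout, drawn with two ASCII characters per byte
horizontally. This is drawn to show SLEN<VLEN (but considering only
the right 128b shows how SLEN=VLEN would look). Element indices are
in hex, '-' fills the rest of each element, 'x' marks unused bytes,
and '|' marks the boundary between the two SLEN partitions.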

VLEN=256b, SLEN=128b

SEW/LMUL=4

FEDCBA9876543210FEDCBA9876543210|FEDCBA9876543210FEDCBA9876543210 Mask

2F2E2D2C2B2A29282726252423222120|0F0E0D0C0B0A09080706050403020100 SEW=8b, LMUL=2
3F3E3D3C3B3A39383736353433323130|1F1E1D1C1B1A19181716151413121110

--27--26--25--24--23--22--21--20|--07--06--05--04--03--02--01--00 SEW=16b, LMUL=4
--2F--2E--2D--2C--2B--2A--29--28|--0F--0E--0D--0C--0B--0A--09--08
--37--36--35--34--33--32--31--30|--17--16--15--14--13--12--11--10
--3F--3E--3D--3C--3B--3A--39--38|--1F--1E--1D--1C--1B--1A--19--18

------23------22------21------20|------03------02------01------00 SEW=32b, LMUL=8
------27------26------25------24|------07------06------05------04
------2B------2A------29------28|------0B------0A------09------08
------2F------2E------2D------2C|------0F------0E------0D------0C
....


SEW/LMUL=8

1F1E1D1C1B1A19181716151413121110|-F-E-D-C-B-A-9-8-7-6-5-4-3-2-1-0 Mask

1F1E1D1C1B1A19181716151413121110|-F-E-D-C-B-A-9-8-7-6-5-4-3-2-1-0 SEW=8b, LMUL=1

--17--16--15--14--13--12--11--10|---7---6---5---4---3---2---1---0 SEW=16b, LMUL=2
--1F--1E--1D--1C--1B--1A--19--18|---F---E---D---C---B---A---9---8


------13------12------11------10|-------3-------2-------1-------0 SEW=32b, LMUL=4
------17------16------15------14|-------7-------6-------5-------4
...


--------------11--------------10|---------------1---------------0 SEW=64b, LMUL=8
...


SEW/LMUL=16

xxxxxxxxxxxxxxxx-F-E-D-C-B-A-9-8|xxxxxxxxxxxxxxxx-7-6-5-4-3-2-1-0 Mask

xxxxxxxxxxxxxxxx-F-E-D-C-B-A-9-8|xxxxxxxxxxxxxxxx-7-6-5-4-3-2-1-0 SEW=8b, LMUL=1/2

---F---E---D---C---B---A---9---8|---7---6---5---4---3---2---1---0 SEW=16b, LMUL=1


-------B-------A-------9-------8|-------3-------2-------1-------0 SEW=32b, LMUL=2
-------F-------E-------D-------C|-------7-------6-------5-------4


---------------9---------------8|---------------1---------------0 SEW=64b, LMUL=4
---------------B---------------A|---------------3---------------2
...


SEW/LMUL=32

xxxxxxxxxxxxxxxxxxxxxxxx-7-6-5-4|xxxxxxxxxxxxxxxxxxxxxxxx-3-2-1-0 Mask

xxxxxxxxxxxxxxxxxxxxxxxx-7-6-5-4|xxxxxxxxxxxxxxxxxxxxxxxx-3-2-1-0 SEW=8b, LMUL=1/4

xxxxxxxxxxxxxxxx---7---6---5---4|xxxxxxxxxxxxxxxx---3---2---1---0 SEW=16b, LMUL=1/2


-------7-------6-------5-------4|-------3-------2-------1-------0 SEW=32b, LMUL=1


---------------5---------------4|---------------1---------------0 SEW=64b, LMUL=2
---------------7---------------6|---------------3---------------2

SEW/LMUL=64

xxxxxxxxxxxxxxxxxxxxxxxxxxxx-3-2|xxxxxxxxxxxxxxxxxxxxxxxxxxxx-1-0 Mask

xxxxxxxxxxxxxxxxxxxxxxxxxxxx-3-2|xxxxxxxxxxxxxxxxxxxxxxxxxxxx-1-0 SEW=8b, LMUL=1/8

xxxxxxxxxxxxxxxxxxxxxxxx---3---2|xxxxxxxxxxxxxxxxxxxxxxxx---1---0 SEW=16b, LMUL=1/4


xxxxxxxxxxxxxxxx-------3-------2|xxxxxxxxxxxxxxxx-------1-------0 SEW=32b, LMUL=1/2


---------------3---------------2|---------------1---------------0 SEW=64b, LMUL=1
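The data rows in the figures above can be regenerated from one placement rule, which I read off the drawings (an inference, not a spec definition): element i goes to partition i // E with E = LMUL*SLEN/SEW, and within that partition it fills successive registers of the group, SLEN/SEW elements per register slice. The sketch below uses hypothetical names, does not model the mask rows, and pads labels with '-' as most figures do (the SEW/LMUL=4 group zero-pads them instead, e.g. "07").

```python
from fractions import Fraction

def place(i: int, slen: int, sew: int, lmul):
    """(register within group, partition, slot within slice) of element i."""
    per_part = int(Fraction(lmul) * slen / sew)  # elements per partition, whole group
    per_slice = slen // sew                      # slots in one register's partition slice
    p, j = divmod(i, per_part)
    r, s = divmod(j, per_slice)
    return r, p, s

def render(reg: int, vlen: int, slen: int, sew: int, lmul) -> str:
    """Draw register `reg` of the group: two chars per byte, high bytes on the
    left, partitions separated by '|', unused bytes as 'x'."""
    lmul = Fraction(lmul)
    nparts = vlen // slen
    per_slice = slen // sew
    width = sew // 4                             # two characters per byte
    cells = [["x" * width] * per_slice for _ in range(nparts)]
    for i in range(int(lmul * vlen / sew)):      # every element up to VLMAX
        r, p, s = place(i, slen, sew, lmul)
        if r == reg:
            cells[p][s] = format(i, "X").rjust(width, "-")
    return "|".join("".join(reversed(cells[p])) for p in reversed(range(nparts)))

# VLEN=256b, SLEN=128b, SEW=16b, LMUL=1/2 (the SEW/LMUL=32 group):
print(render(0, 256, 128, 16, Fraction(1, 2)))
# xxxxxxxxxxxxxxxx---7---6---5---4|xxxxxxxxxxxxxxxx---3---2---1---0
```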



Krste
