Clarification of Fractional LMUL requirements, and the storage/derivation of ELEN/SEWLMUL1MAX values


Krste Asanovic
 

Thanks for the suggestions.

I tried to clean up and clarify this section:

https://github.com/riscv/riscv-v-spec/commit/3cc98373f954df996c2d7973ef0fc38bc866f620

Krste

On Wed, 8 Sep 2021 15:25:39 -0700, "Gregory Kielian via lists.riscv.org" <gkielian=google.com@...> said:
| Hi,
| Re-reading section 3.3.2 of the documentation, I would like to propose adding the goal, constraints, and steps for implementing fractional LMUL.

| I think adding these would really help clarify both the VFLMUL idea and implementation. I've been having extensive discussions around this,
| re-reading this section a bunch, and thinking it would probably be good to add additional lines to the vspec.adoc to clarify the idea.

| Sharing my tentative understanding below (and some questions on ELEN and SEWLMUL1MAX), derived mainly from looking at the spike lmul checks and
| 3.3.2, curious as well if this captures the main intent of the fractional-lmul or there are aspects which are missing or equations require
| adjustment:

| • Goal clarification:

| □ Fractional LMUL allows the result of widening operations to be definitively contained within a single vector register.

| □ The advantage this provides seems (at least) two-fold

| ☆ Any register is usable for widening with fractional LMULs (as opposed to integer LMUL > 1, where only register numbers evenly
| divisible by the LMUL can be used, e.g. v0, v8, v16, v24 for LMUL = 8).

| ☆ Related to the above, fewer registers are locked down by widening operations, reducing register availability bottlenecks and the
| number of stores/loads to and from memory.

| □ In order to ensure that the result of widening operations can be contained in a single register, there are certain constraints (see
| below)
| • Constraints:

| □ SEW <= ELEN*VFLMUL

| ☆ Example 1: ELEN = e32, VFLMUL = ⅛

| ○ SEW <= ELEN*VFLMUL = 4, so VFLMUL = ⅛ is illegal for ELEN = e32 (the smallest SEW is e8)

| ☆ Example 2: ELEN = e32, VFLMUL = ¼

| ○ SEW <= ELEN*VFLMUL = 8, therefore SEW must be e8

| ☆ Example 3: ELEN = e32, VFLMUL = ½

| ○ SEW <= ELEN*VFLMUL = 16, therefore SEW must be either e8 or e16

| □ Note: For architectures where ELEN > SEWLMUL1MAX, one would go through the same exercise as above but with SEWLMUL1MAX substituted for ELEN.

| • Where to store/how to derive ELEN and/or SEWLMUL1MAX:

| □ ELEN/SEWLMUL1MAX are not stored in CSRs; ELEN may be derived from the extension:

| ☆ Example: ELEN = e32 for Zve32x

| □ SEWLMUL1MAX storage/derivation questions (this particular one is unclear to me):

| ☆ If ELEN > SEWLMUL1MAX, how would one derive SEWLMUL1MAX from the ELEN?

| ☆ Where (e.g. any CSR) would the SEWLMUL1MAX be stored?

| ☆ Would this be derived from knowing the specific extension and perhaps the vlenb, and held in a special architecture-specific
| register?

| Suggested edits for discussion: 

| • Adding SEW equation, possibly in mathematical notation, to clarify the policy

| • Adding some examples to clarify the policy

| • Adding goal/intent and advantages of using fractional-lmul vs lmul and vice versa

| Would be happy to contribute pull requests after confirming whether this understanding is correct, and clarifying questions about the
| SEWLMUL1MAX/ELEN derivation/storage.

| All the best,

| Gregory

|
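For readers following the thread, the constraint as stated in the quoted message (SEW <= ELEN*VFLMUL) can be sketched in a few lines of Python. This is only an illustration of the inequality as written above, not the wording of the spec or the check used in spike; the helper name `legal_sews` and the `STANDARD_SEWS` tuple are hypothetical names introduced for this example.

```python
from fractions import Fraction

# Standard element widths in bits (e8, e16, e32, e64).
STANDARD_SEWS = (8, 16, 32, 64)


def legal_sews(elen, vflmul):
    """Hypothetical helper: list the SEW values with SEW <= ELEN * VFLMUL.

    SEW is additionally capped at ELEN, since no element can be wider
    than the maximum element width.
    """
    limit = elen * Fraction(vflmul)
    return [sew for sew in STANDARD_SEWS if sew <= limit and sew <= elen]


# Reproduce the three examples from the quoted message (ELEN = e32).
for vflmul in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)):
    print(f"ELEN=32, VFLMUL={vflmul}: legal SEW = {legal_sews(32, vflmul)}")
```

An empty list corresponds to a VFLMUL setting that admits no legal SEW, as in Example 1.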
