Clarification of Fractional LMUL requirements, and the storage/derivation of ELEN/SEWLMUL1MAX values
Hi,
Re-reading section 3.3.2 of the documentation (link), I would like to propose adding the goal, constraints, and implementation steps for Fractional LMUL.
I think adding these would really help clarify both the VFLMUL concept and its implementation. I have had extensive discussions about this and re-read this section many times, and I think it would be good to add a few lines to vspec.adoc to clarify the idea.
Below is my tentative understanding (and some questions on ELEN and SEWLMUL1MAX), derived mainly from the Spike LMUL checks and section 3.3.2. I am also curious whether this captures the main intent of fractional LMUL, or whether there are missing aspects or equations that need adjustment:
Goal clarification:
- Fractional LMUL allows the result of a widening operation to be contained entirely within a single vector register.
- The advantage this provides seems to be (at least) two-fold:
  - Any register can be used for widening with fractional LMUL (as opposed to integer LMUL, where a register group can only start at a register number evenly divisible by LMUL, e.g. v0, v8, v16, v24 for LMUL = 8).
  - Relatedly, fewer registers are tied up by widening, which reduces register-availability bottlenecks and the number of stores/loads to and from memory.
- To ensure that the result of a widening operation fits in a single register, certain constraints apply (see below).
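To make the register-alignment point concrete, here is a minimal Python sketch. It assumes 32 architectural vector registers, that a widening destination has EMUL = 2 × LMUL, and that a register group with EMUL ≥ 1 must start at a register number divisible by EMUL. The helper names (`widened_emul`, `fits_in_one_register`, `legal_group_bases`) are hypothetical and are not taken from Spike or the spec.

```python
from fractions import Fraction

NUM_VREGS = 32  # architectural vector registers v0..v31

def widened_emul(lmul: Fraction) -> Fraction:
    # A widening op writes 2*SEW-wide elements, so the destination EMUL is 2*LMUL.
    return 2 * lmul

def fits_in_one_register(lmul: Fraction) -> bool:
    # The widened result stays within a single register only if EMUL <= 1,
    # i.e. only when the source LMUL is fractional (LMUL <= 1/2).
    return widened_emul(lmul) <= 1

def legal_group_bases(emul: Fraction) -> list[int]:
    # A group with EMUL >= 1 must start at a register number divisible by EMUL;
    # with EMUL <= 1 the operand lives in one register, so any register works.
    step = max(1, int(emul))
    return [r for r in range(NUM_VREGS) if r % step == 0]

# Widening from LMUL = 4 gives an EMUL = 8 destination: only v0, v8, v16, v24.
print(legal_group_bases(widened_emul(Fraction(4))))
# Widening from LMUL = 1/2 gives an EMUL = 1 destination: any of the 32 registers.
print(len(legal_group_bases(widened_emul(Fraction(1, 2)))))
```

Using `Fraction` keeps LMUL values such as 1/2 exact, so the `<= 1` comparison never suffers from floating-point rounding.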
Constraints:
- SEW <= ELEN * VFLMUL
  - Example 1: ELEN = 32, VFLMUL = ⅛
    - SEW <= ELEN * VFLMUL = 4; since the smallest SEW is e8, VFLMUL = ⅛ is illegal for ELEN = 32.
  - Example 2: ELEN = 32, VFLMUL = ¼
    - SEW <= ELEN * VFLMUL = 8, therefore SEW must be e8.
  - Example 3: ELEN = 32, VFLMUL = ½
    - SEW <= ELEN * VFLMUL = 16, therefore SEW must be either e8 or e16.
- Note: for architectures where ELEN > SEWLMUL1MAX, one would go through the same exercise as above, but with ELEN replaced by SEWLMUL1MAX (s/ELEN/SEWLMUL1MAX).
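The constraint and the three examples can be checked mechanically. The following is a minimal Python sketch of the rule as stated in this email (SEW <= ELEN × VFLMUL, with SEWLMUL1MAX substituted for ELEN when it is smaller). It assumes the standard SEW set {8, 16, 32, 64}; `allowed_sews` is a hypothetical helper, and `sewlmul1max` is simply an optional input because how SEWLMUL1MAX is obtained is one of the open questions below.

```python
from fractions import Fraction
from typing import Optional

SEW_VALUES = (8, 16, 32, 64)  # e8, e16, e32, e64

def allowed_sews(elen: int, vflmul: Fraction,
                 sewlmul1max: Optional[int] = None) -> list[int]:
    # Following the SEWLMUL1MAX note, use it in place of ELEN when it is smaller.
    limit = elen if sewlmul1max is None else min(elen, sewlmul1max)
    # Constraint from the text: SEW <= limit * VFLMUL.
    bound = limit * vflmul
    # An empty result means this VFLMUL cannot be used at all for this ELEN.
    return [sew for sew in SEW_VALUES if sew <= bound]

# Reproduces Examples 1-3 with ELEN = 32.
for vflmul in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)):
    print(vflmul, allowed_sews(32, vflmul))
# 1/8 []
# 1/4 [8]
# 1/2 [8, 16]
```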
Where ELEN and/or SEWLMUL1MAX are stored, or how they are derived:
- ELEN and SEWLMUL1MAX are not stored in any CSR; ELEN may be derived from the extension:
  - Example: ELEN = 32 for Zve32x
- Questions about SEWLMUL1MAX storage/derivation (this part is unclear to me):
  - If ELEN > SEWLMUL1MAX, how would one derive SEWLMUL1MAX from ELEN?
  - Where (e.g. in which CSR, if any) would SEWLMUL1MAX be stored?
  - Would it be derived from knowing the specific extension, and perhaps vlenb, and held in a special architecture-specific register?
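For the part that is settled (ELEN follows from the extension name), a minimal Python sketch might look like the following. It covers only the Zve* extension names and assumes the vlenb value has already been read from the CSR; how software discovers which extension is implemented is not modeled. SEWLMUL1MAX is deliberately left out, since its derivation is exactly the open question above. The helper names are hypothetical.

```python
ZVE_ELEN = {
    "zve32x": 32, "zve32f": 32,
    "zve64x": 64, "zve64f": 64, "zve64d": 64,
}

def elen_from_extension(ext: str) -> int:
    # ELEN is not read from a CSR; it is implied by the Zve* extension name.
    try:
        return ZVE_ELEN[ext.lower()]
    except KeyError:
        raise ValueError(f"unknown Zve* extension: {ext!r}") from None

def vlen_from_vlenb(vlenb: int) -> int:
    # The vlenb CSR holds VLEN/8 (register length in bytes), so VLEN = 8 * vlenb.
    return 8 * vlenb

print(elen_from_extension("Zve32x"))  # 32
print(vlen_from_vlenb(16))            # 128
```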
Suggested edits for discussion:
- Adding the SEW equation, possibly in mathematical notation, to clarify the policy
- Adding some examples to clarify the policy
- Adding the goal/intent and the advantages of using fractional LMUL vs. integer LMUL, and vice versa
All the best,
Gregory
I tried to clean up and clarify this section:
https://github.com/riscv/riscv-v-spec/commit/3cc98373f954df996c2d7973ef0fc38bc866f620
Krste
On Wed, 8 Sep 2021 15:25:39 -0700, "Gregory Kielian via lists.riscv.org" <gkielian=google.com@...> said:
| Hi,
| Would be happy to contribute pull requests after confirming whether this understanding is correct, and after clarifying the questions about
| SEWLMUL1MAX/ELEN derivation/storage.
| All the best,
| Gregory
|