Clarification of Fractional LMUL requirements, and the storage/derivation of ELEN/SEWLMUL1MAX values
Thanks for the suggestions.
I tried to clean up and clarify this section:
https://github.com/riscv/riscv-v-spec/commit/3cc98373f954df996c2d7973ef0fc38bc866f620

Krste

On Wed, 8 Sep 2021 15:25:39 -0700, "Gregory Kielian via lists.riscv.org" <gkielian=google.com@...> said:

| Hi,
|
| Re-reading section 3.3.2 in the documentation (link), I would like to propose adding the goal, constraints, and steps for implementing fractional LMUL.
| I think adding these would really help clarify both the VFLMUL idea and its implementation. I've been having extensive discussions around this
| and re-reading this section many times, and I think it would be good to add lines to vspec.adoc to clarify the idea.
|
| Sharing my tentative understanding below (and some questions on ELEN and SEWLMUL1MAX), derived mainly from looking at the spike LMUL checks and
| section 3.3.2. I am also curious whether this captures the main intent of fractional LMUL, or whether aspects are missing or equations require
| adjustment:
|
| • Goal clarification:
|   □ Fractional LMUL allows the result of widening operations to be definitively contained within a single vector register.
|   □ The advantage this provides seems (at least) two-fold:
|     ☆ Any register is usable for widening with fractional LMULs (as opposed to integer LMUL, which can be used only with register numbers
|       evenly divisible by the LMUL, e.g. v0, v8, v16, v24 for LMUL = 8).
|     ☆ Relatedly, fewer registers are locked down by widening, reducing register availability bottlenecks and the number of stores/loads
|       to and from memory that are needed.
|   □ To ensure that the result of widening operations can be contained in a single register, there are certain constraints (see below).
| • Constraints:
|   □ SEW <= ELEN*VFLMUL
|     ☆ Example 1: ELEN = e32, VFLMUL = ⅛
|       ○ SEW <= ELEN*VFLMUL = 4; no SEW is that small (the smallest is e8), so VFLMUL = ⅛ is illegal for ELEN = e32
|     ☆ Example 2: ELEN = e32, VFLMUL = ¼
|       ○ SEW <= ELEN*VFLMUL = 8, therefore SEW must be e8
|     ☆ Example 3: ELEN = e32, VFLMUL = ½
|       ○ SEW <= ELEN*VFLMUL = 16, therefore SEW must be either e8 or e16
|   □ Note: For architectures where ELEN > SEWLMUL1MAX, one would go through the same exercise as above but with s/ELEN/SEWLMUL1MAX.
| • Where to store / how to derive ELEN and/or SEWLMUL1MAX:
|   □ ELEN/SEWLMUL1MAX are not stored in CSRs; ELEN may be derived from the extension:
|     ☆ Example: ELEN = e32 for Zve32x
|   □ SEWLMUL1MAX storage/derivation questions (this particular one is unclear to me):
|     ☆ If ELEN > SEWLMUL1MAX, how would one derive SEWLMUL1MAX from the ELEN?
|     ☆ Where (e.g. in any CSR) would the SEWLMUL1MAX be stored?
|     ☆ Would this be derived from knowing the specific extension and perhaps the vlenb, and held in a special architecture-specific
|       register?
|
| Suggested edits for discussion:
|
| • Adding the SEW equation, possibly in mathematical notation, to clarify the policy
| • Adding some examples to clarify the policy
| • Adding the goal/intent and advantages of using fractional LMUL vs. integer LMUL and vice versa
|
| I would be happy to contribute pull requests after confirming whether this understanding is correct, and after clarifying the questions about
| SEWLMUL1MAX/ELEN derivation/storage.
|
| All the best,
| Gregory
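
As a rough illustration of the constraint in the quoted message (SEW <= ELEN*VFLMUL), here is a minimal Python sketch that lists the SEW values allowed for a given ELEN and LMUL. The names legal_sews and STANDARD_SEWS are hypothetical and are not taken from the spec or from spike; the sketch also assumes the smallest SEW is e8 and that SEW never exceeds ELEN. For the SEWLMUL1MAX case mentioned in the note, one would presumably pass that value in place of elen.

```python
from fractions import Fraction

# Assumed set of element widths in bits (e8, e16, e32, e64).
STANDARD_SEWS = (8, 16, 32, 64)


def legal_sews(elen, lmul, sew_min=8):
    """Return the SEW values with sew_min <= SEW <= min(ELEN, ELEN * LMUL).

    elen    -- maximum element width in bits (e.g. 32 for Zve32x)
    lmul    -- register grouping factor, e.g. Fraction(1, 4) or 2
    sew_min -- smallest supported SEW in bits (assumed to be 8 here)
    """
    # Upper bound from the quoted constraint: SEW <= ELEN * LMUL.
    limit = elen * Fraction(lmul)
    # For LMUL >= 1 the bound is ELEN itself, so clamp to elen.
    upper = min(limit, elen)
    return [sew for sew in STANDARD_SEWS if sew_min <= sew <= upper]


if __name__ == "__main__":
    # Reproduce the three ELEN = e32 examples from the quoted message.
    for lmul in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)):
        sews = legal_sews(32, lmul)
        status = sews if sews else "illegal (no SEW fits)"
        print(f"ELEN=32, LMUL={lmul}: {status}")
```

An empty list corresponds to the "illegal" case in Example 1; the other two calls return the SEW sets given in Examples 2 and 3.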