I totally agree. If this is done, then instructions such as count bits can apply directly to the mask register.

Also, from a hardware implementation standpoint, the VM register can be implemented with latches. This facilitates a better implementation (imho) for operations under mask.

And yes, load and store for VM are required.


If separate loads and stores are introduced for the mask, then a separate vmask register can be introduced to avoid the dual use of v0 (as a regular vector register and as a mask register) and its complications.
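
To make the idea concrete, here is a rough behavioural sketch in C of what a dedicated mask register could look like from the software side. It is only an illustration under assumptions, not a proposed encoding: the names (`vstate_t`, `vm_load`, `vm_store`, `vm_popcount`, `vadd_masked`), the 8-element vector length and the 4-register file are all hypothetical and do not come from any spec.

```c
#include <assert.h>
#include <stdint.h>

#define VLEN_ELEMS 8   /* hypothetical: 8 elements per vector register */
#define NUM_VREGS  4   /* hypothetical: small register file for the sketch */

/* Architectural state with the mask held in its own register (vm),
 * so no ordinary vector register has to double as the mask. */
typedef struct {
    int32_t v[NUM_VREGS][VLEN_ELEMS]; /* ordinary vector registers */
    uint8_t vm;                       /* mask register, one bit per element */
} vstate_t;

/* Dedicated mask load: memory -> vm. */
static void vm_load(vstate_t *s, const uint8_t *mem) { s->vm = *mem; }

/* Dedicated mask store: vm -> memory. */
static void vm_store(const vstate_t *s, uint8_t *mem) { *mem = s->vm; }

/* "Count bits" applied directly to the mask register. */
static unsigned vm_popcount(const vstate_t *s) {
    unsigned n = 0;
    /* Clear the lowest set bit on each iteration. */
    for (uint8_t m = s->vm; m != 0; m &= (uint8_t)(m - 1))
        n++;
    return n;
}

/* Add under mask: only elements whose mask bit is set are written;
 * inactive elements of vd keep their old value. */
static void vadd_masked(vstate_t *s, int vd, int vs1, int vs2) {
    assert(vd >= 0 && vd < NUM_VREGS);
    assert(vs1 >= 0 && vs1 < NUM_VREGS);
    assert(vs2 >= 0 && vs2 < NUM_VREGS);
    for (int i = 0; i < VLEN_ELEMS; i++) {
        if ((s->vm >> i) & 1u)
            s->v[vd][i] = s->v[vs1][i] + s->v[vs2][i];
    }
}
```

In this model the masked add never reads the mask out of the vector register file, so all `NUM_VREGS` vector registers stay available for data, and the mask needs its own load and store to be saved and restored.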


