But I think this is problematic for sew=8, as the indices overflow if vlmax(sew=8)>256: an 8-bit element can only hold indices 0..255.
The mask could be built with sew=16, as the mask is ordinal based.
And there are tricks to set it up, for example a direct load (register move) to v0 to set the correct bit.
The mask could be built in v2 and transferred under mask to clear lower or higher aliasing.
It may be possible for lmul={1,2,4}, sew=8 to compute vid and vmseq using lmul={2,4,8}, sew=16, respectively, but the lmul=8, sew=8 case won't work, as there is no lmul=16, sew=16.
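To make the arithmetic concrete, here is a rough Python model of the point above. It is only a sketch: the helper names (`vlmax`, `vid_wraps`, `widened`) and the VLEN=1024 example are my own assumptions for illustration, and it only handles integer lmul.

```python
def vlmax(vlen, sew, lmul):
    """VLMAX = VLEN * LMUL / SEW (integer lmul only in this sketch)."""
    return vlen * lmul // sew


def vid_wraps(vlen, sew, lmul):
    """True if some element index in 0..VLMAX-1 does not fit in sew bits."""
    return vlmax(vlen, sew, lmul) > (1 << sew)


def widened(sew, lmul):
    """Double both sew and lmul (same VLMAX), or None if lmul would exceed 8."""
    if lmul * 2 > 8:
        return None
    return (sew * 2, lmul * 2)


# Hypothetical VLEN=1024 machine, chosen only for illustration.
for lmul in (1, 2, 4, 8):
    print(lmul, vlmax(1024, 8, lmul), vid_wraps(1024, 8, lmul), widened(8, lmul))
```

With VLEN=1024, sew=8 wraps at lmul=4 and lmul=8. Widening lmul=4 to sew=16, lmul=8 keeps the same VLMAX and no longer wraps, while lmul=8 has nothing to widen to.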
I also came up with this other sequence, but it doesn't look great to me: