This mask load instruction is exactly the one we have been looking forward to.
I'm a bit confused about the hiccups, though. Why do machines with internal dynamic data striping incur a hiccup whenever a register is used as a mask? Does it mean that mask registers and normal vector registers are arranged differently, so the two have to be distinguished at load time? And how do the proposed instructions help reduce these hiccups?