This is an excellent point, but there are only 8 SSE/AVX/AVX2 registers in 32 bit mode and 16 in 64 bit.
Therefore a 32 bit RISC-V could use 32 bit VLEN and LMUL=4 to directly translate SSE code without stripmining, and a 64 bit RISC-V could use 64 bit VLEN and LMUL=2. For AVX/AVX2 VLEN=64 is required on 32 bit and VLEN=128 on 64 bit, using the same LMUL.
Similarly, 32 bit ARM NEON works as sixteen 128 bit registers or thirty two 64 bit registers. Thus a 32 bit RISC-V with VLEN=64 can directly translate NEON code using LMUL=1 or LMUL=2.
Aarch64 has thirty two registers of 128 bits each, which can also be treated as thirty two registers of 64 bits each (effectively just setting a smaller VL, the upper half is zeroed). So directly porting 64 bit ARM Advanced SIMD code does require 128 bit registers.
For maximum SIMD-porting compatibility with both ARM and x86 code a 64 bit RISC-V needs VLEN=128 but a 32 bit RISC-V is fine with VLEN=64.