i was made aware of this paper. risc-v vectors are mentioned.

one of the key conclusions is (from the abstract):

Our experiments show that VLA code reaches about 90% of the performance of vector length specific code, i.e. a 10% overhead is inferred due to global predication of instructions. Furthermore, we show that code performance is not increasing proportionally with increasing vector lengths due to the higher memory demands. 

my experience, based on memory system design, is just the opposite.

i am curious to hear other opinions.

