Sequence to insert an element


Roger Ferrer Ibanez
 

Hi,

What is a reasonable sequence to insert an element at an arbitrary position in a vector, i.e. replace the element at a given index with a scalar value?

I considered the following sequence (assume the input vector is in v12):

vid.v      v1                      # v1[i] = i
vmseq.vx   v0, v1, <index>         # mask bit i = (v1[i] == index)
vmerge.vxm v1, v12, <value>, v0    # v1[i] = mask bit i ? value : v12[i]

But I think this is problematic for sew=8: vid.v only produces indices modulo 256, so if vlmax(sew=8) > 256 the indices wrap around and vmseq.vx can set more than one mask bit.
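The wrap-around can be seen with a rough element-level model in Python. This is a sketch, not RVV code: Python lists stand in for vector registers, `insert_via_mask` is a hypothetical name, the scalar comparison is modelled as using only its low SEW bits, and the VLMAX of 512 (e.g. VLEN=512 with lmul=8) is an assumption.

```python
# Hypothetical element-level model of the vid.v / vmseq.vx / vmerge.vxm
# sequence: plain Python lists stand in for vector registers.

def insert_via_mask(v12, index, value, sew):
    """Return (result, mask) as the three-instruction sequence would produce them."""
    vlmax = len(v12)
    modulus = 1 << sew
    # vid.v v1: element i holds i, but only SEW bits are kept
    v1 = [i % modulus for i in range(vlmax)]
    # vmseq.vx v0, v1, index: modelled as comparing the low SEW bits of the scalar
    mask = [x == index % modulus for x in v1]
    # vmerge.vxm v1, v12, value, v0: pick the scalar where the mask bit is set
    result = [value if m else old for m, old in zip(mask, v12)]
    return result, mask


if __name__ == "__main__":
    v12 = [0] * 512  # assumed VLMAX of 512 at sew=8 (e.g. VLEN=512, lmul=8)
    for sew in (8, 16):
        _, mask = insert_via_mask(v12, 3, 7, sew)
        print(sew, [i for i, m in enumerate(mask) if m])
```

With these assumptions, at sew=8 both mask bits 3 and 259 end up set, so two elements get overwritten; at sew=16 only bit 3 is set.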

For sew=8 with lmul={1,2,4}, it may be possible to compute vid and vmseq at sew=16 with lmul={2,4,8} respectively, which keeps the same vlmax, but the sew=8, lmul=8 case won't work because there is no lmul=16 at sew=16.
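The sew/lmul bookkeeping can be sketched in a few lines of Python. The helper names are hypothetical, max_lmul=8 reflects the largest integer LMUL, and VLEN=512 is only an assumed example value.

```python
# Hypothetical helpers; VLMAX = lmul * VLEN / sew for integer lmul.

def vlmax(vlen, sew, lmul):
    return lmul * vlen // sew


def widened_index_config(vlen, sew, lmul, max_lmul=8):
    """Return (2*sew, 2*lmul) for computing vid/vmseq, or None if unavailable."""
    wide_sew, wide_lmul = 2 * sew, 2 * lmul
    if wide_lmul > max_lmul:
        return None  # sew=8, lmul=8 would need lmul=16
    # Doubling both keeps VLMAX, so element i still maps to mask bit i.
    assert vlmax(vlen, wide_sew, wide_lmul) == vlmax(vlen, sew, lmul)
    if vlmax(vlen, wide_sew, wide_lmul) > (1 << wide_sew):
        return None  # indices would still wrap at the wider sew
    return wide_sew, wide_lmul


if __name__ == "__main__":
    VLEN = 512  # assumed
    for lmul in (1, 2, 4, 8):
        print(f"sew=8 lmul={lmul}:", widened_index_config(VLEN, 8, lmul))
```

For lmul 1, 2 and 4 this yields (16, 2), (16, 4) and (16, 8). For lmul=8 it returns None, which is the case described above.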

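I also came up with this other sequence, but it doesn't look great to me:

vslidedown.vx v2, v12, <index>     # v2[i] = v12[i + index]; v2 is a temporary
vmv.s.x       v2, <value>          # v2[0] = value (needs tu so the rest of v2 is kept)
vslideup.vx   v1, v2, <index>      # v1[i] = v2[i - index] for i >= index; vd must not overlap vs2
vsetvli       x0, <index>, sew, lmul, tu, mu
vmv.v.v       v1, v12              # copies v12[0..index-1]; should leave the tail undisturbed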
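The slide-based sequence can be checked with a similar Python model. This is a sketch under stated assumptions: tail-undisturbed policy throughout, vl = VLMAX for the first three instructions, 0 <= index < VLMAX, and a temporary register v2 distinct from the vslideup destination. The name `slide_insert` is hypothetical.

```python
# Hypothetical element-level model of the slide-based sequence. Assumes
# tail-undisturbed policy, vl = VLMAX for the first three instructions and a
# temporary register v2 distinct from the vslideup destination v1.

def slide_insert(v12, index, value):
    """Model of: vslidedown, vmv.s.x, vslideup, vsetvli (vl=index, tu), vmv.v.v."""
    vlmax = len(v12)
    v1 = [None] * vlmax  # old contents of v1 are unknown

    # vslidedown.vx v2, v12, index: elements shifted past the end read as 0
    v2 = [v12[i + index] if i + index < vlmax else 0 for i in range(vlmax)]

    # vmv.s.x v2, value: only element 0 is written (tu keeps the rest)
    v2[0] = value

    # vslideup.vx v1, v2, index: elements below index are left unchanged
    for i in range(index, vlmax):
        v1[i] = v2[i - index]

    # vsetvli x0, index, ..., tu, mu then vmv.v.v v1, v12: copy the head only
    for i in range(index):
        v1[i] = v12[i]
    return v1


if __name__ == "__main__":
    print(slide_insert(list(range(10, 18)), 5, 99))  # [10, 11, 12, 13, 14, 99, 16, 17]
```

Every element of v1 is written either by the slide-up (positions >= index) or by the final masked-length move (positions < index), which is why the final vmv.v.v depends on the tail being left undisturbed.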

Thanks a lot,

--
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación


http://bsc.es/disclaimer


David Horner
 

On 2020-10-16 11:10 a.m., Roger Ferrer Ibanez wrote:
> But I think this is problematic for sew=8: vid.v only produces indices modulo 256, so if vlmax(sew=8) > 256 the indices wrap around and vmseq.vx can set more than one mask bit.

The mask could be built at sew=16, because the mask layout is ordinal (mask bit i corresponds to element i, whatever the sew and lmul).

And there are tricks to set it up, for example a direct load (or register move) into v0 that sets just the correct bit.

The mask could also be built in v2 and transferred under mask to clear the lower or higher aliasing.
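With an ordinal mask layout, the "direct load/move into v0" trick only needs to know which word of v0 holds the bit and what value to put there. A hypothetical Python sketch follows. It assumes v0 is viewed as 64-bit elements, with element j holding mask bits 64j to 64j+63. The function names are illustrative only.

```python
# Hypothetical model of the ordinal mask layout: mask bit i is bit i of v0,
# whatever sew and lmul are.

def one_hot_mask_word(index, word_bits=64):
    """Return (element, scalar): which word_bits-wide element of v0 to write
    and the scalar that sets only mask bit `index`."""
    return index // word_bits, 1 << (index % word_bits)


def mask_bits(words, word_bits=64):
    """Expand v0, viewed as word_bits-wide elements, into a list of mask bits."""
    return [(w >> b) & 1 for w in words for b in range(word_bits)]


if __name__ == "__main__":
    elem, scalar = one_hot_mask_word(259)
    print(elem, hex(scalar))  # 4 0x8
```

For index 259, only bit 3 of 64-bit element 4 of v0 needs to be set. The answer is the same whether the comparison would have been done at sew=8 or sew=16.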



