
Re: Re: Re: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Spec version 1.0-rc1-20210608

Guy Lemieux
 

For zip, you don't need to use vrgather. Instead, use vector widening (e.g. vwaddu with x0) to double the element size (ensuring the MSBs are 0) for one half of the set.
Apply vslide1up and widening to the second set, then add the two sets (using the original SEW).

For unzip, you can do the reverse: use narrowing for one half, and vslide1down (at SEW/2) + narrowing for the other half.

Sorry, what is trn?

g
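
The widening approach can be modelled in plain C on 8-bit elements. This is only a scalar sketch, not RVV code: `zip_by_widening` and `unzip_by_narrowing` are hypothetical names, and the model assumes the low byte of each widened element comes first in memory, as it does when an RVV register is viewed at a narrower SEW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of zip without vrgather (hypothetical helper, not an RVV API).
 * a[i] is zero-extended to 16 bits (the effect of vwaddu with x0), b[i] is
 * zero-extended and moved into the upper byte (the effect of sliding up by
 * one SEW/2 element), and the two widened values are added. */
static void zip_by_widening(const uint8_t *a, const uint8_t *b,
                            uint8_t *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++) {
        uint16_t lo = a[i];                          /* MSBs are 0 */
        uint16_t hi = (uint16_t)((uint16_t)b[i] << 8);
        uint16_t sum = (uint16_t)(lo + hi);          /* bit ranges are disjoint */
        out[2 * i]     = (uint8_t)(sum & 0xff);      /* low byte first */
        out[2 * i + 1] = (uint8_t)(sum >> 8);
    }
}

/* Unzip is the reverse: narrow the low halves for one output, shift right
 * by SEW/2 and narrow for the other. */
static void unzip_by_narrowing(const uint8_t *in, uint8_t *a, uint8_t *b,
                               size_t n)
{
    assert(in && a && b);
    for (size_t i = 0; i < n; i++) {
        uint16_t wide = (uint16_t)(in[2 * i] | (in[2 * i + 1] << 8));
        a[i] = (uint8_t)wide;          /* narrowing keeps the low byte */
        b[i] = (uint8_t)(wide >> 8);   /* shift right, then narrow */
    }
}
```

Zipping {1, 2, 3, 4} with {9, 8, 7, 6} in this model gives 1, 9, 2, 8, 3, 7, 4, 6, and unzipping that result recovers the two inputs.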


On Tue, Jun 15, 2021 at 12:08 AM Roger Ferrer Ibanez <roger.ferrer@...> wrote:

Hi,

I agree that computing those indexes is not always trivial.

Some ideas you can consider (not claiming these are the most efficient ways):

  • reverse is not too complex: vid.v + vrsub.vx with vl - 1 as the scalar, giving (vl - 1) - vid
  • zip is harder: start with the "halved indexes" vid + vsrl.vi (0, 0, 1, 1, 2, 2, 3, 3, ...), then compute the "even elements" vector (0, 1, 0, 1, 0, 1, ..., e.g. vid AND 1) and multiply it (or, if a power of two, shift it) by the first index of the second vector (which may be vl/2 in your case). So you get 0, vl/2, 0, vl/2, 0, vl/2, .... Then add this vector to the halved indexes so you get 0, vl/2, 1, 1+vl/2, 2, 2+vl/2, ...
  • unzip: in the worst case you can reverse what you did for zip
  • trn: I don't have any ideas off the top of my head, /cc Romain Dolbeau who may recall how he worked around the cases in FFTW where he needed a trn-like operation

Hope this helps.

Kind regards,


Re: Re: Re: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Spec version 1.0-rc1-20210608

Roger Ferrer Ibanez
 

Hi,

I agree that computing those indexes is not always trivial.

Some ideas you can consider (not claiming these are the most efficient ways):

  • reverse is not too complex: vid.v + vrsub.vx with vl - 1 as the scalar, giving (vl - 1) - vid
  • zip is harder: start with the "halved indexes" vid + vsrl.vi (0, 0, 1, 1, 2, 2, 3, 3, ...), then compute the "even elements" vector (0, 1, 0, 1, 0, 1, ..., e.g. vid AND 1) and multiply it (or, if a power of two, shift it) by the first index of the second vector (which may be vl/2 in your case). So you get 0, vl/2, 0, vl/2, 0, vl/2, .... Then add this vector to the halved indexes so you get 0, vl/2, 1, 1+vl/2, 2, 2+vl/2, ...
  • unzip: in the worst case you can reverse what you did for zip
  • trn: I don't have any ideas off the top of my head, /cc Romain Dolbeau who may recall how he worked around the cases in FFTW where he needed a trn-like operation
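
The reverse and zip index recipes above can be checked with a small scalar model in C. This is a sketch under assumptions, not RVV code: the function names are hypothetical, the reverse scalar is taken as vl - 1 so that the indexes run from vl - 1 down to 0, and `gather` simplifies vrgather.vv by using vl (rather than VLMAX) as the out-of-range bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* reverse: idx[i] = (vl - 1) - vid[i], i.e. vid.v followed by vrsub.vx */
static void reverse_index(uint32_t *idx, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        idx[i] = (uint32_t)(vl - 1 - i);
}

/* zip: idx[i] = (vid[i] >> 1) + (vid[i] & 1) * second, where `second` is
 * the first index of the second vector (vl/2 in the example above) */
static void zip_index(uint32_t *idx, size_t vl, uint32_t second)
{
    for (size_t i = 0; i < vl; i++) {
        uint32_t vid = (uint32_t)i;
        uint32_t halved = vid >> 1;   /* 0, 0, 1, 1, 2, 2, ... */
        uint32_t odd = vid & 1u;      /* 0, 1, 0, 1, 0, 1, ... */
        idx[i] = halved + odd * second;
    }
}

/* Simplified model of vrgather.vv: out-of-range indexes read as 0. */
static void gather(const int32_t *src, const uint32_t *idx,
                   int32_t *dst, size_t vl)
{
    assert(src && idx && dst);
    for (size_t i = 0; i < vl; i++)
        dst[i] = idx[i] < vl ? src[idx[i]] : 0;
}
```

With vl = 8 and `second` = 4, `zip_index` produces 0, 4, 1, 5, 2, 6, 3, 7, which matches the 0, vl/2, 1, 1+vl/2, ... sequence described in the list.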

Hope this helps.

Kind regards,

On 15/6/21 8:14, Linjie Yu via lists.riscv.org wrote:
Dear Craig and Roger,

    Thanks a lot for providing me with good solutions. I have tried them, and they all work well for the upsample application.
    But when it comes to other applications, such as zip/unzip, trn, reverse and so on, the index values are still difficult to initialize.

Best Regards
Damon Yu


-- 
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación


Re: Re: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Spec version 1.0-rc1-20210608

Linjie Yu
 

Dear Craig and Roger,

    Thanks a lot for providing me with good solutions. I have tried them, and they all work well for the upsample application.
    But when it comes to other applications, such as zip/unzip, trn, reverse and so on, the index values are still difficult to initialize.

Best Regards
Damon Yu


------------------ Original Message ------------------
From: <tech-vector-ext@...>
Sent: 06/11/21 18:51:12
To: <tech-vector-ext@...>
Subject: Re: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Spec version 1.0-rc1-20210608

Hi Linjie,

I'm not sure I understood your question. I think a vid.v (with a vl of your choice) followed by a (logical) shift right by 1 bit (vsrl.vi) would generate an index vector like the one you now have in the "index[]" array.

This does not appear to require hardcoding any size, and you don't have to load a materialised value from memory (you compute it instead).

Hope this helps.

Kind regards,



Background for Policy/Workflow revisions on Github close concern.

David Horner
 

Andrew Waterman called Thursday and we discussed many issues including challenges with Issues in Github.

We determined that we were both unaware of some relevant aspects [neither of us intentionally blind], and that neither of us was exaggerating our concerns [neither of us Chicken Littles].

Andrew volunteered me to look into Github functionality and features that might diminish some of the contention/problems related to "workflow" and "issue expression".

I consider myself duly deputized to:

  Recommend modifications to a de facto proposal by Krste on how to use Github, and

  To make any further suggestions that I consider valuable.


But before I get into that specific concern, there are related concerns

a) timeliness and completion of TG/SIG/other minutes

b) github issues vs group/list discussions

c) Philosophies and World Views.


It is the latter that I want to expound on, to help us understand the origin of some of the conflicts and to plot directions for resolution/enhancement in ISA development.


On the call, I admitted to being Process and Enablement oriented,  Andrew to being Task and Results oriented.


Those familiar with Holism vs Reductionism, know/understand that

1) both views are necessary to achieve significant results

2) at any point [level] in analysis there is value in considering

   i) the holistic nature of the concern/situation/object; its behaviour as a whole and response in the context of its environment [a sloppy definition, I know] and

   ii) the examination of its components; what it is made of, and why the what is needed [if needed] for what it does and doesn't do. [an even worse definition].

3) [most] persons have a bias towards one or other mode of analysis, but no one [who analyzes] is wholly one or the other.


These points are applicable to Process|Enablement  vs Task|Results.

1) both views are necessary to achieve significant results

2) at any point [level] in accomplishment there is value in considering


i) the Process|Enablement context; what enables the activity/accomplishment to occur.
How can it be enhanced/leveraged to assist in providing the desired outcome.
Put into place [Enable] the resources to accomplish the goal and let the process deliver the results. 

ii) identify the Task/activity that will [hopefully] yield the result and work that activity to completion
Verify the results are as hoped for and check the task off the list
[ or check the task off the list and move to the next Task of verifying the result].

3) [most] persons have a bias towards one or other mode of operation.


So, back to Github and Issue Resolution.

A. These different World Views may conflict at any given point [level] of endeavor.


It occurred in this instance in the conflict of when an issue should be closed.

For the Task|Result oriented, Issue Resolution is closing the issue.
     Task done, move on to next.


For the Process|Enablement, Issue Resolution is completing all the concerns expressed by ensuring a process is in place to address them.

     Closing prematurely aborts that process.


B. Github does not provide a robust Workflow Lifecycle in its base product for Issues.

For pull requests there is a support structure with validations, review requests/responses, and "sign-off" provisions.

Issue Life-cycle support is a binary open/closed status [no UnderInvestigation, RequestingConsultation, TentativeResolution, AwaitingSignOff].

It does provide links to/from other issues, and more significantly has a CloseRelatedIssue Button when a pull is being applied to the repository.

Andrew mentioned the strong temptation to use this button when "finalizing" a pull request.

External email scripting could provide some functionality that would enhance the workflow,

  but the use agreement appears to discourage if not explicitly forbid such "non-standard" interfaces.

Github may have a pay-for-feature process [I remember such before its recent acquisition], however,

  I believe we can make do with moderate behavioural changes and still be effective in serving all of the RISCV community.


C. Oh. Contrary to Andrew's inference from https://github.com/riscv/riscv-isa-manual/pull/657#issuecomment-858481023 ,

          I do not believe unilateral control is desirable nor essential to Issue management "workflow".


I will have specific proposals soon.


Re: Re: [RISC-V] [tech-vector-ext] RISC-V Vector Spec version 1.0-rc1-20210608

Roger Ferrer Ibanez
 

Hi Linjie,

I'm not sure I understood your question. I think a vid.v (with a vl of your choice) followed by a (logical) shift right by 1 bit (vsrl.vi) would generate an index vector like the one you now have in the "index[]" array.

This does not appear to require hardcoding any size, and you don't have to load a materialised value from memory (you compute it instead).
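
A scalar C model of this suggestion, as a minimal sketch: the index is computed per strip as vid >> 1 and nothing is loaded from a table. The names `upsample_strip` and `upsample` and the `vlmax` parameter are hypothetical; `vlmax` stands in for whatever vector length the hardware grants and is assumed to be even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One strip: vid.v, vsrl.vi by 1, then vrgather.vv, modelled per element. */
static void upsample_strip(const float *src, float *dst, size_t vl_out)
{
    for (size_t i = 0; i < vl_out; i++) {
        uint32_t idx = (uint32_t)i >> 1;   /* 0, 0, 1, 1, 2, 2, ... */
        dst[i] = src[idx];                 /* gather through the index */
    }
}

/* Stripmined driver: n source elements produce 2 * n output elements. */
static void upsample(const float *src, float *dst, size_t n, size_t vlmax)
{
    assert(vlmax >= 2 && vlmax % 2 == 0);
    size_t remaining = 2 * n;              /* output elements still to write */
    while (remaining > 0) {
        size_t vl = remaining < vlmax ? remaining : vlmax;
        upsample_strip(src, dst, vl);
        src += vl / 2;                     /* each pass consumes vl/2 inputs */
        dst += vl;
        remaining -= vl;
    }
}
```

The same function gives the same result for any even `vlmax`, which is the point of computing the index instead of hardcoding it for one VLEN.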

Hope this helps.

Kind regards,

On 11/6/21 9:22, Linjie Yu via lists.riscv.org wrote:
Hi, all

    I recently ran into a difficulty applying the "vrgather" instruction. The details are shown below:
    In an upsample application, each element of the source data should be duplicated as a pair.
     E.g.:  src = [0, 1, 2, 3, 4, 5, 6, 7, 8]
            dst = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
     So, my realization is:
--------------------------------------------------------------------------------------------------
  // Sized to be compatible with every VLEN (128 ~ 1024)
  uint32_t index[64] = {0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5 ........,31,31};

  vfloat32m2_t data = vundefined_f32m2();
  vfloat32m1_t zero = (vfloat32m1_t)vmv_v_i_i32m1(0);
  while (length > 0)
  {
      int gvl = vsetvl_e32m1(length);          // source elements handled in this pass
      int gvl2 = 2 * gvl;                      // each source element yields two outputs
      vuint32m2_t v_index = vle32_v_u32m2(index, gvl2);
      vfloat32m1_t src_data = vle32_v_f32m1(src, gvl);
      data = vset_f32m2(data, src_data, 0);    // lower half of the group: source data
      data = vset_f32m2(data, zero, 1);        // upper half of the group: zeros
      vfloat32m2_t res = vrgather_vv_f32m2(data, v_index, gvl2);
      vse32_v_f32m2(dst, res, gvl2);
      length -= gvl;
      src += gvl;
      dst += gvl2;
  }
-----------------------------------------------------------------------------------------------
    As shown above, the index data has to be initialized for the maximum VLEN to keep the code compatible.
The same is true of every application that needs such a constant.
    I think this is contrary to the RISC-V idea that one piece of code can run on all RISC-V hardware. Does anyone have a better method?

Best Regards
Damon Yu


-- 
Roger Ferrer Ibáñez - roger.ferrer@...
Barcelona Supercomputing Center - Centro Nacional de Supercomputación


WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received.

http://www.bsc.es/disclaimer


Re: [RISC-V] [tech-vector-ext] RISC-V Vector Spec version 1.0-rc1-20210608

Linjie Yu
 

Hi, all

    I recently ran into a difficulty applying the "vrgather" instruction. The details are shown below:
    In an upsample application, each element of the source data should be duplicated as a pair.
     E.g.:  src = [0, 1, 2, 3, 4, 5, 6, 7, 8]
            dst = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
     So, my realization is:
--------------------------------------------------------------------------------------------------
  // Sized to be compatible with every VLEN (128 ~ 1024)
  uint32_t index[64] = {0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5 ........,31,31};

  vfloat32m2_t data = vundefined_f32m2();
  vfloat32m1_t zero = (vfloat32m1_t)vmv_v_i_i32m1(0);
  while (length > 0)
  {
      int gvl = vsetvl_e32m1(length);          // source elements handled in this pass
      int gvl2 = 2 * gvl;                      // each source element yields two outputs
      vuint32m2_t v_index = vle32_v_u32m2(index, gvl2);
      vfloat32m1_t src_data = vle32_v_f32m1(src, gvl);
      data = vset_f32m2(data, src_data, 0);    // lower half of the group: source data
      data = vset_f32m2(data, zero, 1);        // upper half of the group: zeros
      vfloat32m2_t res = vrgather_vv_f32m2(data, v_index, gvl2);
      vse32_v_f32m2(dst, res, gvl2);
      length -= gvl;
      src += gvl;
      dst += gvl2;
  }
-----------------------------------------------------------------------------------------------
    As shown above, the index data has to be initialized for the maximum VLEN to keep the code compatible.
The same is true of every application that needs such a constant.
    I think this is contrary to the RISC-V idea that one piece of code can run on all RISC-V hardware. Does anyone have a better method?

Best Regards
Damon Yu


------------------ Original Message ------------------
From: <tech-vector-ext@...>
Sent: 06/09/21 14:46:30
To: <tech-vector-ext@...>
Subject: [RISC-V] [tech-vector-ext] RISC-V Vector Spec version 1.0-rc1-20210608

I've just tagged the first release candidate for v1.0 of the vector
spec in github.  PDF attached below.

I've included the TG agreed updates and handled almost all of the
outstanding issues for v1.0.  Thanks for all the feedback.

I would appreciate if folks could read through the whole thing and
give comments over email and through Github issues.  Please also
submit PRs for typos and other wording improvements.

I'd like to try and settle most concerns over email if possible, and
assume it'll take a little while for everyone to go through the doc.

I'll tentatively schedule a vector TG meeting on Friday June 25 to go
over issues best dealt with on a call.  I'm hoping we can enter public
review at the same point in time.  Remember, we don't have to reach
agreement on all the issues before starting public review, just be OK
as a group with putting this out for public review.

Krste







RISC-V Vector Spec version 1.0-rc1-20210608

Krste Asanovic
 

I've just tagged the first release candidate for v1.0 of the vector
spec in github. PDF attached below.

I've included the TG agreed updates and handled almost all of the
outstanding issues for v1.0. Thanks for all the feedback.

I would appreciate if folks could read through the whole thing and
give comments over email and through Github issues. Please also
submit PRs for typos and other wording improvements.

I'd like to try and settle most concerns over email if possible, and
assume it'll take a little while for everyone to go through the doc.

I'll tentatively schedule a vector TG meeting on Friday June 25 to go
over issues best dealt with on a call. I'm hoping we can enter public
review at the same point in time. Remember, we don't have to reach
agreement on all the issues before starting public review, just be OK
as a group with putting this out for public review.

Krste


Re: Smaller embedded version of the Vector extension

Bruce Hoult
 

On Fri, Jun 4, 2021 at 8:09 AM Zalman Stern via lists.riscv.org <zalman=google.com@...> wrote:
If the minimum VLEN is at least 128-bits, one can translate NEON/SSE intrinsics directly without having to have every vector instruction dominated by a loop over the vector length.

This is an excellent point, but there are only 8 SSE/AVX/AVX2 registers in 32 bit mode and 16 in 64 bit.

Therefore a 32 bit RISC-V could use 32 bit VLEN and LMUL=4 to directly translate SSE code without stripmining, and a 64 bit RISC-V could use 64 bit VLEN and LMUL=2. For AVX/AVX2 VLEN=64 is required on 32 bit and VLEN=128 on 64 bit, using the same LMUL.

Similarly, 32 bit ARM NEON works as sixteen 128 bit registers or thirty two 64 bit registers. Thus a 32 bit RISC-V with VLEN=64 can directly translate NEON code using LMUL=1 or LMUL=2.

Aarch64 has thirty two registers of 128 bits each, which can also be treated as thirty two registers of 64 bits each (effectively just setting a smaller VL, the upper half is zeroed). So directly porting 64 bit ARM Advanced SIMD code does require 128 bit registers.

For maximum SIMD-porting compatibility with both ARM and x86 code a 64 bit RISC-V needs VLEN=128 but a 32 bit RISC-V is fine with VLEN=64.
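
The arithmetic behind these figures can be written out as a short C sketch. It assumes, as the examples above do, that the 32 RVV registers are split evenly into one register group per SIMD register; `struct rvv_cfg` and `min_rvv_cfg` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: LMUL and the smallest VLEN (in bits) that let RVV
 * hold `regs` SIMD registers of `bits` bits each, one register group per
 * SIMD register. RVV has 32 architectural vector registers. */
struct rvv_cfg {
    unsigned lmul;   /* registers per group */
    unsigned vlen;   /* minimum bits per vector register */
};

static struct rvv_cfg min_rvv_cfg(unsigned regs, unsigned bits)
{
    struct rvv_cfg c;
    assert(regs > 0 && regs <= 32 && 32 % regs == 0);
    c.lmul = 32 / regs;          /* e.g. 8 SIMD registers -> LMUL=4 */
    if (c.lmul > 8)
        c.lmul = 8;              /* LMUL cannot exceed 8 */
    assert(bits % c.lmul == 0);
    c.vlen = bits / c.lmul;      /* group width = LMUL * VLEN */
    return c;
}
```

For instance, `min_rvv_cfg(8, 128)` (SSE in 32 bit mode) gives LMUL=4 and VLEN=32, while `min_rvv_cfg(32, 128)` (Aarch64) gives LMUL=1 and VLEN=128.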


Re: Smaller embedded version of the Vector extension

Krste Asanovic
 

If there were no cost, then supporting VLEN=64 in the general apps
processor profile would be a good thing to do. But not allowing
standard software to assume VLEN>=128 has a non-trivial impact on
bigger cores, and the expectation is that the vast majority of apps
cores will want VLEN>=128.

As Zalman points out, the main advantage is removing stripmining code
when it is known vectors will fit, and translating existing code is
one important such use case though not the only one. Removing
stripmining reduces static and dynamic code size and increases
performance. While LMUL>1 allows more cases to be handled without
stripmining, it also reduces available arch registers.

Anyone can of course still build a compatible apps processor with
VLEN=64, but this would fail to run some of the code written for
VLEN>=128 case. And almost anything goes in embedded space.

Krste
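
To illustrate the stripmining point, here is a scalar C sketch contrasting a vector-length-agnostic loop with code that assumes four 32-bit elements always fit. `model_setvl`, `add_stripmined` and `add_fixed4` are hypothetical names, and `vlmax` is a stand-in for the element count the hardware grants per pass.

```c
#include <assert.h>
#include <stddef.h>

/* Model of vsetvl: grants at most vlmax elements per pass. */
static size_t model_setvl(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* Stripmined add: works for any vlmax, at the cost of a loop and its
 * bookkeeping. Returns the number of passes taken. */
static size_t add_stripmined(const int *a, const int *b, int *c,
                             size_t n, size_t vlmax)
{
    size_t passes = 0;
    while (n > 0) {
        size_t vl = model_setvl(n, vlmax);
        for (size_t i = 0; i < vl; i++)
            c[i] = a[i] + b[i];
        a += vl;
        b += vl;
        c += vl;
        n -= vl;
        passes++;
    }
    return passes;
}

/* Fixed-length add of four elements: no loop over the vector length, but
 * only valid when four elements fit, e.g. SEW=32 with VLEN>=128 at LMUL=1. */
static void add_fixed4(const int *a, const int *b, int *c, size_t vlmax)
{
    assert(vlmax >= 4);
    for (size_t i = 0; i < 4; i++)
        c[i] = a[i] + b[i];
}
```

With `vlmax` = 2 (modelling VLEN=64 at SEW=32) the stripmined version needs two passes for four elements, and the fixed-length version cannot be used at all.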

On Thu, 3 Jun 2021 13:35:03 -0700, Zalman Stern <zalman@...> said:
| "...if written correctly" is precisely the point. If VLEN is specified as >=128, code that targets 128-bits explicitly by
| setting VL to an appropriate constant for a large swath *is* correct. This allows one to do basically what NEON/SSE do today as
| a baseline for performance.

| Whether this is worthwhile or not may be debated, but insisting that everything should be completely vector length agnostic or
| it is broken is missing the point. Ideally there would be a lot more quantitative data on this, but I'm not going to tilt at
| that windmill right now. The worst case for the overhead of hardware vector length independence occurs at the smallest sizes as
| well.

| In general it's pretty dubious that the same set of fully lowered instruction bits can efficiently cover everything from the
| bottom of the embedded space to HPC. Ideally we'd be moving to more sophisticated lowering -- e.g. dynamic and multi-stage
| compilation -- rather than forcing the issue in the ISA design.

| Another way to go would be to split 32-bit and 64-bit implementations such that the VLEN >= 64 for 32-bit implementations and
| VLEN >= 128 for 64-bit ones. (Application code is rarely going to target 32-bit these days. Minimal embedded implementations
| are probably 32-bit.) Though truth be told, code likely needs a scalar fallback anyway unless the V extension is required.
| (Which it almost certainly won't be if we're talking embedded space.) As such, VLEN not being large enough for the expectations
| code was compiled to is the same as not having the vector unit.

| -Z-

| On Thu, Jun 3, 2021 at 9:33 AM Tony Cole via lists.riscv.org <tony.cole=huawei.com@...> wrote:

| Software should still work with VLEN>=64 if written correctly, as it should be VLEN agnostic.
| Maybe it should be a recommendation that VLEN>=128, with a minimum of 64 for app processors?

| Lower performance is an implementation cost/benefit decision.

| Tony

| -----Original Message-----
| From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Krste Asanovic
| Sent: 03 June 2021 17:24
| To: Guy Lemieux <guy.lemieux@...>
| Cc: Andrew Waterman <andrew@...>; Tariq Kurd <tariq.kurd@...>; Shaofei (B) <shaofei1@...>;
| tech-vector-ext@...
| Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

|| On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@...> wrote:
||
|| What is the advantage to RVV requiring VLEN >= 128?
||
|| I think this should be changed to VLEN >= 64 because:
||
|| 1) VLEN = 64 is more likely for small implementations; creating a
|| mandatory expectation to improve software portability

| This is the requirement for app processors, which are not generally small cores.
| Most competing SIMD extensions are at least 128b per vector register.

||
|| 2) two implementations, each with VLEN >= 64, do not expose anything
|| new to software that is not already exposed by VLEN >= 128
||
|| 3) allowing VLEN =32 would expose something new to software (register
|| file data layout when SEW=64)
||
|| 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
|| >= 128)? (I can't see any)

| Lower performance on codes that work well on other app architectures.

| Krste

||
|| Guy
||
||
|| On Wed, Jun 2, 2021 at 11:13 AM <krste@...> wrote:
|||
|||
||| The VLEN>=128 constraint is only for the application processor "V"
||| extension for the app profile - not for embedded vectors which can
||| have VLEN=32.
|||
||| From spec Introduction:
||| '
||| The term base vector extension is used informally to describe the standard set of vector ISA components that will be
||| required for the single-letter "V" extension, which is intended for use in standard server and application-processor
||| platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I,
||| RV64I) as described below.
|||
||| Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of
||| mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when
||| each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string
||| names beginning with the "Zv" prefix.
||| '
|||
||| There is a set of Zve* names for the embedded subsets (see github issue
||| #550).
|||
||| A minimal embedded implementation using RV32E+Zfinx+vectors would be
||| the same state size as ARM MVE.
|||
||| P extension does not have floating-point, but for short
||| integer/fixed-point SIMD makes sense as alternative.
|||
||| The software fragmentation issue is that some library routines that
||| expose VLEN might not be portable between app cores and embedded
||| cores, but these are different software ecosystems (e.g. ABI/calling
||| convention might be different) and only a few kinds of routine rely
||| on VLEN.
|||
||| For app cores that can afford VLEN>=128, the advantage is the removal
||| of stripmining code in cases that operate on fixed-size vectors.
|||
||| Krste
|||
|||
|||
|||||||| On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@...> said:
|||
||| | Allowing VLEN<128 would allow for smaller vector register files,
||| | but it would also result in a profile that is not
||| | forward-compatible with the V spec. This would produce another fracture in the software ecosystem.
|||
||| | To avoid such a fracture, there are two choices:
||| | (1) go with P instead
||| | (2) relax the V spec to allow smaller implementations
|||
||| | So the key question for this group is whether to relax the minimum
||| | VLEN to 32 or 64?
|||
||| | note: a possible justification for keeping 128 might be to
||| | recommend (1) instead. I don’t know anything about P, but it seems
||| | like it could be speced in a way that is competitive/comparable with Helium.
|||
||| | Guy
|||
||| | PS — I have started to design an “RVV-lite” profile which would be
||| | more amenable to embedded implementations. However, I have adopted
||| | a stance that it must remain forward compatible with the full V
||| | spec, so I have not considered VLEN below 128. I am happy to share
||| | my work on this and involve other contributors — email me if you would like to see a copy.
|||
||| | On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@...> wrote:
|||
||| |     The uppercase-V V extension is meant to cater to apps processors, where
||| |     the VLEN >= 128 constraint is not inappropriate and is sometimes
||| |     beneficial.  But there's nothing fundamental about the ISA design that
||| |     prohibits VLEN < 128.  A minimal configuration is VLEN=ELEN=32, giving the
||| |     same total amount of state as MVE.  (And if you set LMUL=4, then you even
||| |     get the same shape: 8 registers of 128 bits apiece.)
|||
||| |     Such a thing wouldn't be called V, but perhaps something like Zvmin.
||| |     Other than agreeing on a feature set and assigning it a name, the
||| |     architecting is already done.
|||
||| |     (If you search the spec for Zfinx, you'll see that a Zfinx variant is
||| |     planned, but only barely sketched out.)
|||
||| |     On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
||| |     huawei.com@...> wrote:
|||
||| |         Hi everyone,
|||
||| |
|||
||| |         Are there any plans for a cut-down configuration of the vector
||| |         extension suitable for embedded cores? It seems that the 32x128-bit
||| |         register file is suitable for application class cores but it is very
||| |         large for embedded cores, especially if the F registers also need to
||| |         be implemented (which I think is the case, unless a Zfinx version is
||| |         specified).
|||
||| |
|||
||| |         ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
||| |         more suitable for embedded applications.
|||
||| |         https://en.wikichip.org/wiki/arm/helium
|||
||| |
|||
||| |         What’s the approach here? Should embedded applications implement the
||| |         P-extension instead?
|||
||| |
|||
||| |         Tariq
|||
||| |         Tariq Kurd
||| |         Processor Design, RISC-V Cores, Bristol
||| |         E-mail: Tariq.Kurd@...
||| |         Company: Huawei technologies R&D (UK) Ltd
||| |         Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK


Re: Smaller embedded version of the Vector extension

Guy Lemieux
 

On Thu, Jun 3, 2021 at 1:08 PM Zalman Stern <zalman@...> wrote:

> If the minimum VLEN is at least 128-bits, one can translate NEON/SSE intrinsics directly without having to have every vector instruction dominated by a loop over the vector length.

that's pretty handy, actually. I'm not sure it should be a property of
the V spec itself; rather, software translated this way could require
an implementation with VLEN >= 128, and otherwise fall back to a
scalar translation.

for RVV, I was pretty comfortable with the requirement that RVV
require VLEN >= 128 before this whole thread started. it seemed like a
good length (4 x 32b words) which matched other SIMD instruction sets
as you have noted.

with this post, Tariq indicated that he wants to reduce the amount of
state. from this, I started to think it might be better to shorten
this to VLEN >= 64 or perhaps VLEN >= max(XLEN,FLEN) rather than
reducing the number of named registers [*]

Regarding performance, VLEN=32 or 64 seems ridiculously small until
you consider register grouping. The RVV-lite profile that I'm
proposing would require SEW/LMUL=8, so VLMAX=4 for VLEN=32, and
VLMAX=8 for VLEN=64. These are reasonable vector lengths to get
reasonable amounts of parallelism.
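
The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: it reads "SEW/LMUL=8" as the ratio SEW divided by LMUL being 8 (an assumption about the intent), and the helper name is illustrative.

```python
def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """VLMAX = VLEN * LMUL / SEW, i.e. VLEN divided by the SEW/LMUL ratio."""
    return vlen * lmul // sew


# (SEW, LMUL) pairs whose ratio is 8; integer LMUL only.
settings = [(8, 1), (16, 2), (32, 4)]

for vlen in (32, 64):
    for sew, lmul in settings:
        assert sew // lmul == 8
        print(f"VLEN={vlen} SEW={sew} LMUL={lmul}: VLMAX={vlmax(vlen, sew, lmul)}")
```

Every setting with the same SEW/LMUL ratio yields the same VLMAX for a given VLEN, which is why the ratio alone fixes the vector length.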


[*] why not just restrict small implementations to 16 or 8 named
registers with VLEN >= 128? it is a consequence of how RVV has chosen
to implement widening and narrowing instructions, which require using
register grouping. in my RVV-lite profile, I considered eliminating
register groups entirely, but this would require some other way to do
widening/narrowing which would not be compatible with RVV. with
SEW/LMUL=32/4, a common setting, there are only 8 vector registers
available. to save register file area, restricting this to just 4
vector registers seems too restrictive. instead, I think relaxing
the minimum to VLEN >= 64 achieves the same effect (halving the required
register file size) without requiring such a restriction.
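
A rough Python comparison of the options discussed here, as a sketch only. The option labels and helper names are illustrative, and the last row is the VLEN=ELEN=32 configuration Andrew mentioned earlier in the thread.

```python
def regfile_bits(num_regs: int, vlen: int) -> int:
    """Total vector register file state in bits."""
    return num_regs * vlen


def register_groups(num_regs: int, lmul: int) -> int:
    """Number of usable register groups at an integer LMUL."""
    return num_regs // lmul


# label -> (number of named vector registers, VLEN)
options = {
    "full V (32 regs, VLEN=128)": (32, 128),
    "fewer names (16 regs, VLEN=128)": (16, 128),
    "shorter VLEN (32 regs, VLEN=64)": (32, 64),
    "minimal embedded (32 regs, VLEN=32)": (32, 32),
}

for name, (regs, vlen) in options.items():
    print(f"{name}: {regfile_bits(regs, vlen)} bits, "
          f"{register_groups(regs, 4)} groups at LMUL=4")
```

Halving the register count and halving VLEN both give a 2048-bit register file, but only the shorter-VLEN option keeps 8 groups at LMUL=4; the 16-register option drops to 4.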

Guy


Re: Smaller embedded version of the Vector extension

Zalman Stern
 

"...if written correctly" is precisely the point. If VLEN is specified as >=128, code that targets 128-bits explicitly by setting VL to an appropriate constant for a large swath *is* correct. This allows one to do basically what NEON/SSE do today as a baseline for performance.

Whether this is worthwhile or not may be debated, but insisting that everything should be completely vector length agnostic or it is broken is missing the point. Ideally there would be a lot more quantitative data on this, but I'm not going to tilt at that windmill right now. The worst case for the overhead of hardware vector length independence occurs at the smallest sizes as well.

In general it's pretty dubious that the same set of fully lowered instruction bits can efficiently cover everything from the bottom of the embedded space to HPC. Ideally we'd be moving to more sophisticated lowering -- e.g. dynamic and multi-stage compilation -- rather than forcing the issue in the ISA design.

Another way to go would be to split 32-bit and 64-bit implementations such that the VLEN >= 64 for 32-bit implementations and VLEN >= 128 for 64-bit ones. (Application code is rarely going to target 32-bit these days. Minimal embedded implementations are probably 32-bit.) Though truth be told, code likely needs a scalar fallback anyway unless the V extension is required. (Which it almost certainly won't be if we're talking embedded space.) As such, VLEN not being large enough for the expectations code was compiled to is the same as not having the vector unit.
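
The fallback logic described above might look something like the following Python sketch. The function and path names are hypothetical; on real hardware the VLEN value could be derived from the `vlenb` CSR, which holds VLEN/8 in bytes.

```python
from typing import Optional


def select_kernel(vlen: Optional[int], required_vlen: int = 128) -> str:
    """Pick a code path for a routine compiled for a fixed minimum VLEN.

    vlen is None when there is no vector unit at all. A vector unit whose
    VLEN is below what the code was compiled for is treated the same way.
    """
    if vlen is None or vlen < required_vlen:
        return "scalar"
    return "fixed-128"


for vlen in (None, 32, 64, 128, 256):
    print(f"VLEN={vlen}: {select_kernel(vlen)}")
```

From the caller's point of view, VLEN=64 and "no V extension" end up on the same scalar path.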

-Z-

On Thu, Jun 3, 2021 at 9:33 AM Tony Cole via lists.riscv.org <tony.cole=huawei.com@...> wrote:
Software should still work with VLEN>=64 if written correctly, as it should be VLEN agnostic.
Maybe it should be a recommendation that VLEN>=128, with a minimum of 64 for app processors?

Lower performance is an implementation cost/benefit decision.

Tony

-----Original Message-----
From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Krste Asanovic
Sent: 03 June 2021 17:24
To: Guy Lemieux <guy.lemieux@...>
Cc: Andrew Waterman <andrew@...>; Tariq Kurd <tariq.kurd@...>; Shaofei (B) <shaofei1@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension



> On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@...> wrote:
>
> What is the advantage to RVV requiring VLEN >= 128?
>
> I think this should be changed to VLEN >= 64 because:
>
> 1) VLEN = 64 is more likely for small implementations; creating a
> mandatory expectation to improve software portability

This is the requirement for app processors, which are not generally small cores.
Most competing SIMD extensions are at least 128b per vector register.

>
> 2) two implementations, each with VLEN >= 64, do not expose anything
> new to software that is not already exposed by VLEN >= 128
>
> 3) allowing VLEN =32 would expose something new to software (register
> file data layout when SEW=64)
>
> 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
> >= 128)? (I can't see any)

Lower performance on codes that work well on other app architectures.

Krste

>
> Guy
>
>
> On Wed, Jun 2, 2021 at 11:13 AM <krste@...> wrote:
>>
>>
>> The VLEN>=128 constraint is only for the application processor "V"
>> extension for the app profile - not for embedded vectors which can
>> have VLEN=32.
>>
>> From spec Introduction:
>> '
>> The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.
>>
>> Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
>> '
>>
>> There is a set of Zve* names for the embedded subsets (see github issue
>> #550).
>>
>> A minimal embedded implementation using RV32E+Zfinx+vectors would be
>> the same state size as ARM MVE.
>>
>> P extension does not have floating-point, but for short
>> integer/fixed-point SIMD makes sense as alternative.
>>
>> The software fragmentation issue is that some library routines that
>> expose VLEN might not be portable between app cores and embedded
>> cores, but these are different software ecosystems (e.g. ABI/calling
>> convention might be different) and only a few kinds of routine rely
>> on VLEN.
>>
>> For app cores that can afford VLEN>=128, the advantage is the removal
>> of stripmining code in cases that operate on fixed-size vectors.
>>
>> Krste
>>
>>
>>
>>>>>>> On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@...> said:
>>
>> | Allowing VLEN<128 would allow for smaller vector register files,
>> | but it would also result in a profile that is not
>> | forward-compatible with the V spec. This would produce another fracture in the software ecosystem.
>>
>> | To avoid such a fracture, there are two choices:
>> | (1) go with P instead
>> | (2) relax the V spec to allow smaller implementations
>>
>> | So the key question for this group is whether to relax the minimum
>> | VLEN to 32 or 64?
>>
>> | note: a possible justification for keeping 128 might be to
>> | recommend (1) instead. I don’t know anything about P, but it seems
>> | like it could be speced in a way that is competitive/comparable with Helium.
>>
>> | Guy
>>
>> | PS — I have started to design an “RVV-lite” profile which would be
>> | more amenable to embedded implementations. However, I have adopted
>> | a stance that it must remain forward compatible with the full V
>> | spec, so I have not considered VLEN below 128. I am happy to share
>> | my work on this and involve other contributors — email me if you would like to see a copy.
>>
>> | On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@...> wrote:
>>
>> |     The uppercase-V V extension is meant to cater to apps processors, where
>> |     the VLEN >= 128 constraint is not inappropriate and is sometimes
>> |     beneficial.  But there's nothing fundamental about the ISA design that
>> |     prohibits VLEN < 128.  A minimal configuration is VLEN=ELEN=32, giving the
>> |     same total amount of state as MVE.  (And if you set LMUL=4, then you even
>> |     get the same shape: 8 registers of 128 bits apiece.)
>>
>> |     Such a thing wouldn't be called V, but perhaps something like Zvmin.
>> |     Other than agreeing on a feature set and assigning it a name, the
>> |     architecting is already done.
>>
>> |     (If you search the spec for Zfinx, you'll see that a Zfinx variant is
>> |     planned, but only barely sketched out.)
>>
>> |     On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
>> |     huawei.com@...> wrote:
>>
>> |         Hi everyone,
>>
>> |
>>
>> |         Are there any plans for a cut-down configuration of the vector
>> |         extension suitable for embedded cores? It seems that the 32x128-bit
>> |         register file is suitable for application class cores but it is very
>> |         large for embedded cores, especially if the F registers also need to
>> |         be implemented (which I think is the case, unless a Zfinx version is
>> |         specified).
>>
>> |
>>
>> |         ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
>> |         more suitable for embedded applications.
>>
>> |         https://en.wikichip.org/wiki/arm/helium
>>
>> |
>>
>> |         What’s the approach here? Should embedded applications implement the
>> |         P-extension instead?
>>
>> |
>>
>> |         Tariq


Re: Smaller embedded version of the Vector extension

Zalman Stern
 

If the minimum VLEN is at least 128-bits, one can translate NEON/SSE intrinsics directly without having to have every vector instruction dominated by a loop over the vector length.

-Z-


On Thu, Jun 3, 2021 at 9:38 AM Guy Lemieux <guy.lemieux@...> wrote:
Krste, to be clear,The issue



On Thu, Jun 3, 2021 at 9:24 AM Krste Asanovic <krste@...> wrote:
> > On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@...> wrote:
> >
> > What is the advantage to RVV requiring VLEN >= 128?
> >
> > I think this should be changed to VLEN >= 64 because:
> >
> > 1) VLEN = 64 is more likely for small implementations; creating a
> > mandatory expectation to improve software portability
>
> This is the requirement for app processors, which are not generally small cores.
> Most competing SIMD extensions are at least 128b per vector register.


The RVV spec should be inclusive, rather than exclusive. Setting
VLEN >= 128 is a higher threshold that makes it less inclusive.


> > 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
> > >= 128)? (I can't see any)
>
> Lower performance on codes that work well on other app architectures.

Sorry I wasn't clear. Of course, an implementation with VLEN=64 would
likely be slower than one with VLEN=128.

To clarify: are there any disadvantages to allowing VLEN=64 in the
spec as a minimum threshold?

Software should be agnostic of VLEN, but the truth is programmers will
squeeze out every last bit where they can and they will latch on to
this minimum value when doing things like re-using LSBs of pointers,
setting minimum chunk sizes, etc. Hence, asking them to expect VLEN=64
as a minimum would be better (more inclusive).

I can't see how this would hurt performance.

Guy






Re: Smaller embedded version of the Vector extension

Guy Lemieux
 

Krste, to be clear,The issue



On Thu, Jun 3, 2021 at 9:24 AM Krste Asanovic <krste@...> wrote:
> > On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@...> wrote:
> >
> > What is the advantage to RVV requiring VLEN >= 128?
> >
> > I think this should be changed to VLEN >= 64 because:
> >
> > 1) VLEN = 64 is more likely for small implementations; creating a
> > mandatory expectation to improve software portability
>
> This is the requirement for app processors, which are not generally small cores.
> Most competing SIMD extensions are at least 128b per vector register.

The RVV spec should be inclusive, rather than exclusive. Setting
VLEN >= 128 is a higher threshold that makes it less inclusive.

> > 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
> > >= 128)? (I can't see any)
>
> Lower performance on codes that work well on other app architectures.

Sorry I wasn't clear. Of course, an implementation with VLEN=64 would
likely be slower than one with VLEN=128.

To clarify: are there any disadvantages to allowing VLEN=64 in the
spec as a minimum threshold?

Software should be agnostic of VLEN, but the truth is programmers will
squeeze out every last bit where they can and they will latch on to
this minimum value when doing things like re-using LSBs of pointers,
setting minimum chunk sizes, etc. Hence, asking them to expect VLEN=64
as a minimum would be better (more inclusive).

I can't see how this would hurt performance.

Guy


Re: Smaller embedded version of the Vector extension

Tony Cole
 

Software should still work with VLEN>=64 if written correctly, as it should be VLEN agnostic.
Maybe it should be a recommendation that VLEN>=128, with a minimum of 64 for app processors?

Lower performance is an implementation cost/benefit decision.

Tony

-----Original Message-----
From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Krste Asanovic
Sent: 03 June 2021 17:24
To: Guy Lemieux <guy.lemieux@...>
Cc: Andrew Waterman <andrew@...>; Tariq Kurd <tariq.kurd@...>; Shaofei (B) <shaofei1@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension



> On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@...> wrote:
>
> What is the advantage to RVV requiring VLEN >= 128?
>
> I think this should be changed to VLEN >= 64 because:
>
> 1) VLEN = 64 is more likely for small implementations; creating a
> mandatory expectation to improve software portability

This is the requirement for app processors, which are not generally small cores.
Most competing SIMD extensions are at least 128b per vector register.

>
> 2) two implementations, each with VLEN >= 64, do not expose anything
> new to software that is not already exposed by VLEN >= 128
>
> 3) allowing VLEN =32 would expose something new to software (register
> file data layout when SEW=64)
>
> 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
> >= 128)? (I can't see any)

Lower performance on codes that work well on other app architectures.

Krste

>
> Guy


On Wed, Jun 2, 2021 at 11:13 AM <krste@...> wrote:


The VLEN>=128 constraint is only for the application processor "V"
extension for the app profile - not for embedded vectors which can
have VLEN=32.

From spec Introduction:
'
The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.

Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
'

There is a set of Zve* names for the embedded subsets (see github issue
#550).

A minimal embedded implementation using RV32E+Zfinx+vectors would be
the same state size as ARM MVE.

P extension does not have floating-point, but for short
integer/fixed-point SIMD makes sense as alternative.

The software fragmentation issue is that some library routines that
expose VLEN might not be portable between app cores and embedded
cores, but these are different software ecosystems (e.g. ABI/calling
convention might be different) and only a few kinds of routine rely
on VLEN.

For app cores that can afford VLEN>=128, the advantage is the removal
of stripmining code in cases that operate on fixed-size vectors.

Krste



On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@...> said:
| Allowing VLEN<128 would allow for smaller vector register files,
| but it would also result in a profile that is not
| forward-compatible with the V spec. This would produce another fracture in the software ecosystem.

| To avoid such a fracture, there are two choices:
| (1) go with P instead
| (2) relax the V spec to allow smaller implementations

| So the key question for this group is whether to relax the minimum
| VLEN to 32 or 64?

| note: a possible justification for keeping 128 might be to
| recommend (1) instead. I don’t know anything about P, but it seems
| like it could be speced in a way that is competitive/comparable with Helium.

| Guy

| PS — I have started to design an “RVV-lite” profile which would be
| more amenable to embedded implementations. However, I have adopted
| a stance that it must remain forward compatible with the full V
| spec, so I have not considered VLEN below 128. I am happy to share
| my work on this and involve other contributors — email me if you would like to see a copy.

| On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@...> wrote:

| The uppercase-V V extension is meant to cater to apps processors, where
| the VLEN >= 128 constraint is not inappropriate and is sometimes
| beneficial. But there's nothing fundamental about the ISA design that
| prohibits VLEN < 128. A minimal configuration is VLEN=ELEN=32, giving the
| same total amount of state as MVE. (And if you set LMUL=4, then you even
| get the same shape: 8 registers of 128 bits apiece.)

| Such a thing wouldn't be called V, but perhaps something like Zvmin.
| Other than agreeing on a feature set and assigning it a name, the
| architecting is already done.

| (If you search the spec for Zfinx, you'll see that a Zfinx variant is
| planned, but only barely sketched out.)

| On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
| huawei.com@...> wrote:

| Hi everyone,

|

| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores? It seems that the 32x128-bit
| register file is suitable for application class cores but it is very
| large for embedded cores, especially if the F registers also need to
| be implemented (which I think is the case, unless a Zfinx version is
| specified).

|

| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.

| https://en.wikichip.org/wiki/arm/helium

|

| What’s the approach here? Should embedded applications implement the
| P-extension instead?

|

| Tariq



Re: Smaller embedded version of the Vector extension

Krste Asanovic
 

> On Jun 3, 2021, at 9:16 AM, Guy Lemieux <guy.lemieux@...> wrote:
>
> What is the advantage to RVV requiring VLEN >= 128?
>
> I think this should be changed to VLEN >= 64 because:
>
> 1) VLEN = 64 is more likely for small implementations; creating a
> mandatory expectation to improve software portability

This is the requirement for app processors, which are not generally small cores.
Most competing SIMD extensions are at least 128b per vector register.

>
> 2) two implementations, each with VLEN >= 64, do not expose anything
> new to software that is not already exposed by VLEN >= 128
>
> 3) allowing VLEN =32 would expose something new to software (register
> file data layout when SEW=64)
>
> 4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
> >= 128)? (I can't see any)

Lower performance on codes that work well on other app architectures.

Krste

>
> Guy


On Wed, Jun 2, 2021 at 11:13 AM <krste@...> wrote:


The VLEN>=128 constraint is only for the application processor "V"
extension for the app profile - not for embedded vectors which can
have VLEN=32.

From spec Introduction:
'
The term base vector extension is used informally to describe the standard set of vector ISA components that will be required for the single-letter "V" extension, which is intended for use in standard server and application-processor platform profiles. The set of mandatory instructions and supported element widths will vary with the base ISA (RV32I, RV64I) as described below.

Other profiles, including embedded profiles, may choose to mandate only subsets of these extensions. The exact set of mandatory supported instructions for an implementation to be compliant with a given profile will only be determined when each profile spec is ratified. For convenience in defining subset profiles, vector instruction subsets are given ISA string names beginning with the "Zv" prefix.
'

There is a set of Zve* names for the embedded subsets (see github issue
#550).

A minimal embedded implementation using RV32E+Zfinx+vectors would be
the same state size as ARM MVE.

P extension does not have floating-point, but for short
integer/fixed-point SIMD makes sense as alternative.

The software fragmentation issue is that some library routines that
expose VLEN might not be portable between app cores and embedded
cores, but these are different software ecosystems (e.g. ABI/calling
convention might be different) and only a few kinds of routine rely on
VLEN.

For app cores that can afford VLEN>=128, the advantage is the removal
of stripmining code in cases that operate on fixed-size vectors.
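
As a rough illustration of the stripmining point, here is a small Python model (not RVV code). The helper names `vlmax` and `stripmined_add` are made up for this sketch, and taking `min(remaining, VLMAX)` is only a simplified stand-in for what `vsetvli` returns. It shows that a fixed-size vector of four 32-bit elements fits in one pass when VLEN=128 but needs a loop when VLEN=32.

```python
def vlmax(vlen_bits, sew_bits, lmul=1):
    """Elements per vector register group: VLMAX = LMUL * VLEN / SEW."""
    return (lmul * vlen_bits) // sew_bits


def stripmined_add(a, b, vlen_bits, sew_bits=32):
    """Add two equal-length lists in strips of at most VLMAX elements.

    Returns the result and the number of strips (loop iterations) used.
    """
    assert len(a) == len(b)
    out, strips, i = [], 0, 0
    n = len(a)
    while i < n:
        # Simplified model of vsetvli: take as many elements as fit.
        vl = min(n - i, vlmax(vlen_bits, sew_bits))
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
        strips += 1
    return out, strips


a, b = [1, 2, 3, 4], [10, 20, 30, 40]
res128, strips128 = stripmined_add(a, b, vlen_bits=128)  # one strip
res32, strips32 = stripmined_add(a, b, vlen_bits=32)     # four strips
```

With a guaranteed VLEN>=128, code for this fixed 4 x 32-bit case could drop the loop entirely, which is the saving described above.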

Krste



On Wed, 2 Jun 2021 05:10:32 -0700, "Guy Lemieux" <guy.lemieux@...> said:
| Allowing VLEN<128 would allow for smaller vector register files, but it would
| also result in a profile that is not forward-compatible with the V spec. This
| would produce another fracture in the software ecosystem.

| To avoid such a fracture, there are two choices:
| (1) go with P instead
| (2) relax the V spec to allow smaller implementations

| So the key question for this group is whether to relax the minimum VLEN to 32
| or 64?

| note: a possible justification for keeping 128 might be to recommend (1)
| instead. I don’t know anything about P, but it seems like it could be specified
| in a way that is competitive/comparable with Helium.

| Guy

| PS — I have started to design an “RVV-lite” profile which would be more
| amenable to embedded implementations. However, I have adopted a stance that it
| must remain forward compatible with the full V spec, so I have not considered
| VLEN below 128. I am happy to share my work on this and involve other
| contributors — email me if you would like to see a copy.

| On Wed, Jun 2, 2021 at 3:15 AM Andrew Waterman <andrew@...> wrote:

| The uppercase-V V extension is meant to cater to apps processors, where
| the VLEN >= 128 constraint is not inappropriate and is sometimes
| beneficial. But there's nothing fundamental about the ISA design that
| prohibits VLEN < 128. A minimal configuration is VLEN=ELEN=32, giving the
| same total amount of state as MVE. (And if you set LMUL=4, then you even
| get the same shape: 8 registers of 128 bits apiece.)

| Such a thing wouldn't be called V, but perhaps something like Zvmin.
| Other than agreeing on a feature set and assigning it a name, the
| architecting is already done.

| (If you search the spec for Zfinx, you'll see that a Zfinx variant is
| planned, but only barely sketched out.)

| On Wed, Jun 2, 2021 at 3:04 AM Tariq Kurd via lists.riscv.org <tariq.kurd=
| huawei.com@...> wrote:

| Hi everyone,

| Are there any plans for a cut-down configuration of the vector
| extension suitable for embedded cores? It seems that the 32x128-bit
| register file is suitable for application-class cores but is very
| large for embedded cores, especially if the F registers also need to
| be implemented (which I think is the case, unless a Zfinx version is
| specified).

| ARM MVE only has 8x128-bit registers for FP and Vector, so it is much
| more suitable for embedded applications.

| https://en.wikichip.org/wiki/arm/helium

| What’s the approach here? Should embedded applications implement the
| P-extension instead?

| Tariq

| Tariq Kurd
| Processor Design, RISC-V Cores, Bristol
| E-mail: Tariq.Kurd@...
| Company: Huawei technologies R&D (UK) Ltd
| Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK



Re: Smaller embedded version of the Vector extension

Guy Lemieux
 

What is the advantage to RVV requiring VLEN >= 128?

I think this should be changed to VLEN >= 64 because:

1) VLEN = 64 is more likely for small implementations; making it a
mandatory minimum would improve software portability

2) two implementations, each with VLEN >= 64, do not expose anything
new to software that is not already exposed by VLEN >= 128

3) allowing VLEN = 32 would expose something new to software (register
file data layout when SEW=64)

4) are there any disadvantages to VLEN >= 64 (versus the current VLEN
= 128)? (I can't see any)

Guy




Re: Smaller embedded version of the Vector extension

Krste Asanovic
 

see github issue #550
Krste

On Jun 3, 2021, at 2:02 AM, Shaofei (B) <shaofei1@...> wrote:

Hi, Krste:

 Does the RISC-V V TG plan to support a low-cost vector extension in the RVMxx profile?

 Best Regards
 Shaofei
 2021.6.3


Re: Smaller embedded version of the Vector extension

Tariq Kurd <tariq.kurd@...>
 

This is a good question.
So if the RVM22 profile requires VLEN=32, ELEN=64, LMUL=8 then the vector registers will have the same amount of state as ARM MVE.
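
A quick arithmetic check of the state-size comparison, as a hedged Python sketch. The helper names are invented for illustration, and the only inputs are the figures quoted in this thread (32 vector registers, VLEN=32, MVE with 8 x 128-bit registers). Total state depends only on VLEN and the register count; LMUL changes only how the registers are grouped.

```python
MVE_STATE_BITS = 8 * 128  # ARM MVE: 8 registers of 128 bits, as quoted in the thread


def vector_state_bits(vlen_bits, num_regs=32):
    """Total architectural vector register state in bits."""
    return vlen_bits * num_regs


def grouped_shape(vlen_bits, lmul, num_regs=32):
    """(number of register groups, bits per group) for a given integer LMUL."""
    return num_regs // lmul, vlen_bits * lmul


total = vector_state_bits(32)        # 1024 bits, equal to MVE_STATE_BITS
shape_lmul4 = grouped_shape(32, 4)   # 8 groups of 128 bits
shape_lmul8 = grouped_shape(32, 8)   # 4 groups of 256 bits
```

Under these assumptions the total matches MVE at any LMUL, while the 8 x 128-bit shape specifically corresponds to LMUL=4.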

Tariq

-----Original Message-----
From: Shaofei (B)
Sent: 03 June 2021 10:03
To: krste@...; Guy Lemieux <guy.lemieux@...>; Shaofei (B) <shaofei1@...>
Cc: Andrew Waterman <andrew@...>; Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

Hi, Krste:

Does the RISC-V V TG plan to support a low-cost vector extension in the RVMxx profile?

Best Regards
Shaofei
2021.6.3



Re: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

Shaofei (B)
 

Hi, Krste:

Does the RISC-V V TG plan to support a low-cost vector extension in the RVMxx profile?

Best Regards
Shaofei
2021.6.3



Re: Smaller embedded version of the Vector extension

Nick Knight
 

Hi Tony,

All of the vector permutation instructions can be simulated using the memory system. For example, vslide can be simulated by storing the vector register and loading it at an offset; vrgather can be simulated by an indexed store followed by a unit-stride load (or unit-stride store and indexed load); etc. Whether or not this is more efficient depends on details of the microarchitecture and particular workload.
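
To make the idea concrete, here is a minimal Python model in which a list stands in for a scratch memory buffer. The function names are hypothetical, the zero padding past the stored data is an assumption chosen to mirror a slide that runs off the end of the register, and `len(v)` stands in for VLMAX. It is a sketch of the technique, not a statement about any particular implementation.

```python
def slidedown_via_memory(v, offset):
    """Model vslidedown: unit-stride store, then unit-stride load at an offset."""
    vl = len(v)
    # Scratch buffer: the stored register followed by zero padding.
    mem = list(v) + [0] * offset
    return mem[offset:offset + vl]


def gather_via_memory(v, idx):
    """Model vrgather.vv: unit-stride store, then indexed load.

    Out-of-range indices read as 0 in this model.
    """
    mem = list(v)
    return [mem[i] if 0 <= i < len(mem) else 0 for i in idx]


slid = slidedown_via_memory([1, 2, 3, 4], 1)
reversed_v = gather_via_memory([10, 20, 30, 40], [3, 2, 1, 0])
```

The reverse permutation here uses the same index vector (vl-1 down to 0) that vid.v plus vrsub.vx would produce.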

Best,
Nick Knight


On Wed, Jun 2, 2021 at 1:35 PM Tony Cole via lists.riscv.org <tony.cole=huawei.com@...> wrote:

Hi Bruce,

 

Do you mean vrgather instead of vslide?

 

I use vrgather_vx_* and vslidedown to perform a vector element rotate (and other things), see:

 

        https://github.com/riscv/riscv-v-spec/issues/671#issuecomment-837035001

 

-        I use vrgather_vx_i64m8( vec, 0, vl ) to splat the scalar in element 0 of vec to all elements in the result, I just want it in the top element but there isn’t a better instruction for that.
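
The following Python model is only a guess at how a splat and a slide could combine into a rotate; it is not the code from the linked issue. The function names mirror the instruction mnemonics but are plain list operations, and the final per-element select stands in for a masked merge.

```python
def vrgather_vx(vec, index):
    """result[i] = vec[index] for every i (0 if index is out of range)."""
    value = vec[index] if 0 <= index < len(vec) else 0
    return [value] * len(vec)


def vslidedown(vec, offset):
    """result[i] = vec[i + offset], or 0 where i + offset runs past the end."""
    n = len(vec)
    return [vec[i + offset] if i + offset < n else 0 for i in range(n)]


def rotate_down_by_one(vec):
    """Rotate elements toward index 0; the old element 0 lands in the top slot."""
    splat = vrgather_vx(vec, 0)   # element 0 copied to every position
    slid = vslidedown(vec, 1)     # everything else moves down by one
    top = len(vec) - 1
    # Stand-in for a masked merge that takes the splat only in the top element.
    return [splat[i] if i == top else slid[i] for i in range(len(vec))]


rotated = rotate_down_by_one([1, 2, 3, 4])
```

Only the top element of the splat is used, which is why the message notes that a full splat does more work than strictly needed.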

 

I think you are referring to: vrgather_vv_*  ??

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Tony Cole via lists.riscv.org
Sent: 02 June 2021 18:13
To: Bruce Hoult <bruce@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

Hi Bruce,

 

“I am not a fan of the vslide instructions. It seems they expose the size of the vector registers in a very unfortunate way. In particular they break down if VLEN=1. Most code would be better off storing and loading with an offset.”

I don't see what you mean. Please can you elaborate, with examples, on why/how it exposes the size of the vector register in a very unfortunate way and breaks down if VLEN=1 (do you mean LMUL=1??)?

The vslide instruction speeds up my code a lot as it reduces reloading (mostly the same) data over and over again.

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 13:34
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

I am not a fan of the vslide instructions. It seems they expose the size of the vector registers in a very unfortunate way. In particular they break down if VLEN=1. Most code would be better off storing and loading with an offset.

 

I think I saw somewhere they are largely intended for debuggers.

 

On Thu, Jun 3, 2021 at 12:15 AM Tony Cole <tony.cole@...> wrote:

So, (on a 32x 32-bit vector register machine) the widening and narrowing instructions can use 64-bit elements (for destination and source respectively), but not any of the other instructions, correct?

 

Note: I use many instructions while processing 64-bit “wide” and “quad” elements, e.g. vrgather_vx_i64m8, vslide1down_vx_i64m4, vslidedown_vx_i64m8, vredsum_vs_i64m8, etc.

 

Therefore, this code would not work on a 32x 32-bit vector register machine.

 

 

Tony

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 12:18
To: Tony Cole <tony.cole@...>
Cc: Tariq Kurd <tariq.kurd@...>; tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

Note that the effective LMUL is limited to 8, the same as the actual LMUL, so if you've set e32m4 (32 bit elements with LMUL=4) then you can only widen to 64 bit results, not 128 bit. 

 

On Wed, Jun 2, 2021 at 11:15 PM Bruce Hoult <bruce@...> wrote:

Yes. The Selected Element Width (SEW) would be limited to 32 bits, but the widening multiplies and accumulates produce the same number of wider results using multiple registers (higher effective LMUL).

 

See section 5.2. Vector Operands

 

Each vector operand has an effective element width (EEW) and an effective LMUL (EMUL) that is used to determine the size and location of all the elements within a vector register group. By default, for most operands of most instructions, EEW=SEW and EMUL=LMUL.


Some vector instructions have source and destination vector operands with the same number of elements but different widths, so that EEW and EMUL differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL. For example, most widening arithmetic instructions have a source group with EEW=SEW and EMUL=LMUL but destination group with EEW=2*SEW and EMUL=2*LMUL. Narrowing instructions have a source operand that has EEW=2*SEW and EMUL=2*LMUL but destination where EEW=SEW and EMUL=LMUL.

Vector operands or results may occupy one or more vector registers depending on EMUL, but are always specified using the lowest-numbered vector register in the group. Using other than the lowest-numbered vector register to specify a vector register group is a reserved encoding.
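The quoted rules can be checked with a small sketch. It assumes the element count of a register group is VLEN*EMUL/EEW, and that "lowest-numbered register in the group" means the register number must be a multiple of the (integer) EMUL. The helper names and the VLEN=128 example are illustrative, not taken from the spec.

```python
from fractions import Fraction


def elements_per_group(vlen, eew, emul):
    """Number of elements held by a register group: VLEN * EMUL / EEW."""
    return int(vlen * Fraction(emul) / eew)


def is_legal_group_base(reg, emul):
    """A group is named by its lowest-numbered register, so the register
    number must be a multiple of the group size. Fractional EMUL still
    occupies a single register."""
    regs_in_group = max(1, int(Fraction(emul)))
    return reg % regs_in_group == 0


vlen, sew, lmul = 128, 16, 2          # assumed example configuration
src = elements_per_group(vlen, sew, lmul)          # EEW=SEW,   EMUL=LMUL
dst = elements_per_group(vlen, 2 * sew, 2 * lmul)  # EEW=2*SEW, EMUL=2*LMUL

print(src, dst)                    # same element count on both sides
print(is_legal_group_base(4, 4))   # v4 can name an EMUL=4 group
print(is_legal_group_base(6, 4))   # v6 cannot: reserved encoding
```

Because EEW and EMUL scale together, the source and destination of a widening or narrowing instruction always hold the same number of elements.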

 

 

 

On Wed, Jun 2, 2021 at 11:11 PM Tony Cole <tony.cole@...> wrote:

Having 32x 32 bit registers with LMUL=4, giving 8x 128 bits - does this allow for 64-bit elements?

I don't think it does, but it’s not clear in the spec.

 

I use 64-bit elements for “wide” and “quad” accumulators.

 

 

From: tech-vector-ext@... [mailto:tech-vector-ext@...] On Behalf Of Bruce Hoult
Sent: 02 June 2021 11:19
To: Tariq Kurd <tariq.kurd@...>
Cc: tech-vector-ext@...; Shaofei (B) <shaofei1@...>
Subject: Re: [RISC-V] [tech-vector-ext] Smaller embedded version of the Vector extension

 

There is nothing to prevent implementing 32x 32 bit registers on a 32 bit CPU. The application processor spec has quite recently (a few months ago) specified a 128 bit minimum register size, but I don't think there's any good reason for this, especially in embedded.

 

With that configuration, LMUL=4 gives 8x 128 bits, the same as MVE.
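The arithmetic behind that comparison, as a quick sketch (the function name `register_groups` is hypothetical):

```python
def register_groups(num_regs, vlen_bits, lmul):
    """Return (number of register groups, bits per group) for an integer LMUL."""
    return num_regs // lmul, vlen_bits * lmul


groups, bits = register_groups(num_regs=32, vlen_bits=32, lmul=4)
print(groups, bits)     # 8 groups of 128 bits
print(groups * bits)    # total vector state in bits
```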

 

If floating point is desired then Zfinx is available, sharing int & fp scalar registers instead of fp and vector registers.

 

Of course profiles (or just custom chips for custom applications) can define subsets of instructions.

 

On Wed, Jun 2, 2021 at 10:05 PM Tariq Kurd via lists.riscv.org <tariq.kurd=huawei.com@...> wrote:

Hi everyone,

 

Are there any plans for a cut-down configuration of the vector extension suitable for embedded cores? It seems that the 32x128-bit register file is suitable for application class cores but is very large for embedded cores, especially if the F registers also need to be implemented (which I think is the case, unless a Zfinx version is specified).

 

ARM MVE only has 8x128-bit registers for FP and Vector, so it is much more suitable for embedded applications.

https://en.wikichip.org/wiki/arm/helium

 

What’s the approach here? Should embedded applications implement the P-extension instead?

 

Tariq

 

Tariq Kurd

Processor Design I RISC-V Cores, Bristol

E-mail: Tariq.Kurd@...

Company: Huawei technologies R&D (UK) Ltd I Address: 290 Park Avenue, Aztec West, Almondsbury, Bristol, Avon, BS32 4TR, UK      

 

http://www.huawei.com

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!


 
