Re: Vector element groups

Abel Bernabeu


Yes, I double-checked and you are right. Padding OpenCL 3-component vectors to a power-of-two (POT) size is fine.

In terms of the instructions affected by the element group semantics, I see three cases:
- dot (this was already taken into account in Zvediv)
- workgroup reduction operations
- shuffles, which RVV calls vector register gather (vrgather)

The impact of element groups is different depending on the case.

For dot, the semantics would define the work-item as the scope of the operation.
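
To make the per-work-item scope concrete, here is a minimal Python sketch. It assumes each element group holds one work-item's vector, so one result is produced per group. The name `group_dot` and the parameter `egs` (element group size) are hypothetical and not taken from any spec.

```python
def group_dot(a, b, egs):
    """Dot product computed independently inside each element group.

    Assumption: one element group == one work-item's vector, so the
    result has one scalar per group.
    """
    if len(a) != len(b) or len(a) % egs != 0:
        raise ValueError("vector length must be a multiple of the element group size")
    return [
        sum(x * y for x, y in zip(a[i:i + egs], b[i:i + egs]))
        for i in range(0, len(a), egs)
    ]


# Two work-items, each holding a 4-component vector.
result = group_dot([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 2, 2, 2, 2], egs=4)
```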

For workgroup reduction operations, the element groups are the leaves of a binary tree of work-items. The element group width defines the number of components of the vectors on the leaves.
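
A rough Python model of that tree, under the assumption that the reduction combines neighbouring groups component by component until one group remains. The function name `workgroup_reduce` and the pairing order are illustrative choices only. A real instruction could pair the groups differently.

```python
def workgroup_reduce(v, egs, op=lambda x, y: x + y):
    """Reduce a vector of element groups with a binary tree.

    Leaves are the element groups (one per work-item); each tree node
    combines two groups component-wise, so the result is one group wide.
    """
    if egs <= 0 or len(v) == 0 or len(v) % egs != 0:
        raise ValueError("length must be a non-zero multiple of the element group size")
    level = [v[i:i + egs] for i in range(0, len(v), egs)]
    while len(level) > 1:
        nxt = [
            [op(x, y) for x, y in zip(level[i], level[i + 1])]
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:          # odd group out is carried to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Four work-items holding 3-vectors padded to 4 components.
total = workgroup_reduce(
    [1, 2, 3, 0, 10, 20, 30, 0, 100, 200, 300, 0, 1000, 2000, 3000, 0], egs=4
)
```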

For shuffles (vrgather), the elements are only reordered within their own element group, like in this proposal for a graphics swizzle:

One good thing regarding shuffling in element groups is that there would be no need to change existing vrgather implementations. Only new, simple instructions that generate the index pattern, taking into account the number of elements per group, would be needed (similarly to the swizzle example above).
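
As a sketch of what such an index-generating instruction could compute, the Python below expands a group-local swizzle pattern into absolute indices and feeds them to a plain gather. `group_indices` is a hypothetical helper, and `vrgather` here is only a simplified software model of the RVV instruction (out-of-range indices return 0).

```python
def group_indices(pattern, vl):
    """Expand a group-local pattern into absolute gather indices."""
    egs = len(pattern)
    if egs == 0 or vl % egs != 0:
        raise ValueError("vl must be a multiple of the element group size")
    if any(p < 0 or p >= egs for p in pattern):
        raise ValueError("pattern entries must stay inside the group")
    # Base of the group that element i belongs to, plus the local offset.
    return [(i // egs) * egs + pattern[i % egs] for i in range(vl)]


def vrgather(src, idx):
    """Simplified model of vrgather.vv: out-of-range indices yield 0."""
    return [src[j] if j < len(src) else 0 for j in idx]


# Swizzle .zyxw applied to two 4-element groups.
idx = group_indices([2, 1, 0, 3], vl=8)
out = vrgather(list("xyzwXYZW"), idx)
```

Because the pattern entries are restricted to the group, the generated indices never cross a group boundary, which is why the gather itself can stay unchanged.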

My question is, would you define the element group infrastructure as an extension (more or less what you have) and later add the use cases in additional extensions? That would be four extensions:

- basic CSRs for element groups
- element groups aware dot
- element groups aware workgroup reductions
- element groups aware shuffles

I kind of like it like that, in four extensions, because the list of identified cases may grow during the discussion and you still want to release something within a reasonable time.


On Mon, Aug 29, 2022 at 1:34 AM <krste@...> wrote:

>>>>> On Fri, 26 Aug 2022 13:58:28 +0200, Abel Bernabeu <abel.bernabeu@...> said:

| Krste,
| Sorry it took me a long time to provide feedback.

| Yes, this is the kind of feature we could need for graphics and GPGPU-style SIMT. Many thanks for taking the
| time to think about how the idea behind Zvediv can be introduced.

| This is the kind of concept that is needed for designing with vectors for things that are typically designed
| with warps.

| One comment I have is that groups of 3 elements are not a power of two and turn out to be:
| - popular for graphics
| - demanded by OpenCL as well

- and a real pain to handle as an element group.

I tried looking at ways to incorporate non-POT group sizes, but they
just introduce too many corner cases that implementations will not
want to handle.

Handling them as four-element groups seems fairly common in other
graphics-oriented programmable hardware, at least old packed-SIMD
ISAs.  I took a quick look and OpenCL even specifies that the 3-vectors
are aligned on 4-element boundaries in memory, so that would work fine
with the element group model.
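
A small Python sketch of that padding, assuming the unused fourth lane is filled with zero (a choice that keeps sums and dot products unaffected). The helper name `pad_to_pot_groups` and the `fill` parameter are hypothetical.

```python
def pad_to_pot_groups(vectors, fill=0.0):
    """Lay out n-component vectors as power-of-two element groups.

    Returns the flat element list and the element group size used.
    """
    n = len(vectors[0])
    egs = 1 << (n - 1).bit_length()      # next power of two >= n
    flat = []
    for v in vectors:
        if len(v) != n:
            raise ValueError("all vectors must have the same number of components")
        flat.extend(list(v) + [fill] * (egs - n))
    return flat, egs


# Two 3-vectors become two 4-element groups.
flat, egs = pad_to_pot_groups([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```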

Of course, the 3-element vectors can instead be handled as 3-field
segments, loading the three components into separate vector registers.
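
For comparison, a Python model of the segment approach, roughly what a 3-field segment load does: tightly packed xyz triples in memory are split into one register per component. `load_seg3` is an illustrative name, not an intrinsic.

```python
def load_seg3(memory, vl):
    """Split vl packed 3-field segments into three component vectors."""
    if len(memory) < 3 * vl:
        raise ValueError("not enough elements in memory for vl segments")
    x = memory[0:3 * vl:3]
    y = memory[1:3 * vl:3]
    z = memory[2:3 * vl:3]
    return x, y, z


# Two packed xyz triples, no padding lane needed.
x, y, z = load_seg3([1, 2, 3, 4, 5, 6], vl=2)
```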


| Is there anything we can do from the graphics SIG to help drive this work?

| Regards.

| On Fri, Jul 22, 2022 at 10:36 AM Krste Asanovic <krste@...> wrote:

|||||| On Fri, 15 Jul 2022 09:10:49 -0700, Earl Killian <earl.killian@...> said:
|     | While I share some concern about the cited language, as this is a concept, and not a spec, I think the time to require checking
|     | would be when individual specs implement the concept. I would think it would require some pretty good justification to not have
|     | an exception.

|     On further thought, I do think it makes sense to require raising of an
|     illegal instruction exception when vl is not a multiple of element
|     group size rather than leaving reserved.  Will be updating the doc
|     with rationale.

|     Krste
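
The check described in the quoted message could be modelled as below. This is only a sketch: `IllegalInstruction` and `check_vl` are hypothetical names, and the exception class merely stands in for the trap a hart would take.

```python
class IllegalInstruction(Exception):
    """Stand-in for an illegal instruction exception."""


def check_vl(vl, egs):
    """Return the number of whole element groups, or trap if vl is ragged."""
    if egs <= 0:
        raise ValueError("element group size must be positive")
    if vl % egs != 0:
        raise IllegalInstruction(f"vl={vl} is not a multiple of element group size {egs}")
    return vl // egs


groups = check_vl(8, 4)
```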

