poll on vstart management issues #493, #510 and #532
attached is what we did at Convex and it worked quite well, both in the context of compiler-generated code for stencils and for runtime routines like convolution and correlation.
i am not sure this answers the questions you posed.
hope this helps
Vector first register - C4600
The vector register set of the C4600 Series CPUs contains an additional vector register called the vector first register (VF).
VF specifies the first element of vector register Vi, Vj or Vk accessed by a vector instruction, provided that the MSB of the corresponding 5-bit register select field of the instruction is set. VF cannot be applied to operations on VM.
VF is seven bits in length and may contain a value between 0 and 127. If the value of VF plus the value of VL is greater than 128, the effective value of VL for vector instructions that use VF is 128 minus VF. This effective VL value determines the number of results written to a vector register or VM, or the number of elements stored to memory.
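The clamping rule above can be sketched in Python as a reading aid. This is a minimal sketch, not Convex code: the function name effective_vl and the assumption that VL ranges over 0..128 are illustrative only; the 128-element register length and the 7-bit VF range come from the text.

```python
REG_ELEMS = 128  # elements per vector register, per the text above


def effective_vl(vf: int, vl: int) -> int:
    """Element count used by an instruction that applies VF (hypothetical helper)."""
    if not 0 <= vf <= 127:
        raise ValueError("VF is a 7-bit value (0..127)")
    if not 0 <= vl <= REG_ELEMS:
        # Assumption: VL itself never exceeds the register length.
        raise ValueError("VL is assumed to lie in 0..128")
    # If VF + VL runs past the end of the register, clamp to what is left.
    if vf + vl > REG_ELEMS:
        return REG_ELEMS - vf
    return vl


print(effective_vl(100, 64))  # 28: only elements 100..127 remain
print(effective_vl(10, 20))   # 20: no clamping needed
```

The rule is equivalent to min(VL, 128 - VF).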
If the value of VF plus Sj is greater than 127 in the mov
If Vi or Vj of an instruction specifies the same register as Vk of the instruction, and VF is applied to Vk, and VL is greater than VF, then elements of the shared register may be written (as Vk) before they are read (as Vi or Vj), depending on the hardware implementation. In this case, the result in Vk is architecturally undefined. The instruction merg.x Vi,Vj,Vk has the same behavior if Vi or Vj is the same as Vk.
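To see why VL greater than VF is the dangerous case, here is a small Python model of an in-place element move. Everything in it is an assumption for illustration: an 8-element register instead of 128, a plain copy as the operation, VF applied only to the destination, and strictly sequential element order as one possible hardware behavior.

```python
def move_with_vf_on_dest(reg, vf, vl, read_first):
    """Copy elements 0..count-1 of reg into elements vf..vf+count-1 of the same register.

    read_first=True models hardware that reads all sources before writing;
    read_first=False models element-by-element execution on the live register.
    """
    count = min(vl, len(reg) - vf)            # effective VL, clamped to the register
    out = list(reg)
    src = list(reg) if read_first else out    # snapshot versus live register
    for i in range(count):
        out[vf + i] = src[i]                  # write element vf+i from source element i
    return out


reg = list(range(8))

# VL (4) > VF (2): source elements 2 and 3 are overwritten before they are read.
print(move_with_vf_on_dest(reg, 2, 4, read_first=True))   # [0, 1, 0, 1, 2, 3, 6, 7]
print(move_with_vf_on_dest(reg, 2, 4, read_first=False))  # [0, 1, 0, 1, 0, 1, 6, 7]

# VL (4) == VF (4): reads and writes do not overlap, so both models agree.
print(move_with_vf_on_dest(reg, 4, 4, read_first=True))   # [0, 1, 2, 3, 0, 1, 2, 3]
```

The two models disagree exactly when a source element index reaches VF before the loop ends, which is why the result is left architecturally undefined rather than pinned to either behavior.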
Ahead of the vector meeting I would like to see if we can address, or at least get direction on, some of the issues flagged for pre-v1.0 resolution.
There are 3 related flagged issues that all deal with vstart.
#493 - unbind vstart from element index
These have in common redefining vstart.
#493 proposes that the vstart value not have a one-to-one mapping from its value to the usual vl ordering. This was motivated in part by SLEN considerations that are now invisible to the ISA architecture.
However, included is the consideration that some operations, e.g. cryptography, may desire restart part of the way through the operation,
even if only one element is contained in the vlen*8 register set.
Clearly, insufficient internal state is expressed if only 0 and 1 are allowed values.
#510 proposes the element position within a segmented load/store is identified in vstart, not just the group position. **Most of the discussion relates to which should be identified, element position or group position.
POR is group position and it was substantially defended as the incumbent definition.
The POR substantially limits an implementation's options, intended for the greater good.
Thus the question of where and how the additional "element within group" information should be stored did not progress.
However, even if segmented load/stores will settle on restart granularity of groups or elements,
the larger question of alternate representation of restart information for "special" circumstances has been raised,
as it has in #493.
#532 proposes that vstart be defined as a value that will be treated as opaque at the ISA level.
No intrinsic meaning should be inferred directly from the value in vstart.
It is assured only to be a value used to instruct the implementation to restart the instruction from either a) the exception element or b) by adding 1 to the vstart value, the next element.
The proposal allows for implementations to provide a mechanism to provide additional trap information, including related element at exception.
Escape mechanisms to convey that the index is simply
embedded in the rightmost bits of vstart is discussed as a
to the expectation that plain text identification of the element "active" at the time of the exception is valuable information and sufficient in most cases for restart handling.
My request for the meeting is to poll the group to respond agree, disagree or abstain on the following.
1) the POR [vstart is the plain text number indicating the next element at which to resume] is sufficient for v1.0.
Any augmenting of vstart or adding new facilities [e.g. csr] can be addressed later.
The ecosystem changes can also be made later as we expect the changes required to be specific to new functionality (crypto, ediv, etc.).
2) we identify specific special cases that we believe are desired by the ecosystem [element within segment group, phased restart of crypto ops, etc.]
and make specific allowance for supporting fields in vstart.
Low-order bits will still identify the item [element, segment, crypto term, etc.] specific to the instruction.
Other fields will be populated according to the nature of the operation and the element type.
3) vstart be considered opaque as above with the escape mechanism that then reduces to the current POR.
The ecosystem will need to handle the general case in which the element of exception must be determined by other means than low bits in vstart.
Exception handlers can no longer resume at an arbitrary element in the instruction and have a reasonable expectation that the restart will work as expected.
4) opaque vstart without an escape signature in vstart. In all cases an alternate mechanism will be required to identify the element "of exception".
5) further investigation is still required before these decisions can be made.
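To make option 2 concrete, here is a minimal Python sketch of a vstart value split into fields. Nothing here is proposed in #493, #510 or #532: the 16-bit index width, the single auxiliary field, and the names pack_vstart and unpack_vstart are all hypothetical, chosen only to show how a low-order item index and an operation-specific field could coexist.

```python
INDEX_BITS = 16                      # hypothetical width of the low-order item index
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack_vstart(item_index: int, aux: int = 0) -> int:
    """Build a vstart value: low bits hold the item, upper bits hold extra restart state."""
    if not 0 <= item_index <= INDEX_MASK:
        raise ValueError("item index does not fit in the low-order field")
    if aux < 0:
        raise ValueError("auxiliary field must be non-negative")
    return (aux << INDEX_BITS) | item_index


def unpack_vstart(vstart: int):
    """Split a vstart value back into (item_index, aux)."""
    return vstart & INDEX_MASK, vstart >> INDEX_BITS


# Segment group 5, element 2 within the group (aux meaning is operation-specific).
v = pack_vstart(5, aux=2)
print(unpack_vstart(v))      # (5, 2)

# With aux == 0 the value is just the plain index, as in option 1 (the POR).
print(pack_vstart(7))        # 7
```

Under this layout a handler that only understands the POR still reads a correct index whenever the upper field is zero, which is the compatibility property option 2 relies on.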
** (and presumably any future segmented operation)