Re: On Vector Register Layout
I will make a stab at an even and odd layout for widening.
5) Two versions of the widening ops are defined, one for even and one for odd.
The registers are divided into even:odd pairs.
Two versions of the
By David Horner · #227

VFRECIP/VFRSQRT instructions
The task group has recommended moving forward with adding instructions that estimate reciprocals and reciprocal square roots. These are both useful for -ffast-math code where it's acceptable to
By andrew@... · #226

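A common way to consume such estimates in -ffast-math code is to refine them with one or two Newton-Raphson steps. The C sketch below is only an illustration: the two estimate functions are hypothetical stand-ins (1/a truncated to 7 mantissa bits, and the well-known 0x5f3759df bit trick for 1/sqrt(a)), not the precision or behaviour of the proposed VFRECIP/VFRSQRT instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a reciprocal estimate: the exact 1/a with all
   but the top 7 mantissa bits cleared (relative error below 2^-7). */
static float recip_estimate(float a) {
    assert(a != 0.0f);
    float x = 1.0f / a;
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    u &= ~(uint32_t)0xFFFF;          /* keep 7 of the 23 mantissa bits */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Hypothetical stand-in for a reciprocal square root estimate: the
   well-known 0x5f3759df bit trick (a few percent relative error). */
static float rsqrt_estimate(float a) {
    assert(a > 0.0f);
    uint32_t u;
    memcpy(&u, &a, sizeof u);
    u = 0x5f3759dfu - (u >> 1);
    memcpy(&a, &u, sizeof a);
    return a;
}

/* One Newton-Raphson step for 1/a: x' = x * (2 - a*x). */
static float recip_refine(float a, float x) {
    return x * (2.0f - a * x);
}

/* One Newton-Raphson step for 1/sqrt(a): y' = y * (1.5 - 0.5*a*y*y). */
static float rsqrt_refine(float a, float y) {
    return y * (1.5f - 0.5f * a * y * y);
}
```

Each step roughly squares the relative error, so the 7-bit reciprocal estimate reaches about single-precision accuracy after two steps. How many steps a compiler would actually emit depends on the estimate precision the task group settles on.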
Re: Whole Register Loads and Stores
| Hi
Hi Kito,
|| 3) Spill code inside loop
||
|| This is the most problematic case. I wonder how often the
|| compiler does not know the type and length of the values to be
|| restored. I
By Krste Asanovic · #225

Re: Whole Register Loads and Stores
Hi
Some points from a compiler developer's view: we've implemented spill
code generation with whole register loads/stores in GCC.
The compiler (GCC) knows the type when spilling a register, but the length (AVL) is
unknown
By Kito Cheng · #224

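Kito's point, that the compiler knows the type but not the AVL at a spill, can be illustrated with a toy C model. The register size (256 bits), the SEW (32), and all helper names are arbitrary assumptions for illustration. The model only shows that a spill/restore limited to the vl in force at the spill point can lose elements that a whole-register spill keeps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model: one vector register of VLEN_BYTES bytes holding 32-bit elements. */
#define VLEN_BYTES 32
#define SEW_BYTES  4
#define VLMAX (VLEN_BYTES / SEW_BYTES)

typedef struct { uint8_t bytes[VLEN_BYTES]; } vreg;

/* Unit-stride store/load: touch only the first vl elements. */
static void unit_stride_store(uint8_t *mem, const vreg *v, size_t vl) {
    assert(vl <= VLMAX);
    memcpy(mem, v->bytes, vl * SEW_BYTES);
}
static void unit_stride_load(vreg *v, const uint8_t *mem, size_t vl) {
    assert(vl <= VLMAX);
    memcpy(v->bytes, mem, vl * SEW_BYTES);
}

/* Whole-register store/load: always move all VLEN bits, independent of vl. */
static void whole_reg_store(uint8_t *mem, const vreg *v) {
    memcpy(mem, v->bytes, VLEN_BYTES);
}
static void whole_reg_load(vreg *v, const uint8_t *mem) {
    memcpy(v->bytes, mem, VLEN_BYTES);
}

/* Spill a value that was computed with vl = VLMAX, clobber the register,
   restore it, and report whether the original contents came back intact. */
static int spill_restore_preserves(int use_whole_reg, size_t vl_at_spill) {
    vreg v, original;
    uint8_t slot[VLEN_BYTES] = {0};
    for (size_t i = 0; i < VLEN_BYTES; i++)
        v.bytes[i] = (uint8_t)(i + 1);
    original = v;

    if (use_whole_reg) whole_reg_store(slot, &v);
    else unit_stride_store(slot, &v, vl_at_spill);

    memset(v.bytes, 0xAA, VLEN_BYTES);   /* register reused by other code */

    if (use_whole_reg) whole_reg_load(&v, slot);
    else unit_stride_load(&v, slot, vl_at_spill);

    return memcmp(v.bytes, original.bytes, VLEN_BYTES) == 0;
}
```

For example, `spill_restore_preserves(0, 3)` returns 0 because elements 3 to 7 are lost, while `spill_restore_preserves(1, 3)` returns 1.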
Re: Whole Register Loads and Stores
It's a lesser issue, as you said, but the millicode case might want a
single store and single load of whole registers that expects prediction
in addition to the single whole register store and
By Bill Huffman · #223

Re: Whole Register Loads and Stores
I agree it’s difficult to find an alternative, and I am OK with having this as an architected hint.
Dropping SLEN completely is a major win.
Krste
By Krste Asanovic · #222

Re: Whole Register Loads and Stores
I've thought about this as a solution and I don't believe it is enough.
This will require an extra pair of vsetvli instructions around many
restores of a spilled register. I think that's too
By Bill Huffman · #221

Re: Whole Register Loads and Stores
The more I think through the options, the more I'm convinced we have
to support SLEN=VLEN, at least as an extension if not in all cases,
primarily for software.
Working through the design challenges
By Krste Asanovic · #220

Re: On Vector Register Layout
Nick,
The issue is that in a wide SIMD datapath, the microarchitecture is
going to want bits to be spread across physical datapath bits
differently depending on SEW, though software's view of where
By Krste Asanovic · #219

Re: Whole Register Loads and Stores
If the SLEN=VLEN layout is in force, then whole vector register
load/stores don't need to be specified as using SEW=8. They can use
the current SEW from vtype; this will reduce, though not
By Krste Asanovic · #218

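To make the SLEN discussion concrete, here is a small C model. It assumes the striped layout of the then-current draft, in which, when SLEN < VLEN, the elements of a register are dealt round-robin across VLEN/SLEN sections (LMUL=1 only); the function names are hypothetical. It shows that under that layout the memory image of a whole-register store depends on the SEW used to perform it, whereas with SLEN=VLEN it does not, which is why the SLEN=VLEN layout would let whole-register loads/stores use the current SEW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset, within the physical register, of the first byte of element i,
   assuming elements are dealt round-robin across VLEN/SLEN sections.
   All sizes are in bits. */
static size_t elem_offset(size_t i, size_t sew, size_t vlen, size_t slen) {
    assert(slen <= vlen && vlen % slen == 0 && slen % sew == 0);
    size_t nsec = vlen / slen;
    size_t section = i % nsec;
    size_t slot = i / nsec;
    return section * (slen / 8) + slot * (sew / 8);
}

/* Memory image of a whole-register store performed with element width sew:
   element i of the register goes to mem[i*sew/8 ...]. */
static void store_image(uint8_t *mem, const uint8_t *reg, size_t sew,
                        size_t vlen, size_t slen) {
    size_t eb = sew / 8;
    for (size_t i = 0; i < vlen / sew; i++)
        memcpy(mem + i * eb, reg + elem_offset(i, sew, vlen, slen), eb);
}

/* Write elements 0..n-1 (value = index) with SEW=32, then compare the images
   of whole-register stores done with SEW=32 and with SEW=8. */
static int images_match(size_t vlen, size_t slen) {
    uint8_t reg[64] = {0}, m32[64], m8[64];
    assert(vlen / 8 <= sizeof reg);
    for (uint32_t i = 0; i < vlen / 32; i++)
        memcpy(reg + elem_offset(i, 32, vlen, slen), &i, 4);
    store_image(m32, reg, 32, vlen, slen);
    store_image(m8, reg, 8, vlen, slen);
    return memcmp(m32, m8, vlen / 8) == 0;
}
```

With VLEN=256 and SLEN=128, `elem_offset` places SEW=32 elements 0 to 7 at byte offsets 0, 16, 4, 20, 8, 24, 12, 28, so `images_match(256, 128)` is 0, while `images_match(256, 256)` is 1.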
Vector TG meeting minutes for 2020/6/19
Attached below. Also a reminder that we'll be meeting again on Friday per
the group's calendar info.
Krste
Date: 2020/6/19
Task Group: Vector Extension
Chair: Krste Asanovic
Co-Chair: Roger Espasa
Number
By Krste Asanovic · #217

Minutes of 2020/6/12 vector TG meeting
We agreed to meet again today (Friday) in the usual slot; see the member
calendar for details.
Krste
Date: 2020/6/12
Task Group: Vector Extension
Chair: Krste Asanovic
Co-Chair: Roger Espasa
Number of
By Krste Asanovic · #216

Re: On Vector Register Layout
These decisions are not made independently.
For example, removing expanding loads led to the fractional register mode.
I believe there are other considerations that affect a definitive
By David Horner · #215

Re: Whole Register Loads and Stores
Hi Andrew,
I've been thinking about this some more. It seems to me there's value in pursuing both element-sized whole vector loads as well as predictors. Taking the cases that seem to matter here,
By Bill Huffman · #214

Re: Fault-Only-First Indexed Loads Instructions
This instruction would enable U-mode software to probe any page in memory to determine whether it was currently mapped or not, without giving the OS any opportunity to intervene. Currently it is only
By Jonathan Behrens <behrensj@...> · #213

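The probing concern can be modelled in a few lines of C. This sketch assumes the indexed variant would follow the same rule as the existing unit-stride fault-only-first loads (a fault on element 0 traps; a fault on a later element only trims vl). The page table, page size, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory model: each of NPAGES pages is mapped or not. */
#define PAGE_SHIFT 12
#define NPAGES 16
static bool page_mapped[NPAGES];

static bool addr_mapped(uintptr_t addr) {
    uintptr_t page = addr >> PAGE_SHIFT;
    return page < NPAGES && page_mapped[page];
}

/* Model of a fault-only-first indexed (gather) load. Returns the new vl, or
   -1 to stand for "trap to the OS". Data movement is omitted; only the vl
   behaviour matters here. */
static long gather_ff_vl(uintptr_t base, const uintptr_t *offsets, size_t vl) {
    for (size_t i = 0; i < vl; i++) {
        if (!addr_mapped(base + offsets[i]))
            return i == 0 ? -1 : (long)i;
    }
    return (long)vl;
}

/* The probe: element 0 targets a page known to be mapped, element 1 targets
   the page under test. The resulting vl reveals whether the target is
   mapped, and the OS never sees a trap. */
static bool probe_is_mapped(uintptr_t known_good, uintptr_t target) {
    uintptr_t offsets[2] = {known_good, target};
    long vl = gather_ff_vl(0, offsets, 2);
    assert(vl != -1);   /* element 0 is mapped, so no trap */
    return vl == 2;
}
```

With only page 1 mapped, `probe_is_mapped(0x1000, 0x5000)` returns false without any trap; because the second element is an arbitrary index, any address can be tested this way.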
Re: Fault-Only-First Indexed Loads Instructions
while (True) {
   if (unLo > unHi) break;
   n = ((Int32)block[ptr[unLo]+d]) - med;
   if (n == 0) {
      mswap(ptr[unLo], ptr[ltLo]);
      ltLo++;
By lidawei14@... · #212

Duplicate Counting Instruction
Hi all,
For certain cases, such as histograms, there may be duplicate runtime
memory dependences, and the current V extension may fail to vectorize such
cases. Therefore, I would like to propose
By lidawei14@... · #211

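The duplicate-dependence problem can be seen in a scalar C emulation of strip-mined histogram code. `STRIP` (8) stands in for vl, and `hist_dup_count` uses a per-lane duplicate count as one possible shape for such an operation. This is a hypothetical illustration, not the proposed instruction, whose definition is cut off above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STRIP 8   /* hypothetical vector length for one strip */

/* Naive vectorization of hist[idx[i]]++: gather all lanes, add 1, scatter.
   Lanes sharing an index all read the same old value, so only one of their
   increments survives. */
static void hist_naive(uint32_t *hist, const uint32_t *idx, size_t n) {
    for (size_t base = 0; base < n; base += STRIP) {
        size_t vl = n - base < STRIP ? n - base : STRIP;
        uint32_t tmp[STRIP];
        assert(vl <= STRIP);
        for (size_t l = 0; l < vl; l++) tmp[l] = hist[idx[base + l]] + 1; /* gather + add */
        for (size_t l = 0; l < vl; l++) hist[idx[base + l]] = tmp[l];     /* scatter */
    }
}

/* Same loop with a per-lane duplicate count: cnt[l] is how many lanes in the
   strip share lane l's index. Every duplicate lane then writes the same
   correct value, so scatter order no longer matters. The counting loop is
   scalar code standing in for a hypothetical vector operation. */
static void hist_dup_count(uint32_t *hist, const uint32_t *idx, size_t n) {
    for (size_t base = 0; base < n; base += STRIP) {
        size_t vl = n - base < STRIP ? n - base : STRIP;
        uint32_t cnt[STRIP], tmp[STRIP];
        assert(vl <= STRIP);
        for (size_t l = 0; l < vl; l++) {
            cnt[l] = 0;
            for (size_t k = 0; k < vl; k++)
                cnt[l] += idx[base + k] == idx[base + l];
        }
        for (size_t l = 0; l < vl; l++) tmp[l] = hist[idx[base + l]] + cnt[l];
        for (size_t l = 0; l < vl; l++) hist[idx[base + l]] = tmp[l];
    }
}
```

With idx = {1, 2, 1, 3, 1}, `hist_naive` leaves hist[1] at 1, while `hist_dup_count` gives the correct count of 3.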
Re: Whole Register Loads and Stores
Yeah. I'm thinking into the future where pressure to avoid spilling over into 48-/64-bit instruction encodings will use up more code points in the 32b load/store encoding space, reducing their
By andrew@... · #210

Re: Whole Register Loads and Stores
On 6/15/20 11:14 PM, Andrew Waterman wrote:
I've also thought about prediction. If it works, it's just like a whole register load always of the correct size. The inserted cast will almost never be
By Bill Huffman · #209

Re: Whole Register Loads and Stores
Yeah, the existing unit-stride loads and stores are probably an unsuitable solution for this problem on statically scheduled wide-issue machines with short chimes.
Is a microarchitectural solution out
By andrew@... · #208