mtvec question


Joe Xie
 

Hi Andrew, all,

The current priv spec reserves the lower 2 bits of mtvec (and stvec) to select vectored interrupts. The problem is that the exception handler has to be word-aligned, but if SW mis-programs the lower 2 bits to a non-zero value, the result is still a valid value.

Exceptions will then start from an incorrect PC, and depending on the instruction word at that address you will see weird behavior, or the core will just fall into an infinite loop. This kind of issue is very annoying to debug.

Any advice on how to overcome the issue other than a SW check?
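A small C model may make the failure mode concrete. It follows the mtvec layout in the privileged spec (BASE in bits XLEN-1:2, MODE in bits 1:0; 0 = Direct, 1 = Vectored, 2 and 3 reserved) and assumes RV64. The names xlen_t, mtvec_make, and trap_target are hypothetical, and treating a retained reserved MODE like Direct is a simplifying assumption, since a real core may legalize such a write differently.

```c
#include <assert.h>
#include <stdint.h>

/* mtvec layout from the privileged spec:
 *   BASE = mtvec[XLEN-1:2], MODE = mtvec[1:0]
 *   MODE 0 = Direct, 1 = Vectored, 2 and 3 reserved.
 * This model assumes RV64; use uint32_t for RV32. */
typedef uint64_t xlen_t;

#define MTVEC_MODE_MASK   ((xlen_t)0x3)
#define MTVEC_MODE_VECTOR 1u

static inline xlen_t mtvec_base(xlen_t mtvec) { return mtvec & ~MTVEC_MODE_MASK; }
static inline unsigned mtvec_mode(xlen_t mtvec) { return (unsigned)(mtvec & MTVEC_MODE_MASK); }

/* Build an mtvec value, checking the two things that go wrong in practice.
 * These asserts are a software check, not something the hardware does. */
static xlen_t mtvec_make(xlen_t handler, unsigned mode)
{
    assert((handler & MTVEC_MODE_MASK) == 0); /* handler must be 4-byte aligned */
    assert(mode <= MTVEC_MODE_VECTOR);        /* modes 2 and 3 are reserved */
    return handler | mode;
}

/* PC the hart jumps to on a trap. is_interrupt and cause mirror
 * mcause's interrupt bit and exception code. */
static xlen_t trap_target(xlen_t mtvec, int is_interrupt, xlen_t cause)
{
    xlen_t base = mtvec_base(mtvec);
    if (mtvec_mode(mtvec) == MTVEC_MODE_VECTOR && is_interrupt)
        return base + 4 * cause;   /* Vectored: interrupts go to BASE + 4*cause */
    /* Direct mode, all synchronous exceptions, and (by assumption in this
     * model) a retained reserved MODE all go to BASE. */
    return base;
}
```

For example, with a handler that ended up at a 2-byte-aligned address such as 0x80000102 (possible with compressed instructions), writing that address straight into mtvec gives MODE = 2, and trap_target(0x80000102, 0, 2) returns 0x80000100, two bytes before the handler's first instruction. mtvec_make(0x80000102, 0) would instead fail its first assert.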


Andrew Waterman
 

I have been bitten by this, too, but I have little in the way of advice.

There are various software approaches to reduce the likelihood of encountering this problem, even if the programmer forgets to insert the alignment directive. The first one that comes to mind is to put trap handlers in their own ELF section so that the linker script can forcibly align them.

Unfortunately, sometimes the best you can do with these nitty-gritty low-level systems programming issues is: "don't fuck up".  Debugging them is inherently painful.
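The separate-section suggestion might look like this with GCC or Clang on an ELF target. The section name .text.trap, the symbol trap_entry, and the helper trap_entry_addr are hypothetical, and the linker-script line in the comment is an illustrative GNU ld fragment, not a tested script. The aligned(4) function attribute is a compiler-level complement to the linker-script approach rather than part of the original suggestion.

```c
#include <assert.h>
#include <stdint.h>

/* Place the handler in its own (hypothetical) section so a linker script
 * can force its alignment regardless of the source, e.g. in GNU ld syntax:
 *
 *     .text : { . = ALIGN(4); KEEP(*(.text.trap)) *(.text .text.*) }
 *
 * aligned(4) also asks the compiler for 4-byte alignment, so the low two
 * bits of the handler address are zero even without that script. */
__attribute__((section(".text.trap"), aligned(4)))
void trap_entry(void)
{
    /* Real handler code (register save, dispatch, mret) would go here. */
}

/* Return the handler address, re-checking the alignment guarantee. */
static uintptr_t trap_entry_addr(void)
{
    uintptr_t addr = (uintptr_t)&trap_entry;
    assert((addr & 0x3u) == 0);   /* non-zero bits would leak into mtvec.MODE */
    return addr;
}
```

Because the alignment is enforced at build time, writing trap_entry's address into mtvec cannot accidentally set MODE, whether or not the author remembered an alignment directive.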

On Thu, Jun 18, 2020 at 6:22 PM Joe Xie <joxie@...> wrote:



Joe Xie
 

Are we going to use bit 1 in the near future? We are wondering whether we could use bit 1 to flag an illegal value (WLRL): if bit 1 is written with 1, fire an exception.

 

From: <tech-privileged@...> on behalf of Andrew Waterman <andrew@...>
Date: Friday, June 19, 2020 at 9:37 AM
To: Joe Xie <joxie@...>
Cc: "tech-privileged@..." <tech-privileged@...>, "James Xu (SW-GPU)" <jamesx@...>, Lucien Dunning <ldunning@...>
Subject: Re: [RISC-V] [tech-privileged] mtvec question

 



Andrew Waterman
 


On Thu, Jun 18, 2020 at 10:16 PM Joe Xie <joxie@...> wrote:


Joe Xie
 

Lol

 

Do you feel it is worth adding a bit in sstatus that restricts csrw stvec to either 1) mask bits 1:0, or 2) fire an exception when a non-zero value is written to bits 1:0?

A separate ELF section can work in some cases; however, the concern is that it may be difficult to force everyone to follow the guidance, and the issue is pretty annoying to debug on silicon. It is a debug nightmare if the instruction at the wrong PC is a jump to some random address.

 


From: Andrew Waterman <andrew@...>
Sent: Friday, June 19, 2020 1:31:19 PM
To: Joe Xie <joxie@...>
Cc: tech-privileged@... <tech-privileged@...>; James Xu (SW-GPU) <jamesx@...>; Lucien Dunning <ldunning@...>
Subject: Re: [RISC-V] [tech-privileged] mtvec question

 



Andrew Waterman
 

I think this is one of dozens of little mistakes you can make in bare-metal RISC-V programming, and adding an sstatus bit for it is IMO not a great allocation of resources.

Hopefully you are developing your M-mode code with the help of a software simulator, in which case you could just add a feature to your software simulator to catch writes to mtvec that set mtvec[1] and issue a warning to the programmer.
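A simulator-side check along these lines could be a hook on CSR writes. The hook name and signature are hypothetical (not the API of Spike, QEMU, or any other simulator); the CSR numbers 0x305 (mtvec) and 0x105 (stvec) are from the privileged spec. Covering stvec as well follows the original question's mention of it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CSR_STVEC 0x105u   /* CSR addresses from the privileged spec */
#define CSR_MTVEC 0x305u

/* Hypothetical hook, called by the simulator's CSR-write path before the
 * value is committed. It only warns; the architectural write proceeds as usual.
 * Returns 1 if a warning was printed, 0 otherwise. */
static int warn_on_tvec_write(unsigned csr, uint64_t value, uint64_t pc, FILE *log)
{
    assert(log != NULL);
    if (csr != CSR_MTVEC && csr != CSR_STVEC)
        return 0;
    if ((value & 0x2u) == 0)   /* bit 1 clear: MODE is Direct (0) or Vectored (1) */
        return 0;
    fprintf(log, "warning: pc=0x%llx writes 0x%llx to %s; bit 1 set (MODE=%u)\n",
            (unsigned long long)pc, (unsigned long long)value,
            csr == CSR_MTVEC ? "mtvec" : "stvec", (unsigned)(value & 0x3u));
    return 1;
}
```

Calling the hook with the final value to be written, rather than decoding each instruction, means csrw, csrs, and csrc would all be covered by the same check.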

On Thu, Jun 18, 2020 at 11:29 PM Joe Xie <joxie@...> wrote:



Allen Baum
 

I think the solution is even simpler.

Even if mtvec[1] had not been used, data-dependent traps are prohibited in RISC-V (that might be too strong a word; I don't know that it's explicitly prohibited, but it is certainly discouraged, for good reason).
Note that divide by zero doesn't trap, and if anything were going to trap, you would think that would be first in line.

Nevertheless, mtvec[1:0] is WARL, and you can restrict the legal values as you see fit (that is, transform anything that is not legal into a legal value, which can include leaving the old value unchanged), but you cannot trap.

And the implementation gets to decide what is legal.
Your implementation can declare that mtvec[1] is read-only zero (and so won't support CLIC).
Your implementation can declare that attempts to write 10 or 11 will be ignored or converted to the legal 00 and 01 (or even always to 00), or declare that only 00 is legal (if you don't support vectored interrupts), so the field is read-only 0.

So you can prevent that case you're worried about fairly easily.
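The options listed above can be modeled as WARL write-legalization policies. In this C sketch each policy maps an illegal MODE (10 or 11) to a legal value without trapping. The enum and function names are illustrative, and a real design would hard-wire one policy rather than take it as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical WARL legalization policies for mtvec.MODE (bits 1:0).
 * None of them trap: an illegal write becomes a legal value. */
enum mode_policy {
    MODE_KEEP_OLD,      /* writes of 10/11 are ignored: register keeps its old value */
    MODE_CLEAR_BIT1,    /* mtvec[1] read-only zero: 10 -> 00, 11 -> 01 */
    MODE_FORCE_DIRECT,  /* 10/11 -> 00, legal 00/01 pass through */
    MODE_DIRECT_ONLY    /* no vectored interrupts: MODE is read-only 00 */
};

/* Return the value mtvec holds after software attempts to write new_val. */
static uint64_t legalize_mtvec_mode(uint64_t old_val, uint64_t new_val, enum mode_policy p)
{
    uint64_t mode = new_val & 0x3u;
    uint64_t base = new_val & ~(uint64_t)0x3u;

    switch (p) {
    case MODE_KEEP_OLD:
        return mode >= 2 ? old_val : new_val;
    case MODE_CLEAR_BIT1:
        return new_val & ~(uint64_t)0x2u;
    case MODE_FORCE_DIRECT:
        return mode >= 2 ? base : new_val;
    case MODE_DIRECT_ONLY:
        return base;
    }
    assert(0 && "unknown policy");
    return old_val;
}
```

Under any of these policies, reading mtvec back after the write shows a value different from the handler address that was written, which may be a cheaper debugging aid than a trap.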

You could go further and restrict the address to be cache-line aligned regardless of vectoring (so mtvec[5:2]=0), or page aligned (mtvec[11:2]=0); see the last note in priv spec section 3.1.7, which discusses this.
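A coarser BASE alignment can be modeled the same way: the low BASE bits are hard-wired to zero, so any written value is legalized by masking. This C sketch assumes a 64-byte cache line for the mtvec[5:2]=0 case and 4 KiB pages for mtvec[11:2]=0; the function name is hypothetical, and MODE is passed through untouched here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical legalization forcing BASE to 2^align_log2-byte alignment by
 * clearing mtvec[align_log2-1:2]. MODE (bits 1:0) is left unchanged.
 *   align_log2 = 6  -> 64-byte (cache-line) alignment, mtvec[5:2] = 0
 *   align_log2 = 12 -> 4 KiB (page) alignment,        mtvec[11:2] = 0 */
static uint64_t legalize_mtvec_base(uint64_t new_val, unsigned align_log2)
{
    assert(align_log2 >= 2 && align_log2 < 64);
    uint64_t base_zero_mask = (((uint64_t)1 << align_log2) - 1) & ~(uint64_t)0x3u;
    return new_val & ~base_zero_mask;
}
```

Masking does not make a misaligned handler work; it only shrinks the set of legal values, so a handler placed off the required boundary still shows up as a mismatch when mtvec is read back.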


On Fri, Jun 19, 2020 at 12:01 AM Andrew Waterman <andrew@...> wrote: