Troubleshooting Guide for Arm Abort Exceptions in Traveo I MCUs - KBA224420

Community-Team · Jul 02, 2018

Author: maheshb_81 NambirajanN_11 Version: **

This guide explains how different Arm® Cortex®-R5 abort exceptions can be analyzed and handled in Traveo I family MCUs.

1. What are exceptions?

Exceptions occur when the normal flow of a program must temporarily halt, for example, to service an interrupt from a peripheral. Before attempting to handle an exception, the processor preserves the critical parts of the current processor state so that the original program can resume when the handler routine has finished.

In practical situations, exceptions can be mainly categorized into the following:

Interrupts (Normal Interrupts IRQs and Fast Interrupts FIQs/NMIs)
Aborts
Undefined Instruction (UNDEF) exceptions

1.1 Interrupts

Interrupts are comparatively easy to handle because designers control how interrupts are serviced in the application. Even non-maskable interrupts (NMIs, or FIQs) can be managed, because designers can easily identify the source of any NMI in Traveo MCUs using the IRC NMI Status Register (IRCn_NMIST).

1.2 Aborts

Aborts are usually unintended exceptions resulting from invalid or unsuccessful memory accesses. Some causes of aborts are:

Permission fault indicated by the MPU (Memory Protection Unit)
Error detected in the data by the ECC checking logic

This article guides you in understanding the nature of aborts generated and preventing them.

1.3 Undefined Instruction Exceptions (UNDEF)

An Undefined Instruction Exception can happen if the CPU does not understand the fetched instruction.

The following sections assume that you are familiar with the Cortex-R5 and Cortex-R5F Technical Reference Manual (or can refer to it when needed), and that GHS (Green Hills Software) tools were used for code development and testing.

2. Did code execution really end in an abort?

As the first step, verify whether code execution really ended in an abort. When an abort happens, the processor branches to the corresponding entry in the Exception Vector Table.

When an abort occurs, the Program Counter (PC) halts at address 0xFFFF00xx if a breakpoint is set at the exception vector address or if vector catch (explained later in this article) is enabled. The last two hex digits of the address (0xFFFF00xx) indicate the type of exception, as shown in Table 1:

Table 1. Exception Vectors

For example, in Figure 1, the exception is a Prefetch Abort because the halted PC is at offset 0x0C from the vector base.

Figure 1. Example of Prefetch Abort
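Table 1 can also be expressed as a small lookup, which is convenient when a fault logger or debugger script needs to name the exception from a halted PC. The sketch below assumes the high-vector base 0xFFFF0000 used by Traveo MCUs and the standard Armv7-R vector layout; classify_vector and the enum names are illustrative only, not part of any Traveo or GHS library.

```c
#include <stdint.h>

/* Exception types in vector-table order (assumed standard Armv7-R layout). */
typedef enum {
    EXC_RESET,          /* 0x00 */
    EXC_UNDEF,          /* 0x04: Undefined Instruction */
    EXC_SVC,            /* 0x08: Supervisor Call */
    EXC_PREFETCH_ABORT, /* 0x0C */
    EXC_DATA_ABORT,     /* 0x10 */
    EXC_RESERVED,       /* 0x14 */
    EXC_IRQ,            /* 0x18 */
    EXC_FIQ,            /* 0x1C: NMI on Traveo */
    EXC_NOT_A_VECTOR
} exception_type_t;

/* Assumption: high vectors, as used by Traveo MCUs. */
#define HIGH_VECTOR_BASE 0xFFFF0000u

/* Names the exception from a halted PC by its offset from the vector base. */
static exception_type_t classify_vector(uint32_t pc)
{
    if (pc < HIGH_VECTOR_BASE) {
        return EXC_NOT_A_VECTOR;
    }
    uint32_t offset = pc - HIGH_VECTOR_BASE;
    if (offset > 0x1Cu || (offset & 0x3u) != 0u) {
        return EXC_NOT_A_VECTOR;        /* not a vector entry */
    }
    return (exception_type_t)(offset / 4u); /* one 4-byte entry per type */
}
```

With the halted PC of Figure 1 (0xFFFF000C), classify_vector returns EXC_PREFETCH_ABORT.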

All exceptions first go to the address specified in the Exception Vector Table (see Table 1), from where program execution can branch to application-specific handlers. The branch target address depends on the application. Two examples follow:

Example 1:

An application may use default exception handlers designed as traps in which execution stays stuck. For example, the application may define the following function to trap a prefetch abort:

static void DefaultPrefetchAbortExceptionHandler(void)
{
    /* Trap: stay here so that a debugger can inspect the CPU state */
    while (1)
    {
        NOP();
    }
}

This is the case when the application startup code remaps the BootROM addresses specified in the Exception Vector Table to application exception handlers (such as DefaultPrefetchAbortExceptionHandler above). For details on configuring exception handlers, see the “BootROM Hardware Interface” chapter in the Platform Hardware Manual.

In such cases, execution stays stuck in the handler, so the Program Counter (PC) holds an address inside the handler.

Example 2:

An implementation may provide advanced exception handling in the OS, in which a custom OS error handler reads the details of the exception and reports a corresponding error code. In such cases, check the error code to identify the exception that was triggered. The banked R13, R14, and SPSR registers of the corresponding exception mode can also be read to debug the issue.

Three Arm® Cortex®-R5 registers help confirm the current state of the processor and locate the faulting instruction:

Current Program Status Register (CPSR)

Use CPSR to verify the current mode of the processor. Figure 2 shows the bit arrangement in CPSR:

Figure 2. CPSR Bit Arrangement

The mode bits M[4:0] of the CPSR register show whether the current mode is Abort (0b10111), as listed in Table 2.

Table 2. CPSR Mode Bits

Saved Program Status Register (SPSR)

Use SPSR to check the mode the processor was in just before it entered the exception. For example, if an abort occurs while the processor is in System mode, SPSR shows the mode as “System” while CPSR shows it as “Abort”.

The bit definitions of SPSR register are the same as that of the CPSR register.
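Because CPSR and SPSR share the same layout, one set of helpers can decode both, for example in a fault logger or a host-side script fed with register values from the debugger. This sketch assumes the standard Armv7-R encodings of the mode field M[4:0] (Table 2) and the T, F, and I bits at positions 5, 6, and 7; the helper names are illustrative, not from a Traveo library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Processor modes encoded in CPSR/SPSR bits M[4:0] (Cortex-R5 subset). */
enum {
    MODE_USER       = 0x10,
    MODE_FIQ        = 0x11,
    MODE_IRQ        = 0x12,
    MODE_SUPERVISOR = 0x13,
    MODE_ABORT      = 0x17,
    MODE_UNDEFINED  = 0x1B,
    MODE_SYSTEM     = 0x1F
};

static uint32_t psr_mode(uint32_t psr)       { return psr & 0x1Fu; }              /* M[4:0] */
static bool psr_thumb(uint32_t psr)          { return (psr & (1u << 5)) != 0u; }  /* T bit */
static bool psr_fiq_masked(uint32_t psr)     { return (psr & (1u << 6)) != 0u; }  /* F bit */
static bool psr_irq_masked(uint32_t psr)     { return (psr & (1u << 7)) != 0u; }  /* I bit */
static bool psr_in_abort_mode(uint32_t cpsr) { return psr_mode(cpsr) == MODE_ABORT; }
```

Reading CPSR with psr_in_abort_mode confirms the Abort mode, and applying psr_mode to SPSR_abt gives the mode that was active when the abort was taken.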

R14 Register

Use the R14 register (the link register of the exception mode) to find the instruction or function call that caused the abort. The address of the instruction that triggered the exception is R14 – x, where x depends on the type of exception. For details, see Table 3.4, “Exception Entry and Exit”, in the Cortex-R5 and Cortex-R5F Technical Reference Manual.

Note that this value gives the address of the instruction that triggered the exception only for synchronous aborts. The address of the memory location whose access caused the abort is obtained from other registers, as explained in the following sections.

For more details and usage of the CPSR, SPSR and R14 registers in Aborts, see the sections “Taking an exception” and “Leaving an exception” in the Cortex-R5 and Cortex-R5F Technical Reference Manual.
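The R14 – x calculation can be captured in a small helper. This sketch assumes the Cortex-R5 entry offsets from Table 3.4 of the TRM (R14_abt – 8 for a Data Abort, R14_abt – 4 for a Prefetch Abort, and R14_und – 4 in Arm state or – 2 in Thumb state for an Undefined Instruction); verify them against your TRM revision. The function and enum names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { EXC_UNDEFINED, EXC_PREFETCH_ABORT, EXC_DATA_ABORT } exc_kind_t;

/* Returns the address of the instruction that caused the exception, given the
 * banked link register of the exception mode and the T bit of its SPSR.
 * For asynchronous Data Aborts the result only points near the cause. */
static uint32_t faulting_instruction(uint32_t lr, exc_kind_t kind, bool thumb)
{
    switch (kind) {
    case EXC_DATA_ABORT:
        return lr - 8u;                 /* same offset in Arm and Thumb state */
    case EXC_PREFETCH_ABORT:
        return lr - 4u;                 /* same offset in Arm and Thumb state */
    case EXC_UNDEFINED:
    default:
        return lr - (thumb ? 2u : 4u);  /* depends on the instruction set state */
    }
}
```

For the synchronous Data Abort example in Section 3.2.1, an R14_abt of 0x05a0019a yields 0x05a00192, the address of the STR instruction.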

Note: The GHS debugger offers a command to enable or disable vector (exception) catch:

target set catch_type on|off

where type is one of undef, abort, prefetch, fiq, or svc:

undef: Undefined Instruction exception

abort: Data Abort

prefetch: Prefetch Abort

fiq: NMI in Traveo

svc: Supervisor Call (SVC) exception

For example, to catch a prefetch abort, use the following command:

target set catch_prefetch on

Stopping directly when the abort occurs, instead of placing a breakpoint in the exception handler, has an advantage: the handler code has not yet modified any CPU registers, so more information about the program state at the time of the exception is preserved. Once you have confirmed that an abort occurred, you can analyze its cause.

3. Data Abort Exception

A Data Abort exception is the memory system's response to an invalid data access. If the exception is confirmed to be a Data Abort, first check the value of the Data Fault Status Register (DFSR) of the Cortex-R5 CPU.

DFSR Register

Figure 3 shows the DFSR register bit assignments:

Figure 3. DFSR Bit Arrangement

Use the “S” bit [10] and the “Status” bits [3:0] to determine the nature of the Data Abort; see Table 3.

Table 3. DFSR Mode Status Bits

SD Bit

The SD bit (bit 12) distinguishes between an AXI decode error and an AXI slave error on an external abort. It is valid only for external aborts; for all other abort types, it is zero:

0 = AXI Decode error (DECERR) or AHB error caused the abort

1 = AXI Slave error (SLVERR) or unsupported exclusive access caused the abort. Example: exclusive access using the AHB peripheral port

RW Bit

The RW bit (bit 11) indicates whether a read or a write access caused the abort.

0 = read access caused the abort

1 = write access caused the abort
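For scripted analysis or a fault logger, the DFSR fields described above can be decoded in C. The status encodings below are the Armv7-R PMSA values used by the Cortex-R5 (the content of Table 3), reproduced here as an assumption to verify against your TRM revision; the type and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* 5-bit fault status S:Status[3:0] (assumed Armv7-R PMSA encodings). */
enum {
    DFS_BACKGROUND       = 0x00, /* 0b00000 */
    DFS_ALIGNMENT        = 0x01, /* 0b00001 */
    DFS_DEBUG_EVENT      = 0x02, /* 0b00010 */
    DFS_SYNC_EXTERNAL    = 0x08, /* 0b01000 */
    DFS_PERMISSION       = 0x0D, /* 0b01101 */
    DFS_ASYNC_EXTERNAL   = 0x16, /* 0b10110 */
    DFS_ASYNC_PARITY_ECC = 0x18, /* 0b11000 */
    DFS_SYNC_PARITY_ECC  = 0x19  /* 0b11001 */
};

typedef struct {
    uint32_t status;      /* combined 5-bit fault status */
    bool     is_write;    /* RW bit: write access caused the abort */
    bool     slave_error; /* SD bit: SLVERR (meaningful for external aborts only) */
    bool     dfar_valid;  /* DFAR holds the faulting address */
} dfsr_info_t;

static dfsr_info_t decode_dfsr(uint32_t dfsr)
{
    dfsr_info_t info;
    /* Move S (bit 10) down to bit 4 and combine it with Status[3:0]. */
    info.status      = ((dfsr >> 6) & 0x10u) | (dfsr & 0x0Fu);
    info.is_write    = ((dfsr >> 11) & 1u) != 0u;
    info.slave_error = ((dfsr >> 12) & 1u) != 0u;
    switch (info.status) {
    case DFS_BACKGROUND:
    case DFS_ALIGNMENT:
    case DFS_PERMISSION:
    case DFS_SYNC_EXTERNAL:
    case DFS_SYNC_PARITY_ECC:
        info.dfar_valid = true;   /* synchronous fault */
        break;
    default:
        info.dfar_valid = false;  /* asynchronous fault or debug event */
        break;
    }
    return info;
}
```

For instance, a DFSR value of 0x00000C06 decodes as an asynchronous external abort caused by a write with a decode error, the combination discussed in Section 3.2.2, and DFAR is not valid for it.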

ADFSR Register

The ADFSR (Auxiliary DFSR) register provides additional details on the abort; see the Cortex-R5 and Cortex-R5F Technical Reference Manual.

3.1 Common Types of Data Abort

Alignment: The memory access does not meet the alignment requirements, which depend on the memory attribute of the region:

“Normal”: unaligned accesses are supported (alignment checking is configurable)
“Device” / “Strongly Ordered”: only aligned accesses are supported

For example, an unaligned access to a region configured as “Strongly Ordered” causes an Alignment Data Abort.

The default memory attributes of each region are listed in Table 7-1, “Default memory map”, of the Cortex-R5 and Cortex-R5F Technical Reference Manual.
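The alignment rules above can be written as a small predicate, which is handy when reviewing whether a given access to a given region can fault. This is a simplified model, assuming single-register loads and stores (LDRB/STRB, LDRH/STRH, LDR/STR) and that the SCTLR.A bit enables alignment checking for Normal memory; LDM/STM and LDRD/STRD have stricter requirements and are not modelled. The names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MEM_NORMAL, MEM_DEVICE, MEM_STRONGLY_ORDERED } mem_type_t;

/* Predicts whether a single load/store of size 1, 2 or 4 bytes raises an
 * Alignment Data Abort under the simplified rules described above. */
static bool alignment_fault_expected(uint32_t addr, uint32_t size,
                                     mem_type_t type, bool sctlr_a)
{
    bool unaligned = (addr & (size - 1u)) != 0u;  /* size is a power of two */
    if (!unaligned) {
        return false;
    }
    if (type != MEM_NORMAL) {
        return true;       /* Device / Strongly Ordered: aligned accesses only */
    }
    return sctlr_a;        /* Normal: faults only if alignment checking is on */
}
```

A word write to 0x...02 in a Strongly Ordered region is therefore predicted to fault, while the same write to Normal memory faults only when SCTLR.A is set.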

Background: Memory Protection Unit (MPU) settings must be correct for every region the CPU accesses. If they are not, a Background Data Abort can result. Whether an access outside the configured MPU regions faults also depends on the privilege level of the CPU (privileged or non-privileged). MPU regions are configured using the CP15 c6 MPU registers of the Cortex-R5, such as the Region Access Control Registers.

Permission: The MPU access permissions of the region do not allow the access.

Synchronous/Asynchronous External: The access was passed from the CPU to the AXI/AHB bus and encountered an error there. This is the most common Data Abort fault type. If the abort is synchronous, the Data Fault Address Register (DFAR) holds the address of the memory access that caused the abort.

Synchronous/Asynchronous ECC: This happens if an ECC error is detected at TCM interfaces or in the cache.

3.2 Common examples of Data Abort Exception and handling them

3.2.1 Synchronous Faults

In general, errors caused by “load” instructions (LDRB, LDRH, LDR, LDM/POP) from areas with any memory attribute, or by “store” instructions (STRB, STRH, STR, STM/PUSH) to areas with the “Strongly Ordered” memory attribute, are synchronous. DFAR shows the target address of the access, and, as described in the previous section, R14_abt – 8 points to the instruction that performed that access. Figure 4 shows an example.

Figure 4. Synchronous Data Abort Example

A write operation shown below triggered the abort:

As Figure 4 shows, DFSR identifies the abort as synchronous, so DFAR holds the address whose access triggered the Data Abort. R14_abt – 8 (0x05a00192) points to the instruction that performed the access, an STR.

As the memory map of the Traveo MCU in the Series Hardware Manual shows, address 0xFFFFF210 lies in the BootROM area, which is not writable; therefore, the write operation triggered the exception.

Figure 5. BootROM Region in Traveo MCU

3.2.2 Asynchronous faults

Asynchronous faults are harder to analyze because, unlike with synchronous faults, DFAR does not hold the faulting address, so you cannot directly locate the access that caused the abort. In general, errors caused by “store” instructions (STRB, STRH, STR, STM/PUSH) to areas with the “Normal” or “Device” memory attribute are asynchronous.

The following line of code will result in an Asynchronous Abort:

Figure 6. Asynchronous Data Abort Example

From the DFSR Register, you can confirm the following:

Asynchronous External Abort
Write Access
Decode Error

From the value of ADFSR, you can confirm that the access was made through the AXI peripheral port, which in Traveo MCUs covers the address range 0xb0000000–0xb3ffffff.

Tracking the instruction that caused the abort

Computing R14_abt – 8 gives 0x05a001e2. For an asynchronous abort, this address only points near the instruction that caused the exception: as Figure 6 shows, the instruction at 0x05a001e2 is a branch, which cannot cause a Data Abort.
Shortly before it, at 0x05a001da, there is a “store” instruction, which is the likely cause of the exception.

This store uses the address in R8, and R8 has not been modified between the store and the exception, so it still holds the address the store accessed.

Check the exact value of R8:

Figure 7. R8 register contents during Abort

R8 indeed holds an address in the AXI peripheral port area (0xb0000000–0xb3ffffff). The Traveo memory map shows that this address is reserved, which explains the decode error reported in the DFSR.

3.3 Changing an asynchronous abort to a synchronous abort

Synchronous aborts are easier to analyze because DFAR holds the exact address whose access caused the abort. If the exception is caused by store instructions to peripheral areas (confirmed by checking ADFSR), you can make all accesses to those areas synchronous by configuring the corresponding peripheral area as “Strongly Ordered” using the Arm MPU. However, this may reduce performance, because the CPU waits for each “Strongly Ordered” access to complete before processing the next data access.
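To show what configuring a region as “Strongly Ordered” means in register terms, the helpers below compute values for the Cortex-R5 MPU Region Size and Enable Register and Region Access Control Register (CP15 c6). The bit positions are assumed from the Armv7-R PMSA layout (XN bit 12, AP bits [10:8], TEX bits [5:3], S/C/B bits [2:0], and a size field N in bits [5:1] for a region of 2^(N+1) bytes) and should be checked against the TRM before use. The macro and function names are hypothetical, and the MCR writes to CP15 c6 are omitted because they are toolchain-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Region Access Control Register fields (Armv7-R PMSA). */
#define DRACR_XN        (1u << 12)
#define DRACR_AP(ap)    (((uint32_t)(ap) & 0x7u) << 8)
#define DRACR_TEX(tex)  (((uint32_t)(tex) & 0x7u) << 3)
#define DRACR_B         (1u << 0)

#define AP_FULL_ACCESS  0x3u   /* read/write in privileged and User mode */

/* Memory type encodings (TEX, C, B). */
#define ATTR_STRONGLY_ORDERED (DRACR_TEX(0))           /* TEX=000 C=0 B=0 */
#define ATTR_DEVICE_SHARED    (DRACR_TEX(0) | DRACR_B) /* TEX=000 C=0 B=1 */
#define ATTR_NORMAL_NONCACHE  (DRACR_TEX(1))           /* TEX=001 C=0 B=0 */

/* Region Size and Enable value: size = 2^(N+1) bytes, N in bits [5:1],
 * bit 0 = enable. Returns 0 if size is not a power of two in [32 B, 4 GB]. */
static uint32_t drsr_value(uint64_t size_bytes)
{
    if (size_bytes < 32u || size_bytes > (1ull << 32) ||
        (size_bytes & (size_bytes - 1u)) != 0u) {
        return 0u;
    }
    uint32_t shift = 0u;
    while ((1ull << shift) < size_bytes) {
        shift++;
    }
    return ((shift - 1u) << 1) | 1u;
}

static uint32_t dracr_value(uint32_t mem_attr, uint32_t ap, bool execute_never)
{
    return mem_attr | DRACR_AP(ap) | (execute_never ? DRACR_XN : 0u);
}
```

For the 64 MB AXI peripheral area at 0xb0000000 (whose base is aligned to its size), drsr_value gives 0x33, and dracr_value(ATTR_STRONGLY_ORDERED, AP_FULL_ACCESS, true) gives 0x1300.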

3.4 Aborts due to issues with MPU

Consider the following cases where the MPU can play a role in aborts.

3.4.1 When MPU of a region is not configured

Define valid MPU settings for every region the application accesses so that the CPU can access each region as intended. If a region in use is not covered by the MPU settings, accessing it can cause a Background Fault Data Abort, depending on whether the access is privileged or non-privileged:

For privileged accesses:

If the BR bit (Bit 17 of the SCTLR Arm register) is set, the default memory map serves as the background region for any access that does not hit a specified region; if the BR bit is 0, a Background Fault exception occurs for any access outside specified regions.

For non-privileged accesses:

A Background Fault exception occurs for any access outside the specified MPU regions. To prevent this, define Region 0 as a background region covering the entire memory map. Because higher-numbered regions take priority where regions overlap, Region 0 then applies only to addresses not covered by any other region.
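The region-selection rules in this section can be modelled in a few lines of C. This simplified sketch assumes each region is described only by enable, base, and size (sub-region disable bits and access permissions are ignored); mpu_region_t and mpu_lookup are hypothetical names. It shows why a Region 0 covering the whole address space prevents Background faults without overriding the more specific, higher-numbered regions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     enabled;
    uint32_t base;   /* aligned to size */
    uint64_t size;   /* power of two, >= 32 bytes (4 GB fits in 64 bits) */
} mpu_region_t;

typedef enum { HIT_REGION, HIT_DEFAULT_MAP, BACKGROUND_FAULT } mpu_result_t;

/* Decides which settings govern an access. The highest-numbered enabled
 * region containing the address wins; otherwise the default memory map
 * applies only to privileged accesses with SCTLR.BR set. */
static mpu_result_t mpu_lookup(const mpu_region_t *regions, size_t count,
                               uint32_t addr, bool privileged, bool sctlr_br,
                               size_t *hit_index)
{
    for (size_t i = count; i-- > 0u; ) {
        const mpu_region_t *r = &regions[i];
        if (r->enabled && addr >= r->base &&
            (uint64_t)addr < (uint64_t)r->base + r->size) {
            *hit_index = i;
            return HIT_REGION;
        }
    }
    if (privileged && sctlr_br) {
        return HIT_DEFAULT_MAP;
    }
    return BACKGROUND_FAULT;
}
```

Without the 4 GB Region 0, a User-mode access outside the peripheral region returns BACKGROUND_FAULT regardless of the BR bit; with it, the access resolves to Region 0.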

3.4.2 Aborts due to issues with alignment

If you get an alignment fault (confirmed using the DFSR register), first read the target address of the faulting access from the DFAR register. Then check the memory attributes of that address and the instruction that performed the access. Possible causes include:

Wrong MPU attribute for the region
An incorrect access by the instruction, for example due to a corrupted pointer or an incorrect function parameter used to create or offset a pointer

For proper operation or access of some of the Traveo peripherals such as Backup RAM, the MPU attribute of those peripherals must be configured as “Device” and/or “Strongly Ordered”.

In such cases, alignment aborts may occur if the compiler generates unaligned accesses. Therefore, use build options that prevent the compiler from generating unaligned accesses.

If you use the GHS compiler, disable the “misalign_pack” option (which may be enabled by default) by adding the “-no_misalign_pack” build option.

3.5.1 Accessing Traveo peripherals in User Mode without correct Peripheral Protection Unit (PPU) settings

The CPU can operate in the System mode (Privileged Mode) or the User mode.

Access of peripherals in User mode can result in a Data Abort if the “PPU User Write or Access Attribute Register (PPU0_PUWAi)” is not correctly configured. See the Platform Manual for more details on this register.

The reset value of this register is 0, which means that in User mode you cannot write to a peripheral until you enable the access by setting the register to 1 for the channel you want to use. See the device's Hardware Manual for the PPU channel number of each peripheral.

3.5.2 Accessing external memories with improper state

A Data Abort can occur when a HyperFlash device connected to the Traveo MCU is read or written while its Reset pin is asserted (that is, while the HyperFlash device is in reset).

In general, a Data Abort can also occur when you access a peripheral that is shut down in the current RUN/PSS profile.

By default, both cases result in asynchronous Data Aborts. You can make them synchronous by configuring the MPU regions that cover these areas as “Strongly Ordered”.

4. Undefined Instruction Exception

An Undefined Instruction exception can occur if the CPU cannot decode the fetched instruction.

No fault status or fault address register is associated with this exception; only the link register (R14_und) provides relevant information.

4.1 Possible reasons for the execution of a faulty instruction

Branch to RAM code that has been corrupted or not yet initialized with the required functions
Corrupted branch address
Executing code in the wrong instruction set state, or branching into the middle of a 32-bit Thumb instruction
Branch to a non-code area or to inline literals
Corrupted return address on the stack (for example, due to a stack overflow or a push/pop count mismatch)
Function pointers that are not initialized or are corrupted
Interworking issues (Thumb to Arm, Arm to Thumb)
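When the faulting address is in Thumb code, it helps to know whether the halfword just before it starts a 32-bit Thumb-2 instruction; if it does, a branch may have landed in the middle of that instruction. The sketch below applies the Thumb-2 encoding rule that a first halfword has bits [15:11] equal to 0b11101, 0b11110, or 0b11111. It is a heuristic aid for reading a memory dump, with an illustrative function name; confirm the result with the debugger's disassembly, because the second halfword of an instruction can coincidentally match the same pattern.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if hw is the first halfword of a 32-bit Thumb-2 instruction;
 * any other value is a complete 16-bit Thumb instruction. */
static bool is_thumb32_first_halfword(uint16_t hw)
{
    uint16_t top5 = (uint16_t)(hw >> 11);
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}
```

For example, 0xF000 (first half of a BL) and 0xE92D (first half of PUSH.W) match, while 0x4770 (BX LR) and 0xE7FE (a 16-bit B) do not.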

4.2 Handling Undefined Instruction Exception in Traveo MCUs

1. Confirm that execution is stuck in an Undefined Instruction exception by checking the offset of the halt address from the vector base (0xFFFF0000). An offset of 0x04 indicates an Undefined Instruction exception.

Figure 8. Undefined Instruction Exception

2. Check the value of the R14_und register. R14_und – X gives the address of the instruction that caused the Undefined Instruction exception, where X depends on the instruction set state (Arm or Thumb) in which it occurred. See Table 3.4, “Exception Entry and Exit”, in the Cortex-R5 and Cortex-R5F Technical Reference Manual.

3. Check the mode in which the exception occurred by reading the SPSR_und register.

- For modes other than User, System, or Supervisor, check whether the appropriate handlers are properly configured and mapped, and whether the handler was triggered.

- If the previous mode was IRQ mode, check the IRCn_IRQST register, which holds the details of the IRQ that the CPU accepted most recently.

- If the previous mode was NMI/FIQ, check the IRCn_NMIST register, which holds the details of the NMI that the CPU accepted most recently.

- Check the validity of the NMI/IRQ function definition, its mapping, and the contents of the triggered handler function.

4. Check the instruction at the address R14_und – X.

- If it is a valid instruction, check whether it was executed in the correct instruction set state (Arm or Thumb); a valid instruction executed in the wrong state can cause an Undefined Instruction exception.

- If the instruction is invalid, check the trace and the call stack to understand the path that led to it and to look for address or RAM corruption.

5. If trace is not available, use the link register or stack pointer of the mode in which the exception occurred to determine the path leading to the exception.

5. Prefetch Abort Exception

Prefetch Abort (PABT) Exception occurs when an instruction fetch causes an error. When a Prefetch Abort occurs, the processor marks the prefetched instruction as invalid, but does not take the exception until the instruction is to be executed. If the instruction is not executed, for example because a branch occurs while it is in the pipeline, an abort does not occur. All prefetch aborts are synchronous.

The difference between an Undefined Instruction exception and a Prefetch Abort is that in a Prefetch Abort, the CPU cannot fetch the instruction from the address, whereas in an Undefined Instruction exception, the CPU fetches the instruction but cannot decode it.

Prefetch Aborts and Undefined Instruction exceptions often have a common cause:

1. A corrupted return address

2. A corrupted branch target address

If the instruction fetch fails, a Prefetch Abort results. If the fetch succeeds but the fetched instruction cannot be decoded, execution ends up in an Undefined Instruction exception.

The reason for Prefetch Abort can be analyzed by reading the Instruction Fault Status Register (IFSR) and Instruction Fault Address Register (IFAR).

Figure 9 shows the IFSR Register Bit Assignments:

Figure 9. IFSR Bit Arrangement

IFAR contains the address from which the CPU was trying to fetch an instruction. The contents of IFAR are always valid for a Prefetch Abort, because all Prefetch Aborts are synchronous.

5.1 Possible reasons for the failure of instruction fetch

5.1.1 Improper MPU setting

If a permission fault has occurred based on the IFSR status, it is possible that one of the following conditions has occurred:

- An instruction is being fetched from a location whose MPU region has the “Execute Never” attribute set. Check the MPU region that contains the address read from IFAR.

- The address read from IFAR lies in a region with the “Device” or “Strongly Ordered” memory attribute. Such regions are not expected to contain executable code.

Reasons for execution from faulty memory

Possible reasons are as follows:

- Stack corruption or overflow (the return address on the stack may be corrupted)

- Function pointers that are not properly initialized

5.1.2 Errors on instruction cache read

All parity or ECC errors detected on instruction cache reads are correctable. If aborts are enabled, a synchronous Prefetch Abort exception occurs: IFAR provides the address at which the error was detected, IFSR indicates a parity error on read, and the auxiliary IFSR (AIFSR) indicates that the error was in the cache and in which cache way. The CEC bits in the Auxiliary Control Register (ACTLR) control whether an abort is generated at all.

5.2 Handling Prefetch Abort Exception in Traveo MCUs

Debug a Prefetch Abort using the same procedure as for Undefined Instruction exceptions; in this case, the IFSR and IFAR registers provide additional information.

1. Confirm that execution is stuck in a Prefetch Abort exception by checking the offset of the halt address from the vector base (0xFFFF0000). An offset of 0x0C indicates a Prefetch Abort.

Figure 10. Prefetch Abort Exception

2. Check the status from IFSR and IFAR to determine the type of fault and the address leading to the abort.

- For a “permission” fault, find the MPU region that contains the address read from IFAR and check it for settings that forbid instruction fetches (Execute Never, Device, or Strongly Ordered memory).

- Other fault types require further analysis.

3. Check the mode in which the exception occurred by reading the SPSR_abt register.

- For modes other than User, System, or Supervisor, check whether the appropriate handlers are properly defined and mapped, and whether the handler was triggered.

- If the previous mode was IRQ, check the IRCn_IRQST register, which holds the details of the IRQ that the CPU accepted most recently.

- If the previous mode was NMI/FIQ, check the IRCn_NMIST register, which holds the details of the NMI that the CPU accepted most recently.

- Check the validity of the NMI/IRQ function definition, its mapping, and the contents of the triggered handler function.

4. Check the trace and call stack to understand the path that led to the faulty instruction address and to look for address or RAM corruption. Trace shows the code that was executed before the abort; data trace may also show data popped from the stack or loaded into the register used for the branch.

5. If trace is unavailable, check the link register (R14) of the mode in which the abort occurred, or the stack contents near that mode's stack pointer (R13), to determine the path leading to the exception.
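A common way to examine the stack near R13 is to dump a block of words from the debugger and look for values that resemble return addresses. The sketch below assumes a hypothetical code address range (take the real bounds from your linker map) and relies on the fact that BL in Thumb state leaves bit 0 of the return address set. It is a heuristic: stale stack contents and ordinary data can also match, so check each candidate against the disassembly. The type and function names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code address range [start, end). */
typedef struct {
    uint32_t start;
    uint32_t end;
} code_range_t;

/* Scans a stack dump (words read upward from R13 of the faulting mode) and
 * copies words that look like return addresses into out[].
 * Returns the number of candidates found. */
static size_t find_return_candidates(const uint32_t *stack, size_t words,
                                     code_range_t code, bool thumb_only,
                                     uint32_t *out, size_t out_max)
{
    size_t n = 0u;
    for (size_t i = 0u; i < words && n < out_max; i++) {
        uint32_t w = stack[i];
        uint32_t addr = w & ~1u;               /* ignore the Thumb bit */
        if (addr < code.start || addr >= code.end) {
            continue;                          /* not inside code */
        }
        if (thumb_only && (w & 1u) == 0u) {
            continue;                          /* Thumb return addresses have bit 0 set */
        }
        out[n++] = w;
    }
    return n;
}
```

The candidates, read from the lowest stack address upward, give an approximate reverse call chain leading to the abort.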

5.3 Example: Prefetch Abort Exception

The following example demonstrates the steps to debug a Prefetch Abort. Here, CPU execution was stuck in the Prefetch Abort handler. Relevant register values are as follows:

SPSR_abt: 0x600001F1: mode bits 10001 (FIQ mode). This means the CPU was in FIQ/NMI mode when the abort was triggered.

Figure 11. SPSR during abort

IFSR: 0x00000008: The status indicates a synchronous external abort. The address captured in IFAR is valid and is the actual address that led to the abort.

Figure 12. IFSR showing a Synchronous External Abort

IFAR: 0x00450BBA: This address falls under the reserved address space. The instruction was being fetched from a reserved address space resulting in the abort.
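The register values above can be cross-checked mechanically, for example in a fault logger that records SPSR_abt, IFSR, and IFAR. This sketch assumes the Armv7-R mode encodings (Table 2) and fault status encodings (S bit [10] plus Status[3:0]) of the Cortex-R5; the macro and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_MODE(psr)     ((psr) & 0x1Fu)        /* M[4:0] */
#define PSR_THUMB(psr)    (((psr) >> 5) & 1u)    /* T bit */
#define MODE_FIQ          0x11u                  /* 0b10001 */

/* 5-bit fault status: S bit [10] moved to bit 4, plus Status[3:0]. */
#define FSR_STATUS(fsr)   ((((fsr) >> 6) & 0x10u) | ((fsr) & 0x0Fu))
#define FS_SYNC_EXTERNAL  0x08u                  /* 0b01000 */

/* True when a snapshot matches the case above: a synchronous external
 * Prefetch Abort taken while the CPU was in FIQ (NMI) mode. */
static bool is_fiq_sync_external_pabt(uint32_t spsr_abt, uint32_t ifsr)
{
    return PSR_MODE(spsr_abt) == MODE_FIQ &&
           FSR_STATUS(ifsr) == FS_SYNC_EXTERNAL;
}
```

The T bit of 0x600001F1 is also set, so the FIQ code was running in Thumb state; this matters when applying the R14 offsets from Table 3.4.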

Next level of investigation:

1. Check the actual NMI triggered in the Traveo MCU using the IRC NMI Status Register (IRCn_NMIST).

2. Check the configuration of the NMI handler. Note that NMI handlers cannot be declared like normal functions; they must be declared with the proper attributes for the compiler used. For example, with the GHS compiler, use the “interrupt” attribute as follows:

__interrupt void NMIx_Handler(void)

This attribute instructs the compiler to generate a function in a manner that is suitable for use as an exception handler. It affects the alignment of the stack pointer on entry, the register set to be saved, and the exception return path. If the attribute is not specified, the behavior is not predictable when the handler is executed; this can lead to prefetch abort.

3. In the example under consideration, the NMI handler was declared like a normal function. This led to an improper return path leading to the prefetch abort.

This example also applies to the Undefined Instruction exception: depending on how the return path is corrupted, the result can be either a Prefetch Abort or an Undefined Instruction exception.