Cortex M4F, FPU, and ThreadX

user_108962310 · ‎Oct 30, 2017

Hi folks,

I would like to get the FPU working on the STM32F4xx chip that I am using, but I am stuck, and I believe the problem has to do with the version of ThreadX that is bundled with the SDK.

I found this old thread, but it seemed like there was no resolution: How to use HW FPU on STM32F4xx?

I have had partial success by adding a USE_FPU_CM4F flag to the build system, and then changing the chip-specific flags in wiced_toolchain_ARM_GNU.mk:

# Chip specific flags for GCC
ifeq ($(HOST_ARCH),ARM_CM4)
# flag added to support building for CM4 micros with FPU; define in platform makefile
ifeq ($(USE_FPU_CM4F),1)
__FPU_PRESENT := 1
__FPU_USED := 1
CPU_CFLAGS := -mthumb -mcpu=cortex-m4 -mfloat-abi=softfp
CPU_CXXFLAGS := -mthumb -mcpu=cortex-m4 -mfloat-abi=softfp
CPU_ASMFLAGS := -mcpu=cortex-m4 -mfloat-abi=softfp -mfpu=fpv4-sp-d16
CPU_LDFLAGS := -mthumb -mcpu=cortex-m4 -Wl,-A,thumb
else
CPU_CFLAGS := -mthumb -mcpu=cortex-m4
CPU_CXXFLAGS := -mthumb -mcpu=cortex-m4
CPU_ASMFLAGS := -mcpu=cortex-m4 -mfpu=softvfp
CPU_LDFLAGS := -mthumb -mcpu=cortex-m4 -Wl,-A,thumb
endif #USE_FPU_CM4F
endif #ARM_CM4

Then, in my platform makefile, I define USE_FPU_CM4F. The idea was that I wouldn't have to define a whole new CM4F variant in the build system, just tweak the CM4 one a little. Based on what I have read about GCC flags, this should work, since code built with -mfloat-abi=softfp is still link-compatible with existing libraries built without FPU support. The choice of -mfpu=fpv4-sp-d16 just came from other STM32F4xx examples I found.
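One way to confirm which float ABI a given translation unit was actually built with is to look at the macros the compiler predefines. The following is a minimal sketch, not part of the WICED SDK: float_abi_name() is a hypothetical helper, and it assumes a GCC recent enough to define the ACLE macro __ARM_FP (older toolchains may not define it, in which case the "soft" branch would be reported even with an FPU enabled).

```c
/* Hypothetical helper: report the float ABI this file was compiled with,
 * based on macros predefined by GCC for 32-bit ARM targets. */
static const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";  /* e.g. when built on a desktop host */
#elif defined(__ARM_PCS_VFP)
    return "hard";    /* FP arguments are passed in FPU registers */
#elif defined(__ARM_FP)
    return "softfp";  /* FPU instructions, integer-register calling convention */
#else
    return "soft";    /* no FPU instructions; floating point done in library code */
#endif
}
```

Printing this string once at startup, or checking the same macros with #error in a scratch file, would show whether the flags from the platform makefile really reach the compiler.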

Then, in system_stm32f4xx.c, at the top of SystemInit(), FPU access is enabled:

/* FPU settings ------------------------------------------------------------*/
#if (__FPU_PRESENT == 1) && (__FPU_USED == 1)
SCB->CPACR |= ((3UL << 10*2)|(3UL << 11*2)); /* set CP10 and CP11 Full Access */
#endif
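For reference, the CPACR value above can be worked out on a host machine. This is a small sketch with hypothetical names (CPACR_FULL_ACCESS, cpacr_with_fpu_enabled); it only computes the register value and does not touch hardware.

```c
#include <stdint.h>

/* Each coprocessor n has a 2-bit access field in CPACR at bit position 2*n.
 * The value 0b11 grants full access. The FPU is reached through CP10 and CP11. */
#define CPACR_FULL_ACCESS(cp)  (3UL << ((cp) * 2))

/* Return the given CPACR value with CP10 and CP11 set to full access,
 * leaving all other bits unchanged. */
static uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | (uint32_t)(CPACR_FULL_ACCESS(10) | CPACR_FULL_ACCESS(11));
}
```

Starting from zero this yields 0x00F00000, i.e. bits 20 to 23 set. On the target, ARM's guidance is to follow the CPACR write with a DSB and an ISB (the CMSIS intrinsics __DSB() and __ISB()) before the first FPU instruction executes.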

So, then the compiler does indeed generate FPU instructions, and they even appear to run OK.

BUT: the system hits a hard fault after running for only a short time.
Using ETM on the target, I see a crash that appears to originate in tx_thread_create: when the PC is popped it gets a bad address, execution spirals out of control, and it eventually ends up at a hard fault. See attached screenshot. I can provide a longer trace history privately, if that would help.

I contacted Express Logic to get some insight into this, and they indicated that for FPU usage there must be a call to void tx_thread_fpu_enable(void) to set up correct FPU context saves.
Unfortunately, the tx_port.h provided in the WICED SDK seems to be a single file that covers both CM3 and CM4, and it doesn't mention this function anywhere. The headers from an objdump of ThreadX.ARM don't mention tx_thread_fpu_enable() either.
And the ThreadX User Guide doesn't mention it, either.
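To make the Express Logic advice concrete, here is a rough sketch of where such a call would sit, assuming a ThreadX port that actually provides tx_thread_fpu_enable() (the one bundled with this SDK apparently does not). The thread entry function, the led_level variable, and the host-side stand-in are all hypothetical; the stand-in exists only so the sketch builds without ThreadX, and the TX_API_H guard name is an assumption about the real header.

```c
#ifndef TX_API_H
/* Host-side stand-in (hypothetical), used only when tx_api.h has not been
 * included. A real FPU-aware port would supply the function and ULONG itself. */
typedef unsigned long ULONG;
static int fpu_context_save_enabled = 0;
static void tx_thread_fpu_enable(void) { fpu_context_save_enabled = 1; }
#endif

static volatile float led_level;  /* hypothetical value computed by the thread */

/* Hypothetical thread entry: request FPU context saving for this thread
 * before executing any floating-point code. */
static void fade_thread_entry(ULONG input)
{
    (void)input;
    tx_thread_fpu_enable();   /* must come before the first float operation */
    led_level = 0.5f * 0.5f;  /* FPU instructions are now safe across context switches */
}
```

Without a call of this kind, the scheduler would save and restore only the integer registers, so a thread switch in the middle of float code could corrupt FPU state or the stack frame layout.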

So ... any chance that there is a solution for this?

It seems like it is going to take an upgrade to ThreadX, or possibly a rebuild of the existing archive(s) so that the FPU enable functions are not dropped.

AxLi_1746341 · ‎Oct 30, 2017

Does it work if you use FreeRTOS build?

user_108962310 · ‎Oct 31, 2017

I'll give FreeRTOS a shot, although I won't be changing the underlying OS for this current project at this point. So, I would really like to find a solution for ThreadX.

The guys at Express Logic assured me that the tx_thread_fpu_enable function should exist in v5.6 of ThreadX. I just need the Cortex M4 version of tx_port.h and possibly a rebuild of the library archive with that function included (so that it is not optimized out).

user_108962310 · ‎Nov 02, 2017

mifo ^^^ (although the whole team is probably gearing up for the Shenzhen event right now?)

RaktimR_11 · ‎Nov 27, 2017

Hello user_108962310

Right now the 5.2 version of ThreadX provided in WICED does not support the FPU, as you mentioned. Could you please let us know why your application requires an FPU, so that we can try to provide an alternative for the time being?

user_108962310 · ‎Nov 27, 2017

I don't strictly "need" the FPU working. When we did part selection, we ended up with a micro that had an FPU anyway, so we'd like to get it working, since we're already paying for it and it's already in the device.

No one killer requirement, but several smaller ones:
- We are doing some transforms using expf and scaling, with output clamping, all in float format; it comes out to around 5,000 cycles. But this is only done at 1 Hz.
- A fair amount of math around LED fading, at 50 fps. I've made one pass at implementing this using all q16 integer operations, but it's *a lot* to keep track of to artificially boost the dynamic range, which even then is still limited. It would have been nice to be able to just do it all with easy floats from the beginning ...
- Some filtering (moving window, IIR), again implemented in q8.8 to boost the dynamic range.
- Using some advanced CMSIS-DSP functions which are available in q15, q31, and float; q15 and q31 have been evaluated, but I'd like to try the float library, too.
- Flash size; we are nearing our on-chip flash size, and tests have suggested that the output is a couple of KB smaller if all the IEEE functions can be dropped in favor of FPU instructions.

Those are all issues where the workarounds are already developed and shipping.
But, tbh, having to do all of said workarounds on a modern high-feature 32-bit micro is kind of a let-down, and it ups the NRE for me.
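To illustrate the kind of bookkeeping the fixed-point workarounds involve, here is a small sketch comparing a one-pole IIR low-pass in q8.8 with the same filter in float. It is not the code from this project; all function names, the coefficient, and the test values are made up for illustration.

```c
#include <stdint.h>

/* One-pole low-pass, y += alpha * (x - y), in q8.8:
 * values are int16_t with 8 integer bits and 8 fractional bits. */
static int16_t iir_q8_8(int16_t y, int16_t x, int16_t alpha_q8_8)
{
    int32_t diff = (int32_t)x - (int32_t)y;       /* widen to avoid overflow */
    int32_t step = (diff * alpha_q8_8) / 256;     /* rescale product back to q8.8 */
    return (int16_t)(y + step);
}

/* The same filter in single-precision float: no scaling to track. */
static float iir_f32(float y, float x, float alpha)
{
    return y + alpha * (x - y);
}

/* Run the q8.8 filter from zero toward a constant target for a number of steps. */
static int16_t settle_q8_8(int16_t target, int16_t alpha_q8_8, int steps)
{
    int16_t y = 0;
    for (int i = 0; i < steps; i++) {
        y = iir_q8_8(y, target, alpha_q8_8);
    }
    return y;
}

/* Run the float filter from zero toward a constant target for a number of steps. */
static float settle_f32(float target, float alpha, int steps)
{
    float y = 0.0f;
    for (int i = 0; i < steps; i++) {
        y = iir_f32(y, target, alpha);
    }
    return y;
}
```

With a target of 100.0 (25600 in q8.8) and alpha = 1/16 (16 in q8.8), the q8.8 version stops moving once the remaining error is smaller than 16 counts, because the rescaled step truncates to zero; it settles at 25585, about 99.94. The float version gets far closer to 100.0 without any extra care. Widening the intermediate format fixes this in fixed point, which is exactly the sort of thing that has to be tracked by hand.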

RaktimR_11 · ‎Nov 28, 2017

We understand the point that you are mentioning. I will put forward an internal request to enable the FPU support in the next version (5.8) of ThreadX library.

user_108962310 · ‎Jan 23, 2018

Is there an ETA on the ThreadX 5.8 incorporation into an SDK release?

I am about to start building out a feature for which it would be *way* better to use f32 throughout, rather than q31 plus all the mitigation needed to deal with the loss of precision in FFT operations.

AxLi_1746341 · ‎Feb 23, 2018

Raktim Roy wrote:
We understand the point that you are mentioning. I will put forward an internal request to enable the FPU support in the next version (5.8) of ThreadX library.

rroy

Can you confirm whether SDK 6.1 (with ThreadX v5.8) has FPU support enabled or not?

The changelog does not mention it.

RaktimR_11 · ‎Feb 25, 2018

ThreadX 5.8 does not have FPU support enabled in SDK 6.1. It's still a work in progress which has not yet been integrated.

AxLi_1746341 · ‎Oct 06, 2021

@RaktimR_11 wrote:

ThreadX 5.8 does not have FPU support enabled in SDK 6.1. It's still a work in progress which has not yet been integrated.

So how about SDK 6.6.1? Does it enable the FPU? (The changelog does not mention this.)
