9 Replies Latest reply on Feb 25, 2018 9:25 PM by RaktimR_11

    Cortex M4F, FPU, and ThreadX

    user_108962310

      Hi folks,

       

      I would like to get the FPU working on the STM32F4xx chip that I am using, but I am stuck, and I believe the problem has to do with the version of ThreadX that is bundled with the SDK.

       

      I found this old thread, but it seemed like there was no resolution: How to use HW FPU on STM32F4xx?

       

      I have had partial success by adding a USE_FPU_CM4F flag to the build system, and then changing the chip-specific flags in wiced_toolchain_ARM_GNU.mk:

       

      # Chip specific flags for GCC
      ifeq ($(HOST_ARCH),ARM_CM4)
      # flag added to support building for CM4 micros with FPU; define in platform makefile
      ifeq ($(USE_FPU_CM4F),1)
      __FPU_PRESENT := 1
      __FPU_USED := 1
      CPU_CFLAGS     := -mthumb -mcpu=cortex-m4 -mfloat-abi=softfp
      CPU_CXXFLAGS   := -mthumb -mcpu=cortex-m4 -mfloat-abi=softfp
      CPU_ASMFLAGS   := -mcpu=cortex-m4 -mfloat-abi=softfp -mfpu=fpv4-sp-d16
      CPU_LDFLAGS    := -mthumb -mcpu=cortex-m4 -Wl,-A,thumb
      else
      CPU_CFLAGS     := -mthumb -mcpu=cortex-m4
      CPU_CXXFLAGS   := -mthumb -mcpu=cortex-m4
      CPU_ASMFLAGS   := -mcpu=cortex-m4 -mfpu=softvfp
      CPU_LDFLAGS    := -mthumb -mcpu=cortex-m4 -Wl,-A,thumb
      endif #USE_FPU_CM4F
      endif #ARM_CM4

       

      Then, in my platform makefile, I define USE_FPU_CM4F. The idea was that I wouldn't have to define a whole new CM4F variant in the build system, just tweak the CM4 one a little. Based on what I read about GCC flags, this should work, since softfp code is still link-compatible with existing libraries built without FPU support. The choice of -mfpu simply came from other STM32F4xx examples I found.

       

      Then, in system_stm32f4xx.c, at the top of SystemInit(), FPU access is enabled:

      /* FPU settings ------------------------------------------------------------*/
      #if (__FPU_PRESENT == 1) && (__FPU_USED == 1)
          SCB->CPACR |= ((3UL << 10*2)|(3UL << 11*2));  /* set CP10 and CP11 Full Access */
      #endif
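
For reference, the CPACR write above sets the two 2-bit access fields for coprocessors CP10 and CP11. The following is a minimal host-side sketch of that bit arithmetic, not code from the SDK; the helper names cpacr_fpu_full_access_mask and cpacr_fpu_enabled are hypothetical and introduced only for illustration.

```c
#include <stdint.h>

/* Each coprocessor n has a 2-bit access field in CPACR at bit position 2*n. */
#define CPACR_FIELD_SHIFT(cp)  ((cp) * 2u)
#define CPACR_FULL_ACCESS      3u   /* 0b11 = full access */

/* Hypothetical helper: mask granting full access to CP10 and CP11 (the FPU). */
static inline uint32_t cpacr_fpu_full_access_mask(void)
{
    return ((uint32_t)CPACR_FULL_ACCESS << CPACR_FIELD_SHIFT(10u)) |
           ((uint32_t)CPACR_FULL_ACCESS << CPACR_FIELD_SHIFT(11u));
}

/* Hypothetical helper: nonzero when a CPACR value grants full access
   to both CP10 and CP11. */
static inline int cpacr_fpu_enabled(uint32_t cpacr)
{
    uint32_t mask = cpacr_fpu_full_access_mask();
    return (cpacr & mask) == mask;
}
```

On the target, something like cpacr_fpu_enabled(SCB->CPACR) could be checked after SystemInit() to confirm the write took effect. ARM's example sequence for enabling the FPU also follows the CPACR write with DSB and ISB barriers (__DSB(); __ISB(); in CMSIS) before the first FPU instruction executes.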

       

      So the compiler does indeed generate FPU instructions, and they even appear to run OK.

      BUT: the system hits a hard fault after running for a short time.
      Using ETM on the target, I see a crash that appears to originate in tx_thread_create: when the PC is popped, it gets a bad address, execution spirals out of control, and it eventually ends up at a hard fault. See attached screenshot. I could provide a longer trace history privately, if that would help.

       

      I contacted ExpressLogic to get some insight into this, and they indicated that for FPU usage, there must be a call to void tx_thread_fpu_enable(void) to set up correct FPU context saves.
      Unfortunately, looking at the tx_port.h file provided in the WICED SDK, it seems that a single file covers both CM3 and CM4, and there's no mention of this function anywhere. Looking at the headers from the objdump of ThreadX.ARM, there's no mention of the tx_thread_fpu_enable() function either.
      And the ThreadX User Guide doesn't mention it, either.
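
As general background on why FPU context saves matter on a Cortex-M4F: once a thread has used the FPU, the hardware pushes a larger "extended" exception frame (26 words instead of 8), and bit 4 of EXC_RETURN tells the handler which layout was stacked. The sketch below only illustrates that size difference; the function name exception_frame_bytes is hypothetical, optional alignment padding is ignored, and whether this mechanism explains the fault described above is an assumption, not a diagnosis.

```c
#include <stdint.h>

/* EXC_RETURN bit 4: 1 = basic frame, 0 = extended frame with FP state. */
#define EXC_RETURN_FTYPE_BIT   (1u << 4)

/* Basic frame: R0-R3, R12, LR, PC, xPSR. */
#define BASIC_FRAME_WORDS      8u
/* Extended frame: basic frame + S0-S15 + FPSCR + one reserved word. */
#define EXTENDED_FRAME_WORDS   26u

/* Hypothetical helper: size in bytes of the hardware-stacked exception
   frame for a given EXC_RETURN value (alignment padding not included). */
static inline uint32_t exception_frame_bytes(uint32_t exc_return)
{
    uint32_t words = (exc_return & EXC_RETURN_FTYPE_BIT)
                         ? BASIC_FRAME_WORDS
                         : EXTENDED_FRAME_WORDS;
    return words * 4u;
}
```

For example, 0xFFFFFFFD (return to thread mode on the process stack, no FP state) gives 32 bytes, while 0xFFFFFFED (same, with FP state) gives 104 bytes. An RTOS port that assumes every frame is 32 bytes would misplace a thread's saved context whenever the extended frame is in use.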

       

      So ... any chance that there is a solution for this?

      It seems like it is going to take an upgrade to ThreadX, or possibly a rebuild of the existing archive(s) so that the FPU enable functions are not dropped.

        • 1. Re: Cortex M4F, FPU, and ThreadX
          AxLi_1746341

          Does it work if you use FreeRTOS build?

          • 2. Re: Cortex M4F, FPU, and ThreadX
            user_108962310

            I'll give FreeRTOS a shot, although I won't be changing the underlying OS for this current project at this point. So I would really like to find a solution for ThreadX.

             

            The guys at ExpressLogic assured me that the tx_thread_fpu_enable function should exist in v5.6 of ThreadX. I just need the Cortex M4 version of tx_port.h and possibly a rebuild of the library archive with that function included (to prevent it from being optimized out).

            • 3. Re: Cortex M4F, FPU, and ThreadX
              user_108962310

              MichaelF_56  ^^^ (although the whole team is probably gearing up for the Shenzhen event right now?)

              • 4. Re: Cortex M4F, FPU, and ThreadX
                RaktimR_11

                Hello user_108962310

                 

                Right now the 5.2 version of ThreadX provided in WICED does not support the FPU, as you mentioned. Could you please let us know why your application requires an FPU, so that we can try to provide an alternative for the time being?

                • 5. Re: Cortex M4F, FPU, and ThreadX
                  user_108962310

                  I don't strictly "need" the FPU working. When we did part selection, we ended up with a micro that had an FPU anyway, so we'd like to get it working, since we're already paying for it and it's already in the device.

                   

                  No one killer requirement, but several smaller ones:
                  - We are doing some transforms using expf and scaling, and with output clamping in float format, it comes out to around 5,000 cycles. But this is only done at 1 Hz.
                  - A fair amount of math around LED fading, at 50 fps. I've made one pass at implementing this using all q16 integer operations, but it's *a lot* to keep track of to artificially boost the dynamic range, which even then is still limited. It would have been nice to be able to just do it all with easy floats from the beginning ...
                  - Some filtering (moving window, IIR), again implemented in q8.8 to boost the dynamic range.
                  - Using some advanced CMSIS-DSP functions which are available in q15, q31, and float; q15 and q31 have been evaluated, but I'd like to try the float library, too.
                  - Flash size; we are nearing our on-chip flash limit, and tests have suggested that the output is a couple of KB smaller if all the IEEE software floating-point functions can be dropped in favor of FPU instructions.
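
To illustrate the kind of bookkeeping the fixed-point items above involve, here is a minimal sketch of a linear LED fade in Q16.16 next to the same fade in float. It is not the code used in the project; the type q16_16_t and the functions q16_mul, fade_q16, and fade_f32 are hypothetical names chosen for this example.

```c
#include <stdint.h>

typedef int32_t q16_16_t;               /* 16 integer bits, 16 fractional bits */
#define Q16_ONE ((q16_16_t)1 << 16)     /* 1.0 in Q16.16 */

/* Fixed-point multiply: widen to 64 bits so the intermediate product
   cannot overflow, then shift back down to Q16.16. Right-shifting a
   negative value is implementation-defined in C; GCC on ARM performs
   an arithmetic shift, which is what this sketch assumes. */
static inline q16_16_t q16_mul(q16_16_t a, q16_16_t b)
{
    return (q16_16_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* Linear fade between two brightness levels; t runs from 0 to Q16_ONE. */
static inline q16_16_t fade_q16(q16_16_t from, q16_16_t to, q16_16_t t)
{
    return from + q16_mul(to - from, t);
}

/* The same fade in single precision; t runs from 0.0f to 1.0f. */
static inline float fade_f32(float from, float to, float t)
{
    return from + (to - from) * t;
}
```

With Q16.16 the caller has to track the scaling of every operand and stay within roughly ±32768, whereas the float version needs no scaling at all. Without a usable FPU, though, fade_f32 compiles to calls into the software floating-point library.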


                  Those are all issues where the workarounds are already developed and shipping.
                  But, tbh, having to do all of said workarounds on a modern high-feature 32-bit micro is kind of a let-down, and it ups the NRE costs for me.

                  • 6. Re: Cortex M4F, FPU, and ThreadX
                    RaktimR_11

                    We understand the point that you are making. I will put forward an internal request to enable FPU support in the next version (5.8) of the ThreadX library.

                    • 7. Re: Cortex M4F, FPU, and ThreadX
                      user_108962310

                      Is there an ETA on the ThreadX 5.8 incorporation into an SDK release?

                       

                      I am about to start building out a feature for which it would be *way* better to use f32 throughout, rather than q31 plus all the mitigation needed to deal with the loss of precision in FFT operations.

                      • 8. Re: Cortex M4F, FPU, and ThreadX
                        AxLi_1746341

                        Raktim Roy wrote:

                         

                        We understand the point that you are making. I will put forward an internal request to enable FPU support in the next version (5.8) of the ThreadX library.

                        RaktimR_11

                         

                        Can you confirm whether sdk-6.1 (with ThreadX v5.8) has FPU support enabled?

                        The changelog does not mention it.

                        • 9. Re: Cortex M4F, FPU, and ThreadX
                          RaktimR_11

                          ThreadX 5.8 does not have FPU support enabled in SDK 6.1. It's still a work in progress which has not yet been integrated.