15 Replies Latest reply on Nov 20, 2016 10:28 PM by pach_1977636

    PSOC 4 inline assembly ARM or Thumb?

      Hi folks,


      I'm trying to embed some inline assembly in my project to work around areas where I can't get the compiler and optimizer to do what I want.


      I pulled the lst file for the main loop, and I can see a couple of places that I want to tweak, to save a few precious clock cycles. What has me confused is that the lst formatting looks like ARM assembly, but AN89610 (Code Optimization document for PSOC 4) says that PSOC 4 uses Thumb 2.


      Is there a way to choose ARM or Thumb?


      I modified the main loop portion of the lst code, and swapped it in within asm(" .... "); The result is a "Cannot represent THUMB_OFFSET relocation in this object file" error during build. There's a referenced .s file and line number, but I can't find the actual file. I'm guessing it's just temporary during the build process....? 


      At this point, I'm getting almost the performance I need from my solution in C, but I have tried many variants, and only been able to find options that don't work. I do think I need the extra savings I can get from using assembly in the critical sections.


      Thanks for the help,


      Edit: I think I see the cause of the error: I missed that an array I'm using is represented as a label within the assembly block. I'll need to fix the assembly to correctly address the array. I'm still rather confused about the ARM vs. Thumb stuff, though.





        • 1. Re: PSOC 4 inline assembly ARM or Thumb?

          OK. Here's what I'm seeing now.


          The label I had overlooked (.L61 in my project) is the base address for the addresses of the port status and data registers. Some of these are pre-loaded into registers before the block of code that I'd like to modify, but a couple of them are read "on the fly" (.L61+16 or .L61+20).


          Any recommendations on accessing the port registers reliably through inline assembly? I would think even the pre-loaded ones should be something I'm not counting on.

          • 2. Re: PSOC 4 inline assembly ARM or Thumb?

            I think Cortex-M devices (ARMv6-M for the Cortex-M0 in PSoC 4, ARMv7-M for the M3/M4) support only the Thumb instruction set; the classic 32-bit ARM instruction set is not available on them.
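One way to check which instruction set a given build targets is to look at the compiler's predefined macros. The sketch below assumes GCC (or a compatible compiler), which defines `__thumb__`, `__thumb2__` and `__arm__` as appropriate; the function name `instruction_set` is made up for illustration. On a Cortex-M0 build I would expect the "Thumb" branch, but that is an assumption worth confirming on the actual toolchain.

```c
/* Reports the instruction set the compiler is generating code for,
   based on GCC-style predefined macros. Hypothetical helper name. */
static const char *instruction_set(void)
{
#if defined(__thumb2__)
    return "Thumb-2";            /* full Thumb-2 (e.g. ARMv7-M) */
#elif defined(__thumb__)
    return "Thumb";              /* Thumb without full Thumb-2 */
#elif defined(__arm__)
    return "ARM";                /* classic 32-bit ARM state */
#else
    return "not an ARM target";  /* e.g. a host build */
#endif
}
```

The same macros can guard inline assembly blocks, so that a file still builds (with a plain C fallback) when compiled for another target.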


            Labels like .L61 seem to be literal pools, and I think using literal pools is the best way to access registers. You cannot encode a full 32-bit immediate in a single Thumb instruction, so the constant is stored in a literal pool near the code and fetched with a PC-relative load. That is what the "ldr rX, =value" form does under the hood (it's a pseudo-instruction, not a real instruction).


            Here are some useful links:








            I'm trying to learn inline asm too. Remember to use asm volatile (...); so that the optimizer does not remove or move your inline asm.


            Hope it helps

            • 3. Re: PSOC 4 inline assembly ARM or Thumb?



              I started with the ethernut cookbook before posting here. It seemed helpful, but is fairly contradictory versus the inline assembly comments in http://www.cypress.com/file/46521/download.


              According to the Cypress documentation, something like this:


                          "ldr r4, =CYREG_PRT2_PS\n"    //read address high
                          "ldr r3, [r4]\n"


              Should work, but, in fact, causes a no-information compiler failure. That's the piece I'm trying to figure out at the moment.
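One thing that may be part of the problem here: the C preprocessor does not expand macros inside string literals, so the assembler never sees the numeric value behind CYREG_PRT2_PS. A way to sidestep that, sketched below under the assumption of GCC extended asm, is to pass the register address in as an operand and let the compiler load it. The function name `read_reg32` is hypothetical, and the non-Thumb branch exists only so the file also builds on a host.

```c
#include <stdint.h>

/* Reads a 32-bit register through inline assembly.
   The compiler chooses the registers; the "l" constraint asks for
   a low register (r0-r7), which Thumb load instructions require. */
static inline uint32_t read_reg32(const volatile uint32_t *reg)
{
    uint32_t out;
#if defined(__thumb__)
    __asm volatile (
        "ldr %[out], [%[addr]]"
        : [out] "=l" (out)      /* output: value read */
        : [addr] "l" (reg)      /* input: register address */
        : "memory");
#else
    out = *reg;                 /* plain C fallback for non-Thumb builds */
#endif
    return out;
}
```

It would be called with the address cast to a pointer, for example `read_reg32((const volatile uint32_t *)CYREG_PRT2_PS)`, assuming that macro expands to the register's address.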

              • 4. Re: PSOC 4 inline assembly ARM or Thumb?

                Then again, I'm not too sure about the Cypress doc. I tried this example from it (page 15), won't even compile, let alone build:


                    int foo = 5L;
                    int bar;
                    bar = foo + 1;
                    /* bar = foo + 1 */   
                    asm("LDR r0, =foo\n"        
                        "LDR r1, =bar\n"        
                        "LDR r2, [r0]\n"        
                        "ADD r2, r2 #1\n"        
                        "STR r2, [r1]"); 


                Removing the extra(?) r2 in the ADD line allows it to pass the first pass of the compiler, but still fails with the same general error as I was getting from my code.

                • 5. Re: PSOC 4 inline assembly ARM or Thumb?

                  This almost works:


                              "ldr r3, [%[datawrite]]\n"  //Get address of data write function
                              "mov    r2, #125\n"
                              "str r2, [r3]\n"  //write data bus from r2
                              :   [addhigh] "l" (Pin_Address_High_PS),
                                  [addlow] "l" (Pin_Address_Low_PS),
                                  [datawrite] "l" (Pin_Data_DR),
                                  [dataread] "l" (Pin_Data_PS),
                                  [pinrw] "l" (CYREG_PRT0_PS),
                                  [buffer] "l" (buffer)


                  It compiles, and builds, and programs, but it doesn't actually set the 125 value on the data bus. However, the resultant lst output does look pretty similar, so I think I'm close to getting this right.


                  On the other hand, one difference I observed between what's in the lst files and what will compile is f after the label of a forward jump. The generated assembly in the lst files had this, but it caused a compile failure when I used it in my code.

                  • 6. Re: PSOC 4 inline assembly ARM or Thumb?

                    OK. This is not making much sense. I tried really simplifying the C code, to compare the lst output to the lst output from my really simplified inline assembly. To my eye, the resultant lst content looks logically equivalent, but the C works, and the inline assembly does not. By "works", I mean that the C writes 125 to the register associated with Pins_Data, and the assembly leaves that as all high values (255).


                    The two results (source and lst) are attached (because of the Cypress spam filter that seems to fire whenever too much example code is embedded in a post).

                    • 7. Re: PSOC 4 inline assembly ARM or Thumb?

                      I got the really simple case working: writing a byte to a pin set:


                                  "mov r3, %[datawrite]\n"  //Get address of data write function
                                  "mov    r2, #125\n"
                                  "str r2, [r3]\n"  //write data bus from r2
                                  : [datawrite] "l" (&Pin_Data_DR)


                      The optimizer still messes it up a bit, but it works. Now just to figure out all the other issues....

                      • 8. Re: PSOC 4 inline assembly ARM or Thumb?

                        Well, Paul, I can assure you that it is a challenge to be better than the GCC optimizer! Did you try to set (you can do that on a .c file basis)  the optimization level to "speed" or "size"? There are even other settings to try mentioned in the GNU compiler manual.





                        • 9. Re: PSOC 4 inline assembly ARM or Thumb?

                          Yes. It's built using speed optimization. I also had to add some noinline optimizer hints to keep the optimizer from really messing up parts of it.


                          The inline assembly is to deal with areas where an if/else would be more efficient than the current code, but the optimizer won't accept it. The optimizer is also doing things like checking an if condition both at the top and bottom of a block of code.
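For reference, per-function hints of the kind mentioned above can be written with GCC function attributes. This is only a sketch: the `HOT_FN` macro and the `sum_bytes` function are made up for illustration, and `optimize("O3")` is a GCC-specific attribute, so it is guarded here and dropped for other compilers.

```c
#include <stdint.h>

/* noinline keeps the function out of line; optimize("O3") (GCC only)
   overrides the file's optimization level for this one function. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_FN __attribute__((noinline, optimize("O3")))
#elif defined(__GNUC__)
#define HOT_FN __attribute__((noinline))
#else
#define HOT_FN
#endif

/* Hypothetical hot-path function used only to show the attribute placement. */
HOT_FN static uint32_t sum_bytes(const uint8_t *data, uint32_t len)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}
```

Whether this produces the code shape you want still has to be checked in the lst output; the attributes only change what the optimizer is allowed to do.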

                          • 10. Re: PSOC 4 inline assembly ARM or Thumb?

                            OK. So, the current piece I'm fighting with is:


                            1. Variables passed through using symbolic names (or position identifiers) in the input section get mapped automatically with ldr statements into registers the compiler picks


                            2. I don't see a way to control these ldr commands or predict which registers will be used for which variables


                            3. My code picks up with the mov commands to put the variable addresses into particular registers


                            4. The compiler doesn't care which registers I've selected.


                            The current state of things is that I want to use r3 and r4 for the variable addresses, but the compiler uses r2 and r3 for its ldr commands, and then executes my mov commands in such a way that r2 overwrites r3 before I ever get a chance to use it.


                            So, the main question I have is, is there a way to know or to symbolically use the registers that the compiler is going to select for its ldrs? 


                            Edit: If I look at the lst code, and then swap my selected registers to match what the compiler picked for its ldr targets, things work (although there's a pointless mov r3, r3); but this seems like the wrong way to do things, and likely to break easily when any code is changed.


                            Edit 2: Never mind. I figured it out. I was using the mov commands because the ARM Thumb2 documentation was pretty clear about needing to first load a variable address into a register before you could do anything with it, but the compiler is actually doing that for me, so the mov commands are not needed. The correct simple example is this:


                                        "mov    r2, #125\n"
                                        "str r2, [%[datawrite]]\n"  //write data bus from r2
                                        : [datawrite] "l" (&Pin_Data_DR)


                            The compiler will pick a register for &Pin_Data_DR, add an ldr into it, and then will substitute that register into the str command.
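The same idea can be taken one step further so that no register is named by hand at all: the value to write is passed as an operand too, and the compiler substitutes whichever registers it picked. This is a minimal sketch assuming GCC extended asm; `write_reg32` is a hypothetical helper name, and the C fallback is there only so the file also builds for a non-Thumb target.

```c
#include <stdint.h>

/* Writes a 32-bit value to a register via inline assembly.
   Both operands are symbolic, so the code never depends on which
   registers the compiler happens to allocate. */
static inline void write_reg32(volatile uint32_t *reg, uint32_t value)
{
#if defined(__thumb__)
    __asm volatile (
        "str %[val], [%[addr]]"
        :                           /* no outputs */
        : [val] "l" (value),        /* input: value, in a low register */
          [addr] "l" (reg)          /* input: destination address */
        : "memory");                /* the asm writes to memory */
#else
    *reg = value;                   /* plain C fallback for non-Thumb builds */
#endif
}
```

With this shape, a call such as `write_reg32(&Pin_Data_DR, 125)` should be equivalent to the simple example above, assuming Pin_Data_DR is a 32-bit register lvalue.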

                            • 11. Re: PSOC 4 inline assembly ARM or Thumb?

                              However, I'm still having problems with the optimizer (even with volatile) doing annoying things, like using r3 for a variable address when I'm using r3 in my loop. The address gets overwritten on the first iteration of the loop.


                              Edit: I've tried working around this by using the registers the compiler is not taking, but this isn't working out. The compiler is only leaving me r6 and r7 to work with. I need more than two registers. I would have thought push and pop would help, but, when I add a push call, the code beyond that call stops behaving correctly, and there's no indication why in the lst code.


                              In an overlapping problem, I can't figure out the correct notation for addressing an array in the inline assembly. The code to read a byte from the array is an ldrb with an offset, e.g.


                              ldrb r6, [%[buffer], r7]


                              Does buffer link to &buffer, or buffer, or buffer[0], or &buffer[0], or ....?


                              Edit 2: Though push and pop don't seem to work, moving low registers into high registers, e.g. mov r8, r4 , does work.


                              As for the buffer address: the compiled C code is doing something with the stack pointer, inside the assembly block for its own code and well outside of it for my code. It's difficult to see where it's pointing, but at least now that I have a solution for the register shortage, I can try a bunch of things to see if one of them works.

                              • 12. Re: PSOC 4 inline assembly ARM or Thumb?

                                OK. Pretty well stuck on this part. I've searched, but have only really found people facing the same problem with unhelpful answers, or no answers.


                                I can pass a regular variable value into a register using something like MOV r6, %[buffer] where [buffer] is mapped to a regular variable (test) or a particular value in the array (buffer[0]). I can't find the correct way, though, to pass &buffer into the assembly so I can index it with ldrb.

                                • 13. Re: PSOC 4 inline assembly ARM or Thumb?

                                  ARM is a RISC design. You need to define a location containing the base address, load it into a register and then issue your ldrb instruction.
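A rough sketch of that pattern with GCC extended asm: the array's base address and the index are passed as inputs, the compiler loads both into registers, and a single ldrb with a register offset does the read. An array name used as an operand decays to a pointer to its first element, so passing `buffer` is the same as passing `&buffer[0]`. The helper name `read_byte` is made up, and the non-Thumb branch is only a host fallback.

```c
#include <stdint.h>

/* Reads buffer[index] using ldrb with a register offset. */
static inline uint8_t read_byte(const uint8_t *buffer, uint32_t index)
{
    uint32_t out;
#if defined(__thumb__)
    __asm volatile (
        "ldrb %[out], [%[buf], %[idx]]"
        : [out] "=l" (out)          /* output: the byte, zero-extended */
        : [buf] "l" (buffer),       /* input: base address of the array */
          [idx] "l" (index)         /* input: byte offset */
        : "memory");                /* the asm reads memory the compiler cannot see */
#else
    out = buffer[index];            /* plain C fallback for non-Thumb builds */
#endif
    return (uint8_t)out;
}
```

The "memory" clobber matters here: without it the compiler is free to assume the asm does not depend on the array's contents and may not have stored them yet.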





                                  • 14. Re: PSOC 4 inline assembly ARM or Thumb?

                                    Do you have an example of that?


                                    For instance, this works:


                                    uint8 test = 125;


                                    asm ("mov r6, %[test]\n" : : [test] "l" (test));


                                    This does not:


                                    uint8 test = 125;


                                    asm ("ldrb r6, [%[test]]\n" : : [test] "l" (&test));


                                    The second example seems like it should work, but it gets 0 into r6 instead of the value of test. I also tried adding uint8 *testptr = &test and passing testptr into [test]. No difference.




                                    And... the answer is.... Clobber list!!!!


                                    Even though I couldn't see any reordering or clobbering happening in the lst files, the final code was apparently doing other things than what the lst shows. I had to add everything to the clobber list, and then I started getting correct results.


                                    In short, things like this:




                                                "ldrb   r7, [%[buffer], r6]\n"  //read byte from buffer at offset r6
                                                "str r7, [%[datawrite]]\n"  //write data bus from r7


                                                :   [addhigh] "l" (&Pin_Address_High_PS),
                                                    [addlow] "l" (&Pin_Address_Low_PS),
                                                    [datawrite] "l" (&Pin_Data_DR),
                                                    [dataread] "l" (&Pin_Data_PS),
                                                    [pinrw] "l" ((reg32*)CYREG_PRT0_PS),
                                                    [buffer] "l" (buffer)
                                                : "r6", "r7", "r8", "cc", "memory"


                                    With the key difference being my calling out that I may modify r6, r7, r8, flags, and memory in my code.
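An alternative to naming scratch registers in the clobber list is to declare the scratch value as an early-clobber output ("=&l") and let the compiler choose the register. That avoids competing with the compiler for r6/r7. The sketch below assumes GCC extended asm; `copy_byte_to_reg` and its parameter names are hypothetical, and the C branch is only a fallback for non-Thumb builds.

```c
#include <stdint.h>

/* Reads one byte from buffer[offset] and stores it to a 32-bit register. */
static inline void copy_byte_to_reg(const uint8_t *buffer, uint32_t offset,
                                    volatile uint32_t *data_reg)
{
#if defined(__thumb__)
    uint32_t tmp;
    __asm volatile (
        "ldrb %[tmp], [%[buf], %[off]]\n\t"  /* read byte from buffer at offset */
        "str  %[tmp], [%[dst]]\n\t"          /* write it to the data register */
        : [tmp] "=&l" (tmp)                  /* scratch: written before inputs are done */
        : [buf] "l" (buffer),
          [off] "l" (offset),
          [dst] "l" (data_reg)
        : "memory");                         /* asm reads and writes memory */
#else
    *data_reg = buffer[offset];              /* plain C fallback */
#endif
}
```

The "&" tells the compiler that tmp is written before all inputs have been consumed, so it must not share a register with buf, off or dst. No flag-setting instruction is used, so "cc" is left out of the clobber list in this sketch.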
