How to improve code efficiency.

Anonymous
Answer:

 

  1. Using the 20-bit address mode
  2. Option specification when using the division instruction (div step)
  3. Adjust the number of local variables so that a function's stack use does not exceed 512 bytes
  4. Avoid using a lot of signed 1-byte/2-byte data
  5. Control of loop-unrolling optimization
  6. Review of the need for inline expansion
  7. Control of standard library expansion
  8. Others

Using the 20-bit address mode

In general, the FR performs an operation in the following three steps.

  1. Set the memory address in a register
  2. Load the data into a register
  3. Perform the operation

In particular, when many external variables are used, code size can become large, because many instructions that load a 32-bit address are generated.

[C source]   [In case of FR]
a=b+c;       LDI:32 #_b, R12
             LD     @R12, R0
             LDI:32 #_c, R12
             LD     @R12, R1
             ADD    R1, R0
             LDI:32 #_a, R12
             ST     R0, @R12

 

Therefore, when the code or data can be located in RAM/ROM within the 20-bit address space (0x0 to 0xFFFFF), setting the 20-bit address mode (-K shortaddress option) is recommended. If such a location is not possible, external variables should be changed to local variables where possible.

[C source]   [default]            [-Kshortaddress specified]
a=b+c;       LDI:32 #_b, R12      LDI:20 #_b, R12
             LD     @R12, R0      LD     @R12, R0
             LDI:32 #_c, R12      LDI:20 #_c, R12
             LD     @R12, R1      LD     @R12, R1
             ADD    R1, R0        ADD    R1, R0
             LDI:32 #_a, R12      LDI:20 #_a, R12
             ST     R0, @R12      ST     R0, @R12
             -----------------------------------------
             26 byte              20 byte
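
Where the 20-bit address mode cannot be used, the advice above to replace external variables with local ones might look like the following minimal sketch. The variables `a`, `b` and `c` follow the example; the two functions and their loop are hypothetical and only illustrate the idea. Whether this actually shrinks the generated code depends on the compiler and its optimization settings.

```c
/* External variables: each access may first need an address load. */
int a, b, c;

/* Hypothetical example: reads b and c and updates a on every iteration. */
void accumulate_globals(int n)
{
    int i;
    a = 0;
    for (i = 0; i < n; i++) {
        a += b + c;
    }
}

/* Same result, but b and c are copied to locals once and a is written once. */
void accumulate_locals(int n)
{
    int i;
    int lb = b, lc = c, sum = 0;
    for (i = 0; i < n; i++) {
        sum += lb + lc;
    }
    a = sum;
}
```

Both functions leave the same value in `a`; the second one touches the external variables only at its start and end.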

Option specification when using the division instruction (div step)

The FR has div step instructions for division. However, one division takes 36 of these instructions, so each division that uses them adds more than 72 bytes of code.
By default, the compiler generates code that calls a library routine to perform division. Therefore, when there are several divisions, the default setting produces smaller code.
However, when speed-priority optimization (-Kspeed) is specified, each division is expanded directly into div step instructions. If the resulting increase in code size is a problem, it is recommended not to specify speed-priority optimization.

[C source]   [In case of speed priority]   [default]
a=b/c;       LDI:20 #_b, R12               LDI:20 #_b, R12
             LD     @R12, R0               LD     @R12, R4
             LDI:20 #_c, R12               LDI:20 #_c, R12
             LD     @R12, R1               LD     @R12, R5
             MOV    R0, MDL                CALL20 _divi, R12
             DIV0S  R1                     LDI:20 #_a, R12
             DIV1   R1                     ST     R4, @R12
             DIV1   R1
             DIV1   R1
             DIV1   R1
             DIV1   R1
             ----------------------------------------------
             74 byte                       20 byte*

*: The divi function (78 bytes) is generated separately.
(Because the divi function is shared as a library routine, a reduction in code size can be expected when several divisions are executed.)
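
The trade-off can be estimated from the byte counts in the listing above. The sketch below is only a back-of-the-envelope model: the three constants come from that one example (`a=b/c;` with external variables), the function names are hypothetical, and real sizes depend on the surrounding code.

```c
/* Byte counts taken from the listing above; they apply to that example only. */
enum {
    INLINE_BYTES_PER_DIVISION = 74, /* speed priority: div step expanded at each site */
    CALL_BYTES_PER_DIVISION   = 20, /* default: operands loaded, divi called */
    DIVI_LIBRARY_BYTES        = 78  /* divi routine, present once */
};

/* Estimated code size when every division is expanded inline. */
unsigned inline_size(unsigned divisions)
{
    return divisions * INLINE_BYTES_PER_DIVISION;
}

/* Estimated code size when every division calls the shared divi routine. */
unsigned library_size(unsigned divisions)
{
    if (divisions == 0) {
        return 0; /* assumption: the routine is not linked when unused */
    }
    return divisions * CALL_BYTES_PER_DIVISION + DIVI_LIBRARY_BYTES;
}
```

With these figures, one division costs 74 bytes inline against 98 bytes through the library, while two divisions cost 148 bytes against 118 bytes, so the library form is smaller from the second division onward.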


Adjust the number of local variables so that a function's stack use does not exceed 512 bytes

LD/ST instructions can use FP-relative addressing. However, because of the 16-bit instruction length, the offset that can be specified is limited to -512 to +508 (for 4-byte types). Therefore, when the local variable area exceeds 512 bytes, extra operations are needed to calculate stack addresses, so code size increases and access efficiency decreases.
Adjusting the number of local variables so that a function's stack use does not exceed 512 bytes therefore reduces code size and improves access efficiency.

The stack use of each function can be checked with the SOFTUNE C/C++ Analyzer.

(Note) For 2-byte and 1-byte local variables, the offset that can be specified is -256 to 254 and -128 to 127, respectively. The size of the local variable area for which efficient code can be generated therefore differs by type.

[C source]   [Offset of -520]                     [Offset of -4]
             (local area larger than the limit)   (local area within the limit)
a=10;        LDI #10, R0                          LDI #10, R0
             LDI #-520, R13                       ST  R0, @(FP,-4)
             ST  R0, @(R13,FP)
             --------------------------------------------------
             8 byte                               4 byte
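
The three offset ranges quoted above fit a simple pattern, sketched below. The sketch assumes that the FP-relative offset is a signed 8-bit field scaled by the access size; this assumption reproduces the quoted ranges but is not stated in the text, and the function names are hypothetical.

```c
/* Assumption: the FP-relative offset is a signed 8-bit field scaled by the
   access size, which reproduces the ranges quoted in the text. */
enum { OFFSET_FIELD_MIN = -128, OFFSET_FIELD_MAX = 127 };

/* Smallest FP-relative offset reachable for an access of `size` bytes (1, 2 or 4). */
int min_fp_offset(int size) { return OFFSET_FIELD_MIN * size; }

/* Largest FP-relative offset reachable for an access of `size` bytes. */
int max_fp_offset(int size) { return OFFSET_FIELD_MAX * size; }

/* Nonzero when `offset` fits directly in a single FP-relative LD/ST. */
int fp_offset_is_direct(int offset, int size)
{
    return offset % size == 0
        && offset >= min_fp_offset(size)
        && offset <= max_fp_offset(size);
}
```

For a 4-byte variable, an offset of -4 passes this check and an offset of -520 does not, which matches the two columns of the listing above.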

Avoid using a lot of signed 1-byte/2-byte data

The FR architecture has no load instruction for signed data. Therefore, when signed 1-byte/2-byte data is loaded, sign extension is needed after the load. When a lot of signed 1-byte/2-byte data is used, code size is larger than with unsigned data.
Using unsigned types wherever possible therefore reduces code size and improves access efficiency.

(Note) The Softune compiler treats the char type as unsigned char, so char can be used as it is.

[C source]   [In case of signed char type]   [In case of char type]
a=b+c;       LDI:20 #_b, R12                 LDI:20 #_b, R12
             LDUB   @R12, R0                 LDUB   @R12, R0
             EXTSB  R0                       LDI:20 #_c, R12
             LDI:20 #_c, R12                 LDUB   @R12, R1
             LDUB   @R12, R1                 ADD    R1, R0
             EXTSB  R1                       LDI:20 #_a, R12
             ADD    R1, R0                   STB    R0, @R12
             LDI:20 #_a, R12
             STB    R0, @R12
             --------------------------------------------
             24 byte                         20 byte
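
The extra step can be pictured in C as follows. This is only a model of the effect of the LDUB and EXTSB instructions in the listing above, written as two hypothetical helper functions; it is not code the compiler emits.

```c
#include <stdint.h>

/* Models a zero-extending byte load such as LDUB: the result is 0..255. */
uint32_t load_unsigned_byte(const uint8_t *p)
{
    return *p;
}

/* Models the extra step needed for signed char: bit 7 is copied into
   the upper 24 bits, as EXTSB does in the listing above. */
int32_t sign_extend_byte(uint32_t r)
{
    r &= 0xFFu;
    return (int32_t)(r ^ 0x80u) - 0x80;
}
```

An unsigned value needs only the first function; a signed value needs both, which corresponds to the additional EXTSB after each load.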

Control of loop-unrolling optimization

Loop-unrolling optimization improves execution speed by reducing the number of loop iterations, but it increases object size.
Decide how to write the code according to whether speed or code size has priority.

[Before unrolling]
 for(i=0;i<6;i++){ a[i]=0;}
[After unrolling]
 for(i=0;i<6;i+=3){
  a[i]=0;
  a[i+1]=0;
  a[i+2]=0;
 }

Even when the source is written as in [Before unrolling], the compiler unrolls the loop and code size increases if unrolling control is not specified. To make the compiler favor code size, specify size-priority optimization (-Ksize) or loop-unrolling control (-Knounroll).

[C source]
for(i=0;i<6;i++){a[i]=0;}

[Loop-unrolling optimization]          [Unrolling suppressed]
        LDI:20 #_a, R6                         LDI    #0, R4
        LDI    #0, R4                  L_26:   LDI    #0, R0
        LDI    #2, R5                          LDI:20 #_a, R13
L_32:   LDI    #0, R0                          STB    R0, @(R13,R4)
        MOV    R4, R13                         ADD    #1, R4
        STB    R0, @(R13,R6)                   CMP    #6, R4
        MOV    R6, R0                          BLT20  L_26, R12
        ADD    R4, R0                          LDI    #0, R4
        LDI    #0, R1
        LDI    #1, R13
        STB    R1, @(R13,R0)
        MOV    R6, R0
        ADD    R4, R0
        LDI    #0, R1
        LDI    #2, R13
        STB    R1, @(R13,R0)
        ADD    #3, R4
        ADD    #-1, R5
        CMP    #1, R5
        BGE20  L_32, R12
        ---------------------                  ---------------------
        42 byte                                18 byte

Review of the need for inline expansion

Inline expansion optimization replaces a call to a function defined in the C source with the body of that function at the call site. When the expanded function is very small, the code may become smaller after inline expansion, but in general object size increases.

When object size has priority, this optimization is not recommended.
(Do not use the -xauto option, the -x option, #pragma inline, or the inline qualifier (C++ only).)

[C source]
unsigned short ADD_sat16(unsigned short a, unsigned short b){
 int tmp;
 if((tmp=a+b)>0xffff) return 0xffff;
 return (unsigned short)tmp;
}
unsigned short a,b,c,d,e,f;
void func(void){
 a=ADD_sat16(b,c);
 d=ADD_sat16(e,f);
}

[Inline expansion optimization]        [Inline expansion control]
_func:  LDI:20 #_b, R12                _func:  ST     RP, @-SP
        LDUH   @R12, R4                        LDI:20 #_b, R12
        LDI:20 #_c, R12                        LDUH   @R12, R4
        LDUH   @R12, R5                        LDI:20 #_c, R12
        ADD    R5, R4                          LDUH   @R12, R5
        LDI    #65535, R0                      CALL20 _ADD_sat16, R12
        CMP    R0, R4                          LDI:20 #_a, R12
        BLE20  L_32, R12                       STH    R4, @R12
        LDI    #65535, R4                      LDI:20 #_e, R12
        BRA20  L_28, R12                       LDUH   @R12, R4
L_32:   EXTUH  R4                              LDI:20 #_f, R12
L_28:   LDI:20 #_a, R12                        LDUH   @R12, R5
        STH    R4, @R12                        CALL20 _ADD_sat16, R12
        LDI:20 #_e, R12                        LDI:20 #_d, R12
        LDUH   @R12, R4                        STH    R4, @R12
        LDI:20 #_f, R12                        LD     @SP+, RP
        LDUH   @R12, R5                        RET
        ADD    R5, R4
        LDI    #65535, R0
        CMP    R0, R4
        BLE20  L_36, R12
        LDI    #65535, R4
        BRA20  L_34, R12
L_36:   EXTUH  R4
L_34:   LDI:20 #_d, R12
        STH    R4, @R12
        RET
        -----------------------                -----------------------
        74 byte                                46 byte

However, depending on the arguments, code size can be reduced by declaring small functions static or by specifying #pragma inline for them. Using the "inline candidate selection function" of the Softune C Analyzer is recommended.


Control of standard library expansion

Standard library expansion recognizes the behavior of a standard function and either expands it inline or replaces it with a faster standard function that performs the same operation. When object size has priority, do not use this optimization; specify standard library inline expansion control (-Knolib).


Others

Place frequently referenced structure members at the head.

The address of a structure member is determined by calculating head address + offset.
The head member needs no such calculation because its offset is 0. When a member has a high static access frequency, review whether it can be placed at the head.
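
As a minimal sketch, the reordering could look like this. Both structures and their member names are hypothetical; `status` stands for the frequently referenced member.

```c
#include <stddef.h>

/* Hypothetical layout: the frequently accessed `status` comes last. */
struct dev_before {
    unsigned char config[12];
    unsigned int  baud;
    unsigned int  status;   /* hot member, nonzero offset */
};

/* Same members with `status` moved to the head, so its offset is 0. */
struct dev_after {
    unsigned int  status;   /* hot member, offset 0 */
    unsigned int  baud;
    unsigned char config[12];
};
```

`offsetof(struct dev_after, status)` is 0, whereas in `struct dev_before` the same member sits at a nonzero offset that has to be added to the head address.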

Make functions that return a structure void.

A function that returns a structure causes a structure transfer through a work area. By passing the address of the destination structure as an argument and assigning to it directly, the function can be made a void function.
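
The rewrite described above might be sketched as follows. `struct point` and both function names are hypothetical and serve only as an illustration.

```c
/* Hypothetical structure used only for illustration. */
struct point { int x; int y; int z; };

/* Returns the structure by value: the result may be built in a work
   area and then copied to the destination. */
struct point make_point_by_value(int x, int y, int z)
{
    struct point p;
    p.x = x; p.y = y; p.z = z;
    return p;
}

/* void variant: the caller passes the destination address and the
   function writes to it directly. */
void make_point_into(struct point *dst, int x, int y, int z)
{
    dst->x = x; dst->y = y; dst->z = z;
}
```

A call such as `p = make_point_by_value(1, 2, 3);` then becomes `make_point_into(&p, 1, 2, 3);`.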

Use four arguments or fewer.

With four arguments or fewer, the arguments are passed in registers, so no extra code is needed to access them and execution speed improves. When there are arguments that are passed needlessly, review whether they can be removed.
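
A small sketch of removing a needless argument follows. Both functions are hypothetical; the point is only that the second version stays within the four arguments mentioned above.

```c
/* Hypothetical five-argument version: `unused` is never read, and with
   it the function exceeds the four arguments described above. */
int scale_sum5(int a, int b, int c, int unused, int scale)
{
    (void)unused;
    return (a + b + c) * scale;
}

/* Four-argument version with the needless argument removed. */
int scale_sum4(int a, int b, int c, int scale)
{
    return (a + b + c) * scale;
}
```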

