How to improve code efficiency
- Using the 20-bit address mode
- Option specification when using the division instruction (div step)
- Adjusting the number of local variables so that a function's stack use does not exceed 512 bytes
- Avoiding heavy use of signed 1-byte/2-byte data
- Controlling loop-unrolling optimization
- Reviewing the necessity of inline expansion
- Controlling standard library expansion
- Others
Using the 20-bit address mode
In general, the FR performs an operation in the following three steps.
- Set the memory address in a register
- Load the data into a register
- Perform the operation
Code size can become large, especially when many external variables are used, because each access requires an instruction that loads a 32-bit address.
| [C source] | [In case of FR] | |
| a=b+c; | LDI:32 | #_b, R12 |
| | LD | @R12, R0 |
| | LDI:32 | #_c, R12 |
| | LD | @R12, R1 |
| | ADD | R1, R0 |
| | LDI:32 | #_a, R12 |
| | ST | R0, @R12 |
Therefore, when the code or data can be located in RAM/ROM within the 20-bit address space (0x0 to 0xFFFFF), specifying the 20-bit address mode (-K shortaddress option) is recommended. If such placement is not possible, replace the use of external variables with local variables where possible.
| [C source] | [default] | | [-Kshortaddress specified] | |
| a=b+c; | LDI:32 | #_b, R12 | LDI:20 | #_b, R12 |
| | LD | @R12, R0 | LD | @R12, R0 |
| | LDI:32 | #_c, R12 | LDI:20 | #_c, R12 |
| | LD | @R12, R1 | LD | @R12, R1 |
| | ADD | R1, R0 | ADD | R1, R0 |
| | LDI:32 | #_a, R12 | LDI:20 | #_a, R12 |
| | ST | R0, @R12 | ST | R0, @R12 |
| [Code size] | 26 bytes | | 20 bytes | |
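Where the variables cannot be placed in the 20-bit address space, the fallback mentioned above is to work on local variables instead. The following is a minimal sketch of that idea, not code from the original article: the names `g_total`, `g_count`, and `accumulate` are hypothetical, and the actual saving depends on how the compiler allocates registers.

```c
#include <stddef.h>

/* Hypothetical external variables; each direct access needs an address load. */
int g_total;
int g_count;

/* Accumulate into locals, then write each external variable back once. */
void accumulate(const int *values, size_t n)
{
    int total = g_total;   /* one load of each external variable */
    int count = g_count;
    size_t i;

    for (i = 0; i < n; i++) {
        total += values[i];
        count++;
    }

    g_total = total;       /* one store back for each */
    g_count = count;
}
```

Inside the loop only locals are touched, so the address-load instructions for `g_total` and `g_count` appear once at the start and once at the end rather than on every iteration.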
Option specification when using the division instruction (div step)
The FR provides div step instructions for division. When they are used directly, however, one division takes 36 instructions, so each division produces more than 72 bytes of code.
By default, the compiler generates code that calls a library routine to perform division. Therefore, when a program contains several divisions, the default setting produces the smaller code.
However, if speed-priority optimization (-Kspeed) is specified, each division is expanded directly into div step instructions. If the resulting increase in code size is a problem, do not specify speed-priority optimization.
| [C source] | [In case of speed priority] | | [default] | |
| a=b/c; | LDI:20 | #_b, R12 | LDI:20 | #_b, R12 |
| | LD | @R12, R0 | LD | @R12, R4 |
| | LDI:20 | #_c, R12 | LDI:20 | #_c, R12 |
| | LD | @R12, R1 | LD | @R12, R5 |
| | MOV | R0, MDL | CALL20 | _divi, R12 |
| | DIV0S | R1 | LDI:20 | #_a, R12 |
| | DIV1 | R1 | ST | R4, @R12 |
| | DIV1 | R1 | | |
| | DIV1 | R1 | | |
| | DIV1 | R1 | | |
| | DIV1 | R1 | | |
| [Code size] | 74 bytes | | 20 bytes* | |
*: The divi function (78 bytes) is generated separately.
(Because the divi function is shared as a library routine, a reduction in code size can be expected when a program performs several divisions.)
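The trade-off between the two columns can be worked through with the byte counts quoted in the table. The sketch below only does that arithmetic. The function names are hypothetical, the figures apply to the single statement `a=b/c;` shown above, and it assumes the divi routine is linked exactly once.

```c
/* Figures taken from the table above (bytes). */
enum {
    INLINE_BYTES_PER_DIVISION = 74,  /* speed priority: div step expanded in place */
    CALL_BYTES_PER_DIVISION   = 20,  /* default: call to the library routine */
    LIBRARY_ROUTINE_BYTES     = 78   /* divi function, linked once (assumption) */
};

/* Total code size when every division is expanded in place. */
unsigned inline_size(unsigned divisions)
{
    return divisions * INLINE_BYTES_PER_DIVISION;
}

/* Total code size when every division calls the shared library routine. */
unsigned library_size(unsigned divisions)
{
    if (divisions == 0)
        return 0;  /* assumes the routine is not linked when nothing calls it */
    return divisions * CALL_BYTES_PER_DIVISION + LIBRARY_ROUTINE_BYTES;
}
```

With these figures, a single division is smaller when expanded in place (74 versus 98 bytes), while from two divisions onward the library call is smaller (148 versus 118 bytes).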
Adjusting the number of local variables so that a function's stack use does not exceed 512 bytes
LD/ST instructions can use FP-relative addressing. However, because of the 16-bit instruction length, the offset that can be specified is limited to -512 to +508 (for 4-byte types). Therefore, when a function uses a local variable area larger than 512 bytes, extra operations are needed to calculate stack addresses, which increases code size and lowers access efficiency.
Adjusting the number of local variables so that a function's stack use does not exceed 512 bytes therefore reduces code size and improves access efficiency.
The stack use of each function can be checked with SOFTUNE C/C++ Analyzer.
(Note) For 2-byte and 1-byte local variables, the offset that can be specified is -256 to +254 and -128 to +127, respectively. The size of the local area for which efficient code can be generated therefore differs by type.
| [C source] | [Offset of -520] (local area larger than the range above) | | [Offset of -4] (local area within the range above) | |
| a=10; | LDI | #10, R0 | LDI | #10, R0 |
| | LDI | #-520, R13 | ST | R0, @(FP,-4) |
| | ST | R0, @(R13,FP) | | |
| [Code size] | 8 bytes | | 4 bytes | |
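One possible way to keep a function's local area under the limit is to move a large work buffer out of the stack frame. This is a sketch under assumptions rather than a recommendation from the original text: the function names and `WORK_WORDS` are hypothetical, `int` is assumed to be 4 bytes, and the static buffer makes the second function non-reentrant and subject to the external-variable access cost discussed earlier.

```c
#include <stddef.h>

#define WORK_WORDS 256  /* 256 * 4 bytes = 1024 bytes, assuming a 4-byte int */

/* Large frame: the array alone exceeds 512 bytes, so some locals
   fall outside the short FP-relative offset range. */
long sum_squares_big_frame(const int *src, size_t n)
{
    int work[WORK_WORDS];
    long sum = 0;
    size_t i;

    if (n > WORK_WORDS)
        n = WORK_WORDS;
    for (i = 0; i < n; i++)
        work[i] = src[i] * src[i];
    for (i = 0; i < n; i++)
        sum += work[i];
    return sum;
}

/* Small frame: the buffer lives in static storage, so only a few
   scalars remain on the stack. Not reentrant. */
long sum_squares_small_frame(const int *src, size_t n)
{
    static int work[WORK_WORDS];
    long sum = 0;
    size_t i;

    if (n > WORK_WORDS)
        n = WORK_WORDS;
    for (i = 0; i < n; i++)
        work[i] = src[i] * src[i];
    for (i = 0; i < n; i++)
        sum += work[i];
    return sum;
}
```

Both functions compute the same result; only the location of the buffer, and therefore the size of the stack frame, differs.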
Avoiding heavy use of signed 1-byte/2-byte data
The FR architecture has no load instruction for signed data. Therefore, when signed 1-byte or 2-byte data is loaded, sign extension is needed after the load. Using a lot of signed 1-byte/2-byte data thus increases code size compared with unsigned data.
Using unsigned types wherever possible therefore reduces code size and improves access efficiency.
(Note) The Softune compiler treats the char type as unsigned char, so plain char can be used as it is.
| [C source] | [In case of signed char type] | | [In case of char type] | |
| a=b+c; | LDI:20 | #_b, R12 | LDI:20 | #_b, R12 |
| | LDUB | @R12, R0 | LDUB | @R12, R0 |
| | EXTSB | R0 | LDI:20 | #_c, R12 |
| | LDI:20 | #_c, R12 | LDUB | @R12, R1 |
| | LDUB | @R12, R1 | ADD | R1, R0 |
| | EXTSB | R1 | LDI:20 | #_a, R12 |
| | ADD | R1, R0 | STB | R0, @R12 |
| | LDI:20 | #_a, R12 | | |
| | STB | R0, @R12 | | |
| [Code size] | 24 bytes | | 20 bytes | |
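In source terms, the advice above amounts to declaring small values that never go negative with unsigned types. The following is an illustrative sketch. The variable and function names are hypothetical, and whether a value can safely be unsigned has to be checked for each case.

```c
/* Counters and indexes that never go negative: unsigned types avoid
   the extra sign-extension instruction after each load. */
unsigned char  retry_count;   /* instead of signed char */
unsigned short sample_index;  /* instead of short       */

/* Adds two byte-sized values held in unsigned storage.
   The result wraps modulo 256, as unsigned arithmetic does. */
unsigned char add_bytes(unsigned char b, unsigned char c)
{
    return (unsigned char)(b + c);
}
```

Note that the wrap-around behaviour differs from signed data, so code that relies on negative values or on comparisons against zero must keep a signed type.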
Controlling loop-unrolling optimization
Loop unrolling improves execution speed by reducing the number of loop iterations, but it increases object size.
How the code is written should be reviewed according to whether speed or code size has priority.
[Before unrolling]
for(i=0;i<6;i++){ a[i]=0; }
[After unrolling]
for(i=0;i<6;i+=3){
    a[i]=0;
    a[i+1]=0;
    a[i+2]=0;
}
Even when the source is written as in [Before unrolling] above, code size becomes larger if unrolling is not suppressed, because the compiler unrolls the loop itself. To have the compiler favor code size, specify size-priority optimization (-Ksize) or loop-unrolling suppression (-Knounroll).
[C source]
for(i=0;i<6;i++){ a[i]=0; }
| [Loop unrolling optimization] | | | [Unrolling suppressed] | | |
| | LDI:20 | #_a, R6 | | LDI | #0, R4 |
| | LDI | #0, R4 | L_26: | LDI | #0, R0 |
| | LDI | #2, R5 | | LDI:20 | #_a, R13 |
| L_32: | LDI | #0, R0 | | STB | R0, @(R13,R4) |
| | MOV | R4, R13 | | ADD | #1, R4 |
| | STB | R0, @(R13,R6) | | CMP | #6, R4 |
| | MOV | R6, R0 | | BLT20 | L_26, R12 |
| | ADD | R4, R0 | | LDI | #0, R4 |
| | LDI | #0, R1 | | | |
| | LDI | #1, R13 | | | |
| | STB | R1, @(R13,R0) | | | |
| | MOV | R6, R0 | | | |
| | ADD | R4, R0 | | | |
| | LDI | #0, R1 | | | |
| | LDI | #2, R13 | | | |
| | STB | R1, @(R13,R0) | | | |
| | ADD | #3, R4 | | | |
| | ADD | #-1, R5 | | | |
| | CMP | #1, R5 | | | |
| | BGE20 | L_32, R12 | | | |
| [Code size] | 42 bytes | | | 18 bytes | |
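The two source forms discussed in this section can be written as complete functions and compared on a host machine. This is only an illustrative sketch. The function names and the constant `N` are hypothetical, and the byte-sized element type is assumed from the STB instructions in the listing.

```c
#define N 6

/* Rolled form: smaller code, one branch per element. */
void clear_rolled(unsigned char a[N])
{
    int i;
    for (i = 0; i < N; i++)
        a[i] = 0;
}

/* Manually unrolled by three: fewer branches, larger code.
   Valid only because N is a multiple of 3. */
void clear_unrolled(unsigned char a[N])
{
    int i;
    for (i = 0; i < N; i += 3) {
        a[i]     = 0;
        a[i + 1] = 0;
        a[i + 2] = 0;
    }
}
```

When code size has priority, the rolled form is the one to keep in the source, combined with the compiler options named above so that the compiler does not unroll it again.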
Reviewing the necessity of inline expansion
Inline expansion replaces a call to a function defined in the C source with the body of that function at the call site. When the expanded function is very small, code size after inline expansion may be smaller, but in general object size increases.
When object size has priority, this optimization is not recommended.
(Do not use the -xauto option, the -x option, #pragma inline, or the inline qualifier (C++ only).)
[C source]
unsigned short ADD_sat16(unsigned short a, unsigned short b){
    int tmp;
    if((tmp=a+b)>0xffff) return 0xffff;
    return (unsigned short)tmp;
}
unsigned short a,b,c,d,e,f;
void func(void){
    a=ADD_sat16(b,c);
    d=ADD_sat16(e,f);
}
| [Inline expansion optimization] | | | [Inline expansion suppressed] | | |
| _func: | LDI:20 | #_b, R12 | _func: | ST | RP, @-SP |
| | LDUH | @R12, R4 | | LDI:20 | #_b, R12 |
| | LDI:20 | #_c, R12 | | LDUH | @R12, R4 |
| | LDUH | @R12, R5 | | LDI:20 | #_c, R12 |
| | ADD | R5, R4 | | LDUH | @R12, R5 |
| | LDI | #65535, R0 | | CALL20 | _ADD_sat16, R12 |
| | CMP | R0, R4 | | LDI:20 | #_a, R12 |
| | BLE20 | L_32, R12 | | STH | R4, @R12 |
| | LDI | #65535, R4 | | LDI:20 | #_e, R12 |
| | BRA20 | L_28, R12 | | LDUH | @R12, R4 |
| L_32: | EXTUH | R4 | | LDI:20 | #_f, R12 |
| L_28: | LDI:20 | #_a, R12 | | LDUH | @R12, R5 |
| | STH | R4, @R12 | | CALL20 | _ADD_sat16, R12 |
| | LDI:20 | #_e, R12 | | LDI:20 | #_d, R12 |
| | LDUH | @R12, R4 | | STH | R4, @R12 |
| | LDI:20 | #_f, R12 | | LD | @SP+, RP |
| | LDUH | @R12, R5 | | RET | |
| | ADD | R5, R4 | | | |
| | LDI | #65535, R0 | | | |
| | CMP | R0, R4 | | | |
| | BLE20 | L_36, R12 | | | |
| | LDI | #65535, R4 | | | |
| | BRA20 | L_34, R12 | | | |
| L_36: | EXTUH | R4 | | | |
| L_34: | LDI:20 | #_d, R12 | | | |
| | STH | R4, @R12 | | | |
| | RET | | | | |
| [Code size] | 74 bytes | | | 46 bytes | |
However, depending on the arguments, making a small function static or specifying #pragma inline can reduce code size. Using the "inline candidate selection function" of Softune C Analyzer is recommended.
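As an illustration of the static-function case, the saturating add from the listing above can be given internal linkage. This is a sketch in standard C and does not show Softune-specific pragma syntax. The lower-case name `add_sat16` and the use of `unsigned long` for the intermediate sum are choices made for this example only.

```c
/* Small helper with internal linkage: if the compiler expands every
   call in this file, it need not emit a separate out-of-line copy. */
static unsigned short add_sat16(unsigned short a, unsigned short b)
{
    unsigned long tmp = (unsigned long)a + b;
    return (tmp > 0xffffUL) ? 0xffffU : (unsigned short)tmp;
}

unsigned short a, b, c, d, e, f;

void func(void)
{
    a = add_sat16(b, c);
    d = add_sat16(e, f);
}
```

Whether this actually reduces code size depends on the function body and the number of call sites, which is why checking candidates with the analyzer is suggested.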
Controlling standard library expansion
Standard library expansion recognizes what a standard function does and either expands it inline or replaces it with a faster standard function that performs the same operation. When object size has priority, do not use this optimization; suppress it with the standard library inline expansion control option (-Knolib).
Others
Locate frequently referenced structure members at the head.
A structure member is accessed at the address calculated as head address + offset.
The head member needs no such calculation because its offset is 0. If a member has a high static access frequency, review whether it can be located at the head.
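A short sketch of this layout rule follows. The structure `uart_state` and its members are hypothetical, and `status` is simply assumed to be the most frequently referenced member.

```c
#include <stddef.h>

/* Hypothetical driver state; 'status' is assumed to be the most
   frequently referenced member, so it is placed first (offset 0). */
struct uart_state {
    unsigned int  status;     /* hot member: accessed at the base address */
    unsigned int  baud_rate;
    unsigned char rx_buf[64];
    unsigned int  rx_count;
};

/* Reads the hot member; no offset has to be added to the base address. */
unsigned int uart_status(const struct uart_state *u)
{
    return u->status;
}
```

`offsetof(struct uart_state, status)` evaluates to 0 here, which is the property the rule relies on.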
Make functions that return a structure void.
A function that returns a structure causes the structure to be transferred via a work area. By passing the address of the destination structure as an argument and assigning to it directly, the function can be made a void function.
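The two styles can be compared side by side as below. This is a minimal sketch with a hypothetical `struct point`, and how much copying the by-value form really costs depends on the compiler.

```c
struct point {
    int x;
    int y;
    int z;
};

/* Returning the structure by value: the result may be built in a
   work area and then copied to the destination. */
struct point make_point_by_value(int x, int y, int z)
{
    struct point p;
    p.x = x;
    p.y = y;
    p.z = z;
    return p;
}

/* void form: the caller passes the destination address and the
   members are written there directly. */
void make_point(struct point *out, int x, int y, int z)
{
    out->x = x;
    out->y = y;
    out->z = z;
}
```

A caller would write `make_point(&p, 1, 2, 3);` instead of `p = make_point_by_value(1, 2, 3);`.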
Keep arguments to 4 or fewer.
With 4 or fewer arguments, no memory access code is needed because the arguments are passed in registers, so execution speed improves. If an argument is passed needlessly, review whether it can be removed.
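One way an unnecessary argument shows up is as a value that another argument already implies. The sketch below is illustrative only. The function names are hypothetical, and the fifth argument in the first version is redundant because the length follows from `end - buf`.

```c
#include <stddef.h>

/* Five arguments: 'len' is redundant because 'end' already marks
   the end of the buffer. */
size_t count_masked5(const unsigned char *buf, const unsigned char *end,
                     size_t len, unsigned char mask, unsigned char key)
{
    size_t n = 0;
    (void)len;  /* never needed */
    for (; buf != end; buf++)
        if ((*buf & mask) == key)
            n++;
    return n;
}

/* Four arguments: the redundant one is dropped, so the whole
   argument list stays within the four-argument limit. */
size_t count_masked4(const unsigned char *buf, size_t len,
                     unsigned char mask, unsigned char key)
{
    size_t n = 0;
    size_t i;
    for (i = 0; i < len; i++)
        if ((buf[i] & mask) == key)
            n++;
    return n;
}
```

Both functions return the same count for the same data; the second simply avoids passing information twice.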