Published February 2, 2024 (Author:)

of debug information available in the compiled code, and the performance of the code. The following optimization levels are available:

o -O0 applies minimum optimizations.

Most optimizations are switched off, and the code generated has the best debug view.

o -O1 applies restricted optimization.

For example, unused inline functions and unused static functions are removed. At this level of optimization, the compiler also applies automatic optimizations such as removing redundant code and re-ordering instructions so as to avoid an interlock situation. The code generated is reasonably optimized, with a good debug view.

o -O2 applies high optimization (this is the default setting).

Optimizations applied at this level take advantage of ARM’s in-depth knowledge of the processor architecture to exploit processor-specific behavior of the given target. It generates well-optimized code, but with limited debug view.

o -O3 applies the most aggressive optimization.

The optimization is in accordance with the user’s -Ospace/-Otime choice. By default, multi-file compilation is enabled, which leads to a longer compile time, but gives the highest levels of optimization.

· The Optimize for Time checkbox causes the compiler to optimize with a greater focus on achieving the best performance when checked (-Otime) or the smallest code size when unchecked (-Ospace).

Unchecking Optimize for Time selects the -Ospace option, which instructs the compiler to perform optimizations that reduce the image size at the expense of a possible increase in execution time, for example by using out-of-line function calls instead of inline code for large structure copies. This is the default option. When running the compiler from the command line, this option is invoked using '-Ospace'.

Checking Optimize for Time selects the -Otime option, which instructs the compiler to optimize the code for the fastest execution time, at the risk of an increase in the image size. It is recommended that you compile the time-critical parts of your code with -Otime, and the rest with -Ospace.
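One way to mix the two settings inside a single source file is sketched below. It assumes the ARM compiler's `#pragma push`, `#pragma Otime`, and `#pragma pop` directives, which are not described in this document, so check the documentation for your compiler version before relying on them. The pragmas are guarded by the `__CC_ARM` macro so that other compilers skip them. The function names `filter_step` and `config_checksum` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical time-critical routine: sums a block of samples.
   Assumption: armcc honours "#pragma Otime" for the functions that follow,
   and "#pragma push"/"#pragma pop" save and restore the pragma state. */
#if defined(__CC_ARM)
#pragma push
#pragma Otime
#endif
int32_t filter_step(const int16_t *samples, size_t count)
{
    int32_t acc = 0;
    size_t i;

    assert(samples != NULL);
    for (i = 0; i < count; ++i) {
        acc += samples[i];
    }
    return acc;
}
#if defined(__CC_ARM)
#pragma pop
#endif

/* Hypothetical non-critical routine, left at the project-wide setting
   (for example -Ospace). */
uint8_t config_checksum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    assert(data != NULL);
    for (i = 0; i < len; ++i) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}
```

An alternative that needs no pragmas is to keep time-critical code in separate source files and set -Otime only for those files.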

· Split Load and Store Multiples instructs the compiler to split LDM and STM instructions involving a large number of registers into a series of load/store multiples of fewer registers. This means that an LDM of 16 registers can be split into 4 separate LDMs of 4 registers each. This option helps to reduce the interrupt latency on ARM systems which do not have a cache or write buffer, and systems which use zero-wait-state 32-bit memory.

For example, the ARM7 and ARM9 processors can only take an exception on an instruction boundary. If an exception occurs at the start of an LDM of 16 registers in a cacheless ARM7/ARM9 system, the system will finish making 16 accesses to memory before taking the exception. Depending on the memory arbitration system, this can result in a very high interrupt latency. Breaking the LDM into 4 individual LDMs of 4 registers each means that the processor will take the exception after loading a maximum of 4 registers, thereby greatly reducing the interrupt latency.

Selecting this option improves the interrupt responsiveness of the system; note that splitting the instructions can slightly increase code size and execution time.
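The arithmetic behind the 16-register example can be sketched as follows. This is only a model of the reasoning in the text, not of the compiler itself: the helper names `ldm_count` and `worst_case_transfers` are hypothetical, and the per-instruction limit is passed in as a parameter rather than taken from any real compiler setting.

```c
#include <assert.h>

/* Number of LDM instructions needed to load `regs` registers when each
   instruction may transfer at most `max_per_ldm` registers. */
unsigned ldm_count(unsigned regs, unsigned max_per_ldm)
{
    assert(max_per_ldm > 0);
    return (regs + max_per_ldm - 1) / max_per_ldm;
}

/* Worst-case number of register transfers that must complete before a
   pending exception can be taken: the longest single LDM in the sequence,
   because exceptions are only taken on instruction boundaries. */
unsigned worst_case_transfers(unsigned regs, unsigned max_per_ldm)
{
    assert(max_per_ldm > 0);
    return regs < max_per_ldm ? regs : max_per_ldm;
}
```

With these helpers, `ldm_count(16, 4)` gives 4 instructions and `worst_case_transfers(16, 4)` gives 4 transfers, compared with 16 transfers for the unsplit case `worst_case_transfers(16, 16)`.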

· The One ELF Section per Function option tells the compiler to put each function into its own individual ELF section. This allows the linker to remove unused functions.

An ELF code section typically contains the code for a number of functions. The linker is normally only able to remove unused ELF sections, not unused functions. An ELF section can only be removed if all its contents are unused. Therefore, splitting each function into its own ELF section allows the linker to easily identify which ones are unused, and remove them.

Selecting this option increases the time required to compile your code, but results in improved performance.
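The removal rule described above can be illustrated with a small model. This is a sketch of the rule only, not of how the linker is implemented: the `function_info` structure, the `removable_bytes` helper, and the sizes used with it are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of one function inside an object file. */
struct function_info {
    unsigned section;   /* index of the ELF section holding the function */
    unsigned size;      /* code size in bytes (illustrative values only) */
    int      used;      /* non-zero if something references the function */
};

/* Bytes that can be discarded: a section is removable only if every
   function placed in it is unused. */
unsigned removable_bytes(const struct function_info *funcs, size_t count,
                         unsigned section_count)
{
    unsigned total = 0;
    unsigned s;
    size_t i;

    assert(funcs != NULL);
    for (s = 0; s < section_count; ++s) {
        unsigned bytes = 0;
        int any_used = 0;

        for (i = 0; i < count; ++i) {
            if (funcs[i].section == s) {
                bytes += funcs[i].size;
                any_used = any_used || funcs[i].used;
            }
        }
        if (!any_used) {
            total += bytes;
        }
    }
    return total;
}
```

For one used 100-byte function and two unused functions of 40 and 60 bytes, placing all three in a single section yields 0 removable bytes, while giving each its own section yields 100 removable bytes.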

The combination of options applied will depend on your optimization goal, whether you are optimizing for smallest code size or for best performance.

The next section illustrates the best optimization options for each of these goals.

Optimizing for Smallest Code Size

To optimize your code for the smallest size, the best options to apply are:

· The MicroLIB C library

· Cross-module optimization

· Optimization level 2 (-O2)

Compile the Measure example without any optimizations

The Measure example uses analog and digital inputs to simulate a data logger.

File – Open Project

C: Keil ARMBoards Keil 2

Click the Options for Target button

In the Target tab:

· Uncheck Cross-Module Optimization

· Uncheck Use MicroLIB

· Uncheck Use Link-Time Code Generation

In the C/C++ tab:

· Set Optimization Level to Zero

Then click OK to save your changes.

Project – Build target

Without any compiler optimizations applied, the initial code size is 13,656 Bytes.

MDK Compiler Optimizations

Optimize the Measure example for Size

Apply the compiler optimizations in turn, and re-compile each time to see their effect in reducing the code size for the example.

· Options for Target – Target tab: Use the MicroLIB C library

· Options for Target – Target tab: Use cross-module optimization (remember to compile twice)

· Options for Target – C/C++ tab: Enable Optimization level 2 (-O2)

Optimization Applied         Compile Size    Size Reduction   Improvement
MicroLIB C library           8,960 Bytes     4,696 Bytes      34% smaller
Cross-Module Compilation     13,500 Bytes    156 Bytes        1.1% smaller
Optimization level -O2       12,936 Bytes    720 Bytes        5.3% smaller
All 3 optimization options   8,116 Bytes     5,540 Bytes      40.6% smaller

Applying all the optimizations will reduce the code size down to 8,116 Bytes.

The fully optimized code is 5,540 Bytes smaller, a total code size reduction of 40.6%.
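The reduction figures can be reproduced from the build sizes reported above, taking the unoptimized 13,656 Bytes as the baseline. The sketch below shows the calculation; the helper names `size_reduction` and `size_reduction_percent` are hypothetical.

```c
#include <assert.h>

/* Bytes saved relative to the unoptimized build. */
unsigned size_reduction(unsigned baseline, unsigned optimized)
{
    assert(optimized <= baseline);
    return baseline - optimized;
}

/* Saving expressed as a percentage of the unoptimized size. */
double size_reduction_percent(unsigned baseline, unsigned optimized)
{
    assert(baseline > 0);
    return 100.0 * (double)size_reduction(baseline, optimized)
                 / (double)baseline;
}
```

For example, `size_reduction(13656, 8116)` gives 5540, and `size_reduction_percent(13656, 8116)` gives about 40.6.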

Optimizing for Best Performance

To optimize your code for performance, the best options to apply are:

· Cross-module optimization

· Optimization level 3 (-O3)

· Optimize for time

Run the Dhrystone benchmark without any optimizations

The Dhrystone benchmark is used to measure and compare the performance of different computers, or the efficiency of the code generated for the same computer by different compilers.

File – Open Project

C: Keil ARMExamples DHRY 2

Click the Options for Target button

Turn off optimization settings in the Target and C/C++ tabs, then click OK

Project – Build target

Enter Debug mode

View – Serial Windows – UART #1

Open the UART #1 window

View – Analysis Windows – Performance Analyzer

Open the Performance Analyzer

Debug – Run

Start running the application

When prompted:

Enter 50000 in the UART #1 window and press Enter

In the Performance Analyzer window, note that

· The dhry_1 loop took 2.829s

· The dhry_2 took 2.014s

In the UART #1 window, note that

· It took 138.0 microseconds for 1 run through Dhrystone

· The application is executing 7246.4 Dhrystones per second

Optimize the Dhrystone example for Performance

Re-compile the example with all three of the following optimizations applied:

· Options for Target – Target tab: Cross-module optimization (remember to compile twice)

· Options for Target – C/C++ tab: Optimization level 3 (-O3)

· Options for Target – C/C++ tab: Optimize for Time

Re-run the application, and examine the performance.

Measurement                                Without Optimizations   With Optimizations   Improvement
dhry_1                                     2.829s                  1.695s               40.1% faster
dhry_2                                     2.014s                  1.011s               49.8% faster
Microseconds for 1 run through Dhrystone   138.0                   70                   49.3% faster
Dhrystones per second                      7246.4                  14,285.7             97.1% more

The fully optimized code achieves approximately 2x the performance of the unoptimized code.
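The derived figures in the table above follow from the measured times. The sketch below shows the two calculations, assuming the time for one run is given in microseconds; the helper names `dhrystones_per_second` and `percent_faster` are hypothetical.

```c
#include <assert.h>

/* Dhrystones per second from the time for one run, in microseconds. */
double dhrystones_per_second(double us_per_run)
{
    assert(us_per_run > 0.0);
    return 1000000.0 / us_per_run;
}

/* Percentage by which a time measurement shrank after optimization. */
double percent_faster(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

For example, `dhrystones_per_second(138.0)` is roughly 7246.4 and `dhrystones_per_second(70.0)` is roughly 14285.7, while `percent_faster(2.829, 1.695)` is roughly 40.1.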

Summary

The ARM Compilation Tools offer a range of options to apply when compiling your code. These options can be combined to optimize your code for best performance, for smallest code size, or for any performance point between these two extremes, to best suit your targeted microcontroller device and market.

When optimizing your code, MDK-ARM makes it easy and convenient to measure the effect of the different optimization settings on your application. The code size is clearly displayed after compilation, and a range of analysis tools such as the Performance Analyzer enable you to measure performance.

The optimization options in the ARM Compilation Tools, together with the easy-to-use analysis tools in MDK-ARM, help you to easily optimize your application to meet your specific requirements.