Tuesday, October 20, 2009

Optimizing Mathematical Calculations


















































Assembly language is an efficient tool for optimizing mathematical
operations, as well as data array processing and hardware interaction.
Most applications make intensive use of mathematical calculations, and
rewriting the critical ones in assembly language is often a good way to
improve their performance. Creating highly efficient applications of
this kind requires a solid understanding of the Floating-Point Unit
(FPU), both its hardware and its programming model, and of SIMD
technology.


In this chapter, we focus on the principles of FPU operation and on
the options it offers for application optimization. Practical
examples of how to make use of the FPU features are considered in Chapter 2.




Using the Floating-Point Unit (FPU)


The earliest Intel processors had no hardware support for
floating-point operations. All operations of this kind were implemented
as procedures built from ordinary integer instructions. For these early
models, a separate chip was developed, known as the mathematical
coprocessor. Its commands enabled the computer to perform
floating-point operations much faster than the software procedures
could.


Starting with the 486DX processors, the mathematical coprocessor no
longer exists as a separate device. Instead, the FPU is built into the
processor, although it is still programmed as a separate module.
The FPU programming model can be described as a combination of the
following registers:





  • FPU stack registers. There are 8 of them, named ST(0), ST(1), ST(2), ..., ST(7). Floating-point numbers are stored in them as 80-bit values in the extended format. ST(0) always refers to the top of the stack. As numbers are loaded into the FPU, they are pushed onto the top of the stack.





  • Control/status registers. These include the
    status register, which reflects the current state of the FPU; the
    control register, which selects the FPU operating modes; and the tag
    register, which reflects the state of each of the ST(0) through ST(7) registers.





  • Data pointer register and instruction pointer register. These are used when handling exceptions.





Any of the registers listed above can be accessed by
the program either directly or indirectly. In FPU programming, the most
frequently used elements are the ST(0) through ST(7) registers and the C0, C1, C2, and C3 bits of the status register.


The FPU registers operate like the ordinary CPU stack, except that
this stack has a limited number of positions: only 8. One of the FPU
registers is difficult for the programmer to access directly: the tag
word, which contains a “label” (tag) for each of the stack positions.
This register enables the FPU to track which of the stack positions
are currently in use and which are free. Any attempt to place a value
into a stack position that is already in use raises an exception.
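
The behavior described above can be illustrated with a small software model. This is only a sketch in plain C++, not real FPU code: the class name `FpuStackModel` and its methods are hypothetical, a `bool` array stands in for the tag word, and a C++ exception stands in for the FPU exception.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical software model of the eight-position FPU register stack.
// All names here are illustrative; this is not a real FPU interface.
class FpuStackModel {
public:
    static constexpr std::size_t kSlots = 8;

    // Push a value: the top index moves down by one (modulo 8), and the
    // slot it lands on must currently be tagged as empty.
    void push(long double value) {
        const std::size_t newTop = (top_ + kSlots - 1) % kSlots;
        if (inUse_[newTop]) {
            throw std::overflow_error("stack position already in use");
        }
        top_ = newTop;
        regs_[top_] = value;
        inUse_[top_] = true;
    }

    // Pop the top value: its slot is tagged empty and the top index moves up.
    long double pop() {
        if (!inUse_[top_]) {
            throw std::underflow_error("top stack position is empty");
        }
        const long double value = regs_[top_];
        inUse_[top_] = false;
        top_ = (top_ + 1) % kSlots;
        return value;
    }

    // Read ST(i), counted from the current top of the stack.
    long double st(std::size_t i) const {
        if (i >= kSlots || !inUse_[(top_ + i) % kSlots]) {
            throw std::out_of_range("ST(i) is empty or out of range");
        }
        return regs_[(top_ + i) % kSlots];
    }

    // Number of positions currently tagged as in use.
    std::size_t depth() const {
        std::size_t n = 0;
        for (bool used : inUse_) n += used ? 1 : 0;
        return n;
    }

private:
    std::array<long double, kSlots> regs_{};
    std::array<bool, kSlots> inUse_{};  // stands in for the tag word
    std::size_t top_ = 0;               // stands in for the top-of-stack index
};
```

In this model, eight consecutive calls to `push` succeed and a ninth throws, which mirrors the rule that a position already in use cannot be overwritten by a load.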


To place data into the FPU stack, the program uses a load command,
which pushes the data onto the top of the stack. If a number stored in
memory has a format other than the 80-bit extended format (the
"temporary" format), the FPU converts it to the 80-bit form during the
load.


The store commands copy values from the FPU stack into memory. If a
data format conversion is needed, it is performed as part of the store
operation. Some forms of the store operation leave the top of the stack
intact for further operations (FST, for example), while others also pop
it (FSTP).
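
The format conversions performed on load and store can be mimicked in C++ with ordinary casts. The sketch below is an analogy rather than FPU code: the function names are hypothetical, and `long double` is assumed to be the widest native format, which is the 80-bit extended format only on some compilers.

```cpp
// Model of a load: a 32-bit value is widened to the widest native format.
// Widening is exact, so nothing is lost on the way in.
long double load_single(float x) {
    return static_cast<long double>(x);
}

// Model of a store: narrowing rounds the value to one that is
// representable in the destination format, so precision may be lost.
float store_single(long double x) {
    return static_cast<float>(x);
}

double store_double(long double x) {
    return static_cast<double>(x);
}
```

A value that started out as a `float` survives a load followed by a store unchanged, whereas a result such as 1/3 computed in the wide format is rounded when it is stored to a narrower one.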


Once data has been placed into the FPU stack, it can be accessed and
used by any command. The FPU instructions allow both operations between
registers and operations between memory and a register. As with the
CPU, at least one of the two operands must be in a register. For the
FPU, one of the operands must always be the top element of the stack;
the other may be taken either from memory or from another register of
the stack.


An arithmetic operation always has a register of the stack as its
destination. The FPU, being a processor unit for numeric operations,
cannot write a result into memory with the same command that performed
the calculation. To send the result back to memory, it is necessary to
use a separate store command: either one that leaves the value on the
stack, or one that pops it from the stack as it writes it into memory.
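
The load, operate, and store pattern described above can be sketched as a tiny stack machine. This is plain C++ for illustration only: the class `StackMachine` is hypothetical, its method names merely echo x87 mnemonics, and it omits the eight-position limit and all error checking.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper that mimics the operand rules described above.
// It performs no bounds checking: callers must not operate on an empty stack.
class StackMachine {
public:
    void fld(double mem)      { s_.push_back(mem); }   // memory -> new ST(0)
    void fadd_mem(double mem) { s_.back() += mem; }    // ST(0) = ST(0) + memory
    void fmul_mem(double mem) { s_.back() *= mem; }    // ST(0) = ST(0) * memory

    // Register-to-register form: ST(1) = ST(1) + ST(0), then pop.
    void faddp() {
        const long double t = s_.back();
        s_.pop_back();
        s_.back() += t;
    }

    // Store ST(0) to memory and leave it on the stack.
    void fst(double& mem) const { mem = static_cast<double>(s_.back()); }

    // Store ST(0) to memory, then pop it.
    void fstp(double& mem) { fst(mem); s_.pop_back(); }

    std::size_t depth() const { return s_.size(); }

private:
    std::vector<long double> s_;
};

// Evaluates a*b + c*d. Every arithmetic step writes to the stack;
// only the final, separate store writes to memory.
double sum_of_products(double a, double b, double c, double d) {
    StackMachine fpu;
    double result = 0.0;
    fpu.fld(a);        // ST(0) = a
    fpu.fmul_mem(b);   // ST(0) = a*b
    fpu.fld(c);        // ST(0) = c, ST(1) = a*b
    fpu.fmul_mem(d);   // ST(0) = c*d
    fpu.faddp();       // ST(0) = a*b + c*d, one entry left
    fpu.fstp(result);  // write the result to memory and pop
    return result;
}
```

Calling `sum_of_products(2.0, 3.0, 4.0, 5.0)` walks through the six steps in the comments. Note that no arithmetic step takes a memory location as its destination.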





FPU Commands


All FPU commands start with the letter F, which distinguishes them from the CPU commands. The FPU commands can be conventionally arranged into several groups:




  • Data transfer commands




  • Addition and subtraction commands




  • Multiplication and division commands




  • Comparison commands




  • Transcendental function commands




  • Control commands





The FPU gives the developer hardware-level support for calculating
trigonometric functions, logarithms, and powers. Such calculations are
entirely transparent to the software developer and do not require
writing any additional algorithms.
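
As an illustration of how a power can be obtained from the logarithmic building blocks, the sketch below computes x^y as 2^(y * log2 x) in C++. On x87 hardware, the FYL2X instruction produces y * log2(x) and F2XM1 produces 2^x - 1 for a limited argument range. The function name `pow_via_log2` is hypothetical, the code assumes x > 0, and it relies on the standard library rather than on those instructions directly.

```cpp
#include <cmath>

// Computes x^y as 2^(y * log2(x)). Assumes x > 0.
// This mirrors the decomposition used with the FPU's logarithm and
// exponent commands, but calls the C++ standard library instead.
long double pow_via_log2(long double x, long double y) {
    const long double exponent = y * std::log2(x);  // the quantity FYL2X yields
    return std::exp2(exponent);                     // 2 raised to that exponent
}
```

The result can be compared against `std::pow` for positive bases. Small differences in the last digits are to be expected.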


The FPU makes it possible to perform mathematical calculations with
a very high level of precision (up to 18 decimal digits), because
values on the stack are held in the 80-bit extended format.
Calculations carried out in a narrower format, such as 32-bit or 64-bit
floating point, are less precise.
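
The precision available in each format can be queried from C++ through `std::numeric_limits`. The helper names below are hypothetical. On compilers that map `long double` to the 80-bit extended format (GCC on x86, for example), `digits10` for `long double` is 18. Compilers that treat `long double` as a 64-bit type, as Visual C++ does, report the same value as for `double`.

```cpp
#include <limits>

// Number of decimal digits that the type T can represent without change.
template <typename T>
constexpr int decimal_digits() {
    return std::numeric_limits<T>::digits10;
}

// Number of binary digits in the significand of the type T.
template <typename T>
constexpr int significand_bits() {
    return std::numeric_limits<T>::digits;
}
```

Printing `decimal_digits<float>()`, `decimal_digits<double>()`, and `decimal_digits<long double>()` on the target compiler shows whether the extended format is actually available to C++ code there.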


Using assembly language for FPU programming can give a considerable
gain in application performance. The FPU instruction set contains
several groups of commands that provide the developer with virtually
all the tools needed to implement most calculation algorithms. Even if
a needed command is missing, an equivalent operation can usually be
built from several assembly instructions. Note also that by programming
the FPU with assembly commands, you can implement operations that are
difficult or even impossible to write in C++.


As for the mathematical functions in the C++ standard libraries,
their assembly analogs often give higher performance as well as a
smaller program size. Assembly language also lets developers create
custom functions, which are often more efficient than their
counterparts in the mathematical library of Visual C++ .NET.







































