Thursday, October 22, 2009

Chapter 11: Optimizing Multimedia Applications with Assembly Language

















































Multimedia applications are among the most
performance-sensitive programs. They run in real time and place strict
demands on both hardware and software. This chapter focuses
on a few methods for optimizing multimedia applications with
assembly language. First, we review some general guidelines for improving
performance that apply regardless of the programming language.


The code of a multimedia application should be as simple as
possible, and so should the data structures it uses.
Avoid data type conversions where you can. Converting
integers to floating-point numbers and back decreases performance
because additional instructions are required.


Use 32-bit data for operations on variables. 8-bit or
16-bit data require less memory, but operations on 32-bit data are the
most efficient on Pentium processors.


In your multimedia applications, avoid floating-point operations because integer operations are faster.


Pass parameters to your functions by reference, rather
than by value. Also, align your data on a double-word boundary. Prefer
global variables to local.


The performance of multimedia applications can be
improved by optimizing vector transformation algorithms and by using
multithreading.
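
Several of the guidelines above can be shown together in a minimal portable C++ sketch. The `Vec3i` type, the `scale` function, and the 16-byte alignment are illustrative assumptions and do not appear in the chapter's listings; the sketch only combines 32-bit integer data, explicit alignment, pass-by-reference, and integer-only arithmetic.

```cpp
#include <cstdint>

// Hypothetical vector type: 32-bit integer components, padded to four
// elements and aligned on a 16-byte boundary (a multiple of the
// double-word boundary recommended above).
struct alignas(16) Vec3i {
    std::int32_t c[4];
};

// The vector is passed by reference, so no copy is made, and the
// scaling uses integer multiplication only.
inline void scale(Vec3i& v, std::int32_t factor) {
    for (int i = 0; i < 3; ++i) {
        v.c[i] *= factor;
    }
}
```

Calling `scale(v, 4)` on a `Vec3i` holding (4, 7, 3, 0) leaves (16, 28, 12, 0) in place; the padding element is not touched.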



Multithreading is a very important
aspect of writing multimedia applications. No serious multimedia
application can do without threads. Threads are usually used to
implement the following tasks:




  • Creating controls and menus




  • Creating sound effects




  • Updating data structures




  • Updating animation frames




The use of threads is not confined to these tasks;
there are many other uses. Multithreaded applications are
discussed in more detail in Chapter 12.
For now, we will consider an example of a program with two threads
(main and auxiliary) in which a 3D vector is scaled. The vector's
coordinates are stored in the a1 array, and
the scale factor is 4. Additionally, the vector's length is computed.
First, we will consider a variant of the application written with ordinary C++
.NET statements (Listing 11.1).




Listing 11.1: Using multithreading for scaling a vector and computing its length (in C++)






// MHTHREAD_GRAPHICS.cpp : Defines the entry point for the console
// application.

#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <math.h>

int i1;

// The vector's coordinates (x, y, z); the fourth element is zero

int a1[4] = {4, 7, 3};

// Thread function: multiplies every element of a1 by the scale factor
// passed in k1
void myFunc(LPVOID k1)
{
    for (i1 = 0; i1 < 4; i1++)
        a1[i1] = (int) k1 * a1[i1];
}

int _tmain(int argc, _TCHAR* argv[])
{
    HANDLE mythread;
    DWORD mythread_id;
    double vec_len;

    printf("CHANGING THE LENGTH OF VECTOR a = (a0, a1, a2) (DirectX Optimizing Tips) \n");
    printf("\nBefore scaling vector a=(%d, %d, %d)\n", a1[0], a1[1], a1[2]);
    vec_len = sqrt((double) (a1[0]*a1[0] + a1[1]*a1[1] + a1[2]*a1[2]));
    printf("\nLength of a1 = %.2f\n", vec_len);

    printf("\n\n Starting thread \n\n");

    mythread = CreateThread(NULL, 0, (PTHREAD_START_ROUTINE)myFunc,
                            (LPVOID)(4), 0, &mythread_id);
    while (true)
    {
        // Poll the thread handle without blocking
        if (WaitForSingleObject(mythread, 0) == WAIT_OBJECT_0)
        {
            vec_len = sqrt((double) (a1[0]*a1[0] + a1[1]*a1[1] + a1[2]*a1[2]));
            break;
        }

        // Any useful operations

    }
    CloseHandle(mythread);

    printf("After scaling vector a1= (%d, %d, %d) ", a1[0], a1[1], a1[2]);
    printf("\nLength of a1=%.2f\n", vec_len);
    printf("\n Thread terminated \n");
    getchar();
    return 0;
}















In this program, the main thread starts the auxiliary thread mythread, which computes the new coordinates of the 3D vector. The main thread waits for mythread to complete the computation and then computes the length of the vector with the statement




vec_len = sqrt((double) (a1[0]*a1[0] + a1[1]*a1[1] + a1[2]*a1[2]));




The window of the program is shown in Fig. 11.1.






Fig. 11.1: Window of a program demonstrating operations on vectors
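
The flow of Listing 11.1 can also be sketched in portable C++ for readers who want to check the numbers outside Visual Studio. This is only an approximation under stated assumptions: `std::thread` stands in for `CreateThread`, `join` replaces the polling loop, only the three coordinates are scaled, and the names `vector_length` and `scale_in_thread` are hypothetical.

```cpp
#include <cmath>
#include <thread>

// Length of a 3D vector with integer coordinates.
double vector_length(const int v[3]) {
    return std::sqrt(static_cast<double>(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]));
}

// Scales the three coordinates in an auxiliary thread and waits for it.
void scale_in_thread(int v[3], int factor) {
    std::thread worker([v, factor] {
        for (int i = 0; i < 3; ++i) v[i] *= factor;
    });
    worker.join();  // the caller resumes only after scaling is done
}
```

For the vector (4, 7, 3), the length before scaling is sqrt(74), about 8.60. After scaling by 4 the vector is (16, 28, 12) and the length is sqrt(1184), about 34.41, which is four times the original length.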

The program can be improved by optimizing the
code fragments that perform mathematical calculations.
First, the computation in the thread function can be rewritten in
assembly language. The function



void myFunc(LPVOID k1)
{
    for (i1 = 0; i1 < 4; i1++)
        a1[i1] = (int) k1 * a1[i1];
}


can be rewritten in inline assembly as follows (c1 is an integer constant that holds the scale factor; it is declared in Listing 11.2):



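void myFunc(LPVOID*)
{
    _asm {
        lea   ESI, a1             // ESI = address of a1
        lea   EDI, c1             // EDI = address of c1 (not used below)
        mov   ECX, 3              // three coordinates to process
        sub   ESI, 4              // step back so the first add lands on a1[0]
    next:
        add   ESI, 4              // advance to the next element
        fild  DWORD PTR [ESI]     // load the integer onto the FPU stack
        fimul DWORD PTR c1        // multiply by the scale factor
        fistp DWORD PTR [ESI]     // store the rounded result back and pop
        dec   ECX
        jnz   next
    }
}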
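
To make the data flow of the three coprocessor instructions easier to follow, here is a rough C++ equivalent of the assembly loop above. It is a sketch, not the author's code: the function name `scale_like_x87` is hypothetical, `std::lrint` is used because it rounds according to the current rounding mode much as `fistp` does, and every product is assumed to fit in a 32-bit integer.

```cpp
#include <cmath>

// Mirrors the assembly loop in portable C++: each of the first three
// elements is converted to floating point (fild), multiplied by the
// integer factor (fimul), and rounded back to an integer (fistp).
// Assumes every product fits in a 32-bit int.
void scale_like_x87(int* data, int factor) {
    for (int i = 0; i < 3; ++i) {
        double x = static_cast<double>(data[i]);    // fild
        x *= static_cast<double>(factor);           // fimul
        data[i] = static_cast<int>(std::lrint(x));  // fistp
    }
}
```

Like the assembly version, the sketch processes three elements, so the fourth array element is left unchanged, whereas the original C++ loop runs over all four.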



For optimization, it is best to use a mathematical
coprocessor or one of the extensions (MMX or SSE). The source code of
the modified program is shown in Listing 11.2.




Listing 11.2: A modified variant of the vector operations






// OPTIMIZING_VECTOR_OPERATIONS.cpp : Defines the entry point for the
// console application.

#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <math.h>

int i1;
int a1[4] = {4, 1, -3}; // Size of vector = sqrt((a1-a0)*(a1-a0) + ...)
const int c1 = 4;       // Scale factor

// Thread function: scales the three coordinates of a1 by c1 using the
// mathematical coprocessor
void myFunc(LPVOID k1)
{
    _asm {
        lea   ESI, a1             // ESI = address of a1
        lea   EDI, c1             // EDI = address of c1 (not used below)
        mov   ECX, 3              // three coordinates to process
        sub   ESI, 4              // step back so the first add lands on a1[0]
    next:
        add   ESI, 4              // advance to the next element
        fild  DWORD PTR [ESI]     // load the integer onto the FPU stack
        fimul DWORD PTR c1        // multiply by the scale factor
        fistp DWORD PTR [ESI]     // store the rounded result back and pop
        dec   ECX
        jnz   next
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    HANDLE mythread;
    DWORD mythread_id;
    double vec_len;

    printf("MOD. VARIANT VECTOR OPERATIONS with a = (a0, a1, a2) (DirectX Tips)\n");
    printf("\nBefore scaling vector a = (%d, %d, %d)\n", a1[0], a1[1], a1[2]);
    vec_len = sqrt((double) (a1[0]*a1[0] + a1[1]*a1[1] + a1[2]*a1[2]));
    printf("\nLength of a1 = %.2f\n", vec_len);

    printf("\n\n Starting thread\n\n");

    mythread = CreateThread(NULL, 0, (PTHREAD_START_ROUTINE)myFunc,
                            (LPVOID)(4), 0, &mythread_id);
    while (true)
    {
        // Poll the thread handle without blocking
        if (WaitForSingleObject(mythread, 0) == WAIT_OBJECT_0)
        {
            vec_len = sqrt((double) (a1[0]*a1[0] + a1[1]*a1[1] + a1[2]*a1[2]));
            break;
        }
        // any useful operations
    }
    CloseHandle(mythread);
    printf("After scaling vector a1 = (%d, %d, %d)", a1[0], a1[1], a1[2]);
    printf("\nLength of a1 = %.2f\n", vec_len);
    printf("\n Thread terminated\n");
    getchar();
    return 0;
}















The program can be improved further
with the assembly instructions of the MMX extension, which can be used to optimize
the vector scaling code. The source code of the program is shown in Listing 11.3.




Listing 11.3: Using the MMX extension for vector operations






// OPTIMIZING_VECTOR_OPERATIONS.cpp : Defines the entry point for the
// console application.

#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <math.h>

int i1;

int a1[4] = {4, 7, 3, 0};
int c1[4] = {4, 4, 4, 4};

// Thread function: scales the three coordinates of a1 by c1[0] using MMX
void myFunc(LPVOID*)
{
    _asm {
        mov      ECX, 3               // three coordinates to process
        lea      ESI, a1              // ESI = address of a1
        sub      ESI, 4               // step back so the first add lands on a1[0]
    next:
        add      ESI, 4               // advance to the next element
        pxor     mm0, mm0             // clear mm0
        movd     mm0, DWORD PTR [ESI] // low double word = current coordinate
        packssdw mm0, mm0             // pack double words into 16-bit words
        pxor     mm1, mm1             // clear mm1
        movd     mm1, DWORD PTR c1    // low double word = scale factor
        packssdw mm1, mm1             // pack double words into 16-bit words

        pmaddwd  mm0, mm1             // multiply word pairs and add the products
        movd     DWORD PTR [ESI], mm0 // store the scaled coordinate
        dec      ECX
        jnz      next
        emms                          // restore the FPU state after MMX use
    };

}

int _tmain(int argc, _TCHAR* argv[])
{
    HANDLE mythread;
    DWORD mythread_id;
    double vec_len;

    printf("MOD. VARIANT VECTOR OPERATIONS with a = (a0, a1, a2) (DirectX Tips)\n");
    printf("\nBefore scaling vector a = (%d, %d, %d)\n", a1[0], a1[1], a1[2]);
    vec_len = sqrt((double) (a1[0]*a1[0] + a1[1]*a1[1] + a1[2]*a1[2]));
    printf("\nLength of a1 = %.2f\n", vec_len);

    printf("\n\n Starting thread. .. \n\n");
    mythread = CreateThread(NULL, 0, (PTHREAD_START_ROUTINE)myFunc,
                            (LPVOID)(4), 0, &mythread_id);
    while (true)
    {
        // Poll the thread handle without blocking
        if (WaitForSingleObject(mythread, 0) == WAIT_OBJECT_0)
        {
            vec_len = sqrt((double) (a1[0]*a1[0] + a1[1]*a1[1] + a1[2]*a1[2]));
            break;
        }

        // Any useful operations

    }
    CloseHandle(mythread);

    printf("After scaling vector a1 = (%d, %d, %d) ", a1[0], a1[1], a1[2]);
    printf("\nLength of a1 = %.2f\n", vec_len);
    printf("\n Thread terminated \n");
    getchar();
    return 0;
}
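
The effect of the packssdw and pmaddwd pair in Listing 11.3 is easier to see in a scalar model. The following sketch is an illustration only: the function names `saturate16` and `mmx_scale_model` are hypothetical, and the code emulates in plain C++ what one loop iteration computes instead of using MMX itself.

```cpp
#include <cstdint>

// Signed saturation of a 32-bit value to 16 bits, as packssdw does
// for each double word.
std::int16_t saturate16(std::int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return static_cast<std::int16_t>(x);
}

// Scalar model of one loop iteration in Listing 11.3.
// After movd, the register holds {x, 0} as double words; packssdw with
// itself yields the words {x, 0, x, 0}. pmaddwd then multiplies the
// word pairs and adds neighbours, so the low double word becomes
// x*c + 0*0.
std::int32_t mmx_scale_model(std::int32_t x, std::int32_t c) {
    const std::int16_t xw[2] = {saturate16(x), 0};
    const std::int16_t cw[2] = {saturate16(c), 0};
    return std::int32_t{xw[0]} * cw[0] + std::int32_t{xw[1]} * cw[1];
}
```

The model makes one limitation visible: because packssdw saturates to 16 bits, the MMX version matches plain integer multiplication only while the coordinates and the scale factor stay within the range -32768 to 32767. For the values used in the listing, such as 7 scaled by 4, the result is the expected 28.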















In the examples above, the statement



if (WaitForSingleObject (mythread, 0) == WAIT_OBJECT_0) 


is used, in which the Win32 API function WaitForSingleObject plays an important role. Called with a timeout of 0, the function checks whether the mythread
thread object is signaled, which happens when the thread terminates, and returns immediately in either case. If the thread is still running, control
passes to the next program statement, so the main thread can do useful work between checks. Such a design makes it possible
to run several threads without blocking the main thread.
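
The polling pattern is not specific to Win32. As a rough portable analogue (an assumption for illustration, not the chapter's code), `std::future::wait_for` with a zero timeout behaves much like `WaitForSingleObject` with a timeout of 0: it reports whether the task has finished and returns at once. The helper name `poll_until_done` is hypothetical.

```cpp
#include <chrono>
#include <future>

// Runs `work` asynchronously and polls for completion with a zero
// timeout, counting how many times the caller got control back while
// the task was still running.
template <typename Work>
int poll_until_done(Work work) {
    std::future<void> done = std::async(std::launch::async, work);
    int polls = 0;
    while (done.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
        ++polls;  // any useful operations would go here
    }
    done.get();  // propagate any exception thrown by the task
    return polls;
}
```

The returned count depends on timing and may well be zero for a task as short as scaling three integers. A loop that has nothing useful to do between checks only burns processor time, so in that case a blocking wait is usually the better choice.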


The window of the program is shown in Fig. 11.2.






Fig. 11.2: Window of the program demonstrating the use of the MMX extension for operations on vectors

When developing multimedia applications, the special
DirectX function libraries are widely used. Combining assembly language
with DirectX functions makes it possible to develop high-performance
applications. Although the assembly interface to DirectX functions has
a few distinct features, it is quite similar to ordinary function calls.





































