1.1 An Introduction to Computer Architecture
A full discussion of computer
architecture is far beyond the level of this text. Periodically,
we'll go into architectural matters, in order to
provide the conceptual underpinnings of the system under discussion.
However, if this sort of thing interests you, there are a great many
excellent texts on the topic. Perhaps the most commonly used are two
textbooks by John Hennessy and David Patterson: they are titled
Computer Organization and Design: The Hardware/Software
Interface and Computer Architecture: A
Quantitative Approach (both published by Morgan
Kaufmann).
In this section, we'll focus on the two most
important general concepts of architecture: the general means by
which we approach a problem (the levels of transformation), and the
essential model around which computers are designed.
1.1.1 Levels of Transformation
When we approach a problem, we
must reduce it to something that a computer can understand and work
with: this might be anything from a set of logic gates, solving the
fundamental problem of "How do we build a
general-purpose computing machine?" to a few million
bits worth of binary code. As we proceed through these logical steps,
we transform the problem into a
"simpler" one (at least from the
computer's point of view). These steps are the
levels of transformation.
1.1.1.1 Software: algorithms and languages
When
faced with a problem where we think a computer will be of assistance,
we first develop an algorithm for completing
the task in question. An algorithm is, very simply, a finite,
well-defined sequence of steps for performing a particular task -- for
example, a clerk inspecting and routing incoming mail follows an
algorithm for how to properly sort the mail.
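The mail-room example can be sketched in a few lines of Python. The "department" field on each envelope is a hypothetical stand-in for whatever the clerk actually inspects:

```python
def route_mail(envelopes):
    """A clerk's algorithm: inspect each envelope, place it in a bin."""
    bins = {}
    for envelope in envelopes:
        # Inspect this piece of mail and route it to the right bin.
        bins.setdefault(envelope["department"], []).append(envelope)
    return bins

routed = route_mail([
    {"department": "sales"},
    {"department": "legal"},
    {"department": "sales"},
])
```

The same fixed sequence of steps handles any pile of mail, which is exactly what makes it an algorithm.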
This algorithm must then be translated by
a programmer into a program written in a
language. Generally, this is a high-level language, such as C or
Perl, although it might be a low-level language, such as assembler.
The language layer exists to make our lives easier: the structure and
grammar of high-level languages lets us easily write complex
programs. This high-level language program, which is usually portable
between different systems, is then transformed by a compiler into the
low-level instructions required by a specific system. These
instructions are specified by the Instruction Set Architecture.
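We can watch one such transformation happen. Python is not compiled to a hardware ISA, but its interpreter performs an analogous lowering, and the standard `dis` module will display the simple, indivisible instructions a high-level expression is reduced to (a sketch by analogy only; the exact bytecode names vary between Python versions):

```python
import dis

# A high-level, portable description of a computation...
def fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# ...lowered into the step-by-step instructions the Python virtual
# machine actually executes -- analogous to a compiler emitting the
# low-level instructions defined by an ISA.
dis.dis(fahrenheit)
```

The high-level function is one line; the listing it produces is a sequence of loads, arithmetic operations, and a return, each of which does only one simple thing.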
1.1.1.2 The Instruction Set Architecture
The
Instruction Set Architecture, or ISA, is the
fundamental language of the microprocessor: it defines the basic,
indivisible instructions that we can execute. The ISA serves as the
interface between software and hardware. Examples of instruction set
architectures include IA-32, which is used by Intel and AMD CPUs;
MIPS, which is implemented in the Silicon Graphics/MIPS R-series
microprocessors (e.g., the R12000); and the SPARC V9 instruction set
used by the Sun Microsystems UltraSPARC series.
1.1.1.3 Hardware: microarchitecture, circuits, and devices
At this level, we are firmly in
the grasp of electrical and computer engineering. We concern
ourselves with functional units of microarchitecture and the
efficiency of our design. Below the microarchitectural level, we
worry about how to implement the functional units through circuit
design: the problems of electrical interference become very real. A
full discussion of the hardware layer is far beyond us here; tuning
the implementations of microprocessors is not something we are
generally able to do.
1.1.2 The von Neumann Model
The von Neumann model has served as the
basic design model for all modern computing systems: it provides a
framework upon which we can hang the abstractions and flesh generated
by the levels of transformation. The model
consists of four core components:
A memory
system, which stores both instructions and data. This is
known as a stored program computer. This
memory is accessed by means of the memory address
register (MAR), where the system puts the address of a
location in memory, and a memory data register
(MDR), where the memory subsystem puts the data stored at the
requested location. I discuss memory in more detail in Chapter 4.
At least one processing
unit, which performs calculations by means of its arithmetic
and logic unit (ALU). The processing unit is more commonly
called the central processing unit (CPU). It is
responsible for the execution of all instructions. The processor also
has a small amount of very fast storage space, called the
register file. I discuss processors in detail
in Chapter 3.
A control unit,
which is responsible for controlling cross-component operations. It
maintains a program counter, which contains the
address of the next instruction to be executed, and an
instruction register, which contains the current
instruction. The peculiarities of control design are beyond the scope
of this text.
The system needs a nonvolatile way to
store data, as well as ways to represent it to the user and to accept
input. This is the domain of the input/output
(I/O) subsystem. This book primarily concerns itself with disk drives
as a mechanism for I/O; I discuss them in Chapter 5. I also discuss network I/O in Chapter 7.
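To make the interplay of these components concrete, here is a toy von Neumann machine sketched in Python. The instruction set (LOAD, ADD, STORE, HALT) is invented purely for illustration; the point is that a single memory holds both the program and its data, and that the program counter, instruction register, MAR, and MDR drive every step:

```python
def run(memory):
    """Execute a program stored in the same memory as its data."""
    pc = 0                        # program counter: address of next instruction
    acc = 0                       # the processing unit's accumulator register
    while True:
        mar = pc                  # memory address register: where to fetch
        mdr = memory[mar]         # memory data register: what came back
        ir = mdr                  # instruction register: current instruction
        pc += 1
        op, operand = ir
        if op == "LOAD":          # acc = memory[operand]
            acc = memory[operand]
        elif op == "ADD":         # acc = acc + memory[operand]
            acc += memory[operand]
        elif op == "STORE":       # memory[operand] = acc
            memory[operand] = acc
        elif op == "HALT":
            return memory

# Instructions occupy locations 0-3; data occupies locations 4-6.
program = [
    ("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None),
    2, 3, 0,
]
memory = run(program)             # memory[6] now holds 2 + 3
```

Because instructions and data share one memory, this is a stored program computer in exactly the sense described above.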
Despite all the advances in computing over the last sixty years, modern
systems still fit into this framework. That is a very powerful statement:
despite the fact that computers are orders of magnitude faster now,
and being used in ways that weren't even imaginable
at the end of the Second World War, the basic ideas, as formulated by
von Neumann and his colleagues, are still applicable today.
1.1.3 Caches and the Memory Hierarchy
As you'll see in
Section 1.2 later in this chapter,
one of the principles of performance tuning is that there are
always trade-offs. This problem was recognized
by the pioneers in the field, and we still do not have a perfect
solution today. In the case of data storage, we are often presented
with the choice between cost, speed, and size. (Physical parameters,
such as heat dissipation, also play a role, but for this discussion,
they're usually subsumed into the other variables.)
It is possible to build extremely large, extremely fast memory
systems -- for example, the Cray-1S supercomputer used very fast
static RAM exclusively for memory. This is not
something that can be adapted across the spectrum of computing
devices.
The problem we are trying to solve is that storage capacity tends to be
inversely proportional to performance: the larger a storage device, the
slower it is relative to the level above it in price/performance. A
modern microprocessor might have a cycle time measured in fractions of a
nanosecond, while a trip to main memory can easily be fifty times slower.
To work around this problem, we employ something known as the
memory hierarchy. It is based on arranging
storage areas into a pyramid (Figure 1-1). At the top of
the pyramid, we have very small areas of storage that are exceedingly
fast. As we progress down the pyramid, storage becomes increasingly
slow, but correspondingly larger. At the foundation of the pyramid,
we might have storage in a tape library: many terabytes, but it might
take minutes to access the information we are looking for.
From the point of view of the microprocessor, main memory is very
slow. Anything that makes us go to main memory is bad -- unless
we're going to main memory to prevent going to an
even slower storage medium (such as disk).
The function of the
pyramid is to cache the most frequently used data and instructions in
the higher levels. For example, if we keep accessing the same file on
tape, we might want to store a temporary copy on the next fastest
level of storage (disk). We can similarly store a file we keep
accessing from disk in main memory, taking advantage of main
memory's substantial performance benefit over disk.
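The same idea can be sketched in a few lines of Python: a small, fast level sits in front of a large, slow one, and the most recently used items are kept in the fast level. The capacity and the eviction policy here are illustrative choices, not anything prescribed by the hierarchy itself:

```python
class CachedStore:
    """A small, fast cache in front of a large, slow backing store."""

    def __init__(self, backing, capacity=3):
        self.backing = backing     # the slow, capacious level (e.g., disk)
        self.capacity = capacity   # the fast level is deliberately small
        self.cache = {}            # dict insertion order doubles as recency order
        self.hits = self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache[key] = self.cache.pop(key)       # mark recently used
        else:
            self.misses += 1                            # pay the slow access
            if len(self.cache) >= self.capacity:
                self.cache.pop(next(iter(self.cache)))  # evict the oldest
            self.cache[key] = self.backing[key]
        return self.cache[key]
```

After the first access, repeated reads of the same key are served entirely from the fast level, just as a frequently read tape file would be staged onto disk.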
1.1.4 The Benefits of a 64-Bit Architecture
Companies that produce computer hardware and software often make a
point of mentioning the size of their systems'
address space (typically 32 or 64 bits). In the last five years, the
shift from 32-bit to 64-bit microprocessors and operating systems has
caused a great deal of hype to be generated by various marketing
departments. The truth is that although in certain cases 64-bit
architectures run significantly faster than 32-bit architectures, in
general, performance is equivalent.
1.1.4.1 What does it mean to be 64-bit?
The number of
"bits" refers to the width of a
data path; what this actually means depends on the
context. For example, we might refer to a 16-bit data path (for
example, UltraSCSI). This means that the interconnect can transfer 16
bits of information at a time. With all other things held constant,
it would be twice as fast as an interconnect with an 8-bit data path.
The
"bitness" of a memory system refers
to how many wires are used to transfer a memory address. For example,
if we had an 8-bit path to the memory address, and we wanted the 19th
location in memory, we would turn on the appropriate wires (1, 2, and
5; we derive this from writing 19 in binary, which gives
00010011 -- everywhere there is a one, we turn on that wire).
Note, however, that since we only have 8 bits worth of addressing, we
are limited to 256 (2^8) addresses in
memory. 32-bit systems are, therefore, limited to 4,294,967,296
(2^32) locations in memory. Since memory is
typically accessible in 1-byte blocks, this means that the system
can't directly access more than 4 GB of memory. The
shift to 64-bit operating systems and hardware means that the maximum
amount of addressable memory is about 16 exabytes (2^64 bytes),
which is probably sufficient for the foreseeable future.
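The arithmetic here is easy to check; the wire numbering below follows the same 1-indexed convention as the example above:

```python
def max_locations(address_bits):
    """How many distinct locations n address wires can name."""
    return 2 ** address_bits

def wires_on(address):
    """The 1-indexed wire numbers driven high for a given address."""
    return [i + 1 for i in range(address.bit_length()) if (address >> i) & 1]

# Address 19 is 00010011 in binary, so wires 1, 2, and 5 are driven.
# 8 address bits reach 256 locations; 32 bits reach 4 GB of
# byte-addressable memory; 64 bits reach 16 exabytes.
```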
Unfortunately,
it's often not quite this simple in practice. A
32-bit SPARC system is actually capable of having more than 4 GB of
memory installed, but, in Solaris, no single process can use more
than 4 GB. This is because the hardware that controls memory
management actually uses a 44-bit addressing scheme, but the Solaris
operating system can only give any one process the amount of memory
addressable in 32 bits.
1.1.4.2 Performance ramifications
The
change from 32-bit to 64-bit architectures, then, expanded the size
of main memory and the amount of memory a single process can have. An
obvious question is, how did applications benefit from this? Here are
some kinds of applications that benefited from larger memory spaces:
Applications that could not use the most time-efficient algorithm for
a problem because that algorithm would use more than 4 GB of memory.
Applications where caching large data sets is critically important,
and therefore the more memory available to the process, the more can
be cached.
Applications where the system is short on memory due to overwhelming
utilization (many small processes). Note that in SPARC systems, this
was not a problem: each process could only see 4 GB, but the system
could have much more installed.
In general, the biggest winners from 64-bit systems are
high-performance computing and corporate database engines. For the
average desktop workstation, 32 bits is plenty.
Unfortunately,
the change to 64-bit systems also meant that the underlying operating
system and system calls needed to be modified, which sometimes
resulted in a slight slowdown (for example, more data needs to be
manipulated during pointer operations). This means that there may be
a very slight performance penalty associated with running in 64-bit
mode.
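The pointer-size cost is easy to observe from a running program; the standard `ctypes` module reports the width of a native pointer on the current build:

```python
import ctypes

pointer_bytes = ctypes.sizeof(ctypes.c_void_p)  # 4 on 32-bit, 8 on 64-bit
address_bits = pointer_bytes * 8
max_bytes = 2 ** address_bits    # size of the flat address space

# Every pointer a 64-bit program stores or copies is twice as wide as
# its 32-bit counterpart -- the source of the slight slowdown noted above.
```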