1.1 An Introduction to Computer Architecture
A full discussion of computer
architecture is far beyond the level of this text. Periodically,
we'll go into architectural matters, in order to
provide the conceptual underpinnings of the system under discussion.
However, if this sort of thing interests you, there are a great many
excellent texts on the topic. Perhaps the most commonly used are two
textbooks by John Hennessy and David Patterson: they are titled
Computer Organization and Design: The Hardware/Software
Interface and Computer Architecture: A
Quantitative Approach (both published by Morgan
Kaufmann).
In this section, we'll focus on the two most
important general concepts of architecture: the general means by
which we approach a problem (the levels of transformation), and the
essential model around which computers are designed.
1.1.1 Levels of Transformation
When we approach a problem, we
must reduce it to something that a computer can understand and work
with: this might be anything from a set of logic gates, solving the
fundamental problem of "How do we build a
general-purpose computing machine?" to a few million
bits worth of binary code. As we proceed through these logical steps,
we transform the problem into a
"simpler" one (at least from the
computer's point of view). These steps are the
levels of transformation.
1.1.1.1 Software: algorithms and languages
When
faced with a problem where we think a computer will be of assistance,
we first develop an algorithm for completing
the task in question. An algorithm is, very simply, a finite,
well-defined sequence of steps for performing a particular task -- for
example, a clerk inspecting and routing incoming mail follows an
algorithm for how to properly sort the mail.
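The mail-room example can be sketched in a few lines of Python. The "department" field on each envelope is a hypothetical stand-in for whatever the clerk actually inspects:

```python
def route_mail(envelopes):
    """A clerk's algorithm: inspect each envelope, place it in a bin."""
    bins = {}
    for envelope in envelopes:
        # Inspect this piece of mail and route it to the right bin.
        bins.setdefault(envelope["department"], []).append(envelope)
    return bins

routed = route_mail([
    {"department": "sales"},
    {"department": "legal"},
    {"department": "sales"},
])
```

The same fixed sequence of steps handles any pile of mail, which is exactly what makes it an algorithm.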
This algorithm must then be translated by
a programmer into a program written in a
language. Generally, this is a high-level language, such as C or
Perl, although it might be a low-level language, such as assembler.
The language layer exists to make our lives easier: the structure and
grammar of high-level languages lets us easily write complex
programs. This high-level language program, which is usually portable
between different systems, is then transformed by a compiler into the
low-level instructions required by a specific system. These
instructions are specified by the Instruction Set Architecture.
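We can watch one such transformation happen. Python is not compiled to a hardware ISA, but its interpreter performs an analogous lowering, and the standard `dis` module will display the simple, indivisible instructions a high-level expression is reduced to (a sketch by analogy only; the exact bytecode names vary between Python versions):

```python
import dis

# A high-level, portable description of a computation...
def fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# ...lowered into the step-by-step instructions the Python virtual
# machine actually executes -- analogous to a compiler emitting the
# low-level instructions defined by an ISA.
dis.dis(fahrenheit)
```

The high-level function is one line; the listing it produces is a sequence of loads, arithmetic operations, and a return, each of which does only one simple thing.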
1.1.1.2 The Instruction Set Architecture
The
Instruction Set Architecture, or ISA, is the
fundamental language of the microprocessor: it defines the basic,
indivisible instructions that we can execute. The ISA serves as the
interface between software and hardware. Examples of instruction set
architectures include IA-32, which is used by Intel and AMD CPUs;
MIPS, which is implemented in the Silicon Graphics/MIPS R-series
microprocessors (e.g., the R12000); and the SPARC V9 instruction set
used by the Sun Microsystems UltraSPARC series.
1.1.1.3 Hardware: microarchitecture, circuits, and devices
At this level, we are firmly in
the grasp of electrical and computer engineering. We concern
ourselves with functional units of microarchitecture and the
efficiency of our design. Below the microarchitectural level, we
worry about how to implement the functional units through circuit
design: the problems of electrical interference become very real. A
full discussion of the hardware layer is far beyond us here; tuning
the implementations of microprocessors is not something we are
generally able to do.
1.1.2 The von Neumann Model
The von Neumann model has served as the
basic design model for all modern computing systems: it provides a
framework upon which we can hang the abstractions and flesh generated
by the levels of transformation. The model
consists of four core components:
A memory
system, which stores both instructions and data. This is
known as a stored program computer. This
memory is accessed by means of the memory address
register (MAR), where the system puts the address of a
location in memory, and a memory data register
(MDR), where the memory subsystem puts the data stored at the
requested location. I discuss memory in more detail in Chapter 4.
At least one processing
unit, which performs calculations by means of its arithmetic
and logic unit (ALU). The processing unit is more commonly
called the central processing unit (CPU). It is
responsible for the execution of all instructions. The processor also
has a small amount of very fast storage space, called the
register file. I discuss processors in detail
in Chapter 3.
A control unit,
which is responsible for controlling cross-component operations. It
maintains a program counter, which contains the
address of the next instruction to be executed, and an
instruction register, which contains the current
instruction. The peculiarities of control design are beyond the scope
of this text.
The system needs a nonvolatile way to
store data, as well as ways to represent it to the user and to accept
input. This is the domain of the input/output
(I/O) subsystem. This book primarily concerns itself with disk drives
as a mechanism for I/O; I discuss them in Chapter 5. I also discuss network I/O in Chapter 7.
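To make the interplay of these components concrete, here is a toy von Neumann machine sketched in Python. The instruction set (LOAD, ADD, STORE, HALT) is invented purely for illustration; the point is that a single memory holds both the program and its data, and that the program counter, instruction register, MAR, and MDR drive every step:

```python
def run(memory):
    """Execute a program stored in the same memory as its data."""
    pc = 0                        # program counter: address of next instruction
    acc = 0                       # the processing unit's accumulator register
    while True:
        mar = pc                  # memory address register: where to fetch
        mdr = memory[mar]         # memory data register: what came back
        ir = mdr                  # instruction register: current instruction
        pc += 1
        op, operand = ir
        if op == "LOAD":          # acc = memory[operand]
            acc = memory[operand]
        elif op == "ADD":         # acc = acc + memory[operand]
            acc += memory[operand]
        elif op == "STORE":       # memory[operand] = acc
            memory[operand] = acc
        elif op == "HALT":
            return memory

# Instructions occupy locations 0-3; data occupies locations 4-6.
program = [
    ("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None),
    2, 3, 0,
]
memory = run(program)             # memory[6] now holds 2 + 3
```

Because instructions and data share one memory, this is a stored program computer in exactly the sense described above.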
Despite all the advances in computing over the last sixty years, modern
systems still fit into this framework. That is a very powerful statement:
despite the fact that computers are orders of magnitude faster now,
and being used in ways that weren't even imaginable
at the end of the Second World War, the basic ideas, as formulated by
von Neumann and his colleagues, are still applicable today.
1.1.3 Caches and the Memory Hierarchy
As you'll see in
Section 1.2 later in this chapter,
one of the principles of performance tuning is that there are
always trade-offs. This problem was recognized
by the pioneers in the field, and we still do not have a perfect
solution today. In the case of data storage, we are often presented
with the choice between cost, speed, and size. (Physical parameters,
such as heat dissipation, also play a role, but for this discussion,
they're usually subsumed into the other variables.)
It is possible to build extremely large, extremely fast memory
systems -- for example, the Cray-1S supercomputer used very fast
static RAM exclusively for memory. This is not
something that can be adapted across the spectrum of computing
devices.
The problem we are trying to solve is that storage capacity tends to be
inversely proportional to performance: the larger a storage device, the
slower it is relative to the level above it in price/performance. A
modern microprocessor might have a cycle time measured in fractions of a
nanosecond, while a trip to main memory can easily be fifty times slower.
To work around this problem, we employ something known as the
memory hierarchy. It is based on arranging
storage areas into a pyramid (Figure 1-1). At the top of
the pyramid, we have very small areas of storage that are exceedingly
fast. As we progress down the pyramid, storage becomes increasingly
slow, but correspondingly larger. At the foundation of the pyramid,
we might have storage in a tape library: many terabytes, but it might
take minutes to access the information we are looking for.
From the point of view of the microprocessor, main memory is very
slow. Anything that makes us go to main memory is bad -- unless
we're going to main memory to prevent going to an
even slower storage medium (such as disk).
The function of the
pyramid is to cache the most frequently used data and instructions in
the higher levels. For example, if we keep accessing the same file on
tape, we might want to store a temporary copy on the next fastest
level of storage (disk). We can similarly store a file we keep
accessing from disk in main memory, taking advantage of main
memory's substantial performance benefit over disk.
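The same idea can be sketched in a few lines of Python: a small, fast level sits in front of a large, slow one, and the most recently used items are kept in the fast level. The capacity and the eviction policy here are illustrative choices, not anything prescribed by the hierarchy itself:

```python
class CachedStore:
    """A small, fast cache in front of a large, slow backing store."""

    def __init__(self, backing, capacity=3):
        self.backing = backing     # the slow, capacious level (e.g., disk)
        self.capacity = capacity   # the fast level is deliberately small
        self.cache = {}            # dict insertion order doubles as recency order
        self.hits = self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache[key] = self.cache.pop(key)       # mark recently used
        else:
            self.misses += 1                            # pay the slow access
            if len(self.cache) >= self.capacity:
                self.cache.pop(next(iter(self.cache)))  # evict the oldest
            self.cache[key] = self.backing[key]
        return self.cache[key]
```

After the first access, repeated reads of the same key are served entirely from the fast level, just as a frequently read tape file would be staged onto disk.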
1.1.4 The Benefits of a 64-Bit Architecture
Companies that produce computer hardware and software often make a
point of mentioning the size of their systems'
address space (typically 32 or 64 bits). In the last five years, the
shift from 32-bit to 64-bit microprocessors and operating systems has
caused a great deal of hype to be generated by various marketing
departments. The truth is that although in certain cases 64-bit
architectures run significantly faster than 32-bit architectures, in
general, performance is equivalent.
1.1.4.1 What does it mean to be 64-bit?
The number of
"bits" refers to the width of a
data path; what this actually means depends on the
context. For example, we might refer to a 16-bit data path (for
example, UltraSCSI). This means that the interconnect can transfer 16
bits of information at a time. With all other things held constant,
it would be twice as fast as an interconnect with an 8-bit data path.
The
"bitness" of a memory system refers
to how many wires are used to transfer a memory address. For example,
if we had an 8-bit path to the memory address, and we wanted the 19th
location in memory, we would turn on the appropriate wires (1, 2, and
5; we derive this from writing 19 in binary, which gives
00010011 -- everywhere there is a one, we turn on that wire).
Note, however, that since we only have 8 bits worth of addressing, we
are limited to 256 (2^8) addresses in
memory. 32-bit systems are, therefore, limited to 4,294,967,296
(2^32) locations in memory. Since memory is
typically accessible in 1-byte blocks, this means that the system
can't directly access more than 4 GB of memory. The
shift to 64-bit operating systems and hardware means that the maximum
amount of addressable memory is about 16 exabytes (2^64 bytes),
which is probably sufficient for the foreseeable future.
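The arithmetic here is easy to check; the wire numbering below follows the same 1-indexed convention as the example above:

```python
def max_locations(address_bits):
    """How many distinct locations n address wires can name."""
    return 2 ** address_bits

def wires_on(address):
    """The 1-indexed wire numbers driven high for a given address."""
    return [i + 1 for i in range(address.bit_length()) if (address >> i) & 1]

# Address 19 is 00010011 in binary, so wires 1, 2, and 5 are driven.
# 8 address bits reach 256 locations; 32 bits reach 4 GB of
# byte-addressable memory; 64 bits reach 16 exabytes.
```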
Unfortunately,
it's often not quite this simple in practice. A
32-bit SPARC system is actually capable of having more than 4 GB of
memory installed, but, in Solaris, no single process can use more
than 4 GB. This is because the hardware that controls memory
management actually uses a 44-bit addressing scheme, but the Solaris
operating system can only give any one process the amount of memory
addressable in 32 bits.
1.1.4.2 Performance ramifications
The
change from 32-bit to 64-bit architectures, then, expanded the size
of main memory and the amount of memory a single process can have. An
obvious question is, how did applications benefit from this? Here are
some kinds of applications that benefited from larger memory spaces:
Applications that could not use the most time-efficient algorithm for
a problem because that algorithm would use more than 4 GB of memory.
Applications where caching large data sets is critically important,
and therefore the more memory available to the process, the more can
be cached.
Applications where the system is short on memory due to overwhelming
utilization (many small processes). Note that in SPARC systems, this
was not a problem: each process could only see 4 GB, but the system
could have much more installed.
In general, the biggest winners from 64-bit systems are
high-performance computing and corporate database engines. For the
average desktop workstation, 32 bits is plenty.
Unfortunately,
the change to 64-bit systems also meant that the underlying operating
system and system calls needed to be modified, which sometimes
resulted in a slight slowdown (for example, more data needs to be
manipulated during pointer operations). This means that there may be
a very slight performance penalty associated with running in 64-bit
mode.
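The pointer-size cost is easy to observe from a running program; the standard `ctypes` module reports the width of a native pointer on the current build:

```python
import ctypes

pointer_bytes = ctypes.sizeof(ctypes.c_void_p)  # 4 on 32-bit, 8 on 64-bit
address_bits = pointer_bytes * 8
max_bytes = 2 ** address_bits    # size of the flat address space

# Every pointer a 64-bit program stores or copies is twice as wide as
# its 32-bit counterpart -- the source of the slight slowdown noted above.
```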