Programmer's Life: NFS: You'll Never Find Your Stuff

NFS: You’ll Never Find Your Stuff

If your computer is on a LAN, the computer is probably set up to share files with other computers. Quite a few different schemes enable computers to use files on other machines. These schemes are named mostly with TLAs (Three Letter Acronyms) such as AFS, RFS, and NFS. This chapter talks mostly about NFS (you’ll Never Find your Stuff) because that’s the most commonly used scheme, even though it works, in many ways, the worst. If you didn’t like the C shell or the vi editor, you won’t like NFS either; it also was written by Bill, the big guy with the strong opinions.

What’s NFS?

The NFS (Network File System) program enables you to treat files on another computer in more or less the same way you treat files on your own computer.

You may want to use NFS for several reasons:

Often you have a bunch of similar computers scattered around, all running more or less the same programs. Rather than load every program on every computer, the system administrator loads one copy of everything on one computer (the server) so that all the other computers (the clients) can share the programs.

Centralizing the files on a server makes backup and administration easier. Administering one disk of 4,000MB is easier than administering 10 disks of 400MB apiece. Backing up everything is also easier because everything is all in one place rather than spread around on a dozen machines.

Another use of NFS is to make a bunch of workstations function as a shared time-sharing system. Setting up a bunch of workstations so that you can sit down at any one of them, log in, and use the same set of files regardless of where on the network they physically reside is reasonably straightforward. This capability is a great convenience. Also, by using programs such as ssh (discussed earlier in this chapter), you can log in to another machine on the network and work from that machine (which is handy if the other machine is faster than yours or has some special feature you want to use).

NFS works in heterogeneous networks, a fancy term for networks with different kinds of computers. NFS is available for all sorts of computers, from PCs to mainframes.

We discuss the technical theology of remote file access here. Still reading? Geez, what a glutton for punishment.

The communication between the server (the machine with the files) and the client (the machine that wants to use them) is handled in two general ways: One approach is known as stateless, and the other (for lack of a better word) is called stateful.

The stateful approach is more straightforward: The two machines have a conversation, the gist of which runs something like this:

“I want to read a file called /usr/elvis/current-whereabouts.”

“Very good, sir — an excellent choice.”

“Can I have the first piece of that file I just asked about?”

“Certainly, sir. It’s so-and-so.”

“Thank you so much. May I have the next piece?”

“My pleasure. It’s such-and-such.”

The only problem in this example occurs if one or the other machine crashes during the conversation. When it comes back, the server has no recollection of what it was talking about, the conversation cannot be reestablished easily, and all sorts of special recovery schemes are necessary to get things back in sync. (“Beg pardon, old boy, I’ve had a spot of amnesia. Can you remind me what we were chatting about?”)

Back when Bill was writing NFS, he didn’t feel like writing all that recovery code (it’s difficult to write and boring, to boot) so he made NFS stateless. This decision gave NFS a severe case of amnesia on the part of all the servers. Rather than keep track of which client is asking for which file, NFS couldn’t care less. The NFS servers don’t have the faintest idea who their clients are, and they forget everything about a client from one request to the next. The conversation goes more like this:

“I want to read /usr/elvis/current-whereabouts.”

“It’s all the same to me. On my disk, it’s file number 86345.”

“Send me the first piece of file 86345.”

“Well, okay, if you insist. It’s so-and-so.”

“Send me the second piece of file 86345.”

“Who the heck are you? Hardly matters — I wouldn’t remember, even if you told me. Anyway, the answer is such-and-such.”
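For the curious, the stateless conversation above can be sketched as a toy Python model. This is only an illustration, not how any real NFS server is written: the class name, the method names, and the file contents are all made up, and only the path and the file number come from the dialogue. The point is that every read request carries the file number, the starting position, and the length, so the server never needs to remember who asked.

```python
class StatelessServer:
    """Toy server that keeps no per-client state between requests (hypothetical)."""

    def __init__(self, files_by_number, numbers_by_path):
        self._files = files_by_number      # file number -> contents
        self._numbers = numbers_by_path    # path -> file number

    def lookup(self, path):
        # Translate a name to a file number; nothing about the caller is stored.
        return self._numbers[path]

    def read(self, file_number, offset, count):
        # The request itself says which file, where to start, and how much to send.
        return self._files[file_number][offset:offset + count]


server = StatelessServer(
    {86345: b"so-and-so such-and-such"},
    {"/usr/elvis/current-whereabouts": 86345},
)

handle = server.lookup("/usr/elvis/current-whereabouts")
first = server.read(handle, 0, 9)      # "first piece"
second = server.read(handle, 10, 13)   # "second piece"
repeat = server.read(handle, 0, 9)     # asking again gives the same answer
```

Because the server holds nothing between calls, a freshly restarted copy with the same disk contents would answer the second read exactly the same way.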

The advantage here is that, if the server crashes, when it comes back up, the server can pick up where it left off. Because the server didn’t know anything about its clients anyway, it doesn’t forget anything. The disadvantage is that determining whether a request got lost or, because of network glitchery, got handled twice is difficult. In a stateful setup, figuring out what happened is easier: Every message has a number. If messages 106 and 108 arrive without 107 between them, you know that something got lost. Because stateless messages don’t have numbers (it wouldn’t matter if they did, because the stateless server doesn’t remember the number from one message to the next), you have no way to tell whether a message got lost. In practice, if a client doesn’t get an answer to a request within a few seconds, it repeats the request because NFS requests are supposed to be idempotent (this 25-cent word means that it doesn’t hurt if the server does them more than once).

Most requests are indeed idempotent (whether you write the same stuff to the same part of a file twice in a row doesn’t matter), but not all of them are. If the request was something like “delete the furble file” and the server in fact received the request but lost the response, the second time the client sends the request, the server complains that the file is not there and sends back an error (even though, from the client’s point of view, the file was there when it asked to delete it). Are you confused yet? We certainly are.
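If you want to see the furble problem in slow motion, here is a small Python sketch. Everything in it is hypothetical (the ToyServer class, the remove_with_retry function, and the way the lost reply is faked); it only imitates the sequence described above: the server does the delete, the reply vanishes, and the client innocently repeats the request.

```python
class ToyServer:
    """Hypothetical server that only knows a set of file names."""

    def __init__(self, names):
        self.names = set(names)

    def remove(self, name):
        if name not in self.names:
            raise FileNotFoundError(name)
        self.names.remove(name)
        return "ok"


def remove_with_retry(server, name, lose_first_reply=True):
    """Send a delete request; if the reply is lost, send the same request again."""
    reply = server.remove(name)        # the server really does the work
    if lose_first_reply:
        reply = None                   # pretend the network dropped the reply
    if reply is None:
        try:
            reply = server.remove(name)    # the client repeats the request
        except FileNotFoundError:
            reply = "error: no such file"
    return reply


server = ToyServer({"furble"})
outcome = remove_with_retry(server, "furble")
```

The file is gone, which is what the client wanted, yet the only answer the client ever sees is an error.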

More complex sequences of repeated and lost messages can cause the contents of a file to be thrown away by mistake. (No, we don’t go into detail — we know that you have already stopped reading this part.) Fortunately, such sequences are rare, although they have been known to happen.

NFS doesn’t handle tapes, printers, and the like because even Bill couldn’t figure out how to make an idempotent printer — one in which printing a page twice was the same thing as printing it once. Perhaps he could have used transparent ink.

Ignoring NFS

Except when NFS screws up, you don’t have to worry about using it. Your system administrator did all the hard work when she installed it.

Files passed over the network act almost exactly like those on the local machine; in most cases, you can treat them the same. The primary difference is that access to files through NFS is about twice as slow as access to files on the local machine. This problem usually isn’t a big one because, for most of the stuff you do, the machine doesn’t spend much time waiting for the disk anyway.

Tip: When you do something really big and slow (such as repaginate a 500-page document), seeing whether you can log in to the machine on which the files reside and run the program there may be worth the time.

Where are those files, anyway?

NFS works by mounting remote directories. Mounting means pretending that a directory on another disk or even on another computer is actually part of the directory system on your disk. Files stored in lots of different places can then appear to be nicely organized into one tree-structure directory.

Whenever UNIX sees the name of a directory (/stars/elvis, for example), it checks to see whether any of the directories in that name are mount points, which are directories at which another disk is logically attached.

Your system may have the directory /stars mounted from some other machine, for example, and then the directory elvis and all the files in it reside on the other machine.
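The mount-point check can be pictured with a few lines of Python. This is a rough sketch, not what the kernel actually does: the find_mount function and the mount table are invented for illustration, and othermachine is a made-up host name. The idea is simply to find the longest mount point that is a leading part of the path.

```python
def find_mount(path, mount_table):
    """Return (mount point, source) for the longest mount point containing path."""
    parts = path.rstrip("/").split("/")
    # Try the whole path first, then drop one trailing component at a time.
    for end in range(len(parts), 0, -1):
        candidate = "/".join(parts[:end]) or "/"
        if candidate in mount_table:
            return candidate, mount_table[candidate]
    raise LookupError(path)


# Hypothetical mount table: / is local, /stars comes from another machine.
mounts = {
    "/": "local disk",
    "/stars": "othermachine:/stars",
}

elvis = find_mount("/stars/elvis", mounts)
passwd = find_mount("/etc/passwd", mounts)
```

Anything under /stars lands on the other machine, and everything else falls through to the local disk mounted at /.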

The easiest way to tell which files are where is with the df (Disk Free space) command. It prints the amount of free space on every disk and tells you where the disks are. Here’s a typical piece of df output:

Filesystem              kbytes    used   avail capacity Mounted on
/dev/sd0a                30383    6587   20758      24% /
/dev/sd0g               157658  124254   17639      88% /usr
/dev/sd0h               364378  261795   66146      80% /home
/dev/sd3a                15671    1030   13074       7% /tmp
/dev/sd3g              1175742  758508  299660      72% /mnt
srvsys:/usr/spool/mail  300481  190865   79567      71% /var/spool/mail
srvsys:/usr/lib/news    300481  190865   79567      71% /usr/lib/news

In this example, the directory / resides on a local disk (a disk on your own computer) named /dev/sd0a; /usr resides on /dev/sd0g; /home resides on /dev/sd0h; and so on. (We don’t go into the subject of disk names other than to say that anything in /dev is on the local machine.) The directory /var/spool/mail is really the directory /usr/spool/mail on machine srvsys, and /usr/lib/news is really /usr/lib/news on machine srvsys.
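If you would rather let the computer do the squinting, a short Python sketch can sort the df lines into local and remote. It assumes output shaped like the sample above (six columns, with remote entries written as machine:directory); the classify function and its tuple format are our own invention, and real df output varies from system to system.

```python
DF_OUTPUT = """\
Filesystem  kbytes    used   avail capacity Mounted on
/dev/sd0a    30383    6587   20758  24%   /
/dev/sd0g   157658  124254   17639  88%   /usr
srvsys:/usr/spool/mail  300481  190865  79567   71%   /var/spool/mail
srvsys:/usr/lib/news 300481  190865  79567 71% /usr/lib/news
"""


def classify(df_text):
    """Map each mounted directory to ("local", None, disk) or ("remote", host, dir)."""
    result = {}
    for line in df_text.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) != 6:
            continue                           # ignore lines we don't understand
        source, mounted_on = fields[0], fields[5]
        if ":" in source and not source.startswith("/"):
            host, remote_dir = source.split(":", 1)
            result[mounted_on] = ("remote", host, remote_dir)
        else:
            result[mounted_on] = ("local", None, source)
    return result


where = classify(DF_OUTPUT)
```

Looking up where["/var/spool/mail"] then tells you the machine and the directory the files really live in.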

Some of the local directory names are the same as the remote machine’s directory names — and some aren’t. This situation can and often does cause considerable confusion; unfortunately, it’s usually unavoidable. A system administrator with any sense at least mounts each directory with a consistent name wherever it’s mounted so that /var/documents/bigproject is the same no matter which computer you’re working on.

A database known as NIS (Network Information Service) makes it easier to keep the naming straight. Don’t worry about it unless your system administrator messes up.

NFS and system crashes

What happens if you’re working with NFS, your files are stored on a server, and the server crashes? The answer is, you wait. Eventually, when the server comes back, you continue from where you left off. If the crash is severe, you may wait a long time. In one extreme case (so we have heard), a program on an NFS client system waited more than six months while the server crashed, was dismantled and shipped back to the manufacturer, and then was refurbished, shipped back, reloaded from tape, and rebooted — at which point the client program continued. You probably won’t be so patient.

The worst practical problem is that, if a program stalls while it is waiting for a dead NFS server, you have no way to stop or kill the program, short of rebooting your UNIX computer.

Recent versions of NFS offer a choice between soft and hard mounts (not as indecent as they sound, but close). With a hard mount, the client keeps retrying for as long as it takes; with a soft mount, the client gives up after a while and hands the program an error, which makes it possible to stop a program that has stalled while waiting for a dead server. The problem is that, if a server is merely slow and not dead (and believe us, a server loaded with hundreds of clients can be impressively slow), a client may assume that the server is dead and stop a program. Had the client been a little more patient, the server would have responded, and the program could have completed its task.
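The trade-off between patience and giving up can be imitated in Python. This is a toy model under loud assumptions: the request function, the retry_limit parameter, and the notion of a server that replies on a particular attempt are all hypothetical, and no real NFS option names are used. A limit of None plays the part of a hard mount (keep trying forever), and a number plays the part of a soft mount (give up with an error).

```python
import itertools


def request(server_replies_on, retry_limit=None):
    """Retry until the toy server answers, or give up after retry_limit attempts.

    server_replies_on: the attempt number on which the server finally answers,
                       or None for a server that never answers.
    retry_limit: None never gives up (and so hangs forever on a dead server);
                 a number gives up after that many attempts.
    """
    if retry_limit is None:
        attempts = itertools.count(1)
    else:
        attempts = range(1, retry_limit + 1)
    for attempt in attempts:
        if server_replies_on is not None and attempt >= server_replies_on:
            return f"answer after {attempt} attempts"
    raise TimeoutError("server not responding; giving up")


slow_but_alive = 5                       # the server answers on the fifth try
patient_result = request(slow_but_alive)
try:
    impatient_result = request(slow_but_alive, retry_limit=3)
except TimeoutError as error:
    impatient_result = f"error: {error}"
```

The patient client gets its answer on the fifth attempt; the impatient one quits after three and reports an error, even though the server was only slow.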


Monday, October 26, 2009
