18.5. Checksums

A checksum is a redundant field used by network protocols to recognize transmission errors. Some checksums… The idea behind a checksum is simple. Before transmitting a packet, the sender computes a small, fixed-length field (the checksum) containing a sort of hash of the data. If a few bits of the data change during transit, the corrupted data will most likely produce a different checksum. Depending on the function used to produce the checksum, it provides different levels of reliability. The checksum used by the IP protocol is a simple one involving sums and one's complements, which is too weak to be considered reliable. For a more reliable sanity check, you must rely on L2 CRCs or on SSL/IPSec Message Authentication Codes (MACs).

Different protocols can use different checksum algorithms. The IP protocol checksum covers only the IP header, whereas most L4 protocols' checksums cover both their header and the data.

It may seem redundant to have a checksum at L2 (e.g., Ethernet), another at L3 (e.g., IP), and another at L4 (e.g., TCP), because they often apply to overlapping portions of data, but the checks are valuable. Errors can occur not only during transmission, but also while data is being moved between layers. Moreover, each protocol is responsible for ensuring its own correct transmission and cannot assume that the layers above or below it take on that task.

As an example of the complex scenarios that can arise, imagine that PC A in LAN1 sends data over the Internet to PC B in LAN2. Suppose also that the L2 protocol used in LAN1 uses a checksum but the one on LAN2 doesn't. Frames crossing LAN2 then have no L2 protection, so it's important for at least one higher layer to provide some form of checksum to reduce the likelihood of accepting corrupted data. The use of a checksum is recommended in every protocol definition, although it is not required.
Nevertheless, one has to admit that a better design of related protocols could remove some of the overhead imposed by features that overlap in protocols at different layers. Because most L2 and L4 protocols provide checksums, having one at L3 as well is not strictly necessary. For exactly this reason, the checksum has been removed from IPv6.

In IPv4, the IP checksum is a 16-bit field that covers the entire IP header, options included. The checksum is first computed by the source of the packet and is updated hop by hop all the way to its destination to reflect the changes each router applies to the header (at a minimum, every router decrements the TTL). Before updating the checksum, each hop first has to check the sanity of the packet by comparing the checksum included in the packet with the one computed locally. A packet is discarded if the sanity check fails, but no ICMP message is generated: the L4 protocol takes care of it (for example, with a timer that forces a retransmission if no acknowledgment is received within a given amount of time). Here are some cases that trigger the need to update the checksum:
Since the checksum used by the IP protocol is computed with the same simple algorithm used by TCP, UDP, and ICMP, a general set of functions has been written to be used by all of them. There is also a specialized function optimized for the IP checksum. According to the definition of the IP checksum algorithm, the header is split into 16-bit words that are summed and one's-complemented. Figure 18-13 shows an example of checksum computation on only two 16-bit words for simplicity. Rather than summing 16-bit words, Linux sums 32-bit words and even 64-bit longs, which results in faster computation (this requires an extra step between the computation of the sum and its one's complement; see the description of csum_fold in the next section). The function that implements the algorithm, called ip_fast_csum, is written directly in assembly language on most architectures.

Figure 18-13. IP checksum computation

18.5.1. APIs for Checksum Computation

The L3 (IP) checksum is much faster to compute than the L4 checksum, because it covers only the IP header. Because it's a cheap operation, it is often computed in software.

The set of general functions used to compute checksums is placed in the per-architecture files include/asm-xxx/checksum.h. (The one for the i386 platform, for instance, is include/asm-i386/checksum.h.) Each protocol either calls the general functions directly with the right input parameters or defines a wrapper that calls them. The checksumming algorithm allows a protocol to simply update a checksum, instead of recomputing it from scratch, when it changes a previously checksummed piece of data such as the IP header.

One IP-specific function in checksum.h is ip_fast_csum. It takes as parameters a pointer to the IP header (iph) and the header length (ihl), expressed in 32-bit words; the length can change due to IP options. The return value is the checksum.
This function takes advantage of the fact that the IP header is always a multiple of 4 bytes in length to streamline some of the processing.
When computing the checksum of an IP header on a packet to be transmitted, the value of iphdr->check should first be zeroed out, because the checksum must not cover itself. Since the algorithm uses simple summing, a zero-value field is effectively excluded from the resulting checksum. This is why in different places in the code you can see this field zeroed right before the call to ip_fast_csum.

The checksum algorithm has an interesting property that may initially confuse people who read the source code for packet forwarding and reception. If the checksum is correct, and the forwarding or receiving node runs the algorithm over the entire header (leaving the original iphdr->check field in place), a result of zero is obtained. If you look at the function ip_rcv, you can see that this is exactly how input packets are validated against the checksum. This way of checking for corruption is faster than the more intuitive approach of zeroing out the iphdr->check field and recomputing. Here are the main functions used to compute or update an IP checksum:
There are several other general support routines in the previously mentioned checksum.h file, but they are mostly used by L4 protocols. For instance:
Newer NICs can compute both the IP and the L4 checksums in hardware. While Linux takes advantage of the L4 hardware checksumming capabilities of most modern NICs, it does not use their IP hardware checksumming capabilities because they're not worth the extra complexity (the software computation is already fast enough, given the limited size of the IP header). Hardware checksumming is only one example of CPU offloading that allows the kernel to process packets faster; most modern NICs provide some L4 (mainly TCP) offloading, too. Hardware checksumming is briefly described in Chapter 19.

18.5.2. Changes to the L4 Checksum

The TCP and UDP protocols compute a checksum that covers their header, their payload, and what is known as the pseudoheader, which is basically a block whose fields are taken from the IP header for convenience (see Figure 18-14). In other words, some information that appears in the IP header ends up being incorporated in the L4 checksum.

Figure 18-14. Pseudoheader used by TCP and UDP while computing the checksum

Unfortunately, the IP layer sometimes needs to change, for NAT or other activities, some of the IP header fields that TCP and UDP use in their pseudoheaders. A change at the IP level invalidates the L4 checksum. If that checksum is left unchanged, none of the nodes along the path will detect an error, because at the IP layer they validate only the IP checksum; however, the TCP layer of the destination host will believe the packet is corrupted. The kernel therefore has to handle this case. Furthermore, there are routine cases where L4 checksums computed in hardware on received frames are invalidated. Here are the most common ones:
Although the name might be confusing, the field skb->ip_summed has to do with the L4 checksum (more details in Chapter 19). Its value is manipulated by the IP layer when it knows that something has invalidated the L4 checksum, such as a change in a field that is part of the pseudoheader. I will not cover the details of how the checksum is computed for locally generated packets, but we will briefly see in the section "Copying data into the fragments: getfrag" in Chapter 21 how it can be computed incrementally while creating fragments.
Monday, October 26, 2009