25.8. Transmitting ICMP Messages

The two classes of ICMP messages introduced in the section "ICMP Header," errors and queries, are transmitted using two different routines:
Both routines receive an skb buffer as input. However, the one passed to icmp_send represents the ingress IP packet that triggered the transmission of the ICMP message, whereas the one passed to icmp_reply represents an ingress ICMP request message that requires a response. The code in net/ipv4/icmp.c processes incoming ICMP messages, and therefore always uses icmp_reply to transmit an ICMP message in response to another one received as input. Other kernel network subsystems (e.g., routing and IP) use icmp_send when they need to generate ICMP messages, as shown in Figure 25-8.

Figure 25-8. Subsystems using icmp_send/icmp_reply

In both cases:
Tables 25-6, 25-7, and 25-8 show where the ICMP types in Table 25-1 are generated by the kernel. For those subsystems covered in this book, the tables also include a reference to the routines where the ICMP messages are generated.
Netfilter generates ICMP_DEST_UNREACH messages when it drops ingress IP packets according to the configuration applied, for instance, with iptables. The --reject-with option for the REJECT target allows the user to select which ICMP message type to use when rejecting ingress IP packets that match a given rule.

Tunneling protocols such as IPIP and GRE, defined in net/ipv4/ipip.c and net/ipv4/ip_gre.c, respectively, need to handle ICMP messages according to the rules in RFC 2003, section 4.
25.8.1. Transmitting ICMP Error Messages

Figures 25-9(a) and 25-9(b) show the internals of icmp_send. Here are its input parameters:
icmp_send starts with a few sanity checks to filter out illegal requests. The following conditions cause it to abort:
It is not the responsibility of the ICMP layer to initialize the IP header. However, a couple of IP header fields will be initialized by the IP layer according to the requirements of ICMP. In particular:
Next, the function finds the route to the destination with ip_route_output_key, which is a cache lookup routine introduced in Chapter 33. Note that, as shown in Figure 25-8, transmissions are rate limited with a token bucket algorithm via the icmpv4_xrlim_allow routine. When the ICMP message is not suppressed by the token bucket algorithm, the transmission ends with a call to icmp_push_reply, which ends up calling the two IP routines shown in Figure 25-8.

25.8.2. Replying to Ingress ICMP Messages

As mentioned in the section "ICMP Header," a subset of the ICMP message types comes in pairs: a request message and a response message. For example, an ICMP_ECHOREPLY message is sent in answer to an ingress ICMP_ECHO message. The transmission of response messages is done as follows:
25.8.3. Rate Limiting

ICMP messages are rate limited in two places:
The two types of rate limiting Let me clarify this point. The kernel keeps the rate-limiting information needed to apply the token bucket algorithm in the dst_entry entries of the routing cache. Each dst_entry instance is associated with a destination IP address (more details in Chapter 33). This alone tells us that rate limiting is applied on a per-IP-address basis, not on a per-ICMP-message-type basis, but let's see exactly how per-source and per-destination rate limiting differ:
25.8.4. Implementation of Rate Limiting

Let's see now how the ICMP code applies its rate limiting. As shown in Figure 25-10, any time an ICMP message is transmitted and rate limiting is configured in the kernel, the icmpv4_xrlim_allow function is called to enforce rate limiting. Both the ICMP message types to rate limit (sysctl_icmp_ratemask) and the rate limit's rate (sysctl_icmp_ratelimit) can be configured via /proc (see the section "Tuning via /proc Filesystem").

Figure 25-10. icmpv4_xrlim_allow function

icmpv4_xrlim_allow does not apply any rate limiting in the following cases:
icmpv4_xrlim_allow is a wrapper for a more general-purpose function, xrlim_allow, which does the real job. It is called if, according to the sysctl_icmp_ratemask bitmap, the ICMP message is to be rate limited.
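The two-step check, a bitmap test on the message type followed by a token bucket, can be sketched in user space as follows. This is a minimal sketch, not the kernel code: struct bucket, type_is_limited, bucket_allow, icmp_allow, and BURST_FACTOR are hypothetical names, time is passed in as a plain tick counter, and the cap of BURST_FACTOR times the per-message cost is an assumption about how the burst limit is applied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the burst limit; the value is an assumption. */
#define BURST_FACTOR 6u

/* Per-destination state (the kernel keeps its equivalent in dst_entry). */
struct bucket {
    uint32_t tokens; /* accumulated tokens, measured in ticks */
    uint32_t last;   /* tick at which the bucket was last updated */
};

/* True when the bitmap says this ICMP type must be rate limited. */
static bool type_is_limited(uint32_t ratemask, unsigned type)
{
    return type < 32 && ((ratemask >> type) & 1u);
}

/* Token bucket: each message costs `timeout` ticks' worth of tokens. */
static bool bucket_allow(struct bucket *b, uint32_t now, uint32_t timeout)
{
    /* Earn one token per tick elapsed since the previous call. */
    uint32_t tokens = b->tokens + (now - b->last);
    b->last = now;

    /* Cap the accumulated tokens so that bursts stay bounded. */
    if (tokens > BURST_FACTOR * timeout)
        tokens = BURST_FACTOR * timeout;

    bool ok = tokens >= timeout;
    if (ok)
        tokens -= timeout; /* pay for this message */
    b->tokens = tokens;
    return ok;
}

/* Wrapper: types outside the mask are never rate limited. */
static bool icmp_allow(struct bucket *b, uint32_t ratemask, unsigned type,
                       uint32_t now, uint32_t timeout)
{
    if (!type_is_limited(ratemask, type))
        return true;
    return bucket_allow(b, now, timeout);
}
```

With a mask of 0x1818 (bits 3, 4, 11, and 12 set), a type 8 message always passes, while type 3 messages are allowed only as fast as tokens accumulate, with a burst of at most BURST_FACTOR messages after a long idle period.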
xrlim_allow applies a simple token bucket algorithm. Whenever it is called, it updates the available dst->rate_tokens tokens (measured in jiffies), makes sure that the accumulated tokens are not more than a predefined maximum value (XRLIM_BURST_FACTOR), and allows the transmission of the ICMP message if the available tokens are sufficient. The input parameter timeout represents the rate to enforce, expressed in jiffies (for example, 1*HZ would mean a rate limit of one ICMP message per second).

Note that since xrlim_allow is a generic routine shared by different protocols, it operates on protocol-independent routing cache entries (dst_entry structures), whereas icmpv4_xrlim_allow is an IPv4 routine and therefore operates on rtable data structures. For more details on the dst_entry and rtable data structures, please refer to Chapter 36.

25.8.5. Receiving ICMP Messages

icmp_rcv is the function called by ip_local_deliver_finish to process ingress ICMP messages. The ICMP protocol registers icmp_rcv as its receiving routine.

First, the ICMP message's checksum is verified. Note that even when the receiving NIC is able to compute the L4 checksum in hardware (which would be the ICMP checksum in this case) and that checksum says the ICMP message is corrupted, icmp_rcv verifies the checksum once more in software. You can refer to the section "sk_buff structure" in Chapter 19 for more details on L4 checksumming support by NICs.

Not all ICMP message types can be sent to a multicast IP address: only ICMP_ECHO, ICMP_TIMESTAMP, ICMP_ADDRESS, and ICMP_ADDRESSREPLY. icmp_rcv filters out those messages that do not respect this rule. In particular, ingress broadcast ICMP_ECHO messages are dropped if the system has been configured to do so. See the section "Tuning via /proc Filesystem."

When all sanity checks are satisfied, icmp_rcv passes the ingress ICMP message to the right helper routine. The latter is accessed via the icmp_pointers vector that is initialized at the end of net/ipv4/icmp.c.
icmp_pointers is an array of icmp_control data structures. Table 25-9 summarizes part of icmp_pointers's initialization. See the section "icmp_control Structure" for the exact meaning of the handler and error fields. Any types not in the table are obsolete, unsupported, or not supposed to be processed in kernel space. For all these types, handler is initialized to icmp_discard.
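The per-type dispatch described above can be illustrated with a small table of function pointers indexed by ICMP type. This is a hedged user-space sketch rather than the kernel's definition: struct msg, struct control, the handle_* functions, the RES_* result codes, and MAX_TYPE are hypothetical names, and the handlers return a code only so that the example can be checked. The fallback to the discard handler for empty slots is a shortcut of this sketch; according to the text, the kernel instead initializes every such entry to icmp_discard explicitly.

```c
#include <stddef.h>

#define TYPE_DEST_UNREACH 3  /* ICMP type numbers from RFC 792 */
#define TYPE_ECHO         8
#define MAX_TYPE          18 /* assumed highest type in the table */

enum { RES_DROPPED = -1, RES_DISCARDED = 0, RES_UNREACH = 1, RES_ECHO = 2 };

struct msg {
    unsigned char type;      /* ICMP type of the ingress message */
};

struct control {
    int (*handler)(const struct msg *m); /* routine for this type */
    int error;                           /* nonzero for error messages */
};

static int handle_discard(const struct msg *m) { (void)m; return RES_DISCARDED; }
static int handle_unreach(const struct msg *m) { (void)m; return RES_UNREACH; }
static int handle_echo(const struct msg *m)    { (void)m; return RES_ECHO; }

/* Slots not listed here are zero-initialized (handler == NULL). */
static const struct control table[MAX_TYPE + 1] = {
    [TYPE_DEST_UNREACH] = { handle_unreach, 1 },
    [TYPE_ECHO]         = { handle_echo,    0 },
};

/* Look up the handler by type and invoke it. */
static int dispatch(const struct msg *m)
{
    if (m->type > MAX_TYPE)
        return RES_DROPPED;          /* unknown type: drop the message */
    if (table[m->type].handler == NULL)
        return handle_discard(m);    /* sketch-only fallback */
    return table[m->type].handler(m);
}
```

Indexing by type keeps the receive path free of long switch statements: adding support for a type means filling one more slot in the table.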
Figure 25-11 shows the internals of icmp_rcv. Note that neither ICMP_ADDRESS nor ICMP_ADDRESSREPLY is supported; the two handlers that are registered against them are just placeholders or apply some kind of logging.

Figure 25-11. icmp_rcv function

Note also that the icmp_unreach handler takes care of different ICMP message types, not just ICMP_DEST_UNREACH. Figure 25-12(a) shows how some of skb's pointers are initialized when icmp_rcv is invoked, and Figure 25-12(b) shows how they are initialized when the handlers of Table 25-9 are called. This figure can be useful when analyzing the routines in Table 25-9, especially icmp_unreach.

Figure 25-12. (a) skb at the beginning of icmp_rcv; (b) skb as it is passed to the handler

25.8.6. Processing ICMP_ECHO and ICMP_ECHOREPLY Messages

ICMP_ECHO messages are processed according to the generic model described in the section "Replying to Ingress ICMP Messages":
ICMP_ECHOREPLY messages are not processed by the kernel, but by the applications that generated the associated ICMP_ECHO messages. See the section "Raw Sockets and Raw IP" in Chapter 24 for an example involving ping.

25.8.7. Processing the Common ICMP Messages

icmp_unreach is used as a handler for multiple ICMP types, as shown in Table 25-9. The function starts with some common sanity checks, continues with some processing based on the particular message type, and concludes with another common part. The internals of the routine are shown in Figure 25-13. The per-type processing is minimal:
For both ICMP_FRAG_NEEDED and ICMP_SR_FAILED, the logging is rate limited via LIMIT_NETDEBUG, which is a generic routine that rate limits networking-related messages to five per second. The last part of icmp_unreach is again common to all ICMP types that use it as a handler, and consists of the following tasks:
25.8.8. Processing ICMP_REDIRECT Messages

icmp_redirect, the function used to process incoming ICMP_REDIRECT messages, is a wrapper around ip_rt_redirect with some additional sanity checks. The logic used by the latter function is described in the section "Processing Ingress ICMP_REDIRECT Messages" in Chapter 31. ip_rt_redirect adds an entry to the routing cache with rt_intern_hash, which is described in Chapter 33. The route is initialized with the RTCF_REDIRECTED flag toggled on, to be distinguished from the other routes. For example, we will see in the section "Examples of eligible cache victims" in Chapter 30 how the routing code uses this information when it is forced to delete entries from the routing cache.

The system administrator can also influence when ICMP redirects are generated. Through the /proc filesystem, it is possible to specify for each interface whether to send and accept ICMP redirects (see the section "The /proc/sys/net/ipv4/conf Directory" in Chapter 36). Using the firewall capabilities, as well, the administrator can specify from whom to accept particular types of ICMP packets and therefore whose ICMP_REDIRECT messages to trust.

25.8.9. Processing ICMP_TIMESTAMP and ICMP_TIMESTAMPREPLY Messages

Ingress ICMP_TIMESTAMP messages are handled by replying with an ICMP_TIMESTAMPREPLY message, using the scheme discussed in the section "Replying to Ingress ICMP Messages." The second and third timestamps are not initialized according to the rules we saw in the section "ICMP_TIMESTAMP and ICMP_TIMESTAMPREPLY": they are initialized to the same timestamp with do_gettimeofday. Note that head_len is initialized to include not only the default ICMP header length, but also the three 32-bit timestamps.

25.8.10. Processing ICMP_ADDRESS and ICMP_ADDRESSREPLY Messages

Because the Linux kernel does not generate ICMP_ADDRESS messages, ingress ICMP_ADDRESSREPLY messages cannot be answers to queries generated locally (not in kernel space, at least).
However, when forwarding and logging of Martian addresses[*] are enabled on the ingress device, Linux listens to ICMP_ADDRESSREPLY messages with icmp_address_reply. The latter function checks whether the mask advertised with the message is correct with regard to the IP addresses configured on the receiving interface: if the receiving interface does not have any IP address configured on the same subnet as the source IP address used by the ICMP message sender (which also implies the exact same netmask), the kernel logs a warning.
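The consistency check just described amounts to scanning the addresses configured on the receiving interface for one that has the advertised netmask and lies on the sender's subnet. The following is a minimal user-space sketch under stated assumptions: struct ifaddr4 and mask_reply_is_consistent are hypothetical names, addresses are plain 32-bit integers in host byte order, and the caller is assumed to log the warning when the function returns false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One IPv4 address configured on an interface (host byte order). */
struct ifaddr4 {
    uint32_t addr;
    uint32_t mask;
};

/*
 * Returns true when at least one configured address has exactly the
 * advertised netmask and is on the same subnet as the sender.
 */
static bool mask_reply_is_consistent(const struct ifaddr4 *ifa, size_t n,
                                     uint32_t src, uint32_t advertised_mask)
{
    for (size_t i = 0; i < n; i++) {
        bool same_mask = ifa[i].mask == advertised_mask;
        /* Two addresses share a subnet if they agree on all masked bits. */
        bool same_subnet = ((ifa[i].addr ^ src) & ifa[i].mask) == 0;

        if (same_mask && same_subnet)
            return true;
    }
    return false; /* the caller would log a warning here */
}
```

For an interface configured with 192.168.1.10/24, a reply from 192.168.1.1 advertising 255.255.255.0 passes the check, whereas the same sender advertising 255.255.0.0 does not.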
[*] The sanity check on the received reply is not done when the routing cache has the RTCF_DIRECTSRC flag set. This flag is set only when the destination address is reachable by the local host via a next hop that has local scope (i.e., that exists only internally to the Linux box).
Thursday, October 22, 2009