Monday, October 26, 2009

Section 24.2.  L4 Protocol Registration




























Each L4 protocol that runs over IPv4 is described by a net_protocol data structure, declared in include/net/protocol.h, which consists of the following three fields:



int (*handler)(struct sk_buff *skb)


Function registered by the protocol as the handler for incoming packets. This is discussed further in the section "L3 to L4 Delivery: ip_local_deliver_finish." It is possible to have protocols that share the same handler for both IPv4 and IPv6 (e.g., SCTP).


void (*err_handler)(struct sk_buff *skb, u32 info)


Function used by the ICMP protocol handler to inform the L4 protocol about the reception of an ICMP UNREACHABLE message. We will see in Chapter 35 when a Linux system generates ICMP UNREACHABLE messages, and we will see in Chapter 25 how the ICMP protocol uses err_handler.


int no_policy


This field is consulted at certain key points in the network stack and is used to exempt protocols from IPsec policy checks: 1 means that there is no need to check the IPsec policies for the protocol. Do not confuse the no_policy field of the net_protocol structure with the field bearing the same name in the ipv4_devconf structure: the former applies to a protocol; the latter applies to a device. See the sections "L3 to L4 Delivery: ip_local_deliver_finish" and "IPsec" for how no_policy is used.
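The three fields described above can be pictured with a small user-space sketch. It is not the kernel's definition: struct sk_buff is reduced here to a hypothetical two-field placeholder, u32 is a local typedef, and the demo_* names are invented for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical placeholder for the kernel's struct sk_buff (illustration only). */
struct sk_buff {
    const unsigned char *data;
    size_t len;
};

/* Mirrors the three fields listed in the text. */
struct net_protocol {
    int  (*handler)(struct sk_buff *skb);               /* ingress packets */
    void (*err_handler)(struct sk_buff *skb, u32 info); /* ICMP error notifications */
    int  no_policy;                                     /* 1 = skip IPsec policy checks */
};

/* Invented handler: reports how many bytes it was given, or -1 for no buffer. */
static int demo_rcv(struct sk_buff *skb)
{
    return skb ? (int)skb->len : -1;
}

/* Invented error handler: just remembers the last info value it received. */
static u32 demo_last_info;

static void demo_err(struct sk_buff *skb, u32 info)
{
    (void)skb;
    demo_last_info = info;
}

/* A protocol descriptor initialized the same way as the kernel excerpts in this section. */
static const struct net_protocol demo_protocol = {
    .handler     = demo_rcv,
    .err_handler = demo_err,
    .no_policy   = 1,
};
```

A caller that holds a pointer to such a descriptor invokes the protocol only through the two function pointers, which is what lets one delivery routine serve every registered protocol.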


The include/linux/in.h file contains a list of L4 protocols defined as IPPROTO_XXX symbols. (For a more complete list, see the /etc/protocols file, or RFC 1700 and its successor RFCs.) The maximum value for an L4 protocol identifier is 2^8 - 1, or 255, because the field in the IP header allocated to specify the L4 protocol is 8 bits wide. The highest number, 255, is reserved for Raw IP, IPPROTO_RAW.
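The same IPPROTO_XXX symbols are visible to user-space programs. The sketch below assumes a Linux system and the standard <netinet/in.h> header (the user-space counterpart of the kernel header named above); the helper fits_in_protocol_field is a hypothetical name that simply restates the 8-bit limit.

```c
#include <netinet/in.h>
#include <stdint.h>

/*
 * The IPv4 header reserves a single octet for the L4 protocol number,
 * so valid identifiers range from 0 to UINT8_MAX (255).
 */
static int fits_in_protocol_field(int proto)
{
    return proto >= 0 && proto <= UINT8_MAX;
}

/* A few well-known identifiers, collected so they can be inspected at run time. */
static const int well_known_protos[] = {
    IPPROTO_ICMP,   /* 1   */
    IPPROTO_IGMP,   /* 2   */
    IPPROTO_TCP,    /* 6   */
    IPPROTO_UDP,    /* 17  */
    IPPROTO_RAW,    /* 255 */
};
```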


Not all of the protocols defined in the list of symbols are handled in the kernel; some of them (notably Resource Reservation Protocol, or RSVP, and the various routing protocols) are usually handled in user space. This is why, for example, RSVP and routing protocols such as OSPF do not appear in the list of kernel-supported L4 protocols given in the previous section.



24.2.1. Registration: inet_add_protocol and inet_del_protocol

















Protocols register themselves with the inet_add_protocol function and, when implemented as modules, unregister themselves with the inet_del_protocol function. Both routines are defined in net/ipv4/protocol.c.


All of the net_protocol structures of the L4 protocols registered with the kernel are inserted into a table named inet_protos, represented in Figure 24-2. In earlier versions of the kernel, this was a hash table, and the word hash still appears in the code that handles the table, but currently it is a simple flat array with one item for each of the possible 256 protocols. The protocol number from /etc/protocols is the slot in the table where the protocol is inserted. If you'd like to see how the table was handled as a hash table in the 2.4 kernel, look in the 2.4 sources at the ip_run_ipprot function. Figure 24-2 shows the numbers and initials of the most common protocols; for instance, ICMP is protocol 1 and occupies slot 1 in the inet_protos table.
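To make the flat-array layout concrete, here is a minimal user-space sketch of a table indexed directly by protocol number. It is a model, not kernel code: the demo_* names and the simplified handler signature are assumptions made for this example, and the table size of 256 comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PROTOS 256   /* one slot per possible 8-bit protocol number */

/* Hypothetical stand-in for net_protocol; only the ingress handler is modeled. */
struct demo_protocol {
    int (*handler)(const uint8_t *payload, size_t len);
};

/* The protocol number is the index, so no hashing or collision chains are needed. */
static const struct demo_protocol *demo_protos[DEMO_MAX_PROTOS];

/* Invented ICMP-like handler: pretends to consume the packet and returns its length. */
static int demo_icmp_rcv(const uint8_t *payload, size_t len)
{
    (void)payload;
    return (int)len;
}

static const struct demo_protocol demo_icmp = { .handler = demo_icmp_rcv };

/*
 * Look up the slot for `protocol` and call its handler.
 * Returns -1 when nothing is registered in that slot.
 */
static int demo_deliver(uint8_t protocol, const uint8_t *payload, size_t len)
{
    const struct demo_protocol *p = demo_protos[protocol];

    return p ? p->handler(payload, len) : -1;
}
```

Because the index type is an 8-bit integer, every possible value of the IP header's protocol field maps to a valid slot, and a lookup is a single array access.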



Figure 24-2. IPv4 protocol table



Concurrent accesses to the inet_protos table are managed in this way:


  • Read-write accesses (i.e., inet_add_protocol and inet_del_protocol) are serialized with the inet_proto_lock spin lock.

  • Read-only accesses (i.e., ip_local_deliver_finish; see the next section) are protected with rcu_read_lock/rcu_read_unlock.


inet_del_protocol, which may remove a table entry that an RCU reader is currently using, calls synchronize_net to wait for all the currently executing RCU readers to complete their critical sections before returning.

Protocols that rest on IPv6 use a separate table. Note that IPv6 appears in the IPv4 inet_protos table as well: the kernel can tunnel IPv6 over IPv4 (also called SIT, for Simple Internet Transition). See the section "IPv6 Versus IPv4."
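The locking scheme for inet_protos described above can be approximated in user space with C11 atomics. This is only a sketch under stated assumptions: a hand-rolled atomic_flag spin lock stands in for inet_proto_lock, acquire/release pointer accesses stand in for the RCU primitives, and there is no equivalent of synchronize_net, so the caller must keep a removed entry alive while readers may still hold it. All demo_* names are hypothetical, as is the choice to return -1 from the delete routine when the slot holds a different entry.

```c
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_TABLE_SIZE 256

struct demo_protocol {
    int id;   /* placeholder payload; a real entry would carry handlers */
};

static _Atomic(const struct demo_protocol *) demo_protos[DEMO_TABLE_SIZE];
static atomic_flag demo_proto_lock = ATOMIC_FLAG_INIT;

/* Writers spin until they own the flag, so updates are serialized. */
static void demo_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&demo_proto_lock,
                                             memory_order_acquire))
        ;   /* busy-wait */
}

static void demo_unlock(void)
{
    atomic_flag_clear_explicit(&demo_proto_lock, memory_order_release);
}

/* Install `prot` in slot `num`; -1 if the slot already has an entry. */
int demo_add_protocol(const struct demo_protocol *prot, unsigned char num)
{
    int ret = 0;

    demo_lock();
    if (atomic_load_explicit(&demo_protos[num], memory_order_relaxed) != NULL)
        ret = -1;
    else
        atomic_store_explicit(&demo_protos[num], prot, memory_order_release);
    demo_unlock();
    return ret;
}

/* Clear slot `num` if it holds `prot`; -1 otherwise. */
int demo_del_protocol(const struct demo_protocol *prot, unsigned char num)
{
    int ret = 0;

    demo_lock();
    if (atomic_load_explicit(&demo_protos[num], memory_order_relaxed) == prot)
        atomic_store_explicit(&demo_protos[num], NULL, memory_order_release);
    else
        ret = -1;
    demo_unlock();
    /* The kernel waits for readers here (synchronize_net); this sketch does not. */
    return ret;
}

/* Readers take no lock: one acquire load yields either NULL or a published entry. */
const struct demo_protocol *demo_lookup(unsigned char num)
{
    return atomic_load_explicit(&demo_protos[num], memory_order_acquire);
}
```

The design point the sketch tries to capture is the asymmetry: registration and removal are rare and can afford a lock, while lookups happen for every delivered packet and therefore avoid one.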


As mentioned in the previous section, the ICMP, UDP, and TCP protocols are always part of the kernel and therefore are statically added to the inet_protos table at boot time by inet_init in net/ipv4/af_inet.c. The following excerpts show the definitions of their structures and the actual inet_add_protocol calls that register them:



#ifdef CONFIG_IP_MULTICAST
static struct net_protocol igmp_protocol = {
    .handler = igmp_rcv,
};
#endif

static struct net_protocol tcp_protocol = {
    .handler = tcp_v4_rcv,
    .err_handler = tcp_v4_err,
    .no_policy = 1,
};

static struct net_protocol udp_protocol = {
    .handler = udp_rcv,
    .err_handler = udp_err,
    .no_policy = 1,
};

static struct net_protocol icmp_protocol = {
    .handler = icmp_rcv,
};

static int __init inet_init(void)
{
    ...

    /*
     * Add all the base protocols.
     */

    if (inet_add_protocol(&icmp_protocol, IPPROTO_ICMP) < 0)
        printk(KERN_CRIT "inet_init: Cannot add ICMP protocol\n");
    if (inet_add_protocol(&udp_protocol, IPPROTO_UDP) < 0)
        printk(KERN_CRIT "inet_init: Cannot add UDP protocol\n");
    if (inet_add_protocol(&tcp_protocol, IPPROTO_TCP) < 0)
        printk(KERN_CRIT "inet_init: Cannot add TCP protocol\n");
#ifdef CONFIG_IP_MULTICAST
    if (inet_add_protocol(&igmp_protocol, IPPROTO_IGMP) < 0)
        printk(KERN_CRIT "inet_init: Cannot add IGMP protocol\n");
#endif
    ...
}



The IGMP handler is registered only when the kernel is compiled with support for IP multicast.


As an example of how other protocols are dynamically registered, the following snapshot is taken from the Zebra user-space routing daemon's implementation of the Open Shortest Path First IGP (OSPFIGP) protocol. The code is taken from the ospfd/ospf_network.c file in the Zebra package. The socket call effectively registers the user-space daemon with the kernel, giving the kernel a place to send ingress packets that use the protocol specified in the third argument. This protocol is IPPROTO_OSPFIGP, a symbol equal to 89, the number assigned to OSPFIGP in /etc/protocols. Note also that the socket type is SOCK_RAW, because packets have a private format that the OSPFIGP protocol knows how to handle. The use of raw sockets is described later in the section "Raw Sockets and Raw IP."



int
ospf_serv_sock (struct interface *ifp, int family)
{
  int ospf_sock;
  int ret, tos;
  struct ospf_interface *oi;

  ospf_sock = socket (family, SOCK_RAW, IPPROTO_OSPFIGP);
  if (ospf_sock < 0)
    {
      zlog_warn ("ospf_serv_sock: socket: %s", strerror (errno));
      return ospf_sock;
    }
  ... ... ...
}



For each L4 protocol there can be only one handler in kernel space (but multiple handlers could be present in user space, as discussed later in the section "Raw Sockets and Raw IP"). inet_add_protocol complains (returns -1) when it is called to install a handler for an L4 protocol that already has one.












