Thursday, October 29, 2009

Section 13.5.  Ethernet Versus IEEE 802.3 Frames










































A number of protocols go under the loose term Ethernet. The 802.2 and 802.3 standards are represented by the protocols ETH_P_802_2 and ETH_P_802_3, respectively, but there are many other Ethernet protocols, listed in Table 13-2, as well as the LLC and SNAP extensions. The standards institute a couple of hacks to support all of these variations (h_proto is discussed in the following section).


Table 13-2. Valid Ethernet types (when h_proto > 1536)

Protocol          Ethernet type    Function handler
ETH_P_IP          0x0800           ip_rcv, ic_bootp_recv [a]
ETH_P_X25         0x0805           X25_lap_receive_frame
ETH_P_ARP         0x0806           arp_rcv
ETH_P_BPQ         0x08FF           bpq_rcv
ETH_P_DNA_RT      0x6003           dn_route_rcv
ETH_P_RARP        0x8035           ic_rarp_recv
ETH_P_8021Q       0x8100           vlan_skb_rcv
ETH_P_IPX         0x8137           ipx_rcv
ETH_P_IPV6        0x86DD           ipv6_rcv
ETH_P_PPP_DISC    0x8863           pppoe_disc_rcv
ETH_P_PPP_SES     0x8864           pppoe_rcv

[a] The reason why IP has two handlers has to do with the possibility for the kernel to retrieve the IP configuration by means of protocols like RARP/BOOTP. The ic_bootp_recv handler is used only at boot time to take care of the dynamic IP configuration, and it is uninstalled once the configuration has been retrieved. See net/ipv4/ipconfig.c.



Ethernet was designed before the IEEE created its 802.2 and 802.3 standards. The latter are not pure Ethernet, even though they are commonly called Ethernet standards. Fortunately, the IEEE 802 committee decided to make the protocols compatible. Every Ethernet card is able to receive both the 802 standard frame types and the old Ethernet frames, and the kernel provides a routine (discussed later in this section) that allows device drivers to recognize them thanks to the solution described in this section.


This is the definition of an Ethernet header:



struct ethhdr
{
    unsigned char h_dest[ETH_ALEN]; /* destination eth addr */
    unsigned char h_source[ETH_ALEN]; /* source ether addr */
    unsigned short h_proto; /* packet type ID field */
} __attribute__((packed));



As you will see in the next two sections on LLC and SNAP, other fields can follow the ethhdr structure. Here we are focusing on the protocol field, h_proto. Despite its name, it actually can store either the protocol in use or the length of the frame. This is possible because the field is 2 octets (bytes) in size, but the maximum payload of an Ethernet frame is 1,500 bytes, so most of the values the field can hold are never needed as lengths. (The full frame can reach 1,518 bytes once the destination address, source address, type/length field, and checksum are included. Frames using 802.1q have four extra bytes of encapsulation and can therefore reach a size of 1,522 bytes.)


To save space, the IEEE decided to use values of 1,536 (0x600 hexadecimal) or greater to represent the Ethernet protocol. Some preexisting protocols with identifiers lower than 1,536 were updated to meet the criteria. The 802.2 and 802.3 protocols, however, use the field to store the length of the frame.[*] Values ranging from 1,501 to 1,535 are not legal in this field.

[*] The reason for this arrangement is a long story. For the curious, I suggest reading Interconnections, Second Edition: Bridges, Routers, Switches, and Internetworking Protocols (Addison Wesley), where the author explains it with considerable irony.
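The three ranges just described can be sketched as a small user-space helper. This is only an illustration: the enum and function names are made up for this example and do not exist in the kernel, and the 1,536 boundary is treated as inclusive, matching the >= 1536 test in eth_type_trans.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only (not kernel identifiers). */
enum field_kind {
    FIELD_LENGTH,     /* 802.3: the field holds the payload length */
    FIELD_ETHERTYPE,  /* Ethernet: the field holds a protocol ID   */
    FIELD_INVALID     /* 1,501 to 1,535: not legal                 */
};

/* 'value' is the 16-bit type/length field in host byte order. */
enum field_kind classify_type_length(uint16_t value)
{
    if (value >= 1536)          /* 0x0600 and above */
        return FIELD_ETHERTYPE;
    if (value <= 1500)          /* fits the maximum payload size */
        return FIELD_LENGTH;
    return FIELD_INVALID;
}
```

For instance, 0x0800 (ETH_P_IP in Table 13-2) falls in the protocol range, while a value such as 46 can only be a length.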


Figure 13-8 shows the variations possible on an Ethernet header. Simple Ethernet is shown in (a). The 802.2 and 802.3 variant is shown in (b). As you can see, a single field serves as the protocol field in the former and the length field in the latter. In addition, the 802 variant can support LLC, as shown in (c) and SNAP, as shown in (d).



Figure 13-8. Differences between Ethernet and 802.3 frames



Linux deals with the odd distinction between protocol and length in the eth_type_trans function. A typical context is represented by the following code fragment, issued by the drivers/net/3c509.c Ethernet driver when it receives a frame. netif_rx is the function that copies the frame into the input queue and sets the NET_RX_SOFTIRQ flag to let the kernel know about the new frame in the queue (this is described in Chapter 10[†]). Just before invoking netif_rx, the caller performs some important initializations with a call to eth_type_trans.

[†] netif_rx is only one of the two interfaces available to device drivers to notify upper layers about the reception of frames. Both of them are described in Chapter 10.



el3_rx(struct device *dev)
{
    ... ... ...
    skb->protocol = eth_type_trans(skb,dev);
    netif_rx(skb);
    ... ... ...
}



eth_type_trans performs two main tasks: setting the packet type[*] and setting the protocol. It does the latter in its return value. Let's dispose of the former task before concentrating on the main issue in this section, the protocol.

[*] Even though the code calls it the packet type, it actually is the frame type because it is derived from the link layer address.



13.5.1. Setting the Packet Type








The eth_type_trans function sets skb->pkt_type to one of the PACKET_XXX values listed in include/linux/if_packet.h:



PACKET_BROADCAST


The frame was sent to the link layer broadcast address (i.e., FF:FF:FF:FF:FF:FF for Ethernet).


PACKET_MULTICAST


The frame was sent to a link layer multicast address. Details appear later in this section.


PACKET_OTHERHOST


The frame was not addressed to the receiving interface. However, the frame is not dropped right away but is passed to the next-highest layer. As described earlier, there could be protocol sniffers or other meddlesome protocols that would like to give the frame a look.


When eth_type_trans does not set skb->pkt_type explicitly, its value ends up being 0, which is PACKET_HOST. This means the receiving interface is the recipient of the frame (from a link layer point of view, that is to say, the MAC address matched).


Most of the information needed to set the correct packet type is specified explicitly in the header. An Ethernet address is 48 bits or 6 bytes long. The two least significant bits of the first byte have a special meaning; because Ethernet transmits each byte least significant bit first, they are the first two bits on the wire (see Figure 13-9):


  • Bit 0 distinguishes multicast addresses from unicast addresses. Broadcast addresses are a special case of multicast. When set to 1, this bit denotes multicast; when 0, it denotes unicast. After checking the bit through if(*eth->h_dest&1), the function goes on to see whether the frame is a broadcast frame by comparing the address to the device's broadcast address through memcmp(eth->h_dest,dev->broadcast, ETH_ALEN).


    Figure 13-9. Unicast/multicast and local/global bits in the MAC address

  • Bit 1 distinguishes local addresses from global addresses. Global addresses are unique worldwide; local addresses are not: it is up to the system administrator to assign local addresses properly.[*] When set to 1, this bit denotes a local (locally administered) address; when 0, it denotes a global address.

    [*] There is no relationship between local MAC addresses and nonroutable IP addresses (192.168.x.x, etc.): they are similar in concept, but applied to two different layers in the stack.


Thus, the first part of eth_type_trans is:



unsigned short eth_type_trans(struct sk_buff *skb, struct net_device *dev)
{
    struct ethhdr *eth;
    unsigned char *rawp;

    skb->mac.raw=skb->data;
    skb_pull(skb,ETH_HLEN);
    eth= eth_hdr(skb);
    skb->input_dev = dev;

    if(*eth->h_dest&1)
    {
        if(memcmp(eth->h_dest,dev->broadcast, ETH_ALEN)==0)
            skb->pkt_type=PACKET_BROADCAST;
        else
            skb->pkt_type=PACKET_MULTICAST;
    }

    else if(1 /*dev->flags&IFF_PROMISC*/)
    {
        if(memcmp(eth->h_dest,dev->dev_addr, ETH_ALEN))
            skb->pkt_type=PACKET_OTHERHOST;
    }



The IFF_PROMISC flag is set in dev->flags when the interface is put into promiscuous mode. As shown in the previous snapshot, eth_type_trans initializes skb->pkt_type to PACKET_OTHERHOST when the destination MAC address does not match the receiving interface's address, regardless of the IFF_PROMISC flag. This will allow PF_SOCKETS handlers to receive a copy of the frame (see netif_receive_skb in Chapter 10), but the upper-layer protocol handlers must discard buffers of PACKET_OTHERHOST type (see, for example, arp_rcv and ip_rcv).
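To experiment with this classification outside the kernel, the same decisions can be reproduced on plain byte arrays. The following is a minimal sketch, not kernel code: the enum values and the classify_dest function are hypothetical names chosen for this example, and the interface's own address and broadcast address are passed in explicitly instead of being read from a net_device.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-ins for the PACKET_XXX values. */
enum frame_type {
    FRAME_HOST,       /* destination matches the interface address */
    FRAME_BROADCAST,  /* destination is the broadcast address      */
    FRAME_MULTICAST,  /* multicast bit set, but not broadcast      */
    FRAME_OTHERHOST   /* unicast addressed to someone else         */
};

enum frame_type classify_dest(const uint8_t dest[MAC_LEN],
                              const uint8_t own[MAC_LEN],
                              const uint8_t bcast[MAC_LEN])
{
    /* Same test as if(*eth->h_dest&1): lowest bit of the first byte. */
    if (dest[0] & 0x01) {
        if (memcmp(dest, bcast, MAC_LEN) == 0)
            return FRAME_BROADCAST;
        return FRAME_MULTICAST;
    }
    /* Unicast: compare against the receiving interface's address. */
    if (memcmp(dest, own, MAC_LEN) != 0)
        return FRAME_OTHERHOST;
    return FRAME_HOST;  /* corresponds to the default value, PACKET_HOST */
}
```

Note that, as in the kernel fragment, the "host" case is simply what remains when none of the other conditions applies.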




13.5.2. Setting the Ethernet Protocol and Length




The second part of eth_type_trans retrieves the identifier of the protocol used at the higher layer. Protocol values are also called Ethertypes, and the list of valid types is kept up-to-date at http://standards.ieee.org/regauth/ethertype. The distinction between old Ethernet protocols above the value of 1,536 and 802 protocols is made in the following code fragment:



    if (ntohs(eth->h_proto) >= 1536)
        return eth->h_proto;

    rawp = skb->data;

    if (*(unsigned short *)rawp == 0xFFFF)
        return htons(ETH_P_802_3);

    /*
     * Real 802.2 LLC
     */
    return htons(ETH_P_802_2);
}



If values bigger than 1,536 are interpreted as protocol IDs, how does a device driver find the size of the frames it receives? In both cases, whether protocol/length values are less than 1,500 or greater than 1,536, it is the device itself that stores the size of the frame into one of its registers, where the device driver can read it. Devices can figure out the size of each frame thanks to well-known bit patterns used for that purpose. The following piece of code from vortex_rx in drivers/net/3c59x.c shows how the driver first reads the size from the device and then allocates a buffer accordingly:



/* The packet length: up to 4.5K!. */
int pkt_len = rx_status & 0x1fff;
struct sk_buff *skb;
 
skb = dev_alloc_skb(pkt_len + 5);



Do not get confused by the comment in the previous code. This particular device can receive frames up to 4.5 K in size because it handles FDDI NICs, too.


We saw in Chapter 1 what host and network byte order are. The value returned by eth_type_trans, and therefore the value assigned to skb->protocol, is in network byte order: when it is extracted from the Ethernet header it is already in network byte order, and when eth_type_trans uses a local symbol ETH_P_XXX it needs to explicitly convert it from host byte order to network byte order with the htons macro. This also means that when the kernel accesses skb->protocol later and compares it against an ETH_P_XXX value, it has to convert either ETH_P_XXX to network byte order or skb->protocol to host byte order: it does not matter what order is used, it just matters that both sides of the comparison are expressed in the same order. In other words, these two lines are equivalent:



ntohs(skb->protocol) == ETH_P_802_2
skb->protocol == htons(ETH_P_802_2)
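The equivalence can be checked from user space with the standard htons and ntohs functions. In this sketch the constant DEMO_ETH_P_802_2 and the two helper functions are names invented for the example; the value 0x0004 is assumed to be the one assigned to ETH_P_802_2 in include/linux/if_ether.h.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htons, ntohs */

/* Assumed to mirror ETH_P_802_2 from include/linux/if_ether.h. */
#define DEMO_ETH_P_802_2 0x0004

/* Both helpers take a protocol value in network byte order,
 * as it would be stored in skb->protocol. */

/* Convert the stored value to host order, then compare. */
int matches_in_host_order(uint16_t protocol_net)
{
    return ntohs(protocol_net) == DEMO_ETH_P_802_2;
}

/* Convert the constant to network order, then compare. */
int matches_in_net_order(uint16_t protocol_net)
{
    return protocol_net == htons(DEMO_ETH_P_802_2);
}
```

The second form is usually the cheaper one when the right-hand side is a constant, because the conversion can be done at compile time rather than on every packet.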



Since eth_type_trans is called only for Ethernet frames, there are similar functions for other media types, some with names ending in _type_trans and some with other names. The following example, for instance, shows a bit of code taken from the IBM Token Ring driver (drivers/net/tokenring/ibmtr.c). Before the familiar invocation of netif_rx, skb->protocol is set by tr_type_trans, just as eth_type_trans does for Ethernet devices:



static void tr_rx(struct device *dev)
{
    ...
    skb->protocol=tr_type_trans(skb, dev);
    ...
    netif_rx(skb);
    ...
}



If you look at tr_type_trans in net/802/tr.c, you will see logic similar to that of eth_type_trans, but applied to Token Ring devices.


There are also media types that set skb->protocol directly without any helper function of the _type_trans variety, since they can carry only one protocol (e.g., IrDA and AX25).




13.5.3. Logical Link Control (LLC)







The LLC layer was designed by the IEEE 802 committee when they standardized LANs. The idea was that instead of having a single higher-layer protocol identifier, it would be more flexible to specify one protocol identifier for the source (SSAP) and another for the destination (DSAP). In most cases, SSAP and DSAP would be the same for any given connection (in fact, SSAP and DSAP are always the same when the global flag is set), but having two separate values gives systems the flexibility to use different protocols.


LLC can provide its upper layer different service types:



Type I


Connectionless (i.e., datagram protocol), with no support for acknowledgments, flow control, and error recovery


Type II


Connection oriented, with support for acknowledgments, flow control, and error recovery


Type III


Connectionless, but with some of the benefits of type II


Figure 13-8(c) shows the header format of a frame using LLC. As you can see, there are three new fields:



SSAP



DSAP


These are 8-bit fields for specifying the protocols used.


Control (CTL)


The size of this field depends on the type of LLC used (type I or type II). I will not go into details on the LLC layer, but will assume this field to be 1 byte long and have the value 0x03 (type I, CTL=UI). This is sufficient for understanding the rest of the chapter.
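Under the same simplifying assumption (a 1-byte control field with value 0x03), the three fields can be read from the bytes that follow the Ethernet header as sketched below. The structure and function names are hypothetical; the byte order on the wire (DSAP first, then SSAP, then control) is the one defined by IEEE 802.2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure for this example only. */
struct llc_ui_header {
    uint8_t dsap;  /* destination service access point */
    uint8_t ssap;  /* source service access point      */
    uint8_t ctl;   /* control field, 0x03 for type I UI */
};

/*
 * 'payload' points just past the 14-byte Ethernet header.
 * Returns 0 if a type I (UI) header was parsed, -1 otherwise.
 */
int parse_llc_ui(const uint8_t *payload, size_t len,
                 struct llc_ui_header *out)
{
    if (len < 3)
        return -1;          /* too short to hold DSAP, SSAP, CTL */
    out->dsap = payload[0];
    out->ssap = payload[1];
    out->ctl  = payload[2];
    return out->ctl == 0x03 ? 0 : -1;
}
```

Frames with any other control value are rejected here only because the sketch does not model type II; a real implementation would hand them to the connection state machine instead.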


The LLC header did not prove popular for several reasons. Perhaps the main reason is the 8-bit limit on the SSAP and DSAP identifiers, compounded by reserving two of these bits for the unicast/multicast and local/global flags.[*] Only 64 protocols could be specified in the remaining 6 bits, which was too limiting.

[*] The meaning of those two flags is the same as discussed earlier for MAC addresses, but here it applies to protocols rather than addresses.


When using local SAPs (indicated by the local/global flag in the protocol field), the network administrator must make sure all the systems agree on the local SAPs they use, which makes things complicated and less usable. Ambiguity is not possible for global SAP, but global SAP is not being used for new protocols. In the next section, you will see how this limitation was solved by extending the header with the concept of SNAP.


Table 13-3 shows the SAPs registered with the Linux kernel. LLC causes the kernel to use an extra level of indirection when retrieving the handler, compared to the protocols listed in Table 13-2 and registered with dev_add_pack.


Table 13-3. The kernel's 802.2 SAP clients

Protocol    SAP     Function handler
SNAP        0xAA    snap_rcv
IPX         0xE0    ipx_rcv




13.5.3.1. The IPX case






You may wonder whether a pure 802.3 frame format can be used, given that there is no indication of a protocol ID in Figure 13-8(b). In fact, pure 802.3 frames are not normally used. The one well-known exception involves IPX. IPX packets can be sent using raw 802.3 frames (that is, frames without an LLC header). The receiver recognizes them by means of a hack. The first field of an IPX header is a 16-bit checksum, which normally is turned off by simply setting it to 0xFFFF. Since 0xFF/0xFF[†] is an invalid SSAP/DSAP combination and there is no Ethertype with that value, IPX packets using raw 802.3 can be easily recognized. When they are detected, skb->protocol is set to ETH_P_802_3, whose handler is the IPX handler (see Table 13-1).

[†] The check against 0xFF/0xFF to recognize IPX packets is used all over the place in the Linux kernel. eth_type_trans is one example.
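Putting the type/length test and the 0xFF/0xFF check together, the decision made by the second half of eth_type_trans can be imitated on a raw byte buffer. This is a user-space sketch with invented names (enum encap, classify_frame); unlike the kernel function, it also checks that the buffer is long enough before reading from it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this example. */
enum encap {
    ENCAP_ETHERNET_II,  /* field is an Ethertype (>= 1536)         */
    ENCAP_RAW_802_3,    /* length field followed by 0xFFFF (IPX)   */
    ENCAP_802_2_LLC,    /* length field followed by an LLC header  */
    ENCAP_TRUNCATED     /* buffer too short to decide              */
};

/* 'frame' starts at the destination MAC address. */
enum encap classify_frame(const uint8_t *frame, size_t len)
{
    uint16_t type_len;

    if (len < 14)
        return ENCAP_TRUNCATED;

    /* Bytes 12 and 13 hold the type/length field, big endian. */
    type_len = (uint16_t)((frame[12] << 8) | frame[13]);
    if (type_len >= 1536)
        return ENCAP_ETHERNET_II;

    if (len < 16)
        return ENCAP_TRUNCATED;

    /* IPX over raw 802.3: the disabled checksum reads as 0xFFFF. */
    if (frame[14] == 0xFF && frame[15] == 0xFF)
        return ENCAP_RAW_802_3;

    return ENCAP_802_2_LLC;
}
```

The order of the tests matters: the 0xFFFF check is meaningful only after the type/length field has been ruled out as an Ethertype.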




13.5.3.2. Linux's LLC implementation


The 802.2 LLC layer was expanded and rewritten during the 2.5 development cycle. The kernel's LLC implementation, which supports types I and II, consists of the following main components:


  • Two state machines. These are used to keep track of the states of the local SAPs and the connections created on top of them.

  • An LLC receive routine that feeds the right input to the two state machines based on the input frames it receives.

  • The AF_LLC socket interface. This can be used to build protocols or services in user space on top of the LLC layer.


Because none of the protocols described in this book uses the LLC layer, I will not go into detail on the definitions of the LLC services (you can refer to the IEEE 802.2 Logical Link Control specification for this[*]), nor will I look at the details of the Linux kernel's LLC implementation. Here we will only see what data structure is used to define a local SAP and briefly how ingress frames are handled.

[*] Like most IEEE documents, the one about the LLC design is pretty big and not fun to read. However, with this document in your hands, it will be much easier to go through the LLC code, especially through the boring details of the state machines.


The data structure used to define a local SAP is llc_sap, which is defined in include/net/llc.h. Among its fields are:



struct llc_addr laddr


SAP identifier.


int (*rcv_func)(struct sk_buff *, struct net_device *, struct packet_type *)


Function handler. When an SAP is opened via a PF_LLC socket, this field is NULL. When the SAP is opened by the kernel, this field points to the routine provided by the kernel (see Table 13-3).


Local SAPs are created with llc_sap_open, and are inserted into the llc_sap_list list. llc_sap_open is called to create two types of SAP:


  • Those installed by the kernel itself to install kernel-level handlers[†] (see Table 13-3).

    [†] This can be accomplished indirectly via the register_8022_client routine, too.

  • Those managed with PF_LLC sockets (for example, when a server uses the bind system call on a PF_LLC socket to bind it to a given SAP).




13.5.3.3. Processing ingress LLC frames

Whenever an incoming frame is classified by eth_type_trans as using the LLC header (because it has a type/length field that is less than 1,536 and no special IPX case is detected), the initialization of skb->protocol to ETH_P_802_2 leads to the selection of the llc_rcv handler (see Table 13-1). This handler will select the right protocol handler based on the DSAP field in the LLC header: to do so, it calls the rcv_func handler registered with llc_sap_open for those SAPs opened by the kernel, and feeds the right input to the right state machine when the SAPs were opened with a PF_LLC socket (see Figure 13-10).



Figure 13-10. The llc_rcv function
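The selection made for kernel-opened SAPs can be pictured as a lookup on the DSAP byte. The sketch below is a deliberately simplified model, not the kernel's llc_rcv: the table, the handler type, and the demo_ handlers are all hypothetical, the state machines used for PF_LLC sockets are left out, and a 3-byte type I header is assumed. Only the SAP values come from Table 13-3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type: receives the data after the LLC header. */
typedef int (*sap_handler)(const uint8_t *data, size_t len);

struct sap_entry {
    uint8_t     sap;       /* SAP identifier          */
    sap_handler rcv_func;  /* NULL would mean "socket" */
};

/* Stand-ins for snap_rcv and ipx_rcv; they only return a tag. */
int demo_snap_rcv(const uint8_t *data, size_t len)
{
    (void)data; (void)len;
    return 1;
}

int demo_ipx_rcv(const uint8_t *data, size_t len)
{
    (void)data; (void)len;
    return 2;
}

static const struct sap_entry sap_table[] = {
    { 0xAA, demo_snap_rcv },  /* SNAP, as in Table 13-3 */
    { 0xE0, demo_ipx_rcv  },  /* IPX,  as in Table 13-3 */
};

/* 'llc' points at the DSAP byte. Returns the handler's result or -1. */
int dispatch_llc(const uint8_t *llc, size_t len)
{
    size_t i;

    if (len < 3)
        return -1;
    for (i = 0; i < sizeof sap_table / sizeof sap_table[0]; i++) {
        if (sap_table[i].sap == llc[0] && sap_table[i].rcv_func != NULL)
            return sap_table[i].rcv_func(llc + 3, len - 3);
    }
    return -1;  /* no kernel client registered for this DSAP */
}
```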



Frames are sent out a given SAP when one of the two state machines requires it (for example, to acknowledge the reception of a frame). PF_LLC sockets can use the standard interface (i.e., sendmsg, etc.) to transmit. In both cases, frames are fed directly to dev_queue_xmit once the appropriate link layer headers have been initialized properly.





13.5.4. Subnetwork Access Protocol (SNAP)


















Given the limitations of the LLC header, the 802 committee generalized the data link header further. To make the protocol domain bigger, they introduced the concept of SNAP. Basically, when the SSAP/DSAP couple is assigned the value 0xAA/0xAA, it has a special meaning: the five bytes following the CTL field of the LLC header represent a protocol identifier. The unicast/multicast and local/global bits are also not used anymore. Thus, the size of the protocol identifier has jumped from 8 bits to 40. The reason the committee decided to use five bytes has to do with how protocol numbers are derived from MAC addresses.[*] Unlike SSAP/DSAP, the use of SNAP codes is pretty common.

[*] SNAP codes are defined as a subset of MAC addresses, which are sold by IEEE in chunks. This way, each MAC address owner has a number of SNAP codes assigned to her together with the MAC addresses. For details, I recommend reading Interconnections, Second Edition (Addison Wesley).


Since the SNAP identifier 0xAA/0xAA is a special case of SSAP/DSAP, as shown in Table 13-3, it is one of the clients that use llc_sap_open (see snap_init in net/802/psnap.c). This means that a protocol using a SNAP code will have another level of indirection, which means three of them!


Before looking at how SNAP clients register with the kernel, let's briefly see how a SNAP protocol ID is defined. As you probably know, MAC addresses are managed by the IEEE, which sells them in chunks of 2^24. Since a MAC address is 48 bits long (6 bytes), the IEEE simply has to give each client a 24-bit number (the first three bytes of a MAC address) and let the client use any value for the remaining 24 bits. Suppose I want to buy a chunk of MAC addresses because I want to start selling network cards. We'll call the number assigned to me XX:YY:ZZ. At that point, I would become the owner of all the addresses between XX:YY:ZZ:00:00:00 and XX:YY:ZZ:FF:FF:FF. Together with those 2^24 MAC addresses, I would be assigned all the 2^16 SNAP codes between XX:YY:ZZ:00:00 and XX:YY:ZZ:FF:FF.


Effectively, when you get a 24-bit number from the IEEE, it offers you four 24-bit numbers thanks to the four possible combinations of the global/local and unicast/multicast bits (see Figure 13-9).
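As a small illustration of the layout, a five-byte SNAP code is the three-byte number assigned by the IEEE followed by a two-byte value chosen by its owner. The helper below is hypothetical (make_snap_id is not a kernel function) and simply concatenates the two parts in network byte order.

```c
#include <stdint.h>

/*
 * Build a 5-byte SNAP code from the 24-bit IEEE-assigned number
 * (the XX:YY:ZZ part) and a 16-bit owner-chosen value.
 */
void make_snap_id(const uint8_t oui[3], uint16_t proto, uint8_t out[5])
{
    out[0] = oui[0];
    out[1] = oui[1];
    out[2] = oui[2];
    out[3] = (uint8_t)(proto >> 8);    /* high byte first */
    out[4] = (uint8_t)(proto & 0xFF);
}
```

With the prefix 00:00:00 and the value 0x8137, for example, this yields 00:00:00:81:37.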


Similar to the way SAP protocols are registered and unregistered, the SNAP layer provides the register_snap_client and unregister_snap_client functions, which also use a global list (snap_list) to link together all the SNAP protocols registered with the kernel. Table 13-4 shows the clients registered with the Linux kernel.


Table 13-4. SNAP client

Protocol                                 Snap ID           Function handler
AppleTalk Address Resolution Protocol    00:00:00:80:F3    aarp_rcv
AppleTalk Datagram Delivery Protocol     08:00:07:80:9B    atalk_rcv
IPX                                      00:00:00:81:37    ipx_rcv
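Matching an incoming frame against these entries amounts to checking for the 0xAA/0xAA/0x03 LLC header and then comparing the next five bytes with each registered ID. The following sketch uses a static array in place of the kernel's snap_list and returns a label instead of calling a handler; the structure, the labels, and snap_lookup are names made up for this example, while the IDs are the ones from Table 13-4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SNAP_ID_LEN 5

/* Hypothetical, simplified stand-in for a registered SNAP client. */
struct snap_client {
    const char *name;
    uint8_t     id[SNAP_ID_LEN];
};

static const struct snap_client snap_clients[] = {
    { "aarp", { 0x00, 0x00, 0x00, 0x80, 0xF3 } },
    { "ddp",  { 0x08, 0x00, 0x07, 0x80, 0x9B } },
    { "ipx",  { 0x00, 0x00, 0x00, 0x81, 0x37 } },
};

/*
 * 'llc' points at the DSAP byte. Returns the matching client's
 * label, or NULL if the frame is not SNAP or the ID is unknown.
 */
const char *snap_lookup(const uint8_t *llc, size_t len)
{
    size_t i;

    /* 3 bytes of LLC header plus 5 bytes of SNAP protocol ID. */
    if (len < 8 || llc[0] != 0xAA || llc[1] != 0xAA || llc[2] != 0x03)
        return NULL;

    for (i = 0; i < sizeof snap_clients / sizeof snap_clients[0]; i++) {
        if (memcmp(llc + 3, snap_clients[i].id, SNAP_ID_LEN) == 0)
            return snap_clients[i].name;
    }
    return NULL;
}
```

The length check of 8 bytes corresponds to the header_length value that register_snap_client assigns.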



The data structure used to define a SNAP protocol is datalink_proto, defined in include/net/datalink.h. Among its fields, you have:



unsigned short header_length


This is the length of the data link header. It is initialized to 8 in register_snap_client (see Figure 13-8(d)).


unsigned char type[8]


Protocol identifier. Only five bytes are used (the SNAP protocol ID; see Table 13-4).


void (*request)(struct datalink_proto *, struct sk_buff *, unsigned char *)


Initialized to snap_request in register_snap_client. It initializes the SNAP header (protocol ID only) and passes the frame to the 802.2 code. It is invoked before a transmission to fill in the data link header.


void (*rcvfunc)(struct sk_buff *, struct net_device *, struct packet_type *)


Function handler for ingress traffic. See Table 13-4.


I'll focus for just a moment on IPX. It's worth pointing out that this protocol registers the same handler with the kernel at three different points:


  • One with an Ethertype (Table 13-2)

  • One as an 802.2 SSAP/DSAP protocol (Table 13-3)

  • One as a SNAP protocol (Table 13-4)


Figure 13-11 summarizes how the kernel recognizes and handles Ethernet, 802.3, 802.2, and SNAP frames.



Figure 13-11. Protocol detection for Ethernet/802.3/802.2/SNAP frames













