Tuesday, October 20, 2009

28.8 Trim Segment so Data is Within Window

TCP/IP Illustrated, Volume 2: The Implementation
By Gary R. Wright, W. Richard Stevens
Chapter 28. TCP Input




This section trims the received segment so that it contains only data that is within the advertised window:


  • duplicate data at the beginning of the received segment is discarded, and

  • data that is beyond the end of the window is discarded from the end of the segment.


What remains is new data within the window. The code shown in Figure 28.24 checks whether there is any duplicate data at the beginning of the segment.



Figure 28.24. tcp_input function: check for duplicate data at beginning of segment.


Check if any duplicate data at front of segment


635-636

If the starting sequence number of the received segment (ti_seq) is less than the next receive sequence number expected (rcv_nxt), data at the beginning of the segment is old and todrop will be greater than 0. These data bytes have already been acknowledged and passed to the application (Figure 24.18).



Remove duplicate SYN


637-645

If the SYN flag is set, it refers to the first sequence number in the segment, which is known to be old. The SYN flag is cleared and the starting sequence number of the segment is incremented by 1 to skip over the duplicate SYN. Furthermore, if the urgent offset in the received segment (ti_urp) is greater than 1, it is decremented by 1, since the urgent offset is relative to the starting sequence number, which was just incremented. If the urgent offset is 0 or 1, it is left alone, but the URG flag is cleared in case the offset was 1. Finally, todrop is decremented by 1 (since the SYN occupies a sequence number).
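The two steps above can be sketched in user-space C. This is a minimal illustration, not the kernel source: struct seg and front_duplicate are hypothetical names standing in for the header fields (ti_seq, ti_urp, tiflags) that the text describes, and the signed 32-bit difference is assumed so that the comparison survives sequence number wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TH_SYN 0x02u
#define TH_URG 0x20u

/* Hypothetical stand-in for the header fields the text refers to. */
struct seg {
    uint32_t seq;    /* starting sequence number (ti_seq) */
    uint16_t urp;    /* urgent offset (ti_urp) */
    unsigned flags;  /* header flags (tiflags) */
};

/*
 * Return the number of duplicate data bytes at the front of the segment.
 * A positive result means old data is present; a duplicate SYN, if any,
 * has already been skipped and does not count toward the result.
 */
int32_t front_duplicate(struct seg *s, uint32_t rcv_nxt)
{
    assert(s != NULL);

    /* Signed difference: positive when the segment starts before rcv_nxt. */
    int32_t todrop = (int32_t)(rcv_nxt - s->seq);

    if (todrop > 0 && (s->flags & TH_SYN)) {
        s->flags &= ~TH_SYN;        /* the SYN is old: clear it */
        s->seq++;                   /* skip the SYN's sequence number */
        if (s->urp > 1)
            s->urp--;               /* offset is relative to the new seq */
        else
            s->flags &= ~TH_URG;    /* offset 0 or 1: no urgent data left */
        todrop--;                   /* the SYN used one sequence number */
    }
    return todrop;
}
```

For a segment starting at sequence number 99 with SYN set and rcv_nxt equal to 104, the sketch returns 4 and advances the starting sequence number to 100.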


The handling of duplicate data at the front of the segment continues in Figure 28.25.



Figure 28.25. tcp_input function: handle completely duplicate segment.



Check for entire duplicate packet


646-648

If the amount of duplicate data at the front of the segment is greater than or equal to the size of the segment, the entire segment is a duplicate.



Check for duplicate FIN


649-663

The next check is whether the FIN is duplicated. Figure 28.26 shows an example of this.



Figure 28.26. Example of duplicate packet with FIN flag set.


In this example todrop equals 5, which is greater than or equal to ti_len (4). Since the FIN flag is set and todrop equals ti_len plus 1, todrop is set to 4, the FIN flag is cleared, and the TF_ACKNOW flag is set, forcing an immediate ACK to be sent at the end of this function. This example also works for other segments if ti_seq plus ti_len equals 10.



The code contains a comment regarding 4.2BSD keepalives. The corresponding code (another test within the if statement) is omitted.




Generate duplicate ACK


664-672

If todrop is nonzero (the completely duplicate segment contains data) or the ACK flag is not set, the segment is dropped and an ACK is generated by dropafterack. This normally occurs when the other end did not receive our ACK, causing the other end to retransmit the segment. TCP generates another ACK.



Handle simultaneous open or self-connect


664-672

This code also handles either a simultaneous open or a socket that connects to itself. We go over both of these scenarios in the next section. If todrop equals 0 (there is no data in the completely duplicate segment) and the ACK flag is set, processing is allowed to continue.
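The three outcomes for a completely duplicate segment (lines 649-672 as described) can be sketched as a small classifier. This is an illustration under stated assumptions: the enum and the function classify_duplicate are hypothetical, since the kernel code uses goto labels and sets TF_ACKNOW in the control block instead of returning a value. The sketch mirrors the logic as the text describes it, including the behavior that the text later identifies as buggy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TH_FIN 0x01u
#define TH_ACK 0x10u

/* Hypothetical outcome codes for illustration only. */
enum dup_action {
    DUP_TRIM_AND_ACK,    /* duplicate FIN: clear it, ACK now, keep going */
    DUP_DROP_AFTER_ACK,  /* drop the segment and send an ACK */
    DUP_CONTINUE         /* empty segment with ACK set: keep processing */
};

/*
 * Classify a segment already known to be a complete duplicate
 * (*todrop >= len). On DUP_TRIM_AND_ACK, *todrop and *flags are updated.
 */
enum dup_action classify_duplicate(int32_t *todrop, int32_t len,
                                   unsigned *flags)
{
    assert(todrop != NULL && flags != NULL);
    assert(*todrop >= len);

    if ((*flags & TH_FIN) && *todrop == len + 1) {
        *todrop = len;        /* only the data bytes remain to be dropped */
        *flags &= ~TH_FIN;    /* the FIN itself is a duplicate */
        return DUP_TRIM_AND_ACK;
    }
    if (*todrop != 0 || (*flags & TH_ACK) == 0)
        return DUP_DROP_AFTER_ACK;
    return DUP_CONTINUE;
}
```

With the values from the Figure 28.26 example (todrop of 5, ti_len of 4, FIN set), the sketch takes the first branch and reduces todrop to 4.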



This if statement is new with 4.4BSD. Earlier Berkeley-derived systems just had a jump to dropafterack. These systems could not handle either a simultaneous open or a socket connecting to itself.


Nevertheless, the piece of code in this figure still has bugs, which we describe at the end of this section.




Update statistics for partial duplicate segments


673-676

This else clause is executed when todrop is less than the segment length: only part of the segment contains duplicate bytes.



Remove duplicate data and update urgent offset


677-685

The duplicate bytes are removed from the front of the mbuf chain by m_adj and the starting sequence number and length adjusted appropriately. If the urgent offset points to data still in the mbuf, it is also adjusted. Otherwise the urgent offset is set to 0 and the URG flag is cleared.
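The bookkeeping that accompanies the call to m_adj can be sketched as follows. The struct seg and the function trim_front are hypothetical; the mbuf chain itself is not modeled, so only the header adjustments described above are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TH_URG 0x20u

/* Hypothetical stand-in for the header fields the text refers to. */
struct seg {
    uint32_t seq;    /* starting sequence number (ti_seq) */
    int32_t  len;    /* data length (ti_len) */
    uint16_t urp;    /* urgent offset (ti_urp) */
    unsigned flags;  /* header flags (tiflags) */
};

/* Drop todrop duplicate bytes from the front of the segment. */
void trim_front(struct seg *s, int32_t todrop)
{
    assert(s != NULL);
    assert(todrop >= 0 && todrop <= s->len);

    /* The kernel would also remove the bytes from the mbuf chain here. */
    s->seq += (uint32_t)todrop;
    s->len -= todrop;

    if (s->urp > todrop) {
        /* Urgent data survives the trim: keep the offset relative to seq. */
        s->urp = (uint16_t)(s->urp - todrop);
    } else {
        /* Urgent data was in the discarded bytes. */
        s->flags &= ~TH_URG;
        s->urp = 0;
    }
}
```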


The next part of input processing, shown in Figure 28.27, handles data that arrives after the process has terminated.



Figure 28.27. tcp_input function: handle data that arrives after the process terminates.


687-696

If the socket has no descriptor referencing it, the process has closed the connection (the state is one of the five with a value greater than CLOSE_WAIT in Figure 24.16), and the received segment contains data, then the connection is closed. The segment is dropped and an RST is output.


When a process terminates unexpectedly (perhaps it is terminated by a signal), the kernel closes all open descriptors as part of process termination and TCP outputs a FIN. The connection moves into the FIN_WAIT_1 state. But because TCP supports a half-close, the receipt of the FIN by the other end doesn't tell that end whether this end performed a half-close or a full close. If the other end assumes a half-close and sends more data, it will receive an RST from the code in Figure 28.27.
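The three-part test can be written as a single predicate. In this sketch the function name data_after_close and the boolean no_fd_ref are hypothetical, and the state ordering is assumed to follow the BSD numbering that the text refers to in Figure 24.16, in which the five states after CLOSE_WAIT are the ones where the process has closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* State ordering assumed to match the numbering cited in the text. */
enum tcp_state {
    TCPS_CLOSED, TCPS_LISTEN, TCPS_SYN_SENT, TCPS_SYN_RECEIVED,
    TCPS_ESTABLISHED, TCPS_CLOSE_WAIT,
    /* states below: the process has closed its end */
    TCPS_FIN_WAIT_1, TCPS_CLOSING, TCPS_LAST_ACK,
    TCPS_FIN_WAIT_2, TCPS_TIME_WAIT
};

/*
 * True when data arrives for a connection whose process is gone:
 * the caller should close the connection and answer with an RST.
 */
bool data_after_close(bool no_fd_ref, enum tcp_state state, int32_t len)
{
    assert(len >= 0);
    return no_fd_ref && state > TCPS_CLOSE_WAIT && len > 0;
}
```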


The next piece of code, shown in Figure 28.29, removes any data from the end of the received segment that is beyond the right edge of the advertised window.



Calculate number of bytes beyond right edge of window


697-703

todrop contains the number of bytes of data beyond the right edge of the window. For example, in Figure 28.28, todrop would be (6+5) minus (4+6), or 1.
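The arithmetic can be checked with a short sketch. The function name bytes_beyond_window is hypothetical, and the example values are read as ti_seq = 6, ti_len = 5, rcv_nxt = 4, and rcv_wnd = 6, which is an assumption based on the expression in the text.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes in the segment that lie beyond the right edge of the
 * window. Unsigned arithmetic followed by a signed cast keeps the result
 * correct across sequence number wraparound.
 */
int32_t bytes_beyond_window(uint32_t seq, uint32_t len,
                            uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    return (int32_t)((seq + len) - (rcv_nxt + rcv_wnd));
}

/* The worked example from the text: (6+5) minus (4+6). */
void figure_28_28_example(void)
{
    assert(bytes_beyond_window(6, 5, 4, 6) == 1);
}
```

A result of zero or less means the segment ends at or before the right edge, so nothing needs to be trimmed from the end.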



Figure 28.28. Example of received segment with data beyond right edge of window.



Figure 28.29. tcp_input function: remove data beyond right edge of window.



Check for new incarnation of a connection in the TIME_WAIT state


704-718

If todrop is greater than or equal to the length of the segment, the entire segment will be dropped. If the following three conditions are all true:


  1. the SYN flag is set, and

  2. the connection is in the TIME_WAIT state, and

  3. the new starting sequence number is greater than the final sequence number for the connection,


this is a request for a new incarnation of a connection that was recently terminated and is currently in the TIME_WAIT state. This is allowed by RFC 1122, but the ISS for the new connection must be greater than the last sequence number used (rcv_nxt). TCP adds 128,000 (TCP_ISSINCR), which becomes the ISS when the code in Figure 28.17 is executed. The PCB and TCP control block for the connection in the TIME_WAIT state are discarded by tcp_close. A jump is made to findpcb (Figure 28.5) to locate the PCB for the listening server, assuming it is still running. The code in Figure 28.7 is then executed, creating a new socket for the new connection, and finally the code in Figures 28.16 and 28.17 will complete the new connection request.
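The three conditions and the ISS calculation can be sketched as follows. The function new_incarnation and its parameters are hypothetical, the SEQ_GT macro is written here in the usual signed-difference style, and the sketch only computes the value; discarding the old control block and jumping to findpcb are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TH_SYN      0x02u
#define TCP_ISSINCR 128000u   /* increment given in the text */

/* Wraparound-safe "a is greater than b" for sequence numbers. */
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/*
 * Return true if the segment requests a new incarnation of a connection
 * in TIME_WAIT. On success, *iss holds the ISS for the new connection.
 */
bool new_incarnation(unsigned flags, bool in_time_wait,
                     uint32_t seq, uint32_t rcv_nxt, uint32_t *iss)
{
    assert(iss != NULL);

    if ((flags & TH_SYN) && in_time_wait && SEQ_GT(seq, rcv_nxt)) {
        *iss = rcv_nxt + TCP_ISSINCR;   /* beyond the last sequence used */
        return true;
    }
    return false;
}
```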



Check for probe of closed window


719-728

If the receive window is closed (rcv_wnd equals 0) and the received segment starts at the left edge of the window (rcv_nxt), then the other end is probing TCP's closed window. An immediate ACK is sent as the reply, even though the ACK may still advertise a window of 0. Processing of the received segment also continues for this case.



Drop other segments that are completely outside window


729-730

The entire segment lies outside the window and it is not a window probe, so the segment is discarded and an ACK is sent as the reply. This ACK will contain the expected sequence number.



Handle segments that contain some valid data


731-735

The data to the right of the window is discarded from the mbuf chain by m_adj and ti_len is updated. In the case of a probe into a closed window, this discards all the data in the mbuf chain and sets ti_len to 0. Finally, the FIN and PSH flags are cleared.
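The window probe, the drop of segments completely outside the window, and the trimming of partially valid segments can be sketched together. The struct seg, the function trim_right, and the ack_now output are hypothetical; the TIME_WAIT case described earlier and the mbuf manipulation are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TH_FIN  0x01u
#define TH_PUSH 0x08u

/* Hypothetical stand-in for the header fields the text refers to. */
struct seg {
    uint32_t seq;    /* starting sequence number (ti_seq) */
    int32_t  len;    /* data length (ti_len) */
    unsigned flags;  /* header flags (tiflags) */
};

/*
 * Trim data beyond the right edge of the window. Returns false if the
 * segment must be dropped (with an ACK sent in reply). Sets *ack_now
 * when the segment is a probe of a closed window.
 */
bool trim_right(struct seg *s, uint32_t rcv_nxt, uint32_t rcv_wnd,
                bool *ack_now)
{
    assert(s != NULL && ack_now != NULL);

    int32_t todrop =
        (int32_t)((s->seq + (uint32_t)s->len) - (rcv_nxt + rcv_wnd));
    if (todrop <= 0)
        return true;                  /* nothing beyond the window */

    if (todrop >= s->len) {
        if (rcv_wnd == 0 && s->seq == rcv_nxt)
            *ack_now = true;          /* probe of a closed window */
        else
            return false;             /* entirely outside the window */
    }

    /* The kernel would also trim the tail of the mbuf chain here. */
    s->len -= todrop;
    s->flags &= ~(TH_PUSH | TH_FIN);
    return true;
}
```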



When to Drop an ACK


The code in Figure 28.25 has a bug that causes a jump to dropafterack in several cases when the code should fall through for further processing of the segment [Carlson 1993; Lanciani 1993]. In an actual scenario, when both ends of a connection have a hole in the data on the reassembly queue and both ends enter the persist state, the connection becomes deadlocked as both ends throw away perfectly good ACKs.


The fix is to simplify the code at the beginning of Figure 28.25. Instead of jumping to dropafterack, a completely duplicate segment causes the FIN flag to be turned off and an immediate ACK to be generated at the end of the function. Lines 646-676 in Figure 28.25 are replaced with the code shown in Figure 28.30. This code also corrects another bug present in the original code (Exercise 28.9).
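The behavior described for the fix can be sketched as follows. This is not a reproduction of the figure: the function handle_front_duplicate is hypothetical, and the exact test for a "completely duplicate" segment is an assumption modeled on the form used in later BSD releases (todrop exceeds the length, or equals it with no FIN present). What the text does confirm is the effect: no jump to dropafterack, the FIN flag turned off, and an immediate ACK requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TH_FIN 0x01u

/*
 * Handle duplicate data at the front of a segment without dropping it.
 * Returns true when the segment is a complete duplicate, in which case
 * the caller should schedule an immediate ACK (TF_ACKNOW) and continue
 * processing the segment for its RST or ACK.
 */
bool handle_front_duplicate(int32_t *todrop, int32_t len, unsigned *flags)
{
    assert(todrop != NULL && flags != NULL);

    /* Assumed condition: everything in the segment, FIN included, is old. */
    if (*todrop > len || (*todrop == len && (*flags & TH_FIN) == 0)) {
        *flags &= ~TH_FIN;   /* any FIN here is a duplicate */
        *todrop = len;       /* discard all of the data */
        return true;
    }
    return false;            /* partial duplicate: trim the front only */
}
```

Under this assumed condition, a segment whose data is all duplicate but whose FIN is new (todrop equal to the length, FIN set) is treated as a partial duplicate, so the FIN is still processed.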



Figure 28.30. Correction for lines 646-676 of Figure 28.25.




