Sunday, October 4, 2009

Basic TCP/IP: TCP and UDP Communication

In today's exciting session, we'll go over:

1. Understanding the difference between TCP and UDP
2. The TCP three way handshake
3. TCP sequence numbers and acknowledgments
4. TCP Windowing

Let's review a bit, shall we?

Make sure you remember that a network always communicates in layers.

At the bottom, the physical layer is where the electric signals and the 1's and 0's move along.

Then you have the data link layer where the MAC addresses allow you to communicate on the local network. Local network implies you and your buddies are all connected to the same switch...you haven't gone through a router.

The IP address on the network layer allows us to communicate end to end, no matter where we are located in the world or how many routers we have to go through....these addresses stay the same throughout their travels.

Now as we move up from physical to data link to network, you'll see that today's discussion will be about the Transport layer (TCP vs UDP).


Remember that the choice between TCP and UDP is the choice between reliable and unreliable communication. You can see the high level differences in the image above.

If you remember from our earlier discussion, UDP is great for video and audio, where you don't really need 100% of the packets to arrive at the destination.

TCP is what is used most of the time, such as when we browse the web, check email, use an FTP server, etc.

Now if you notice you'll see that TCP builds a connection and UDP is connectionless. This is pretty self-explanatory for now.

The next question is, how does TCP guarantee reliable delivery? Well, it uses a nifty trick called 'sequence numbers'. How does this trick work? Well, it's fairly simple...

Each packet that is sent out by TCP will have a sequence number inside the TCP header. For example, a series of packets going out from your local machine to a web server may have the sequence numbers 1,2,3,4....10 on the first 10 packets that you send to the web server. This way, the remote side (the web server) knows how to re-assemble the data even if the packets arrive out of order.
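Here is a minimal sketch of that re-assembly idea in Python. It is not how a real TCP stack is written; the `reassemble` function and the payloads are made up for illustration, and it uses the simplified one-number-per-packet scheme from this post.

```python
def reassemble(packets):
    """Rebuild the original data from (sequence_number, payload) pairs.

    The pairs may arrive in any order; sorting by sequence number
    restores the order in which they were sent.
    """
    ordered = sorted(packets, key=lambda packet: packet[0])
    return "".join(payload for _, payload in ordered)


# Packets 1..4 arriving out of order (made-up payloads).
arrived = [(3, "lo, "), (1, "He"), (4, "web!"), (2, "l")]
message = reassemble(arrived)
print(message)
```

The receiver never needs to know the order the packets took through the network; the sequence number alone is enough to put them back together.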

What if your packets don't reach the destination at all? How will re-assembly take place? Well, TCP uses what's called 'acks', which is short for 'acknowledgments', for the packets that it sends. Unless TCP gets an ack for each and every packet it sends to the destination, it will continue to retransmit until the acks arrive.

Now we can move onto some more specifics.

Let's begin with UDP, which is fairly simple. We simply send out a packet with the source and destination IP addresses marked in the packet. That's it....we don't do anything else.

Think of it this way.....You walk into a room and say out loud "I want to hand John this letter!" and then you simply throw the letter to him and walk out. You don't even know if John was in the room....you don't know if he got it or not.....that's UDP. You hope for the best but prepare for the worst.
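If you want to see the "throw the letter and walk out" behavior for yourself, here is a small sketch using Python's standard `socket` module. The loopback address, the message, and the variable names are just assumptions for the demo. Note that besides the IP address, a UDP packet is also addressed to a port number, which is why the address below is a pair.

```python
import socket

# A stand-in for "John": a UDP socket listening on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
receiver.settimeout(2.0)          # don't wait forever if nothing arrives
john_address = receiver.getsockname()

# The sender: no handshake, no connection, just throw the letter.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
bytes_sent = sender.sendto(b"letter for John", john_address)

# The sender is never told whether this receive succeeds.
letter, sender_address = receiver.recvfrom(1024)
print(letter)

sender.close()
receiver.close()
```

On the loopback interface the letter practically always arrives, but nothing in the sender's code would change if it didn't. `sendto` only reports how many bytes were handed to the network.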

TCP, however, is a bit more complicated. It will actually start a conversation (session) with the device it's trying to communicate with. The way it begins this conversation or session is by using what's called the 'TCP Three Way Handshake'. Let's go into how it works....

The three way handshake begins with the source machine sending out a 'syn' packet to the destination machine it wants to communicate with. Here 'syn' is short for synchronization, meaning that we want to synchronize with the destination.

The destination will respond with a packet that says 'syn-ack'. The 'syn-ack' comes from the fact that the destination is now saying "Not only do I want to synchronize with you but I want you to know that I am acknowledging your syn packet".

Now remember everything in TCP is reliable, so the source will send back an 'ack' packet to acknowledge the 'syn-ack' that was sent by the destination. Now we're good to go.

Every time you open a browser and visit some website, you are doing the above.
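You normally never write the handshake yourself; the operating system does it for you. As a rough sketch (the loopback address and variable names are assumptions for the demo), this is what triggers it from Python's standard `socket` module:

```python
import socket

# The "destination": a TCP socket listening on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))     # port 0 lets the OS pick a free port
server.listen(1)
server.settimeout(2.0)
server_address = server.getsockname()

# The "source": connect() makes the OS send the syn, wait for the
# syn-ack and reply with the ack. It returns once that is done.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(2.0)
client.connect(server_address)

# The destination picks up the connection that was just established.
connection, client_address = server.accept()
handshake_ok = client_address == client.getsockname()
print("connected:", client_address, "->", server_address)

connection.close()
client.close()
server.close()
```

If you run a packet capture tool while this executes, you should be able to spot the three packets on the loopback interface.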

----

Now that the three way handshake is completed, your computer will begin either sending or receiving data with the destination it's communicating with...in our case a web server.

As mentioned before TCP will use sequence numbers to verify reliable delivery, so let's take some time to go through an example to see how all this fits together.


Let's say our computer decides to send a packet to the web server and marks the packet with sequence number 10. In reality these numbers are much larger (and real TCP counts bytes rather than whole packets), but we are just using a small number here for easier understanding.

Anyway the receiver will catch the packet sent by our machine and respond by saying "Okay cool, now I have a stream of data that I want to send you. Therefore I'll start sending data to you and I'll begin with sequence number 5".

You may be asking, where are these numbers coming from? Well, there is some complex background to where they come from and how they are maintained, but for now, we don't have to worry about it.

In any event, if you see the image that relates to our example above you'll see not only is the web server sending us a packet with sequence number 5, but it's also sending an 'ack 11'. This means that the web server has acknowledged the packet with sequence number 10 that we sent over to him and he's waiting for a packet with sequence number 11.

Now the source machine will receive this from the web server and he'll say "Oh cool, he got my earlier packet. Now I'll send him a new packet with some more data I want to send him and I'll stamp that with sequence number 11. Not only this, I'll let the web server know that I got the packet that he was trying to send by 'ack'-ing sequence number 6".

Note that the packet we send to the web server contains not only our new data with sequence number 11 but it also contains the ack for 6 (which is the sequence number the destination is sending on).

If only real life worked so smoothly...

Now as we move along the destination will say "Oh so cool, you got my packet 5 from earlier, so I'll send you some new data with sequence number 6. Oh and by the way, I got your packet 11 earlier so I'll ack for 12 now".

This is how it goes back and forth....each side has a sequence number they are sending on while the other side acknowledges.
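The back and forth above can be sketched as a tiny simulation. The `Endpoint` class is made up for illustration and uses the simplified one-number-per-packet scheme from this post, not real TCP byte counting.

```python
class Endpoint:
    """One side of the conversation, using simplified per-packet numbering."""

    def __init__(self, name, first_seq):
        self.name = name
        self.next_seq = first_seq    # sequence number for our next packet
        self.next_expected = None    # what we ack: the peer's last seq + 1

    def send(self):
        packet = {"from": self.name, "seq": self.next_seq, "ack": self.next_expected}
        self.next_seq += 1
        return packet

    def receive(self, packet):
        self.next_expected = packet["seq"] + 1


host = Endpoint("host", first_seq=10)
server = Endpoint("web server", first_seq=5)

trace = []
for _ in range(2):
    # Each side takes a turn: first the host sends, then the web server.
    for sender, receiver in ((host, server), (server, host)):
        packet = sender.send()
        receiver.receive(packet)
        trace.append((packet["from"], packet["seq"], packet["ack"]))

for entry in trace:
    print(entry)
```

The trace matches the walkthrough: the host sends 10, the web server sends 5 with ack 11, the host sends 11 with ack 6, and the web server sends 6 with ack 12.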

-----

Now let's say something goes wrong....horribly horribly wrong....

Let's say the message that we sent earlier from our machine to the web server gets dropped. The packet I am referring to is the one that had the sequence number 11 and the ack for 6.


Well TCP uses timers here to retransmit just in case packets don't arrive as they should. Therefore in our example the web server will create a timer and wait for a message back from the host machine which has the 'ack 6'. If it doesn't get the ack back from our host machine within a specified time interval, it will re-send its packet to the host machine with its data and sequence number 5 along with the ack 11.

Our host machine will notice this and say "Hey, why is the web server re-sending the same stuff from earlier? He must not have gotten my packet....I guess I'll re-send it".

So the host machine will resend its packet (sequence number 11 and ack 6).

It's pretty simple to understand. The web server will keep re-sending its data along with the sequence number it's working on along with the ack needed for the host machine. It will re-send this packet over and over until the correct response comes back from our host machine.
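Here is a rough sketch of that retransmit loop. There is no real timer here; the hypothetical `channel` function simply returns `None` to stand for "the timer ran out and no ack came back". The `max_attempts` limit is also an assumption added so the demo cannot loop forever.

```python
def send_until_acked(packet, channel, max_attempts=5):
    """Send `packet` until `channel` returns an ack, or give up.

    `channel(packet)` is a hypothetical function that returns the ack
    number, or None when the timer expires without an ack arriving.
    """
    for attempt in range(1, max_attempts + 1):
        ack = channel(packet)
        if ack is not None:
            return attempt, ack
        # Timer expired: loop around and retransmit the same packet.
    raise TimeoutError(f"no ack after {max_attempts} attempts")


def make_lossy_channel(drops):
    """Build a channel that loses the first `drops` replies."""
    state = {"remaining": drops}

    def channel(packet):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return None
        return packet["seq"] + 1   # the ack names the next expected number

    return channel


# The web server's packet from the example: sequence number 5, ack 11.
server_packet = {"seq": 5, "ack": 11}
attempts, ack = send_until_acked(server_packet, make_lossy_channel(drops=2))
print(attempts, ack)
```

With two lost replies, the packet goes out three times before the 'ack 6' finally comes back.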

-----

Let's move up a notch now...since you are so advanced....

If you think about what we have so far, it's not very efficient. We send a packet and wait for an acknowledgment. We do this over and over again. This can get very laggy....

To resolve this there is a concept known as TCP Windowing. The idea is to send bursts of traffic based on how reliable the connection is. Here's the idea:


We begin by sending one packet and then we get the acknowledgment. Okay, this is cool, let's step it up a bit.

This time we send two packets full of data and receive an acknowledgment back. This is pretty cool, let's step it up further!

Then we move onto four packets and so forth. This will actually continue to increase without limit until a problem is detected. A problem occurs when you overwhelm the other side. So we keep on increasing the amount we send until the other side says "enough!".

How will the other side say "enough is enough!"? Well the other side will acknowledge and say "Look buddy, I only partially received what you were trying to send....could you slow it down a bit?". In our example you'll see that the 'X' marks the packets the remote side failed to acknowledge.

At this point the host machine will reduce how much it sends each time. However later on, the host machine will try again to bump it up a bit.
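To get a feel for the ramp-up and back-off, here is a toy model. The `capacity` limit and the choice to cut the window in half after a problem are assumptions made for this sketch; the description above only says the sender reduces how much it sends.

```python
def window_sizes(capacity, rounds):
    """Return the number of packets sent in each round.

    `capacity` is a made-up limit on how many packets the other side
    can absorb in one burst.
    """
    window = 1
    history = []
    for _ in range(rounds):
        history.append(window)
        if window > capacity:
            window = max(1, window // 2)   # the other side said "enough!"
        else:
            window *= 2                    # everything was acked: step it up
    return history


print(window_sizes(capacity=5, rounds=8))
```

You can see the sender climbing 1, 2, 4, 8, getting told to slow down, and then trying to bump it up again later.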

-----

How about an example?

Here is the diagram for our example. Assume that the three way handshake has already taken place.


  • We begin by sending out a packet with sequence number 1 to the web server.
  • The web server acks 2 saying that it now expects a packet with sequence number 2 next. All is well in the world at the moment.
  • Now we get a bit aggressive so we send out two packets with sequence numbers 2 and 3.
  • The web server steps it up a notch as well and acks for 4 meaning that it got the packets with 2 and 3 and is now ready for a packet with sequence number 4.
  • Now we get even more aggressive and send out four packets with sequence numbers 4, 5, 6 and 7.
  • The web server has finally met its match and could not handle our flurry of packets. Hence it acks 7 back to us. This means that the web server received 4, 5, and 6 but could not get 7. If it had gotten 7, it would have acked back 8.
  • We understand the limits imposed by the web server so we now know we can send a maximum of three packets at a time.
  • Now for the next wave of packets, we send the three packets with sequence numbers 7,8 and 9. We have to re-send 7 because our web server didn't get it earlier.

So now you understand TCP Windowing. It's a pretty cool concept that lets TCP gradually increase and hit the limits for sending traffic.
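The numbers in the example above can be replayed with a short simulation. This is only a sketch under the assumptions of this post: one sequence number per packet, a made-up `receiver_capacity` of three packets per burst, and a sender that sticks to the limit once it has learned it.

```python
def run_example(receiver_capacity=3, rounds=4):
    """Replay the windowing example and return (packets_sent, ack) per round."""
    next_seq, window, limit = 1, 1, None
    trace = []
    for _ in range(rounds):
        sent = list(range(next_seq, next_seq + window))
        accepted = min(len(sent), receiver_capacity)
        ack = next_seq + accepted        # next number the receiver expects
        trace.append((sent, ack))
        if accepted < len(sent):
            limit = accepted             # we just learned the receiver's limit
        # Keep doubling until a limit is known, then stay at that limit.
        window = window * 2 if limit is None else limit
        next_seq = ack                   # resume from the first unacked packet
    return trace


for sent, ack in run_example():
    print("sent", sent, "-> ack", ack)
```

The last two rounds show the interesting part: we send 4, 5, 6 and 7, get back an ack for 7, and so the next burst is 7, 8 and 9.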

That's all for today. Remember to try to memorize the different ranges for the 3 different IP address classes!
