The Journey of Internet Packets

Copyright © 1997 Garret Wilson

Essay for CS4323 - Dr. Letcher

In the Beginning

In the 1960s, the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense created a wide-area network of computers called the ARPANET, which later adopted the TCP/IP protocol. At first used only by the Department of Defense and educational institutions, this network grew to include tens of thousands of smaller networks, all connected together. This network became known as the Internet.

In 1988, the Advanced Research Projects Agency (now called DARPA) started dismantling the ARPANET. The National Science Foundation stepped in and replaced the ARPANET with its own network, NSFNET, as the backbone of the Internet. By the spring of 1995, several commercial long-distance carriers and other commercial entities had instituted their own backbones.

Through all of these changes, the Internet has continued to add more users, slowly at first and very rapidly in the past few years. Today, the Internet connects millions of computers around the world, all using TCP/IP as a common protocol. All of these computers rely on the Domain Name System, or DNS, to find other computers on this vast network.

Everyone's Just a Number

Computers work with numbers. Specifically, most computers use the binary system, a way of counting in which each digit is either a one or a zero. Each of these digits is referred to as a bit. The TCP/IP protocol specifies that messages between computers on the Internet must be composed of small entities called packets. These packets of information are passed from computer to computer until they arrive at their specified destination. TCP/IP further specifies that each packet must be encoded with, among other things, the Internet Protocol (IP) address of the destination computer.
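To make concrete the idea that every packet carries its destination's IP address, the following Python sketch builds the fixed 20-byte header of an IPv4 packet, following the layout in RFC 791, and then reads the destination address back out of bytes 16 through 19, as a router would. It is an illustration under stated assumptions, not a working IP implementation: the source address 192.0.2.1 (a reserved documentation address), the time-to-live of 64, the choice of TCP as the carried protocol, and the function names are all hypothetical, and header options and payload are omitted.

```python
import socket
import struct

# Fixed 20-byte IPv4 header in network (big-endian) byte order:
# version/IHL, type of service, total length, identification,
# flags/fragment offset, time to live, protocol, checksum,
# source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def checksum(header: bytes) -> int:
    """Internet checksum: ones' complement of the ones' complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(src: str, dst: str, payload_len: int = 0) -> bytes:
    """Hypothetical helper: pack a minimal IPv4 header with a valid checksum."""
    fields = [
        (4 << 4) | 5,                    # version 4, header length 5 words (20 bytes)
        0,                               # type of service
        IPV4_HEADER.size + payload_len,  # total length in bytes
        0,                               # identification
        0,                               # flags / fragment offset
        64,                              # time to live (assumed value)
        6,                               # protocol number 6 = TCP (assumed)
        0,                               # checksum placeholder, filled in below
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    fields[7] = checksum(IPV4_HEADER.pack(*fields))
    return IPV4_HEADER.pack(*fields)


def destination_of(header: bytes) -> str:
    """Read the destination address (bytes 16-19) back out of the header."""
    return socket.inet_ntoa(IPV4_HEADER.unpack(header[:IPV4_HEADER.size])[9])


header = build_header("192.0.2.1", "15.120.10.90")
print(destination_of(header))  # 15.120.10.90
```

A receiver can check the header by running the same checksum over all 20 bytes: a correct header sums to zero.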

In general, a unique IP address is assigned to each computer connected to the Internet. The current version of IP uses a string of 32 bits (each either a one or a zero) to identify each computer. These 32 bits represent a single number between zero and about four billion. Because long strings of binary digits are hard for people to remember, the most popular way of writing these numbers is to divide the string of bits into four octets, or sections of eight bits apiece. Each octet is interpreted as a separate decimal number between 0 and 255, and the four numbers are written separated by periods, which makes each IP address slightly easier to remember. In other words, an IP address of 00001111011110000000101001011010 is usually divided into four parts and written as 15.120.10.90.
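The conversion into dotted-decimal notation can be sketched in a few lines of Python. The function names `bits_to_dotted` and `dotted_to_int` are hypothetical, chosen for this illustration; the second function simply shows that the four octets still stand for one 32-bit number.

```python
def bits_to_dotted(bits: str) -> str:
    """Convert a 32-character string of 0s and 1s to dotted-decimal notation."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    # Split into four 8-bit octets and read each one as a base-2 number.
    octets = [int(bits[i:i + 8], 2) for i in range(0, 32, 8)]
    return ".".join(str(octet) for octet in octets)


def dotted_to_int(address: str) -> int:
    """Recombine the four octets into the single 32-bit number they represent."""
    value = 0
    for part in address.split("."):
        value = (value << 8) | int(part)
    return value


address = bits_to_dotted("00001111011110000000101001011010")
print(address)                 # 15.120.10.90
print(dotted_to_int(address))  # 259525210
```

In practice, Python's standard `ipaddress` module does the same job: `str(ipaddress.IPv4Address(int(bits, 2)))` yields the dotted-decimal form.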

Where Everybody Knows Your Name

Humans find such abstract numbers very difficult to remember, and even separating each long number into four smaller numbers doesn't help much. In order to make these IP addresses easier to remember, and to impose a certain amount of order on computer identification on the Internet, names were assigned to computers. These names were in a sense cosmetic: the Internet still used TCP/IP, which needed an IP address to find another computer and communicate with it using packets. A way of finding a computer's IP address from its name was therefore needed.

In 1983, Paul Mockapetris, of USC's Information Sciences Institute, devised what we now call the Domain Name System, or DNS. DNS specifies a system for naming computers, and how one computer can use another computer's name to find the latter's IP address. Internet standards are recorded in documents referred to as "Request for Comments" documents, or RFCs, and Mockapetris first released the DNS specification in RFCs 882 and 883. These were superseded by RFCs 1034 and 1035, which define the current specification for DNS on the Internet. Other RFCs have since been released, such as RFCs 1535, 1536, and 1537, which describe problems with DNS security, implementation, and administration, respectively.

DNS describes a distributed system of name servers, which are systems that return information about domains. Each name server either knows the IP addresses of the machines in its domain, or knows another name server that has information on a particular subdomain. There are several root name servers, which are authoritative for the top-level domains. For example, there is a name server that is responsible for all domains ending with ".com". This name server does not, of course, have the IP address of every machine whose name ends in ".com". However, it does know which other name server is responsible for each subdomain of ".com".

Carrying this example further, to find the IP address of "winnie.corp.hp.com", the root name server for ".com" may return the address of another name server, which is responsible for the domain "hp.com". This name server may know the IP addresses of all the machines in "corp.hp.com", or it may in turn return the address of yet another name server, which is responsible for the subdomain "corp.hp.com". Eventually, a name server is reached that actually has the IP address for the specific name.
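The chain of referrals in this example can be simulated with a small, self-contained Python sketch. Everything in it is hypothetical: the server names, the table layout, and the address 192.0.2.10 (taken from a range reserved for documentation) are invented, and a real resolver exchanges DNS messages with actual name servers rather than reading a dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical name servers. Each one either knows final addresses for names
# in its zone or refers the query to a server responsible for a subdomain.
NAME_SERVERS: Dict[str, dict] = {
    "com-server": {"referrals": {"hp.com": "hp-server"}, "addresses": {}},
    "hp-server": {"referrals": {"corp.hp.com": "corp-server"}, "addresses": {}},
    "corp-server": {"referrals": {},
                    "addresses": {"winnie.corp.hp.com": "192.0.2.10"}},
}


def in_domain(name: str, domain: str) -> bool:
    """True if `name` is `domain` itself or lies somewhere beneath it."""
    return name == domain or name.endswith("." + domain)


def resolve(name: str, start: str = "com-server") -> Tuple[str, List[str]]:
    """Follow referrals until a server returns an address; also return the path taken."""
    server, path = start, [start]
    while True:
        zone = NAME_SERVERS[server]
        if name in zone["addresses"]:
            return zone["addresses"][name], path
        for domain, next_server in zone["referrals"].items():
            if in_domain(name, domain):
                server = next_server
                path.append(server)
                break
        else:  # no referral matched: this server cannot help
            raise LookupError(f"{server} knows nothing about {name}")


address, path = resolve("winnie.corp.hp.com")
print(address, path)  # 192.0.2.10 ['com-server', 'hp-server', 'corp-server']
```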

Actually Making the Journey

Once an IP address has been determined, the packet still has to make the actual trip from the machine that created it to the destination machine. To navigate the Internet, with its millions of nodes, the packet must travel through many intermediate machines before it reaches its destination. These machines include routers, which are network nodes that forward packets around the network; bridges, which are network nodes that connect two or more networks that use the same protocol; and gateways, which are in essence bridges that connect two or more networks that use different protocols.

The basic premise behind routing is simple: when a packet arrives at a router on the network, that machine examines the header of the packet and decides where to send it. If the destination address corresponds to the machine's own address, the packet is kept at that machine and passed up to the higher-level protocols to be processed. If the destination address does not correspond to that machine's address, the packet is forwarded to another machine. That machine may be the destination node itself or, if the packet is meant to be delivered to a machine not on the local network, a gateway or bridge.
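This decision can be sketched with Python's standard `ipaddress` module. The sketch assumes, beyond what is described here, that the machine knows its local network as an address prefix (192.0.2.0/24 in CIDR notation) and sends everything else to a single gateway; the addresses, which come from a range reserved for documentation, and the function name `handle` are hypothetical.

```python
import ipaddress

# Hypothetical configuration for one machine (documentation addresses).
MY_ADDRESS = ipaddress.ip_address("192.0.2.5")
LOCAL_NETWORK = ipaddress.ip_network("192.0.2.0/24")
GATEWAY = ipaddress.ip_address("192.0.2.1")


def handle(destination: str) -> str:
    """Decide what to do with a packet addressed to `destination`."""
    dst = ipaddress.ip_address(destination)
    if dst == MY_ADDRESS:
        return "deliver to higher-level protocols"  # the packet is for this machine
    if dst in LOCAL_NETWORK:
        return f"forward directly to {dst}"  # destination is on the local network
    return f"forward to gateway {GATEWAY}"  # destination is elsewhere


for destination in ("192.0.2.5", "192.0.2.77", "15.120.10.90"):
    print(destination, "->", handle(destination))
```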

If the packet is not meant for the machine which is processing it, the machine searches its routing database, which is known as a Routing Information Table (RIT). The machine tries to find the address of another router which it thinks is closer to the specified destination. If it finds one, it forwards the packet to that router. The RIT is searched in the following manner:

First, the machine searches the RIT for a specific route to the specified host. This is done by matching the complete specified host address with a complete host address in the RIT.
If there is no matching host address in the RIT, the machine tries to find a matching network address, ignoring the host portion of the address. The assumption here is that a machine on the same network as the destination will be better equipped to deliver the packet.
If there is not even a matching network address in the RIT, the machine searches for a default route entry. The default route entry exists for those cases in which neither a matching host nor a matching network can be found. The default router is not necessarily closer to the destination, and it might not even know how to route to the destination network. The default router is in essence a last resort: it can, one hopes, forward the packet on to another router that knows about the destination.
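The three-step search can be sketched in Python as follows. The table contents, router names, and function name are hypothetical (the addresses come from ranges reserved for documentation), and the sketch follows the simple host, network, default order described above rather than any particular router's implementation.

```python
import ipaddress
from typing import Optional

# Hypothetical RIT contents.
HOST_ROUTES = {ipaddress.ip_address("203.0.113.50"): "router-A"}
NETWORK_ROUTES = {ipaddress.ip_network("203.0.113.0/24"): "router-B"}
DEFAULT_ROUTE: Optional[str] = "router-C"  # last resort; None if not configured


def next_router(destination: str) -> Optional[str]:
    """Search the table in order: specific host, then network, then default."""
    dst = ipaddress.ip_address(destination)
    # 1. A route to this specific host (complete address match).
    if dst in HOST_ROUTES:
        return HOST_ROUTES[dst]
    # 2. A route to the network containing the host (host part ignored).
    for network, router in NETWORK_ROUTES.items():
        if dst in network:
            return router
    # 3. The default route, which may be None if none is configured.
    return DEFAULT_ROUTE


print(next_router("203.0.113.50"))  # router-A (host route wins)
print(next_router("203.0.113.9"))   # router-B (network route)
print(next_router("15.120.10.90"))  # router-C (default route)
```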

In order to route packets over the Internet, several protocols have been defined that specify how nodes determine who their neighbors are and what characteristics those neighbors have. There are two general classes of TCP/IP routing protocols: vector/distance and link state routing. Using vector/distance methods, routers share their routing tables and make additions and corrections based on reports from other routers. Routers give each other information on destinations along with a distance value, which is usually the number of hops to a certain destination. Link state routing, on the other hand, specifies that each router simply check the status of its neighboring routers periodically and broadcast this information to all other participating routers. Using this information, each participating router can build its own map of the network.
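The core of the vector/distance method, in which a router adds its own link cost to each distance a neighbor reports and keeps whichever route is better, can be sketched as follows. This is a simplified illustration rather than any specific protocol: the table layout (destination mapped to a distance and next router), the router and network names, and the default link cost of one hop are assumptions. A route that already passes through the reporting neighbor is updated even if it gets worse, since that neighbor's report is the current information about that path.

```python
from typing import Dict, Tuple

Table = Dict[str, Tuple[int, str]]  # destination -> (distance, next router)


def merge_vector(table: Table, neighbor: str, neighbor_vector: Dict[str, int],
                 link_cost: int = 1) -> bool:
    """Merge a neighbor's reported distances into `table`; return True if it changed."""
    changed = False
    for destination, reported in neighbor_vector.items():
        candidate = (reported + link_cost, neighbor)  # route through this neighbor
        current = table.get(destination)
        better = current is None or candidate[0] < current[0]
        # A report from the router already in use is a correction, even if worse.
        correction = current is not None and current[1] == neighbor
        if (better or correction) and current != candidate:
            table[destination] = candidate
            changed = True
    return changed


routes: Table = {"net1": (1, "direct")}
merge_vector(routes, "R2", {"net1": 2, "net2": 1})
print(routes)  # {'net1': (1, 'direct'), 'net2': (2, 'R2')}
```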

Routing Information Table (RIT)

Routing tables usually contain several pieces of information on each destination, including the following:

Distance: A value that characterizes how far the destination is from the router. This is usually the number of hops from the router to the destination.
Next Router: The next router is the address of a router which the machine thinks is closer to the destination, and therefore the router to which the packet should be forwarded.
Flags: The state of the route. This includes such flags as those indicating whether the route is up, whether the route leads to a specific host or a network, and whether the route is indirectly accessible via other routers.
Output Port: The port (network interface) used when communicating with the next router.
Use: A record of how many packets have used this route.
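One way to represent an entry with these fields is sketched below in Python. The class and field names are hypothetical, as are the interface name `eth0` and the addresses; the flag letters U, G, and H mirror the ones printed by `netstat -r` on BSD-derived systems for routes that are up, that go through a gateway, and that lead to a single host.

```python
from dataclasses import dataclass
from enum import Flag


class RouteFlags(Flag):
    UP = 1       # the route is usable
    GATEWAY = 2  # the destination is reached indirectly, via another router
    HOST = 4     # the route leads to a specific host rather than a network


@dataclass
class RouteEntry:
    destination: str
    distance: int        # usually a hop count
    next_router: str
    flags: RouteFlags
    output_port: str     # interface used to reach the next router
    use: int = 0         # how many packets have used this route

    def record_use(self) -> None:
        self.use += 1

    def flag_letters(self) -> str:
        """Render the flags as netstat-style letters, e.g. 'UG'."""
        letters = [(RouteFlags.UP, "U"), (RouteFlags.GATEWAY, "G"), (RouteFlags.HOST, "H")]
        return "".join(letter for flag, letter in letters if flag in self.flags)


entry = RouteEntry("203.0.113.0/24", 2, "192.0.2.1",
                   RouteFlags.UP | RouteFlags.GATEWAY, "eth0")
entry.record_use()
print(entry.flag_letters(), entry.use)  # UG 1
```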

RIT Maintenance

A Routing Information Table must change from time to time, because the Internet is a dynamic network. Nodes are being added and removed, and a machine can be up or down at any given time. Furthermore, a router may stop working, and an entire network may become disconnected from the Internet. To deal with these changes, the RIT must be maintained. There are three common methods for doing this:

A fixed RIT is created with a map of the entire network. Whenever changes occur on the network, the entire table must be rebuilt.
A dynamic table evaluates loads and communicates with other nodes to make periodic changes to the table, based on the state of the network at any given time.
A fixed central routing table contains a map of the network in a central location. From time to time, other nodes in the network will read the central routing table, which should contain the most up-to-date information.

Due to the ever-changing nature of the Internet, the dynamic RIT is most often used on the Internet.

Interior Routing

Interior routing is used within smaller, autonomous networks, which seldom change their connections between gateways. Interior routing uses one of several Interior Gateway Protocols (IGPs). Possibly the most popular IGP is the Routing Information Protocol, or RIP. RIP uses the vector/distance method to record routes within routing tables. Another routing protocol, Open Shortest Path First (OSPF), uses the link-state method.

Routing Information Protocol (RIP)

All systems on an internetwork can use RIP, but it is mostly the routers that not only listen to routing broadcasts but also transmit routing information. Routers broadcast route information in vector/distance pairs. This means that a router will send information about a certain destination network along with a distance value, which is the number of hops it takes to get from the router to the destination network. The rules for RIP are as follows:

By default, routers broadcast their routes every 30 seconds.
Every system that is listening to RIP broadcasts compares its own table with the information that has been broadcast. Each system will update its Routing Information Table if:
- A route to a new network is broadcast
- A shorter route to an existing network is broadcast
- A destination is reported as unreachable, and should be removed
Each route is kept in the RIT until a better route is reported in a broadcast.
If a route to a destination is broadcast with the same hop count as another route to the same destination, the first one received is stored in the table.
If, after three minutes, a route is not updated, it times out and the route is assumed down.
Routers broadcast route changes as soon as they occur.
If a route has a hop count of 16 or more, the destination is considered unreachable. For this reason, RIP cannot be used in large internetworks in which some destinations are more than 15 hops away.
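Several of the rules above can be sketched in Python. This is a simplified model under stated assumptions, not an implementation of RIP: the class and field names are hypothetical, time is passed in explicitly so the behavior is easy to follow, the 30-second broadcasts and immediate broadcasting of changes are not modeled, and routes are deleted at once instead of being held as unreachable for a while, as real implementations do. The constants 16 and 180 seconds come from the rules above.

```python
import time
from typing import Dict, List, Optional

INFINITY = 16        # hop count that RIP treats as unreachable
ROUTE_TIMEOUT = 180  # seconds without an update before a route is assumed down


class RipTable:
    """Hypothetical simplified RIP table: destination -> [hops, next_router, last_update]."""

    def __init__(self) -> None:
        self.routes: Dict[str, List] = {}

    def receive(self, neighbor: str, advertised: Dict[str, int],
                now: Optional[float] = None) -> None:
        """Process a broadcast of destination/hop-count pairs from `neighbor`."""
        now = time.monotonic() if now is None else now
        for destination, hops in advertised.items():
            hops = min(hops + 1, INFINITY)  # one extra hop to go through the neighbor
            route = self.routes.get(destination)
            if route is None:
                if hops < INFINITY:  # a route to a new network
                    self.routes[destination] = [hops, neighbor, now]
            elif route[1] == neighbor:
                if hops >= INFINITY:  # the current next router reports it unreachable
                    del self.routes[destination]
                else:  # fresh news from the current next router
                    route[0], route[2] = hops, now
            elif hops < route[0]:
                # A strictly shorter route replaces the old one; on a tie
                # the route received first is kept.
                self.routes[destination] = [hops, neighbor, now]

    def expire(self, now: Optional[float] = None) -> None:
        """Remove routes that have not been updated within ROUTE_TIMEOUT seconds."""
        now = time.monotonic() if now is None else now
        stale = [d for d, r in self.routes.items() if now - r[2] > ROUTE_TIMEOUT]
        for destination in stale:
            del self.routes[destination]


table = RipTable()
table.receive("R2", {"netA": 1, "netB": 15}, now=0)
print(table.routes)  # {'netA': [2, 'R2', 0]}
```

In this example "netB" is never added: it is 15 hops from R2 and therefore 16 hops from this router, which RIP treats as unreachable.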

Open Shortest Path First (OSPF)

OSPF is not yet widely used, but it is expected to grow in use and possibly replace RIP, because it solves many of RIP's shortcomings. OSPF uses link state instead of vector/distance to characterize routes. Because of this, OSPF causes less network traffic than RIP. Furthermore, OSPF propagates route changes in a more orderly and reliable fashion.

Besides solving many of RIP's problems, OSPF adds extra features. OSPF can perform load balancing by splitting traffic across alternate routes of equal cost. OSPF can also use different routes for different types of service.
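The "shortest path first" in OSPF's name refers to Dijkstra's shortest-path algorithm, which each router runs over the map of the network that it has built from link-state reports. The Python sketch below computes the cost to every router and keeps every first hop that lies on an equal-cost best path, which is the information needed for the load balancing mentioned above. The topology, router names, and link costs are hypothetical, and real OSPF involves much more (areas, link-state advertisements, and type-of-service routing) than this sketch shows.

```python
import heapq
from typing import Dict, Set, Tuple


def shortest_paths(graph: Dict[str, Dict[str, int]],
                   source: str) -> Dict[str, Tuple[int, Set[str]]]:
    """Dijkstra's algorithm over a link-state map with positive link costs.

    Returns {router: (cost, first-hop neighbors on all equal-cost best paths)}.
    """
    dist = {source: 0}
    first_hops: Dict[str, Set[str]] = {source: set()}
    done = set()
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in done:
            continue  # stale queue entry
        done.add(u)
        for v, cost in graph.get(u, {}).items():
            candidate = d + cost
            # Leaving the source, the first hop is the neighbor itself.
            hops = {v} if u == source else first_hops[u]
            if v not in dist or candidate < dist[v]:
                dist[v] = candidate
                first_hops[v] = set(hops)
                heapq.heappush(queue, (candidate, v))
            elif candidate == dist[v]:
                first_hops[v] |= hops  # an equal-cost alternative route
    return {r: (dist[r], first_hops[r]) for r in dist}


# Hypothetical link-state map: router -> {neighbor: link cost}.
network_map = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 1},
    "D": {"B": 1, "C": 1, "E": 1},
    "E": {"D": 1},
}
cost, hops = shortest_paths(network_map, "A")["E"]
print(cost, sorted(hops))  # 3 ['B', 'C']
```

Here router A can reach D, and E beyond it, through either B or C at the same cost, so traffic could be split between the two.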

Exterior Routing

Many packets have destinations outside the organizational networks from which they come. In the early days of the Internet, there were several core gateways, which were administered by the Internet Network Operations Center. Each organizational network had one or more non-core gateways, which were connected to at least one core gateway. In this scenario, a packet created inside an organizational network would be transferred to a non-core gateway, which in turn would transfer the packet to a core gateway. Because the core gateway had information about all other core gateways, the packet could then reach the appropriate organizational network for internal routing.

At first, a protocol called Gateway-to-Gateway Protocol (GGP) was used to communicate between core gateways. However, as the Internet quickly grew, more backbones were added. Routing between core gateways became more complicated. Many of these backbones created parallel paths between core gateways, and another protocol was needed to correctly make decisions in this new environment. The Exterior Gateway Protocol (EGP) succeeded GGP for communication between core gateways. EGP propagates information about both core gateways and non-core gateways across the Internet.

Keeping Up-To-Date

Just as the routes that packets take are always changing on the Internet, the protocols used to keep this information up to date are changing as well. The protocols mentioned above are arguably the most popular, but they are by no means an exhaustive list of the routing protocols used on the Internet. One of the strengths of the Internet is that it connects many diverse computers which speak many diverse languages. The protocols used to route packets among routers, bridges, and gateways are likewise diverse. As the Internet grows and evolves, routing protocols are changing and improving, and new protocols are being added. Each router is responsible for keeping its Routing Information Table up to date; in much the same way, the system administrator is responsible for keeping the router up to date on the latest protocols. After all, keeping up to date is one of the most important factors in ensuring that packets reach their destinations on time and intact.

Bibliography

Albitz, Paul and Liu, Cricket, "DNS and BIND, Second Edition," O'Reilly & Associates, Inc., Sebastopol, California, 1997.
Douba, Salim, "Networking Unix," Sams Publishing, Indianapolis, Indiana, 1995.
Loshin, Pete, "TCP/IP Clearly Explained," AP Professional, New York, New York, 1997.
Parker, Tim, "Teach Yourself TCP/IP in 14 Days, Second Edition," Sams Publishing, Indianapolis, Indiana, 1996.