The Journey of Internet Packets
Copyright © 1997 Garret Wilson
Essay for CS4323 - Dr. Letcher
In the Beginning
In the 1960s, the Advanced Research Projects Agency (ARPA) of the U.S. Department of
Defense created a large wide-area network of computers, called ARPANET. At first used
only by the Department of Defense and educational institutions, this network, which came
to use the TCP/IP protocol, grew to tens of thousands of smaller networks, all
connected together. This network became known as the Internet.
In 1988, the Advanced Research Projects Agency (now called DARPA) started dismantling
the ARPANET. The National Science Foundation stepped in and replaced ARPANET with its
own network, NSFNET, as the backbone of the Internet. As recently as the spring of
1995, several commercial long-distance carriers and other commercial entities instituted
their own backbones.
Through all of these changes, the Internet has continued to add more users,
slowly at first, and very rapidly in the past few years. Today, the Internet connects
millions of computers around the world, all using TCP/IP for a common protocol. All of
these computers rely on the Domain Name System, or DNS, to find other computers on
this vast network.
Everyone's Just a Number
Computers work with numbers. Specifically, most computers use the binary system, a way
of counting in which each digit is either a one or a zero. Each of these digits is
referred to as a bit. The TCP/IP protocol specifies that messages between computers
on the Internet must be composed of small entities called packets. These packets of
information are passed from computer to computer until they arrive at their specified
destination. TCP/IP further specifies that each packet must be encoded with, among other
things, the Internet Protocol (IP) address of the destination computer.
In general, a unique IP address is assigned to each computer connected to the Internet.
The current version of IP uses a string of 32 bits (each either a one or a zero) to
identify each computer. These 32 bits represent a single number that is between zero and
about four billion. Due to the way humans remember information, the most popular way of
identifying each of these numbers is to divide the string of bits that represent the
number into four octets, or sections having eight bits apiece. This makes each IP
address slightly easier to remember. These octets are interpreted as separate numbers
themselves, separated by periods. In other words, an IP address of
00001111011110000000101001011010 is usually separated into four parts and represented as
15.120.10.90.
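The conversion described above can be sketched in a few lines of Python. This is only an
illustration of the arithmetic; the function names are hypothetical and are not part of
any standard.

```python
def bits_to_dotted_quad(bits):
    """Split a 32-bit binary string into four octets and format it as a.b.c.d."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 32 binary digits")
    # Cut the string into four sections of eight bits apiece.
    octets = [bits[i:i + 8] for i in range(0, 32, 8)]
    # Interpret each octet as a separate number and join them with periods.
    return ".".join(str(int(octet, 2)) for octet in octets)


def dotted_quad_to_int(address):
    """Return the single 32-bit number that a dotted-quad address represents."""
    value = 0
    for part in address.split("."):
        value = (value << 8) | int(part)
    return value


print(bits_to_dotted_quad("00001111011110000000101001011010"))  # 15.120.10.90
```

The second function goes the other way, showing that the four small numbers are just a
more readable spelling of one number between zero and about four billion.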
Where Everybody Knows Your Name
Humans find such abstract numbers very difficult to remember, and even separating each
long number into four smaller numbers doesn't help much. In order to make these IP
addresses easier to remember, and to impose a certain amount of order on computer
identification on the Internet, names were assigned to computers. These names were in a
way cosmetic -- the Internet still used TCP/IP, which needed an IP address to find another
computer and communicate using packets. A way of finding a computer's IP address from
a given name was needed.
In 1983, Paul Mockapetris, from USC's Information Sciences Institute, devised what
we now refer to as the Domain Name System, or DNS. DNS specifies a system of naming
computers, and how one computer can use another computer's name to find the
latter's IP address. Internet standards are recorded in what are referred to as
"Request for Comments" documents, or RFCs, and Paul Mockapetris first released
the DNS specification in RFCs 882 and 883. These were superseded by RFCs 1034 and 1035,
which define the current specifications for DNS on the Internet. Other RFCs, such as RFCs
1535, 1536, and 1537, which describe problems with DNS security, implementation, and
administration, respectively, have since been released.
DNS describes a distributed system of name servers, or systems which return information
about domains. Each name server either knows the IP addresses of the machines in its
domain, or knows another name server that has information on a particular subdomain. There
are several root name servers which are authoritative for the root domains. For example,
there is a name server that is responsible for all domains ending with ".com". This
name server does not have the IP address of every machine whose name ends in
".com", of course. However, this name server will know which other name server
is responsible for a subdomain of ".com".
Carrying this example further, to find the IP address of
"winnie.corp.hp.com", the root domain server for ".com" may return an
address to another name server which is responsible for the domain "hp.com".
This name server may know all the IP addresses of "corp.hp.com," or it may in
turn return the address of another name server which is responsible for the subdomain of
"corp.hp.com." Eventually, a name server is reached which actually has the IP
address for the specific name.
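The chain of referrals in this example can be modeled with a toy lookup table. The sketch
below is not a real DNS client: the server names, the delegation data, and the address
given for "winnie.corp.hp.com" are all made up for illustration.

```python
# Hypothetical delegation data: each name server maps a domain either to
# another name server ("refer") or to a final IP address ("answer").
NAME_SERVERS = {
    "root-com-server": {"hp.com": ("refer", "hp-server")},
    "hp-server": {"corp.hp.com": ("refer", "corp-server")},
    "corp-server": {"winnie.corp.hp.com": ("answer", "15.120.10.90")},
}


def resolve(name, start="root-com-server"):
    """Follow referrals from server to server until one returns an address.

    Returns the address and the list of name servers that were asked.
    """
    server = start
    asked = []
    while True:
        asked.append(server)
        zone = NAME_SERVERS[server]
        # Find the entries on this server that cover the requested name.
        matches = [d for d in zone if name == d or name.endswith("." + d)]
        if not matches:
            raise LookupError(f"{server} has no information about {name}")
        kind, value = zone[max(matches, key=len)]  # most specific entry wins
        if kind == "answer":
            return value, asked
        server = value  # a referral: ask the next name server


address, asked = resolve("winnie.corp.hp.com")
print(address, "found after asking", asked)
```

Each pass through the loop corresponds to one name server either answering the question
or pointing to a server responsible for a more specific subdomain.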
Actually Making the Journey
Once an IP address has been determined, the packet still has to make the actual trip from
the machine which created it to the destination machine. To navigate the Internet, with
its millions of nodes, the packet must travel through many intermediate machines before it
reaches its destination. Some of these machines include routers, which are network
nodes which forward packets around the network, bridges, which are network nodes
that connect two or more networks that use the same protocol, and gateways, which
are in essence bridges that connect two or more networks that use different protocols.
The basic premise behind routing is simple: when a packet arrives at a certain router
on the network, that machine examines the header of the packet and decides where to send
it. If the address corresponds to the machine address, the packet is kept at that machine
and passed up to the higher-level protocols to be processed. If the destination address
does not correspond to that machine's address, the packet is forwarded to another
machine. That machine can be the destination node, or a gateway or bridge if the packet
is meant to be delivered to a machine not on the local network.
If the packet is not meant for the machine which is processing it, the machine searches
its routing database, which is known as a Routing Information Table (RIT). The machine
tries to find the address of another router which it thinks is closer to the specified
destination. If it finds one, it forwards the packet to that router. The RIT is searched
in the following manner:
- First, the machine searches the RIT for a specific route to the specified host. This is
done by matching the complete specified host address with a complete host address in the
RIT.
- If there is no matching host address in the RIT, the machine tries to find a matching
network address, ignoring the host address. The assumption here is that a machine on the
same network as the destination address will be better equipped to deliver the packet to
its destination.
- If there is not even a matching network address in the RIT, the machine searches for a
default route entry. The default route entry exists for those cases in which neither a
matching host nor a matching network can be found. The default router is not necessarily
closer to the destination, and it might not even know how to handle routing to the
destination network. The default router is in essence a last resort: with luck, it can
forward the packet on to another router that knows about the destination.
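The three-step search above might be sketched as follows. The table contents and router
names are hypothetical, and the use of a "/24" prefix to mark a network entry is an
assumption made for the example, not something the text specifies.

```python
import ipaddress

# Hypothetical RIT: a key is a complete host address, a network, or "default";
# the value is the next router to which the packet should be forwarded.
RIT = {
    "15.120.10.90": "router-a",    # specific host route
    "15.120.10.0/24": "router-b",  # network route
    "default": "router-z",         # last resort
}


def next_router(destination, table):
    """Search for a host route, then a network route, then the default route."""
    dest = ipaddress.ip_address(destination)
    # 1. Match the complete host address.
    if destination in table:
        return table[destination]
    # 2. Match only the network part of the address.
    for key, router in table.items():
        if "/" in key and dest in ipaddress.ip_network(key):
            return router
    # 3. Fall back to the default route entry, if there is one.
    return table.get("default")


print(next_router("15.120.10.90", RIT))  # router-a
print(next_router("15.120.10.7", RIT))   # router-b
print(next_router("192.0.2.1", RIT))     # router-z
```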
Several protocols have been defined for routing packets of information over the Internet.
They specify how different nodes determine who their neighbors are and what the
characteristics of those neighbors are. There are two general classes of
TCP/IP routing protocols: vector/distance and link state routing. Using vector/distance
methods, routers share their routing tables and make additions and corrections based on
reports from other routers. Routers give each other information on destinations along with
a distance value, which is usually the number of hops to a certain destination. Link state
routing, on the other hand, specifies that each router simply check the status of its
neighboring routers periodically, and broadcast this information to all other
participating routers. Using this information, participating routers can make their own
map of the network.
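To illustrate the link state idea, the sketch below assumes that every router has already
received the neighbor reports of all the others, and shows how one router could build its
own map and count hops to each destination. The topology and names are invented, and a
plain breadth-first search stands in for whatever calculation a real protocol would use.

```python
from collections import deque

# Hypothetical link state reports: each router lists the neighbors it can reach.
REPORTS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def hops_from(source, reports):
    """Walk the assembled map outward from source, counting hops."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        router = queue.popleft()
        for neighbor in reports.get(router, []):
            if neighbor not in hops:  # first visit is the shortest path
                hops[neighbor] = hops[router] + 1
                queue.append(neighbor)
    return hops


print(hops_from("A", REPORTS))
```

In a vector/distance scheme, by contrast, router A would never see this whole map; it
would only hear hop counts reported by B and C.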
Routing Information Table (RIT)
Routing tables usually contain several pieces of information on different destinations,
including the following:
- Distance: A value which characterizes how far the destination is from the router. This
is usually the number of hops from the router to the destination.
- Next Router: The address of a router which the machine thinks is closer to the
destination, and therefore the router to which the packet should be forwarded.
- Flags: The state of the route. This includes such flags as those indicating whether the
route is up, whether the route leads to a specific host or a network, and whether the
route is indirectly accessible via other routers.
- Output Port: The port that is used when communicating with the next router.
- Use: A record of how many packets have used this route.
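One way to picture a single table entry is as a small record with one field per item in
the list above. The field names, flag names, and sample values below are hypothetical;
real routers store this information in their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class RouteEntry:
    destination: str   # host or network this entry describes
    distance: int      # usually the number of hops to the destination
    next_router: str   # router believed to be closer to the destination
    flags: set = field(default_factory=set)  # e.g. {"UP", "HOST", "INDIRECT"}
    output_port: str = ""  # port used to reach the next router
    use: int = 0           # how many packets have used this route

    def record_packet(self):
        """Count one more packet forwarded along this route."""
        self.use += 1


entry = RouteEntry("15.120.10.0", 3, "router-b", {"UP", "INDIRECT"}, "port-1")
entry.record_packet()
print(entry)
```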
RIT Maintenance
A Routing Information Table must change sometimes, because the Internet is a dynamic
network. Nodes are being added and taken away, and a machine can be up or down at any
given time. Furthermore, a router may be working or not working, and an entire network
may become detached from the Internet. To deal with these changes, the RIT must be
maintained. There are three common methods for doing this:
- A fixed RIT is created with a map of the entire network. Whenever changes occur on the
network, the entire table must be rebuilt.
- A dynamic table evaluates loads and communicates with other nodes to make periodic
changes to the table, based on the state of the network at any given time.
- A fixed central routing table contains a map of the network in a central location. From
time to time, other nodes in the network will read the central routing table, which should
contain the most up-to-date information.
Due to the ever-changing nature of the Internet, the dynamic RIT is the method most often
used.
Interior Routing
Interior routing is used in smaller, autonomous networks which seldom change their
connections between gateways. Interior routing uses one of the several Interior Gateway
Protocols (IGP). Possibly the most popular IGP is the Routing Information Protocol, or
RIP. RIP uses the vector/distance method to record routes within routing tables. Another
routing protocol, Open Shortest Path First (OSPF), uses the link-state method.
Routing Information Protocol (RIP)
All systems on an internetwork can use RIP, but it is mostly only the routers that
both listen to routing broadcasts and transmit routing information. Routers broadcast
route information in vector/distance pairs. This means that a router will send information
about a certain destination network along with a distance value, which is the number of
hops it takes to get from the router to the destination network. The rules for RIP are as
follows:
- By default, routers broadcast their routes every 30 seconds.
- Every system that is listening to RIP broadcasts will compare its own table with the
information which has been broadcast. Each system will update its Routing Information
Table if:
  - A route to a new network is broadcast
  - A shorter route to an existing network is broadcast
  - A destination is reported as unreachable, and should be removed
- Each route is kept in the RIT until a better route is reported in a broadcast.
- If a route to a destination is broadcast with the same hop count as another route to the
same destination, the first one received is stored in the table.
- If, after three minutes, a route is not updated, it times out and the route is assumed
down.
- Routers broadcast route changes as soon as they occur.
- If a route has a hop count of 16 or over, the destination is considered unreachable. For
this reason, RIP cannot be used in many large internetworks.
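The rules above can be approximated in a short sketch. This is a simplified model, not an
implementation of RIP: the table layout and function names are hypothetical, times are
plain numbers of seconds, and adding one hop for the link to the broadcasting router is an
assumption about how the advertised distance is turned into a local one.

```python
UNREACHABLE = 16  # a hop count of 16 or over means "unreachable"
TIMEOUT = 180     # three minutes without an update


def process_broadcast(table, sender, routes, now):
    """Apply one broadcast from sender to table.

    table:  destination -> {"hops", "next", "updated"}
    routes: destination -> hop count as advertised by sender
    """
    for dest, advertised in routes.items():
        hops = min(advertised + 1, UNREACHABLE)  # one more hop to reach sender
        current = table.get(dest)
        if hops >= UNREACHABLE:
            # Reported unreachable: drop the route if it went through sender.
            if current and current["next"] == sender:
                del table[dest]
        elif current is None or hops < current["hops"]:
            # A new network, or a shorter route to an existing one.
            table[dest] = {"hops": hops, "next": sender, "updated": now}
        elif current["next"] == sender:
            # The router already in use repeats its route: refresh the entry.
            current["hops"] = hops
            current["updated"] = now
        # Otherwise the route is no better; the first one received is kept.


def expire(table, now):
    """Remove routes that have not been updated for three minutes."""
    for dest in [d for d, r in table.items() if now - r["updated"] > TIMEOUT]:
        del table[dest]


table = {}
process_broadcast(table, "R1", {"net1": 1, "net2": 3}, now=0)
process_broadcast(table, "R2", {"net1": 1, "net2": 1}, now=10)
print(table)
```

After the second broadcast, "net1" still points at R1 because R2 offered the same hop
count, while "net2" switches to R2 because its route is shorter.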
Open Shortest Path First (OSPF)
OSPF is not widely used, but it is expected to grow in use and possibly replace RIP.
The reason for this is that it solves many of RIP's shortcomings. OSPF uses link
state instead of vector/distance to characterize routes. Because of this, OSPF causes
less network traffic than RIP. Furthermore, OSPF causes route changes to be propagated in
a more orderly and reliable fashion.
Besides solving many of RIP's problems, OSPF adds extra features. OSPF can perform
load balancing by using equivalent alternate routes. OSPF can also use different routes
for different types of services.
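As a rough picture of load balancing over equivalent routes, the sketch below picks out
the candidate routes that share the lowest cost and then alternates between them. The
route list, the cost values, and the simple round-robin choice are assumptions for
illustration; the text does not say how OSPF divides traffic among equal routes.

```python
from itertools import cycle


def equal_cost_routes(candidates):
    """Return the next routers whose cost ties for the lowest value.

    candidates: list of (next_router, cost) pairs for one destination.
    """
    best = min(cost for _, cost in candidates)
    return [router for router, cost in candidates if cost == best]


# Hypothetical candidate routes to a single destination.
routes = equal_cost_routes([("router-a", 10), ("router-b", 10), ("router-c", 25)])
chooser = cycle(routes)  # alternate between the equivalent routes
sent_via = [next(chooser) for _ in range(4)]
print(sent_via)  # ['router-a', 'router-b', 'router-a', 'router-b']
```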
Exterior Routing
Many packets have destinations which are outside the organizational networks from which
they come. In the early days of the Internet, there were several core gateways
which were administered by the Internet Network Operations Center. Each organizational
network would have one or more non-core gateways which were connected to at least one core
gateway. In this scenario, a packet could be created inside an organizational network and
be transferred to a non-core gateway, which in turn would transfer the packet to a core
gateway. The core gateway would have information about all other core gateways, and the
packet would arrive at the appropriate organizational network for internal routing.
At first, a protocol called Gateway-to-Gateway Protocol (GGP) was used to communicate
between core gateways. However, as the Internet quickly grew, more backbones were added.
Routing between core gateways became more complicated. Many of these backbones created
parallel paths between core gateways, and another protocol was needed to correctly make
decisions in this new environment. The Exterior Gateway Protocol (EGP) succeeded GGP for
communication between core gateways. EGP propagates information about both core gateways
and non-core gateways across the Internet.
Keeping Up-To-Date
Just as different routes for packets are always changing on the Internet, the actual
protocols used to keep this information up-to-date are changing as well. The protocols
mentioned above are arguably the most popular, but are by no means an exhaustive list of
all routing protocols used on the Internet. One of the strengths of the Internet is that
it connects many diverse computers which speak many diverse languages. The protocols used
to route packets among routers, bridges, and gateways are likewise diverse. As the
Internet grows and evolves, routing protocols are also changing and improving, and new
protocols are being added. Each router is responsible for keeping its Routing Information
Table up-to-date. In much the same way, the system administrator is responsible for
keeping the router up-to-date on the latest protocols. After all, keeping up-to-date is
one of the most important aspects of the Internet in ensuring that
packets reach their destinations on time and untainted.
Bibliography
- Albitz, Paul and Liu, Cricket, "DNS and BIND, Second Edition," O'Reilly
& Associates, Inc., Sebastopol, California, 1997.
- Douba, Salim, "Networking Unix," Sams Publishing, Indianapolis, Indiana,
1995.
- Loshin, Pete, "TCP/IP Clearly Explained," AP Professional, New York, New
York, 1997.
- Parker, Tim, "Teach Yourself TCP/IP in 14 Days, Second Edition," Sams
Publishing, Indianapolis, Indiana, 1996.